Issue
A&A
Volume 667, November 2022
Gaia Early Data Release 3
Article Number A148
Number of page(s) 31
Section Celestial mechanics and astrometry
DOI https://doi.org/10.1051/0004-6361/202243483
Published online 23 November 2022

© Gaia Collaboration (Klioner, S. A., et al.) 2022

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe-to-Open model. Subscribe to A&A to support open access publication.

1 Introduction

Gaia is the second highly successful astrometric space mission of the European Space Agency (ESA1; Gaia Collaboration 2016), vastly extending and improving the achievements of its predecessor Hipparcos (ESA 1997; Perryman et al. 1997; van Leeuwen 2007). While the latter delivered astrometry at the level of a milliarcsecond (mas) for nearly 120000 preselected stars at the mean epoch 1991.25, the latest data release Gaia EDR3 contains astrometry for about 1.8 billion sources with accuracies from a few tens of microarcseconds (µas) to ~1 mas, mainly depending on the brightness (Gaia Collaboration 2021a; Lindegren et al. 2021b).

It is well known that the observational principles of Gaia (and Hipparcos) do not allow one to fully fix the reference frame, that is the coordinate system of positions and proper motions on the sky. Gaia effectively measures angular separations of sources on the sky, and these separations remain unchanged if one rotates the whole solution by an arbitrary angle at each moment of time. Although the degeneracy with rotation is not mathematically exact, owing to the effects of aberration and gravitational light bending which are coupled to the fixed solar system and Gaia ephemerides, those effects are too small to be of any practical help in fixing the reference frame. In the standard astrometric solution, the apparent motion of a source on the sky is modelled by five freely adjusted parameters: two angular components of its position at some reference epoch, the parallax, and two components of the mean proper motion over the observed time interval. This model gives a six-fold near-degeneracy in the astrometric solution: three constant angles of rotation for the positions at the reference epoch (the ‘orientation’ of the solution), and three components of the constant angular velocity of rotation that affect the proper motions (the ‘spin’ of the solution). In short, although there are many more observations than unknowns to fit, the resulting system always has a rank deficiency of six. To fix the reference frame of a Gaia solution, external information should be used to constrain precisely these six degrees of freedom, and nothing more.

Gaia routinely delivers astrometry for sources as faint as G = 21 mag and, unlike Hipparcos, it can observe a large number of extragalactic sources. This capability plays an important role for the whole project. By directly linking the astrometric solution to a set of remote extragalactic sources, Gaia implements the principles set forth in the International Celestial Reference System (ICRS; Arias et al. 1995), a concept originally devised for the use in radio astrometry observing radio-loud quasars at cosmological distances. In practice, the term ‘extragalactic’ here refers to objects more distant than about 50 Mpc, that is well beyond the Local Group. The reasoning for choosing remote extragalactic sources is simple: because they are not affected by the complex motions in our Galaxy and have negligible apparent motions on the sky (thanks to their large distances), they provide reliable and stable beacons for astrometry. The suggestion to rely on very distant and nearly motionless sources can be traced back to W. Herschel and P.S. Laplace two centuries ago, well before quasars were spotted (see Sect. 6 of Fricke 1985 for a brief historical account).

The ICRS embraces two ideas of a very different nature. The first one concerns the orientation of a celestial reference frame at a given epoch while the second one deals with the rotational state (spin) of the reference frame. The orientation of a celestial reference frame at a given epoch is a pure convention, with no physical significance to the particular choice of axes. In this respect the choice of origin for the right ascension in ICRS is as arbitrary as, for example, the choice of Greenwich as the Prime Meridian for the geographic longitude in 1884. Gaia uses its own astrometric observations of the optical counterparts to sources from the well-established radio astrometric catalogue ICRF3 (Charlot et al. 2020) to fix the orientation of the Gaia catalogue. This choice is of a practical nature, namely to ensure consistency between the optical and radio frames and continuity of the orientation used in astronomy over past decades.

The second aspect, much more significant, is the use of remote extragalactic sources to fix the rotational state of the reference frame. Individual sources may show some (even nonlinear) positional variations in time, owing to their peculiar motions or intrinsic variations of their structure (e.g. Taris et al. 2018), but it is assumed that objects at cosmological distances, on the average, show ‘no rotation’.

In the framework of Newtonian physics, the assumption of no rotation is interpreted as no rotation with respect to Newtonian absolute space. In the modern framework of the General Theory of Relativity, the procedure results in a reference frame in which the Hubble flow has no rotational component. One can argue that the no rotation assumption is related to Mach’s principle (e.g. Schiff 1964). However, we stress that Gaia’s observations of distant extragalactic objects can neither prove nor disprove that principle. To test Mach s principle is a separate task beyond the scope of the current discussion.

Besides the task of fixing the Gaia reference frame, having a set of astrometrically well-behaved extragalactic sources is also important for assessing systematic errors in the Gaia astrometric solution. Again, this is possible because we can assume that the true values of the proper motions and parallaxes of these sources are all very close to zero. Such applications of the Gaia-CRF3 sample are found, for example, in Lindegren et al. (2021a,b).

Based on the way Gaia automatically detects and observes all point-like objects of sufficient brightness (de Bruijne et al. 2015), Gaia EDR3 is expected to contain a few million extragalactic objects suitable for defining a reference frame. However, it is not easy to reliably identify those sources in the Gaia catalogue. As was the case for Gaia-CRF1 (Mignard et al. 2016) and Gaia-CRF2 (Gaia Collaboration 2018), the identification of well-behaved extragalactic sources for Gaia EDR3 relies on a dedicated cross-matching with external catalogues of quasistellar objects (QSOs; also often dubbed ‘quasars’ irrespective of their radio loudness) and active galactic nuclei (AGNs). For the purpose of this work the actual nature of the compact extragalactic sources plays no role, and we refer to them as ‘QSO-like sources, whether they are genuine QSOs of various kinds, galaxies with astrometrically compact nuclei, or other types of extragalactic objects. The resulting set of QSO-like objects selected on this basis constitutes Gaia-CRF3 – the third version of the Gaia-CRF. This paper describes the procedure used to select the sources of Gaia-CRF3 and presents its most important properties and its relation to the radio ICRF.

According to the principles of the ICRS, Gaia-CRF3 is entirely defined by the set of ~1.6 million QSO-like sources described in this paper. It is implicitly assumed that the positions and proper motions of all other (mostly stellar) sources in Gaia DR3 are expressed in the same reference frame, and thus constitute a secondary, much denser reference frame that extends to the brightest magnitudes. For stars of similar magnitude and colour as the QSO-like sources, the consistency of the reference frames should be guaranteed by the methods of observation and data analysis in Gaia, which are exactly the same for both kinds of objects. For much brighter sources, and perhaps also for objects of extreme colours, this may not be the case owing to various instrumental effects and deficiencies in their modelling. Indeed, Cantat-Gaudin & Brandt (2021) have shown that the proper motions of bright (G ≲ 13 mag) sources in Gaia EDR3 have a residual spin with respect to the fainter stars by up to 80 µas yr−1, and provide a recipe for their correction. Similarly, the alignment errors for the bright sources in Gaia EDR3 were investigated by Lunz et al. (2021), who found differences of about 0.5 mas. The properties of the secondary (stellar) reference frame in Gaia EDR3 are not further discussed in this paper.

The structure of the paper is as follows. Section 2 describes the multi-step procedure used to select the Gaia-CRF3 sources. The properties of those sources are discussed in Sect. 3. The Gaia-CRF3 sources matched to the radio sources of ICRF3 give an important link to radio astrometry and are discussed in Sect. 4. Section 5 reviews the role of the Gaia-CRF3 sources in the ‘frame rotator’ of the primary astrometric solution for Gaia EDR3, which is where the reference frame was actually fixed. Some conclusions and prospects for future versions of the Gaia-CRF are given in Sect. 6.

The appendices provide more detailed information on a few more technical issues. Appendix B provides some additional details of the external catalogues used for Gaia-CRF3. Appendix C demonstrates how to select the Gaia-CRF3 sources in the Gaia Archive and trace which external catalogues have matches of those sources. In Appendix D we discuss the ‘confusion sources’ in Gaia EDR3, that is sources, mainly in the Galaxy or Local Group, whose astrometric parameters are roughly consistent with them being at cosmological distances, and that could therefore potentially be confused with QSO-like sources. Finally, Appendix E describes the algorithm used by the frame rotator to estimate the spin and orientation corrections for astrometric solutions.

Table 1

Statistics of the cross-matches of Gaia EDR3 to external QSO catalogues used in the construction of Gaia-CRF3.

2 Selection of Gaia sources

The goal of the source selection process for the Gaia-CRF is to find a set of QSO-like sources that combines three conflicting requirements: it should be as large as possible, as clean as possible (that is, free of stellar contaminants), and homogeneously distributed on the sky. We expect that this selection can eventually be made exclusively from Gaia’s own observations, using the astrophysical characterisation of sources based on a combination of astrometric and photometric information (e.g. Bailer-Jones et al. 2013, 2019; Delchambre 2018; Heintz et al. 2018). This was not possible for Gaia EDR3, and similarly to what was done for the first two Gaia data releases, Gaia EDR3 uses external catalogues of QSOs and AGNs to identify QSO-like sources among the nearly two billion Gaia sources. As detailed below, the Gaia-CRF3 content is the result of cross-matches with 17 external catalogues, followed by a two-stage astrometric filtering. These external catalogues were all publicly available at the time of the final construction of Gaia-CRF3 in mid-June 2020.

2.1 Cross-matching

For the construction of Gaia-CRF3 we cross-matched the Gaia EDR3 astrometric catalogue with the external catalogues listed in Table 1 (see also Appendix B). Only astrometric information was used for the cross-matches. The first four entries in the table – three ICRF3 catalogues and OCARS – are based on VLBI astrometry in the radio domain and have an astrometric quality comparable to that of Gaia EDR3. For those catalogues the maximal cross-match radius was set to ∆max = 100 mas. The other catalogues have various levels of astrometric quality and for them the cross-match radius was set to ∆max = 2 arcsec. For virtually all catalogues the cross-matching results in a (relatively small) number of ambiguous matches, where more than one Gaia source was matched to a source in the external catalogue, or more than one source in the external catalogue was matched to the same Gaia source. In all such cases only the closest positional match was retained. For each external catalogue this gives a list of ‘unique’ matches in the form of pairs ‘Gaia source identifier’ – ‘external source identifier’, in which all identifiers appear only once. Table 1 gives the total number of sources in the catalogue and, in column 3, the number of unique matches in Gaia EDR3, to which the two-stage filtering described below was applied.

This cross-matching procedure adopted for Gaia-CRF3 is not perfect and could be refined in future releases. For example, the maximum cross-matching radius could be better tuned to the precisions of the individual external catalogues and the expected densities of stars and quasars in the different situations. A particular case concerns multiple lensed quasar images, where ideally all the images should be retained instead of just the closest match, as is presently done. In some ambiguous cases the cross-matching could benefit from using also the photometric information available in the external catalogues. This is however far from straightforward owing to the expected variability of the QSOs, differences in wavelength bands and resolution between the instruments, etc., which may even require a manual check of individual cases. For the current cross-matching we have chosen not to use such information.

2.2 Astrometric filtering: Individual sources

The first stage of the astrometric filtering ensures that the retained Gaia sources have five- and six-parameter astrometric solutions in Gaia EDR3 compatible with the assumed extra-galactic nature of the source. For all the catalogues in Table 1 we applied the three criteria in Eqs. (1)(3) below. This was found to be sufficient for the VLBI catalogues (the first four entries in the table), but for the remaining catalogues the two criteria in Eqs. (4) and (5) were also applied. Some of the criteria are alternatively formulated as parts of a query in the Gaia Archive2 using the Astronomical Data Query Language (ADQL).

The first criterion is that the source has a valid parallax and proper motion. This is ensured by selecting sources with five- or six-parameter astrometric solutions: astrometric_params_solved = 31 ORastrometric_params_solved = 95.$\matrix{{{\rm{astrometric\_params\_solved}}\,{\rm{ = }}\,{\rm{31}}\,{\rm{OR}}} \hfill \cr {{\rm{astrometric\_params\_solved = 95}}{\rm{.}}} \hfill \cr} $(1)

We note that astrometric_params_solved is a binary flag indicating which astrometric parameters are provided for the source, that is 112, 111112, and 10111112 for two-, five-, and six-parameter solutions, with the seventh bit representing pseudo-colour.

Secondly, the measured parallax ϖ of the source, corrected for the median parallax bias, should be zero within five times the formal uncertainty σϖ: | ϖ+0.017 masσϖ|<5,$\left| {{{\varpi + 0.017{\rm{mas}}} \over {{\sigma _\varpi }}}\right| < 5} ,$(2)

which translates to the ADQL query

abs((parallax + 0.017)/parallax_error) < 5.

This takes into account the median parallax of −0.017 mas obtained from a provisional selection of QSO-like sources in Gaia EDR3, but ignores the complicated dependence of the parallax bias on the magnitude and other parameters discussed by Lindegren et al. (2021a). As shown by their Fig. 5, the variation of the parallax bias for the QSO-like sources is typically less than ±0.010 mas. Because this is many times smaller than the individual uncertainties (for example, 99% of the Gaia-CRF3 sources have σϖ > 0.1 mas), these variations were deemed uncritical for the astrometric filtering. Assuming Gaussian errors, (ϖ + 0.017 mas)/σϖ is then expected to follow the standard normal distribution, and the probability that Eq. (2) is violated by random errors is, nominally, erfc(5/2)5.7×107${\rm{erfc}}\left( {5/\sqrt 2 } \right) \simeq 5.7 \times {10^{ - 7}}$.

The third criterion is that the proper motion components µα* and µδ should be small compared to their uncertainties σµα* and σµδ. Here we take into account the correlation ρα*, µδ) between the two parameters. We require Xμ2[ μα*μδ ]Cov(μ)1[ μα*μδ ]<25,$X_{\rm{\mu }}^2 \equiv \left[ {\matrix{{{{\rm{\mu }}_{\alpha *}}} & {{{\rm{\mu }}_\delta }} \cr} } \right]{\rm{Cov}}{\left( {\rm{\mu }} \right)^{ - 1}}\left[ {\matrix{{{{\rm{\mu }}_{\alpha *}}} \cr {{{\rm{\mu }}_\delta }} \cr} } \right] < 25,$(3)

where Cov(µ) is the covariance matrix given by Eq. (E.4). This corresponds to the query

( power(pmra/pmra_error,2)
+power(pmdec/pmdec_error,2)
−2*pmra_pmdec_corr
*pmra/pmra_error
*pmdec/pmdec_error )
/(1-power(pmra_pmdec_corr,2)) < 25.

For Gaussian errors of the proper motion components, Xμ2$X_{\rm{\mu }}^2$ is expected to follow the chi-squared distribution with two degrees of freedom (see Appendix B of Mignard et al. 2016). In this case the nominal probability that a source violates Eq. (3) is exp (−25/2) ≃ 3.7 × 10−6. We neglect here the systematic proper motions of extragalactic objects caused by the acceleration of the solar system barycentre (e.g. Gaia Collaboration 2021b). The amplitude of this effect, about 0.005 mas yr−1, is much smaller than the typical proper-motion uncertainty of QSO-like sources in Gaia EDR3 (see below) and is therefore ignored in the filtering. We note that Xµ is independent of the reference frame, so that, in particular, it plays no role if the equatorial or, say, galactic components of the proper motions are used in Eq. (3).

For the VLBI catalogues based on high-accuracy radio astrometry (the first four entries in Table 1), the application of Eqs. (1)(3) results in filtered samples that show no traces of stellar contamination. This is to be expected thanks to the high positional accuracy of the VLBI data and the small cross-match radius used (100 mas). Consequently, no additional filtering was considered for these catalogues.

The situation is very different for the remaining (non-VLBI) catalogues in Table 1. An analysis of their matches, filtered by Eqs. (1)(3), revealed that a significant number of chance matches to (stellar) sources in Gaia EDR3 happen through a combination of two factors: (i) the high density of potentially matching stellar sources in certain areas of the sky; and (ii) the lower positional accuracy of these catalogues which motivated the use of a much larger cross-matching radius (2 arcsec) than for the VLBI catalogues.

To investigate the likely effects of chance matches we make use of the ‘confusion sources’ discussed in Appendix D. These are all the sources in Gaia EDR3 that satisfy all three conditions in Eqs. (1)(3), irrespective of their true nature. The total number of confusion sources in Gaia EDR3 is approximately 213.5 million, of which 30.7 million have five-parameter solutions and 182.8 million have six-parameter solutions.

The galactic plane is especially problematic for the cross-matching: 23% of the confusion sources with five-parameter astrometric solutions and 50% of the confusion sources with six-parameter solutions are located in the 10% of the celestial sphere with the galactic latitude b such that | sin b | < 0.1. The analysis of the matches from non-VLBI catalogues showed that the resulting samples of QSO candidates in that area are too much contaminated by stars and cannot reasonably be further cleaned by means of astrometric criteria alone. We therefore decided to completely avoid this area of the sky for Gaia-CRF3 for the matches to the non-VLBI catalogues.

Considering the distribution of the Gaia-CRF3 sources and the confusion sources in galactic latitude shown on Figs. 5 and D.2, respectively, our results for Gaia EDR3 generally agree with the discussion of the stellar contamination at various galactic latitudes provided by Heintz et al. (2020; see their Fig. 9).

The second factor important for the chance matches – the lower quality of the astrometric data in the non-VLBI catalogues – is also more problematic in the areas of higher density of confusion sources. Figure D.2 shows that the density of confusion sources is a strong function of the Galactic latitude b. To avoid too many chance matches at low |b| we require a smaller matching distance Δ towards the Galactic plane. For the matches to the non-VLBI catalogues we therefore add the two criteria | sinb |>0.1$\left| {\sin b} \right| > 0.1$(4)

and Δ<(2 arcsec)×| sinb |.${\rm{\Delta < }}\left( {{\rm{2arcsec}}} \right) \times \left| {\sin b} \right|.$(5)

We note that, after the initial cross-matching, this last criterion is the only one that uses astrometric data (positions) from the external catalogue. The number of unique matches that survived the filtering by Eqs. (1)(5), or Eqs. (1)(3) for the VLBI catalogues, is given in the fourth column of Table 1.

2.3 Astrometric filtering: Sample statistics

The second stage of the filtering is based on an assessment of the level of stellar contamination among the filtered unique matches in the different external catalogues. To this end we analysed histograms of the normalised parallaxes ϖ/σϖ and normalised proper motion components μα*/σμα*${{\rm{\mu }}_{\alpha *}}{\rm{/}}{\sigma _{{{\rm{\mu }}_{\alpha *}}}}$ and μδ/σμδ${{\rm{\mu }}_\delta }{\rm{/}}{\sigma _{{{\rm{\mu }}_\delta }}}$. Ignoring the small systematic and random offsets discussed in Sect. 2.2, we expect that, for a clean sample of QSOs, the normalised quantities should have Gaussian distributions with mean values close to zero and standard deviations only slightly greater than one (reflecting the slight underestimation of the formal uncertainties; see Sect. 3). Conversely, if their distributions deviate significantly from the expected Gaussian, it indicates that the sample is contaminated by stellar objects with non-zero parallaxes and/or proper motions.

Although the criteria in Eqs. (1)(3) are based on the Gaussian hypothesis, they absolutely do not guarantee that the normalised parallaxes and proper motions are Gaussian after the filtering. Figures D.3 and D.4 show examples of the distributions of the normalised parameters for general Gaia EDR3 sources satisfying the criteria. While a fraction of the sources in these samples are QSOs (including Gaia-CRF3 sources), most of them are stars in our Galaxy or in nearby satellite galaxies such as the Large Magellanic Cloud (LMC) and Small Magellanic Cloud (SMC), and the histograms of the normalised quantities are very far from Gaussian.

A high level of stellar contamination in a sample can be directly seen in the histograms of the normalised parameters as a distortion of the expected Gaussian distribution. Any sub-sample large enough to create representative histograms can be checked separately. This gives a flexible tool for screening samples of QSO candidates against stellar contamination.

For the proper motions the same effect is illustrated in Fig. 1. This shows the relative frequencies of Xμ2$X_{\rm{\mu }}^2$ from Eq. (3) for the filtered matches to the catalogues in Table 1. If Xμ2$X_{\rm{\mu }}^2$ comes purely from the Gaussian errors of the Gαiα proper motions, it should follow the χ2 distribution with two degrees of freedom, with normalised frequency exp(12x/s2)$\exp \left( { - {1 \over 2}x{\rm{/}}{s^2}} \right)$. Here s is a scaling factor that takes into account that the uncertainties of the astrometric parameters in Gaia EDR3 are slightly underestimated. In Fig. 1 this expected frequency is a straight line through unity at Xμ2=0$X_{\rm{\mu }}^2 = 0$.

Applying this tool to the full sets of filtered matches to each external catalogue in Table 1 reveals that only the sources matched to three of the catalogues (C75, Milliquas v6.5, and Gaia-unWISE) show clear evidence of stellar contamination. These catalogues are marked with an asterisk in the table. The filtered matches of the remaining 14 catalogues show no obvious traces of stellar contamination and therefore qualify as being part of Gaia-CRF3.

The same Gaia source usually appears in the filtered crossmatches to several external catalogues. One reason for this, as discussed in Appendix B, is that many of the catalogues are at least partially based on the same observational material (the IR photometry of WISE, certain data releases of SDSS, etc.), although their criteria for the selection of QSOs may be different. But even when the observational material is different, it is no wonder that different approaches and algorithms result in strongly overlapping samples, namely if they tend to find the same objects. It is therefore reasonable to expect that the circumstance that a Gaia source has matches in several external catalogues increases the reliability of its classification as a QSO. If we split the set of Gaia sources matching a given external catalogue (after the astrometric filtering) into subsets that are present also in other catalogues and a subset that appears only in the cross-match with this particular catalogue, we can therefore use an analysis of these subsets to improve the purity of the final QSO selection.

Using this idea, we selected three sets of Gaia sources that appear only among the filtered matches of C75, Milliquas v6.5, and Gaia-unWISE, but do not appear in any of the other catalogues: we call these the c75-only, m65-only, and guw-only sets. All three sets show significant stellar contamination. We found no obvious way to clean the c75-only sources and this whole set of 95 988 sources was therefore dropped from further consideration. The Milliquas v6.5 and Gaia-unWISE catalogues include estimates of the probability that a source is a QSO. Selecting from the m65-only set only the sources with the highest probability of being a QSO did not remove the contamination and that set of 26 326 sources was therefore also dropped. For the guw-only set we found, on the contrary, that the subset of sources with the highest QSO probability (PROB_RF = 1) appears to be free of stellar contaminants. This gave an additional set of 229 914 sources for Gaia-CRF3. The other 491894 guw-only sources were dropped (any relaxation of PROB_RF = 1 immediately gave samples with clear signs of stellar contamination). The number of sources from the cross-matches of each catalogue that passed also this second stage of astrometric filtering is given in the fifth column of Table 1, summing up to a total of 1614173 sources. The relative frequencies of in Gaia-CRF3 and in the three sets of sources dropped from C75, Milliquas v6.5, and Gaia-unWISE are shown in Fig. 2. Deviations from the exponential distribution expected for Gaussian errors are very obvious for the dropped sets; for the final selection the distribution is briefly discussed below (Sect. 2.4).

A separate investigation was done for the QSO candidates in the areas of the LMC and SMC, which have a very high density of confusion sources (Fig. D.1) and therefore a higher risk of chance matches. The statistics of confusion sources in these areas are discussed in Appendix D. The Gaia-CRF3 contains 5411 and 2779 sources, respectively, in the LMC and SMC areas (as defined in the appendix). The distributions of the normalised quantities indicate a low level of stellar contamination in these sets.

Further checks of the selection were done by splitting the Gaia-CRF3 sources into subsets according to magnitude, colour, and Galactic or ecliptic latitude, and examining the distributions of normalised parallaxes and proper motions. No clear contamination was detected in any of the subsets.

Gaia-CRF3 contains only 195 sources with | sin b | < 0.1. They all come from the VLBI-based catalogues, for which Eqs. (4) and (5) were not applied. We made an attempt to find a clean set of QSO-like sources also in the Galactic plane area, that is without applying the criterion in Eq. (4). Using the crossmatches and the remaining four criteria, we found nearly 100 000 additional QSO candidates with | sinb | < 0.1. However, the distributions of their normalised parameters indicate a high level of stellar contamination. This demonstrates that the cross-matches and/or external astrophysical classifications are unreliable in this crowded part of the sky, and none of these sources were further considered for Gaia-CRF3. However, we note that this set should contain a significant percentage of genuine QSOs, part of which may be recovered in future versions of the Gaia-CRF (see Sect. 2.5).

thumbnail Fig. 1

Relative frequencies of Xμ2$X_{\rm{\mu }}^2$ for the filtered matches to the catalogues in Table 1. The catalogues are identified by the codes given in column 6 of the table, with the eight larger catalogues in the upper panel and the smaller catalogues in the lower panel. The four radio catalogues (ICRF3 SIX, ICRF3 K, ICRF X/Ka, and OCARS) are combined under ‘radio’, and the three blasar catalogues (Roma-BZCAT, 2WHSP, and ALMA calibrators) are combined under ‘blasars’.

thumbnail Fig. 2

Relative frequencies of Xμ2$X_{\rm{\mu }}^2$ for the final selection of Gaia-CRF3 sources, and for the filtered matches to C75, Milliquas v6.5, and Gaia-unWISE that were dropped in the final selection.

2.4 Final selection

The union of all Gαiα sources that passed both stages of the filtering contains 1614173 sources and constitutes the final source list of Gaia-CRF3. Of these, 1215 942 (75%) have five-parameter solutions in Gaia EDR3 and 398 231 (25%) have six-parameter solution. Appendix C explains how to find the astrometric parameters of these sources and how to trace the matches of these sources in the external catalogues.

Figure 3 shows the distributions of the normalised parallaxes and proper motion components for both kinds of solutions. The standard deviations of the three normalised quantities are 1.052, 1.055, and 1.063 for the five-parameter solutions and 1.073, 1.099, and 1.116 for the six-parameter solutions. All distributions are very symmetric and close to the Gaussian shape (i.e. parabolic in the logarithmic plots), suggesting a low level of stellar contamination. The small excess of sources with normalised parameters beyond ±4 is not necessarily a sign of contamination, but could be produced by source structure or instrumental effects as well as by statistical fluctuations in the sample.

The number of (non-QSO) chance matches in Gaia-CRF3 can be estimated by counting the confusion sources that satisfy Eqs. (4) and (5) around positions that are slightly offset from the Gaia-CRF3 positions. Using offsets of +0.1 deg in ∆α cosδ or ∆δ we conclude that Gaia-CRF3 is expected to contain up to 420 chance matches, of which 150 have five-parameter solutions and 270 have six-parameter solutions. This corresponds to a probable contamination of 0.012 and 0.068%, respectively, of the five- and six-parameter sources in Gaia-CRF3.

The chance matches are obviously not the only cause of stellar contamination in Gaia-CRF3. Further contaminants may come from erroneous classification of some sources in the external catalogues, even among the sources that passed the astrometric filtering in Eqs. (1)(5). We have found no means to quantify this effect, but expect it to be quite small.

A certain hint on the possible (total) level of contamination could be given by the relative frequency of Xμ2$X_{\rm{\mu }}^2$ for Gaia-CRF3, as shown by the red curve in Fig. 2. This distribution can be rather well described as the mixture of two Gaussian distributions: one with standard deviation of 1.051 containing 98.0% of the sources, the other with standard deviation of 2.038 containing 2.0% of the sources. We stress that this cannot directly be interpreted in terms of the purity of the Gaia-CRF3: on one hand, genuine QSOs may show deviations from Gaussianity for other reasons (as mentioned above); on the other hand, some contaminants will have measured proper motion that are perfectly consistent with the extragalactic hypothesis. Nevertheless, we consider it unlikely that the level of contamination in Gaia-CRF3 is higher than 2%. Further investigations are needed to show if this claim is justified.

2.5 Discussion of the selection

Assuming that QSO-like sources have zero true values for the parallaxes and proper motions, and that the astrometric errors are Gaussian with uncertainties as given in Gαiα EDR3, we expect that very few actual QSOs in a sample of a few million candidates will violate Eqs. (2) or (3). The numerical boundaries in Eqs. (2) and (3) were selected empirically, based on the data themselves, and represent a compromise taking into account possible non-Gaussian systematic errors and effects of complex and sometimes time-dependent source structure. The latter are to be expected for QSOs and could produce apparent proper motions over several years of observations, while parallaxes are much less affected by such effects. This justifies setting separate criteria for parallaxes and proper motions as described above. In future Gaia data releases, when these effects may be better known and statistically quantified, it will be possible to adopt a mathematically more consistent filtering approach based on the 3 × 3 covariance matrix of the parallax and proper motion components.

As already mentioned, we expect that future versions of Gaia-CRF will to a higher degree depend on Gaia’s own (astrometric, photometric, and spectroscopic) observations, and that external catalogues of AGNs will consequently become less important for the selection of QSO-like sources. These catalogues will continue to be important for validation purposes, but their uses in the selection process may become more complex if the astrometry in external catalogues is already Gaia-based. For example, a criterion such as Eq. (5) may not be meaningful if the position in the external catalogue comes from a previous Gaia data release.

The search for the QSOs in the Galactic plane is an important scientific problem that requires further investigation. We note that more advanced methods to combine different datasets in the search for QSOs in the Galactic plane are under development (e.g., Fu et al. 2021). Combined with the expected use of Gaia’s own object classification, one can hope that these methods will result in a much improved completeness of the Gaia-CRF in the Galactic plane in the future.

The selection of QSO-like objects described here is mainly driven by the reliability of the resulting set of sources. The resulting set of QSO-like sources can be considered ‘reliable’, but by no means ‘complete’. It is clear that the selection algorithm rejects some genuine quasars, for example if they show observable mean proper motions for the time span of the Gaia EDR3 data. A prominent example here is 3C273 – the brightest quasar on the sky, which has significant proper motion in declination in Gaia EDR3 and Xμ2=29.7$X_{\rm{\mu }}^2 = 29.7$ (Eq. (3)).

For the purposes of aligning the reference frame of Gaia EDR3 with the ICRF, and especially for ensuring that it has negligible spin with respect to distant galaxies, one can use a much smaller set of QSO-like sources than the full Gaia-CRF3 without significantly increasing the statistical uncertainties of the global system. Indeed, this system was defined in the final stages of the primary astrometric solution, as described in Sect. 5, using only about 0.4 million ‘frame rotation’ sources. However, the purpose of Gaia-CRF3 is not just to fix the system for the Gaia astrometric solution, but to provide the community with a direct access to this system with sources as faint as permitted by the sensitivity of Gaia, while still having a very low level of stellar contamination.

thumbnail Fig. 3

Distributions of the normalised parallaxes and proper motion components for the Gaia-CRF3 sources with five-parameter (blue) and six-parameter (magenta) solutions. The turquoise curves show the corresponding best-fit Gaussian distributions.

thumbnail Fig. 4

Sky distribution of the Gaia-CRF3 sources. The plot shows the density of sources per square degree computed from the source counts per pixel using HEALPix of level 6 (pixel size ≃0.84 deg2). The magnitude distribution and further properties of the sources are shown on Fig. 6. This full-sky map uses a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

3 Statistical properties of the Gaia-CRF3 sources

In this section we present the overall properties of Gaia-CRF3 primarily in terms of its distributions in position, magnitude, colour, and astrometric quality.

3.1 Distribution of the sources on the sky

The spatial distribution of the ~1.6 million Gaia-CRF3 sources is shown in Fig. 4. Their distributions in galactic longitude and latitude are shown in Fig. 5. In the avoidance zone | sin b | < 0.1 Gaia-CRF3 has only 195 sources matched to the radio catalogues as described in Sect. 2. Outside the avoidance zone the average density is about 42 deg−2, but with significant variations with galactic latitude; the density is typically below average for |b| < 30° and above at higher latitudes. This comes from the source catalogues used in our compilation combined with the selection function in Eq. (4), designed to avoid stellar contaminants near the Galactic plane. From Fig. B.1 one can see that also most of the external catalogues avoid crowded regions on the sky. We note that the density of Gaia-CRF3 reaches about 110 deg−2 in some areas on the sky. Given the expected isotropic distribution of QSOs and the effects of Galactic extinction, a complete version of a Gaia-CRF could therefore potentially comprise about 4 million sources. This estimate is also suggested by recent studies (e.g. Shu et al. 2019; Bailer-Jones et al. 2019; Fu et al. 2021) of the QSOs as well as by the extragalactic results of Gaia DR3.

3.2 Main characteristics of the sources

Table 2 and Fig. 6 show the median values and distributions of several important properties of the Gaia-CRF3 sources, split according to the nature of the solution (five- or six-parameter solution): magnitude (G), colour index (GbpGrp), and the renormalised unit weight error (RUWE; Lindegren et al. 2021b). The RUWE is a measure of the goodness-of-fit of the astrometric model to the observations of the source. The expected value for a good fit is 1.0. A higher value could indicate that the source is not point-like at the optical resolution of Gaia (≃ 0.1″), or that it has a time-variable structure. As seen in the left panel of Fig. 6, the proportion of six-parameter solutions increases for G ≳ 19, a feature that also implies that these sources are on average of lower astrometric accuracy, not only because they are fainter but also because the model fit is often worse, as shown in the right panel. The sources with six-parameter solutions are also on average redder (middle panel).

The sky distributions of the median magnitude G and the colour index GbpGrp (Fig. 7) clearly show the effects of the Galactic extinction and reddening. We also see that the selection favours brighter sources in the crowded areas such as the Galactic plane and the LMC and SMC areas.

Table 2

Characteristics of the Gaia-CRF3 sources.

thumbnail Fig. 5

Distribution of the Gaia-CRF3 sources in galactic coordinates. There are only 195 sources in the avoidance zone | sin b| < 0.1. The distribution is shown for the whole Gaia-CRF3 and separately for the sources with five- and six-parameter astrometric solutions.

3.3 Astrometric parameters of the sources

The left panel of Fig. 8 shows the distribution of the formal positional uncertainty in Gaia-CRF3 separately for the five- and six-parameter solution. The positional uncertainty, σpos,max, is defined as the semi-major axis of the uncertainty ellipse (see Eq. (1) in Gaia Collaboration 2018) at the reference epoch J2016.0. The middle panel is a scatter plot of σpos,max versus G, with the smoothed median indicated by the dashed curve. In the right panel, the black curve (labeled ‘all sources’) gives the fraction of the full sky that has a density of Gaia-CRF3 sources exceeding the value on the horizontal axis. The other curves give the fractions when only sources with σpos,max below a certain level are counted. In the 10% of the sky nearest to the Galactic plane (|sin b| < 0.1) the source density is negligible (cf. Fig. 4), which explains why none of the curves reaches above 90% sky coverage. To compute the fractions, the celestial sphere was divided into 49 152 pixels of equal area (≃ 0.8393 deg, i.e. HEALPix level 6) and the density computed in each pixel. It should be noted that the curves therefore refer to the source densities at a spatial resolution of about 1 deg and would not be the same at a different resolution (pixel size). The plot shows, for example, that for a minimum density of 35 Gaia-CRF3 sources per square degree, about 62% of the sky could be covered; this fraction is reduced to 55 or 20% if we require, in addition, that σpos,max is below 1.0 or 0.5 mas. Conversely, at 62% coverage there could be at least 30, 18, and 3 sources per square degree with σpos,max < 1, 1.0 or 0.5 mas, if in each case the most populated pixels are chosen. The corresponding magnitude distributions can be gleaned from the scatter plot in the middle panel, where the different levels of positional uncertainty have been marked.

Figure 9 shows the distributions of the median parallax and proper motion components on the sky. Both large-scale systematics and small-scale inhomogeneities are visible in these maps. The large-scale systematics are discussed in Sect. 3.4. Small-scale inhomogeneities, seen as fluctuating median values between neighbouring HEALPix cells in the maps, are mainly caused by the statistical scatter of the astrometric parameters in each cell. They are generally stronger in areas of low source density (Fig. 4) or fainter median magnitude (Fig. 7, top), and also in the ecliptic belt (ecliptic latitude |β| ≲ 45°), where the scanning law usually results in less accurate astrometric solutions than at high ecliptic latitudes (see, for example, Sect. 5.4 and Figs. 8 and 9 in Lindegren et al. 2021b). Numerous streaks, edges, and other small-scale features can also be traced to the scanning law. The overall scatter of the median values in the three maps is about 0.11 mas (ϖ), 0.12 mas yr−1α*), and 0.11 mas yr−1δ), if only the cells with at least five QSOs are counted (covering 85% of the sky). The positional errors are expected to be of a roughly similar size as the parallax scatter. Thus, we conclude that Gaia-CRF3, on an angular scale of about 1° and over most of the sky, defines a local reference frame accurate to better than 0.2 mas in position and 0.2 mas yr−1 in proper motion.

thumbnail Fig. 6

Histograms of some important characteristics of the Gaia-CRF3 sources with five-parameter solutions (blue) and six-parameter solutions (magenta). Left to right: G magnitudes, colour index GBPGRP, and the astrometric quality indicator RUWE.

thumbnail Fig. 7

Median G magnitude (top) and median colour index GBPGRP (bottom) of the Gaia-CRF3 sources. These maps use a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

3.4 Systematic errors

Systematic errors in Gaia-CRF3 were discussed already in Lindegren et al. (2021b, Sect. 5.6). In particular, the left column of their Fig. 13 contains smoothed maps of the parallaxes and proper motions of the Gaia-CRF3 sources that have five-parameter astrometric solutions. These maps were obtained by a special smoothing procedure applied to the parameters of the individual sources. Including also the less numerous and fainter sources with six-parameter solutions does not noticeably change these maps.

Here we present the results of a different way to analyse systematics in the Gaia-CRF3 proper motions, namely by fitting an expansion of vector spherical harmonics (VSH; Mignard & Klioner 2012) up to a certain degree ℓ. Although the VSH provide a smooth model of the systematic errors also in the regions where only a few sources are present, such as in the Galactic plane, the higher-degree harmonics ( ≳ 15) are only weakly constrained in these areas, resulting in unreliable estimates of the VSH coefficients also in other parts of the sky because of the global scope of the fitted VSH functions.

Analogous to the discussion of parallax systematics in Sect. 5.7 of Lindegren et al. (2021b), using (scalar) spherical harmonics, the VSH allow us to estimate the angular power spectrum of the systematics in the Gaia-CRF3 proper motions. To this end we used the set of 1215942 Gaia-CRF3 sources with five-parameter solutions. Given the vector field of proper motions of those sources we computed the VSH expansion as given by Eq. (30) of Mignard & Klioner (2012), including all the terms of degree 1 ≤ L. It was found that the VSH fit is reasonably stable for L ≤ 12. When higher degrees are included, the fits become questionable, mainly because of the inhomogeneous distribution of the sources on the sky (see Sect. 3.1 above and Figs. 3 and 4 of Gaia Collaboration 2021b). All subsequent computations used L = 12. The fitted VSH coefficients confirmed the maps of systematic errors in proper motion shown on Fig. 13 of Lindegren et al. (2021b). From the same coefficients we obtained the powers Pt,Ps$P_\ell ^t,P_\ell ^s$ of the toroidal and spheroidal signals of degree , as defined by Eq. (76) of Mignard & Klioner (2012). The total power of degree is P=Pt+Ps${P_\ell } = P_\ell ^t + P_\ell ^s$. The mean powers of the toroidal and spheroidal harmonics of degree are then Ct=Pt/(2+1)$C_\ell ^t = P_\ell ^t{\rm{/}}\left( {2\ell + 1} \right)$ and Cs=Ps/(2+1)$C_\ell ^s = P_\ell ^s{\rm{/}}\left( {2\ell + 1} \right)$, where 2 + 1 is the number of harmonics at the degree ℓ. This is fully analogous to the definitions for the scalar spherical harmonics given by Eq. (31) of Lindegren et al. (2021b).

We note that the VSH components of degree = 1 in the proper motions should be analysed separately from other systematic effects. The toroidal harmonic of degree = 1 represents a residual spin of the reference frame, and should in principle be negligible thanks to the way the reference frame is constructed (see Sect. 5 below). The spheroidal harmonic of degree = 1 represents a ‘glide’ component of the proper motions that is expected as a consequence of the acceleration of the Solar system in the rest frame of remote sources (Gaia Collaboration 2021b). Although these terms need to be included in the VSH fit, the subsequent discussion of systematics in the Gaia-CRF3 proper motions only considers the VSH components of degree = 2,…, 12.

Formally, the powers and the mean powers are weighted sums of the squared VSH coefficients. Appendix A of Gaia Collaboration (2021b) demonstrates that such quantities are biased (overestimated) when the VSH coefficient are estimated in a least squares fit. Equation (A.1) of that appendix gives the exact formula to correct the bias. In particular, the corrected (unbiased) values of the mean powers are C^t=CtNt,$\hat C_\ell ^t = C_\ell ^t - N_\ell ^t,$(6) C^s=CsNs,$\hat C_\ell ^s = C_\ell ^s - N_\ell ^s,$(7)

where the noise corrections Nt$N_\ell ^t$ and Ns$N_\ell ^s$ are obtained by the same formulas as Ct$C_\ell ^t$ and Cs$C_\ell ^s$ but using the uncertainties of the VSH coefficients instead of the coefficients themselves. Explicit computations for the present data show that NtNs3.440.070.012$N_\ell ^t \simeq N_\ell ^s \simeq 3.44 - 0.07\ell - 0.01{\ell ^2}$ µas2 yr−2 (2 ≤ ≤ 12).

Figure 10 shows the corrected estimates C^t$\hat C_\ell ^t$ and C^s$\hat C_\ell ^s$ for = 2,…, 12. The ragged shape of (especially) the toroidal spectrum is very real, as shown by the error bars. One sees that the toroidal powers are substantially larger than the spheroidal ones for the odd degrees, while they are more similar for even degrees. We have no explanation for this but presume it is related to the specific parameters of the scanning law and basic angle.

In analogy with Eq. (32) of Lindegren et al. (2021b), it is convenient to consider the cumulative characteristics Rt(max)=14π=2max(2+1)C^t,${R^t}\left( {{\ell _{\max }}} \right) = \sqrt {{1 \over {4\pi }}\sum\limits_{\ell = 2}^{{\ell _{\max }}} {\left( {2\ell + 1} \right)\hat C_\ell ^t} } ,$(8) Rs(max)=14π=2max(2+1)C^s,${R^s}\left( {{\ell _{\max }}} \right) = \sqrt {{1 \over {4\pi }}\sum\limits_{\ell = 2}^{{\ell _{\max }}} {\left( {2\ell + 1} \right)\hat C_\ell ^s} } ,$(9) R(max)=14π=2max(2+1)(C^t+C^s)$R\left( {{\ell _{\max }}} \right) = \sqrt {{1 \over {4\pi }}\sum\limits_{\ell = 2}^{{\ell _{\max }}} {\left( {2\ell + 1} \right)\left( {\hat C_\ell ^t + \hat C_\ell ^s} \right)} } $(10)

quantifying the RMS variations of the toroidal, spheroidal and total proper motion systematics on angular scales ≳ 180°/max (Fig. 11). We note that the toroidal components completely dominate the systematics for ≥ 5.

Summarising all this information we can claim that the frame-independent RMS of the vector field representing systematic errors in proper motions is ≲12 µas yr−1 down to angular scales of ~15°. This level of systematic errors is low compared to the random errors of individual sources given in Table 2 and made it possible to reliably measure the subtle physical effect of the acceleration of the Solar system in Gaia EDR3 (Gaia Collaboration 2021b).

thumbnail Fig. 8

Positional uncertainty and source density distribution in Gaia-CRF3. Left: histogram of the positional uncertainty σpos,max for Gaia-CRF3 sources with five-parameter solutions (blue) and six-parameter solutions (magenta). Middle: scatter plot of σpos,max versus the G magnitude. The dashed curve is the smoothed median uncertainty versus magnitude. The green points are the sources matched to ICRF3 S/X. Right: fraction of sky covered by a given minimum density of Gaia-CRF3 sources at different levels of positional uncertainty (see text for further explanation).

thumbnail Fig. 9

Sky distributions of the parallax ϖ and the components of proper motion µα* and µδ. Median values over the pixels of HEALPix level 6 (pixel size ≃0.84 deg2) are shown. These maps use a Hammer–Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

thumbnail Fig. 10

Corrected mean toroidal and spheroidal powers C^t$\hat C_\ell ^t$ and C^s$\hat C_\ell ^s$ of the systematics in proper motions of the Gaia-CRF3 sources with five-parameter solutions. The error bars show the ±1 standard deviation of each point, calculated on the assumption that the 2 + 1 contributing VSH coefficients have Gaussian errors.

thumbnail Fig. 11

Corrected cumulative angular power spectrum of the systematic in proper motions of the Gaia-CRF3 sources with five-parameter solutions.

4 Gaia-CRF3 counterparts of ICRF3 sources

Radio-loud sources in Gaia-CRF3 deserve special attention as they provide a direct link between the ICRF implementations in the radio and optical regimes. In this section we focus on the properties of the Gaia-CRF3 sources matched to ICRF3 (Charlot et al. 2020) and a statistical comparison of the optical and radio positions.

4.1 Statistics of ICRF3 sources in Gaia-CRF3

Gaia-CRF3 contains a total of 4723 radio-loud sources, of which 3181 are optical counterparts of radio sources in ICRF3 (Charlot et al. 2020), while an additional 1542 sources are found only in OCARS. Because the ICRF3 sources have the best quality of the astrometry in the radio domain and represent the official realisation of the ICRF in the radio, we restrict the discussion in this section to the 3181 Gaia-CRF3 sources in common with ICRF3. This set is the union of 3142, 660, and 576 sources matching the ICRF3 sources in the S/X, K, and X/Ka bands, respectively (69.3, 80.1 and 85.0% of the corresponding catalogues). Most of the analysis below is further limited to the 3142 sources in ICRF3 S/X. Very few additional Gaia EDR3 sources are found only in the K or X/Ka bands of the ICRF3 (24 and 31 sources, respectively), and their overall properties are not too different from those of the S/X set.

From the sky distribution of the 3181ICRF3 sources in Gaia-CRF3 (Fig. 12) we see that a band along the Galactic plane and the region with δ < −30° (part of the fourth quadrant of the map in the Galactic coordinates) are underpopulated. This reflects the strong optical extinction in the Galactic plane and the fact that ICRF3 itself has a lower density of sources for δ < −30°.

The left panel in Fig. 13 shows the magnitude distribution of the 3142 Gaia-CRF3 sources in common with ICRF3 S/X and for the subset of 259 sources that are defining sources in ICRF3. As expected, there is no marked difference between the two sets, since the optical brightness plays no role in the ICRF3 selection of defining sources. The median G = 18.9 is about one magnitude brighter than for the whole Gaia-CRF3 (Table 2).

The central panel in Fig. 13 shows the distribution of the positional uncertainty σpos,max in Gaia-CRF3 for the same selections. The median uncertainty is 194 µas for the whole set of ICRF3 S/X sources, and 162 µas for the defining sources. These values are less than one half of the median positional uncertainty in the full Gaia-CRF3 sample, which can be attributed to their different distributions in G. Indeed, as shown by the middle panel of Fig. 8, the ICRF3 sources follow the same general variation of σpos,max with G as other Gaia-CRF3 sources. In other words, nothing in the Gaia astrometry suggests that the ICRF3 are better or worse than other Gaia-CRF3 sources of comparable magnitude.

The right panel in Fig. 13 shows the distribution of the positional uncertainty in the radio domain for the same sources. By coincidence, the median positional uncertainty of the 3142 sources in ICRF3, 197 µas, is practically the same as in Gaia-CRF3. For the subset of defining sources, however, the positional uncertainties in ICRF3 are much smaller (the median is 39 µas), as these sources were selected precisely for their high radioastrometric quality. In Gaia-CRF3 the defining sources are only marginally more accurate than the non-defining sources, consistent with the former being, on average, about 0.2 mag brighter (cf. Fig. 13, left panel).

thumbnail Fig. 12

Distribution of the ICRF3 sources in Gaia-CRF3 in Galactic coordinates. Sources from ICRF3 S/X are shown by red dots, from ICRF3 K by blue crosses, and from ICRF3 X/Ka by turquoise squares. This map uses a Hammer–Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

4.2 Positional differences

We now compare the individual positions of ICRF3 S/X sources in ICRF3 and Gaia-CRF3. Thanks to the high positional accuracy in both catalogues and the small cross-matching radius ∆max = 100 mas used, cross-matching errors between the two catalogues are highly unlikely and we can assume that the optical and radio emissions being compared originate in broadly the same extragalactic object. Positional differences ∆α* = (αEDR3αICRF3) cos δ and ∆δ = δEDR3δICRF3 are computed after propagating the ICRF3 positions from the ICRF3 epoch (J2015.0) to the Gaia-CRF3 epoch (J2016.0), using the precepts described in Charlot et al. (2020, Sect. 5.3); this takes into account the small shifts (<5 µas) expected from the Galactic acceleration.

The top panel of Fig. 14 is a scatter diagram of the positional differences. Of the 3142 sources, 127 (4%) have a positional difference >4 mas in either coordinate and fall outside the boundaries of the plot. The median separation ∆ = (∆α*2 + ∆δ2)1/2 is 0.516 mas for all 3142 sources, and 0.292 mas for the 259 defining sources (Fig. 15).

The distributions of ∆α* and ∆δ are roughly symmetric around zero with a median offset of +4 µas in right ascension and −6 µas in declination. Similarly, the scatter diagrams of the positional differences versus right ascension and declination (Fig. 16) show no significant systematics or trends in either coordinate. The distributions are very regular with a pronounced central concentration of width below 1 mas, plus a scatter of several hundred outliers at several mas. Furthermore, an analysis of the positional differences using VSH does not reveal any significant systematic effects. The statistical significance of the fitted coefficients is low, while the largest effects represented by the VSH expansion are located in the areas with low source density (around the Galactic plane and for δ < −30°).

The lower panel of Fig. 14 shows the same position differences as in the upper panel, but normalised by the quadratic combination of the uncertainties in the ICRF3 and Gaia-CRF3 catalogues. If the optical and radio centres of emission coincided, we would expect the normalised position differences to follow a normal distribution with standard deviation only slightly above 1.0. In fact, the distributions are significantly wider (the RSE3 is 1.92 in right ascension and 1.80 in declination), with fatter tails than expected for a Gaussian: 225 sources fall outside the plotted frame, and more than 100 are outside ±5 × RSE in each coordinate.

Both in ICRF3 and in Gaia-CRF3 there are significant correlations between the errors in right ascension and declination, and a useful statistic that takes this into account is XΔ2[ Δα*Δδ ]Cov(Δ)1[ Δα*Δδ ],$X_{\rm{\Delta }}^2 \equiv \left[ {\matrix{{{\rm{\Delta }}\alpha {\rm{*}}} & {{\rm{\Delta }}\delta } \cr} } \right]{\rm{Cov}}{\left( {\rm{\Delta }} \right)^{ - 1}}\left[ {\matrix{{{\rm{\Delta }}\alpha {\rm{*}}} \cr {{\rm{\Delta }}\delta } \cr} } \right],$(11)

where Cov(∆) is the covariance matrix given by Eq. (E.7). This is completely analogous to the statistic Xμ2$X_{\rm{\mu }}^2$ for the proper motions, defined in Eq. (3), and like Δ it is independent of the reference frame used4. The distributions of XΔ2$X_\Delta ^2$ are shown in Fig. 17. For Gaussian errors we expect exponential distributions. The dashed lines are the functions 1450exp[ 12(XΔ/1.3)2 ]$1450\exp \left[ { - {1 \over 2}{{\left( {{X_{\rm{\Delta }}}{\rm{/1}}{\rm{.3}}} \right)}^2}} \right]$ and 120exp[ 12(XΔ/1.3)2 ]$120\exp \left[ { - {1 \over 2}{{\left( {{X_{\rm{\Delta }}}{\rm{/1}}{\rm{.3}}} \right)}^2}} \right]$ fitted by eye to the inner parts of the histograms (XΔ210)$\left( {X_\Delta ^2 \mathbin{\lower.3ex\hbox{$\buildrel<\over {\smash{\scriptstyle\sim}\vphantom{_x}}$}} 10} \right)$. The integrals of these exponential are 2450.5 and 202.8, that is 78% of respective sample. One possible interpretation of this is that the (combined) uncertainties are generally underestimated by a factor 1.3, and that about 22% of the objects in the samples have radio-optical positional offsets that are statistically significant at the current level of accuracy in the two catalogues; this fraction is, moreover, the same for the defining sources as for the full sample. Using a criterion such as XΔ2>25$X_\Delta ^2 > 25$ to pick out the individual sources that have a significant offset yields 433 sources, 36 of which are defining ones, or 14% of the respective samples. The angular offsets of these sources range from 0.17 mas to tens of mas, with a median value of 2.3 mas for the 433 sources from the full sample, and 0.84 mas for the 36 defining sources.

We note that the median Xμ2$X_{\rm{\mu }}^2$ is slightly higher (2.36) for the 433 sources with XΔ2>25$X_\Delta ^2 > 25$ than for the full sample of 3142 sources (1.82). This could indicate that the positional offsets are not constant over the time scale of the Gaia observations.

These results confirm and amplify the conclusion reached already with Gaia DR 1 and Gaia DR2, namely that radio–optical offsets are significant for a large fraction of the ICRF3 sources. The correlation with jets has been well established, for example by Petrov et al. (2019) and Plavin et al. (2019), and the physical signature of offsets is also supported in the studies by Lambert et al. (2021) and Liu et al. (2021).

The question now is therefore not so much the existence of such offsets but rather what consequences they have for the relation between the radio and optical frames. A plain match between the two catalogues leaves sources with true differences in position, much larger than the expected random scatter. These differences will not decrease with future versions of the radio and optical frames. Although this does not impact the quality of either realisation, it could limit the attainable accuracy of their mutual alignment. To circumvent this problem, specific procedures may be required, such as the selection of a common set of reference sources for the alignment. More generally, the importance of physical deviations between the reference frames at different wavelength bands needs to be better understood. This problem will be even more important for the future high-accuracy astrometric observations in the near infrared (Hobbs et al. 2021).

A closely related question concerns the interpretation and use of the reference frame at different epochs. ICRF3 is the first realisation of the VLBI frame that has an epoch (2015.0) and an official procedure to transform from this epoch to another (Charlot et al. 2020). The reason for the epoch transformation is the effect of the acceleration of the Solar system relative to the remote celestial sources, often referred to as ‘Galactic acceleration, although it may contain other components as well. If ignored, the Galactic acceleration will slowly distort the frame at a rate of about 6 µas yr−1. The effect should therefore be taken into account when comparing ICRF3 with the Gaia-CRF, as was indeed done for this paper. The transformation depends on three parameters that can be determined by observation; for ICRF3, however, a set of fixed values are specified. In case of Gaia, full astrometric solutions including parallaxes and proper motions are computed for all Gaia-CRF sources. Each version of Gaia-CRF has a particular epoch common to that of the astrometric catalogue of the corresponding Gaia DR; for Gaia-CRF3 this is 2016.0. No specific correction for the Galactic acceleration is applied or recommended for Gaia-CRF. However, the effect is implicitly contained in the measured proper motions and a normal epoch transformation would take also this effect into account, albeit with a positional accuracy that degrades with the epoch difference. In Gaia Collaboration (2021b) the components of the acceleration of the Solar system were determined from the proper motions of Gaia-CRF3. Although that determination differs from the ICRF3 prescription by less than 1 µas yr−1, there thus is a conceptual difference between how the effect is treated in the two reference frames.

thumbnail Fig. 13

Distribution of magnitudes and formal positional uncertainties for the sources common to Gaia-CRF3 and ICRF3 S/X. The full set of 3142 sources is shown by the grey histograms, and the 259 defining sources by the red histograms. Left: distribution of magnitudes G. Middle: distribution of positional uncertainties in Gaia EDR3. There are 126 sources with σpos,maxEDR3>1$\sigma _{{\rm{pos,max}}}^{{\rm{EDR3}}} > 1$ mas, of which eight are defining sources in ICRF3. Right: distribution of positional uncertainties in ICRF3 S/X. There are 155 sources with σpos,maxICRF3>1$\sigma _{{\rm{pos,max}}}^{{\rm{ICRF3}}} > 1$ mas, of which one is a defining source.

thumbnail Fig. 14

Positional differences in α and δ between Gaia-CRF3 and ICRF3 S/X for the 3142 common sources. The 259 defining sources in ICRF3 are shown as filled red circles, other sources as black crosses. Top: differences in mas. Errors bars representing the quadratically combined uncertainties in the two catalogues are shown only for the defining sources. Bottom: same differences divided by the quadratically combined uncertainties in the two catalogues. No error bars are shown in this plot, as they would all be ±1 unit.

thumbnail Fig. 15

Histograms of the separations Δ between Gaia-CRF3 and ICRF3 S/X for the full set of 3142 common sources (blue; the median is 0.516 mas) and the 259 defining sources (red; the median is 0.292 mas).

thumbnail Fig. 16

Positional differences between Gaia-CRF3 and ICRF3 S/X versus a (top) and sin δ (bottom). Defining sources in ICRF3 are shown as filled red circles, with errors bars representing the combined positional uncertainties in the two catalogues; other sources are shown as black crosses.

thumbnail Fig. 17

Histograms of the normalised separations XΔ2$X_\Delta ^2$ between Gaia-CRF3 and ICRF3 S/X for the full set of 3142 common sources (blue) and the 259 defining sources (red). The dashed curves are the exponential functions described in the text. A total of 195 sources have XΔ > 10 and fall outside this plot (10 of them are defining sources in ICRF3).

5 Use of Gaia-CRF3 sources in the frame rotator of AGIS 3.2

In the astrometric core solution for Gaia (the Astrometric Global Iterative Solution, AGIS; Lindegren et al. 2012), the frame rotator ensures that the reference frame of the astrometric solution complies, in a global sense, with the ICRS requirements. The frame rotator uses a pre-selected set of QSO-like sources, including optical counterparts of ICRF3 sources, to estimate the six rotation parameters (spin and orientation) for the solutions obtained in successive iterations, and corrects them in such a way that the QSO-like sources have no net rotation and are globally aligned with the ICRF3 sources.

The frame rotator sources are a subset of the much larger set of primary sources (about 14.3 million for EDR3) that contribute to the estimation of attitude, calibration, and global parameters in AGIS. Special selection rules apply to the primary sources; in particular, they must have colour information from Gaia s BP and RP photometers of adequate quality for the calibration of colour-dependent instrumental effects. The selection of primary sources, including the frame rotator sources, is revised with each new release of astrometric data, but for logistic reasons it is mainly based on the astrometric and photometric data of the previous release. Thus, for Gaia EDR3 the selection of frame rotator sources was primarily based on the 556 869 QSO-like sources, including 2820 ICRF3 counterparts, that constitute the Gaia-CRF2 (Gaia Collaboration 2018). However, not all of them had adequate colour information or passed the more stringent astrometric criteria for primary sources in Gaia EDR3, and the number of frame rotator sources used for EDR3 is therefore substantially smaller (429 249 sources). All of them have five-parameter solutions in Gaia EDR3 and were filtered by a procedure similar to the one used for Gaia-CRF3 but using a preliminary version of the Gaia EDR3 catalogue known as AGIS 3.1 (Lindegren et al. 2021b).

Details of the frame rotator as used for Gaia EDR3 are given in Appendix E. As described in the appendix, the orientation (ε) and spin (ω) are estimated by separate least-squares solutions to the position differences of the ICRF3 sources (for the orientation) and to the proper motions of the many more QSO-like sources (for the spin). The algorithm includes an iterative procedure to identify and eliminate outliers, that is sources that deviate very significantly from the overall fit in one or both coordinates. Thus, we need to distinguish between the sources that were ‘considered by the algorithm, and the subset of non-outliers that was actually ‘used to determine the orientation or spin. The auxiliary table gaiaedr3.frame_rotator_source in the Gaia Archive lists the 429 249 frame rotator sources for Gaia EDR3 and contains flags to indicate which of them were considered and used for the orientation and spin solutions. As specified in the table, 428 034 of the sources considered for the spin were actually used to determine the spin in the final iteration of the primary solution, while 1215 (0.3%) were rejected as outliers. Of the 2269 sources with counterparts in ICRF3 S/X considered for the orientation, 2007 sources were actually used to determine the orientation, while 262 (11.5%) were rejected as outliers. Eight of the sources were rejected in both the spin and orientation solutions.

Because the preliminary AGIS 3.1 astrometry had to be used to select the frame rotator sources, it turns out that 46 of the latter do not pass the stricter criteria for the Gaia-CRF3 selection described in Sect. 2. These sources are thus listed in gaiaedr3.frame_rotator_source although they are not part of the Gaia-CRF3. This slight inconsistency has minimal impact on the resulting reference frame, as only two of the 46 sources were accepted by the frame rotator algorithm for the determination of the spin, and none of them for the determination of the orientation.

If the frame rotator algorithm is applied to this subset of sources, using the astrometric data as given in Gaia EDR3, the resulting estimates of the orientation and spin vectors are not strictly zero, as might be expected. For the spin, we obtain |ω| ≃ 0.8 µas yr−1 with an uncertainty of about 0.4 µas yr−1 in each axis; for the orientation, the result is |ε(2016.0) | ≃ 6.2 µas with an uncertainty of about 7 µas per axis. These deviations from zero can be traced to small differences in the positions and proper motions between the last iteration of the primary solution and the secondary source updates, from which the final data are taken. The reason for this (undesirable) effect is currently under investigation.

If the same algorithm is instead applied to the full set of Gaia-CRF3 sources, the result for the spin is ω=[ 3.44±0.30+1.57±0.281.24±0.32 ] μas yr1,$\omega = \left[ {\matrix{{ - 3.44 \pm 0.30} \cr { + 1.57 \pm 0.28} \cr { - 1.24 \pm 0.32} \cr} } \right]\,{\rm{\mu as}}\,{\rm{y}}{{\rm{r}}^{ - 1}},$(12)

in which 1 607 289 (99.6%) of the sources are accepted; for the orientation correction the result is ε(2016.0)=[ +3.41±6.71+8.99±6.501.47±6.04 ] μas,$\varepsilon \left( {2016.0} \right) = \left[ {\matrix{{ + 3.41 \pm 6.71} \cr { + 8.99 \pm 6.50} \cr { - 1.47 \pm 6.04} \cr} } \right]\,{\rm{\mu as,}}$(13)

in which 2738 (87.1%) of the 3142 ICRF3 S/X sources are accepted. Generally speaking, the result is sensitive to the precise selection of sources at the level of a few µas yr−1 in spin and several µas in the orientation.

6 Conclusions and prospects

In this paper we have reported on the construction and properties of the Gaia-CRF3, a celestial reference frame materialised in the optical domain resulting from the Gaia all-sky survey. Although built on only 33 months of data (from August 2014 to May 2017), the astrometric accuracy is already very similar to what was expected for the 5-yr nominal mission, and with a larger set of QSOs than originally anticipated.

The Gaia-CRF3 consists of an astrometric catalogue of more than 1.6 million QSOs selected using a number of existing catalogues, with further filtering based on the Gaia data to ensure that the sample is as much as possible free of stellar contaminants. The G magnitude extends from ~ 13 to ~ 21 mag, with a peak density at G ≃ 20.5 mag. The formal uncertainty is primarily ruled by the G magnitude, with a typical precision of 1 mas at G = 20.6, 0.4 mas at G = 20 and 0.1 mas at G = 17.9. There are 32 000 sources with formal position uncertainty < 0.1 mas and 210000 with <0.2 mas. Gaia-CRF3 is aligned to the radio ICRF3 by minimising the position differences for about 2000 common sources with high-quality solutions.

The comparison with ICRF3 shows that the floor formal accuracies are comparable in the optical and radio domains and better than 0.05 mas for the best measured sources. However the normalised positional differences are not compatible with a standard normal distribution and an estimated ~20% of the common sources have statistically significant positional radio-optical offsets. For many sources, if not all, these offsets are probably real, although this is hard to prove without a detailed investigation at the source level (Petrov et al. 2019; Plavin et al. 2019). The radio and optical CRFs are both internally very consistent and materialise the ICRS in their respective wavelength range, but in view of the detected offsets it remains unclear how much their mutual consistency can be improved without a better knowledge of the physical processes responsible for the light and radio emissions. It is hoped that high-resolution observations and improved radio and optical positions will further characterise these offsets and lead to a better understanding of the underlying mechanisms (Johnson et al. 2019).

It should be understood that Gaia-CRF3 is entirely defined by the Gaia EDR3 data for the ~ 1.6 million QSO-like sources discussed in this paper. These data include not only the barycentric positions of the sources at the reference epoch 2016.0, but also their proper motions (and parallaxes). Although individually insignificant in the vast majority of cases, these data provide invaluable information on the quality of the data and on possible systematic effects in the reference frame, both of instrumental and astrophysical origin. The reference frame defined by the proper motions of the Gaia-CRF3 sources is globally non-rotating at a level of a few µas yr-1. It is not possible to quantify more precisely than this how well the Gaia-CRF3 complies with the ‘no rotation’ condition, because different subsets of the Gaia-CRF3 lead to determinations of the spin that differ on this level, which is also consistent with the estimated RMS level of systematics on large angular scales (low values of max) depicted in Fig. 11. As mentioned in Sect. 1, the secondary frame defined by the positions and proper motions of stars in Gaia EDR3, in particular at bright magnitudes, may have significantly higher systematics.

Gaia-CRF3 is by far the best realisation to date of the celestial reference frame in the visible. It meets the ICRS requirements for a realisation based on extragalactic sources. The IAU, during its XXXIst General Assembly in August 2021, resolved (Resolution B3) that ICRF3 at radio wavelengths and Gaia-CRF3 in the optical are realisations (ICRF) of the International Celestial Reference System (ICRS) to be used as a fundamental standard in their respective domain.

A new version of the Gaia-CRF comes out with every release of a new astrometric solution based on a larger volume of Gaia data. The current version, Gaia-CRF3, is common to Gaia EDR3 and Gaia DR3, for which the basic astrometric data are identical. Two more releases are expected, Gaia DR4 based on 66 months of observations, and Gaia DR5 using up to 120 months of data. The positional accuracy improves roughly as T−1/2, where T is the time baseline, leading to an improvement by almost a factor two for the final version compared with Gaia-CRF3.

It is expected that the systematics will also be improved in the future releases, although this cannot easily be quantified. A dramatic illustration of the improvement from Gaia-CRF2 to Gaia-CRF3 is the successful use of the latter to determine the acceleration of the Solar System (Gaia Collaboration 2021b). Of paramount importance for the quality of the reference frame is the accuracy of proper motions, which impacts both the astrometric filtering of the sources (thus reducing stellar contamination) and the accuracy of the positions at earlier and later epochs. Proper motions generally improve as T−3/2, which could give a factor seven more precise proper motions in the final release. Corresponding improvements can be expected for the determination of the reference frame (especially, its spin) as well as for the various physical effects that may be seen in the proper motions (e.g. the solar system acceleration; see Sect. 8 in Gaia Collaboration 2021b for a brief discussion of other effects).

As mentioned in Sect. 3, up to 4 million QSOs could be recognised by Gaia. In future releases the identification of the QSOs will be refined using a combination of all kinds of Gaia data: an improved astrometry, spectrophotometry, and variability analysis. Special efforts will be made to better identify QSOs in the crowded areas and to improve the homogeneity of the sky coverage (a reduced source density at the Galactic plane will however persist due to the effects of Galactic extinction). The prospect for the future versions of the Gaia-CRF therefore lies not only in a further improvement of positional accuracy, but even more in the number of sources, a better homogeneity of the source distribution of the sky, the purity of the selection and in the overall consistency of the solution.

Acknowledgements

The acknowledgements are available in Appendix A.

Appendix A Acknowledgments

This work presents results from the European Space Agency (ESA) space mission Gaia. Gaia data are being processed by the Gaia Data Processing and Analysis Consortium (DPAC). Funding for the DPAC is provided by national institutions, in particular the institutions participating in the Gaia MultiLateral Agreement (MLA). The Gaia mission website is https://www.cosmos.esa.int/gaia. The Gaia archive website is https://archives.esac.esa.int/gaia.

The Gaia mission and data processing have financially been supported by, in alphabetical order by country:

the Algerian Centre de Recherche en Astronomie, Astrophysique et Géophysique of Bouzareah Observatory; the Austrian Fonds zur Förderung der wissenschaftlichen Forschung (FWF) Hertha Firnberg Programme through grants T359, P20046, and P23737; the BELgian federal Science Policy Office (BELSPO) through various PROgramme de Développement d’Expériences scientifiques (PRODEX) grants and the Polish Academy of Sciences - Fonds Wetenschappelijk Onderzoek through grant VS.091.16N, and the Fonds de la Recherche Scientifique (FNRS), and the Research Council of Katholieke Universiteit (KU) Leuven through grant C16/18/005 (Pushing AsteRoseismology to the next level with TESS, GaiA, and the Sloan DIgital Sky SurvEy – PARADISE); the Brazil-France exchange programmes Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and Coordenação de Aperfeicoamento de Pessoal de Nível Superior (CAPES) – Comité Français d’Evaluation de la Coopération Universitaire et Scientifique avec le Brésil (COFECUB); the Chilean Agencia Nacional de Investigación y Desarrollo (ANID) through Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT) Regular Project 1210992 (L. Chemin); the National Natural Science Foundation of China (NSFC) through grants 11573054, 11703065, and 12173069, the China Scholarship Council through grant 201806040200, and the Natural Science Foundation of Shanghai through grant 21ZR1474100; the Tenure Track Pilot Programme of the Croatian Science Foundation and the École Polytechnique Fédérale de Lausanne and the project TTP-2018-07-1171 ‘Mining the Variable Sky’, with the funds of the Croatian-Swiss Research Programme; the Czech-Republic Ministry of Education, Youth, and Sports through grant LG 15010 and INTER-EXCELLENCE grant LTAUSA18093, and the Czech Space Office through ESA PECS contract 98058; the Danish Ministry of Science; the Estonian Ministry of Education and Research through grant IUT40-1; the European Commission’s Sixth Framework Programme through the European Leadership in Space Astrometry (ELSA) Marie Curie Research Training Network (MRTN-CT-2006-033481), through Marie Curie project PIOF-GA-2009-255267 (Space AsteroSeismology & RR Lyrae stars, SAS-RRL), and through a Marie Curie Transfer-of-Knowledge (ToK) fellowship (MTKD-CT-2004-014188); the European Commission’s Seventh Framework Programme through grant FP7-606740 (FP7-SPACE-2013-1) for the Gaia European Network for Improved data User Services (GENIUS) and through grant 264895 for the Gaia Research for European Astronomy Training (GREAT-ITN) network; the European Cooperation in Science and Technology (COST) through COST Action CA18104 ‘Revealing the Milky Way with Gaia (MW-Gaia)’; the European Research Council (ERC) through grants 320360, 647208, and 834148 and through the European Union’s Horizon 2020 research and innovation and excellent science programmes through Marie Sklodowska-Curie grant 745617 (Our Galaxy at full HD – Gal-HD) and 895174 (The build-up and fate of self-gravitating systems in the Universe) as well as grants 687378 (Small Bodies: Near and Far), 682115 (Using the Magellanic Clouds to Understand the Interaction of Galaxies), 695099 (A sub-percent distance scale from binaries and Cepheids – CepBin), 716155 (Structured ACCREtion Disks – SACCRED), 951549 (Sub-percent calibration of the extra-galactic distance scale in the era of big surveys – UniverScale), and 101004214 (Innovative Scientific Data Exploration and Exploitation Applications for Space Sciences – EXPLORE); the European Science Foundation (ESF), in the framework of the Gaia Research for European Astronomy Training Research Network Programme (GREAT-ESF); the European Space Agency (ESA) in the framework of the Gaia project, through the Plan for European Cooperating States (PECS) programme through contracts C98090 and 4000106398/12/NL/KML for Hungary, through contract 4000115263/15/NL/IB for Germany, and through PROgramme de Développement d’Expériences scientifiques (PRODEX) grant 4000127986 for Slovenia; the Academy of Finland through grants 299543, 307157, 325805, 328654, 336546, and 345115 and the Magnus Ehrnrooth Foundation; the French Centre National d’Études Spatiales (CNES), the Agence Nationale de la Recherche (ANR) through grant ANR-10-IDEX-0001-02 for the ‘Investissements d’avenir’ programme, through grant ANR-15-CE31-0007 for project ‘Modelling the Milky Way in the Gaia era’ (MOD4Gaia), through grant ANR-14-CE33-0014-01 for project ‘The Milky Way disc formation in the Gaia era’ (ARCHEOGAL), through grant ANR-15-CE31-0012-01 for project ‘Unlocking the potential of Cepheids as primary distance calibrators’ (UnlockCepheids), through grant ANR-19-CE31-0017 for project ‘Secular evolution of galxies’ (SEGAL), and through grant ANR-18-CE31-0006 for project ‘Galactic Dark Matter’ (GaDaMa), the Centre National de la Recherche Scientifique (CNRS) and its SNO Gaia of the Institut des Sciences de l’Univers (INSU), its Programmes Nationaux: Cosmologie et Galaxies (PNCG), Gravitation Références Astronomie Métrologie (PNGRAM), Planétologie (PNP), Physique et Chimie du Milieu Interstellaire (PCMI), and Physique Stellaire (PNPS), the ‘Action Fédératrice Gaia’ of the Observatoire de Paris, the Région de Franche-Comté, the Institut National Polytechnique (INP) and the Institut National de Physique nucléaire et de Physique des Particules (IN2P3) co-funded by CNES; the German Aerospace Agency (Deutsches Zentrum für Luft- und Raumfahrt e.V., DLR) through grants 50QG0501, 50QG0601, 50QG0602, 50QG0701, 50QG0901, 50QG1001, 50QG1101, 50QG1401, 50QG1402, 50QG1403, 50QG1404, 50QG1904, 50QG2101, 50QG2102, and 50QG2202, and the Centre for Information Services and High Performance Computing (ZIH) at the Technische Universität Dresden for generous allocations of computer time; the Hungarian Academy of Sciences through the Lendület Programme grants LP2014-17 and LP2018-7 and the Hungarian National Research, Development, and Innovation Office (NKFIH) through grant KKP-137523 (‘SeismoLab’); the Science Foundation Ireland (SFI) through a Royal Society - SFI University Research Fellowship (M. Fraser); the Israel Ministry of Science and Technology through grant 3-18143 and the Tel Aviv University Center for Artificial Intelligence and Data Science (TAD) through a grant; the Agenzia Spaziale Italiana (ASI) through contracts I/037/08/0, I/058/10/0, 2014-025-R.0, 2014-025-R.1.2015, and 2018-24-HH.0 to the Italian Istituto Nazionale di Astrofisica (INAF), contract 2014-049-R.0/1/2 to ΓNAF for the Space Science Data Centre (SSDC, formerly known as the ASI Science Data Center, ASDC), contracts I/008/10/0, 2013/030/I.0, 2013-030-I.0.1-2015, and 2016-17-I.0 to the Aerospace Logistics Technology Engineering Company (ALTEC S.p.A.), INAF, and the Italian Ministry of Education, University, and Research (Ministero dell’Istruzione, dell’Università e della Ricerca) through the Premiale project ‘Mining The Cosmos Big Data and Innovative Italian Technology for Frontier Astrophysics and Cosmology’ (MITiC); the Netherlands Organisation for Scientific Research (NWO) through grant NWO-M-614.061.414, through a VICI grant (A. Helmi), and through a Spinoza prize (A. Helmi), and the Netherlands Research School for Astronomy (NOVA); the Polish National Science Centre through HARMONIA grant 2018/30/M/ST9/00311 and DAINA grant 2017/27/L/ST9/03221 and the Ministry of Science and Higher Education (MNiSW) through grant DIR/WK/2018/12; the Portuguese Fundação para a Ciência e a Tecnologia (FCT) through national funds, grants SFRH/BD/128840/2017 and PTDC/FIS-AST/30389/2017, and work contract DL 57/2016/CP1364/CT0006, the Fundo Europeu de Desenvolvimento Regional (FEDER) through grant POCI-01-0145-FEDER-030389 and its Programa Operacional Competitividade e Internacionalização (COMPETE2020) through grants UIDB/04434/2020 and UIDP/04434/2020, and the Strategic Programme UIDB/00099/2020 for the Centro de Astrof sica e Gravitação (CENTRA); the Slovenian Research Agency through grant P1-0188; the Spanish Ministry of Economy (MINECO/FEDER, UE), the Spanish Ministry of Science and Innovation (MICIN), the Spanish Ministry of Education, Culture, and Sports, and the Spanish Government through grants BES-2016-078499, BES-2017-083126, BES-C-2017-0085, ESP2016-80079-C2-1-R, ESP2016-80079-C2-2-R, FPU16/03827, PDC2021-121059-C22, RTI2018-095076-B-C22, and TIN2015-65316-P (‘Computación de Altas Prestaciones VII’), the Juan de la Cierva Incorporación Programme (FJCI-2015-2671 and IJC2019-04862-I for F. Anders), the Severo Ochoa Centre of Excellence Programme (SEV2015-0493), and MICIN/AEI/10.13039/501100011033 (and the European Union through European Regional Development Fund ‘A way of making Europe’) through grant RTI2018-095076-B-C21, the Institute of Cosmos Sciences University of Barcelona (ICCUB, Unidad de Excelencia ‘Mar a de Maeztu’) through grant CEX2019-000918-M, the University of Barcelona’s official doctoral programme for the development of an R+D+i project through an Ajuts de Personal Investigador en Formació (APIF) grant, the Spanish Virtual Observatory through project AyA2017-84089, the Galician Regional Government, Xunta de Galicia, through grants ED431B-2021/36, ED481A-2019/155, and ED481A-2021/296, the Centro de Investigación en Tecnolog as de la Información y las Comunicaciones (CITIC), funded by the Xunta de Galicia and the European Union (European Regional Development Fund – Galicia 2014-2020 Programme), through grant ED431G-2019/01, the Red Española de Supercomputación (RES) computer resources at MareNostrum, the Barcelona Supercomputing Centre – Centro Nacional de Supercomputación (BSC-CNS) through activities AECT-2017-2-0002, AECT-2017-3-0006, AECT-2018-1-0017, AECT-2018-2-0013, AECT-2018-3-0011, AECT-2019-1-0010, AECT-2019-2-0014, AECT-2019-3-0003, AECT-2020-1-0004, and DATA-2020-1-0010, the Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya through grant 2014-SGR-1051 for project ‘Models de Programació i Entorns d’Execució Parallels’ (MPEXPAR), and Ramon y Cajal Fellowship RYC2018-025968-I funded by MICIN/AEI/10.13039/501100011033 and the European Science Foundation (‘Investing in your future’); the Swedish National Space Agency (SNSA/Rymdstyrelsen); the Swiss State Secretariat for Education, Research, and Innovation through the Swiss Activités Nationales Complémentaires and the Swiss National Science Foundation through an Eccellenza Professorial Fellowship (award PCEFP2_194638 for R. Anderson); the United Kingdom Particle Physics and Astronomy Research Council (PPARC), the United Kingdom Science and Technology Facilities Council (STFC), and the United Kingdom Space Agency (UKSA) through the following grants to the University of Bristol, the University of Cambridge, the University of Edinburgh, the University of Leicester, the Mullard Space Sciences Laboratory of University College London, and the United Kingdom Rutherford Appleton Laboratory (RAL): PP/D006511/1, PP/D006546/1, PP/D006570/1, ST/I000852/1, ST/J005045/1, ST/K00056X/1, ST/K000209/1, ST/K000756/1, ST/L006561/1, ST/N000595/1, ST/N000641/1, ST/N000978/1, ST/N001117/1, ST/S000089/1, ST/S000976/1, ST/S000984/1, ST/S001123/1, ST/S001948/1, ST/S001980/1, ST/S002103/1, ST/V000969/1, ST/W002469/1, ST/W002493/1, ST/W002671/1, ST/W002809/1, and EP/V520342/1.

Further acknowledgments can be found at https://gea.esac.esa.int/archive/documentation/GEDR3/Miscellaneous/sec_acknowl/.

Appendix B Source distributions in external catalogues

Figure B.1 shows the distribution of the sources in the 17 external catalogues used for Gaia-CRF3 as discussed in Sect. 2 and listed in Table 1. The maps show all the sources in the respective catalogue, irrespective of whether they are also in Gaia-CRF3. For the VLBI and blazar catalogues (ICRF3 S/X, ICRF3 K, ICRF3 X/Ka, Roma-BZCAT, 2WHSP, and the ALMA calibrators) that have a small source density on the sky, individual sources are plotted. All three parts of ICRF3 are shown on one plot using different colours. For the other catalogues, the maps show the density of sources per square degree, computed from the source counts per pixel using HEALPix of level 6 (pixel size ≃ 0.84 deg2). One sees the strong inhomogeneity of the input catalogues caused by the special observational programmes not covering the whole sky (e.g. SDSS and LAMOST), by the observational schedule of the WISE satellite (AllWISE, R90, and C75), and by the artificial de-selection of certain crowded areas applied by the authors of some catalogues (e.g. the Galactic plane and the LMC and SMC areas in R90, C75, and Gaia-unWISE). It is also obvious from the maps that not all catalogues are independent. For example, one directly sees that LQRF and LQAC-5 considered the results of 2QZ and that SDSS DR14Q was used in LQAC-5. Detailed information on all these aspects can be found in the corresponding original publications cited in Table 1. All these inhomogeneities contribute to the resulting inhomogeneity of Gaia-CRF3, the density of which is shown in Fig. 4.

thumbnail Fig. B.1

Distribution of the sources in the external catalogues listed in Table 1. These maps use a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

Appendix C Access to the Gaia-CRF3 data

Information on the Gaia-CRF3 sources is distributed over several tables in the Gaia Archive.5 Table gaiaedr3.agn_cross_id lists all sources (source_id) of Gaia-CRF3 together with the name of one external catalogue where the source can also be found as well as the name of the source in that catalogue. Table gaiaedr3.frame_rotator_source contains information on the sources used by the frame rotator (see Sect. 5) to define the orientation and spin of the Gaia EDR3 catalogue. Together with the Gaia EDR3 source identifier source_id, it contains the boolean flags considered_for_reference_frame_orientation and considered_for_reference_frame_spin to indicate if the source was considered by the frame rotator algorithm to fix the orientation and spin of the Gaia EDR3 catalogue, respectively. The frame rotator algorithm has a mechanism to detect outliers (see Appendix E), and another pair of boolean flags, used_for_reference_frame_orientation and used_for_reference_frame_spin, to specify which sources were used to fix, respectively, the orientation and rotation of the Gaia EDR3 catalogue. The sources that were considered but not used are the ones identified as outliers by the frame rotator algorithm.

Using these two tables and the main Gaia EDR3 astrometric table gaiaedr3.gaia_source, one can retrieve astrometric data of the Gaia-CRF3 sources and their subsets. We give here a few useful examples of ADQL queries that can be directly used in the Gaia Archive. The query

SELECT agn.source_name_in_catalogue, agn.catalogue_name, edr3.*
FROM gaiaedr3.agn_cross_id AS agn
INNER JOIN gaiaedr3.gaia_source AS edr3
USING (source_id)

gives a table containing astrometric data for all 1 614173 Gaia-CRF3 sources. Adding a conditional clause at the end of the same query,

SELECT agn.source_name_in_catalogue, agn.catalogue_name, edr3.*
FROM gaiaedr3.agn_cross_id AS agn
INNER JOIN gaiaedr3.gaia_source AS edr3
USING (source_id)
WHERE edr3.astrometric_params_solved = 31

allows one to retrieve the astrometric data for only the 1 215 942 Gaia-CRF3 sources that have five-parameter astrometric solutions. This dataset was used for example in Gaia Collaboration (2021b) to measure the acceleration of the Solar System. Changing ‘31’ to ‘95’ in the last line, one gets instead the 398 231 Gaia-CRF3 sources with six-parameter astrometric solutions.

For the sources used to define the orientation and spin of the Gaia EDR3 catalogue the following two queries are useful. The query

SELECT sub.source_name_in_catalogue, sub.catalogue_name, edr3.*
FROM (
SELECT frame.source_id, agn.source_name_in_catalogue, agn.catalogue_name
FROM gaiaedr3.frame_rotator_source AS frame
LEFT OUTER JOIN gaiaedr3.agn_cross_id AS agn
USING (source_id)
WHERE frame.used_for_reference_frame_orientation=‘true’
) AS sub
INNER JOIN gaiaedr3.gaia_source AS edr3
USING (source_id)

gives data for the 2007 sources that were used to define the orientation of the Gaia EDR3 catalogue, while

SELECT sub.source_name_in_catalogue,
sub.catalogue_name, edr3.*
FROM (
SELECT frame.source_id, agn.source_name_in_catalogue, agn.catalogue_name
FROM gaiaedr3.frame_rotator_source AS frame
LEFT OUTER JOIN gaiaedr3.agn_cross_id AS agn
USING (source_id)
WHERE frame.used_for_reference_frame_spin = ‘true’
) AS sub
INNER JOIN gaiaedr3.gaia_source AS edr3
USING (source_id)

gives data for the 428 034 sources used to define the rotation (spin) of Gaia EDR3. The field names used_for_… can be replaced by considered_for_… to get the data of all sources considered by the reference frame algorithm. As discussed in Sect. 5, all sources that were considered by the frame rotator algorithm for fixing the orientation of the Gaia EDR3 catalogue also belong to Gaia-CRF3. However, among the sources that were considered for fixing the spin, there are 46 sources that are not part of Gaia-CRF3. Two of them were used to fix the spin. For those 46 or 2 sources, the respective queries return empty names of the source and the external catalogue.

As described in Sect. 2, the Gaia-CRF3 sources resulted from a filtered cross-match between the Gaia EDR3 catalogue and a number of external AGN/QSO catalogues. The information on which sources were found in the different external catalogues is given in table gaiadr3.gaia_crf3_xm. We note that gaiaedr3.agn_cross_id cannot be used to retrieve all Gaia-CRF3 sources from a given external catalogue, because a Gaia-CRF3 source may be found in several external catalogues but gaiaedr3.agn_cross_id only gives its identifier in one of them. To obtain a list of the Gaia-CRF3 sources that were selected from a specific external catalogue one can use the query:6

SELECT xm.$$$_name, edr3.*
FROM gaiaedr3.gaia_source AS edr3
INNER JOIN gaiadr3.gaia_crf3_xm AS xm
USING (source_id)
WHERE xm.$$$ = ’true’

where ‘ $$$’ should be replaced by the catalogue code from column 6 of Table 1. For the sources in the three ICRF3 catalogues (the catalogue codes icrf3sx, icrf3k, icrf3xka) gaiadr3.gaia_crf3_xm provides two source designations as also given in the original ICRF3 catalogues: the ICRF and IERS designations as icrf_name and iers_name, respectively. Additionally, the above ADQL query with the catalogue code ‘b19’ can be used to see the 920 599 Gaia-CRF3 sources that were also found in the catalogue of Bailer-Jones et al. (2019).

We note that when considering an external catalogue ‘$$$’ some sources were rejected only because a lower quality of astrometry used in that catalogue resulted in the cross-match distance ∆ being too large to satisfy Eq. (5) in Sect. 2. For those sources the field gaiadr3.gaia_crf3_xm.$$$ is set to false. In some cases, however, the same Gaia source was found also in other external catalogues and eventually selected for Gaia-CRF3. In those cases, the name gaiadr3.gaia_crf3_xm.$$$_name of that Gaia source in catalogue ‘$$$’ is also given. To obtain all Gaia-CRF3 sources that appear in an external catalogue one should replace the last line in the previous ADQL query by ‘WHERE xm.$$$_name IS NOT NULL’. This helps to mitigate the effect of a lower quality astrometry in some of the external catalogues and improve the overall cross-match quality for those catalogues.

Appendix D Confusion sources in Gaia EDR3

In this appendix we give the number and distributions of sources in Gaia EDR3 that satisfy Eqs. (1)(3), ensuring that the Gaia astrometry is statistically compatible with the hypothesis that a source is extragalactic with zero parallax and proper motion. (As discussed in Sect. 2, we ignore the small effects of the Galactic acceleration and possible apparent proper motions induced by variable source structure, but take into account an overall parallax offset of -0.017 mas.) In this paper we call these the ‘confusion sources’, because most of them are in fact distant stars in our Galaxy or its satellite galaxies, and stellar contaminants resulting from erroneous cross-matching with external QSO catalogue (or from stellar contamination in these catalogues) can be expected to have similar characteristics as the confusion sources.

We give separate statistics for confusion sources with five- and six-parameter solutions in Gaia EDR3. For the five-parameter solutions, an explicit query of the Gaia EDR3 cata-logue7

SELECT * FROM gaiaedr3.gaia_source AS g
WHERE g.astrometric_params_solved = 31
AND abs((g.parallax + 0.017)/g.parallax_error) < 5
AND ( power(g.pmra/g.pmra_error,2)
+power(g.pmdec/g.pmdec_error,2)
−2*g.pmra_pmdec_corr*
g.pmra/g.pmra_error*
g.pmdec/g.pmdec_error)/
(1-power(g.pmra_pmdec_corr,2)) < 25

results in 30723 995 sources satisfying Eqs. (1)(3). Their sky density is shown in the top panel of Fig. D.1. Of those, 7090844 sources are in the Galactic zone with | sin b | < 0.1, thus violating the fourth criterion in Eq. (4). The remaining 23 633 151 confusion sources with five-parameter astrometric solutions in Gaia EDR3 satisfy all four criteria and could, based on their astrometry alone, be potential candidates for the Gaia-CRF.

A corresponding query of Gaia EDR3 for confusion sources with six-parameter solutions (obtained by changing the second line of the query to ‘ WHERE g.astrometric_params_solved = 95’) returns 182 814 959 sources. Their sky density is shown in the lower panel of Fig. D.1). 90 979 403 of the sources have | sin b | > 0.1 and thus satisfy all four criteria in Eqs. (1)(4).

Among the many features seen in Fig. D.1 we recognise the expected density distribution due to the overall distribution of Galactic stars and a strong over-density in the areas of the LMC and SMC. There are several more compact high-density areas outside the Galactic plane. A detailed (but not exhaustive) inspection of the density map revealed about 20 stellar (mostly globular) clusters and 10 (dwarf) galaxies on the maps. Not counting the LMC and SMC areas, the total number of sources in these clusters and galaxies is however only about 5000, which is ignored in the statistical considerations below. We note that the Sagittarius stream is clearly visible (mainly in the top panel), as well as numerous streaks of under-density related to the scanning law.

A prominent feature in Fig. D.1 is the relatively low density of sources with five-parameter solutions along the Galactic equator and in the centres of the Magellanic Clouds. This is caused by the difficulty to obtain reliable colour information from the BP and RP spectra on faint sources in these very crowded areas (Riello et al. 2021). As a result, the astrometric solution did not use a spectrophotometrically determined colour (effective wavenumber) for many of these sources, but instead determined a pseudo-colour as the sixth astrometric parameter.

Because the confusion sources in the LMC and SMC tend to have distinctly different kinematics from the Galactic stars, it is advantageous to separate out the sources in the LMC and SMC areas in the further analysis. Guided by the density maps, we take a circle of radius 9° centred on (α, δ) = (81.3°, −68.7°) to represent the LMC, and a circle of radius 6° centred on (α, δ) = (16.0°, −72.8°) to represent the SMC. These areas contain, respectively, 2 657 059 and 496520 confusion sources with five-parameter solutions (6 817 915 and 1 307 591 with six-parameter solutions). This simplistic selection does not pretend to any deeper physical significance, and only partially coincides with the carefully selected samples of Gaia EDR3 sources in the LMC and SMC discussed in Gaia Collaboration (2021c).

Figure D.2 shows the distribution of the confusion sources in galactic coordinates and magnitude, split by the kind of solution (five or six parameters) and further divided into four disjoint groups: sources in the LMC and SMC areas defined above, and for the remaining sources according to | sin b | ≷ 0.1. The confusion sources outside the LMC and SMC areas are called ‘Galactic’ below, although it is clear that this set contains also many extragalactic objects.

Figures D.3 and D.4 show the distributions of the normalised parallaxes and proper motions of the confusion sources separately for the Galactic sources (split by galactic latitude) and the LMC and SMC areas. The distributions are strongly non-Gaussian (with the possible exception of parallaxes in the LMC/SMC areas) and very different from those of the Gaia-CRF3 sources in Fig. 3. In Sect. 2, this difference was used to check the purity of the filtered sources in the external QSO catalogues and in the final Gaia-CRF3. The proper motions give the strongest indication of impurity. Interestingly, for the Galactic confusion sources there is no clear dependence according to | sin b | ≷ 0.1. On the other hand, the confusion sources in the LMC and SMC areas give rather different and distinct distributions. The distributions for the five- and six-parameter solutions are not drastically different and mainly reflect the generally higher uncertainties of the six-parameter sources, which makes their distributions slightly less non-Gaussian.

Selection of QSO-like objects in the Gaia catalogue, based exclusively on parallaxes and proper motions, is an interesting prospect already discussed in the literature (Heintz et al. 2015; 2018). Among the Gaia EDR3 sources with five-parameter solutions, there are 23.6 million satisfying Eqs. (1)(4), of which 1.2 million (5.1%) are also in Gaia-CRF3. Allowing that the actual number of QSOs among the five-parameter sources in Gaia EDR3 could be a factor two higher (cf. Sect. 3), we conclude that the purity of such a selection from Gaia EDR3 would be at most about 10% – too low to be of practical use, but still interesting and promising. In certain areas on the sky the percentage of true QSO-like sources in the sample is much higher. For example, excluding | b | < 20° and the LMC/SMC areas, there are 5 767 163 confusion sources, of which 1410314 (24.5%) are Gaia-CRF3 sources. The purity of such selections in future Gaia data releases will improve drastically thanks to the higher accuracy of parallaxes and (especially) proper motions (cf. Sect. 2.5).

thumbnail Fig. D.1

Distribution of the confusion sources in Gaia EDR3. These maps use a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left. The black lines bound the Galactic zone | sin b | < 0.1 where sources are normally excluded by Eq. (4). Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions.

thumbnail Fig. D.2

Distribution of confusion sources in (from left to right) galactic longitude, sine of galactic latitude, and magnitude. Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions. The selections indicated in the legend are further explained in the text.

thumbnail Fig. D.3

Histograms of the normalised parallaxes and proper motion components for the Galactic confusion sources. Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions. The different colours show the selections indicated in the legend.

thumbnail Fig. D.4

Histograms of the normalised parallaxes and proper motion components for the confusion sources in the LMC and SMC areas. Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions.

Appendix E Estimating the spin and orientation corrections

As described in Sect. 6.1 of Lindegren et al. (2012), the frame rotator is an integral part of AGIS with three main functions: (i) to estimate the spin and orientation corrections needed to bring the current astrometric solution onto the ICRS; (ii) to apply the estimated corrections to the astrometric parameters of the primary sources; and (iii) to apply the corresponding corrections to the attitude parameters. This appendix describes the algorithm used for the first function in the construction of Gaia EDR3.

The 2012 paper gave a stringent mathematical framework for all three functions and outlined how they could be implemented in AGIS. The frame rotator used for the first two releases of Gaia data followed these prescriptions quite rigorously, which however resulted in code that was unnecessarily complex and not very transparent. For example, it permitted the use of different reference epochs for the AGIS solution, comparison catalogue, and orientation parameters, while in practice this was never needed. It also foresaw the use of radio stars observed by VLBI to help align the reference frame, which is no longer considered necessary or even desirable. On the other hand, the details of the numerical algorithm used for the robust least-squares estimation (LSE) of the rotation parameters were not at all described in Lindegren et al. (2012), making it very difficult for outside users to reproduce how the frame rotator worked in those releases.

In 2018 it was decided to implement a new version of the frame rotator to be used for the third and subsequent data releases. The main changes with respect to the earlier version are as follows: only two kinds of sources are used for estimating the rotation parameters, namely QSO-like sources with high-accuracy external positions (the VLBI positions of ICRF3 S/X in Gaia EDR3), used for both orientation and spin, and QSO-like sources without accurate external positions, used only for the spin. The same reference epoch is used for rotation parameters as for the astrometric data (2016.0 for (E)DR3), which permits to separate completely the solution for the orientation from that of the spin. Furthermore, the observation equations are written directly in terms of the proper motions and position differences (Sect. E.1), rather than using the rigorous but less transparent formalism of Lindegren et al. (2012). The solution takes into account the correlation between the components in right ascension and declination of both Gaia and external data (Sect. E.2), and has a transparent scheme for the detection and treatment of outliers (Sect. E.3). Finally, the solution provides reasonable estimates of the statistical uncertainties of the rotation parameters also when the uncertainties of the input data are underestimated (Sect. E.4), and may optionally include additional terms, such as a glide in the proper motions or more general expansions of the proper motions and position differences (Sect. E.5).

All these changes concern the estimation algorithm, that is the first function of the frame rotator. The remaining two functions, that is the application of the corrections to the source and attitude data, are still done as described in Sects. 6.1.2 and 6.1.3 of the 2012 paper and are not further discussed here.

We denote by Q0 the set of QSO-like sources considered for the determination of the spin correction, of which the subset Q was actually used after elimination of outliers; similarly, R0 and R are the set of sources considered and used for the determination of the orientation. Although not required by the algorithm, we normally have R0Q0. The number of sources in each set is denoted n(Q0), etc. We use σ(x), ρ(x, y), and Cov(x, y) for the uncertainty, correlation coefficient, and covariance of the arbitrary parameters x and y. The asterisk in α* indicates that the cos δ factor is implicit, as in µα* = α cos δ. To distinguish between data in the current astrometric solution and the corresponding data in the external catalogue, we put an overline (-) on the latter. The prime ‘ denotes the matrix transpose.

Appendix E.1 Observation equations

Let ω = [ωX, ωY, ωZ]′ be the spin correction of the current solution, expressed by its components along the principal axes of the ICRS. (As indicated by the prime, ω is a column matrix.) In matrix form, the observation equations in proper motion for source i are Aiω=μi,${{\bf{A}}_i}\omega = {{\rm{\mu }}_i},$(E.1)

with coefficient matrix A=[ cosαisinδi+sinαisinδi+cosδi+sinαicosαi0 ]${\bf{A}} = \left[ {\matrix{{ - \cos {\alpha _i}\sin {\delta _i}} \hfill & { + \sin {\alpha _i}\sin {\delta _i}} \hfill & { + \cos {\delta _i}} \hfill \cr { + \sin {\alpha _i}} \hfill & { - \cos {\alpha _i}} \hfill & 0 \hfill \cr} } \right]$(E.2)

and right-hand side μi=[ μα*,iμδi ].${{\rm{\mu }}_i} = \left[ {\matrix{{{{\rm{\mu }}_{\alpha *,i}}} \cr {{{\rm{\mu }}_{\delta i}}} \cr} } \right].$(E.3)

Here, αi, δi, µα*i and µδi are the position and proper motion components of the source in the current solution, with reference epoch tref (= 2016.0 for Gaia EDR3). For the weight normalisation and decorrelation (Sect. E.2), we also need the covariance of the right-hand side in Eq. (E.3), Cov(μi)=  [ σ(μα*,i)2ρ(μα*,i,μδi)σ(μα*,i)σ(μδi)ρ(μα*,i,μδi)σ(μα*,i)σ(μδi)σ(μδi)2 ].$\matrix{{{\rm{Cov}}\left( {{{\rm{\mu }}_i}} \right) = } \hfill \cr {\quad \quad \left[ {\matrix{{\sigma {{\left( {{{\rm{\mu }}_{\alpha *,i}}} \right)}^2}} & {\rho \left( {{{\rm{\mu }}_{\alpha *,i}},{{\rm{\mu }}_{\delta i}}} \right)\sigma \left( {{{\rm{\mu }}_{\alpha *,i}}} \right)\sigma \left( {{{\rm{\mu }}_{\delta i}}} \right)} \cr {\rho \left( {{{\rm{\mu }}_{\alpha *,i}},{{\rm{\mu }}_{\delta i}}} \right)\sigma \left( {{{\rm{\mu }}_{\alpha *,i}}} \right)\sigma \left( {{{\rm{\mu }}_{\delta i}}} \right)} & {\sigma {{\left( {{{\rm{\mu }}_{\delta i}}} \right)}^2}} \cr} } \right].} \hfill \cr} $(E.4)

Similarly, let ε = [εX, εY, εZ]′ be the orientation correction at the reference epoch tref, expressed by its components in the ICRS. The linearised observation equations in position for source i are Aiε=Δi,${{\bf{A}}_i}{\bf{\varepsilon }} = {\Delta _i},$(E.5)

with the same coefficient matrix Ai as before. The right-hand side, Δi[ Δαi*Δδi ]=[ (αiα¯i)cosδiδiδ¯i ],${\Delta _i} \equiv \left[ {\matrix{{\Delta \alpha _i^*} \cr {\Delta {\delta _i}} \cr} } \right] = \left[ {\matrix{{\left( {{\alpha _i} - {{\bar \alpha }_i}} \right)\cos {\delta _i}} \cr {{\delta _i} - {{\bar \delta }_i}} \cr} } \right],$(E.6)

is the difference between the position (αi, δi) in the current solution and the position (α¯i,δ¯i)$\left( {{{\bar \alpha }_i},{{\bar \delta }_i}} \right)$ according to the external catalogue. We note that the external positions at tref (= 2016.0 for EDR3) should be used here. In particular, for Gaia EDR3 the galactic acceleration should be taken into account with the ICRF3 S/X positions as prescribed in Sect. 5.3 of Charlot et al. (2020). The covariance of ∆i in Eq. (E.6) is computed on the assumption that the positional errors of the current solution and of the external catalogue are independent: Cov(Δi)=Cov(αi*,δi)+Cov(α¯i*,δ¯i),${\rm{Cov}}\left( {{\Delta _i}} \right) = {\rm{Cov}}\left( {\alpha _i^*,{\delta _i}} \right) + {\rm{Cov}}\left( {\bar \alpha _i^*,{{\bar \delta }_i}} \right),$(E.7)

where Cov(αi*,δi)=  [ σ(αi*)2ρ(αi*,δi)σ(αi*)σ(δi)ρ(αi*,δi)σ(αi*)σ(δi)σ(δi)2 ],$\matrix{{{\rm{Cov}}\left( {\alpha _i^*,{\delta _i}} \right) = } \hfill \cr {\quad \quad \left[ {\matrix{{\sigma {{\left( {\alpha _i^*} \right)}^2}} & {\rho \left( {\alpha _i^*,{\delta _i}} \right)\sigma \left( {\alpha _i^*} \right)\sigma \left( {{\delta _i}} \right)} \cr {\rho \left( {\alpha _i^*,{\delta _i}} \right)\sigma \left( {\alpha _i^*} \right)\sigma \left( {{\delta _i}} \right)} & {\sigma {{\left( {{\delta _i}} \right)}^2}} \cr} } \right],} \hfill \cr} $(E.8)

and correspondingly for the external data. We note that σ(Δαi*)=σ(αi*)2+σ(α¯i*)2,$\sigma \left( {\Delta \alpha _i^*} \right) = \sqrt {\sigma {{\left( {\alpha _i^*} \right)}^2} + \sigma {{\left( {\bar \alpha _i^*} \right)}^2}} ,$(E.9) σ(Δδi)=σ(δi)2+σ(δ¯i)2,$\sigma \left( {\Delta {\delta _i}} \right) = \sqrt {\sigma {{\left( {{\delta _i}} \right)}^2} + \sigma {{\left( {{{\bar \delta }_i}} \right)}^2}} ,$(E.10) ρ(Δαi*,Δδi)=ρ(αi*,δi)σ(αi*)σ(δi)+ρ(α¯i*,δ¯i)σ(α¯i*)σ(δ¯i)σ(Δαi*)σ(Δδi).$\rho \left( {\Delta \alpha _i^*,\Delta {\delta _i}} \right) = {{\rho \left( {\alpha _i^*,{\delta _i}} \right)\sigma \left( {\alpha _i^*} \right)\sigma \left( {{\delta _i}} \right) + \rho \left( {\bar \alpha _i^*,{{\bar \delta }_i}} \right)\sigma \left( {\bar \alpha _i^*} \right)\sigma \left( {{{\bar \delta }_i}} \right)} \over {\sigma \left( {\Delta \alpha _i^*} \right)\sigma \left( {\Delta {\delta _i}} \right)}}.$(E.11)

The total number of observation equations considered for ω is 2n(Q0), and for ε it is 2n(R0). A feature of the algorithm described here is that the observation equations always come in pairs, as in Eqs. (E.1) and (E.5), and that the two components in a pair are always treated together. This simplifies the decorrelation of data (see below), and affects the way outliers are detected and removed. The resulting solution and statistics are independent of the coordinate system in which they are calculated.

Appendix E.2 Weight normalisation and decorrelation

Because the observation equations have different uncertainties (heteroscedasticity) and are pairwise correlated, they should be solved by the generalised least-squares method for optimality. Equivalently, the ordinary (unweighted) least-squares method can be applied to the normalised equations, KiAiω=Kiμi,${{\bf{K}}_i}{{\bf{A}}_i}\omega = {{\bf{K}}_i}{{\rm{\mu }}_i},$(E.12) LiAiε=LiΔi,${{\bf{L}}_i}{{\bf{A}}_i}\varepsilon = {{\bf{L}}_i}{{\rm{\Delta }}_i},$(E.13)

where Ki and Li are 2 × 2 matrices chosen in such a way that the transformed right-hand sides are uncorrelated and of unit variance. This condition does not define Ki and Li uniquely, since any orthogonal transformation of them will give the same result; the form used here is the inverse lower Cholesky factor of the covariance matrix, or Ki=[ 1σ(μα*,i)0ρ(μα*,i,μδi)σ(μα*,i)1ρ(μα*,i,μδi)21σ(μδi)1ρ(μα*,i,μδi)2 ]${{\bf{K}}_i} = \left[ {\matrix{{{1 \over {\sigma \left( {{{\rm{\mu }}_{\alpha *,i}}} \right)}}} & 0 \cr {{{ - \rho \left( {{{\rm{\mu }}_{\alpha *,i}},{{\rm{\mu }}_{\delta i}}} \right)} \over {\sigma \left( {{{\rm{\mu }}_{\alpha *,i}}} \right)\sqrt {1 - \rho {{\left( {{{\rm{\mu }}_{\alpha *,i}},{{\rm{\mu }}_{\delta i}}} \right)}^2}} }}} & {{1 \over {\sigma \left( {{{\rm{\mu }}_{\delta i}}} \right)\sqrt {1 - \rho {{\left( {{{\rm{\mu }}_{\alpha *,i}},{{\rm{\mu }}_{\delta i}}} \right)}^2}} }}} \cr} } \right]$(E.14)

and Li=[ 1σ(Δαi*)0ρ(Δαi*,Δδi)σ(Δαi*)1ρ(Δαi*,Δδi)21σ(Δδi)1ρ(Δαi*,Δδi)2 ].${{\bf{L}}_i} = \left[ {\matrix{{{1 \over {\sigma \left( {{\rm{\Delta }}\alpha _i^*} \right)}}} & 0 \cr {{{ - \rho \left( {{\rm{\Delta }}\alpha _i^*,{\rm{\Delta }}{\delta _i}} \right)} \over {\sigma \left( {{\rm{\Delta }}\alpha _i^*} \right)\sqrt {1 - \rho {{\left( {{\rm{\Delta }}\alpha _i^*,{\rm{\Delta }}{\delta _i}} \right)}^2}} }}} & {{1 \over {\sigma \left( {{\rm{\Delta }}{\delta _i}} \right)\sqrt {1 - \rho {{\left( {{\rm{\Delta }}\alpha _i^*,{\rm{\Delta }}{\delta _i}} \right)}^2}} }}} \cr} } \right].$(E.15)

It is readily verified that, as required, KiCov(µi)Ki = I and LiCov(∆i)Li = I, where I is the identity matrix.

Given the subsets Q, R of the accepted sources, the generalised least-squares solutions are obtained by minimising the sum of squared normalised residuals, X2(ω,Q)=iQ| KiμiKiAiω |2,${X^2}\left( {\omega ,Q} \right) = \sum\limits_{i \in Q} {{{\left| {{{\bf{K}}_i}{\mu _i} - {{\bf{K}}_i}{{\bf{A}}_i}\omega } \right|}^2},} $(E.16) X2(ε,R)=iR| LiΔiLiAiε |2.${X^2}\left( {\varepsilon ,R} \right) = \sum\limits_{i \in R} {{{\left| {{{\bf{L}}_i}{\Delta _i} - {{\bf{L}}_i}{{\bf{A}}_i}\varepsilon } \right|}^2}.} $(E.17)

This can be done by a range of standard numerical techniques, including orthogonal transformations and the use of normal equations (e.g. Lawson & Hanson 1995, Björck 1996). For the frame rotator, the normalised observation equations in Eqs. (E.12) and (E.13) are solved by QR decomposition, using Householder transformations.

Appendix E.3 Residual statistics and treatment of outliers

Standard least-squares fitting is sensitive to outliers, that is data points that do not reasonably fit the observation equations, considering the uncertainties and the typical scatter of residuals. Outliers must be expected among the external counterparts, as well as among the other QSO-like sources, for a variety of reasons. It is therefore essential that the fit is robust against outliers. The method adopted for the frame rotator is to identify the outliers that must be removed from the originally considered samples (Q0, R0) to create the subsets R and Q used in Eqs. (E.17) and (E.16). Two things should be noted concerning this process. Firstly, outliers are identified at source level, that is, a given source is either rejected in both coordinates (α and δ, or µα* and µδ), or accepted in both coordinates. This makes sense from a physical viewpoint, because a ‘bad’ source is probably bad (at some level) in both coordinates. Secondly, because ω and ε are treated separately, it happens that a source rejected in R (for the orientation) is accepted in Q (for the spin), or vice versa. Again, this could be motivated on physical grounds.

Details of the subsequent algorithm are exactly the same for Eqs. (E.16) and (E.17), and are therefore described in terms of the generic function X2(x,S)=iSXi2(x),${X^2}\left( {x,S} \right) = \sum\limits_{i \in S} {X_i^2\left( x \right),} $(E.18)

where Xi(x)=| diDix |.${X_i}\left( x \right) = \left| {{{\bf{d}}_i} - {{\bf{D}}_i}x} \right|.$(E.19)

Here, x could be either ω or ε, with S, d, and D correspondingly defined according to Eq. (E.16) or (E.17).

We note that the functions Xi(x), which can be computed for all the sources in S0, are a natural measure of the discrepancy of source i from the model for given parameter vector x. A useful overall statistic is the median discrepancy X0.5(x)=mediS0(Xi(x)),${X_{0.5}}\left( x \right) = \mathop {{\rm{med}}}\limits_{i \in {S_0}} \left( {{X_i}\left( x \right)} \right),$(E.20)

and a robust estimator could be to simply minimise this quantity. However, if the errors are approximately Gaussian, with a moderate fraction of outliers, robust least-squares estimation is more accurate and therefore preferred for the frame rotator. If x is the true parameter vector, and all the errors are Gaussian with nominal covariances, X0.5(x) is expected to be around ln41.774$\sqrt {\ln 4} \simeq 1.774$ (which is the median of the chi-distribution with two degrees of freedom).

The adopted solution x^${\hat x}$ and accepted subset S simultaneously satisfy the three conditions x^=argminxiSXi2(x),iS:Xi(x^)κX0.5(x^),iS0\S:Xi(x^)>κX0.5(x^) }$\left. {\matrix{{\hat x = \mathop {\arg \min }\limits_x \sum\limits_{i \in S} {X_i^2\left( x \right),} } \cr {\forall i \in S:{X_i}\left( {\hat x} \right) \le \kappa {X_{0.5}}\left( {\hat x} \right),} \cr {\forall i \in {S_0}{\rm{\backslash }}S:{X_i}\left( {\hat x} \right) > \kappa {X_{0.5}}\left( {\hat x} \right)} \cr} } \right\}$(E.21)

for the chosen clip limit κ ≥ 1 (see below). It is not obvious that these conditions lead to a unique solution; indeed, it is easy to construct examples where this is not the case, for example if S0 consists of disjoint subsets with distinctly different rotation parameters. However, these are somewhat contrived examples and in practice the non-uniqueness is hardly an issue. More relevant is the question whether a solution always exists – to which we do not know the answer. We take the pragmatic view that a unique solution ‘close enough’ to satisfying Eq. (E.21) can always be found by the following algorithm.

Start by provisionally accepting all the sources in S0, compute the least-squares estimate x^0${{\hat x}_0}$ by minimising X2(x, S0), and hence obtain the statistic X0.5(x^0)${X_{0.5}}\left( {{{\hat x}_0}} \right)$. In the next iteration, let S1 be the subset of sources satisfying Xi(x^0)κX0.5(x^0)${X_i}\left( {{{\hat x}_0}} \right) \le \kappa {X_{0.5}}\left( {{{\hat x}_0}} \right)$, leading to the estimate x^1${{\hat x}_1}$ minimising X2(x, S1) and the new statistic X0.5(x^1)${X_{0.5}}\left( {{{\hat x}_1}} \right)$, from which the subset S2 is constructed, and so on. This gives a sequence of subsets Sk, k = 0, 1, 2, … that may converge to a stable subset (i.e. Sk+1 = Sk for some k). If so, the stable subset and the corresponding estimate x^k${{\hat x}_k}$ is the desired solution. But even if the sequence does not converge, it will sooner or later be found that the same subsets reappear cyclically in the sequence, that is Sk+p = Sk, where p > 1 is the cycle period. In such a case any of the solutions in the cycle might be accepted. However, by adopting the rule that the sequence is stopped at the lowest k where Sk = Skp for some p > 0, we arrive at a unique result both for a converging sequence (p = 1) and in the cyclic case (p > 1).

To avoid having to save and compare the (potentially many and large) subsets Sk, we assign a fixed random number ri to each source in S0, and save from previous iterations only the outlier ‘checksum’ ok=iS0\Skri${o_k} = \sum\nolimits_{i \in {S_0}\backslash {S_k}} {{r_i}} $. The iteration stops as soon as ok = okp for some p > 0, at which point we adopt S = Sk and x^=x^k$\hat x = {{\hat x}_k}$ as the solution.

The clip limit κ is a dimensionless number, typically in the range 2-5, to be chosen by the researcher. A value κ ≥ 1 ensures that at most 50% of the sources in S0 are rejected as outliers, but in practice a higher value should always be used. In principle, an optimum clip level that minimises the expected uncertainty of x can be estimated by careful modelling of the actual error distribution in S0, but for the frame rotator we always use a fixed κ = 3. In the nominal case of purely Gaussian errors and no actually bad data, this will (erroneously) reject about 1% of the sources. This appears to be an acceptable price to pay for an estimation algorithm that is both efficient and very robust.

Appendix E.4 Uncertainty of the estimates

The covariance of x^${\hat x}$ is estimated as Cov(x^)=[ iDDDi ]1×f,${\rm{Cov}}\left( {\hat x} \right) = {\left[ {\sum\limits_{i \in D} {{\bf{D'}}{{\bf{D}}_i}} } \right]^{ - 1}} \times f,$(E.22)

where the first factor (the inverse of the weighted normal matrix) is the nominal covariance obtained on the assumption that the data uncertainties in Cov(∆i) and Cov(µi) are correctly estimated, and that no outlier rejection is applied. An empirical correction factor f > 0 takes into account that the data uncertainties may be systematically wrong (usually underestimated), and that the nominal covariance estimate is biased by the outlier rejection process even if the data uncertainties are correctly estimated.

In practice, the correction factor f must be derived from the residuals of the least-squares solution, or in our case from the statistics of Xi(x^)${X_i}\left( {\hat x} \right)$. An obvious possibility is to use the reduced chi-square (the square of the unit weight error u), u2=X2(x^,S)2n(S)m,${u^2} = {{{X^2}\left( {\hat x,S} \right)} \over {2n\left( S \right) - m}},$(E.23)

as an estimate of f, where n(S) is the number of accepted sources and m is the dimension of x (here, m = 3). This is accurate if n(S) is large (as it usually is in our applications), the fraction of outliers is very small, and the data uncertainties are biased by an approximately constant factor. However, when a significant fraction of the sources are rejected as outliers, the use of f = u2 tends to severely underestimate the uncertainties of x^${\hat x}$. A more reasonable correction factor should therefore consider the error distribution in the full sample S0, not just in the accepted subset S. Discrepancy quantiles such as X0.5 in Eq. (E.20) provide robust characterisation of the full error distribution.

Based on numerical simulations, we have adopted the following recipe for the correction factor to be used in Eq. (E.22): f=max[ u2,X0.52/(ln4) ].$f = \max \left[ {{u^2},X_{0.5}^2{\rm{/}}\left( {\ln 4} \right)} \right].$(E.24)

In the ideal case of Gaussian errors, correctly estimated data uncertainties, and a negligible fraction of outliers, we expect u2 ≃ 1 and X0.52ln4$X_{0.5}^2 \simeq \ln 4$, and hence f ≃ 1. If the data uncertainties are biased by a certain factor, both u and X0.5 scale by the same factor, and the recipe still works. For samples yielding a non-negligible fraction of outliers we normally have u2<X0.52/(ln4)${u^2} < X_{0.5}^2{\rm{/}}\left( {\ln 4} \right)$, and the higher factor then tends to give a more accurate estimate of the uncertainties, hence Eq. (E.24).

Appendix E.5 Optional parameters

The estimation as described above has only three unknowns, that is the components of ω or ε. The algorithm, including weight normalisation, decorrelation, and outlier treatment, is readily adapted to more general linear models, for example using VSH to represent higher-order distortions of the proper motions and position differences (Mignard & Klioner 2012), or including terms depending on the colour and magnitude of the source. However, while such more complex models are useful for investigating the properties of the final catalogue, they should normally not be used in the frame rotator.

One possible exception is the glide in proper motion, which is expected to be present on physical grounds. Already in Gaia EDR3 this effect is very clear, and found to be consistent with the expected Galactic acceleration of the solar system barycentre (Gaia Collaboration 2021b). The proper motion model implicit in Eq. (E.1) neglects this effect, which was deemed acceptable for the frame rotator in the first three releases of Gaia astrometry. For future releases this may not be accurate enough (since the effects of rotation and glide can be correlated and mutually disturb each other). One possible remedy is then to correct the observed proper motions in Eq. (E.3) for the assumed Galactic glide. However, a better option may be to estimate the glide g together with ω, using the augmented observation equations Aiω+Big=μi.${{\bf{A}}_i}\omega + {{\bf{B}}_i}{\bf{g}} = {{\rm{\mu }}_i}.$(E.25)

The estimation follows the generic scheme outlined above, minimising X2 = ∑iQ | diDix|2 with x=[ ωg ],di=Liμi,andDi=Li[ AiBi ].$\matrix{{x = \left[ {\matrix{\omega \cr g \cr} } \right]} & {{{\bf{d}}_i} = {{\bf{L}}_i}{{\rm{\mu }}_i},} & {{\rm{and}}} & {{{\bf{D}}_i} = {{\bf{L}}_i}\left[ {\matrix{{{{\bf{A}}_i}} & {{{\bf{B}}_i}} \cr} } \right].} \cr} $(E.26)

It should be noted that the application of the spin and orientation corrections to the source and attitude data, that is functions (ii) and (iii) of the frame rotator, is in any case restricted to the pure rotations given by ω and ε. In particular, the published source parameters are never corrected for the (assumed or estimated) glide in proper motion.

References

  1. Andrei, A. H., Souchay, J., Zacharias, N., et al. 2009, A&A, 505, 385 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  2. Arias, E. F., Charlot, P., Feissel, M., & Lestrade, J.-F. 1995, A&A, 303, 604 [Google Scholar]
  3. Assef, R. J., Stern, D., Noirot, G., et al. 2018, ApJS, 234, 23 [Google Scholar]
  4. Bailer-Jones, C. A. L., Andrae, R., Arcay, B., et al. 2013, A&A, 559, A74 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  5. Bailer-Jones, C. A. L., Fouesneau, M., & Andrae, R. 2019, MNRAS, 490, 5615 [CrossRef] [Google Scholar]
  6. Björck, Å. 1996, Other Titles in Applied Mathematics, Numerical Methods for Least Squares Problems (Philadelphia, PA: Society for Industrial and Applied Mathematics), 102 [Google Scholar]
  7. Bonato, M., Liuzzo, E., Herranz, D., et al. 2019, MNRAS, 485, 1188 [NASA ADS] [CrossRef] [Google Scholar]
  8. Cantat-Gaudin, T., & Brandt, T. D. 2021, A&A, 649, A124 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  9. Chang, Y. L., Arsioli, B., Giommi, P., & Padovani, P. 2017, A&A, 598, A17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  10. Charlot, P., Jacobs, C. S., Gordon, D., et al. 2020, A&A, 644, A159 [EDP Sciences] [Google Scholar]
  11. Croom, S. M., Smith, R. J., Boyle, B. J., et al. 2004, MNRAS, 349, 1397 [NASA ADS] [CrossRef] [Google Scholar]
  12. de Bruijne, J. H. J., Allen, M., Azaz, S., et al. 2015, A&A, 576, A74 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  13. Delchambre, L. 2018, MNRAS, 473, 1785 [NASA ADS] [CrossRef] [Google Scholar]
  14. ESA. 1997, The Hipparcos and Tycho Catalogues, ESA SP-1200 [Google Scholar]
  15. Flesch, E. W. 2019, ArXiv e-prints [arXiv:1912.05614] [Google Scholar]
  16. Fricke, W. 1985, Celest. Mech., 36, 207 [NASA ADS] [CrossRef] [Google Scholar]
  17. Fu, Y., Wu, X.-B., Yang, Q., et al. 2021, ApJS, 254, 6 [NASA ADS] [CrossRef] [Google Scholar]
  18. Gaia Collaboration (Prusti, T., et al.) 2016, A&A, 595, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  19. Gaia Collaboration (Mignard, F., et al.) 2018, A&A, 616, A14 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  20. Gaia Collaboration (Brown, A. G. A., et al.) 2021a, A&A, 649, A1 (Gaia EDR3 SI) [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Gaia Collaboration (Klioner, S. A., et al.) 2021b, A&A, 649, A9 (Gaia EDR3 SI) [EDP Sciences] [Google Scholar]
  22. Gaia Collaboration (Luri, X., et al.) 2021c, A&A, 649, A7 (Gaia EDR3 SI) [EDP Sciences] [Google Scholar]
  23. Heintz, K. E., Fynbo, J. P. U., & Høg, E. 2015, A&A, 578, A91 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  24. Heintz, K. E., Fynbo, J. P. U., Høg, E., et al. 2018, A&A, 615, L8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  25. Heintz, K. E., Fynbo, J. P. U., Geier, S. J., et al. 2020, A&A, 644, A17 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  26. Hobbs, D., Brown, A., Høg, E., et al. 2021, Exp. Astron., 51, 783 [NASA ADS] [CrossRef] [Google Scholar]
  27. Johnson, M., Schinzel, F., Darling, J., et al. 2019, BAAS, 51, 273 [NASA ADS] [Google Scholar]
  28. Lambert, S., Liu, N., Arias, E. F., et al. 2021, A&A, 651, A64 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  29. Lawson, C., & Hanson, R. 1995, Classics in Applied Mathematics, Solving Least Squares Problems (Philadelphia, PA: Society for Industrial and Applied Mathematics), 15 [Google Scholar]
  30. Lindegren, L., Lammers, U., Hobbs, D., et al. 2012, A&A, 538, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  31. Lindegren, L., Bastian, U., Biermann, M., et al. 2021a, A&A, 649, A4 [EDP Sciences] [Google Scholar]
  32. Lindegren, L., Klioner, S. A., Hernández, J., et al. 2021b, A&A, 649, A2 [EDP Sciences] [Google Scholar]
  33. Liu, N., Lambert, S. B., Charlot, P., et al. 2021, A&A, 652, A87 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  34. Lunz, S., Anderson, J., Xu, M. H., et al. 2021, in EGU General Assembly Conference Abstracts, EGU21-8604 [Google Scholar]
  35. Malkin, Z. 2018, ApJS, 239, 20 [Google Scholar]
  36. Massaro, E., Maselli, A., Leto, C., et al. 2015, Ap&SS, 357, 75 [Google Scholar]
  37. Mignard, F., & Klioner, S. 2012, A&A, 547, A59 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  38. Mignard, F., Klioner, S., Lindegren, L., et al. 2016, A&A, 595, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  39. Paris, I., Petitjean, P., Aubourg, É., et al. 2018, A&A, 613, A51 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  40. Perryman, M. A. C., Lindegren, L., Kovalevsky, J., et al. 1997, A&A, 500, 501 [NASA ADS] [Google Scholar]
  41. Petrov, L., Kovalev, Y. Y., & Plavin, A. V. 2019, MNRAS, 482, 3023 [Google Scholar]
  42. Plavin, A. V., Kovalev, Y. Y., & Petrov, L. Y. 2019, ApJ, 871, 143 [Google Scholar]
  43. Riello, M., De Angeli, F., Evans, D. W., et al. 2021, A&A, 649, A3 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  44. Schiff, L. I. 1964, Rev. Mod. Phys., 36, 510 [NASA ADS] [CrossRef] [Google Scholar]
  45. Secrest, N. J., Dudik, R. P., Dorland, B. N., et al. 2015, ApJS, 221, 12 [NASA ADS] [CrossRef] [Google Scholar]
  46. Shu, Y., Koposov, S. E., Evans, N. W., et al. 2019, MNRAS, 489, 4741 [Google Scholar]
  47. Souchay, J., Gattano, C., Andrei, A. H., et al. 2019, A&A, 624, A145 [EDP Sciences] [Google Scholar]
  48. Taris, F., Damljanovic, G., Andrei, A., et al. 2018, A&A, 611, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  49. van Leeuwen, F. 2007, A&A, 474, 653 [CrossRef] [EDP Sciences] [Google Scholar]
  50. Yao, S., Wu, X.-B., Ai, Y. L., et al. 2019, ApJS, 240, 6 [NASA ADS] [CrossRef] [Google Scholar]

1

Some further acronyms used in this paper: EDR3 – Gaia Early third Data Release; DR3 – Gaia third Data Release; QSO – quasi stellar object (here understood as all compact extragalactic sources); AGN – active galactic nuclei; ICRS – International Celestial Reference System; ICRF3 – third realisation of the International Celestial Reference Frame; Gaia-CRFn – nth realisation of the Gaia Celestial Reference Frame; VLBI – very long baseline interferometry; S/X, K, X/Ka (after ICRF3) denote the radio frequency bands used in VLBI.

3

The robust scatter estimate (RSE) is defined as [ 22erf1(4/5) ]10.390152$\eqalign{& {\left[ {2\sqrt 2 {\rm{er}}{{\rm{f}}^{ - 1}}\left( {4/5} \right)} \right]^{ - 1}} \approx 0.390152 \cr & X_\Delta ^2\quad X_\Delta ^2\quad \left( {{{\bar \alpha }_i},{{\bar \delta }_i}} \right)\quad \sqrt {\ln 4} \simeq 1.774\quad \hat x\quad {{\hat x}_0}\quad {X_{0.5}}\left( {{{\hat x}_0}} \right)\quad {X_i}\left( {{{\hat x}_0}} \right) \le \kappa {X_{0.5}}\left( {{{\hat x}_0}} \right)\quad {{\hat x}_1}\quad {X_{0.5}}\left( {{{\hat x}_1}} \right)\quad {{\hat x}_k} \cr} $ times the difference between the 90th and 10th percentiles of the distribution of the variable. For a Gaussian distribution it equals the standard deviation. The RSE is widely used in Gaia as a standardised, robust measure of dispersion.

4

The quantity XΔ2$X_\Delta ^2$ is important also for another reason: as explained in Appendix E we consider the optical and radio frames to be aligned if the sum of XΔ2$X_\Delta ^2$ over the ‘good’ sources is minimal; cf. Eq. (E.18). The equivalent statistic was introduced by Mignard et al. (2016) already for the analysis of the reference frame in Gaia DR1; see their Eqs. (1) and (3)(5).

6

Table gaiadr3.gaia_crf3_xm will be published in the Gaia Archive together with Gaia DR3. Alternatively, the table can be downloaded from https://www.cosmos.esa.int/web/gaia/gaia-crf3-cross-match-table and uploaded to the Gaia Archive as a user table. In that case the table name in the ADQL query should be modified accordingly.

7

The queries discussed in this Appendix are costly and result in very big tables. To cope with the technical limitations, one can either split the results in chunks by using some conditions on random_index (the random index assigned for each source in the Gaia Archive) or execute the equivalent queries offline using a local copy of the whole Gaia EDR3 catalogue.

All Tables

Table 1

Statistics of the cross-matches of Gaia EDR3 to external QSO catalogues used in the construction of Gaia-CRF3.

Table 2

Characteristics of the Gaia-CRF3 sources.

All Figures

thumbnail Fig. 1

Relative frequencies of Xμ2$X_{\rm{\mu }}^2$ for the filtered matches to the catalogues in Table 1. The catalogues are identified by the codes given in column 6 of the table, with the eight larger catalogues in the upper panel and the smaller catalogues in the lower panel. The four radio catalogues (ICRF3 SIX, ICRF3 K, ICRF X/Ka, and OCARS) are combined under ‘radio’, and the three blasar catalogues (Roma-BZCAT, 2WHSP, and ALMA calibrators) are combined under ‘blasars’.

In the text
thumbnail Fig. 2

Relative frequencies of Xμ2$X_{\rm{\mu }}^2$ for the final selection of Gaia-CRF3 sources, and for the filtered matches to C75, Milliquas v6.5, and Gaia-unWISE that were dropped in the final selection.

In the text
thumbnail Fig. 3

Distributions of the normalised parallaxes and proper motion components for the Gaia-CRF3 sources with five-parameter (blue) and six-parameter (magenta) solutions. The turquoise curves show the corresponding best-fit Gaussian distributions.

In the text
thumbnail Fig. 4

Sky distribution of the Gaia-CRF3 sources. The plot shows the density of sources per square degree computed from the source counts per pixel using HEALPix of level 6 (pixel size ≃0.84 deg2). The magnitude distribution and further properties of the sources are shown on Fig. 6. This full-sky map uses a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

In the text
thumbnail Fig. 5

Distribution of the Gaia-CRF3 sources in galactic coordinates. There are only 195 sources in the avoidance zone | sin b| < 0.1. The distribution is shown for the whole Gaia-CRF3 and separately for the sources with five- and six-parameter astrometric solutions.

In the text
thumbnail Fig. 6

Histograms of some important characteristics of the Gaia-CRF3 sources with five-parameter solutions (blue) and six-parameter solutions (magenta). Left to right: G magnitudes, colour index GBPGRP, and the astrometric quality indicator RUWE.

In the text
thumbnail Fig. 7

Median G magnitude (top) and median colour index GBPGRP (bottom) of the Gaia-CRF3 sources. These maps use a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

In the text
thumbnail Fig. 8

Positional uncertainty and source density distribution in Gaia-CRF3. Left: histogram of the positional uncertainty σpos,max for Gaia-CRF3 sources with five-parameter solutions (blue) and six-parameter solutions (magenta). Middle: scatter plot of σpos,max versus the G magnitude. The dashed curve is the smoothed median uncertainty versus magnitude. The green points are the sources matched to ICRF3 S/X. Right: fraction of sky covered by a given minimum density of Gaia-CRF3 sources at different levels of positional uncertainty (see text for further explanation).

In the text
thumbnail Fig. 9

Sky distributions of the parallax ϖ and the components of proper motion µα* and µδ. Median values over the pixels of HEALPix level 6 (pixel size ≃0.84 deg2) are shown. These maps use a Hammer–Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

In the text
thumbnail Fig. 10

Corrected mean toroidal and spheroidal powers C^t$\hat C_\ell ^t$ and C^s$\hat C_\ell ^s$ of the systematics in proper motions of the Gaia-CRF3 sources with five-parameter solutions. The error bars show the ±1 standard deviation of each point, calculated on the assumption that the 2 + 1 contributing VSH coefficients have Gaussian errors.

In the text
thumbnail Fig. 11

Corrected cumulative angular power spectrum of the systematic in proper motions of the Gaia-CRF3 sources with five-parameter solutions.

In the text
thumbnail Fig. 12

Distribution of the ICRF3 sources in Gaia-CRF3 in Galactic coordinates. Sources from ICRF3 S/X are shown by red dots, from ICRF3 K by blue crosses, and from ICRF3 X/Ka by turquoise squares. This map uses a Hammer–Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

In the text
thumbnail Fig. 13

Distribution of magnitudes and formal positional uncertainties for the sources common to Gaia-CRF3 and ICRF3 S/X. The full set of 3142 sources is shown by the grey histograms, and the 259 defining sources by the red histograms. Left: distribution of magnitudes G. Middle: distribution of positional uncertainties in Gaia EDR3. There are 126 sources with σpos,maxEDR3>1$\sigma _{{\rm{pos,max}}}^{{\rm{EDR3}}} > 1$ mas, of which eight are defining sources in ICRF3. Right: distribution of positional uncertainties in ICRF3 S/X. There are 155 sources with σpos,maxICRF3>1$\sigma _{{\rm{pos,max}}}^{{\rm{ICRF3}}} > 1$ mas, of which one is a defining source.

In the text
thumbnail Fig. 14

Positional differences in α and δ between Gaia-CRF3 and ICRF3 S/X for the 3142 common sources. The 259 defining sources in ICRF3 are shown as filled red circles, other sources as black crosses. Top: differences in mas. Errors bars representing the quadratically combined uncertainties in the two catalogues are shown only for the defining sources. Bottom: same differences divided by the quadratically combined uncertainties in the two catalogues. No error bars are shown in this plot, as they would all be ±1 unit.

In the text
thumbnail Fig. 15

Histograms of the separations Δ between Gaia-CRF3 and ICRF3 S/X for the full set of 3142 common sources (blue; the median is 0.516 mas) and the 259 defining sources (red; the median is 0.292 mas).

In the text
thumbnail Fig. 16

Positional differences between Gaia-CRF3 and ICRF3 S/X versus a (top) and sin δ (bottom). Defining sources in ICRF3 are shown as filled red circles, with errors bars representing the combined positional uncertainties in the two catalogues; other sources are shown as black crosses.

In the text
thumbnail Fig. 17

Histograms of the normalised separations XΔ2$X_\Delta ^2$ between Gaia-CRF3 and ICRF3 S/X for the full set of 3142 common sources (blue) and the 259 defining sources (red). The dashed curves are the exponential functions described in the text. A total of 195 sources have XΔ > 10 and fall outside this plot (10 of them are defining sources in ICRF3).

In the text
thumbnail Fig. B.1

Distribution of the sources in the external catalogues listed in Table 1. These maps use a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left.

In the text
thumbnail Fig. D.1

Distribution of the confusion sources in Gaia EDR3. These maps use a Hammer-Aitoff projection in galactic coordinates, with l = b = 0 at the centre, north up, and l increasing from right to left. The black lines bound the Galactic zone | sin b | < 0.1 where sources are normally excluded by Eq. (4). Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions.

In the text
thumbnail Fig. D.2

Distribution of confusion sources in (from left to right) galactic longitude, sine of galactic latitude, and magnitude. Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions. The selections indicated in the legend are further explained in the text.

In the text
thumbnail Fig. D.3

Histograms of the normalised parallaxes and proper motion components for the Galactic confusion sources. Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions. The different colours show the selections indicated in the legend.

In the text
thumbnail Fig. D.4

Histograms of the normalised parallaxes and proper motion components for the confusion sources in the LMC and SMC areas. Top: sources with five-parameter solutions. Bottom: sources with six-parameter solutions.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.