Gaia Data Release 3 Cross-match of Gaia sources with variable objects from the literature

Context. With the ever-increasing data volumes of astronomical surveys, automated methods are essential. Objects of known classes from the literature are necessary for training supervised machine learning algorithms, as well as for the verification and validation of their results. Aims. The primary goal of this work is to provide a comprehensive data set of known variable objects from the literature cross-matched with Gaia DR3 sources, including a large number of both variability types and representatives, in order to cover as much as possible the sky regions and magnitude ranges relevant to each class. In addition, non-variable objects from selected surveys are targeted to probe their variability in Gaia and their possible use as standards. This data set can be the base for a training set applicable in variability detection, classification, and validation. Methods. A statistical method that employed both astrometry (position and proper motion) and photometry (mean magnitude) was applied to selected literature catalogues in order to identify the correct counterparts of the known objects in the Gaia data. The cross-match strategy was adapted to the properties of each catalogue and the verification of results excluded dubious matches. Results. Our catalogue gathers 7 841 723 Gaia sources, among which are 1.2 million non-variable objects and 1.7 million galaxies, in addition to 4.9 million variable sources representing over 100 variability (sub)types. Conclusions. This data set served the requirements of Gaia's variability pipeline for its third data release (DR3), from classifier training to result validation, and it is expected to be a useful resource for the scientific community that is interested in the analysis of variability in the Gaia data and other surveys.


Introduction
Variable stars have proven to be extremely useful tools to investigate a diverse set of astronomical problems. Their variability properties allowed us to measure physical quantities such as distances [...] our knowledge of the early universe. Thus, since the early days, scientists have registered and classified sources that appear to be variable. Over the years, the number of known variables and the number of (sub)types of variability have increased significantly. The GCVS (Samus' et al. 2017), started in 1946, has been one of the first catalogues of variable stars. The American Association of Variable Star Observers (AAVSO) maintains the International Variable Star Index (VSX; Watson et al. 2006), which in its latest version contains more than 2.1 million objects. The advance of modern astronomy allowed the identification of variable sources by large-scale surveys. The All-Sky Automated Survey (ASAS; Pojmanski 2002), the All-Sky Automated Survey for Supernovae (ASAS-SN; Shappee et al. 2014; Jayasinghe et al. 2018, 2019a,b), the Optical Gravitational Lensing Experiment (OGLE; Udalski et al. 2015), the Catalina Real-Time Transient Survey (Drake et al. 2014b), the Zwicky Transient Facility (ZTF; Graham et al. 2019), and Gaia (Clementini et al. 2016; Rimoldini et al. 2019a; Clementini et al. 2019) are only some of the projects that have significantly increased the number of known variables.
e-mail: panagiotis.gavras@esa.int
The Gaia consortium released 3194 variable stars of 2 variability types in its first data release (DR1; Eyer et al. 2017), which increased to 550 737 variables and 6 types in DR2 (Holl et al. 2018), and to 13 million and 30 (sub)types including galaxies in DR3 (Eyer et al. 2022). Moreover, it is foreseen that the number of variables will continue to increase in DR4 by an order of magnitude. This abundance of data has made automated methods for the detection and classification of sources imperative. Thus, most of the modern all-sky surveys use some type of machine learning method for the identification of variables. Supervised machine learning methods use a labelled set of known variables (usually from the literature) in order to train classifiers. The creation of an unbiased training set is a challenging task. It needs to have a large number of sources adequately covering all variability classes targeted by the project, in order to be able to select training sources that do not suffer from selection biases, e.g., in the distribution in the sky or by incomplete coverage of magnitudes. It may also include contaminants, which in the case of variable sources can be non-variable objects or other types of objects that exhibit artificial variability. Details on artificial variability in Gaia can be found in Holl et al. (2022).
Producing an optical catalogue by cross-matching many input catalogues, with data in the radio, mid- and near-infrared, optical, and X-ray bands, is not a straightforward task. Each catalogue has its own unique properties, such as astrometric and photometric quality and observational bands, and its own requirements for the propagation of proper motion (when available), depending on object distance and observational time difference (i.e. different survey epoch). These need to be fine-tuned, and some fraction of mismatches becomes inevitable. In the case of Gaia, a cross-match with external catalogues was provided in all data releases (e.g., see Marrese et al. 2019 and online documentation), but its focus was not variable objects, leaving the vast majority of the known variables unmatched.
The variability processing of Gaia employed data sets from the literature to train its classifiers. Cross-match techniques varied in each data release, but their results were not published before. DR1 was limited to two variability types and a specific region in the sky (Eyer et al. 2017), for which 7 literature catalogues were cross-matched with Gaia using a random forest classifier (Rimoldini et al. 2019b). In DR2, astrometry was combined with transformed photometry and time series features to create a multi-dimensional distance, which was used to match 70 catalogues from the literature (Rimoldini et al. 2019a, online documentation) with Gaia sources. The supervised machine learning classification and special variability detection in the third data release of Gaia yielded ∼10.5 million variable sources in 24 different classes, which required a larger and more diverse training data set. The base of this training set is our cross-match catalogue. In this first publication of the cross-match catalogue, we cross-matched the sources found in a selection of 152 catalogues with Gaia results. Our catalogue contains 7.8 million unique objects.
This paper presents the method, the results, and the caveats of the cross-match between the 152 catalogues and Gaia DR3 sources. We describe the creation of this data set in Sect. 2. Section 3 presents the properties of the produced catalogue. We discuss the properties of selected variability types in Sect. 4, indicating the overall quality of the catalogue. Section 5 shows an effort to identify stars that are the least variable and conclusions are in Sect. 6. The cross-match catalogue is made available exclusively online through the Centre de Données astronomiques de Strasbourg website.

Input catalogues selection
There are many interesting catalogues that we could select for this work. However, as the idea was to create a large data set with many variability types, we used well-known diverse catalogues that contain various variability types. We also selected smaller catalogues of objects of particular interest or of rare variability types. Finally, we assembled a list of 152 different input catalogues. Some of these were compiled and used internally by Gaia Data Processing and Analysis Consortium (DPAC) members.
To make each catalogue and its basic properties easy to identify, we constructed and used an informative catalogue label. This label is derived from the mission, survey, or compilation name, the type of targets that the catalogue contains, the name of the first author (or the person who compiled it), and the date of publication. We use this label throughout the rest of the paper.
All input catalogues are listed alphabetically in Table 1: the first column provides the catalogue label, the second column presents the number of stars finally cross-matched with Gaia sources, and the last column lists the references for each catalogue. The selection of the literature catalogues is limited to those published before 2021, with the only exception of EROSITA_AGN_LIU_2021 (Liu et al. 2021).
In addition to variable sources, the cross-match catalogue includes a limited number of non-varying sources according to surveys with similar precision to Gaia (named constants hereafter), for use e.g. in variability detection or to capture objects with insufficient or corrupt variability. The HIPPARCOS_VAR_ESA_1997 (ESA 1997) and SDSS_CST_IVEZIC_2007 (Ivezić et al. 2007) catalogues are the main providers of non-varying objects, but the former lacks faint objects and the latter misses bright sources and is limited to the SDSS Stripe 82 footprint. Given the gap in magnitude (12 < G < 14) between these two catalogues and the non-representative distribution in the sky for faint objects, two new catalogues of constant stars were created using data from TESS (Ricker et al. 2015), to fill the magnitude gap, and ZTF (Masci et al. 2019), for improved sky distribution. This effort is not the main focus of this paper and is described in Sect. 5.

The pipeline
The pipeline we built to identify the correct counterpart of an input source is divided into two major parts. The first part performs a positional cross-match of each source in a literature catalogue with the Gaia DR3 sources, and the second part cleans the results of the first part from false identifications.

Positional cross-match
The cross-match of an input catalogue with the sources of Gaia DR3 was performed in the database deployed at the data processing centre of Geneva (DPCG), at the observatory of the same name. This process was divided into two steps to facilitate processing. The first step was a simple cone search for Gaia sources, with a radius of typically 1′, around the coordinates of each source in an input catalogue. The large radius was used to cover the positional uncertainties and most of the proper motion effects, while keeping the computational load low and speeding up the cross-match. The second step was a cone search with a radius of 5″, applying epoch propagation of the positions using the relevant function of the Q3C library (Koposov & Bartunov 2006) and Gaia proper motions. In this way, the cross-match was fine-tuned on a fraction of sources instead of the ∼1.8 billion sources in Gaia DR3.
The radius of the cross-match was adjusted in some catalogues to larger or smaller values. For example, a larger value was used for HIPPARCOS_VAR_ESA_1997 (ESA 1997) to take into account the proper motion effect, which is more evident due to its bright magnitude limit (including mostly nearby stars) and the large difference in time of observations. For the majority of catalogues, we were able to find the date of observations and perform epoch propagation of the positions. The epoch used was the mean epoch of observations. However, epoch propagation was not applied to catalogues that were compilations of papers, nor to a few others for which we were unable to identify the date of observations.
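The two-step search can be illustrated with a minimal sketch. It is not the Q3C-based implementation used in the database: the function names, the dictionary layout, the first-order propagation, and the default radii (60 and 5 arcsec, i.e. assuming the quoted radii are in arcminutes and arcseconds, respectively) are assumptions made for illustration. The reference epoch of the Gaia DR3 astrometry is 2016.0.

```python
import math

GAIA_DR3_EPOCH = 2016.0  # reference epoch of Gaia DR3 astrometry (Julian years)
MAS_PER_DEG = 3.6e6      # milliarcseconds per degree


def propagate(ra_deg, dec_deg, pmra_masyr, pmdec_masyr, dt_yr):
    """First-order shift of a position by its proper motion over dt_yr years.

    pmra_masyr is taken to be mu_alpha* = mu_alpha * cos(dec), the Gaia convention.
    Parallax and radial velocity are ignored in this sketch.
    """
    dec_new = dec_deg + pmdec_masyr * dt_yr / MAS_PER_DEG
    ra_new = ra_deg + pmra_masyr * dt_yr / MAS_PER_DEG / math.cos(math.radians(dec_deg))
    return ra_new % 360.0, dec_new


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = (math.radians(x) for x in (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def two_step_match(target, gaia_sources, coarse_arcsec=60.0, fine_arcsec=5.0):
    """Return [(source_id, separation_arcsec), ...] sorted by separation."""
    dt = target["epoch"] - GAIA_DR3_EPOCH
    matches = []
    for src in gaia_sources:
        # Step 1: wide cone at the catalogued Gaia position (no propagation).
        if separation_arcsec(target["ra"], target["dec"], src["ra"], src["dec"]) > coarse_arcsec:
            continue
        # Step 2: narrow cone after moving the Gaia position to the target epoch.
        ra_t, dec_t = propagate(src["ra"], src["dec"], src["pmra"], src["pmdec"], dt)
        sep = separation_arcsec(target["ra"], target["dec"], ra_t, dec_t)
        if sep <= fine_arcsec:
            matches.append((src["source_id"], sep))
    return sorted(matches, key=lambda m: m[1])
```

In this toy setting, a star with a proper motion of 1″ per year observed 16 years before the Gaia epoch lies 16″ from its Gaia position: it passes the wide first cone, and only after propagation does it fall within the narrow second cone.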

Cleaning of false identifications
The positional cross-match may return a large number of candidate counterparts, depending on the properties of the input catalogue. For the second part of the pipeline, to narrow down the many candidates, we selected matches using a synthetic distance metric ρ_synth that combines angular sky separation and photometric differences. In this metric, ∆θ denotes the angular distance and ∆mag is the magnitude difference between Gaia and a given survey. Medians and median absolute deviations (MAD) are computed on all neighbours within a radius of 5″ (or the adapted value used) of the targeted sources. The Gaia magnitudes used in this process come directly from the photometry and have not passed through the data cleaning process of the variability detection described in Eyer et al. (2022).
With respect to the DR2 approach, we reduced the complexity for DR3 (excluding time series features and photometric transformations) to favour the use of a robust and consistent ρ_synth for most catalogues, at the cost of a loss in precision of ∆mag when comparing stars of multiple spectral types in G versus other bands. For example, the different wavelength coverage of the OGLE I band with respect to Gaia G causes redder objects to be brighter in I than in G. When this is combined with a catalogue that includes both blue and red objects, as for eclipsing binaries, the uncorrected photometric comparison of main sequence stars and red giants may even form separate ∆mag clumps of valid counterparts. Although in general the selection of matches was conservatively biased towards the clump associated with the smallest ρ_synth, correct matches from secondary clumps could be recovered by sources overlapping with other catalogues (more specific to the group of missed matches, or more generic and thus with a larger MAD of ∆mag). Consequently, the completeness of the cross-match for a given catalogue can be larger than it appears from the simple association of sources with a catalogue.
The distance ρ_synth takes into account the angular distance of all sources in the table and the difference in magnitude between the input source and the Gaia source in the G band. Of course, most of the catalogues contain photometry in filters other than G and some are in multiple bands (in which case only one of the bands was used, typically the most similar to G or the most sampled one). For catalogues that are compilations of often many data sets, like the VSX, each source may have a different positional precision or use different filters, so the quality of the cross-match cleaning is degraded as the efficiency of a common ρ_synth is reduced. Moreover, a number of catalogues were cross-matched when the Gaia DR3 photometry was not yet available, so the DR2 photometry was used instead. With the above constraints, it is clear that the values of ρ_synth depend on each individual input catalogue and are not comparable between catalogues; therefore, a universal constraint on ρ_synth cannot be set.
Applying a threshold on ρ_synth reduces the number of multiple identifications; however, some remained, so a final cleaning was applied. The selection of the best match among the matches for the same target used the lowest value of ρ_synth or of the angular distance, depending on the catalogue. Finally, the sources flagged as astrometric duplicated sources were rejected (these sources are not published in the Gaia archive either).
Figures 1-3 present examples with data taken from the processing step 2 of the CATALINA_VAR_DRAKE_2017 (Catalina Surveys Southern periodic variable star catalogue, Drake et al. 2017). This catalogue contains 37 745 variable sources. After the end of the positional cross-match, we obtained ∼45 000 candidate counterparts. Figure 1 shows the distribution of the angular distance between all targets and their potential counterparts. The output of part 1, with a maximum angular distance of 5″, includes obvious mismatches, which are removed by setting an upper limit to ρ_synth. Figure 2 shows the distribution of ρ_synth. The abscissa is in log-scale for clarity. This plot led to the selection of the cut-off value ρ_synth = 6. The difference in photometric magnitudes between Gaia and Catalina is shown in Fig. 3, where the Gaia magnitudes of the candidate Gaia matches are on the horizontal axis and the corresponding Catalina magnitudes are on the vertical axis. The Gaia sources selected by the ρ_synth < 6 constraint are coloured red, and ∼36 900 sources are left. The next step is to reduce any remaining multiple matches to a single one. In this example, fewer than 200 sources are multiple matches and we kept those with the lowest value of ρ_synth. The final cross-matched catalogue for the Catalina Surveys Southern periodic variable star catalogue contains 36 584 sources.
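Since ρ_synth is built from medians and MADs of ∆θ and ∆mag, its catalogue dependence can be made concrete with a short sketch. The functional form used here (a quadrature sum of median-centred, MAD-scaled offsets) is an assumption adopted for illustration, as are the function names; it should not be read as the exact expression used in the pipeline.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation about the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def rho_synth(delta_theta, delta_mag):
    """Assumed form: quadrature sum of median-centred, MAD-scaled offsets.

    delta_theta: angular distances of all candidate pairs of one catalogue.
    delta_mag:   magnitude differences (Gaia minus survey) of the same pairs.
    """
    med_t, mad_t = median(delta_theta), mad(delta_theta)
    med_m, mad_m = median(delta_mag), mad(delta_mag)
    if mad_t == 0 or mad_m == 0:
        raise ValueError("MAD is zero; the normalisation is undefined")
    return [math.hypot((t - med_t) / mad_t, (m - med_m) / mad_m)
            for t, m in zip(delta_theta, delta_mag)]


# Toy candidates of one catalogue: four plausible pairs and one obvious mismatch.
theta = [0.1, 0.2, 0.3, 0.4, 4.0]   # arcsec
dmag = [0.0, 0.1, -0.1, 0.05, 3.0]  # mag
print(rho_synth(theta, dmag))
```

Because the medians and MADs are computed per catalogue, the same pair of offsets yields different values of the metric in catalogues with different astrometric or photometric scatter, which is why no universal threshold applies.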
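The cleaning sequence (threshold, best match per target, rejection of duplicates) can be sketched as follows. The dictionary keys, including the `duplicated` stand-in for the astrometric duplicate flag, are hypothetical names chosen for this illustration, and the sketch covers only the case where the lowest ρ_synth (rather than the smallest angular distance) decides.

```python
def select_counterparts(candidates, rho_max):
    """Keep, per literature target, the candidate with the lowest rho below rho_max.

    candidates: iterable of dicts with keys 'target', 'source_id', 'rho' and
    'duplicated' (hypothetical stand-in for the astrometric duplicate flag).
    Returns a dict mapping each retained target to its Gaia source_id.
    """
    best = {}
    for cand in candidates:
        if cand["rho"] >= rho_max:
            continue  # obvious mismatch: beyond the catalogue-specific cut
        current = best.get(cand["target"])
        if current is None or cand["rho"] < current["rho"]:
            best[cand["target"]] = cand
    # Final step, following the order in the text: drop flagged duplicate sources.
    return {t: c["source_id"] for t, c in best.items() if not c["duplicated"]}


toy = [
    {"target": "A", "source_id": 11, "rho": 1.2, "duplicated": False},
    {"target": "A", "source_id": 12, "rho": 0.4, "duplicated": False},
    {"target": "B", "source_id": 21, "rho": 7.5, "duplicated": False},
    {"target": "C", "source_id": 31, "rho": 2.0, "duplicated": True},
]
print(select_counterparts(toy, rho_max=6.0))
```

With the cut-off of 6 from the Catalina example, target A keeps its closer candidate, B is left without a counterpart, and C is dropped because its only match is a flagged duplicate.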

Assembly of the final catalogue
After cross-matching all of the individual catalogues, we merged the per-catalogue results to form a single cross-match data set. It is expected that many of the input catalogues overlap and that some of these sources may appear in several input catalogues with different information, such as name, variability type, or variability period. In order to guide users towards the most likely class and period, we defined an approximate catalogue ranking list. This was not a perfect solution, as catalogue classifications may be more accurate for some types of objects than for others. Catalogues that did not overlap with others had no reason to compete in ranking, so their relative position is not meaningful and could be anywhere in the list. Table 2 shows the rank-ordered list of literature catalogues, where generally the higher a catalogue is in the list, the more accurate it is.
Source matches of multiple catalogues to the same Gaia source identifiers were merged. The resulting cross-match catalogue contains one Gaia source per row and information from all relevant catalogues, sorted according to their rank. For convenience, information from the highest ranked catalogue, for a given source, is replicated in single-element 'primary' fields (like primary_var_type and primary_period, see Table 4).
During the assembly of the catalogue, we tried to homogenise the labels of the variability classes used in the literature. The often different literature class labels for the same types were thus made homogeneous following the nomenclature used by the AAVSO, with a few exceptions (e.g., SARG, OSARG, GTTS, IMTTS). Some type labels were relabelled as 'OMIT' as primary class, because they were too generic, uncertain, or with insufficient variability characterisation, and thus should be omitted from training or completeness and purity estimates. There is a large number of literature catalogues for eclipsing binaries and different authors use different labels in their works. In order to homogenise the naming of eclipsing binaries, we grouped them into four subclasses: EA, EB, EW, and ECL, with the latter denoting the generic class when there is no further information or the subclass is uncertain. Table 3 shows the grouping of labels as defined in our catalogue. Information on the original labels from the literature was however preserved. As a special case, sources from the Gaia alerts have class labels set to OMIT if they were recorded after 28 May 2017 (Gaia DR3 observation time limit). There are 5676 such sources and 49% of them were reported only by Gaia alerts; the rest were also included in other input catalogues. In some catalogues, sources could be associated with multiple types, in which case OR was denoted by | and AND by +. We respected the source classification given in the original catalogues, thus class labels may refer to any level of a possible hierarchy. For example, a source may be classified as AGN, QSO, BLAZAR, or BLLAC, without implying that a subtype (e.g., BLLAC) does not belong to its superclass (like BLAZAR or AGN).
After merging the information of overlapping catalogues, the final cross-match catalogue contains 7 841 723 unique Gaia DR3 source ids. A subset of catalogues with particularly low contamination rates is indicated by a boolean column selection and includes 6 697 530 sources. Sources with class 'OMIT' are filtered out from the selection. The properties of the final catalogue are discussed in the next section.
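The merge by Gaia source identifier and catalogue rank can be sketched as below. The catalogue names CAT_A and CAT_B, their ranks, the toy types and periods, and the output column `catalogues` are hypothetical; `var_types`, `primary_var_type`, and `primary_period` follow the column names mentioned in the text, and the semicolon separator is assumed for the multi-valued fields.

```python
def merge_by_rank(matches, rank):
    """Merge per-catalogue matches into one row per Gaia source.

    matches: list of dicts with keys 'source_id', 'catalogue', 'var_type', 'period'.
    rank:    dict mapping catalogue label to rank (0 = highest ranked).
    """
    grouped = {}
    for match in matches:
        grouped.setdefault(match["source_id"], []).append(match)

    rows = []
    for source_id in sorted(grouped):
        # Order the entries of this source from the highest to the lowest ranked catalogue.
        entries = sorted(grouped[source_id], key=lambda e: rank[e["catalogue"]])
        top = entries[0]
        rows.append({
            "source_id": source_id,
            "catalogues": ";".join(e["catalogue"] for e in entries),
            "var_types": ";".join(e["var_type"] for e in entries),
            "primary_var_type": top["var_type"],
            "primary_period": top["period"],
        })
    return rows


rank = {"CAT_A": 0, "CAT_B": 1}
matches = [
    {"source_id": 7, "catalogue": "CAT_B", "var_type": "ECL", "period": None},
    {"source_id": 7, "catalogue": "CAT_A", "var_type": "EA", "period": 2.5},
    {"source_id": 9, "catalogue": "CAT_B", "var_type": "RRAB", "period": 0.6},
]
for row in merge_by_rank(matches, rank):
    print(row)
```

Keeping the multi-valued fields in rank order means that the first element of each list always agrees with the corresponding primary field.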
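The labelling rules above lend themselves to a small sketch. The literature labels in `ECLIPSING_GROUPS` are hypothetical placeholders (the actual grouping is the one given in Table 3), the function names are invented for illustration, and the relative precedence of | and + is an assumption, since the text does not specify it.

```python
from datetime import date

# Hypothetical literature labels; the actual grouping is the one in Table 3.
ECLIPSING_GROUPS = {"ALGOL": "EA", "BETA_LYR": "EB", "W_UMA": "EW"}
DR3_TIME_LIMIT = date(2017, 5, 28)  # Gaia DR3 observation time limit quoted in the text


def homogenise_eclipsing(label):
    """Map a literature eclipsing-binary label to EA, EB, EW, or the generic ECL."""
    return ECLIPSING_GROUPS.get(label.strip().upper(), "ECL")


def parse_var_type(label):
    """Split 'A|B+C' into alternatives (OR), each a list of joint types (AND).

    Treating '|' as binding more loosely than '+' is an assumption of this sketch.
    """
    return [[part.strip() for part in alt.split("+")] for alt in label.split("|")]


def alert_label(label, recorded_on):
    """Gaia alerts recorded after the DR3 time limit get the class OMIT."""
    return "OMIT" if recorded_on > DR3_TIME_LIMIT else label


print(parse_var_type("EA|EB+DSCT"))
print(homogenise_eclipsing("w_uma"), homogenise_eclipsing("eclipsing"))
print(alert_label("CV", date(2018, 1, 1)))
```

A user who needs a single class per source could, for instance, keep only entries for which `parse_var_type` returns one alternative with one component.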

Caveats and exceptions in the pipeline
The method described above, using the statistics of each catalogue, has the advantage of automatically eliminating large numbers of outliers and provides a clean data set. However, as a statistical process, it may sometimes reject perfectly good candidates. For example, Gaia source_id 4040728046945051264 exists in both OGLE4_CEP_OGLE_2020 (Soszyński et al. 2020), as OGLE-BLG-T2CEP-0346, and COMP_VAR_VSX_2019 (Watson et al. 2006), with OID=33239. The angular distance of this source with respect to its Gaia counterpart is ∆θ = 0.89″ in both catalogues (VSX includes OGLE stars). Figure 4 shows that for OGLE4_CEP_OGLE_2020 the bulk of the counterpart sources lie within 0.3″, while in COMP_VAR_VSX_2019, which is a compilation of sources from various catalogues, it is close to 1″. Thus, due to different cuts, this source is eliminated from OGLE4_CEP_OGLE_2020 but not from COMP_VAR_VSX_2019.
The cross-match was purely astrometric (based only on the smallest ∆θ) in the following special cases: catalogues with highly non-uniform photometry (bands, methods, etc.), whose distribution of ρ_synth was not adequate to split matches from mismatches; catalogues whose photometry was biased by extreme variability (e.g., sampling only the peak brightness of cataclysmic variables); and very small catalogues for which a statistical procedure was not applicable.
Exceptionally, some catalogues that required no cross-match were included, such as DPAC internal catalogues with preassigned Gaia source_id and EROSITA_AGN_LIU_2021, for which the authors had already performed a cross-match with Gaia in Salvato et al. (2021) using methods optimised for X-ray data sets; their results were therefore used.

The cross-match catalogue
This section describes the catalogue and discusses its general properties. The catalogue is published online through the Centre de Données astronomiques de Strasbourg website.

Description of the catalogue
The cross-match catalogue contains in total 7 841 723 sources of various types (6 697 530 of them are flagged as selection=true).
Table 4 shows the available fields in the published catalogue and provides a short description. Columns with plural names may contain multiple values, separated by a semicolon, as a source may exist in multiple literature catalogues. Their order follows the ranking list. The fields whose names start with primary contain the information from the highest ranking catalogue in which a specific source exists. The primary_superclass field has been introduced in order to group smaller classes and facilitate the selection of generic types. Table 5 presents the available types in primary_superclass, the number of sources, and the classes assigned to each superclass. The assignment was performed based on the class in the highest ranking catalogue, which is given in primary_var_type. The var_types field contains the homogenised variability class, original_var_types the original variability type from the literature (i.e. not homogenised), and original_alt_var_types any alternative variability types provided in the literature. Despite our best effort at minimising mismatches, the cross-match catalogue may still associate sources with incorrect classifications, because of remaining mismatched sources or inaccurate classifications in the literature. No cleaning or corrections were performed with respect to the information from the literature. Thus, depending on purpose, users might need to verify or clean some objects of interest, especially if not using the selection flag.
The final product contains 112 different types of objects. Some of them are not variable, like constants (CST), generic white dwarfs (WD), non-variable DQ dwarfs (DQ, HOT DQ, WARM DQ), or galaxies which appear artificially as variable in Gaia (Holl et al. 2022). They are included because they might be relevant (depending on purpose) to differentiate genuine from spurious light variations. The full list of the 112 different types, alphabetically ordered, is presented in Table 6, together with the number of objects: the first column shows the variability class, the following two columns present the number of objects of this type in primary_var_type, and the last two columns refer to the number of unique sources that were classified in any input catalogue as the specific class (in var_types).
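As an illustration of how the multi-valued fields and the selection flag might be used, the following sketch parses a few toy rows. The CSV layout and all values are invented for this example; only the column names source_id, primary_var_type, var_types, and selection are taken from the text, and the format of the published files may differ.

```python
import csv
import io

# Toy rows with invented values; only the column names are taken from the text.
SAMPLE = """source_id,primary_var_type,var_types,selection
1,RRAB,RRAB;RR,true
2,ECL,ECL;EA,true
3,OMIT,OMIT,false
"""


def load_rows(handle):
    """Parse rows, splitting the plural (rank-ordered) column on ';'."""
    rows = []
    for row in csv.DictReader(handle):
        row["var_types"] = row["var_types"].split(";")
        row["selection"] = row["selection"].strip().lower() == "true"
        rows.append(row)
    return rows


def sources_of_type(rows, var_type, selection_only=True):
    """Source ids labelled var_type in any catalogue, optionally restricted to selection."""
    return [r["source_id"] for r in rows
            if var_type in r["var_types"] and (r["selection"] or not selection_only)]


rows = load_rows(io.StringIO(SAMPLE))
print(sources_of_type(rows, "EA"))
print(sources_of_type(rows, "OMIT", selection_only=False))
```

Searching var_types rather than primary_var_type returns every source that any input catalogue assigned to the class, which corresponds to the counts in the last two columns of Table 6.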

Properties of the catalogue
The sky distributions of the cross-match sources are presented in Figs. 5-7, which show all sources, only variable stars (without WD, CST, AGN, and GALAXY types), and only constant sources, respectively. The sky distributions for the extragalactic content are presented and discussed in Sect. 4.6. The Galactic centre, the Magellanic Clouds, the Kepler fields, and the SDSS Stripe 82 are prominent, as some literature catalogues are focused on those fields. Figure 8 shows the distribution of magnitudes of all sources (black) and of constants (blue), variable objects (red), and galaxies (green dashed). The galaxies appear at the fainter end of the catalogue, while variable and constant sources are distributed along the full magnitude range.

Quality of the cross-matched sources
The following subsections assess the quality of the cross-matched sources per variability class. The assessment is based on a visual inspection of the loci of the variable sources in the colour-absolute magnitude diagram (CaMD) with respect to a reference set defined by a combination of quality criteria. These criteria were applied to all ∼1.8 billion Gaia DR3 sources and the outcome was further reduced by sampling on their parallax. The result of this process was a set of 4.2 million sources with high astrometric and photometric quality. This reference set serves as background in the CaMDs that follow, in order to help the reader locate the areas where the different variability types are expected.
Considering the significantly lower number of sources per class in the cross-match catalogue, less strict constraints were applied in order to select the sources of the various classes. The sources within the Magellanic Clouds were excluded from the CaMDs. With these constraints, some rare types (like Black Hole X-ray Binaries (BHXB) and Small Amplitude Red Variables (SARV)) did not have sufficient representatives for the CaMD. In the following subsections, a short description of the properties of each class and a discussion of the quality of the cross-match are given. More information about the generic properties of each variability type can be found in the variable star type designations of the AAVSO VSX.

Pulsating Variables
The cross-match catalogue contains many different classes of pulsating stars. Results are discussed separately for pulsating white dwarfs, sub-dwarfs, BLAPs, long period variables, semi-regulars, Cepheids, δ Scuti, γ Doradus, RR Lyrae stars, and other types.

White dwarfs, sub-dwarfs, and blue large amplitude pulsators
There are ten different variability classes of variable white dwarfs (WD) and sub-dwarfs in the cross-match catalogue. Figure 11 shows the CaMD for these classes. As shown, several classes overlap or are different sub-classes of a larger class, like the ZZ Ceti stars (for a detailed review of pulsating white dwarfs, see Córsico et al. 2019). Class labels are defined as follows.
-HOT_DQV: These sources are variable DQ white dwarfs with C and H rich atmospheres. In the CaMD, it is clear that the majority of the stars are in the WD sequence below the area of V777 Herculis stars.
-ZZ_Ceti: For ZZ Ceti types there are 6 subtypes in the catalogue, three of them concerning ZZA (DAV). There are also a few ZZ Ceti stars with no further subclassification.
-ZZ: These are generic ZZ Ceti stars without a detailed class; they lie in the correct position in the CaMD.
-ZZA: ZZA (or DAV) are classical ZZ Ceti stars of DA spectral type with H atmospheres. They lie in the expected area in the CaMD of Fig. 11, but there are 4 ZZA that seem to be well beyond the ZZ Ceti location. These sources originate from the VSX, which is very useful because of its diversity, but its cross-match is prone to mismatches.
-HOT_ZZA: The only HOT-ZZA that survived the quality cuts for the CaMD is in the correct place with respect to ZZA and ZZB, as their effective temperature is in a similar range.
-ELM_ZZA: Extremely low mass (ELM) ZZA tend to have temperatures between 7 800 and 10 000 K; the difference between ELM-ZZA and ZZA is clear. The ELM-ZZA on the right is SDSS J184037.78+642312.3, the first identified ELM-ZZA (Hermes et al. 2012).
-V777HER: The V777 Herculis (or ZZB, DBV) stars have He-rich atmospheres and their periods range between 100 and 1400 s (Bognár et al. 2014). They are well defined in the CaMD, grouping in the WD sequence between the warmer GWVIR and the cooler ZZA.
-GWVIR: GW Virginis (or ZZO, DOV, PG1159) stars are a subtype of ZZ Ceti with absorption lines of HeII and CIV, and they are the hottest known type of pulsating WD and pre-white dwarfs. The population in our catalogue is well defined. There are some sources off the white dwarf sequence which lie closer to the horizontal branch.
-Sub-dwarfs: The cross-match catalogue contains two classes of sub-dwarf B stars: V361 Hya and V1093 Her. The two types are concentrated as expected in the extreme horizontal branch. Some of them (mostly V361 Hya stars) can be redder than the main clump, but they follow the blue horizontal branch (BHB). Most V361 Hya stars are hotter (with effective temperatures of 28 000-35 000 K) than V1093 Her stars (23 000-30 000 K), so the two populations are not distinct and overlap as predicted by their temperature ranges (Heber 2016).
-BLAP: Blue large amplitude pulsators (BLAPs) have temperatures as hot as sub-dwarfs but with larger amplitudes (Pietrukowicz et al. 2017). In our catalogue, only two survived the astrometric cuts and they lie in the horizontal branch.

Long period and semi-regular variables
The result of this work contains several classes of long period variables. As for white dwarfs, several classes overlap or are subclasses of a generic one. Figures 12 and 13 present the CaMDs for long period and semi-regular variables, respectively.
-LPV: Long period variables include sources from surveys or catalogues that did not subclassify them. They are generally in the expected location in the CaMD, among the red giants; however, ∼3% of them are found in the main sequence. Some of them are due to literature misclassifications (with periods of less than a day), others might be mismatched.
-SARG: They are small amplitude red giants pulsating with periods from 10 to 100 days, a large fraction of them with long secondary periods and lying in the RGB or AGB branches. Most of them are well defined in Fig. 12, but there is a minority that is too blue or falls in the main sequence.
-OSARG: They are SARGs from OGLE; their location in the red giant branch has very little contamination.
-LSP: Long secondary period variables are luminous red giant stars which have a secondary period an order of magnitude longer than their primary one (Wood et al. 1999). One third of LPVs exhibit this type of behaviour (Soszyński 2007) and their periods range from 200 to 1500 days. In Fig. 12, they lie in a well defined expected area overlapping with other LPVs, although some outliers extend to the main sequence.
Semi-regular variables are generally giants or supergiants whose periods vary irregularly, and some of them even show intervals of constancy. Figure 13 presents the CaMD for such classes.
-SR: Semi-regular variables are giants or supergiants of late type with no strict periodicity. Most of the ones that are in the main sequence are imported from ZTF_Periodic_Variables (Chen et al. 2020), likely due to misclassifications rather than mismatches, as several of them were verified to have the same periods in the Gaia counterparts.
-SRA and SRB: Late type giants, semi-regular variables. SRA stars tend to have periods of 35-1200 days, while the SRB stars are more irregular, with cycles of 20-2300 days and also time intervals that show no variability. These classes are well defined in the CaMD, with only a few outliers, although they overlap with the other SR types. This is justified, as typically they are all of the same spectral type.
-SRC: This subclass consists of late type supergiants with periods from 30 to thousands of days. In the CaMD, they occupy a well defined area above the SRA and SRB stars.
-SRD: They are giants and supergiants of types earlier than SRA, SRB, and SRC, with variability periods from 40 to 1100 days. In the CaMD, they are close to but separate from the other subclasses, towards earlier spectral types.
-SRS: They are red giants with shorter periods than the other SR types, varying from a few days up to a month. This class is defined in the same area as the other SR stars in the CaMD, but it also appears to have several contaminants. All of the SRS stars originate from the VSX.
-PPN: Protoplanetary nebulae with yellow supergiant post-AGB stars, exhibiting variability that resembles the SRD variables. The few that survived the quality cuts are found in reasonable places in the CaMD.

RR Lyrae stars
Many input catalogues contain RR Lyrae stars, allowing us to construct a significant sample of this type of variable star and of its subclasses. They are A to F type stars showing periodicity of less than a day and amplitudes that can reach 2 magnitudes in the optical. The RR Lyrae variables in the cross-match are divided into four subclasses and a generic one for the catalogues that do not provide detailed classification. Figure 14 shows that the majority of the sources fall in the expected place, but a significant fraction does not. Many sources are located in the lower part of the main sequence and some are between the main and white dwarf sequences. Visual inspection shows that some of them are mismatched sources. In dense regions, two Gaia sources may have a similar angular distance to the literature target, and the most compatible magnitude may belong to the incorrect counterpart. The user is encouraged to verify sources of these classes, especially if not filtering input catalogues.
-RR: This is the generic class of RR Lyrae stars from catalogues that do not provide subclasses. The majority of those stars are from PS1_RRL_SESAR_2017 (Sesar et al. 2017), which contains several problematic cases of sources lying on the white dwarf sequence or between the white dwarf and the main sequence, as expected because no selection based on score was applied to those candidates (unlike in PS1_RRL_SESAR_SELECTION_2017).
-RRAB: This is the most common RR Lyrae class, with asymmetric light curves and periods between 0.3 and 1 day. The majority of them lie in the expected place in the CaMD, with a few outliers towards the white dwarf sequence.
-RRC: They have symmetric and sinusoidal light curves and shorter periods than RRAB stars. In the CaMD, most of them have G_BP − G_RP ∼ 0.5 mag, but some extend onto the main sequence.
-RRD: They are double mode pulsators, which occupy a well defined region at G_BP − G_RP ∼ 0.5 mag, but there are also some very red outliers.
-ARRD: They are anomalous RRD, double-mode pulsators that are similar to RRD but with a different ratio of periods. Very few ARRD survived the quality cuts for the CaMD and they are scattered towards the red part of the main sequence.

Cepheids
In our catalogue, we have included several types of Cepheids and the relevant CaMD is presented in Fig. 15.
-CEP: Cepheids are radially pulsating giants and supergiants with a large range of periodicities from ∼1 to more than 100 days. Their spectral type varies from F to K depending on their phase. This class label includes all types of Cepheids from catalogues that do not provide detailed classification. Only a small number of sources of this class is included in the CaMD; half of them lie in a reasonable location, while the others fall on the main sequence.
-DSCT: δ Scuti are pulsating variables similar to δ Cepheids. They follow the same period-luminosity relation, but they have shorter periods (from 0.01 to 0.2 days). Their brightness varies with amplitudes between 0.003 and 0.9 magnitudes. Their spectral type is between A0 and F5. Usually, δ Scuti stars lie on the instability strip.

Other pulsating variables
Additional pulsating types are presented in this subsection.
-ACYG: α Cygni stars are B-A supergiants exhibiting non-radial pulsations with a large range of periods. Their typical amplitude of photometric variability is about 0.1 magnitudes. In Fig. 18, they may spread more than anticipated for A or B type stars.
-BCEP: β Cephei stars are main sequence stars of O8-B6 spectral type exhibiting photometric and radial velocity variability with short periods between 0.1 and 0.6 days. A large number of BCEP stars was cross-matched, with the majority originating from KEPLER_VAR_DEBOSSCHER_2011 without applying probability thresholds, so the vast majority are misclassified sources and none of the ones in the CaMD lies in the expected region. If the selection flag is not active, we encourage users to reject unfiltered BCEP stars with primary_var_type originating from KEPLER_VAR_DEBOSSCHER_2011 and also from ASAS_VAR_RICHARDS_2012, which fall on the RGB.
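The recommendation on unfiltered BCEP stars can be written as a row filter. This is a minimal sketch under stated assumptions: the record is a plain dictionary, and the key names `selection` and `primary_catalogue` are hypothetical stand-ins for the selection flag and for the catalogue from which primary_var_type originates; only primary_var_type and the two catalogue labels are taken from the text.

```python
# Catalogue labels named in the text as sources of unreliable BCEP labels.
SUSPECT_BCEP_CATALOGUES = {
    "KEPLER_VAR_DEBOSSCHER_2011",
    "ASAS_VAR_RICHARDS_2012",
}


def keep_source(row):
    """Decide whether to keep one catalogue record.

    `row` is a dict; the keys 'selection' and 'primary_catalogue' are
    hypothetical names chosen for this illustration.
    """
    if row["selection"]:
        return True  # the source passed the catalogue-level selection
    is_suspect_bcep = (
        row["primary_var_type"] == "BCEP"
        and row["primary_catalogue"] in SUSPECT_BCEP_CATALOGUES
    )
    return not is_suspect_bcep


example = {
    "primary_var_type": "BCEP",
    "primary_catalogue": "KEPLER_VAR_DEBOSSCHER_2011",
    "selection": False,
}
print(keep_source(example))
```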

Cataclysmic variables
A few types of cataclysmic variables are included in the cross-match. The most important ones are shown in Fig. 19 and discussed in this section.
-PCEB: Pre-Cataclysmic variables or Post-Common Envelope binaries are binaries of a white dwarf and a main sequence star or a brown dwarf. In Fig. 19, most of them lie on the extreme horizontal branch.
-CV: Generic type of cataclysmic variables including novae and dwarf novae, which typically fall between the main sequence and the white dwarf sequence in the CaMD.
-ZAND: Z Andromedae stars include inhomogeneous types of symbiotic binary variable stars composed of a giant and a white dwarf. They display irregular variability with large amplitudes. Among the few cases that are present in Fig. 19, the majority lie on the AGB.
-SYST: Symbiotic stars, which, like ZAND, form a heterogeneous group of objects, usually with a red giant or AGB star and a white dwarf. Most of them fall on the AGB in the CaMD.
Although not a system of binary stars, stars with transiting exoplanets (EP) are added to Fig. 20, where they lie on the main sequence.

Eruptive
The compilation of variables from the literature contains 18 eruptive variability types. Many of them are different subtypes of T Tauri stars (TTS), which are plotted in Fig. 22(a) separately from other eruptive types in Fig. 22(b). In both plots there are stars fainter and bluer than the expected pre-main sequence locus. This is likely due to the circumstellar disks of these stars being seen at high inclination: the photosphere is strongly extincted, and the optical colours are bluer because of the light scattered by the disk atmosphere. A short discussion of the properties of all available eruptive stars follows.
-TTS: T Tauri is the generic class of pre-main sequence objects. They are generally low to intermediate mass stars in a stage between protostars and low-mass main sequence stars. In Fig. 22(a), they occupy the expected region, however there is a small fraction bluer than the main sequence or falling on the main sequence. Most of these stars are in the Orion Molecular Cloud. There are several TTS subclasses in the cross-match catalogue, depending on their spectra (Herbst et al. 1994; Herbst & Shevchenko 1999).
-CTTS: Classical TTS are well-studied stars. They are young accreting stars in the late stages of their evolution from protostars to the main sequence. They are well defined in Fig. 22, although there are a few misplaced sources, half of which are part of the Orion Molecular Cloud.
-GTTS: G-type TTS are G and K0 type TTS. Only a few GTTS are available and they fall in the expected place in the CaMD.
-WTTS: Weak-lined or 'naked' TTS have little or even no accretion disk. They also follow the TTS trend in the CaMD with a few exceptions.
-IMTTS: Intermediate mass TTS have masses between 1 and 4 M⊙ and are considered precursors to the PMS Herbig Ae/Be stars (Lavail et al. 2017).
-Flares: This is a generic type, encapsulating several other types exhibiting flares due to magnetic activity (UV Ceti, TTS, etc.). In the CaMD, it is evident that they are spread all over the main sequence and the RGB.
-WR: Wolf-Rayet stars are a group of massive stars that present broad emission lines. They have high temperatures and luminosities and are considered descendants of O-type stars. Their variability is not periodic. In the CaMD, a few WR stars lie in the expected area and others appear to be reddened.

Rotational
Rotational variables are stars whose variability is caused by their rotation and asymmetries in shape or non-uniform surface brightness. The cross-match catalogue contains 13 types of rotational variables (some of them overlapping) and their CaMD is shown in Fig. 23.
-ROT: This is a generic class of spotted stars, which are scattered everywhere in the CaMD.
-RS: RS Canum Venaticorum variables are close binary systems of late spectral type that exhibit chromospheric activity. As shown in Fig. 23, they extend over the whole main sequence and many of them are very red. The vast majority of these sources originate from the automatic classification of ZTF_PERIODIC_CHEN_2020.
-ACV: α² Canum Venaticorum variables are chemically peculiar main sequence stars of B8p-A7p type with strong magnetic fields. They have periods that vary from 0.5 to more than 100 days. Most of the ACV stars fall in the expected region of the CaMD with a few red outliers. A large fraction of these outliers are listed in the VSX and originate from Kabath et al. (2009).
-SXARI: SX Arietis are B-type chemically peculiar stars with strong magnetic fields and periods of about 1 day. They are similar to ACV stars but with higher temperatures, therefore there is some overlap with their distribution in the CaMD. Our list includes a few SXARI stars whose period is much longer than 1 day and thus their class is spurious.
-BY: BY Draconis stars are dwarfs that have inhomogeneous surface brightness and exhibit chromospheric activity. They have periodic variability with periods that can vary from less than a day to more than 120 days. The cross-match catalogue has a large number of BY variables that fall on the main sequence, however a significant fraction of them has been assigned other classes as well.
-ELL: Ellipsoidal variables are close binaries whose light curves do not contain an eclipse. Their variability is due to the distortion of their shape by the mutual gravitational fields, so the observed light varies because the surface projected towards the observer varies. The sample of ellipsoidal variables is scattered all over the CaMD, with the majority lying on the main sequence.
-HB: Heartbeat variables are binary star systems with eccentric orbits that cause both variations of stellar shapes and vibrations induced by such changes. There are about 150 heartbeat stars in the cross-match catalogue, 91% of which are also classified as eclipsing binaries in various catalogues. The majority of the ones that passed the quality cuts for the CaMD have colour 0.1 < G_BP − G_RP < 1.0 mag; only a few of them are redder than that.
-SOLAR_LIKE: These stars exhibit chromospheric activity and include BY, ROT, and Flares types. In the CaMD, most representatives fall on the main sequence, although there are other sources on the red giant branch.
-R: Close binaries that exhibit strong reflection in their light curves (re-radiation of light of the hotter star from the surface of the cooler one). Most of these stars fall in the region between the main sequence and the white dwarfs; some are found on the extreme horizontal branch too.
-BY|ROT: Similar to SOLAR_LIKE, it includes stars of types BY or ROT as defined before.

Extragalactic content
A large number of sources in the cross-match catalogue concerns galaxies and active galactic nuclei. Most of the input catalogues used to cross-match both galaxies and AGN were internal Gaia catalogues and their content could be identified by their source ids. The catalogues containing AGN had various levels of detail in their classification (AGN, BLAZAR, BLLAC, QSO), although the majority of the sources were grouped as QSO as a generic class.
For galaxies, no subclasses were reported. Figures 24 and 25 show the sky distribution of all types of quasars and galaxies, with darker colours indicating a higher density of objects. Both figures show that the Galactic plane is avoided. As the main contributions for both galaxies and quasars are from Gaia products, their properties are discussed in detail in their corresponding papers (Krone-Martins et al. 2022; Gaia Collaboration et al. 2022).

Class overlaps
Due to the large number of catalogues that contributed to this cross-match, different classes might be associated with the same sources. Table 8 (available through the Centre de Données astronomiques de Strasbourg website) shows the number of sources that overlap based on their superclasses, alphabetically ordered. The first column shows the primary_superclass and the 51 columns that follow show the overlapping superclasses taken from var_types. To avoid confusion with genuine overlaps, the numbers of sources classified as the same type in different catalogues (i.e., the diagonal of the table) have been set to zero. Some reasons that lead to class overlaps are listed below.
-Mismatches: Due to the statistical approach used and the fact that each catalogue was treated separately, it is possible that Gaia sources are erroneously assigned to input catalogue counterparts. This problem may occur more frequently in crowded regions and depends also on the astrometric accuracy of each catalogue.
-Misclassifications: Input catalogues might include misclassified sources, especially when generated by automatic methods. An example is presented in Table 7 for Gaia DR3 source_id 4066039874096072576, which is matched in 4 catalogues. This table lists the input catalogues, the identifiers of the source in each catalogue (with additional online information, if present), the coordinates, variability types, and periods. ASAS-SN classified this source as a semi-regular (with a classification probability of 0.537) and identified a variability period of ∼18 days. However, the other catalogues classified it as an eclipsing binary (EW, EB, and ECL) with a period of ∼2 days. The ASAS-SN database was used to download the photometric data of the source. A Lomb-Scargle (Lomb 1976; Scargle 1982) period search (using the R implementation of the lomb package; Ruf 2019) was run for each camera separately. The resulting periods were consistent with each other and, after doubling them (as is often needed for eclipsing binaries, which have two minima per cycle instead of the single one assumed by the sine function in this period search method; see fig. 1 of Holl et al. 2014), they corresponded to the 2.1091 day period identified in the other catalogues (see Fig. 26a). This source is also published in Gaia DR3 as an eclipsing binary with the same period. The period provided by ASAS-SN was recovered as a secondary peak in the frequencygram, but the folded light curve was worse, suggesting that the correct type is EW rather than SR.
-Multiple classes: Some classes are not mutually exclusive, and sources could be identified in the literature as a combination of two (or more) classes, such as Cepheids in eclipsing binaries, BY Draconis stars with flares of UV Ceti variables, etc.
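The period check described in the Misclassifications item can be illustrated with a plain-Python Lomb-Scargle periodogram. This is a sketch only: the analysis in the text used the R lomb package, whereas the code below is a stand-in written for illustration, and the sampling, amplitude, and frequency grid are invented. The synthetic light curve has two identical minima per 2.1091 day cycle, so a sine-based search is expected to return about half of that period.

```python
import math


def lomb_scargle_power(times, values, frequency):
    """Classical Lomb-Scargle power at one frequency (cycles per day)."""
    mean = sum(values) / len(values)
    omega = 2.0 * math.pi * frequency
    # Time offset that makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * t) for t in times),
        sum(math.cos(2.0 * omega * t) for t in times),
    ) / (2.0 * omega)
    yc = ys = cc = ss = 0.0
    for t, v in zip(times, values):
        c = math.cos(omega * (t - tau))
        s = math.sin(omega * (t - tau))
        yc += (v - mean) * c
        ys += (v - mean) * s
        cc += c * c
        ss += s * s
    power = 0.0
    if cc > 0.0:
        power += yc * yc / cc
    if ss > 0.0:
        power += ys * ys / ss
    return 0.5 * power


def best_period(times, values, f_min, f_max, n_freq):
    """Period (days) of the highest periodogram peak on a linear grid."""
    step = (f_max - f_min) / (n_freq - 1)
    freqs = [f_min + k * step for k in range(n_freq)]
    best = max(freqs, key=lambda f: lomb_scargle_power(times, values, f))
    return 1.0 / best


# Synthetic, unevenly sampled light curve with two equal minima per cycle
# (all numbers below except the 2.1091 day period are invented).
TRUE_PERIOD = 2.1091
times = [0.37 * i + 0.1 * math.sin(i) for i in range(150)]
mags = [0.3 * math.cos(4.0 * math.pi * t / TRUE_PERIOD) for t in times]

sine_period = best_period(times, mags, 0.2, 1.5, 1301)
orbital_period = 2.0 * sine_period  # doubling, as for eclipsing binaries
print(sine_period, orbital_period)
```

Doubling is appropriate only when the folded light curve shows two minima per cycle; for a genuinely sinusoidal variable the periodogram peak is already the period.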
Table 8 shows that the largest overlap is between ECL as primary_superclass and RR Lyrae stars, with 58 811 cases. However, the overlap rate is low, as the cross-match catalogue contains more than 1.1 million sources whose primary_superclass is ECL. Looking at the catalogue_labels of these ∼59 000 sources reveals that 56 361 of them are in PS1_RRL_SESAR_2017. This catalogue contains sources without filtering on the class probability. The high probability sources of this catalogue are provided in PS1_RRL_SESAR_SELECTION_2017 and only 1 553 cases overlap with ECL. Moreover, the shapes of the light curves of EW and RRC stars are very similar and prone to confusion (Hoffman et al. 2009). Indeed, of the 1 553 overlapping sources in PS1_RRL_SESAR_SELECTION_2017, 1 162 are classified as RRC and EW.
Another significant overlap is between AGN and CST sources, where 19 434 cases exist. In this case the main contributor is GAIA_WD_GENTILEFUSILLO_2019 with 18 001 sources, while the rest are from SDSS_CST_IVEZIC_2007. No filtering was applied to the first catalogue; selecting only its reliable sources (see Gentile Fusillo et al. 2019) reveals that 441 sources overlap. Here too it should be noted that the overlap rate is very low, as there are ∼1.8 million sources with AGN as primary_superclass.
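The overlap counts and rates discussed above can be reproduced schematically as follows. The sketch assumes a hypothetical per-source field `superclasses` that already holds the superclasses derived from var_types, separated by ";"; the toy rows are invented, while the two rate calculations use the numbers quoted in the text (with 1.1 and 1.8 million taken as approximate denominators).

```python
from collections import Counter


def count_overlaps(rows):
    """Count (primary_superclass, other superclass) pairs, skipping the diagonal."""
    counts = Counter()
    for row in rows:
        primary = row["primary_superclass"]
        # 'superclasses' is a hypothetical field name used for this sketch.
        others = set(row["superclasses"].split(";")) - {primary}
        for other in others:
            counts[(primary, other)] += 1
    return counts


def overlap_rate(n_overlap, n_primary):
    """Fraction of sources of a primary superclass that overlap another one."""
    return n_overlap / n_primary


toy_rows = [
    {"primary_superclass": "ECL", "superclasses": "ECL;RR"},
    {"primary_superclass": "ECL", "superclasses": "ECL"},
    {"primary_superclass": "AGN", "superclasses": "AGN;CST"},
    {"primary_superclass": "ECL", "superclasses": "RR;ECL;RR"},
]
toy_counts = count_overlaps(toy_rows)

ecl_rr_rate = overlap_rate(58_811, 1_100_000)   # roughly 5 per cent
agn_cst_rate = overlap_rate(19_434, 1_800_000)  # roughly 1 per cent
print(toy_counts, ecl_rr_rate, agn_cst_rate)
```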

Least variable sources in ZTF
In order to increase the number of constant stars and widen their sky distribution, it was decided to take advantage of the wealth of data of the Zwicky Transient Facility (Masci et al. 2019; hereafter ZTF). ZTF is a project started in 2017 at Palomar Observatory. Its goal is to provide a high cadence data stream, enhancing science in stellar astrophysics, supernovae, active galactic nuclei, etc. Each image is captured by a 47 square degree field camera mounted on the 48 inch Schmidt telescope. On average, ZTF observes the entire Northern sky more than 300 times per year (see Fig. 27) and makes a data release every two months. ZTF data release 2 became available in December 2019, containing ∼2.3 billion light curves.
The idea was to obtain the ZTF photometric data and any statistic that is available, in order to detect the least variable stars. However, there is no need to download all sources from the ZTF database, as the aim is not a comprehensive detection of constant sources in ZTF. For this reason, a dense grid of points scattered all over the ZTF observable sky was created, and ZTF sources were extracted by performing a cone search with a radius of 2 . The grid contained 36 000 points limited to δ > −30° and it was created from a HEALPix tessellation of depth 6 (nside 64). In total, about 3 million ZTF sources were extracted. For all those sources, the median absolute deviation (MAD) of their photometric time series was already available and was used to select the least variable stars. The 3 million source sample was divided into 250 magnitude bins, and the sources with MAD under the 10th, 5th, and 1st percentiles of the MAD distribution of each bin were selected. Figure 28 shows the ZTF median g magnitude versus the time series MAD. The sources with MAD over the 10th percentile per magnitude bin are shown in grey, the ones with MAD between the 10th-5th and 5th-1st percentiles are in red and blue, respectively, while sources with MAD less than the 1st percentile are in green. Figure 29 shows the spatial distribution of the selected sources per percentile.
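The per-bin percentile selection described above can be sketched as follows. The function names, the equal-width binning between the brightest and faintest source, and the linear-interpolation percentile are assumptions of this illustration; the text only states that the sample was split into magnitude bins and that thresholds were taken at the 10th, 5th, and 1st percentiles of the MAD distribution in each bin.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def select_least_variable(sources, n_bins, q):
    """Keep sources whose MAD is below the q-th percentile of their magnitude bin.

    `sources` is a list of (median_magnitude, mad) tuples.
    """
    mags = [mag for mag, _ in sources]
    bright, faint = min(mags), max(mags)
    width = (faint - bright) / n_bins or 1.0  # guard against a zero-width range
    bins = {}
    for mag, mad in sources:
        index = min(int((mag - bright) / width), n_bins - 1)
        bins.setdefault(index, []).append((mag, mad))
    selected = []
    for members in bins.values():
        threshold = percentile([mad for _, mad in members], q)
        selected.extend(s for s in members if s[1] < threshold)
    return selected


# Invented example: two magnitude bins, ten sources each, MAD from 1 to 10.
sources = [(14.0 + 0.05 * k, float(k + 1)) for k in range(10)]
sources += [(15.55 + 0.05 * k, float(k + 1)) for k in range(10)]
least_variable = select_least_variable(sources, n_bins=2, q=10)
print(least_variable)
```

Computing the threshold separately in each bin accounts for the growth of photometric scatter towards fainter magnitudes, which a single global MAD cut would not.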
The next step was to cross-match the selected sources with the Gaia DR3 data set, which was performed with the same method as for the rest of the catalogues in this document. At the end of this cross-match process, 267 784, 133 112, and 26 217 sources were left for the 3 different cut-offs (10th, 5th, and 1st percentiles). Figure 30 shows the G magnitude distribution of these sources depending on their corresponding percentile range. Due to the very low number of sources at the bright end, it was decided to select an upper limit for the number of stars per bin (for a fairer representation of all magnitudes). Figure 31 shows the MAD versus G magnitude of the selected stars (depicted in red), with MAD less than the 10th percentile and including up to 2000 sources per 0.5 magnitude bin. Figure 32 shows the G magnitude distribution of the final selection of sources.
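Capping the number of sources per magnitude bin can be sketched as below. The text does not state which sources were kept when a bin exceeded the limit; retaining those with the lowest MAD is an assumption of this illustration, as are the function and variable names.

```python
import math


def cap_per_magnitude_bin(sources, bin_width=0.5, max_per_bin=2000):
    """Keep at most `max_per_bin` sources in each G magnitude bin.

    `sources` is a list of (g_magnitude, mad) tuples. Keeping the
    lowest-MAD sources first is an assumption made for this sketch.
    """
    bins = {}
    for g_mag, mad in sources:
        bins.setdefault(math.floor(g_mag / bin_width), []).append((g_mag, mad))
    kept = []
    for index in sorted(bins):
        kept.extend(sorted(bins[index], key=lambda s: s[1])[:max_per_bin])
    return kept


# Invented example with a limit of 2 sources per 0.5 mag bin.
sample = [(12.1, 0.3), (12.2, 0.1), (12.4, 0.2), (12.6, 0.5)]
capped = cap_per_magnitude_bin(sample, bin_width=0.5, max_per_bin=2)
print(capped)
```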

Selection of least variable sources from TESS
The Transiting Exoplanet Survey Satellite (hereafter TESS; Ricker et al. 2015) is a NASA space telescope whose primary goal is to search for exoplanets. It observes both hemispheres divided into 26 sectors and its targets are bright stars, with the majority being brighter than T ∼ 12 mag. In order to overcome the lack of constant sources with magnitude around G ∼ 12 and considering the targets TESS observes, it was decided to apply the same process described in Sect. 5.1 to TESS sources. The time series of ∼99 thousand unique stars covering 11 sectors were used (see Fig. 33). We recall that our aim was not to cross-match the full set of TESS targets but to identify a sufficient number of least variable stars in a specific magnitude range. The light curves were downloaded from the TESS bulk download website, where a script that extracts data per sector is provided. About half of the sources were duplicated because of sector overlaps at the Ecliptic poles and thus they were removed. These photometric time series contained the Simple Aperture Photometry (SAP) and the Pre-Search Data Conditioned Simple Aperture Photometry (PDCSAP) corrected flux of each object. SAP flux is the raw flux, while in PDCSAP flux long term trends have been removed. This removal must be taken with caution as it can alter the true flux changes of variable sources. Fluxes were converted to magnitudes using a preliminary zero point magnitude (László Molnár, private communication) and the MAD was calculated for each source. The same procedure as in Sect. 5.1 was followed in order to select the sources with lower MAD per magnitude bin. Figure 34 shows the MAD per magnitude per sector, revealing that sectors can have different MAD thresholds, so the 10% least variable stars were selected per sector. Figure 35 shows the final spatial distribution of the least variable stars in TESS. After the cross-match with Gaia, 5100 sources were selected. The magnitude distribution of the selected sources is shown in Fig. 36 and it covers the magnitude gap of non-variable objects from Hipparcos and SDSS Stripe 82.
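The flux-to-magnitude conversion and the MAD computation can be sketched as follows. The zero point used in the text is a preliminary, privately communicated value that is not quoted, so `zero_point` is left as a free parameter and the values used in the example are arbitrary placeholders; the function names are likewise chosen for this illustration.

```python
import math
import statistics


def flux_to_mag(flux, zero_point):
    """Convert a positive flux to a magnitude with the given zero point."""
    return zero_point - 2.5 * math.log10(flux)


def mad(values):
    """Median absolute deviation around the median."""
    centre = statistics.median(values)
    return statistics.median(abs(v - centre) for v in values)


def light_curve_mad(fluxes, zero_point):
    """MAD of a light curve in magnitudes, ignoring non-positive fluxes."""
    mags = [flux_to_mag(f, zero_point) for f in fluxes if f > 0.0]
    return mad(mags)


# Invented fluxes; the zero points 20.0 and 0.0 are placeholders only.
fluxes = [1000.0, 1010.0, 990.0, 1005.0, 995.0]
mad_a = light_curve_mad(fluxes, zero_point=20.0)
mad_b = light_curve_mad(fluxes, zero_point=0.0)
print(mad_a, mad_b)
```

Because the zero point enters only as an additive constant, it does not change the MAD in magnitudes; it affects only the magnitude bin to which a source is assigned.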

Conclusions
We present the creation of a large and diverse cross-match catalogue with Gaia from variable and constant sources in the literature. In total, 152 different input catalogues from the literature were cross-matched with Gaia DR3 in order to find the counterpart sources, compiling a large data set of more than 7.8 million sources and 112 different types of variables, constants, and galaxies. Each input catalogue was cross-matched individually, performing an epoch propagation and using a synthetic distance that encapsulates the astrometric distance and the photometric difference between targets and counterparts. The catalogue is available online to the scientific community through the Centre de Données astronomiques de Strasbourg website. Users of this catalogue might still need to verify or filter out some objects of interest depending on the purpose of the analysis. This catalogue is a valuable resource for studies of variable sources as it provides a single data set for the Gaia mission with uniform photometry and astrometry.

Fig. 1. Distribution of angular distance between Catalina CSS South targets and their Gaia candidate counterparts found within 5 radius.

Fig. 2. Distribution of the synthetic distance of candidate counterparts obtained by the cross-match between Gaia and Catalina CSS South. An upper limit of ρ_synth = 6 was applied to filter mismatches out.

Fig. 5. Sky density of all sources in the cross-match catalogue.

Fig. 8. Magnitude distribution of the cross-match catalogue for galaxies, variables, and constants.

Figure 9 presents the values of an amplitude proxy in G (A_proxy,G; Mowlavi et al. 2021) versus the mean G-band magnitude for the variable stars and constant sources. The amplitude proxy is a measure of the scatter in the light curve of each source. A_proxy,G is defined in Eq. 2, where N_G is the number of observations contributing to G photometry, and I_G and ε(I_G) are the G-band mean flux and its error. A fraction of stars classified as variable in the literature have low A_proxy,G, sometimes lower than constant sources. Such an example is source_id 3328974248568180864, which has A_proxy,G = 0.0016 and is classified as an eclipsing binary with a period of 1.04 days by ASAS-SN (ASASSN-V J061917.53+094328.8).

Fig. 9. Amplitude proxy G vs Mean G magnitude for constant and variable stars.
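The amplitude proxy plotted in Fig. 9 can be computed from three per-source quantities. The expression used below, the square root of N_G times ε(I_G) divided by I_G, is the form we assume for the definition cited from Mowlavi et al. (2021); the equation itself is not reproduced in this part of the text, so it should be checked against that reference. The input numbers are invented.

```python
import math


def amplitude_proxy(n_obs, mean_flux, mean_flux_error):
    """Amplitude proxy, assumed here to be sqrt(N_G) * eps(I_G) / I_G."""
    if n_obs <= 0 or mean_flux <= 0.0:
        raise ValueError("n_obs and mean_flux must be positive")
    return math.sqrt(n_obs) * mean_flux_error / mean_flux


# Invented values: 100 observations, mean flux 1000, flux error 1.
proxy = amplitude_proxy(100, 1000.0, 1.0)
print(proxy)
```

The error on the mean flux decreases with the square root of the number of observations for a constant source, so multiplying by that square root recovers a quantity comparable to the per-epoch relative scatter.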

Fig. 10. ZTF time series of Gaia DR3 source_id 395015018457259904, selected as least variable but with a large A_proxy,G.

Fig. 11. Colour-absolute magnitude diagram (CaMD) of white dwarfs, sub-dwarf variables, and blue large amplitude pulsators. The sources of the reference data set are plotted in grey-scale as background to facilitate locating the different areas.
-M: M (or Mira, o Ceti) variables are late type stars with periods between 80 and 1000 days. They are very well defined in the red part of the CaMD with little contamination.
-M|SR: These are long period variables that include Mira and semi-regular stars identified by classification in Gaia DR2 (Rimoldini et al. 2019a). The majority of this class occupies the expected area in the CaMD, overlapping the regions of Miras and LPVs, but there are some contaminants falling on the main sequence, likely young stellar objects (Mowlavi et al. 2018).

Fig. 16. Period distribution of the Type II Cepheids, as reported in the literature. The different colours show the three subclasses and the generic class.
-ACEP: Anomalous Cepheids, or BL Bootis, are pulsating variables that lie on the instability strip. Typically they have periods from a few hours to 2 days. The ACEP found in the CaMD occupy the expected position.
-DCEP: δ Cepheids, or classical Cepheids, tend to be brighter than the Type II Cepheids, although there are sources in the cross-match that fall on the main sequence.
-T2CEP: This is a generic class of Type II Cepheids from catalogues that do not provide further details about their subclass. They are pulsating variables with periods in the interval from 1 to more than 50 days. They are similar to classical Cepheids but with lower masses and luminosities, and tend to be older. They can be divided into 3 subclasses: BLHER, CW, and RV TAU. These subclasses have different period ranges, as shown in Fig. 16, while the generic class spreads over the full range of periods in the plot.
-BLHER: BL Herculis (or CWB) are the Type II Cepheid variables with the shortest periods among the different subclasses. They have periods from 1 to 4 days and they lie in the region between the horizontal branch and the asymptotic giant branch. Only 90 BLHER survived the quality cuts; some of them are found to be redder than expected.
-CW: W Virginis stars have periods between 10 and 20 days and are crossing the instability strip. They extend to the areas of BL Her and RV Tau in the CaMD. Their period distribution reported in the literature has tails that extend to the full range shown in Fig. 16.
-RV: RV Tauri variables are radially pulsating supergiants that change their spectral type along with their magnitude. Their spectral type spans from F-G class to K-M, depending on their phase. Their periods are longer than 30 days, with typical values between 40 and 50 days. The RV Tau stars in the cross-match catalogue fall in the expected CaMD location.

δ Scuti and γ Doradus variables
Since δ Scuti and γ Doradus variables are closely related and can also be hybrids, they are presented together in Fig. 17. Figure 17 shows several contaminants, as the δ Scuti representatives cover a large fraction of the main sequence, with some sources on the white dwarf sequence.
-SXPHE: SX Phoenicis stars are considered similar to δ Scuti stars, but they are sub-dwarfs with periods typically in the lower part of the DSCT range. They lie in the expected place of the CaMD.
-DSCT|SXPHE: δ Scuti or SX Phoenicis stars identified by the classification of the variable sources of Gaia data release 2. Generally they occupy a correct region, although there are some outliers.
-GDOR: γ Doradus are dwarfs with late A to late F spectral type that exhibit variability with non-radial pulsations. They usually have periods around 1 day. In Fig. 17, they occupy the expected region, except for some outliers.
-DSCT+GDOR: The δ Scuti and γ Doradus hybrids are variable stars that exhibit both g (GDOR) and p (DSCT) mode pulsations. They are found in the expected location of the CaMD.

Fig. 18. CaMD of the other types of pulsating variable stars.

Eclipsing binaries, double periodic variables, and stars with exoplanets
The CaMD for the eclipsing binary stars and stars with exoplanets in the data set is presented in Fig. 20. The eclipsing binaries can be scattered throughout the HR diagram, as shown in the figure.
-EA: Algol (β Persei) type eclipsing binaries have stars with spherical or only slightly elliptical shape and the secondary eclipse is not always present in the time series. In Fig. 20, it is clear that this type of object can be anywhere in the CaMD.
-EB: β Lyrae eclipsing binaries have elliptical components and the secondary minimum is always visible in their light curve. The majority of such eclipsing binaries have periods longer than half a day. They usually cover the upper part of the main sequence and extend to the giants.
-EW: W UMa-type eclipsing binaries are composed of two stars of similar spectral type between A and K, with most of them being F or G. They have short periods, typically between 0.25 and 1 day. There are many red stars in Fig. 20 and ∼3% have periods longer than 2 days in the literature, ∼30% of which have different classifications (e.g., ROT, YSO) in other catalogues. The literature period distribution of this class is shown in Fig. 21, with a strong peak at around 0.37 days, as expected (Jiang et al. 2012), but also with a tail extending to more than 200 days.
-DPV: Double periodic variables are semi-detached interacting eclipsing binaries that exhibit photometric variability with two distinct periods. Only 3 of them are shown in Fig. 20.

Fig. 21. Distribution of periods from the literature for EW type eclipsing binaries.

Fig. 22. CaMD of eruptive type variable stars. The upper plot (a) shows the various TTS types and flare stars, while the lower plot (b) shows the rest of the eruptive type stars.
-HAEBE: Herbig Ae/Be variables are young stars of spectral types A or B. There are only a few but mostly in the expected location of the CaMD, including some very reddened ones.
-FUOR: FU Orionis variables are pre-main sequence stars closely related to the evolutionary stages of T Tauri stars. They are characterised by rapid and strong photometric and spectral variability. There are only 3 FUOR stars surviving the quality cuts, out of the 9 in the cross-match catalogue, and they are in reasonable places in the corresponding CaMD.
-UV: UV Ceti flare stars have spectral types K or M. Figure 22(b) shows a lot of stars spreading along the main sequence up to earlier spectral types.
-GCAS: γ Cassiopeiae stars are of O9-A0 type and thus occupy the expected place in the CaMD. Some are of later types but with no obvious problems. Some of them have been assigned different types in other input catalogues (e.g., some of those on the extreme horizontal branch are also classified as CV).

Fig. 26. Folded light curve for Gaia DR3 4066039874096072576 using different colours for each ASAS-SN camera. (a) The recovered period is almost always the same at 1.0545 days in the 3 different cameras provided by ASAS-SN, which is half of the period reported in the literature. The folded light curve is plotted with twice the period recovered in each camera. (b) The same source using data and the period found in Gaia DR3.

Fig. 27. ZTF r filter sky depth-of-coverage in Galactic coordinates. The colour scale corresponds to the number of observation epochs per approximate CCD-quadrant footprint. Image from ZTF.

Fig. 28. MAD versus magnitude per percentile cut for ZTF sources in this work. Sources between the 10th and 5th percentiles are in red, those with MAD between the 5th and 1st percentiles are in blue, and the ones under the 1st percentile are shown as green crosses.

Fig. 29. Sky map of the least variable sources. The same colour coding as in Fig. 28 for percentile (pc) thresholds has been used.

Fig. 30. Gaia G magnitude distribution of the selected least variable sources per percentile threshold, after cross-match with Gaia DR3 data. The colour scheme is the same as in Fig. 28; the line for MAD below the 1st percentile is dashed.

Fig. 31. MAD vs G magnitude of sources below the 10th percentile of the MAD distribution. The sources that we selected as least variable are highlighted in red.

Fig. 32. G magnitude distribution of the final selection of sources with photometric MAD below the 10th percentile, which highlights the ZTF cross-match representation as a function of magnitude.

Fig. 33. Sky coverage in Ecliptic coordinates of the TESS sectors that were used in this work.

Fig. 34. TESS magnitude versus MAD thresholds (of the 10th percentile) for the various sectors used.

Fig. 36. Gaia G magnitude distribution of the 10% least variable stars in TESS, after cross-matching with Gaia sources.
-MCP: Magnetic chemically peculiar stars, which include Ap, HgMn, and Am types. They have a natural overlap with ACV and SXARI variables (and many of them are classified as ACV in other catalogues).
-CP: This is a generic class of chemically peculiar variables originating from Richards et al. (2012), which were selected using a Random Forest classifier. It includes mostly hot stars but also a few cooler stars of G and later spectral types, with G_BP − G_RP > 1.5.
-FKCOM: FK Comae Berenices variables are G to K giants that rotate rapidly and have strong magnetic fields. Only a few FKCOM variables exist in the cross-match catalogue, but they have the expected colour and absolute magnitude for their type.
-BY: BY Draconis stars are dwarfs that have inhomogeneous

Table 1. List of input catalogues used. The first column gives the catalogue_label of each input catalogue, followed by the number of objects found in the cross-match and the literature references for the catalogue.

Table 3. Correspondence between the different classes and sub-classes of eclipsing binaries and how they were merged into 4 generic classes: ECL, EA, EB, and EW.

Table 4. Definition of the fields in the cross-match catalogue. Fields with plural names may contain multiple values separated by ";" and are therefore stored as strings.
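
A reader of the catalogue would need to split such multi-valued strings before use. The following is a minimal sketch, assuming plain ";" separation with optional surrounding whitespace; the helper name `split_multi` and the example value are hypothetical and do not come from the catalogue.

```python
def split_multi(value, sep=";"):
    """Split a multi-valued catalogue string into a list of trimmed tokens."""
    if value is None:
        return []
    # Drop empty tokens that arise from doubled or trailing separators.
    return [token.strip() for token in value.split(sep) if token.strip()]

example = "RRAB; DSCT;;RRAB"  # invented value, for illustration only
tokens = split_multi(example)
```

The helper keeps repeated values and preserves their order; whether repeats should be collapsed depends on the use case.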

Table 5. List of available types of primary_superclass. The second column shows the primary_var_type values that contribute to each primary_superclass. The last two columns give the number of sources for each primary_superclass in the full catalogue (All) and when the selection flag is true.

Table 6. List of the different classes of sources in the cross-match catalogue and the number of sources with each class in the primary_var_type and var_type fields. For the var_type field, a type found multiple times for a given source id is counted once. The All columns refer to all sources in our catalogue, while the selection columns refer only to the cases where the selection flag is true.
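
The counting rule stated in this caption (a type repeated for one source id is counted once) can be sketched as follows. The rows are invented for illustration and are not catalogue data; the sketch also assumes that var_type holds ";"-separated values.

```python
from collections import Counter

# Invented rows: (source_id, primary_var_type, var_type); not real catalogue data.
rows = [
    (1, "RRAB", "RRAB;RRAB;RR"),
    (2, "EA", "EA;ECL"),
    (3, "RRAB", "RRAB"),
]

# One primary type per source, so a plain count is enough.
primary_counts = Counter(primary for _, primary, _ in rows)

var_type_counts = Counter()
for _, _, var_type in rows:
    # A set drops repeats, so each type is counted once per source.
    unique_types = {t.strip() for t in var_type.split(";") if t.strip()}
    var_type_counts.update(unique_types)
```

Restricting `rows` to those with a true selection flag before counting would give the corresponding selection columns.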