Open Access
Issue
A&A
Volume 689, September 2024
Article Number A240
Number of page(s) 16
Section Galactic structure, stellar clusters and populations
DOI https://doi.org/10.1051/0004-6361/202450126
Published online 17 September 2024

© The Authors 2024

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

In the context of the Milky Way (MW) formation, the study of the Galactic bulge is especially relevant as it is the first massive component to get into place. Yet, it is possibly the component whose formation we understand the least. Until about one decade ago, the paradigm for the formation of bulges was built upon the dichotomy postulated by Kormendy & Kennicutt (2004), between so-called classical bulges, formed by dissipationless merging of small external galactic fragments, and the bar-like pseudo-bulges, originating from disk instabilities. More recently, observations of high-redshift (z~2) star-forming disks have shown the presence of dense clumps with high star formation rates (SFRs) (Guo et al. 2015, 2018; Dessauges-Zavadsky et al. 2017; Cava et al. 2018; Huertas-Company et al. 2020). A big clump such as those is usually present at the center of the disks, obviously showing a massive, early, in situ star formation event. Thanks to these findings, the classical or pseudo bulge dichotomy has been abandoned in favor of scenarios in which bulges form mostly in situ, but on a short (definitely non-secular) timescale (e.g., Fragkoudi et al. 2020).

Recently, Debattista et al. (2023) presented N-body + smoothed particle hydrodynamics simulations that develop high-SFR clumps, including a large one at the center. Some of the clumps formed in the disk eventually migrate toward the center, where they rapidly (<1 Gyr) dissolve. Debattista et al. (2023) demonstrate that such clumpy models are able to explain the bimodal distribution in [Fe/H], observed for bulge stars, together with the observed presence of two density peaks, separated by a trough, in the [α/Fe] versus [Fe/H] plane. Indeed, according to these models, stars formed in clumps have relatively large [α/Fe] ratios, while stars formed in the field, with lower SFRs, result in comparatively lower [α/Fe] ratios. While this represents an important step forward toward qualitatively reconciling the observations of high-redshift disks with those of individual stars in the MW, several issues remain to be understood.

For instance, one important point to be addressed is why the metal-poor stars ([Fe/H]<0) in the bulge seem to be arranged in a rather axisymmetric component, while the metal-rich stars ([Fe/H]>0), trace a well-defined bar (e.g., Zoccali et al. 2017; Lim et al. 2021). Long before it was demonstrated that the metal-poor stars traced a component that is more axisymmetric than the bar, a few independent studies had shown that they did not show the X-shape structure either (Ness et al. 2012; Rojas-Arriagada et al. 2014). In what follows, the axisymmetric component will be called “the spheroid”, although its shape has been inferred only very qualitatively.

Models by Di Matteo (2016); Debattista et al. (2017); Fragkoudi et al. (2018) are able to explain the origin of both bulge components as deriving completely from the disk, whereby a hotter (e.g., thick) disk can naturally generate the axisymmetrical structure, while a colder (e.g., thin) disk would map into a more pronounced bar. This process has been dubbed kinematic fractionation by Debattista et al. (2017). A comprehensive model combining the disk fractionation needed to produce two different spatial structures, with the presence of clumps needed to produce the bimodality in the [α/Fe] trends, is not yet available.

To further complicate things, Queiroz et al. (2021) recently combined chemical abundances and radial velocities from APOGEE DR17 with proper motions from Gaia DR3 to derive orbits. They concluded that both the stars with bar-like orbits and those with spheroid-like orbits are uniformly scattered in the [Mg/Fe] versus [Fe/H] plane. This finding, if confirmed, would dismantle the scenario in which stars in the spheroid are alpha-rich because they came from the thick disk, while stars in the bar are alpha-poor because they have a thin disk origin.

To progress on this topic, an accurate characterization of the spheroid is highly desirable. As of now, several works have used red clump (RC) stars to derive the shape parameters of the bar. All of them, however, used samples of RC stars that include both metal-rich and metal-poor stars in roughly equal proportion (Zoccali et al. 2018). Therefore, we know that the derived bar parameters are not completely correct. In order to separate the two components, metallicities are needed, which are not available for the large samples. In other words, nobody has yet attempted to isolate stars belonging to the metal-poor component (the spheroid) and derive, for example, the radial density profile or axial ratio of their parent population. A recent catalog providing photometric metallicities for 2.6 million RC stars, at latitudes of −10<b<−4, has been published by Johnson et al. (2022). As is discussed in Lim et al. (2021), these data qualitatively confirm the results by Zoccali et al. (2017) about the spheroid (bar) shape traced by metal-poor (metal-rich) RC stars, but a full derivation of the structural parameters of each of the two components has not been made with these new data, possibly due to their limited latitude coverage.

RR Lyrae (RRL) variables are of great value in this context because they trace a pure old population (> 10 Gyr; Walker 1989), on average more metal-poor than most RC stars. As such, they might be clean tracers of the spheroid. In addition, thanks to a well-defined relation between their periods and their absolute luminosity (PL), they provide much more precise distances than RC stars (see, e.g., Girardi 2016), allowing us to trace the three-dimensional (3D) shape of their parent population with greater confidence.

Results in the literature on this point are rather controversial. By analyzing the 3D distribution of RRL from the optical OGLE III survey, Pietrukowicz et al. (2012) first concluded that they tend to follow the Galactic bar traced by RC stars, a result later confirmed in Pietrukowicz et al. (2020) based on improved data from OGLE-IV. Meanwhile, Dékány et al. (2013) combined the OGLE III RRL catalog with near-infrared (IR) magnitude measurements from the Vista Variables in the Vía Láctea (VVV) ESO Public Survey (Minniti et al. 2010), less sensitive to extinction, and found a more axisymmetric, mildly barred component. By analyzing an independent sample of ~1000 RRL observed by VVV at latitude −10.3 <b< −8.0, a region not explored by OGLE, Gran et al. (2016) claimed the absence of a bar nor an X-shape. Semczuk et al. (2022), on the contrary, did find the signature of an X-shape in the OGLE-IV data. Recently, Du et al. (2020) show that the shape parameters of the RRL parent population depend on their metallicity: metal-rich RRL ([Fe/H]>−1) define a stronger bar compared to metal-poor ones. Additionally, Kunder et al. (2016), found that RRL stars show different dynamical properties than metal-rich RC stars, more consistent with the Galactic bulge spheroid.

In the present paper, we provide a new catalog of (fundamental mode) ab-type RRL variables (hereafter RRab) in the inner bulge region extracted from VVV Point Spread Function (PSF) photometry (see Sec. 2). The classification of their light curves, described in Sec. 3, has been optimized to obtain a pure sample, though with lower completeness compared to other recent near-IR catalogs (see Sec. 4). We excluded first overtone, c-type, RRL because their lower amplitude and symmetric light curve make them easily confused with eclipsing binaries. Distances are derived in Sec. 5, and their spatial distribution is analyzed in Sec. 6 and Sec. 7. Conclusions are drawn in Sec. 8.

2 Observations and photometry

The catalog of RRL presented here is based on PSF–fitting photometry performed on multi-epoch near-IR data from the VVV ESO Public Survey. The survey has observed an area of about 1700 deg2 over the MW bulge and part of the southern disk since 2010. For the present study, we only analyzed the region within −10° ≲l ≲10.5° and −2.6° ≲b ≲2.8° (Fig. 1), that is, from tile b299 to b368. This is the region where previous optical surveys such as OGLE IV (Fig. 1) or Gaia (Fig. 5), are highly incomplete or absent, due to the high interstellar extinction close to the Galactic midplane. PSF photometry was performed on all the available epochs, that is, 100 in KS and fewer than a dozen in J and H. The time baseline spans 10 years, from 2010 to 2019. The typical seeing is close to 1 arcsec; the few images with seeing larger than 1.7 arcsec were discarded from the analysis. This limit has been chosen in order to reduce blends in the very crowded region examined here, at the same time excluding only a few images per field.

The multi-epoch photometry provided us with both stellar variability and proper motions. A photometric pipeline was built, based on the DAOPHOT II/ALLSTAR package (Stetson 1987), optimized for the measurement of proper motions following the prescriptions by Anderson et al. (2006) and Bellini et al. (2014). The exact recipe was described in Contreras Ramos et al. (2017). Here, we recall just the main steps. A quick photometry, with a very high threshold, was performed initially to detect only relatively bright stars. The latter were used to derive coordinate transformations among all the epochs, for each VIRCAM chip. A stacked image was then created using those transformations, and a deep photometry was performed on it, with a small threshold, in order to create the most complete stellar master list. This list of stars was then used as input for the PSF photometry in each individual epoch, allowing the code to refine the centroids of each star. Consequently, each given star had the same ID in all the epochs, but its position was allowed to vary with time, in order to measure the star’s proper motion. To look for variable stars, all the instrumental magnitudes measured for a given star were normalized to the photometric system of an arbitrary reference epoch. Therefore, only this epoch was calibrated to a standard photometric system.

In our case, we initially calibrated the catalogs to the VISTA photometric system (González-Fernández et al. 2018), by means of a large number of stars in common with the public catalogs, which are based on aperture photometry, released by the Cambridge Astronomical Survey Unit (Emerson et al. 2004, CASU). Recently, however, Hajdu et al. (2020) noted that the standard CASU calibration procedure has zero-point inaccuracies, especially in regions of high stellar density. We thus recalibrated the photometry following the general prescription described in their work, in other words, we matched our catalogs directly with the 2MASS ones, imposing rather conservative -and supervised- cuts in order to discard stars that might be blended in 2MASS. A complete description of this procedure can be found in Nikzat et al. (2022).

Example of the final photometry is presented in Fig. 2, showing the light curves of two typical RRab in our sample. The left one is in tile b302, with approximate coordinates (l,b)=(−5°,−2°), while the one on the right is tile b332, at coordinate (l,b)=(−2°, 0°). The latter is noisier due to the increasing crowding and extinction when moving close to the Galactic center (GC).

thumbnail Fig. 1

Galactic coordinates of the 16 488 RRL in the catalog presented here, color-coded according to their extinction E(J − KS). Overlaid in gray are the coordinates of RRab variables in the OGLE IV catalog.

3 Identification of RR Lyrae variables

Our PSF photometry detected approximately 3 × 108 point sources in the 70 VVV tiles analyzed here. In order to select candidate variable stars with well-measured magnitudes, first, we discarded measurements with unusually large photometric errors (eKS > 0.1 mag) and applied an iterative 8σ clipping to their (non-phased) light curves to reject outliers. We further omitted candidate variable stars with fewer than 50 measurements (i.e., less than ~50% of the available epochs), and those with total amplitude (max –min) smaller than 0.1 mag. The former criterion was chosen because VVV light curves are unevenly sampled, and, close to the midplane, they are relatively noisy (Fig. 2). The latter criterion ensures that we keep all the RRab, but exclude variables with amplitudes too small compared with the VVV typical photometric errors (cf., Fig. 8 in Contreras Ramos et al. 2018).

At this point, in order to select candidate intrinsic variables, we run the (slightly modified) reduced χ2 test (Carpenter et al. 2001) for all the time series. This value was computed, for each star, from the individual magnitudes (KS) and the corresponding uncertainties (eKS), as follows: χν2=i=1N(KsiwKs)(N1)i=1NeKsi$\[\chi_\nu^2=\frac{\sum_{i=1}^N\left(K_{s_i}-\left\langle w K_s\right\rangle\right)}{(N-1) \sum_{i=1}^N e K_{s_i}}\]$(1)

where wKS is the error-weighted arithmetic mean of the KS magnitude, and N is the number of epochs. Sources with a χν2$\[\chi_\nu^2\]$ value greater than 3 were considered candidate variable stars.

In the second step, we used the Analysis of Variance (AoV; Schwarzenberg-Czerny 1989) method to compute periods for all the variable candidates that passed the first selection. The AoV also provides a quality index (Q) for each phase-folded light curve, using the period with the highest probability. After visually inspecting a few dozen cases, we adopted Q=0.9 as our cutoff value to select candidate variables. Our goal is to provide a high purity RRL catalog with minimal contamination from different type of variables, even at the expenses of completeness. In order to do so, we decided to select only RRab variables, because the lower amplitude and more symmetric light curves of the c-type RRL (RRc) makes them more easily mistaken with other variables, particularly eclipsing variables. Therefore, we selected stars with 0.3<P<1 days in order to make sure we include all RRab, at the same time avoiding short-period Cepheids and minimizing contamination from long-period RRc variables. Note that the latter comprises ≲30% of all RRL (Martínez-Vázquez et al. 2017; Soszyński et al. 2019; Stringer et al. 2021).

After the selection cuts described above, we were left with a sample of approximately 10 million candidate variable stars. The next and final step in our selection procedure was to apply a machine-learning classifier to these candidates, which we describe in the next section.

thumbnail Fig. 2

Phased, KS -band light curves for two typical RRab variables in our sample.

3.1 Random Forest

Identifying RRL variables is more challenging in the near-IR, compared to the optical, because at longer wavelengths, they show smaller amplitudes and a more sinusoidal (i.e., featureless) light curve (see, e.g., Fig. 2 in Bhardwaj 2022). For this reason, they are easily confused with other kinds of variables, particularly eclipsing binaries. Due to the huge number of candidates selected in the central region, we developed a simple machine learning, supervised classifier based on Random Forest (Breiman 2001) from the Python library sci-kit learn (Pedregosa et al. 2011). Random Forest classification requires three main sets of input data from the user: the training catalog, with different kinds of variables already identified; the science catalog, and a list of features (properties) quantitatively describing the brightness variation. The code is requested to distinguish RRL from other variables in the science catalog, learning how the features behave in the training catalog, also called the training set. The code randomly builds a number of binary decision trees, where each object is classified according to a handful of features across each tree. The final classification is based on the average of the result of a large number (in our case, 20 000) of randomly built decision trees.

Random Forest is a relatively simple classification algorithm compared with more modern machine learning codes. However, since, in this case, we were interested in isolating only RRL variables, we built a very basic code that was required to discriminate only between RRL and “other variables”, which turned out to work fine for our purposes.

3.1.1 The training set

The training set consisted of more than 65 000 variable stars of different types, identified and characterized within the OGLE-IV survey (Udalski et al. 2015). For all of them, we used the KS band light curve from VVV, but the period was measured in OGLE. Specifically, the training set included ~19 000 RRab, ~300 c-type RRL, ~30 000 non-contact and ~16 000 contact eclipsing variables, and ~800 elipsoidal variables. They all had periods <2 days, according to OGLE.

For the present work, we aimed to identify only bona fide RRab (P=0.30−1.0 days), that is, we excluded c-type (P=0.2−0.45 days) and d-type RRL. The latter two types, which are pulsating in the first overtone and in a mixed mode, respectively, are much less abundant than the former, fundamental mode pulsators (e.g., Smith 1995; Catelan & Smith 2015). Their light curves are sinusoidal with lower amplitude and can be easily confused with the more abundant short-period eclipsing binaries. For this reason, we decided to exclude them from the present study, and therefore the c-type RRL present in the training set are considered “other variables”.

3.1.2 The features

Several papers have used similar machine learning codes to identify RRL stars in large multi-epoch catalogs, and have defined different sets of features optimized for the characteristics of each specific survey (e.g. Nun et al. 2015; Elorrieta et al. 2016; Medina et al. 2018; Stringer et al. 2019, 2021; Cabral et al. 2020; Molnar et al. 2022; Daza-Perilla et al. 2023). After experimenting with different combinations of them, with a few minor modifications, we adopted the sets of features listed in Table 1.

The first group describes the properties of the time series photometry, independent of the period. These have the advantage of being robust against mistakes in the period determination. The second group describes the properties of the phase-folded light curve, and the third group includes parameters of the Fourier fit to the latter. One of the Random Forest outputs is the relevance of each feature, which in this case confirmed that the most relevant ones are – obviously – those in group 2.

In order to perform the Fourier fit to the light curves, we first run a Gaussian Process fit, using the python library george (Ambikasaran et al. 2015). Then, we fitted a six-order Fourier series to the derived Gaussian process bins and calculated the relevant features as in Soszyński et al. (2011, their Fig. 1).

We activated the RF option to adjust the relative weight of the two classes of the training set (RRL and “other”), in order to artificially balance the number of stars in each of them.

Finally, we fine-tuned the RF hyper-parameters, such as the maximum depth of the branches, the minimum number of objects in each leaf, the minimum number of objects required to split a node, and the number of random features to perform a split, by means of the GridSearchCV method of the sci-kit learn library. The GridSearchCV method allowed us to explore a grid of different combinations of these parameters, with the goal of maximizing the precision and the recall, described in the next section.

3.1.3 Validation

In order to validate the output catalog, a standard 10-fold sampling was used, in which the training set was divided into 10 sub-samples of equal size, and then, iteratively, each of these 10 was used as a “test science sample” and classified using the other 9 as the training set. The result of this experiment demonstrated that the nominal precision of our classifier, when used with this training set, is 92%, while the recall (completeness) is 97%. This means that in the output catalog of stars with more than 50% probability of being RRab, we expect that 92% are indeed RRab, with the other 8% being contamination from other variables. On the other hand, of the original sample of 18757 RRab in the training set, 97% were correctly classified as such, while 3% were incorrectly classified as other variables.

3.1.4 Post-Random-Forest selection and visual inspection

When applied to the real science sample of about 10 million candidate variables, the classifier yielded a catalog of ~490.000 stars with a probability higher than 50% of being RRab variables. From visual inspection of several dozens of them, however, we concluded that the above quoted precision and recall were overestimated for the real sample. This is due to the fact that the training set includes variables detected relatively far from the midplane (see Fig. 1), in a relatively low extinction region of the sky, where the photometry is more precise. In the strip within |b| <2.5°, instead, the extinction is larger. Variables are, therefore, fainter and not so well measured. In addition, the crowding is higher, making blends much more frequent. This can be verified by defining a signal-to-noise ratio (hereafter S/N) for the light curves, defined as S/N=ANMSE$\[S / N=A \frac{\sqrt{N}}{M S E}\]$(2)

where A is the Amplitude defined in Table 1, N is the number of epochs, and MSE is the mean square error with respect to a Gaussian Process fit (sort of smoothing) of the light curve. Figure 3 shows that the S/N distribution of RRL in the training set is significantly higher than that of our science sample. For the present analysis, we intentionally prioritized the purity (precision) of our catalog over the completeness (recall). Therefore, we blindly kept only the 9186 variables with >90% probability of being RRab and S/N > 60. In fact, as a safety check, we visually inspected and confirmed the light curves of a few hundred of those, randomly selected, but we also visually inspected all the candidates with probability >90% but S/N <60.

The candidates with probability of being RRab lower than 50% were definitely rejected. Those with probabilities in the range 50–90% (~480 000 stars) were submitted to further cuts in order to select the most reliable ones, and the latter were all visually inspected. These post-classification selection criteria were based on the parameters defined in Table 1, Group 4, namely: S/N, MSE, Nbins and skewness. The latter cut is because a large skewness selects the light curves with the asymmetric saw-tooth shape, which is very characteristic of RRab, and it is hard to be confused with other kinds of variables. A star having a light curve with this shape allowed us to be more generous in its classification as RRab, even if other parameters were not optimal. The MSE, instead, is the mean square difference with respect to the fit of an RRL template from Braga et al. (2019).

Specifically, we rejected all candidates with S/N ≲ 40 and Nbins>2 and skewness<0.04, that is, the light curves that were noisy, or very unevenly sampled, or very symmetric. From all the others (~180 000), we extracted the group of ~11500 with Prob>80% and MSE<0.25, and the group of ~4500 with 50%<Prob<80% and MSE<0.20. These cuts, shown in Appendix A were motivated by the attempt at isolating candidates as similar as possible as our golden sample, with Prob>90% and S/N > 60 (see Appendix A). These ~16 000 candidate RRab variables were all visually inspected and eventually lowered to ~4000.

Because we inspected the light curves of the majority of the RRab that made it to the final list, we trust that the present catalog optimizes purity. At the same time, since it relies upon visual classification as the final step, we realize the process leading to it is not completely objective nor repeatable. Yet, we believe that human classification is still the most reliable process, although the slowest. We designed the present RF classifier in order to lower the sample to be visually inspected, rather than as a tool to extract the final catalog in an objective way. We nonetheless provide the list of adopted features as an input for similar studies.

Note that the catalog released together with the present paper includes 3523 additional RRab from OGLE-IV, as explained in the next section, for a total of 16488 variables. We also release, at the CDS, the values of the features for the 12965 variables classified here, together with their light curves.

Table 1

Features used in the Random Forest classification.

thumbnail Fig. 3

Normalized S/N distribution of the light curves for stars in the RRab training set (red) and in the science sample (blue).

4 Comparison with other RRab catalogs in the same region

In what follows we compare our catalog with other ones available in the literature and including the same region. The footprints of these catalog, compared to the present one, are shown in Fig. 4. It is important to emphasize that none of these catalog is complete and free from contamination. Each of them has a different degree of these characteristics, which are both very difficult to assess a priori. For this reason, comparisons among different catalogs can help to characterize the performances of each, and perhaps choosing the most appropriate one, for future studies. As discussed below, only for the OGLE-IV catalog we decided to add the RRab not found in our selection, but belonging to the same sky region. We did this only for OGLE-IV because this survey has a very fine cadence optimized for RRL detection and the S/N of the light curves is very high (Fig. 3). These are the very high confidence variables that we adopted as training sample, therefore our analysis relies on the assumption that these are real RRab. The other catalogs listed below are based upon VVV data, like the present one. Therefore, although they reach closer to the Galactic midplane, they have sparser cadence and lower S/N light curves. Since our catalog has been optimized in favor of high purity, in contrast to larger completeness, we prefer to keep this characteristic and leave to the reader the option to add unmatched RRab from other public catalogs, if they consider it appropriate, based on their specific needs.

thumbnail Fig. 4

Footprints of the catalogs by Dékány & Grebel (2022) (top), Molnar et al. (2022) (middle), and Clementini et al. (2023) (bottom) are shown in blue, with a density scale shown at the right of each panel. Black points show the galactic positions of the RRab in the present catalog, including the addition from OGLE-IV.

4.1 OGLE-IV

There are 6881 RRab in common between our final catalog and OGLE-IV. On the other hand, 3523 OGLE RRab in the region of the sky analyzed here are not present in our catalog. With a flag (either “VVV” or “OGLE”) we added those to the present release, as we trust that they are real RRab, and they might be useful targets for future follow-up studies. For 39 of them we cannot provide a mean J magnitude, therefore a reddening or a distance. We leave to the reader the decision whether to used them or not. In the following analysis, we will show that the spatial distribution of these OGLE unmatched variables is biased toward close distances (see Sec. 6). That is, they are seen preferentially at the near side of the Galactic bulge and disk, as expected, given that OGLE-IV is an optical survey conducted with a small telescope. Therefore, they will be excluded from the analysis of the spatialdistribution of bulge RRab stars.

4.2 Dékány et al. (2020)

Out of the 16 488 RRab in our catalog (12 965 from VVV + 3523 from OGLE-IV), only 11067 are in common with Dékány et al. (2020). We could not identify anything special in the properties of the unmatched 5421 variables: they have a similar distribution as all the others in amplitude, period, mean magnitude, probability, and S/N.

On the other hand, there are ~1709 variables in the common region between Dékány et al. (2020) and this study that were considered RRab in the former but are not present in our catalog. They are preferentially lower amplitude RRab (<Amp>=0.23 mag, versus <Amp>=0.28 mag for the whole D20 catalog1), and lower S/N (<S/N>=65, versus <S/N>=88 for the whole D20 catalog), but have nothing special in either probability, period, or mean magnitude.

thumbnail Fig. 5

Top: galactic position of each RRab in our catalog. In gray, we mark those present only in our catalog, including the OGLE-IV addition. In red, we mark those also present in the Gaia DR3 main catalog, and in blue those also identified as RRL, presented in Clementini et al. (2023). Bottom: reddening distribution of the RRL in our catalog (gray). Within our sample, the RRL also found in Gaia, but not identified as RRL, are shown in red, while those identified as RRL are shown in blue. The color coding is the same in both panels.

4.3 VIVACE

Only 903 RRab in our catalog (~6%) are absent in the VIVACE catalog by Molnar et al. (2022). On one hand, there almost 8000 RRab in the VIVACE catalog that are not in our sample, although they do lie in the common region of the sky. This is a rather large number that can be explained because the VIVACE catalogs have been created with the explicit intention of prioritizing completeness over purity. Therefore, the latter should be used as a more complete list of candidates whose real nature must be double-checked before drawing scientific results from it. On the other hand, our list offers a cleaner catalog, though it is definitely not complete, especially at latitudes very close to the midplane.

4.4 Gaia DR3

There are 12 210 stars in common between the present catalog (including the addition from OGLE-IV) and Gaia DR3. Out of those, however, only 6816 were identified as RRL by Gaia (Clementini et al. 2023), as shown in Fig. 5. The figure clearly shows that whether one of our variables is present in Gaia or not, depends almost exclusively on its extinction, affecting the quality of the Gaia optical photometry more than the VVV one. Indeed, in the outer region, where Gaia can see through the interstellar dust, almost all our sources are measured by Gaia and identified as RRL. As shown by the histograms in the bottom panel of Fig. 5, the fraction of stars in our catalog detected by Gaia decreases with increasing E(J − KS), as it does the fraction of the stars classified as RRL by Clementini et al. (2023). At low reddening (E(J − KS) <1.1) the fraction of stars classified as RRL in our catalog but not in Clementini’s is marginal. This validates our strategy of building a pure catalog, although certainly incomplete, especially in the region closer to the Galactic plane.

thumbnail Fig. 6

Top panel: distribution of the estimated errors in the [Fe/H] as quoted in Crestani et al. (2022, their Table 2). The latter work is based on high dispersion spectra (R=35 000) for 143 RRL. Bottom: distribution of the estimated errors in the photometric [Fe/H] values inferred by Dékány & Grebel (2022). Photometric [Fe/H] values are inferred from the shape of the light curve, by means of a machine learning algorithm trained (indirectly) on the spectroscopic measurements by Crestani et al. (2022).

5 Metallicities, reddenings, and distances

RRL stars are excellent distance indicators because they follow a well-defined relation between the luminosity and the metallicity in the optical, and a period luminosity metallicity (PLZ) in the near-IR. The first ingredients to calculate the distances are the mean apparent magnitudes, to be compared with the absolute magnitudes given by the PLZ. The mean magnitudes in the KS band have been derived by means of the code by Dékány & Grebel (2022), which performs a Fourier fitting of the light curve, just like we did to derive the 3rd group of features in Table 1. The advantage of using the code by Dékány & Grebel (2022) is that it performs a bootstrap on the light curves, recalculates the mean by Fourier fitting at each iteration, and finally yields an error as the standard deviation of these determinations. The mean magnitude in the J band is more complicated to derive because the light curves include only ~10 points. The code by Hajdu et al. (2018) fits a template light curve to these points and derives the mean magnitude and error on the template light curve. The typical errors on the mean magnitudes are 0.005 and 0.060 mag in the KS and J band, respectively.

The metallicities for our RRab sample were estimated from the shape of their light curve, following the prescriptions by Dékány & Grebel (2022). The corresponding machine learning code available online also yields metallicity errors, which are peaked at Δ[Fe/H]=0.025 dex with a median value of Δ[Fe/H]=0.045 dex (Fig. 6, bottom). Nonetheless, the uncertainties on the derivation of these photometric metallicities should be at least slightly larger than the errors on the metallicities of the training sample, that is, the Crestani et al. (2022) sample, whose error distribution is shown in the top panel of Fig. 6. In order to correct this inconsistency, in what follows, we will adopt a constant Δ[Fe/H]=0.3 dex, as a conservative error estimate, for all the variables in our catalog (see Sec. 5.2).

Reddenings to individual variables have been estimated from the observed J − KS colors. From the distribution of RRab members of the globular cluster ω Centauri (Navarrete et al. 2015), we found that they are distributed around a mean intrinsic color of (J − KS)0=0.26 with a standard deviation of 0.03 mag, and an error on the mean of 0.003 mag. The above quoted dispersion of 0.03 magnitudes is negligible compared with the large range of reddenings observed toward the Galactic midplane (Surot et al. 2020), however, the effect of this approximation will be discussed in Sec. 5.2. We then calculated the color excess of individual variables as the difference between the observed J − KS color and the intrinsic value of 0.26 mag. In order to estimate the effect of reddening on the mean KS magnitudes, and therefore on the distances, we adopted the total to selective extinction ratio derived by Minniti et al. (2020), from a sample of classical Cepheids at the far side of the Galactic disk, that is, appropriate for the b = 0° direction, which is: AKS=(0.465±0.022)×E(JKS).$\[\mathrm{A}_{\mathrm{K}_{\mathrm{S}}}=(0.465 \pm 0.022) \times \mathrm{E}\left(\mathrm{J}-\mathrm{K}_{\mathrm{S}}\right).\]$(3)

5.1 Choosing the appropriate period–luminosity–metallicity relation

Several PLZ relations are available in the literature, allowing us to derive the absolute magnitudes of individual RRab variables from their observed periods (PLs) and periods plus metallicities (PLZs). With the absolute magnitude and the extinction estimated above, the distance can be derived from the definition of distance modulus: d=101+0.2(KSAKSMKS).$\[d=10^{1+0.2\left(K_S-A_{K_S}-M_{K_S}\right)}.\]$(4)

For the present work, we explored several recent PLZ relations, whose functional form are quoted in Table 2 for the reader’s convenience. Initially, we selected the PLZ relation from Muraveva et al. (2018, hereafter M18; their Table 4), derived based on Gaia DR2 parallaxes for a sample of MW field RRL with metallicities in the same range as those in our sample.

When analyzing the 3D distribution of the Galactic component traced by RRab variables, however, we noticed that metalrich RRab were placed systematically closer to the Sun, with respect to metal-poor RRab. Indeed, there is a trend between the distance obtained from the M18 relation and the photometric metallicity used as input (Fig. 7; top panel). Such a trend is certainly artificial, although at this time we cannot be sure whether it is due to the photometric metallicities being wrong or to the metallicity coefficient of the M18 PLZ being too large. In the first case, we would be attributing a different metallicity to two groups of variables that are, on average, identical. In this case, by means of the PLZ, we would be applying a correction factor that is not appropriate, artificially producing the observed segregation. We notice that a possible reason why the photometric metallicities derived here could have a large error is that our light curves are relatively noisy (Fig. 2), potentially resulting in the high-order Fourier coefficients being less reliable than they are for the RRab of the outer bulge, used by Dékány & Grebel (2022) to derive the calibration. The same effect would be seen if the estimated metallicity is correct but the metallicity coefficient of the PLZ is overestimated. With our data alone, we cannot distinguish between these two alternatives. Therefore, we decided to explore alternative relations available in the literature.

The PLZ provided by Neeley et al. (2019) showed a similar problem, actually larger (Fig. 7). At the time of writing this manuscript, new PLZ relations in the near-IR were published by (Zgirski et al. 2023, hereafter Z23), based on Gaia DR3 parallaxes for a sample of 28 local RRL variables, by Bhardwaj et al. (2023, hereafter B23), based on 964 RRL in 11 globular clusters and 346 field RRL with Gaia parallaxes, and by Prudil et al. (2024, hereafter P23), based on >100 field RRL stars with Gaia parallaxes. We applied all of them to the RRab in our catalog and checked against a residual trend of the derived distance with metallicity. The result is shown in the different panels of Fig. 7, together with a linear fit, revealing that the most appropriate relation for our metallicity scale is the PLZ by P23.

For all the above relations, we also checked against residual (spurious) trends against period, as shown in Fig. 8. These plots confirm that the PLZ by P23 is also the one that minimizes the trend with periods, although without completely removing it.

In order to calculate the reddening suffered by our RRab, we adopted a constant intrinsic color (J − KS)0=0.26 mag, independent from the period. Because the PLZ in the literature have a different coefficient for the period, in J and Ks, they predict a small dependence of the intrinsic color upon the period, that we have neglected in this work. The coefficients between the intrinsic (J − KS)0 and log P range from −0.01, for the PLZ by Zgirski et al. (2023), to −0.54 for the PLZ by Prudil et al. (2024), respectively. In order to verify that neglecting this dependence is not the reason for the trends of distances with period shown in Fig. 8, we calculated the differences between the adopted color (J − KS)0=0.26 mag and the color predicted by each of the PLZ explored above. The difference is small, reaching 0.05 mag at the edges of the period distribution, and it translates to a difference in AKs$\[\mathrm{A}_{\mathrm{K}_{\mathrm{s}}}\]$ of ~0.025 mag. The difference lies in the sense that, at low periods, the distance calculated here is larger than the one calculated by assuming the color predicted from the PLZ, and at high period the distance calculated here is smaller. In other words, had we assumed the expected color derived from the PLZ, instead than a fixed value, all the slopes in Fig. 8 would have been even larger.

The above exercise demonstrates that a new PL relation could be derived here, by using the present catalog alone. Indeed, we could derive a zero point by imposing that the mode of the projected distance coincides with the distance to the GC derived by the GRAVITY experiment and a slope that flattens the trend between distance and periods. We anticipate that the period slope that we would find by this method is −2.7, however, we postpone a more extensive discussion to a forthcoming paper, where we will properly include errors. The reason is that, as we will discuss below, errors on the distances are key to estimate the shape parameters traced by RRL. The reason why the PLZ by M18, B23, and Z23 perform well in the context where they were derived but not in the inner Galactic bulge remains to be fully understood.

Table 2

The PLZ relations examined in the present work.

thumbnail Fig. 7

Trend of the heliocentric distances to individual RRab in our catalog versus their photometric metallicity. From top to bottom, the adopted PLZ are those by Muraveva et al. (2018); Neeley et al. (2019); Zgirski et al. (2023); Bhardwaj et al. (2023); Prudil et al. (2024). A linear fit highlights the presence of a slope, whose value is written in the labels, that is minimized for the PLZ by Prudil et al. (2024), which we select as the optimal one for the present work.

5.2 Errors on the distances

It is important to characterize the error in the distance to individual stars because it will impact the inferred shape. To put it simple, because the error on the distances will always be larger than the error on the galactic coordinates, the distribution we will infer will always be artificially elongated (or extra-elongated) in the direction of the line of sight. A proper correction for this effect requires a knowledge of the distance errors as accurate as possible.

Besides the errors in the coefficients of the PLZ and in the extinction law quoted above, we estimated the errors in the periods to be negligible since our periods and the one derived by OGLE-IV for the stars in common, agree to the third decimal digit.

The statistical errors on the distances were calculated by applying the standard error propagation on equation 4. The errors on the mean magnitude of each RRab were derived from the code by Dékány et al. (2020) for the KS band and Hajdu et al. (2020) for the J band, as mentioned in Sec. 5. The error on the extinction comes from the propagation of the latter into Eq. (3). As explained at the beginning of this section, the error on the metallicity has been assumed to be Δ[Fe/H]=0.3 dex for all the stars.

We noticed that the difference in distance obtained when adopting one or the other PLZ, shown in the bottom panel of Fig. 9, is not negligible compared to the statistical error calculated as explained above, and shown in the top panel of the same figure. Therefore, we considered the former as an estimate of the systematic error that should be added to the global error budget for a realistic quantification of the total distance uncertainty. Accordingly, in what follows we will consider as the total error on the distance of each star, the square sum of the statistical error plus a linear fit to the systematic error as a function of distance.

It should be noted that the spread among the distances obtained with different PLZ is not, strictly speaking, a systematic error; it is not the same for all the stars. However, it is a source of error that it is not accounted for in the quoted uncertainty in the PLZ coefficients. Indeed, the difference between the distance derived from the M18 or the Z23 relations (which are at the two extremes) is larger than the formal, statistical error derived by applying the error propagation on each of the two. The way we quantify this systematic here is just an estimate, as we examined only five of the available PLZ in the literature.

For practical purposes, it does not affect our results, as the total error derived as explained above is just slightly larger than the statistical error. Nonetheless, we consider it important to highlight the fact that the errors on each of the ingredients of the PLZ seem to underestimate the total uncertainty on the derived distances.

thumbnail Fig. 8

Same as Fig. 7, but now distances are plotted against period. A linear fit highlights the presence of a slope, that should not be there, whose value is written on the labels. The slope is smaller for the PLZ by Prudil et al. (2024), finally used in the present analysis.

thumbnail Fig. 9

Top: distribution of the RRL distances and their statistical errors, derived adopting the PLZ by P23. Bottom: our estimate of the systematic error associated with the distance measurements, obtained as the standard deviation of the distances derived adopting the five different PLZ examined here.

6 Distance to the Galactic center

With the distances calculated in Sect. 5.1, the first sanity check to be done is to derive the distance to the GC (RGC), as already done by several authors using distance indicators. Recent determinations using RRL are found in Griv et al. (2020), based on OGLE-IV RRL, and in Majaess et al. (2018) using RRL from VVV at |b| > 4°. A review of results by different authors, derived by means of variable stars, is given by Bobylev & Bajkova (2021), who recommend a combined value of RGC=8.1±0.1 kpc. On the other hand, a review of results using different methods is given by Leung et al. (2023, their Fig. 1).

Here, the distances to individual variables were projected to the Galactic plane and then to the Sun-GC direction by means of their galactic latitude and longitude, respectively. A simple histogram was then constructed, including a correction factor for the so-called “cone effect”, that is, the fact that a fixed field of view maps larger volumes at larger distances, therefore the same distance bin, at larger distances, contains more stars than at shorter distances. The result is shown in Fig. 10. The distribution peaks at RGC=8.30 kpc, and it is qualitatively compatible with the sum of two Gaussians, one representing the disk and the other the bulge. As expected, our catalog is more complete on the near side of the disk (and bulge) than on the far side. The gray histogram shows the RRab from the OGLE-IV catalog that were not identified by our algorithm and were added later. Their distribution is much more skewed toward short distances due to the use of optical bands and the smaller telescope. Therefore, although we include these stars in the catalog, they are excluded from this and the following analysis.

The derived distance to the GC is consistent both with previous results with similar methods and with the much more precise result of RGC=8.275±0.034 kpc obtained by modeling the orbit of the stars around the Galaxy supermassive black hole (GRAVITY Collaboration 2021, 2022). More than claiming a new determination, we consider this qualitative analysis as a sanity check against important biases such as the one seen in the OGLE-IV variables. Indeed, it is reassuring that the selected PLZ, in addition of minimizing the spurious dependence of the distances upon metallicity and period, also has a zero point that sets the center of the RRab distribution at the correct position for the GC.

thumbnail Fig. 10

Histogram of the distances to individual variables, projected along the Sun-GC direction. The black histogram shows the 12 965 variables identified by our algorithm, while the light gray histogram shows the 3521 RRab that were added from the OGLE-IV catalog. The latter have a clear bias toward more nearby variables, and therefore, they are excluded from the analysis. The green dotted Gaussian represents the projected bulge density distribution, while the red dotted one refers to the disk. Their sum, reproducing the observed distribution, is the orange thick line. The latter is centered at RGC=8.30 kpc (blue vertical line).

7 The shape of the bulge’s RR Lyrae parent population

In order to infer the 3D distribution of RRab around the GC, the Galactic coordinates of the stars, together with their distances, were converted into standard Cartesian (X,Y,Z) positions by means of the standard formulae: X=dcoslcosbY=dsinlcosbZ=dsinb.$\[\begin{aligned}& \mathrm{X}=\mathrm{d} \cos l \cos b \\& \mathrm{Y}=\mathrm{d} \sin l \cos b \\& \mathrm{Z}=\mathrm{d} \sin b.\end{aligned}\]$

The resulting (X,Y) distribution is shown in Fig. 11, with the Sun located at the (0,0) position. A Kernel Density Estimator (KDE) is then applied to this distribution to highlight its shape and derive its parameters (ellipticity and inclination). The top panel of Fig. 12 shows the KDE with minimum smoothing. Color shades show density contours. Because different contours have different shapes, which are also highly variable with the adopted smoothing parameters, we decided to fit ellipses to several contours (bottom panel) and combine their ellipticity and inclination angle in order to provide values representative of the whole distribution. Accordingly, the median inclination angle between the major axis of the ellipses in Fig. 12 and the Sun-GC direction is ϕ=15.1 degrees, with a standard deviation of 3.04, while the median axis ratio b/a is 0.69±0.05. Therefore, the derived median ellipticity is e=1b2a2=0.72±0.07.$\[\mathrm{e}=\sqrt{1-\frac{b^2}{a^2}}=0.72 \pm 0.07.\]$

The shape parameters derived above suffer from two biases, which we discuss below.

First, the derived parameters would be correct under the hypothesis that the RRab completeness function is flat across the cartesian X,Y plane. That is, the catalog includes the same fraction of observed/real stars in every pixel of the plane in Fig. 11, so that the observed density is representative (though lower) of the true one. Unfortunately, there is no way to verify that. Even forcing a symmetry in the number of RRab at positive and negative longitudes would be inappropriate because the presence of a bar with different elongation parameters would result in a different degree of observed asymmetry in the density of tracers. In addition, and much more important, we know that the extinction is not uniform, across the sky. Although stochastic in nature, the specific reddening distribution of stars in the MW shows some degree of asymmetry between positive/negative projected coordinate Y, as shown in Fig. 13. As a consequence, the completeness of RRL is not symmetric with respect to the Y=0 direction, and, more importantly, the treatment of extinction may have an impact on the derived shape parameters, as it does not affect positive and negative Y in the same way. It is important to be aware of these effect, although we could not devise a way to correct for them.

As a second observational bias, we must take into account the effect of errors on the observed distribution. Indeed, because the errors in distance are significantly larger than the (negligible) errors in the Galactic coordinates, the observed distribution will always be more elongated along the line of sight compared to the real one, and the apparent inclination angle will be smaller (cf., Hey et al. 2023; Vislosky et al. 2024). In order to qualitatively illustrate this effect, we show in Fig. 14 how a typical error distribution would distort a perfectly spherical initial distribution (gray contours). Because the distance errors stretch the distribution along the line of sight, and because the latter is a cone, the observed distribution is egg-shaped, as shown by the black contours and color shading. This shape may be easily mistaken with an elongated bar, and parameters would be derived, that would not represent the original distribution. A proper quantification and correction of this effect requires proper modeling of the errors in distance and, therefore, depends on the sample.

We performed a simulation whereby an input mock distribution with a given inclination angle and axis ratio was affected by the error distribution shown in Fig. 9, and the observed parameters recovered. We iterated on the input distribution until we recovered the parameters actually observed in the present sample. In our case, we found that the difference between the input parameters and the observed ones is very small, within 1 degree for the inclination and negligible for the axis ratios. Nonetheless, shapes derived from other samples might be affected by larger errors. In order to provide the reader with an estimate of how important this effect can be for other tracers, we provide here a simulation of how a bar with axis ratio b/a=0.71 and inclination angle of ϕ=25 degrees would be observed at increasing fractional distance errors (Fig. 15). The input parameters adopted here have been chosen to be close to the typical values found in the literature, for the Galactic bar. The simulation shows that the effect on the angle is larger than the effect on the elongation, producing an observed angle of ~20 degrees, 5 degrees smaller than the real input one, already at 5% errors. In our case, the percent error on the distance is indeed very close to 5% (i.e., an error of ~0.45 kpc at a distance of 8 kpc, as shown in Fig. 9), which means that our observed angle of 15 degrees might be a few degrees smaller than the real one, closer to 20 degrees. On the other hand, if the same parameter is derived by means of RC stars, with a typical error of 10% or larger (Girardi 2016), then the observed angle could easily be half of the real one. As for the axis ratio, the effect is smaller. In our case, we should have recovered a value identical, within the errors, to the real one, while for RC stars, the bias would be closer to 0.1 if the true axis ratio is 0.7, as in this example. Of course, a proper simulation needs to be run for each case, iterating on the input parameters until the observed ones are recovered.

In summary, our best estimate for the shape of the structure traced by bulge RRab stars is a spheroid (we may call it a bar) with axis ratios b/a~0.7 and inclination with respect to the Sun-GC direction of ~20 degrees. This structure is less elongated than the one reported by Pietrukowicz et al. (2015), having an axis ratio of b/a~0.49, though not completely axisymmetric as quoted by Dékány et al. (2013). Taken at face values, the parameters derived here would indicate a bar that is less pronounced than the one traced by means of RC stars. The latter has been reported as having an axis ratio of 0.6 according to Wegg & Gerhard (2013), and of 0.43 and 0.44 according to Cao et al. (2013) and Simion et al. (2017), respectively. The inclination angle, on the other hand, is estimated between 25 and 30 degrees, according to different authors (see Shen & Zheng 2020, for a recent review). However, we should refrain from a direct comparison with values derived without considering the effect of observational errors discussed in the previous paragraph. In addition, as shown by Zoccali et al. (2017) and confirmed by Lim et al. (2021) with independent data, the metal-poor bulge RC stars trace a component that is much more spheroidal than that traced by metal-rich RC stars. Therefore, when tracing the bar using RC stars, it is crucial to separate the two components and de-project their 3D distribution independently. As discussed by Zoccali et al. (2018), and also evident from the figures in Lim et al. (2021), each of the two components makes up about half of the total number of stars in the bulge, and their relative proportion varies strongly across the bulge area. Therefore, as done so far, blindly tracing a mix of the two populations yields structural parameters that are not appropriate for any of the two.

Following the suggestion by Du et al. (2020), we investigated the possibility that RRL with different metallicities might trace different structures. For our sample, however, we did not find evidence in that sense (see Appendix B).

thumbnail Fig. 11

Cartesian position of individual RRab in our catalog, projected onto the Galactic plane X,Y. The white cross marks the GC, and the Sun is at (0,0).

thumbnail Fig. 12

Top: two dimensional KDE of the distribution of the RRab, projected onto the Galactic plane X,Y. Top: color shades show density contours. Bottom: ellipses (black lines) are fitted to several selected density contours.

thumbnail Fig. 13

Reddening versus Y coordinate for all the RRab in our catalogue. It is clear that the reddening distribution is not symmetric with respect to the Y=0 direction, therefore its treatment can have an effect on the derived shape parameters traced with RRL.

thumbnail Fig. 14

Qualitative representation of the effect of observational errors on the shape of the observed distribution of any sample of tracers. A perfectly spherical distribution, shown by the pink contours, is converted into an egg-shaped distribution (black contours) by the effect of the errors on the distances, which are always larger than the errors on the galactic coordinates. This would imply that a bar would be measured with a lower inclination angle with respect to the Sun-GC direction and a larger ellipticity.

thumbnail Fig. 15

Observed axis ratio (top) and inclination angle (bottom) as a function of the percent error on the distance for a fixed input value of both parameters, as indicated by the labels. The simulation shows that at larger distance errors, the observed bar is more elongated than its true counterpart, and its observed major axis tends to align with the Sun-GC direction.

8 Summary and conclusions

We provide a new catalog of 16488 RRab in the bulge region comprised within −10°≲1≲10° and −2.5°≲b≲2.5°, based on multi-epoch VVV PSF photometry. The first few lines of the catalog are given in Table 3, while the complete table is published in electronic form at the CDS. We selected bona fide RRab, among several million candidate variable stars, by means of an automatic classification based on a Random Forest algorithm. However, because the light curves of our candidate variables, close to the midplane, are noisier than their analog in the training set (OGLE-IV variables as seen by VVV), we accepted face value only the variables with a >90% probability of being RRab. Stars with 50–90% probability were passed through a few other post-classification filters and finally visually inspected. As a result, our catalog has a high purity in comparison with others in the literature, although it is not complete, especially in the range |b|<1°.

For the RRab in our catalog, we derived mean magnitudes in J and KS, reddening based on their mean colors, photometric metallicities based on their light curves, and distances adopting the PLZ recently provided by Prudil et al. (2024). We verified that the latter is the relation that minimizes the observed trend of distance versus metallicity and period, therefore providing a validation of this PLZ for the present bulge sample. We show that other recent PLZ do not perform equally well with our sample.

Finally, we derive the distribution of the RRab in our new catalog, projected on the Galactic plane. The result is an elongated spheroid with axis ratio b/a ~ 0.7 (ellipticity e ~ 0.7) and an inclination of ~ 20 degrees with respect to the Sun-GC direction. The above parameters describe a component that is slightly less elongated (i.e., rounder) than the MW main bar, as characterized by studies based on RC stars, and with a slightly smaller inclination angle. Nonetheless, we emphasize the impact that observational biases have on the shape parameters derived here, and also on those provided in the literature from the density of RC stars.

We conclude that observational errors on the distances can have an important impact on the reconstructed shape of RC stars, that has not been taken into account in previous studies. Also, previous studies based on RC stars have analyzed a mix of two populations (metal-rich and metal-poor), adopting a unique parameterization for both of them. Because we know that they have a different spatial distribution within the Galaxy, the result is most likely not representative of either of them. As for RRL, they are a cleaner sample and provide smaller errors on the distances. Therefore, observational errors have a much lower effect on the derived shape parameters. Unfortunately, their drawback is that their completeness is much lower and most likely uneven across the projected Galactic plane, which introduces different biases that are hard to model and correct. Yet, we argue that this is the RRab catalog offering the best combination of large statistics and high purity in the inner bulge, as we visually inspected a large fraction of their light curves, except for those with the largest S/N (>60) and probability of being RRab (>90%) according to our classifier. As such, we hope it will allow us to progress in our understanding of the oldest stellar component in the core of our Galaxy.

As a final remark, we note that with the present data we cannot exclude that we are dealing with a mix of two components, one belonging to the metal-poor spheroid and another one to the metal-rich bar. If that were the case, then it would be natural to derive a structure whose parameters are intermediate between the two, as we seem to find here, and in the literature. We argue, however, that contrary to what happens with bulge RC stars, it is not possible to separate the two (putative) RRL components according to their metallicity, as it shows a single peak distribution centered at [Fe/H]~−1.35 dex (Fig. 7). Perhaps a full 6D orbital analysis will allow a proper distinction, and for this reason it is very important to keep working toward producing clean RRL catalogs, reaching as close as possible to the Galactic midplane, where the density is higher, and for which we could slowly gather accurate 3D kinematics (see, e.g., Olivares Carvajal et al. 2024).

Table 3

A sample of the RRab catalog released at the CDS together with the present paper.

Data availability

Full Table 3 and lightcurves is available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/689/A240

Acknowledgements

We thank Márcio Catelan and Giuseppe Bono, for many useful discussions during the past few years. Based on observations taken within the ESO VISTA Public Survey VVV, Program ID 179.B-2002, made public at the ESO Archive and through the Cambridge Astronomical Survey Unit (CASU). This work is funded by ANID, Millenium Science Initiative, ICN12_009 and AIM23-0001, awarded to the Millennium Institute of Astrophysics (MAS), by the ANID BASAL Center for Astrophysics and Associated Technologies (CATA) through grant FB210003, and by FONDECYT Regular grant No. 1230731. C. Q. Z. acknowledges support from the National Agency for Research and Development (ANID), Scholarship Program Doctorado Nacional 2021 – 21211884, ANID. A.R.A. acknowledges support from DICYT through grant 062319RA. E.V. acknowledges the Excellence Cluster ORIGINS Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2094-390783311. J.O.C. acknowledges support from the National Agency for Research and Development (ANID) Doctorado Nacional grant 2021-21210865, ANID. A.V.N. acknowledges support from the National Agency for Research and Development (ANID), Scholarship Program Doctorado Nacional 2020 – 21201226, ANID.

Appendix A Post-classification selection

We illustrate here the criteria used in the post-RF selection. Let us recall that this further selection was aimed at discarding candidates that, though having probability of being RRab between 50% and 90%, differ from the golden sample with Prob>90% and S/N>60 in some new parameters. After this selection, the remaining candidates were all visually inspected.

thumbnail Fig. A.1

Distribution of MSE for three subsample of our catalogue. The panels show that the selection of candidates with Prob>80% and MSE<0.25 (top) or Prob=50–80% and MSE<0.2 (middle) allows us to isolate a sub-sample occupying the same parameter space of the golden sample with Prob>90% and S/N>60, here shown in red, that we use as a reference. The bottom panel shows the final sample resulting from the visual inspection (teal) compared with the reference sample (red).

thumbnail Fig. A.2

Example of a phased light curve that was discarded, for having Nbin=4. That is, if we bin the light curve with 20 bins along the Phase, 4 of them would be empty.

Appendix B The structure traced by RRL with different metallicity

Different authors suggested that metal-poor RRL trace a more axisymmetric structure than metal-rich one, with the latter following more closely the structure of the main Galactic bar (Du et al. 2020). In Fig. B.1 shows our attempt at reproducing their results, by dividing the RRab distribution at [Fe/H]=−1.4, close to the peak of their metallicity distribution. No significant difference is seen between the two distribution. While the innermost contours might indicate a rounder shape for the metal-rich RRL, opposite to previous results, a closer look indicate that this is just due to lower statistics making the axis ratio and inclination angle more stochastic. Indeed, both distributions show a larger variation of the parameters fitted to different contours.

Moving the metallicity cut does not significantly change the result. Nonetheless, we should note that if there were indeed two components with different metallicities, it is hard to believe that the correct separation between the two would coincide with the peak of the distribution. We should rather look for a dip in the distribution. Such feature has not been seen so far. Yet, if the separation is closer to one of the two edges of the metallicity distribution, then one of the two component would have too low statistics to be used as a robust tracer.

thumbnail Fig. B.1

Same as Fig. 12, for the metal-poor RRL (left) and the metal-rich ones (right).

References

  1. Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D. W., & O’Neil, M. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 252 [Google Scholar]
  2. Anderson, T. W., & Darling, D. A. 1952, Ann. Math. Statist. 23, 193 [CrossRef] [Google Scholar]
  3. Anderson, J., Bedin, L. R., Piotto, G., Yadav, R. S., & Bellini, A. 2006, A&A, 454, 1029 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  4. Bellini, A., Anderson, J., van der Marel, R. P., et al. 2014, ApJ, 797, 115 [Google Scholar]
  5. Bhardwaj, A. 2022, Universe, 8, 122 [NASA ADS] [CrossRef] [Google Scholar]
  6. Bhardwaj, A., Marconi, M., Rejkuba, M., et al. 2023, ApJ, 944, L51 [NASA ADS] [CrossRef] [Google Scholar]
  7. Bobylev, V. V., & Bajkova, A. T. 2021, Astron. Rep., 65, 498 [NASA ADS] [CrossRef] [Google Scholar]
  8. Braga, V. F., Stetson, P. B., Bono, G., et al. 2019, A&A, 625, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  9. Breiman, L. 2001, Mach. Learn., 45, 5 [Google Scholar]
  10. Cabral, J. B., Ramos, F., Gurovich, S., & Granitto, P. M. 2020, A&A, 642, A58 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  11. Cao, L., Mao, S., Nataf, D., Rattenbury, N. J., & Gould, A. 2013, MNRAS, 434, 595 [NASA ADS] [CrossRef] [Google Scholar]
  12. Carpenter, J. M., Hillenbrand, L. A., & Skrutskie, M. F. 2001, AJ, 121, 3160 [NASA ADS] [CrossRef] [Google Scholar]
  13. Catelan, M., & Smith, H. A. 2015, Pulsating Stars [Google Scholar]
  14. Cava, A., Schaerer, D., Richard, J., et al. 2018, Nat. Astron., 2, 76 [Google Scholar]
  15. Clementini, G., Ripepi, V., Garofalo, A., et al. 2023, A&A 674, A18 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  16. Contreras Ramos, R., Zoccali, M., Rojas, F., et al. 2017, A&A, 608, A140 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  17. Contreras Ramos, R., Minniti, D., Gran, F., et al. 2018, ApJ, 863, 79 [Google Scholar]
  18. Crestani, J., Fabrizio, M., Braga, V. F., et al. 2022, VizieR Online Data Catalog: J/ApJ/908/20 [Google Scholar]
  19. Daza-Perilla, I. V., Gramajo, L. V., Lares, M., et al. 2023, MNRAS, 520, 828 [NASA ADS] [CrossRef] [Google Scholar]
  20. Debattista, V. P., Ness, M., Gonzalez, O. A., et al. 2017, MNRAS, 469, 1587 [Google Scholar]
  21. Debattista, V. P., Liddicott, D. J., Gonzalez, O. A., et al. 2023, ApJ, 946, 118 [NASA ADS] [CrossRef] [Google Scholar]
  22. Dékány, I., & Grebel, E. K. 2020, ApJ, 898, 46 [CrossRef] [Google Scholar]
  23. Dékány, I., & Grebel, E. K. 2022, ApJS, 261, 33 [CrossRef] [Google Scholar]
  24. Dékány, I., Minniti, D., Catelan, M., et al. 2013, ApJ, 776, L19 [Google Scholar]
  25. Dessauges-Zavadsky, M., Schaerer, D., Cava, A., Mayer, L., & Tamburello, V. 2017, ApJ, 836, L22 [Google Scholar]
  26. Di Matteo, P. 2016, PASA, 33, e027 [CrossRef] [Google Scholar]
  27. Du, H., Mao, S., Athanassoula, E., Shen, J., & Pietrukowicz, P. 2020, MNRAS, 498, 5629 [CrossRef] [Google Scholar]
  28. Elorrieta, F., Eyheramendy, S., Jordán, A., et al. 2016, A&A, 595, A82 [CrossRef] [EDP Sciences] [Google Scholar]
  29. Emerson, J. P., Irwin, M. J., Lewis, J., et al. 2004, SPIE Conf. Ser., 5493, 401 [Google Scholar]
  30. Fragkoudi, F., Di Matteo, P., Haywood, M., et al. 2018, A&A, 616, A180 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  31. Fragkoudi, F., Grand, R. J. J., Pakmor, R., et al. 2020, MNRAS, 494, 5936 [Google Scholar]
  32. Girardi, L. 2016, ARA&A, 54, 95 [Google Scholar]
  33. González-Fernández, C., Hodgkin, S. T., Irwin, M. J., et al. 2018, MNRAS, 474, 5459 [Google Scholar]
  34. Gran, F., Minniti, D., Saito, R. K., et al. 2016, A&A, 591, A145 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  35. GRAVITY Collaboration (Abuter, R., et al.) 2021, A&A, 647, A59 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  36. GRAVITY Collaboration (Abuter, R., et al.) 2022, A&A, 657, L12 [NASA ADS] [CrossRef] [Google Scholar]
  37. Griv, E., Gedalin, M., Pietrukowicz, P., Majaess, D., & Jiang, I.-G. 2020, MNRAS, 499, 1091 [NASA ADS] [CrossRef] [Google Scholar]
  38. Guo, Y., Ferguson, H. C., Bell, E. F., et al. 2015, ApJ, 800, 39 [NASA ADS] [CrossRef] [Google Scholar]
  39. Guo, Y., Rafelski, M., Bell, E. F., et al. 2018, ApJ, 853, 108 [NASA ADS] [CrossRef] [Google Scholar]
  40. Hajdu, G., Dékány, I., Catelan, M., Grebel, E. K., & Jurcsik, J. 2018, ApJ, 857, 55 [Google Scholar]
  41. Hajdu, G., Dékány, I., Catelan, M., & Grebel, E. K. 2020, Exp. Astron., 49, 217 [Google Scholar]
  42. Hey, D. R., Huber, D., Shappee, B. J., et al. 2023, AJ, 166, 249 [NASA ADS] [CrossRef] [Google Scholar]
  43. Huertas-Company, M., Guo, Y., Ginzburg, O., et al. 2020, MNRAS, 499, 814 [NASA ADS] [CrossRef] [Google Scholar]
  44. Johnson, C. I., Rich, R. M., Simion, I. T., et al. 2022, MNRAS, 515, 1469 [NASA ADS] [CrossRef] [Google Scholar]
  45. Kim, D.-W., Protopapas, P., Byun, Y.-I., et al. 2011, ApJ, 735, 68 [NASA ADS] [CrossRef] [Google Scholar]
  46. Kinemuchi, K., Smith, H. A., Woźniak, P. R., McKay, T. A., & ROTSE Collaboration 2006, AJ, 132, 1202 [Google Scholar]
  47. Kormendy, J., & Kennicutt, Robert C., J. 2004, ARA&A, 42, 603 [NASA ADS] [CrossRef] [Google Scholar]
  48. Kunder, A., Rich, R. M., Koch, A., et al. 2016, ApJ, 821, L25 [CrossRef] [Google Scholar]
  49. Leung, H. W., Bovy, J., Mackereth, J. T., et al. 2023, MNRAS, 519, 948 [Google Scholar]
  50. Lim, D., Koch-Hansen, A. J., Chung, C., et al. 2021, A&A, 647, A34 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  51. Majaess, D., Dékány, I., Hajdu, G., et al. 2018, Ap&SS, 363, 127 [NASA ADS] [CrossRef] [Google Scholar]
  52. Martínez-Vázquez, C. E., Monelli, M., Bernard, E. J., et al. 2017, ApJ, 850, 137 [CrossRef] [Google Scholar]
  53. Medina, N., Borissova, J., Bayo, A., et al. 2018, ApJ, 864, 11 [NASA ADS] [CrossRef] [Google Scholar]
  54. Minniti, D., Lucas, P. W., Emerson, J. P., et al. 2010, New A, 15, 433 [Google Scholar]
  55. Minniti, J. H., Sbordone, L., Rojas-Arriagada, A., et al. 2020, A&A, 640, A92 [EDP Sciences] [Google Scholar]
  56. Molnar, T. A., Sanders, J. L., Smith, L. C., et al. 2022, MNRAS, 509, 2566 [NASA ADS] [Google Scholar]
  57. Muraveva, T., Delgado, H. E., Clementini, G., Sarro, L. M., & Garofalo, A. 2018, MNRAS, 481, 1195 [Google Scholar]
  58. Navarrete, C., Contreras Ramos, R., Catelan, M., et al. 2015, A&A, 577, A99 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  59. Neeley, J. R., Marengo, M., Freedman, W. L., et al. 2019, MNRAS, 490, 4254 [Google Scholar]
  60. Ness, M., Freeman, K., Athanassoula, E., et al. 2012, ApJ, 756, 22 [NASA ADS] [CrossRef] [Google Scholar]
  61. Nikzat, F., Ferreira Lopes, C. E., Catelan, M., et al. 2022, A&A, 660, A35 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  62. Nun, I., Protopapas, P., Sim, B., et al. 2015, arXiv e-prints [arXiv:1506.00010] [Google Scholar]
  63. Olivares Carvajal, J., Zoccali, M., De Leo, M., et al. 2024, A&A, 687, A312 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  64. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
  65. Pietrukowicz, P., Udalski, A., Soszyński, I., et al. 2012, ApJ, 750, 169 [NASA ADS] [CrossRef] [Google Scholar]
  66. Pietrukowicz, P., Kozłowski, S., Skowron, J., et al. 2015, ApJ, 811, 113 [Google Scholar]
  67. Pietrukowicz, P., Udalski, A., Soszyński, I., et al. 2020, Acta Astron., 70, 121 [NASA ADS] [Google Scholar]
  68. Prudil, Z., Kunder, A., Dekany, I., & Koch-Hansen, A. J. 2024, A&A, 684, A176 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  69. Queiroz, A. B. A., Chiappini, C., Perez-Villegas, A., et al. 2021, A&A, 656, A156 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  70. Richards, J. W., Starr, D. L., Butler, N. R., et al. 2011, ApJ, 733, 10 [NASA ADS] [CrossRef] [Google Scholar]
  71. Rojas-Arriagada, A., Recio-Blanco, A., Hill, V., et al. 2014, A&A, 569, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  72. Schwarzenberg-Czerny, A. 1989, MNRAS, 241, 153 [Google Scholar]
  73. Semczuk, M., Dehnen, W., Schönrich, R., & Athanassoula, E. 2022, MNRAS, 509, 4532 [Google Scholar]
  74. Shen, J., & Zheng, X.-W. 2020, Res. Astron. Astrophys., 20, 159 [CrossRef] [Google Scholar]
  75. Simion, I. T., Belokurov, V., Irwin, M., et al. 2017, MNRAS, 471, 4323 [NASA ADS] [CrossRef] [Google Scholar]
  76. Smith, H. A. 1995, Camb. Astrophys. Ser., 27 [Google Scholar]
  77. Soszyński, I., Dziembowski, W. A., Udalski, A., et al. 2011, Acta Astron., 61, 1 [NASA ADS] [Google Scholar]
  78. Soszyński, I., Udalski, A., Wrona, M., et al. 2019, Acta Astron., 69, 321 [Google Scholar]
  79. Stetson, P. B. 1987, PASP, 99, 191 [Google Scholar]
  80. Stringer, K. M., Long, J. P., Macri, L. M., et al. 2019, AJ, 158, 16 [Google Scholar]
  81. Stringer, K. M., Drlica-Wagner, A., Macri, L., et al. 2021, ApJ, 911, 109 [Google Scholar]
  82. Surot, F., Valenti, E., Gonzalez, O. A., et al. 2020, A&A, 644, A140 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  83. Udalski, A., Szymański, M. K., & Szymański, G. 2015, Acta Astron., 65, 1 [NASA ADS] [Google Scholar]
  84. Vislosky, E., Minchev, I., Khoperskov, S., et al. 2024, MNRAS, 528, 3576 [NASA ADS] [CrossRef] [Google Scholar]
  85. Walker, A. R. 1989, PASP, 101, 570 [Google Scholar]
  86. Wegg, C., & Gerhard, O. 2013, MNRAS, 435, 1874 [Google Scholar]
  87. Zgirski, B., Pietrzyński, G., Górski, M., et al. 2023, ApJ, 951, 114 [NASA ADS] [CrossRef] [Google Scholar]
  88. Zoccali, M., Vasquez, S., Gonzalez, O. A., et al. 2017, A&A, 599, A12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  89. Zoccali, M., Valenti, E., & Gonzalez, O. A. 2018, A&A, 618, A147 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

1

Note that the whole amplitude range for RRab in KS is 0.1–0.5 mag, hence this difference is non-negligible.

All Tables

Table 1

Features used in the Random Forest classification.

Table 2

The PLZ relations examined in the present work.

Table 3

A sample of the RRab catalog released at the CDS together with the present paper.

All Figures

thumbnail Fig. 1

Galactic coordinates of the 16 488 RRL in the catalog presented here, color-coded according to their extinction E(J − KS). Overlaid in gray are the coordinates of RRab variables in the OGLE IV catalog.

In the text
thumbnail Fig. 2

Phased, KS -band light curves for two typical RRab variables in our sample.

In the text
thumbnail Fig. 3

Normalized S/N distribution of the light curves for stars in the RRab training set (red) and in the science sample (blue).

In the text
thumbnail Fig. 4

Footprints of the catalogs by Dékány & Grebel (2022) (top), Molnar et al. (2022) (middle), and Clementini et al. (2023) (bottom) are shown in blue, with a density scale shown at the right of each panel. Black points show the galactic positions of the RRab in the present catalog, including the addition from OGLE-IV.

In the text
thumbnail Fig. 5

Top: galactic position of each RRab in our catalog. In gray, we mark those present only in our catalog, including the OGLE-IV addition. In red, we mark those also present in the Gaia DR3 main catalog, and in blue those also identified as RRL, presented in Clementini et al. (2023). Bottom: reddening distribution of the RRL in our catalog (gray). Within our sample, the RRL also found in Gaia, but not identified as RRL, are shown in red, while those identified as RRL are shown in blue. The color coding is the same in both panels.

In the text
thumbnail Fig. 6

Top panel: distribution of the estimated errors in the [Fe/H] as quoted in Crestani et al. (2022, their Table 2). The latter work is based on high dispersion spectra (R=35 000) for 143 RRL. Bottom: distribution of the estimated errors in the photometric [Fe/H] values inferred by Dékány & Grebel (2022). Photometric [Fe/H] values are inferred from the shape of the light curve, by means of a machine learning algorithm trained (indirectly) on the spectroscopic measurements by Crestani et al. (2022).

In the text
thumbnail Fig. 7

Trend of the heliocentric distances to individual RRab in our catalog versus their photometric metallicity. From top to bottom, the adopted PLZ are those by Muraveva et al. (2018); Neeley et al. (2019); Zgirski et al. (2023); Bhardwaj et al. (2023); Prudil et al. (2024). A linear fit highlights the presence of a slope, whose value is written in the labels, that is minimized for the PLZ by Prudil et al. (2024), which we select as the optimal one for the present work.

In the text
thumbnail Fig. 8

Same as Fig. 7, but now distances are plotted against period. A linear fit highlights the presence of a slope, that should not be there, whose value is written on the labels. The slope is smaller for the PLZ by Prudil et al. (2024), finally used in the present analysis.

In the text
thumbnail Fig. 9

Top: distribution of the RRL distances and their statistical errors, derived adopting the PLZ by P23. Bottom: our estimate of the systematic error associated with the distance measurements, obtained as the standard deviation of the distances derived adopting the five different PLZ examined here.

In the text
thumbnail Fig. 10

Histogram of the distances to individual variables, projected along the Sun-GC direction. The black histogram shows the 12 965 variables identified by our algorithm, while the light gray histogram shows the 3521 RRab that were added from the OGLE-IV catalog. The latter have a clear bias toward more nearby variables, and therefore, they are excluded from the analysis. The green dotted Gaussian represents the projected bulge density distribution, while the red dotted one refers to the disk. Their sum, reproducing the observed distribution, is the orange thick line. The latter is centered at RGC=8.30 kpc (blue vertical line).

In the text
thumbnail Fig. 11

Cartesian position of individual RRab in our catalog, projected onto the Galactic plane X,Y. The white cross marks the GC, and the Sun is at (0,0).

In the text
thumbnail Fig. 12

Top: two dimensional KDE of the distribution of the RRab, projected onto the Galactic plane X,Y. Top: color shades show density contours. Bottom: ellipses (black lines) are fitted to several selected density contours.

In the text
thumbnail Fig. 13

Reddening versus Y coordinate for all the RRab in our catalogue. It is clear that the reddening distribution is not symmetric with respect to the Y=0 direction, therefore its treatment can have an effect on the derived shape parameters traced with RRL.

In the text
thumbnail Fig. 14

Qualitative representation of the effect of observational errors on the shape of the observed distribution of any sample of tracers. A perfectly spherical distribution, shown by the pink contours, is converted into an egg-shaped distribution (black contours) by the effect of the errors on the distances, which are always larger than the errors on the galactic coordinates. This would imply that a bar would be measured with a lower inclination angle with respect to the Sun-GC direction and a larger ellipticity.

In the text
thumbnail Fig. 15

Observed axis ratio (top) and inclination angle (bottom) as a function of the percent error on the distance for a fixed input value of both parameters, as indicated by the labels. The simulation shows that at larger distance errors, the observed bar is more elongated than its true counterpart, and its observed major axis tends to align with the Sun-GC direction.

In the text
thumbnail Fig. A.1

Distribution of MSE for three subsample of our catalogue. The panels show that the selection of candidates with Prob>80% and MSE<0.25 (top) or Prob=50–80% and MSE<0.2 (middle) allows us to isolate a sub-sample occupying the same parameter space of the golden sample with Prob>90% and S/N>60, here shown in red, that we use as a reference. The bottom panel shows the final sample resulting from the visual inspection (teal) compared with the reference sample (red).

In the text
thumbnail Fig. A.2

Example of a phased light curve that was discarded, for having Nbin=4. That is, if we bin the light curve with 20 bins along the Phase, 4 of them would be empty.

In the text
thumbnail Fig. B.1

Same as Fig. 12, for the metal-poor RRL (left) and the metal-rich ones (right).

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.