A2A: 21,000 bulge stars from the ARGOS survey with stellar parameters on the APOGEE scale

We use the data-driven method, The Cannon, to bring 21,000 stars from the ARGOS bulge survey, including 10,000 red clump stars, onto the parameter and abundance scales of the cross-Galactic survey, APOGEE, obtaining rms precisions of 0.10 dex, 0.07 dex, 74 K, and 0.18 dex for [Fe/H], [Mg/Fe], Teff, and log(g), respectively. The re-calibrated ARGOS survey - which we refer to as the A2A survey - is combined with the APOGEE survey to investigate the abundance structure of the Galactic bulge. We find X-shaped [Fe/H] and [Mg/Fe] distributions in the bulge that are more pinched than the bulge density, a signature of its disk origin. The mean abundance along the major axis of the bar varies such that the stars are more [Fe/H]-poor and [Mg/Fe]-rich near the Galactic center than in the long bar/outer bulge region. The vertical [Fe/H] and [Mg/Fe] gradients vary between the inner bulge and long bar with the inner bulge showing a flattening near the plane that is absent in the long bar. The [Fe/H]-[Mg/Fe] distribution shows two main maxima, an ``[Fe/H]-poor [Mg/Fe]- rich'' maximum and an ``[Fe/H]-rich [Mg/Fe]-poor'' maximum, that vary in strength with position in the bulge. In particular, the outer long bar close to the Galactic plane is dominated by super-solar [Fe/H], [Mg/Fe]-normal stars. Stars composing the [Fe/H]-rich maximum show little kinematic dependence on [Fe/H], but for lower [Fe/H] the rotation and dispersion of the bulge increase slowly. Stars with [Fe/H]<-1 dex have a very different kinematic structure than stars with higher [Fe/H]. Comparing with recent models for the Galactic boxy-peanut bulge, the abundance gradients and distribution, and the relation between [Fe/H] and kinematics suggest that the stars comprising each maximum have separate disk origins with the ``[Fe/H]-poor [Mg/Fe]-rich'' stars originating from a thicker disk than the ``[Fe/H]-rich [Mg/Fe]-poor'' stars.


Introduction
The Milky Way bulge is notoriously difficult and expensive to observe due to the high extinction along our sightline to the Galactic Centre. Nevertheless, over the past two decades, the number of spectroscopically observed bulge stars has increased from a few hundred to tens of thousands thanks to multiple spectroscopic stellar surveys such as argos (Freeman et al. 2012), Gaia-eso (Gilmore et al. 2012), GIBS (Zoccali et al. 2014), apogee (Majewski 2016), and Gaia (Cropper et al. 2018).
The extensive coverage of these spectroscopic surveys has led to many novel discoveries and vastly improved our understanding of bulge formation and evolution. We know from its wide, multi-peaked metallicity distribution function (MDF) that the bulge is composed of a mixture of stellar populations. This is further supported by the different populations, defined by their metallicities, exhibiting different kinematics (Hill et al. 2011;Ness et al. 2013b;Rojas-Arriagada et al. 2014Zoccali et al. 2017). Through careful chemodynamical dissection, the swylie@mpe.mpg.de bulge has been found to contain stars that are part of the bar, inner thin and thick disks, and a pressure supported component (Queiroz et al. 2020b). Furthermore, there is evidence that the bulge also contains a remnant of a past accretion event, the inner Galaxy structure (Horta et al. 2021). Multiple age studies of the bulge have reported that while the bulge is mainly composed of old stars (∼10 Gyr), it contains a non-negligible fraction of younger stars (Bensby et al. 2013(Bensby et al. , 2017Schultheis et al. 2017;Bovy et al. 2019;Hasselquist et al. 2020).
While analysis of these surveys has greatly improved our understanding of the bulge, direct comparisons of studies that use different survey data, as well as combinations of the measurements of the stars from different survey pipelines is problematic. This is because different surveys use different selection criteria, wavelength coverage, and spectral resolution. Furthermore, they employ different data analysis methods, assume different underlying stellar models, and make different approximations to derive stellar parameters and individual element abundances from their spectra (see Jofré et al. (2019) for a review).
Despite these inconsistencies, analyses that employ stars from different surveys are often compared, leading to uncertainty Article number, page 1 of 26 arXiv:2106.14298v1 [astro-ph.GA] 27 Jun 2021 A&A proofs: manuscript no. aanda as to whether the results reflect intrinsic properties of the Galaxy or if they are simply due to different observing and data processing strategies. For example, Zoccali et al. (2017) and Rojas-Arriagada et al. (2017) found bi-modal bulge MDFs using data from the Gaia-eso and GIBS surveys, Rojas-Arriagada et al. (2020) found a three component bulge MDF using data from the apogee survey, and Ness et al. (2013a) found a five component bulge MDF using data from the argos survey. Because the stars in these surveys have not been observed and analyzed in the exact same manner, it is unclear whether these differences in the bulge MDF arise because of different parameter and abundance scales or because of different selection functions.
In this paper, we use the data driven method, The Cannon (Ness et al. 2015), to put 21,000 stars from the Galactic bulge survey argos onto the parameter and abundance scales of the cross-Galactic survey apogee. Of these 21,000 stars, there are roughly 10,000 red clump (RC) stars with accurate distances. By rectifying the scale differences between the two surveys, we can combine them and gain a deeper coverage of the Galactic bulge. We call the re-calibrated argos catalog the a2a catalog as we are putting argos stars onto the apogee scale. Then, using the combined a2a and apogee surveys, we investigate the chemodynamical structure of the bulge. Specifically, we examine how the Iron abundance and Magnesium enhancement vary over the bulge as well as their kinematic dependencies.
The paper is structured as follows: In Section 2 we describe the argos and apogee surveys as well as highlight the inconsistencies between them which make directly combining them questionable. In Section 3 we summarize the technical background of The Cannon. In Section 4 we explain how we apply The Cannon to the argos catalog to create the a2a catalog. In Section 4 we also describe the three validation tests we perform to verify that the label transfer was successful. In Section 5 we discuss the selection functions of the a2a and apogee surveys. In Section 6 we use the a2a and apogee catalogs to examine the abundance structure of the Galactic bulge. In Section 7 we discuss the results of the paper in more detail and finally in Section 8 we end the paper with our conclusions.

Data
In this section, we provide some background on the data used in this paper before discussing their main properties.

ARGOS
The Abundance and Radial Velocity Galactic Origin Survey (argos; Freeman et al. (2012); Ness et al. (2013a,b)) is a medium resolution spectroscopic survey designed to observe RC stars in the Galactic bulge. Using the AAOmega fibre spectrometer on the Anglo-Australian Telescope, ARGOS observed nearly 28, 000 stars located in 28 fields directed towards the bulge. The field locations are shown as green ellipses in Figure 1. The observations were performed across a wavelength region of 840−885 nm at a resolution of R = λ/δλ 11, 000, where δλ is the spectral resolution element.
The argos team determined the Iron abundance ([Fe/H]), surface gravity (log(g)), and alpha enhancement ([α/Fe]) of each star in their catalog using χ 2 minimisation to find the best fit between the observed spectra and a library of synthetic spectra. The Local Thermodynamic Equilibrium stellar synthesis program MOOG (Sneden et al. 2012) was used to generate the library of spectra. The effective temperature (T eff ) was determined from the stellar colours (J − K s ) 0 using the calibration by Bessell et al. (1998). For more information about the argos parameter and abundance determination process see Freeman et al. (2012).
The argos catalog used in this paper contains 25, 712 stars/spectra from the original 28, 000 observed. The missing stars/spectra were removed because they had low signal-to-noise (SNR) and poor quality spectra (Freeman et al. 2012). The number of pixels of each argos flux array is 1697. To process the data, we re-normalized the remaining argos spectra by dividing each by a Gaussian-smoothed version of itself, with the Calcium-triplet lines removed, using a smoothing kernel of 10 nm. We also transformed each spectrum to a common rest frame and masked out the diffuse interstellar band at 8621 nm. We also masked out the region around 8429 nm as we found that this region had a strong residual between the mean spectra of positive and negative velocity stars, indicating that it is systematically affected by the velocity shift.

APOGEE
The Apache Point Observatory Galactic Evolution Experiments (apogee; Majewski (2016)) is a program in the Sloan Digital Sky Survey (SDSS) that was designed to obtain high resolution spectra of red giant stars located in all major components of the Galaxy. The survey operates two telescopes ; one in each hemisphere with identical spectrographs which observe in the nearinfrared between 1.5µm to 1.7µm at a resolution of R 22, 500. In this work, we use the latest data release, DR16 (Ahumada et al. 2019), which is the first data release to contain stars observed in the southern hemisphere. The locations of the apogee fields used in this work are shown in Figure 1 as blue ellipses and red crosses.
Stellar parameters and abundances of the apogee stars used in this work were obtained from the apogee Stellar Parameters and Chemical Abundance Pipeline (ASPCAP; García Pérez et al. (2016); Holtzman et al. (2018); Jönsson et al. (2020)). This pipeline uses the radiative transfer code Turbospectrum (Plez et al. 1992;Plez 2012) to build a grid of synthetic spectra. The parameters and abundances were determined using the code FERRE (Allende Prieto et al. 2006) which iteratively calculates the best-fit between the synthetic and observed spectra. The fundamental atmospheric parameters such as log(g), T eff , and overall metallicity were determined by fitting the entire apogee spectrum of a star. Individual elemental abundances were determined by fitting spectral windows within which the spectral features of a given element are dominant.
We obtain spectrophotometric distances for the apogee stars from the AstroNN catalog Mackereth et al. 2019a), which derives them from a deep neural network trained on stars common to both apogee and Gaia (Gaia Collaboration et al. 2018).
In this work, we specifically focus on apogee stars located in fields directed towards the bulge with |l f | < 35°and |b f | < 13°w here l f and b f are the Galactic longitude and latitude locations of the fields. We remove 6 fields that were designed to observe the core of Sagittarius. We require that the stars are part of the apogee main sample (MSp) by setting the apogee flag, EXTRATARG, to zero. We refer to this sample as the apogee bulge MSp. To ensure that the stars we use have trustworthy parameters and abundances, we also require the stars to have valid ASPCAP parameters and abundances, SNR ≥ 60, T eff ≥ 3200 K, and no Star_Bad flag set (23rd bit of ASPCAPFLAG = 0). After applying these cuts, there are 172 remaining bulge fields containing 37, 313 stars. For reference, we refer to this sample as the HQ apogee bulge MSp.
In the analysis sections of this work, unless explicitly stated otherwise, we further restrict our apogee sample to only stars for which we can obtain good selection function estimates. This sample contains 23, 512 stars and we refer to it as the HQSSF apogee bulge MSp. See Section 5.2 for further details on this sample.

Survey Inconsistencies
After the removal of potential binaries though visual inspection of individual spectra, we find 204 stars that are observed by both the apogee and argos surveys. Using these stars we can determine whether the surveys are consistent by checking that they derive the same parameters and abundances for the same stars.
In Figure 2, we compare the [Fe/H], α-enhancements, T eff , and log(g) of the common stars. The argos α-enhancement is the average of the individual α-elements over Iron ([α/Fe] Figure 2 shows that the surveys roughly agree between ∼ − 0.75 dex and ∼0 dex (limits in apogee [Fe/H]) with a scatter of ∼0.16 dex. However, beyond these limits, the deviation between the surveys increases, reaching up to ∼0.4 dex. The α-enhancement Magnesium-enhancement comparison in the second plot shows that the argos [α/Fe] estimates are on average ∼0.07 dex larger than the apogee [Mg/Fe] estimates. If we instead compare the argos [α/Fe] estimates to the apogee [α/Fe] estimates then the bias and rms of the distribution are larger at 0.12 dex and 0.15 dex respectively. The T eff comparison in the third plot shows that the argos T eff estimates are on average ∼300 K hotter than the apogee T eff estimates. Finally, while there is a lot of scatter, the log(g) comparison in the last plot shows that the argos log(g) values are generally higher than the apogee log(g) values.
The parameter comparisons show that for most of the common stars, the apogee and argos parameters differ significantly. This could be due to a number of factors such as observing in different wavelength regions (e.g. optical in argos versus infrared in apogee), their use of different data analysis methods (e.g. photometric temperatures in argos versus spectroscopic temperatures in apogee), or their use of different stellar models. In the following sections, we use the data driven method, The Cannon, and this set of common stars to bring the apogee and argos sur-veys on to the same parameter and abundance scales, thereby correcting the deviations we see in Figure 2.

The Cannon Method
The Cannon is a data-driven method that can cross-calibrate spectroscopic surveys. It has the advantage that it is very fast, requires no direct spectral model, and has measurement accuracy comparable to physics based methods even at lower SNR. The Cannon has been previously used to put different surveys on the same parameter and abundance scales using common stars (Casey et al. 2017;Ho et al. 2017;Birky et al. 2020;Galgano et al. 2020;Wheeler et al. 2020).
The Cannon uses a set of reference objects with known labels (i.e. T eff , log(g), [Fe/H], [X/Fe] ...), that well describe the spectral variability, to build a model to predict the spectrum from the labels. This model is then used to re-label the remaining stars in the survey. The word "label" is a machine learning term that we use here to refer to stellar parameters and abundances together with one term. The set of common stars used to build the model is called the reference set.
The Cannon is built on two main assumptions: 1. Stars with the same set of labels have the same spectra. 2. Spectra vary smoothly with changing labels.
Consider two surveys, A and B, where we want to put the stars in survey A onto the label scales of survey B using The Cannon. Assume we also have the required set of common stars between the two surveys to form the reference set. To cross calibrate the surveys, The Cannon performs two main steps: the training step and the application step. During the training step, the spectra from survey A and the labels from survey B of the reference set stars are used to train a generative model. Then, given a set of labels, this model predicts the probability density function for the flux at each wavelength/pixel. During the application step, the spectra of a new set of survey A stars (not the reference set) are relabeled by the trained model. We call this set of stars the application set. If the region of the spectra fit to carries the label information and the reference set well represents the application set, then the new labels of survey A's stars should be on survey B's label scales. The success of the re-calibration can be quantified with a cross-validation procedure, the pick-on-out test described in Section 4.2, which returns the systematic uncertainty with which labels can be inferred from the data, as well as a comparison of the generated model spectra to the observational spectra for individual stars, using a χ 2 metric.
In the next two subsections, we will describe the main steps of The Cannon in more detail.

The Training Step
During the training step, a generative model is trained such that it takes the labels as input and returns the flux at each wavelength/pixel of the spectrum. The functional form of the generative model, f nλ , can be written as a matrix equation: where θ λ is a coefficient matrix, l n is a label matrix, and σ is the noise. The subscript n indicates the reference set star while subscript λ indicates the wavelength/pixel. The coefficient matrix, θ λ , contains the coefficients that control how much each label affects the flux at each pixel. The coefficients are calculated during the training step.
Here the label matrix, l n , is quadratic in the labels and for each star (each column in the matrix) has the form: (2) If, for example, the generative model is trained on the labels T eff and log(g), then each column in the label matrix would be: The noise, σ, is the rms combination of the uncertainty in the flux at each wavelength due to observational errors, σ nλ , and the intrinsic scatter at each wavelength in the model, s λ .
Equation 1 corresponds to the single-pixel log-likelihood function: During the training step, the coefficient matrix θ λ and the model scatter s λ are determined by optimising the single-pixel loglikelihood in Equation 4 for every pixel separately: ln p( f nλ |θ T λ , l n , s λ ).
During this step The Cannon uses the reference set stars to provide the label matrix, l n . The label matrix is held fixed while the coefficient matrix and model scatter are treated as free parameters.

The Application Step
In the training step, we have the label matrix, l n , and we solve for the coefficient matrix θ λ and the scatter s λ . In the application step, we do the opposite: we have the coefficient matrix θ λ and the scatter s λ and we solve for a new label matrix, l m . The subscript m is used in this step because the label matrix now corresponds to stars in the application set, not the reference set. The label matrix, l m , is solved for by optimising the same log-likelihood function as Equation 4. However, here this optimisation is performed using a non-linear least squares fit over the whole spectrum, instead of per pixel:

A2A Catalog
In this paper, we use The Cannon to put the stars from the argos survey onto the apogee survey's label scales for the following labels: [

Reference and Application Sets
A reference set of stars, which is used to train The Cannon's model, is comprised of the stars that are observed by both surveys. The labels for this reference set come from the survey with the desired label scale (in our case apogee), while the spectra are taken from the other (in our case argos). The model that is learned at training time should only be applied to stars that are well represented by the reference set. That is, applied to stars that span the label region of the training data, within which the model can interpolate but need not extrapolate. This can also be thought of as a selection in spectra. In our case, the 204 stars that are common to both the apogee and the argos surveys (discussed in Section 2.3) form the reference set for our Cannon model. The average SNR of the reference set is 46 for argos and 107 for apogee. These reference set stars are found in the following intervals in the argos parameter space: We ignore the limits in A k as this label was only included to stabilize the fits to the other labels. As such, we do not use the learned A k label for science. There are 20,435 argos stars (∼79% of the argos catalog) within the 4D parameter space defined by intervals 7a to 7d. These stars are considered to be well represented by the reference set and normally would form our application set. However, there are many stars with parameter values close to but just outside of the reference set limits. For example, if we could extend all the limits by 1σ ARG , equal to the argos observational error of each label, then we would include 2704 more stars in the application set (∼10.5% more of the argos catalog). Because these stars are still close to the reference set stars, the labels returned by The Cannon for these stars may be correct to the first order. To test whether we can extend any of the limits we use the following procedure:  Figure 3, we compare the output labels produced by the original Cannon model against the output labels produced by the m1σ models trained on the reduced reference sets. The original Cannon model was trained on all 204 reference set stars, shown as the black, blue, and red markers in Figure 2. The Fe-m1σ model (top plot) was trained on 149 reference set stars, shown as just the black and red markers in Figure 2. The Mg-m1σ model (second plot) was trained on 199 reference set stars, shown as just the black and blue markers in Figure 2. For each comparison (or plot) we only compare stars that have argos parameter values in the 1σ ARG region we removed (regions occupied by the blue and red markers in Figure 2). For both labels, less than 1% of stars have original Cannon model and m1σ model labels that differ by more than 1σ ARG .
We also compared the labels produced by the original Cannon model to the labels produced by a Cannon model that was trained on a reference set that was simultaneously reduced by 1σ ARG in [Fe/H] and [Mg/Fe]. The reference set of this model consists of only 144 stars, shown as the black markers in Figure 2. We find that less than 1% of stars from this model have labels that differ from the original Cannon model by more than 1σ ARG in [Fe/H]

Validation Tests
Given a reference set and an application set, The Cannon will always return new labels for the stars in the application set. However, if one is not careful, the returned labels can have large errors. In this section, we perform three validation tests to verify that the labels returned by The Cannon are reasonable.
The first validation test we perform is a common machine learning test called the pick-one-out test. In this test, we create 204 models, each of which is trained on 203 stars from the reference set. The single star that is left out from the reference set changes between each model. Every model is then applied to the spectra of the respective left out star to obtain a new set of labels for it. How similar the new set of labels are to the original apogee labels indicates how well The Cannon can learn the apogee labels given the reference set. In Figure 4 we compare the new Cannon labels of these stars to their apogee labels. For all four labels, the bias and rms, given in the upper left hand corner of each plot, are much lower than those from the argosapogee comparisons in Figure 2. The strong agreement indicates that The Cannon can successfully learn the apogee labels from the argos spectra using the reference set composed of the 204 common stars. The error on each parameter for each star in the a2a catalog is calculated by adding in quadrature the rms value from the pick-one-out test and the small error that is output by the optimiser of the The Cannon (see Section 3).
As a second validation test we compare the model and observational spectra. The shape of the spectrum of a star can be affected by many different stellar parameters and abundances. Ideally, when training a model to describe a stellar spectrum with labels one would like to include all stellar labels that affect the spectrum's shape. However, this would require a huge number of reference set stars, which we do not have. Instead, we are making the approximation that the argos spectra (8400 Å -8800 Å) can be well described by the five labels: [Fe/H], [Mg/Fe], T eff , log(g), and A k . To test this, we compare the model spectra generated by The Cannon against the true observational spectra. This can be done because The Cannon trained model returns the flux at each pixel when given the labels (Equation 1). In Figure 6 we plot the argos spectra of a few example stars with a range of [Fe/H] values and cumulative χ 2 values (sum of the pixel χ 2 values) around the argos pixel number (1697, see Figure 5) versus their model spectra generated by The Cannon. For the model spectra, line thicknesses show the scatter of the fit by The Cannon at each wavelength. Figure 6 shows that the model spectra closely reproduce the true observational spectra. The overall good fit between the model spectra and the true spectra  indicates that the spectra of the argos stars can be well described by the variation of the five labels.
The third test we do is comparing the T eff -log(g)-[Fe/H] distribution of a2a stars to theoretical distributions. In the right hand plot of Figure 7, we show the T eff -log(g) distribution of a2a stars coloured by mean [Fe/H] on top of 10 Gyr PARSEC isochrones with metallicities ranging from −2 dex to 0.6 dex (Tang et al. 2014;Chen et al. 2014Chen et al. , 2015Bressan et al. 2012). The a2a stars tightly follow the PARSEC isochrones. Furthermore, even though no isochrone information was input into The Cannon, there are no a2a stars in nonphysical regions of the diagram. The close fit of the a2a stars to the PARSEC isochrones supports that the label transfer was successful.
The success of these three tests shows that it is possible to train a Cannon model on a moderate number (204) of common stars and still obtain a set of labels with good precisions (see Section 4.1).

A2A versus ARGOS
In this section we compare the a2a catalog to the argos catalog. In Figure 7 we show the T eff -log(g)-[Fe/H] distributions of argos stars (left) and a2a stars (right) on top of 10 Gyr PAR-SEC isochrones. The argos stars very roughly follow the PAR-SEC isochrones. Many argos stars also fall in nonphysical regions of the parameter space. As discussed in the previous section, a2a stars have a much tighter alignment with the PARSEC isochrones with no stars falling in nonphysical regions.
In Figure 8 we show the argos and a2a MDFs of all stars (top) and only RC stars (bottom). The most prominent difference between the MDFs of the surveys is that a2a obtains more very [Fe/H]-rich stars than argos for all stars as well as when we restrict to only the RC. argos has more solar to sub-solar stars until ∼ − 0.5 dex where a2a has more stars. Between ∼ − 1 and ∼ − 0.7 dex argos has more stars for all stars and the RC. Below ∼ − 1 dex, the difference between argos and a2a is small.

Red Clump Extraction and A2A Distances
We statistically extract RC stars from the a2a catalog using the following probabilistic method: First, we determine the spectroscopic magnitudes, M K s , of each a2a star by fitting their log(g), T eff , and [Fe/H] parameters to theoretical isochrones. Then, using the spectroscopic magnitudes, we calculate a weight for each star which gives the probability that it is part of the RC. The functional form of this weight is a Gaussian: where M rc = −1.61 ± 0.22 mag is the intrinsic magnitude of the RC (Alves 2000). We find that for 10 Gyr old PARSEC isochrones (Tang et al. 2014;Chen et al. 2014Chen et al. , 2015Bressan et al. 2012) the spectroscopic magnitude varies with log(g) as dM K s /d(log g) = 2.33. The average a2a log(g) error is 0.18 dex, giving an average magnitude error of 0.42 mag. We add this in quadrature with the intrinsic width of the RC magnitude to obtain a total magnitude error of 0.47 mag for our RC sample. This is σ M Ks in Equation 9. This magnitude dependent weighing method extracts the RC from a2a by giving higher weights to stars that are likely to be part of the RC and lower weights to stars that are unlikely to be part of the RC. To obtain distances for the a2a stars, we first treat each star as a RC star and assume their absolute magnitudes are that of the RC (−1.61 mag). We then compare the RC absolute magnitude to the de-reddened apparent magnitude of each star which we obtain using the Schlegel et al. (1998) extinction maps recalibrated by Schlafly & Finkbeiner (2011). This method gives us the distance of each star assuming that it is a RC star. To account for the fact that not every a2a star is a RC star, we weigh the stars by how likely they are to be RC stars using the weight in Equation 9. By weighing the stars in this manner, we treat all stars as RC stars but effectively remove the stars that are unlikely to be part of the RC by strongly de-weighting them.
The a2a catalog contains 10, 357 RC stars. We obtain this number by summing the RC weights (Equation 9). The a2a LF is slightly below the HQ 2mass LF due to the parametric limits applied during the creation of the a2a catalog.

Selection Functions
The probability that any given star in the Galaxy is observed by a large survey program is called the survey selection function (SSF); see Sharma et al. (2011) for a detailed discussion. In order to obtain unbiased parameter and abundance distributions of the Galactic bulge using the a2a and apogee surveys we must correct for their SSFs. Otherwise, it would not be clear if the distributions we obtain are the true distributions of the Galactic bulge, or whether they are biased by the selection choices of the surveys. In the next two sections, we discuss the a2a and apogee SSFs.

A2A Selection Function
The stars composing the a2a catalog were selected from the argos catalog which in turn was selected from a high quality (HQ) subset of the Two Micron All Sky Survey (2mass; (Skrutskie et al. 2006)). In the following subsections we discuss the selection of the argos survey from the HQ 2mass subset and the selection of the a2a survey from the argos survey.

Selection of ARGOS from HQ 2MASS
The argos stars were selected from a HQ sub-sample of the 2mass survey, requiring the stars to have high photometric quality flags (see Freeman et al. (2012)), magnitudes between 11.5 ≤ K s (mag) ≤ 14.0, and colours (J − K s ) 0 ≥ 0.38 mag. For each 2mass star that met these requirements, its I 0 -band magnitude was estimated using the equation: Then, for each field, the argos team randomly selected approximately 1000 stars roughly evenly distributed among the I 0 -band bins: 13-14 mag, 14-15 mag, and 15-16 mag. This was done in order to sample a roughly equal number of stars from the front, middle, and back regions of the bulge. We use the following procedure to correct for the I 0 -band selection (similar to Portail et al. (2017a), their Section 5.1.1). First we take all 2mass stars in a given field and apply the colour, magnitude, and quality cuts described above. Then, we estimate the I 0 -band magnitude of each remaining 2mass star as well as of each argos star using Equation 10. We can then correct for the I 0 -band selection by weighing each argos star by the ratio of the number of HQ 2mass stars to the number of argos stars in each I 0 -band bin and field: After the application of the weights in Equation 11 to the argos luminosity function (LF), we statistically recover the HQ 2mass LF within the respective colour and magnitude limits. The upper plot of Figure 9 shows this for the field (l, b) = (−10°, −10°).

Selection of A2A from ARGOS
As the a2a stars were selected from the argos catalog, we also similarly correct the a2a catalog for the I 0 -band selection using the weights from Equation 11. However, the weighted a2a LFs are systematically below the HQ 2mass LFs because of the a2a selection from the argos catalog which removes 4, 135 stars. These stars were removed because their spectra could not be processed by The Cannon (did not satisfy limits 7a, 7b, 8a, and 8b).
Because of this, we know the label values of these stars on the argos scale but only have approximate knowledge of where they are on the apogee scale. To replicate this selection, we examine if the limits that removed these stars can be described using parameters that do not change during the label transfer. The T eff limits (see 7a) are simple to approximate as there is a near linear relationship between argos T eff and colour, shown in Figure 10. However, this substitution is not perfect and the colour limits must be chosen carefully as the T eff -colour distribution has some spread due to variations in the other labels. For example, stars with lower argos [Fe/H] are hotter for constant colour (see the point colour in Figure 10). If chosen incorrectly, the colour limits can remove many stars that satisfy limits 7a,7b, 8a, and 8b. We find that the T eff limits are well approximated by the colour limits: Unfortunately, the other parametric limits are not as easily replaced using alternative parameters that remain constant during the label transfer. We take the final a2a catalog to include all stars processed by The Cannon that: 1. Have model χ 2 values below 5000 (see Figure 5). 2. Satisfy the limits 7a,7b, 8a, and 8b. 3. Are within the colour limits in Equation 12.
Within these conditions, the a2a catalog contains 21, 325 stars. If we apply the colour cut (Equation 12) to the argos catalog then the argos catalog would contain 23, 487 stars. Thus, the colour cut a2a catalog is 91% complete compared to the colour cut argos catalog.
In the subsequent analysis of the bulge's chemodynamical structure, we often select and plot a2a RC stars to obtain good distance estimates (see Section 4.4 for a discussion on RC extraction). We make the assumption that the reference set limits affect RC and red giant branch stars equally such that the a2a RC catalog is also ∼91% complete. We test this assumption in Appendix A.

APOGEE Selection Function
The apogee sample we use for most of this work's analysis is the HQSSF apogee bulge MSp. It is a sub-sample of the full apogee bulge MSp in that we also require the stars to have high quality ASPCAP parameters and abundances (see Section 2.2) and good SSF estimates. In this sample, only stars that are part of complete cohorts, i.e. group of stars observed together during the same visits, have SSF estimates. Estimating the SSF for this sample proceeds in two steps: 1. To account for the selection of the apogee bulge MSp from the HQ 2mass subset, we use the publicly available python package APOGEE (Bovy et al. 2014;Bovy 2016;Mackereth & Bovy 2020). For each complete cohort, the program returns the ratio of the number of apogee MSp stars to the number of HQ 2mass stars within the respective colour and magnitude limits of the cohort 1 . Then, we weight each star in each cohort, c i , by the inverse of this ratio: 2. Restricting our sample to apogee stars with HQ ASPCAP parameters and abundances (Section 2.2) removes ∼14% of the apogee bulge MSp. To correct for this selection we bin all apogee bulge MSp stars (including the stars with poor AS-PCAP estimates) and all HQ ASPCAP MSp stars in magnitude, colour, and cohort. Then, we weight each HQ ASPCAP MSp star by the ratio of the number of MSp stars to the number of HQ ASPCAP MSp stars in the colour and magnitude bin in which it falls: Figure 11 shows the result of the application of the weights in Equations 13 and 14 to the H-band LF of a cohort in the field (l, b) = (−2°, 0°). We see that after the application of the weights, the LFs of the HQSSF apogee bulge MSp and HQ 2mass subset approximately match.
In Figure 1, the red crosses indicate apogee field locations for which we cannot use the APOGEE python package to obtain good SSF estimates of the observed stars. This occurs either because the cohorts composing the fields are not currently complete or because they do not contain any MSp stars. Removing these fields, as well as a few cohorts for which the weighted LF poorly reproduces the LF of its HQ 2mass parent sample, leaves 23, 512 stars in the HQSSF apogee bulge MSp.
In the subsequent analysis we restrict the HQSSF apogee bulge MSp further by requiring stars to have AstroNN distance errors less than 20%. This roughly removes 5% of the HQSSF apogee bulge MSp leaving 22, 340 apogee stars.

Selection of HQ 2MASS Catalogs
So far we have described the a2a and apogee SSFs as well as the corresponding weights that are needed to statistically correct each survey to the magnitude and colour distributions of their respective HQ 2mass parent samples. This is similar to the procedure done by Rojas-Arriagada et al. (2020), who used simple stellar populations to determine the fraction of giants with fixed distance modulus and metallicity that fall with in the apogee magnitude and colour ranges. Then using these fractions, they re-weighted the observed stars to the weights they had in the survey input sample. However, the input HQ 2mass sample of each survey itself has a SSF relative to the real Galaxy (in practise the current deepest photometric survey, VVV (Minniti et al. 2010;Surot et al. 2019)) due to photometric criteria, crowding, and extinction. The 2mass SSF is strongest at low latitudes and is illustrated in Portail et al. (2017a, their Section 5.1.1). This SSF would be additionally required when comparing (or weighting by) the relative number densities of stars in different fields, especially those with different latitudes.
In this paper, we confine our analysis to small spatial bins, making use of RC distances for a2a and AstroNN distances for 1 Cohort magnitude limits are set depending on the planned number of visits. Most cohorts used in this work have the magnitude limits 7 < H 0 (mag) < 11, 7 < H 0 (mag) < 12.2, or 7 < H 0 (mag) < 12.8, although some have fainter limits. The colour limits of the cohorts are (J − K s ) 0 ≥ 0.5 mag in bulge and apogee-1 disk fields, and 0.5 ≤ (J−K s ) 0 (mag) ≤ 0.8 and (J − K s ) 0 > 0.8 mag in apogee-2 disk fields; see Zasowski et al. (2017). apogee. When we do this, the observed stars in a given bin are representative of the stellar population at that distance making further corrections of the HQ 2mass survey magnitude distribution unnecessary. However, in practise, the bins we use have sizes ∼2 kpc, thus if there is a line-of-sight abundance gradient in a field, the fainter stars in a given bin could have a slightly different abundance distribution than the brighter stars as they trace somewhat larger distance. Figure 12 shows, for example bins, that no such effect is seen within the errors in either survey. An additional effect could arise due to fields at different latitudes/heights contributing stars to the same distance bins. Specifically, lower latitude fields are generally more [Fe/H]-rich and have higher crowding than higher latitude fields due to the [Fe/H] and density gradients in the bulge. In such cases, by not correcting for the HQ 2mass SSF we may introduce a slight bias against the lower latitude, higher [Fe/H] stars in each spatial bin. However, the effects of field mixing would be small in a2a as its fields are well separated and generally located at latitudes with low crowding (|b| ≥ 5°). Whereas for apogee, the effects of field mixing would also be small because at low latitudes (|b| < 4°), where the incompleteness of the HQ 2mass catalog is largest, the [Fe/H] gradient is nearly flat Ness et al. 2016), and at high latitudes (|b| > 4°), where the [Fe/H] gradient is negative, the incompleteness of the HQ 2mass catalog is small. When we vary the width of our distance bins in |Z|, we do not find significant changes in the bulge [Fe/H] gradient. Therefore we neglect field mixing effects in this paper.

Application of the SSF-corrections
Here we illustrate the effect of the different spatial selections of the two surveys in the inner Galaxy, and then compare their SSFcorrected MDFs and [Mg/Fe] distribution functions (Mg-DFs) in regions of spatial overlap.
The first two plots in the top row of Figure 13 show the apogee and a2a MDFs and Mg-DFs of all stars (RC for a2a)

The High [Mg/Fe] and Low [Fe/H] Stars
In the following sections, we will see that the discrepancy seen in Figure 13 at the high [Mg/Fe] end is widespread in the bulge occurring in both the inner and outer bulge and at various heights from the plane, even after the SSF corrections are applied. We believe that this discrepancy can be at least partially explained by the limited T eff range spanned by the reference set (see  Figure B.1, does not reach below ∼4000 K. Because of this, The Cannon cannot learn the trends between T eff and the abundances in ASPCAP that exist below ∼4000 K. Furthermore, this T eff cut means that the a2a catalog would not contain many of these [Mg/Fe]-rich stars with T eff values just below ∼4000 K. Together, this could explain why the apogee and a2a Mg-DFs disagree at the high [Mg/Fe] end. As we will see in Section 6.4, [Mg/Fe]-rich stars are typically also [Fe/H]-poor. This could then explain why a2a also observes fewer [Fe/H]-poor stars as compared to apogee.
We cannot currently be sure whether the trends we observe between ASPCAP T eff and [Mg/Fe] are physical or systematic and therefore whether the lack of these trends in a2a is problematic or not.

Abundance Structure of the Bulge
We now present how the abundances and kinematics vary over the Galactic bulge using the combined apogee and a2a catalogs. For all figures in this section, we restrict the a2a stars to RC stars and require the apogee stars to have AstroNN distance errors less than 20%. Unless mentioned otherwise, we use the HQSSF apogee bulge MSp and, correct each survey to the HQ 2mass catalog they were selected from, and limit stars to [Fe/H] > −1 dex. Furthermore, when combining stars from different spatial bins, we weight the stars in each distance bin to correct for the SSF effects on the abundance distributions but then in each bin we re-weight both surveys such that the sum of their weights is equal to the number of stars (RC for a2a) contributed by each survey. than they are in the long bar/disk. Because of this, the vertical abundance gradients at large absolute longitudes are steeper near the plane than at small absolute longitudes. Similar abundance trends with Galactic longitude and latitude were seen by Ness et al. (2016) using apogee DR12 data. Figure 15 shows illustrative mean X-Z and X-Y [Fe/H] maps built using a2a and apogee stars separately and combined. Here we use a Galactocentric left-handed coordinate system with positive X directed towards the Sun, Y along positive longitude (l), and Z along positive latitude (b). The assumed value of the solar distance is R 0 = 8.2 kpc (Bland-Hawthorn & Gerhard 2016). In order to show all our data, we do not restrict the third dimension in each plot. From the X-Z plots in the top row, we see that the stars from both surveys become more [Fe/H]-rich towards the plane. Additionally, both the individual and combined maps show that the more [Fe/H]-rich stars dominate at larger |Z| at small |X| than they do at larger |X|. Lastly, the stars at the GC are more [Fe/H]-poor than their immediate surroundings.

Mean Abundance Maps
In the bottom row of Figure 15, on top of the X-Y [Fe/H] maps, we plot the bulge's density distribution obtained from one of the Portail et al. (2017a) bulge/bar models. These models were fit to the RC density of VVV, UKIDSS and 2MASS and to the stellar kinematics of BRAVA, OGLE, and argos. The model we use has a pattern speed of Ω b = 37.5 km s −1 kpc −1 as that was found to give the best visual match to the VIRAC proper motion data (Clarke et al. 2019). In both the separate and combined X-Y maps, the near side of the bulge appears to be more [Fe/H]-rich than the far side. This is an effect of the field viewing angles which cause the nearer stars to be preferentially sampled closer to the plane than the farther stars.
Because we do not restrict the surveys to small bins in the projection direction in Figure 15, the relative weighting by number density is incorrect, especially at low heights in the face-on view (see Section 5). In the following plots of this section, we restrict the abundance maps to smaller bins in vertical height and distance in order to minimize this effect.
The bar causes an asymmetry in the spatial maps. To remove this asymmetry, we reorient the following plots to the bar reference system taking the bar angle to be 25° ). The coordinate system is: the bar long axis (X bar ), the bar short axis (Y bar ), and the height from the Galactic plane (Z). For these figures we also symmetrize the distribution of stars in order to fill in gaps in our spatial coverage as well as increase the statistics. The symmetrisation is done by reflecting each star into each projected quadrant.
The top row of Figure 16 shows the symmetrised mean [Fe/H] maps in the |X bar |-|Z| plane for different slices in |Y bar | using stars from both apogee and a2a. On top of the map, we plot the bulge density distribution (white dotted lines) obtained from the Portail et al. (2017a) model. In all |Y bar | slices, the mean [Fe/H] generally increases towards the plane. However, for |Y bar | < 1 kpc and |X bar | < 1 kpc, the mean [Fe/H] increases rapidly towards the plane, then remains roughly constant between 0.3 |Z| (kpc) 0.7, and finally decreases within the inner few 100 pc. This is not the case well outside the boxy-peanut (b/p) bulge lobes (|X bar | > 3 kpc) where, within 1 kpc from the   Figure  16) and |X bar | < 1 kpc, the rate of decrease of the mean [Mg/Fe] is slower and the gradient inverts at small |Z| such that the inner bulge is slightly more [Mg/Fe]-rich than its immediate surroundings. A clear X-shape is seen in the mean [Mg/Fe] distribution at roughly [Mg/Fe] ≈ 0.175 dex in the |Y bar | < 1 kpc slice. In the region of the X-shape, the [Mg/Fe]-poor stars dominate at larger |Z| than they do at larger |X bar | or larger |Y bar |.  Figure 17 shows the symmetrised mean [Fe/H] maps in the |X bar |-|Y bar | plane for different slices in |Z|. The stars are restricted to the bar region which we approximate as an ellipse with a semi-major axis and axis ratio of 5 kpc and 0.4 respectively; see Figure 15. For |Z| < 0.3 kpc, the center of the bar is more [Fe/H]-poor than the bar ends. As the distance from the plane increases, this reverses at ∼0.75 kpc. At greater heights, we again see that the center of the bar is more [Fe/H]-poor than the bar ends.
In near infrared star counts, the Galactic bar has a half length of ∼5 kpc (Wegg et al. 2015). The b/p bulge extends out to ∼2 kpc from the GC. The bar region that extends outside the b/p bulge is known as the long bar. Wegg et al. (2015) showed that the long bar is composed of two bar components, the thin bar with a scale height of 180 pc, extending to ∼4.6 kpc, and the super thin bar with a scale height of 45 pc, reaching ∼5 kpc. We do not have the resolution to detect an [Fe/H] or a [Mg/Fe] signature of the super thin bar, however the top panel of Figure  17 extents to roughly 1.7 thin bar scale heights above the plane. From this, we can approximately say that the combined long bar is super solar in [Fe/H] (also seen in the top left panel of Figure  16). The lower left panel of Figure 16 also shows that the region occupied by the long bar is nearly solar in [Mg/Fe]. This is in contrast to the inner region of the b/p bulge which has a mean sub-solar [Fe/H] value and is more [Mg/Fe]-rich.

Abundance Gradients
Having shown how [Fe/H] and [Mg/Fe] vary over the bulge, we now quantify the vertical and horizontal abundance gradients in the various bulge regions. In Figure 18, we present the mean [Fe/H] and [Mg/Fe] profiles for a2a and apogee stars in the inner bulge (left column) and long bar/outer bulge region (right column) as a function of |Z|. We take the inner bulge to be the region within |X bar | < 2 kpc and |Y bar | < 1 kpc and the long bar/outer bulge region to be the region within 2.5 ≤ |X bar | (kpc) < 4.5 kpc and |Y bar | < 1 kpc.
The Using stars with |l| < 11°, Rojas-Arriagada et al. (2020) found the bulge vertical metallicity gradient to be −0.09 dex/kpc for |Z| < 0.7 kpc and −0.44 dex/kpc for 0.7 < |Z| (kpc) < 1.2. Beyond |Z| > 1.2 kpc, Rojas-Arriagada et al. (2020) found a noisy but flat profile. Assuming the bar is at an angle of 25°with respect to the Sun, the 11°limit in Galactic longitude restricts their sample to 2.7 kpc along the bar, or roughly the region we refer to as the inner bulge. Thus, their vertical metallicity gradient is consistent with our inner bulge vertical metallicity gradient. However, our increased Galactic longitude range allows us to see that the flattening at small |Z| only occurs in the inner bulge and not in the long bar/outer bulge region.   Figure 16.
The [Mg/Fe] horizontal profiles of both surveys are shown in the bottom row of Figure 19. For |Z| 0.3 kpc, the mean radial [Mg/Fe] gradient is steep and negative. However as the distance from the plane increases the stars at large |X bar | increase in [Mg/Fe], causing the profile to flatten. At large |Z|, a minimum in [Mg/Fe] is seen around |X bar | ≈ 2 kpc. This profile is due to the lobes of the b/p bulge.
As was the case for the vertical gradients, there are clear offsets between the apogee and a2a horizontal profiles in [Fe/H] and [Mg/Fe], especially for |X bar | < 2 kpc. This could at least in part be due to the limited T eff range of the reference set (see Section 5.5).

Shape of Bulge Abundance Distribution Functions
So far we have only examined how the mean [Fe/H] and [Mg/Fe] vary with position in the bulge. In this section, we illustrate how the MDFs and Mg-DFs change with position in the bulge.
In Figure 20 we plot the generalised MDFs and Mg-DFs of a2a (RC) and apogee stars in the inner bulge for 0 < |Z| (kpc) < 1.8 in bins of width 0.3 kpc. The bins were chosen such that they each contain at least 100 distinct stars. The Gaussian smoothing of each stars is 0.1 dex in the MDFs and 0.033 dex in the Mg-DFs. We see that, while there are some deviations, both surveys show similar trends in their MDFs with |Z|. Far from the plane, the MDFs of both surveys are dominated by a strong peak at ∼ − 0.4-∼ − 0.5 dex. As the distance from the plane decreases, a solar and super solar peak in [Fe/H] grow and become prominent. The surveys show a stronger difference in their Mg-DFs. In apogee, far from the plane the Mg-DF is dominated by a single peak at ∼0.3 dex. As the distance from the plane decreases, a second peak at ∼0.05 dex increases in strength, such that near the Galactic plane, the two peaks are nearly equal in strength. In a2a, the Mg-DF far from the plane is also dominated by a single peak at ∼0.3 dex. However, as the distance from the plane decreases, the strength of the high [Mg/Fe] peak decreases and the stars below ∼0.25 dex increase in strength. The peak at ∼0.3 dex, seen in the apogee Mg-DF, is not prominent in the a2a Mg-DFs near the plane.
Using Gaussian B generally has the largest sigma closely followed by C, and then A and D which are nearly equal in sigma. The top right plot shows the MDF weight variations with |Z|. The weights of all Gaussians are roughly constant below |Z| ≈ 0.7 kpc, with B having marginally the largest weight. For |Z| 0.7 kpc, the most significant metal poor Gaussian C increases, while the other two decrease such that C becomes the most dominant at large |Z|. Gaussian D is the weakest component at all heights as it never reaches over 10% in weight.
The variation of the Gaussian parameters from the three Gaussians fit to the Mg-DFs in the inner bulge is shown in the bottom row of Figure 21. . The bottom middle plot shows the sigma variation of the Gaussians with |Z|. The a2a GaussianB generally has the highest sigma by ∼0.025 dex. The rest of the Gaussians have nearly equal sigma values. The bottom left plot shows the variations of the weights with |Z|. The Gaussian weights are constant below ∼0.7 kpc and nearly equal in weight. Above ∼0.7 kpc, the weight of Gaussian Â decreases with increasing |Z| such that at ∼1.7 kpc its weight is nearly zero. Above ∼0.7 kpc, the behaviors of GaussiansB andĈ strongly differ between the a2a and apogee surveys. As |Z| increases, the apogee weights of GaussiansĈ andB increase and remain roughly constant respectively, while the a2a weights of GaussiansĈ andB remain constant and increase respectively.
In Figure 22 we perform a similar procedure as in Figure  At |Z| 1 kpc, the weights of the Gaussians are similar to those of the inner bulge, with C dominating over B and A. At low |Z| the weight of the most metal rich Gaussian A is higher than weight of C, the most significant metal poor Gaussian. The transition in weight occurs at lower |Z| than in the inner bulge. Furthermore, for |Z| 0.7 kpc, the weight profiles of A and C are not constant, but continue to increase and decrease towards the Galactic plane respectively. This is consistent with the profiles in Figure 18. The weight of Gaussian D is weak at all heights, never rising above 10%.
In the bottom row of Figure  . We see a significant offset between the GaussianB means from both surveys. Furthermore, at |Z| ≈ 1 kpc the Gaussians Â means have a large offset. The bottom right plot shows the Gaussian weight variations with |Z|. For both surveys, the [Mg/Fe]-rich Gaussian C is strong at high |Z| and decreases in weight with decreasing |Z|, while the [Mg/Fe]-normal Gaussian Â is very weak at high |Z| and increases in weight with decreasing |Z|. Because of these trends, close to the plane, Gaussian Â dominates and Gaussian C is near zero in weight. At most heights, the GaussianB weight behavior from both surveys deviates with the weight of Gaus-sianB from a2a being much larger than the weight of Gaussian B from apogee. Accordingly, the weight of GaussianĈ from a2a is lower than the weight of the correspondingly [Mg/Fe]-rich GaussianĈ from apogee at most heights.
By comparing the apogee weight behavior of the MDF and Mg-DF Gaussians in both regions of the bulge, it is clear that Gaussians A, B, and C from the MDF decomposition roughly correspond to Gaussians Â,B, andĈ from the Mg-DF decomposition. However, in the outer bulge, Gaussian Â is higher in weight than Gaussian A, while Gaussian B is higher in weight than GaussianB. Therefore, there is mixing between the Gaussians with some stars that compose Gaussians A and B in apogee being part of GaussiansB and Â respectively. For a2a, Gaussian A corresponds to Gaussian Â. However, the behaviors of Gaussians B and C from a2a from the MDF decomposition differ strongly from the behaviors of GaussiansB andĈ from the Mg-DF decomposition. This deviation in Gaussian weight behavior is much stronger in the inner bulge than in the outer bulge.
Overall, it appears that the a2a stars do not reach as high in [Mg/Fe] as the apogee stars do. Because of this, the apogee and a2a Mg-DFs deviate in shape at the [Mg/Fe]-rich end. We suspect that this may be due to the limited T eff range of the ref-

[Mg/Fe]-[Fe/H] Distribution
In this section we present how the [Mg/Fe]-[Fe/H] distribution changes along the Galactic bar using a similar method to Hayden et al. (2015) and Queiroz et al. (2020a). In Figure 24, using stars from both the a2a and apogee, we plot the [Mg/Fe]-[Fe/H] distribution along the bar in bins of |Z| and |X bar |, requiring |Y bar | < 1 kpc. We colour the points by the Gaussian kernel-density estimation using band-widths that obey Scott's rule (Scott 1992). From Figure 24 it is not clear if the two maxima compose two separate sequences (as is the case in the solar neighborhood) or if they are simply the two maxima of a single sequence. Queiroz et al. (2020b)   In fact, we see that while the "[Fe/H] rich-[Mg/Fe] poor" stars tend to dominate close to the plane and the "[Fe/H] poor-[Mg/Fe] rich" stars tend to dominate far from the plane, these trends are weaker in the inner bulge than in the long bar/outer bulge region. Instead, in the inner bulge, the "[Fe/H] poor-[Mg/Fe] rich" stars extend lower into the plane and the "[Fe/H] rich-[Mg/Fe] poor" stars extend higher out of the plane than they do in the long bar/outer bulge. This is similar to what we saw in Figure 24, although now we see it with greater statistics. Many authors take the bulge to be within R gc < 3.5 kpc. When we apply this distance cut in the third column of Figure 25 Figure 25 we still can not claim that the two maxima we observe originate from different sequences. A large fraction of the stars in the inner galaxy are likely contributed by the thin and thick disks. Additionally, a significant fraction of the halo's stellar mass is thought to reside in the inner galaxy. In Figures 24 and 25 we draw lines that differentiate regions of the [Mg/Fe]-[Fe/H] space generally populated by the halo, thin disk, and thick disk. These lines are drawn by eye and are similar to those in Adibekyan et al. (2011);Mackereth et al. (2019b); Beraldo e Silva et al. (2021). From examining the stars relative to the lines, we see that stars with abundances similar to the thin disk generally dominate close to the plane while stars with abundances similar to the thick disk generally dominate far from the plane. For small |X bar |, the "[Mg/Fe] poor-[Fe/H] rich" stars associated with the thin disk dominate at larger |Z| than they do at larger |X bar |. This could be a result of the thin disk stars being more efficiently mapped to larger |Z| during the buckling episodes that built the b/p bulge. We also see that the stars in the very inner bulge (|X bar | < 1 kpc and |Z| < 0.3 kpc) are more [Fe/H]-poor and [Mg/Fe]-rich than stars in the neighboring bins. As these stars reside mainly in the thick disk region, these stars could be thick disk stars. The thick disk is more centrally concentrated than the thin disk and therefore could dominate over the thin disk in the very center.
Lastly, how do the Gaussian fits in Section 6.3 correspond to the two maxima we see in the

Bulge Chemokinematics
In this section we look at how the radial velocity and dispersion vary with [Fe/H] and field position.
In Figure 26, we plot the mean velocity (top row) and velocity dispersion (bottom row) of stars against longitude and latitude in different [Fe/H] intervals. In order to increase the number of apogee stars at negative longitudes, we use the HQ apogee bulge MSp for this analysis, no longer requiring the apogee stars to have good SSF estimates. Thus, we include apogee stars from fields represented by both the blue ellipses and red crosses in Figure 1 out to longitudes of ±22°. Because we are including apogee stars with no SSF estimates, we do not correct the apogee catalog to the HQ 2mass catalog. However, we still correct the a2a catalog to the HQ 2mass catalog it was selected from. Lastly, we restrict the stars to the bulge region by requiring them to have R gc ≤ 4.5 kpc. The error bars in Figure 26 were determined using bootstrapping.
We see from Figure 26 that for stars with [Fe/H] > −1 dex, the mean velocity profiles do not significantly vary with [Fe/H] and that stars in the bulge rotate cylindrically regardless of [Fe/H]. There is a weak trend with latitude where stars located closer to the Galactic plane tend to rotate slightly faster than stars located farther from the Galactic plane. Other studies have observed cylindrical rotation in the bulge (Howard et al. 2009;Kunder et al. 2012;Zoccali et al. 2014)  The dispersion profile of stars with −1 < [Fe/H] (dex) < −0.5 differs from those with [Fe/H] > −0.5 dex as the separation between latitudes in the inner longitudes is much weaker and the profiles of stars closer to the plane are flatter. However, we see that, as is the case for the more [Fe/H]-rich stars, the increase in dispersion is stronger for the stars at latitudes farther from the plane (|b| > 7.5°). Furthermore, while the dispersion profiles of the stars closer to the plane are flatter than their [Fe/H]rich counterparts, the distributions of these stars are still peaked around the central longitudes. Therefore, we conclude that the dispersion profiles of stars with −1 < [Fe/H] (dex) < −0.5 are higher dispersion, slightly flatter versions of the profiles of the stars with [Fe/H] > −0.5 dex.
We find that the stars with −2 < [Fe/H] (dex) < −1 have different dispersion profiles than the more [Fe/H]-rich stars. Apart from the a2a stars between 3°< |b| < 6°, the dispersion profiles of the low latitude stars are not peaked towards the central longitudes. Furthermore, the dispersion of the stars at large absolute longitudes varies significantly between the different latitudes. The range of [Fe/H], the lower rotation, and higher dispersion is consistent with these stars being part of the halo or very [Fe/H]-poor old thick disk.
We examined the velocity and velocity dispersion profiles of apogee stars in the HQSSF apogee bulge MSp corrected to the of the iso-[Fe/H] contours with respect to the density distribution further verifies that the bulge has a mainly disk origin.  Figure 19.
N-body models initialized as a single disk with a strong negative radial metallicity gradient, such as to reproduce the observed vertical gradient in the bulge after bar evolution, inevitably also show a steep radially inward rise in [Fe/H] (Di Matteo et al. 2015;Fragkoudi et al. 2017). Unlike these models, the longitudinal abundance gradient of the multi-disk model of F18 closely matches the gradients we see in Figure 14 [Fe/H] in their centres than in their in-plane bar regions (see Figure 5 of DM19). In the star-forming, evolving disk model of DB17, the highest metallicities occur in the centre of the final b/p bulge (see their Figures 11,24,26). While this model was built from a single disk, before the bar formed the older populations have lower metallicities as well as larger radial and vertical velocity dispersions, and scale heights, than the younger populations which end up with the largest central concentration.
Thus it appears that the critical ingredient in these models for explaining the radial bulge/bar [Fe/H] gradient is the shorter radial scale-length of the initial [Fe/H]-poor disk. However, the F18 model assumed a thin disk radial scale-length of 4.8 kpc, significantly larger than in current observations (Bland-Hawthorn & Gerhard 2016), and more than twice the thick disk scale-length. This suggests other factors are at play as well. Bovy et al. (2019) and Hasselquist et al. (2020) show age maps for bulge and long bar stars from apogee DR16. These maps indicate a significant fraction of younger stars in the long bar and surrounding disk for R gc > 3.5 kpc and close to the Galactic plane, which are not prominent at smaller radii and at heights above a few 100 pc. This suggests that after the formation of the bulge, the Galactic bar either experienced substantial additional gas inflow and star formation, or captured a significant population of disk stars, by slowing its pattern rotation and increasing in size. Capturing disk stars could have happened both intermittently, if indeed the bar regularly changes its pattern speed on dynamical time-scales, due to interaction with spiral arms (Hilmi et al. 2020), or secularly by angular momentum transfer to the halo (Chiba & Schönrich 2021). Both mechanisms would be consistent with the dominance of the super-thin bar (Wegg et al. 2015) at these radii. Using the stellar ages from Mackereth et al. (2019a), and restricting the APOGEE sample to ages > 6 Gyr or > 8 Gyr, we still find a horizontal gradient near the disk plane in the remaining bulge sample, but considerably milder. This could be built by a less extreme version of the F18 model, with thin and thick disk scale-lengths closer to standard values.
Outer Bulge Vertical Gradient: The vertical [Fe/H] and [Mg/Fe] gradients in the long bar/outer bulge region are ∼−0.44 dex/kpc and ∼0.13∼0.17 dex/kpc respectively (see Figures 16   and 18). In this region, there is no inner flattening of the vertical gradient near the plane as is the case in the inner bulge.
For their three-disk models DM19 showed that while both produce b/p bulges, the region beyond the lobes of the b/p bulge in the model formed from the disks with different radial dispersion has no vertical metallicity gradient (see Figure 6 in DM19). However, in the model built from disks with different vertical scale heights, a vertical gradient remains in the outer bar. The DM19 models are not realistic and are instead designed to represent extreme cases. In reality, the in-plane and vertical dispersions of a disk should be correlated such that a disk with higher (lower) radial dispersion should also have higher (lower) vertical dispersion and therefore a larger (smaller) scale height. The F18 model is more realistic in that the disks with higher dispersions, which are designed to be more [Fe/H]-poor, have larger scale heights. Due to the interplay with the disks' scale lengths, the F18 model shows a strong vertical gradient in the disk regions.
The single-disk model of Fragkoudi et al. (2017) with an initial radial [Fe/H] gradient did not lead to such a vertical gradient in the outer bar. The evolving disk model, DB17, does show a vertical gradient in the disk. In this model the older populations, which tend to be more metal poor, have larger scale heights than the younger populations, which tend to be metal rich.
This comparison supports a multi-disk formation scenario where the disks that form the bulge must differ in vertical velocity dispersion, and therefore scale height, in order to produce a bulge where the region outside the b/p bulge lobes has strong vertical [Fe/H] and [Mg/Fe] gradients. At small |X bar |, this trend is also present, although weaker than at larger |X bar |. There is no clear evidence for two distinct sequences in Figures 24 and 25 (but see also Queiroz et al. 2020b).
The stronger bi-modality in the [Fe/H]-[α/Fe] distribution is often interpreted as two distinct star formation episodes well separated by a period of quenched star formation (Chiappini et al. 1997;Haywood et al. 2018 Figure 25) and the bi-modal Mg-DFs of the inner bulge seen in apogee ( Figure 20) (see also Matteucci et al. 2019 Di Matteo et al. (2015) showed using an N-body model that if the origin of a b/p bulge is a single disk with a strong negative radial [Fe/H] Ness et al. (2013b), they needed to add a slowly rotating, low mass (5% central Milky Way mass), high dispersion component to their model. They associated this component with the stellar halo (see their Figure  21). From our data, the slower rotation and increased dispersion of the stars with −1 < [Fe/H] (dex) < −0.5 as compared to the stars with −0.5 < [Fe/H] (dex) < −0.25 may be the result of contamination by the high dispersion component dominating below −1 dex. As the stars in the [Fe/H]-poor tail are slowly rotating, kinematically hot, and [Mg/Fe]-rich, we tentatively associate them with the stellar halo. A contamination by halo stars is further supported by Lucey et al. (2021) who, from examining the chemo-kinematics of metal poor bulge stars, found that the fraction of halo interlopers in the bulge increases with decreasing metallicity between −3 < [Fe/H] (dex) < 0.5.
In all, our data paints a consistent picture for the origin of the b/p bulge: at least two initial disks with differing dispersions, scale heights, and scale lengths underwent spatial and kinematic fractionation resulting in the b/p bulge of the Milky Way that we observe today. Due to their kinematics as well as their [Fe/H]-[Mg/Fe] distributions, we associate these disks with the thin and thick disks. We also associate the very [Fe/H]-poor, [Mg/Fe]rich, kinematically hot tail of the bulge stars with the stellar halo.

Summary and Conclusions
In this work we used the data driven method, The Cannon, to put stars from the argos survey onto the parameter and abundance scales of the apogee survey. After doing so, we could directly combine the two surveys and gain a deeper and more reliable coverage of the Galactic bulge.
In the first half of the paper, we describe how we applied The Cannon to the argos stars to obtain the a2a survey. To show that we have successfully placed the a2a stars on the apogee label scales, we performed three validation tests: the pick-one-out test (Figure 4), spectrum reconstruction (Figure 6), and stellar T eff -log(g) distribution (Figure 7). These tests showed that it is possible to perform The Cannon label transfer using a moderate number (204) of reference set stars and still obtain labels with good precisions. After performing the validation tests, we accounted for the SSF of each survey such that after it is corrected for, we statistically obtain the HQ 2mass catalogs they were selected from. After this, we compared the SSF-corrected MDFs and Mg-DFs of the apogee and a2a surveys and found that the different spatial regions probed by each survey cause a clear spatial bias, however when the distribution functions are compared in fixed distance bins, the MDFs and Mg-DFs agree except at the high [Mg/Fe] and low [Fe/H] ends where apogee observes more stars. This may be due to trends between T eff and the abundances in apogee and the fact that the reference set only covers a limited T eff range on the apogee scale.
In the second half of the paper, we used stars from both the a2a and apogee surveys to investigate the abundance structure of the bulge. The results we found include: (i) The [Fe/H] and [Mg/Fe] maps built using apogee and a2a data show strong X-shapes that are more pinched than the density distribution given by the currently best dynamical model of the Milky Way's bulge/bar from Portail et al. (2017a). The stronger pinching in the [Fe/H] and [Mg/Fe] maps than in the density map supports a mainly disk origin for the Galactic bulge. The positive horizontal [Fe/H] gradient in the bulge close to the Galactic plane and the negative vertical [Fe/H] gradient in the long bar region favour models in which the bulge/bar formed from initial thin and thick disks with different vertical and radial scale-lengths (Fragkoudi et al. 2018). This multi-disk origin is further supported by the higher pinching of the abundance distributions in the bulge as compared to the density distributions and the differing kinematics of the low and high [Fe/H] stars. However, the large thin-disk scale-lengths required by these models, together with younger estimated mean ages in the outer bar Hasselquist et al. 2020) suggest that the Galactic bar may have captured younger, more metal-rich stars well after its formation.

Appendix A: Effect of log(g) Limits on the A2A Red Clump Sample Completeness
We can approximate the effect of the log(g) limits (8b) on the completeness of the a2a RC stars. In the top plot of Figure A.1, the cyan histogram shows the number of argos stars as a function of argos log(g). The colour cut in Equation 12 and reference set limits 7a, 8a, 8b have been applied. The gravity reference set limit 8b has not been applied. The blue histogram in the top plot of Figure A.1 gives the number of a2a RC stars at each argos log(g). Notice that because the blue histogram is built from a2a stars, the log(g) limits (8b) have been applied and the blue histogram does not extend as far in log(g) as the cyan histogram. In the bottom plot of Figure A.1, we take the ratio of the two histograms to get the fraction of a2a RC stars at each argos log(g). At log(g) = 2.6 dex, 55.5% of the stars in the cyan histogram are a2a RC stars. The fraction decreases linearly outward in both directions to 31% at 1.6 dex and 38.0% at 3.2 dex. We ignore the final bins because the log(g) limits fall within these bins causing the number of RC stars to be artificially lower. Assuming the linear trend on both sides continues past the limits, we fit a line to each side and extrapolate beyond the cuts to get the fraction of RC stars at each argos log(g). We then multiply the number of stars given by the cyan histogram in the outer bins by the fractions we get from extrapolating to get the approximate number of RC stars removed by the log(g) limits. Using this method, we find that approximately 200 RC stars are removed by the log(g) limits. Under the assumption that, to the first order, the other limits (7a, 8a, and 8b) affect the RC and non-RC stars equally, then 92% of the RC stars originally observed by argos are accounted for in the a2a catalog.

Appendix B: Abundance Trends in APOGEE and the Effect on A2A
It has been found in apogee that the ASPCAP T eff is correlated with some of the ASPCAP abundances due to the physical stellar models (Jönsson et al. 2018;Jofré et al. 2019). We show the correlation between T eff and [Mg/Fe] of stars in bins of [Fe/H] in Figure B.1. For all plots, the stars are restricted to narrow ranges in distance, SNR, and height from the plane. The reference set we use to train The Cannon model to build the a2a catalog does not span the entire T eff range covered by the HQSSF apogee bulge MSp; see Figure B.2.