Mapping Luminous Hot Stars in the Galaxy

[Abridged] Luminous hot stars dominate the stellar energy input to the interstellar medium throughout cosmological time, they are laboratories to test theories of stellar evolution and multiplicity, and they serve as luminous tracers of star formation in the Milky Way and other galaxies. Massive stars occupy well-defined loci in colour-colour and colour-magnitude spaces, enabling selection based on the combination of Gaia EDR3 astrometry and photometry and 2MASS photometry, even in the presence of substantive dust extinction. In this paper we devise an all-sky sample of such luminous OBA-type stars, designed to be quite complete rather than very pure, to serve as targets for spectroscopic follow-up with the SDSS-V survey. We estimate"astro-kinematic"distances by combining parallaxes and proper motions with a model for the expected velocity and density distribution of young stars; we show that this adds useful constraints on the stars' distances, and hence luminosities. With these distances we map the spatial distribution of a more stringently selected sub-sample across the Galactic disc, and find it to be highly structured, with distinct over- and under-densities. The most evident over-densities can be associated with the presumed spiral arms of the Milky Way, in particular the Sagittarius-Carina and Scutum-Centaurus arms. Yet, the spatial picture of the Milky Way's young disc structure emerging in this study is complex, and suggests that most young stars in our Galaxy ($t_{age}<t_{dyn}$) are not neatly organised into distinct spiral arms. The combination of the comprehensive spectroscopy to come from SDSS-V (yielding velocities, ages, etc..) with future Gaia data releases will be crucial to reveal the dynamical nature of the spiral arm themselves.


Introduction
Luminous and hot stars are massive, and hence rare and short lived. However, they play decisive roles across different fields of astrophysics.
They dominate the interaction between stars and the interstellar gas and dust (ISM) in their host galaxies, by heating or ionising those components; those stars also interact with the ISM through powerful stellar winds and eventually supernova explosions (Mac Low & Klessen 2004;Hopkins et al. 2014). Luminous and hot stars must indeed have played an important role in galaxy evolution throughout cosmic time, via their intense winds, ultraviolet radiation fields, chemical processing, and explosions (Haiman & Loeb 1997;Douglas et al. 2010;Bouwens et al. 2011).
Luminous and hot stars are inevitably young, and therefore they can serve as tracers of recent massive star formation. Although they make up an insignificant fraction of the overall stellar mass, they contribute a major portion of the light of the disc and they can thus probe the spiral structure and young disc kinematics of our and other galaxies (Xu et al. 2018;Chen et al. 2019;Dobbs & Baba 2014;Kendall et al. 2011Kendall et al. , 2015.
They are, directly or through their remnants, decisive drivers of the chemical evolution for many elements in the periodic table and they serve as crucial laboratories for stellar evolution in this arguably least well-understood regime of stellar physics.
Massive stars are born predominantly as members of binary and multiple systems (Sana et al. 2012(Sana et al. , 2014Kobulnicky et al. 2014;Moe & Di Stefano 2017). As a consequence, most of them are expected to undergo strong binary interaction, which drastically alters their evolution (Podsiadlowski et al. 1992;Van Bever & Vanbeveren 2000;O'Shaughnessy et al. 2008;de Mink et al. 2013;Langer et al. 2020).
Finally, massive (binary) stars are the only channel to yield binaries that involve black holes and neutron stars in the disc of the Milky Way. Therefore, understanding massive star binaries, also as they evolve away from their zero-age main sequence, is indispensable for understanding the distribution of gravitational wave events. For all the above reasons, a multi-epoch, spectroscopic census of luminous and hot stars across the Galaxy, providing spatial and dynamic information together with estimates for masses, ages, metallicity, multiplicity, and other spectroscopic information is needed. The SDSS-V survey (Kollmeier et al. 2017;Bowen & Vaughan 1973;Gunn et al. 2006;Smee et al. 2013;Wilson et al. 2019, Kollmeier, J., Rix, H.-W., et al., in prep) will provide such a comprehensive, multi-epoch spectroscopic program on hot and massive stars, which -however -Article number, page 1 of 19 arXiv:2102.08684v1 [astro-ph.GA] 17 Feb 2021 A&A proofs: manuscript no. aanda must be based on a clear and quantitative selection function to enable rigorous subsequent population studies.
There is no universal precise 'definition' across different subcommunities within astrophysics of when a star is 'luminous', 'hot', and 'massive'. 'Hot' may either mean that its T eff is sufficiently high that the spectrum is dominated by H and He lines, rather than metal lines (i.e. OBA stars); or it may mean that T eff is sufficiently high that the star produces significant amounts of ionising radiation (O and early B stars). 'Massive' may either mean, massive enough to 'go Supernova' or 'to have a convective core'.
For the current context we choose an operative definition of luminous and hot as T eff 8000 K and M K 0 mag, or -loosely speaking -OBA stars. This is the set of stars that addresses the science sketched above. Of course some aspects, such as ISM ionisation, are only affected by O and early B stars; but it is important to remember that even the most massive stars are not ionisingly hot during all of their evolution. Therefore we aim in our sample definition for a sample that is hot enough to eliminate all the luminous red giant branch (RGB) and asymptotic giant branch (AGB) stars, but include most evolved phases of massive stars. Stars with T eff ≥ 8000K can be photometrically pre-selected across the entire sky; yet efficient discrimination of 10,000 K from 20,000 K by means of photometry only would be nearly impossible in the presence of dust.
In light of the science goals above, in this paper we present the selection of the target sample of massive stars of the SDSS-V survey. We describe an approach to use kinematics to improve distance estimates. Finally we use a "clean" sub-set of the target sample to study the structure of the Milky Way disc as traced by young, massive stars.
The selection of our target sample is based on the combination of Gaia EDR3 (Gaia Collaboration et al. 2016 astrometry and photometry and 2MASS (Skrutskie et al. 2006) photometry. Similar samples of stars in the Upper Main Sequence (UMS) were already selected by Poggio et al. (2018) and Romero-Gómez et al. (2019) to characterise the structure and the kinematic properties of the Galactic warp. Poggio et al. (2018) used a combination of Gaia DR2 and 2MASS colours to select stars of spectral type earlier than B3, and obtained a sample of 599 494 stars. Romero-Gómez et al. (2019) used only Gaia DR2 photometry to select stars brighter than M G,0 = 2 mag (thus likely including also A-type stars), and obtained a sample of 1 860 651 stars. Our sample definition (described in Section 2) lies somewhat in the middle between those by Poggio et al. (2018) and Romero-Gómez et al. (2019). In Section 3 we describe the method that we use to estimate distances. In Section 4 we clean the target sample for sources with spurious astrometry and we filter out older contaminants. In Section 5 we use this filtered sample to study the 3D space distribution of the sources and we compare the distribution of OBA stars with other tracers of massive star formation and spiral arm structure. In Section 6 we discuss our findings in the context of the structure and nature of the spiral arms of the Milky Way. Finally we summarise our results and draw our conclusions in Section 7.

Target sample
In this Section we present the criteria that we applied to select the target sample for the SDSS-V spectroscopic survey, and we describe its characteristics in terms of sky distribution, magnitude and luminosity distribution, variability, purity, and completeness.

Astro-photometric selection
To select our sample, we use Gaia EDR3 photometry (Riello et al. 2020) and astrometry (Lindegren et al. 2020b) combined with 2MASS photometry. The cross-match betweeen Gaia and 2MASS is not yet provided in the Gaia archive (Marrese et al., 2021, in prep) 1 . For this reason we performed first a cross-match with Gaia DR2, then a cross-match with 2MASS. The ADQL query for this cross-match is provided in Appendix A.
We restrict our query to the sources with G < 16 mag. The motivation of this condition is twofold. On the one hand, it is a SDSS-V technical requirement: at G = 16 mag we can obtain a S/N = 75 with 15 minute exposures with the BOSS spectrograph (Smee et al. 2013). On the other hand, for G > 16 mag the Gaia EDR3 parallax precision rapidly deteriorates (see e.g. Table 3 in Gaia Collaboration et al. 2020a), thus a) estimating absolute magnitudes (which are crucial to our selection) becomes nontrivial and b) inferring distances strongly depends on the choice of the prior (see Appendix B and Section 3).
We define a proxy for the absolute magnitude of a star in the K s band,M K s , and we requireM K s < 0. This translates to the following parallax condition: and aims at selecting a reasonably-sized sample of bright stars. The value M K s = 0 mag corresponds roughly to a B7V-type star (Pecaut & Mamajek 2013). To include also later B-type stars we should requireM K s < 1 mag, as M K s = 1 mag roughly corresponds to the absolute magnitude of a A0V-type star. We chose howeverM K s = 0 mag as a good compromise between completeness and contamination (see Section 2.1.2). We stress that for the survey target list we do not require any condition on the photometric quality of 2MASS, nor on the photometric errors, nor on the parallax accuracy or the goodness of fit of the astrometric solution. This is motivated by the fact that we are aiming to obtain a known, well defined selection function: not only measurement errors do not describe intrinsic physical properties of the sources, and thus should not enter the selection criteria, but they are also difficult to model.
At this point our sample, shown in the (J − K s ) 0 vs G BP −G RP colour-magnitude diagram in Fig. 2, consists of luminous stars, which could either be OBA, RGB, and AGB stars. OBA stars fall in the region highlighted by the black ellipse. The stars in the horizontal stripes are stars with large 2MASS photometric errors or 2MASS photometric quality flags (ph_flag) different than "AAA".
To select OBA stars we use simple photometric cuts, roughly following the procedure outlined in Poggio et al. (2018) (see in particular their Fig. 1). The first step of our selection consists in selecting sources with: (2) The first condition excludes most of the red sources, the second condition removes sources with unnaturally blue colours. We obtained the (J − K s ) 0 colour by applying the following Equation: which we derived by noticing that, for the spectroscopically confirmed O and B-type stars selected by Liu et al. (2019), the (J − K s ) and (G − K s ) colours are linearly correlated (see Fig.  1, left), and by assuming that the difference between the J and K s magnitudes for such stars is close to zero (see e.g. Pecaut & Mamajek 2013). Fig. 3 shows the distribution of sources in the G−K s vs. J−H colour-colour diagram. O and B-type stars lie on a sequence in the J − H vs. G − K s colour-colour diagram as a consequence of interstellar reddening and are clearly separated from redder turn-off stars and giants. We therefore select stars located in the region defined by the equations: (solid grey lines in Fig. 3). Finally, we select stars in the G vs. G− K s colour-magnitude diagram, where giants still contaminating our sample can be easily separated from OBA stars (see Fig. 4).
In particular we select stars satisfying:  We stress that massive stars in the Magellanic clouds are included in the target sample of the SDSS-V survey. In this paper however we focus on the Milky Way disc, thus in the following sections we restrict our sample to stars with |b| < 25 • . Massive star forming regions can be recognised as over-densities in the source distribution. Dust features are visible as 'gaps' in the star distribution. For example, the Aquila rift, the Pipe Nebula and the Ophiucus clouds can be easily identified towards the Galactic centre (l, b = 0, 0 deg). By visually inspecting the features of Fig. 5, it is possible to identify old stars contaminating the sample. For instance, at l, b ∼ 309 • , 15 • the ω Centaurus globular cluster is visible, while the stars towards the Galactic centre delineate the shape of the bulge. We further study the purity and completeness of our sample in Sec. 2.1.2. Fig. 6 shows the distribution in apparent (left) and absolute magnitude (right) of the sources. The orange histograms show the distribution of all the sources in our sample. The grey histograms show the distribution of the sources selected in Section 3. On the right hand panel, absolute magnitudes are computed by using the inversion of parallaxes as the distance (orange histogram) or astro-kinematic distances (grey histogram, see Section 3). The grey distribution appears shifted towards larger M G values. This is due to the fact that, for stars with small parallaxes (and large parallax errors), our astro-kinematic distances are shorter than those obtained by inverting the parallax. Fig. 7 (left) shows the variability distribution of the sources in the G band vs. the ratio of the variability in the G BP and G RP bands. Following roughly Belokurov et al. (2017), Deason et al. (2017) and Iorio et al. (2018), the variability in a given filter x is defined as: where phot_x_n_obs is the number of observations contributing to the x-band photometry, and phot_x_mean_flux_over_error = phot_x_mean_flux/phot_x_mean_flux_error.
While most of our sources are not variable, some cluster in two regions: RR Lyrae stars have G-band variability ranging from 5 to 40% ) and G BP /G RP variability around 1.6; eclipsing binaries have G variability up to 30% and roughly equal G BP and G RP variability. Fig. 7 (right) shows the same as Fig. 7 (left) but for the sources selected in Section 3 (see below), that have astro-kinematic distances compatible with those estimated by Bailer-Jones et al. (2018). This selection removes RR Lyrae stars, while retaining eclipsing binaries: this is expected as RR Lyrae stars do not follow the same kinematics as the OBA stars that we are interested in.

Completeness and purity
To study the purity level of our sample we cross-matched it with LAMOST DR6 (http://dr6.lamost.org/), and find 36 617 sources in common. We derived effective temperatures (T eff ) and surface gravity (log g) from the LAMOST spectra by fitting ab initio model spectra generated with the 1D-LTE ATLAS12 model atmosphere (Xiang et al., in prep). Fig. 8 shows the distribution of sources in the log T eff vs. M K s plane. Around the 45% of the sources are hotter than 9 700 K (orange solid line in Fig.  8, left, corresponding to the temperature of a A0V-type star) and around the 20% are hotter than 14 000 K (roughly the temperature of a B7V-type star). We assume that we can extrapolate the same numbers to the entire sample. The PARSEC isochrones (A V = 0 mag and solar metallicity) in Fig. 8 (Bressan et al. 2012;Tang et al. 2014) show that our main source of contamination are evolved massive or intermediate mass stars (e.g. yellow and blue super-giants) and that the sample is substantially free from RGB and AGB stars, which are colder and more luminous than the stars in our sample. In the left-hand panel, the isochrones are colour-coded by their log age (from 1 Myr to 500 Myr), in the right hand panel they are colour-coded by mass.
To estimate the completeness of the sample we crossmatched with the O and B-type star catalogue by Liu et al. (2019) and the O-type star catalogue by Sota et al. (2014)  consists of 590 stars, that reduce to 580 after cross-matching with Gaia EDR3 and 2MASS. Our catalogue contains 503 (∼86%) of the GOSSS stars. As with the LAMOST sample, the stars missing from our selection do not follow some of our photometric selection criteria. For example many of the missing GOSSS sources (49%, 54 stars) have J − H > 0.15(G − K s ) + 0.05 (see Eq. 4), that is 2MASS-Gaia EDR3 colours consistent with being giants.

Distance estimates
To study the 3D space distribution of our sources, precise distance estimations are needed. We estimate distances by using a model designed to reproduce the properties of our data-set in terms of spatial and luminosity distribution, and the additional information that stars belonging to our sample should follow Galactic rotation, with a small, typical velocity dispersion. By making such assumption we neglect non-circular streaming motions. We can then predict the true proper motions (µ l , µ b ) of the stars in our sample and compare them with the observed ones, thus adding a further constraint to the distance estimation and deriving 'astro-kinematic' distances. The probability density function (pdf ) for a star in our sample to be at a certain distance d kin is given by:  where is the array of the astrometric observables, i.e. parallax (with 0 the parallax zero-point) and proper motions components in l and b, µ l * and µ b ; m K s is the apparent magnitude in the K s band; -Θ K M represents our kinematic model; -Θ S M represents our model for the distribution of stars in the Galaxy; -Θ CMD accounts for the observational effects due to our selection function on the spatial distribution of our sample.
The method and all the terms in Eq. 8 are described in detail in Appendix B. We adopt as the astro-kinematic distance estimate d kin the mode of the pdf of Eq. 8, and we use the 16th and 84th percentiles to estimate distance errors. We also tried using the median of the pdf as a distance estimate, but this did not cause significant differences in the maps presented in Section 5. Figure   9 (left) shows the comparison between our astro-kinematic distances and the photo-geometric distances estimated by Bailer-Jones et al. (2020). More than the 85% of the sources have astro-kinematic distances consistent within 1σ with the photogeometric distances from Bailer-Jones et al. (2020). We comment on the sources with inconsistent distances in Section 4.2.

Filtered sample
In this Section we define a sub-set of the target sample (Section 2), which we use to map the structure of the young Milky Way disc. Such "filtered" sample is obtained by cleaning the target sample for sources with likely spurious astrometric solutions (Sec. 4.1) and by removing sources with kinematic properties not consistent with the model that we used to estimate astrokinematic distances (Sec. 4.2).

Removal of spurious sources
To remove spurious sources, we use a method developed by Rybizki et al. (2021), based on Gaia Collaboration et al. (2020b). Spurious sources have poor astrometric solutions that can be due to the inconsistent matching of the observations to different physical sources. This is more likely to occur in regions of high source surface density (for example, in the Galactic plane) or for close binary systems (either real or due to perspective effects).
To identify poor astrometric solutions in their solar neighbourhood catalogue, Gaia Collaboration et al. (2020b) constructed a random-forest classifier that assigns a probability to each source of having a good (or bad) astrometric solution, based on astrometric quantities and quality indicators. The classification probability is ∼0 for sources with poor astrometric solutions, and ∼1 for sources with good astrometric solutions. Gaia Collaboration et al. (2020b) applied this classifier to stars with distances less than 100 pc. Rybizki et al. (2021) used similar methods to develop a neural network classifier, which they applied to the entire Gaia EDR3 catalogue. The distribution (in logarithmic scale) of "astrometric fidelities" for the sources in our target sample is shown in Fig. 10. We select sources with astrometric fidelities > 0.5 (around 75% of the target sample), following the definition of "good" and "bad" astrometric solutions in Rybizki et al. (2021). The distribution in the Galactic plane of sources with good and bad astrometric solutions is shown in Fig. 8 of Rybizki et al. (2021). While this procedure allows to obtain a more reliable sample, it might also eliminate binaries. Binaries however do not constitute the the focus of this paper and do not affect the space distribution of our filtered sample.

Kinematic cleaning
Our astro-kinematic distance estimates are not accurate for sources that do not follow the disc kinematics assumed by our model. Such sources can be grouped in two categories: 1. stars whose kinematics is influenced by the bar. They are located mostly towards the Galactic Centre, and have astro-kinematic distances larger than Bailer-Jones photogeometric distances. These stars could in principle be young. Their astro-kinematic distance estimates could however be wrong due to the kinematic effects of the bar itself, that have not been included in our model (see Section 3 and Appendix B). In the following Sections we focus on the in-plane distribution of stars within d kin 5 kpc, and thus we do not consider sources in the bar/bulge regions. 2. stars with heated vertical kinematics. These stars have Bailer-Jones photo-geometric distances d BJ 3 kpc and astro-kinematic distances d kin ∼ 2 kpc. They could correspond to old contaminants, such as the RR Lyrae-type variable stars discussed in Section 2.1.1. To remove them we proceed as follow. First, we compute their vertical velocity, v z (see Appendix C).We fit the vertical velocity distribution with a Gaussian Mixture model with two components, the first (component A) with mean vertical velocity < v z > A = −0.5 km s −1 and velocity dispersion σ v z ,A = 7.0 km s −1 which has properties comparable with the young population that we are interested in studying, and the second (component B), with < v z > B = 1.6 km s −1 and velocity dispersion σ v z ,B = 29. km s −1 , which instead better reproduces the properties of an old population. We estimate the probability for each star to belong to either component A or B, and we select those stars with larger probability of belonging to component A. Figure 9 (right) shows the comparison between the astrokinematic distances and Bailer-Jones photo-geometric distances after cleaning the target sample as explained above. At this point, around the 95% of the sources have astro-kinematic distances consistent with Bailer-Jones photo-geometric distances within 1σ.
Finally, by using our astro-kinematic distances, we estimate the absolute magnitude of each star in the K s band, M K s , and we select stars with M K s < 0 mag. This condition refines Eq. 1, and aims at selecting stars of type earlier than B7V.
After applying the quality cuts described in this Section, our filtered sample reduces to 435 273 stars.

3D space distribution
To create the 3D density maps, we follow the method outlined in Zari et al. (2018). We compute galactic Cartesian coordinates, x, y, and z, for all the sources by using our newly estimated astrokinematic distances and we define a box V = 12 × 12 × 1.5 kpc centred on the Sun. We divide the cube into volume elements v = 5 × 5 × 5 pc. After computing the number of stars in each volume, we estimate the star density D(x, y, z) by smoothing the distribution by means of a 3D Gaussian filter, using a technique similar to that used by Bouy & Alves (2015).The Gaussian width (equal on the three axes) is w = 5 pc the Gaussian is truncated at 3σ. The choice of a certain w value is arbitrary. A high w value produces a smooth, less detailed map, while a low w value results in a noisy map. Fig. 11 shows the projection on the Galactic plane of the filtered sample selected in Section 4. A number of features is visible, showing different structures at different scales and distances. At the centre of the map (x, y ∼ 0, 0), a low-density gap is visible. This is due to a combination of reasons. First, in the solar neighbourhood the number of OBA stars is low. The Sun is indeed currently located in a cavity, partially filled with hot, low-density gas, which is usually called the Local Bubble (LB). The LB was likely created 10-20 Myr ago by a number of supernova explosions that likely halted any low-or high-mass star formation episode in the region. Second, our selection criteria exclude by construction stars of later spectral type than ∼B7V.
Thus, even if some late type B stars present within 200 − 300 pc from the Sun, they are excluded from our selection. Third, selection effects due to the fact that a) nearby bright sources are not included in Gaia EDR3 and b) some of those that are included might have poor 2MASS photometry, and thus are excluded by the colour selection in Eq. 2.
The small dense clumps can be associated to well studied OB associations, for example Cygnus, Carina, Cassiopeia, and Vela, are visible. The distribution of the lower density contours traces the Milky Way young disc structure, in particular the spiral arm location (see Sec. 5.2 and 6). The density distribution presents numerous low-density gaps. These could be due to the fact that our view might be obscured by interstellar dust towards certain lines of sight, however some gaps are located in regions of relatively low extinction (see also Section 5.3). For example the Perseus gap (x, y ∼ −2, 1 kpc) has been detected in the distribution of other young tracers, which are traditionally associated with spiral arms, such as HII maps ( 2020) suggests that a similar gap might be observed around x, y ∼ 1, −1 kpc, where our map show an under-density, although less pronounced than the Perseus gap. In general, the distribution of stars along the "arms" is not homogeneous, but characterised by under-and over-densities. For distances larger than 3-4 kpc the number of sources (and thus their spatial density) decreases due to our magnitude limit at G = 16 mag (see Sec. 2).
The radial features are caused by 'shadow cones' produced by foreground extinction, and by distance uncertainties. As shown in Fig. B.3, distance uncertainties are lower than 10% for stars within 5 kpc, and therefore are only partially responsible for the elongation.

The Galactic warp
As mentioned in the Introduction, early-type star samples have been used to study the properties of the Galactic warp (see Poggio et al. 2018;Romero-Gómez et al. 2019, and references therein). Although in this paper we do not focus specifically on the warp, we studied the median height of our filtered sample with respect to the plane of the Galaxy and the median vertical velocity distribution, and compared our results to those by Poggio et al. (2018) and Romero-Gómez et al. (2019) to further validate our selection. Fig. 12 shows a map of the median height z of our sample on a spatially uniform grid, with bins of 100 pc width. We consider only bins that have at least 10 stars. The radial features in Fig.  12 are likely an artefact due to uneven sampling above or below the Galactic plane due to foreground extinction. Such features make the interpretation of Fig. 12 uncertain and might prevent from seeing the warp shape, especially for X > −2.5 kpc. For X < −2.5 kpc the impact of extinction is less dramatic (see also The kinematic signature of the warp is evident in Fig. 13 (see e.g. Poggio et al. 2018;Romero-Gómez et al. 2019), which shows the median vertical velocity (v z ) distribution of our filtered sample on the Galactic plane (see Appendix C for more details on how the vertical velocity was computed). For distances larger than 4 kpc from the Sun, the vertical velocities have positive values that seem to peak at Y ∼ 0 kpc and decrease towards both sides. At smaller distances from the Sun, some features also present positive vertical velocity. While some correspond to density enhancements in the 3D spatial distribution of our sources, this is not the case for all of them. Such variation in v z are likely related to the perturbations that the disc is experiencing (e.g. interaction with satellite galaxies).

Comparison with other O and B star maps
By combining Gaia EDR3 and VPHAS data with literature catalogues, Chen et al. (2019) obtained a sample of 14 880 OB stars, earlier than B3V, which they used to describe the morphology of the spiral structure of the Milky Way. Chen et al. method to select O and B stars substantially differs from ours, however a qualitative comparison between their Fig. 4 and our maps shows substantially the same gross structures. This increases our confidence that both maps are revealing the same features in the Galactic O and B star distribution. The larger number of sources in our catalogue enables us to reveal more details in the source distribution. Interestingly, neither maps show a clear spiral structure. This will be further discussed in the Sec. 6. Fig. 3 in Romero-Gómez et al. (2019) shows the distribution on the Galactic plane of their upper main sequence sources. Fig. 11 shows essentially the same over-densities, although to a higher level of detail. This is mainly due to two facts: a) to create their maps, Romero-Gómez et al. (2019) used a larger bin size than in our present work, and b) their sample likely includes more later type stars than ours due to their selection criteria: this causes the distribution of sources to be smoother.  2018) is higher than ours. As mentioned above, this creates a smoother map, where it is not possible to identify small-scale over-densities. The comparison between large scale structures shows however many similarities.

Comparison with dust distribution
Dust is also a tracer of Galactic structure. Fig. 14 shows the projection of the 3D dust distribution from Lallement et al. (2019) on the Galactic plane, over-plotted on top of our density map (see Fig. 11). The units of the density distribution are arbitrary. The dust density distribution shows discrepancies with respect to the star distribution. For example, the two elongated structures at the centre of the map are not prominent in the OB stars distribution, and the dust features in the first and fourth Galactic quadrants are off-set with respect to the star density distribution, and seem to have a different separation and relative inclination. Such offsets are expected, as newly formed stars will drift from their birthplaces as they follow galactic rotation while the spiral arms move with a given pattern speed.

Comparison with Cepheid distribution
Classical Cepheids are young (< 400 Myr) variale stars, whose distances can be estimated thanks to their period-luminosity relation. (Fig. 15 show the Cepheids identified by Skowron et al. (2019), over-plotted on the density distribution of O and B stars (same as in Figs. 11 and 14). The colour-bar represents the ages determined by Skowron et al. (2019) (in million years). The Fig. 11. Surface density of the stars selected in Section 3 projected on the Galactic Plane. The Sun is in (0,0), the x-axis is directed towards the Galactic centre, and the y-axis towards Galactic rotation. The z-axis is perpendicular to the plane. The density is displayed in arbitrary units. The dashed circles have radii from 1 to 6 kpc, in steps of 1 kpc. The labels follow the nomenclature proposed by K. Jardine (http://gruze.org/galaxymap/map_2020/). The gaps in the density distribution mentioned in the Section 5 are indicated by the grey circles.
Cepheid distribution traces reasonably well the density enhancements corresponding to the Sagittarius-Carina arm (see Sec. 5.5), while the correspondence with the other density enhancements is not as tight. Although the age distribution of the Cepheids and our filtered sample is similar, the selection criteria are different: this makes a more direct comparison difficult.

Comparison with maser distribution
The spiral structure of the Milky Way can be traced also by water and methanol masers associated with high-mass star forming regions (HMSFRs). By measuring parallaxes and proper motions of such HMSFRs, Reid et al. (2019) provided estimates of the fundamental Galactic parameters, among which the pitch angle and the arm width. Fig. 16 shows the masers and the location of the arms from the fit from Reid et al. (2019), on top of the OB stars density map shown in Fig. 11. Similarly to the Cepheids distribution, there is a good agreement between the position of the masers and the location of the most prominent over-densities, which makes us confident of our distance estimates. Unfortunately there are no available maser data tracing the spiral structure in the 4th Galactic quadrant (X > 0, Y < 0), thus making a more detailed comparison in that region non-trivial. However, by tracing the spiral arms beyond the data while keeping the parameters fixed (dashed lines, and lighter contours), more discrepancies become visible, especially towards the inner Galaxy, towards the Sagittarius-Carina (purple) and Scutum-Centaurus arm (orange).

Discussion
We have devised a large and systematically selected sample of massive, young stars and we studied its properties focusing on the spatial distribution of our sources on the Galactic plane. In this Section we discuss our findings in the context of the spiral arm structure of the Milky Way.
Traditionally, the four main spiral arms of the Milky Way are considered to be Perseus, Sagittarius, Scutum and Norma, with Median height for the filtered sample projected on the Galactic plane. We divided the XY plane into bins of 200 pc width, and we only show the ones containing more than 10 stars. The dashed circles have radii from 1 to 9 kpc, in steps of 1 kpc. The radial features are likely due to uneven sampling of sources in the Galactic plane due to foreground extinction. Fig. 13. Median vertical velocity v z distribution on the Galactic plane. We divided the XY plane into bins of 200 pc width, and we only show the ones containing more than 10 stars. The dashed circles have radii from 1 to 9 kpc, in steps of 1 kpc. The Galactic warp is visible for R > 4 kpc towards the Galactic anti-centre. an additional Outer arm, which might be the outer part of one of the inner arms, and the Local or Orion arm, which is shorter and may be a spur or a bridge between two arms (see for instance Churchwell et al. 2009). In the Milky Way, the terminology "spiral arm" has been used very broadly, for HI, molecular gas (and masers), dust and young stars, while it has been shown from external galaxies that these tracers have quite distinct mor-phology. This has led to some inconsistencies in defining the spiral arms in our Galaxy. For example, the structure parameters derived by Reid et al. (2019) (see Fig. 16 and Sec. 5.5) do not agree with those presented in Chen et al. (2019) (see Sec. 5.2). Further, certain features might be interpreted as spiral arms or spurs connecting different arms: this is for example the case for the dust complex labelled as "Lower Sagittarius-Carina" by Lallement et al. (2019) (see Sec 5.3 and Fig. 14) and "Lower Sagittarius-Carina Spur" by Chen et al. (2020). Further, by studying the correlation between the location of young clusters and molecular clouds in NGC 7793 and M51 (respectively a flocculent and a grand-desing spiral galaxy), Grasha et al. (2018) and Grasha et al. (2019) found that the star clusters that are associated (i.e., located within the footprint of a giant molecular cloud) are young, with a median age of 2 Myr in NGC 7793 and of 4 Myr in M51. Older clusters are mostly un-associated with any molecular cloud. Thus, equating the same spiral arm morphology with different tracers (such as early type stars and dust) might be inappropriate.
The over-density towards the inner Galaxy (visible for positive x values, i.e. in the 1st and 4th Galactic quadrant) in Fig. 11 is associated with the Sagittarius-Carina and Scutum-Centaurus arm, and is much more prominent than the others, containing numerous high-mass star forming regions. On the contrary, the distribution of massive stars associated with the Perseus arm peaks mainly in the 2nd Galactic quadrant (towards Cassiopeia). This is consistent with Reid et al. (2019) findings, and would point to the conclusion that the Perseus arm as traced by O and B-type stars is not a dominant arm, and might be dispersing in the field. This suggests that a recent, large-scale episode of (massive) star formation occurred in Sagittarius-Carina and Scutum-Centaurus, while in the other arms (Perseus and Local) stars formed earlier, except in a few isolated massive associations, such as Cygnus and Cassiopeia. To confirm this, we selected stars brighter than M K s = −1 mag and M K s = −2 mag (corresponding respectively to the K s absolute magnitude of B2V-type stars and B1V-type star), and we evaluated their density on the plane following the procedure described in Sec. 5. The density maps obtained with these samples are shown in Fig. 17, where also the same map of Fig. 11 is shown for clarity. The density contours in Fig. 17 (left) trace the same dense structures as in Fig. 11 (note that the contour levels are different), while the low-density contours substantially disappear. This is even more evident in Fig. 17 (right). This is expected, as intrinsically brighter stars are on average younger and thus have had less time to disperse in the field. Drimmel (2000) and Drimmel & Spergel (2001) studied the spiral arm features of the Milky Way in the far-infrared and nearinfrared and concluded that the near-infrared features were consistent with a two-armed spiral model, while the other two arms were traced only by the far-infrared emission and thus could perhaps be present only in gas and young stars. More recently, Xu et al. (2018) and Chen et al. (2019) noted that the spiral arm structure traced by O and early B-type stars may have many substructures in addition to the four major arms. Thus, they concluded that the Milky Way might not be pure grand design spiral galaxy with well-defined, two or four dominant arms. This might suggest that the Milky Way could exhibit characteristics of both flocculent and grand design arms, with grand design seen in the infrared (old stellar populations) and more flocculent (or multi-arm) features seen in the optical (young stars). As mentioned above, these characteristics have been observed in spiral galaxies for instance by Kendall et al. (2011Kendall et al. ( , 2015, and references therein), who concluded that galaxies that exhibit a grand design structure in the optical also exhibit such structure in the NIR, while there are optically flocculent galaxies which do exhibit NIR grand design structure. As pointed out by Dobbs & Baba (2014), Elmegreen et al. (1999) and Elmegreen et al. (2011) noted however that most flocculent galaxies do not exhibit grand-design structure and those that do have very weak spiral arms. Elmegreen et al. (2003) further suggested that both grand-design and flocculent spirals (as seen in the old stars) exhibit a similar structure in the gas and young stars (independent of the underlying old stellar population) which is driven by turbulence in the disc. The co-existence of different spiral arm patterns in the Milky Way disc might have different explanations. For example, Martos et al. (2004) concluded that secondary arms could be resonance-related features in the quasistationary density wave picture, while Steiman-Cameron et al. (2010) and Drimmel (2000) proposed that the traditional four spiral arms may represent the dynamical response of a gas disc to a two-arm spiral perturbation in the mass distribution.
As mentioned before, the density distribution presented in Fig. 11 exhibits a prominent density enhancement roughly corresponding to the Sagittarius-Carina and Scutum-Centaurus arms. This alone does not allow to draw firm conclusions on the nature (nor on the number) of the spiral arms of the Milky Way. On the one hand, star formation occurs in a clumpy and patchy fashion: this might explain the observed density distribution without necessarily assuming a flocculent structure. On the other hand, the map shown in Fig. 15 might suggest a different picture, in which the Milky Way, as traced by young stars ( 300 Myr) does not show a set of discrete and distinct set of spiral arms.
The spiral arm structure of the Milky Way can also be investigated by using stellar kinematics (see e.g. Antoja et al. 2016). Of course, a spiral mass density perturbation can have a very different morphology from spiral-like density enhancements in molecular gas or young stars. Eilers et al. (2020) studied the kinematics of red giant stars and showed a spiral feature in their radial ve- locities that they interpret as rising from a perturbation to the potential of the Milky Way caused by a two-armed logarithmic spiral. Their model predicts the locations of the spiral perturbation in the Milky Way, one roughly co-spatial to the Orion (or Local) arm, and the other roughly corresponding to the Outer arm. The location of the perturbation predicted by Eilers et al. (2020) does not closely follow the over-densities of the map in Fig. 11. The comparison between the radial velocities of the O and B-type star sample and the giant sample will be crucial to make progress in our understanding of the mechanism giving rise to the spiral signature presented by Eilers et al. (2020). The data of the SDSS-V survey will make such a comparison possible, and will also likely allow to construct a surface density map of the giant sample which could be directly compared with the map presented in this work. The complete kinematic information that we will obtain by combining Gaia proper motions with SDSS-V radial velocities will also allow to better separate stars belonging to different spiral arms, and to study their internal motions. Indeed, as already mentioned in Section 5, vertical velocities are not related to the perturbation in the disc kinematics induced by the spiral arms.

Conclusions
In this study we have analysed the 3D space distribution of a sample of hot and luminous OBA stars, focusing in particular on their configuration in the Galactic disc.
Our target selection is based on the combination of Gaia EDR3 astrometry and photometry and 2MASS photometry, and is aimed at providing a well-defined selection function. We describe the properties of the target sample in terms of distribution in the sky, brightness, and variability. We estimate the purity and contamination of the catalogue by comparing with existing O and B-type star catalogues, such as those by Liu et al. (2019), and Sota et al. (2014) and Maíz Apellániz et al. (2016, and by deriving stellar parameters (T eff and log g) for stars in common with LAMOST DR6.
By assuming that young massive stars are on near-circular orbits with a small velocity dispersion, we compute astrokinematic distances, and compare them to those derived by Bailer-Jones et al. (2020). We use these distances to study the distribution of sources in the Galactic disc of a sub-set of the target sample, which we obtain by filtering out sources with spu- Fig. 16. The masers (dots) and the fit to the spiral arms (solid lines) from Reid et al. (2019) are over-plotted on top of our density map (same as Fig. 11). The maser distances were computed by naively inverting the parallax. The dashed circles have radii from 1 to 6 kpc, in steps of 1 kpc. The shaded regions correspond to the arm width. Different colours correspond to different spiral arms: Outer, green; Perseus, grey; Local, pink; Sagittarius-Carina, purple; Scutum-Centaurus, orange.
rious astrometric solutions and kinematic properties inconsistent with our model. We find that the distribution of sources in the plane of the Milky Way is highly structured, characterised by over-and under-densities. Some of the density enhancements correspond to massive star forming regions, such as Carina, Cygnus, and Cassiopeia. With these associations, the inner arms (Sagittarius-Carina and Scutum-Centaurus) are strikingly more prominent in OBA stars than any outer arms, for which we find little evidence.
The distribution of O and B-type stars from previous catalogues, classical Cepheids, dust, and high-mass star forming regions shows similarities with our density distribution. However, the picture of the spiral arm structure of the Galaxy that we obtain in this study is complex and it may suggest that young stars show little tendency to be neatly organised in distinct spiral features.
To assess different spiral arm models, it would be necessary to extend our map beyond the volume currently studied and, per-haps more importantly, complement with better Gaia data and spectroscopic information.
The target sample that we have devised is indeed optimised for spectroscopic follow-up with the SDSS-V survey. The information that we will be able to obtain with SDSS-V will be crucial for studying the kinematics and dynamics of the spiral arms, and thus the nature of the spiral arms themselves. Finally, the sample that we have devised will further allow to study the properties of massive stars, for example in terms of multiplicity, internal structure, and (binary) evolution, in a statistic fashion.
In this Section we describe all the terms of Eq. 8, which we report here for convenience: The prior p(d, m K s | l, b, Θ SM , Θ CMD ) can be written as: The term p(d|l, b, Θ SM ) represents the probability density function of observing a star in the direction (l, b) at distance d from the Sun according to our assumed model for the spatial distribution of stars in the Galaxy. The term f (d, m K s |Θ CMD ) specifies the fraction of stars that can be observed at a distance d and magnitude m K s given the distribution of stars in colour-magnitude space and our selection criteria. In the following Sections, we describe the different components of Eq. B.1. Figure B.1 shows example pdf 's for four randomly selected stars. The different components of the pdf 's are represented with thin coloured lines, and the total pdf, P(d |...), is represented with a black solid line. Both the parallaxes of the stars in the top row have σ / < 20%. In the left panel, the distance estimate is mostly constrained by the term P(d | µ l * , Θ K M ), which represent the probability of observing a star at distance d given the observed proper motion along Galactic latitude and the kinematic model Θ K M . P(d | µ l * , Θ K M ) shows bi-modality, with a primary maximum at ∼ 6 kpc, and a secondary maximum at ∼ 1.8 kpc. The total distance pdf shows traces of such bi-modality. In the right panel of the top row instead, the parallax component dominates the distance determination. The star in the left panel of the bottom row has σ / < 20%, however both P(d | µ l * , Θ K M and P(d | µ b , Θ K M peak at closer distances than the parallax term, and thus put a strong constraint on the final distance estimate. Finally, the star in the right panel of the bottom row has σ / ≈ 70%. The final distance pdf is however still quite narrow, as a result of the combination of the different terms.

Appendix B.1: Kinematic model
We assume that stars in our sample follow the rotation curve determined by Eilers et al. (2019), with R the Galactocentric radius and R = 8.122 kpc (Gravity Collaboration et al. 2018). The intrinsic velocity dispersions at the Sun's location in the components of the velocity in Galactic coordinates U, V, W are σ U = 16.7 km/s, σ V = 10.8 km/s, and σ W = 6 km/s (Robin et al. 2003). These values are assumed to be constant over the disc. In Cartesian Galactic coordinates, the pdf of the space velocity for one star in our sample is: the prime signifying the transpose of the vector. The quantity u takes into account Galactic rotation and Solar motion, and it can be written as: where e Y = [0, 1, 0] is the unit vector in the Y direction in Galactic coordinates. We assumed the Sun's position with respect to the Galactic centre to be x = [−8.122, 0., 0.025] kpc and the Sun's peculiar motion with respect to the Local Standard of Rest to be v = [11.1, 12.24, 7.25] km/s (Schönrich et al. 2010). We write the velocity dispersion matrix S as: For each star the astrometric observables are the parallax , and the proper motions components in l and b, µ l * = µ l cos b, and µ b respectively. These are collected in arrays: where 0 is the systematic zero point offset of Gaia EDR3 parallaxes. There is not a unique 0 for all the Gaia EDR3 stars. Following Bailer-Jones et al. (2020), we applied the parallax zeropoint correction derived by Lindegren et al. (2020a).
By assuming Gaussian errors, the pdf for the observables o given the true valuesõ, is then: The true values are defined as: where d is the true distance to the star and A = 4.74047 km yr/s. The two vectors p and q are the two components of the normal triad in longitude and latitude, and, as above, the prime signify . The coloured thin lines illustrate the different components of the pdf 's. The correlation terms in the covariance matrix of Eq. A.12 are neglected here for illustrative purposes. The distributions are scaled so that they peak at unity. their transpose. The elements of the astrometric covariance matrix C are: ρ µ l * σ σ µ l * ρ µ b σ σ µ b ρ µ l * σ σ µ l * σ 2 µ l * ρ µ l * µ b σ µ l * σ µ b ρ µ b σ σ µ b ρ µ l * µ b σ µ l * σ µ b σ 2 B.12) and are provided in the Gaia archive. The joint pdf of the observables with the velocity is therefore: (B.13) The pdf for the observables is obtained by marginalising over the velocity: 14) The integral can be resolved analytically. Since the product of two normal pdf is normal and the marginal density of a normal pdf is also normal, we can write: p(o | Θ K M ) = 1 (2π) 3/2 |D| 1/2 exp − The probability of a star to be at a true distance d is proportional to the stellar density ρ predicted by our spatial model Θ SM , so that we can write: where the Jacobian term d 2 takes volume effects into account. For convenience we write our model in Galactocentric coordinates (R, φ, z), where the stellar density is modelled as an exponential disc: R and z are the Galactocentric radius and height above the plane. The quantity h z = 0.15 kpc is the disc scale height (cfr. Poggio et al. 2020), and L = 3.5 kpc is the best fit scale-length of the young (< 1Gyr) disk, where we have used the model presented in Frankel et al. (2018) with the data and method in Frankel et al. (2019).
is the line-of-sight velocity. The majority of stars in our sample lacks line-of-sight velocities, thus it is not possible to calculate directly their vertical velocity. However, for stars at low Galactic latitudes (|b| < 20 • ), the term v r sin(b) can be approximately computed by assigning to each star the velocity it would have if it followed exactly the Galactic rotation curve: v r sin(b) ≈ (S − S ) tan(b), (C.2) where: S = U cos l + V sin(l), with l the galactic longitude of a star, and U and V the components of the Sun's motion along X and Y respectively; S = v φ R /R − v LS R sin(l), with R the distance of the Sun from the Galactic Centre, R a star's Galactocentric radius, v φ the azymutal velocity, and v LS R the standard of rest velocity. We assumed (U, V, W) = (11.1, 12.24, 7.25) km/s from Schönrich et al. (2010), and the rotation curve derived by Eilers et al. (2019), reported in Eq. B.3.