A million asteroid observations in the Sloan Digital Sky Survey

Context. The populations of small bodies of the Solar System (asteroids, comets, Kuiper-Belt objects) are used to constrain the origin and evolution of the Solar System. Both their orbital distribution and composition distribution are required to track the dynamical pathway from their regions of formation to their current locations. Aims. We aim to increase the sample of Solar System objects (SSOs) that have multi-filter photometry and compositional taxonomy. Methods. We search for moving objects in the archive of the Sloan Digital Sky Survey. We attempt to maximize the number of detections by using loose constraints on the extraction. We then apply a suite of filters to remove false-positive detections (stars or galaxies) and mark out spurious photometry and astrometry. Results. We release a catalog of 1,542,522 entries, consisting of 1,036,322 observations of 379,714 known and unique SSOs together with 506,200 observations of moving sources not linked with any known SSOs. The catalog completeness is estimated to be about 95% and the purity to be above 95% for known SSOs.


Introduction
Considered as nuisances for the trails they imprint on photographic plates, asteroids (i.e., the "vermin of the sky") are prime trackers of the dynamical events that shaped our Solar System, in particular migrations of the giant planets (Morbidelli et al. 2015; Clement et al. 2020). While their orbital distribution has been widely used, the distribution of their compositions provides further constraints on the models that describe the evolution of our Solar System (Levison et al. 2009; Vokrouhlický et al. 2016; Raymond & Izidoro 2017).
In this context, it is critical to understand their composition in detail, as well as the location and timing of their formation. This is achieved through the laboratory study of meteorites (Bottke et al. 2006; Vernazza et al. 2014; Scott et al. 2018), of which the near-Earth asteroids (NEAs) are the progenitors (Granvik & Brown 2018), and the compositional mapping of asteroids in the main belt (MB) between Mars and Jupiter.
The composition, and more broadly the classification, of asteroids has been based for decades on multi-filter photometry in the visible (e.g., Zellner et al. 1985). With the advent of charge coupled devices (CCDs) in the 1990s, multi-filter photometry moved toward low-resolution spectroscopy in the visible (Xu et al. 1995; Bus & Binzel 2002a) and was later extended to the near-infrared (DeMeo et al. 2009).
After three decades of targeted observations, spectra (visible, near-infrared, or covering both wavelength ranges) have been collected for about 9,000 asteroids (albeit disseminated over a myriad of articles, e.g., Bus & Binzel 2002b; Lazzaro et al. 2004; Fornasier et al. 2014), including around one thousand NEAs (mainly from the NEOSHIELD2, MANOS, and MITHNEOS surveys; Perna et al. 2018; Devogèle et al. 2019; Binzel et al. 2019). However, compared to the current census of over 20,000 NEAs and more than 860,000 MB asteroids, these 9,000 asteroids are only the tip of the iceberg.
In the current era, these "vermin" have become more appreciated, being serendipitously imaged by large sky surveys from the ultraviolet to far-infrared ranges. The most prominent examples of such un-targeted observations are the scientific exploitations of the Sloan Digital Sky Survey (SDSS), which provides multi-filter photometry in the visible, constraining the composition (Ivezić et al. 2001), and the Wide-field Infrared Survey Explorer (WISE), which provides mid-infrared photometry and hence diameter and albedo determinations (Masiero et al. 2011).
However, most imaging surveys do not report moving objects as direct products. They must be specifically searched for in released data, generally by external teams. The recent works of Popescu et al. (2016) on the European Southern Observatory (ESO) Visible and Infrared Survey Telescope for Astronomy (VISTA) Hemisphere Survey and Mahlke et al. (2018) on the ESO VST (Very Large Telescope (VLT) Survey Telescope) Kilo-Degree Survey are typical examples. This situation is, fortunately, evolving. The current European Space Agency (ESA) Gaia mission includes specific processing dedicated to Solar System objects (SSOs), for which a data release has already occurred (Spoto et al. 2018), and the upcoming ESA Euclid mission and the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory include SSO releases in their respective baselines (see Carry 2018; Jones et al. 2009).
We focus here on broadband photometry, in the visible, as it provides strong constraints on asteroid composition and classification. With over 100,000 reported individual asteroids with five filters in the visible (u, g, r, i, and z; Gunn et al. 1998), the SDSS has been the main source of compositional information for almost two decades. Its multi-filter photometry has been analyzed both in its own color space (Szabó et al. 2004; Nesvorný et al. 2005) and mapped into a classification compliant with visible and near-infrared spectroscopy (Carvano et al. 2010; DeMeo & Carry 2013). SDSS photometry has been the source of a variety of studies, including the selection of candidates for dedicated spectroscopic surveys (Moskovitz et al. 2008; Oszkiewicz et al. 2014; DeMeo et al. 2019), dynamical families and surface aging through space weathering (Parker et al. 2008; Thomas et al. 2012; Graves et al. 2018), the orbital distribution of hydrated C-types (Rivkin 2012), and the general structure of the asteroid belt.
arXiv:2108.05749v1 [astro-ph.EP] 12 Aug 2021. A&A proofs: manuscript no. 40430corr
Even though the SDSS Moving Object Catalog (MOC) has been widely used, its current release, which is called the ADR4 (Ivezić et al. 2010) and was published in 2008, only contains a fraction of the entire data set. First, only 211,138 moving sources were identified (i.e., linked with a known SSO) out of the 471,569 detected moving sources. As the number of known SSOs has increased from about 550,000 to 960,000 since 2008, many of the identifications missing from the ADR4 may be recovered (an earlier effort of this kind identified 57,646 observations of 36,730 asteroids). Second, the data selection at the origin of the SDSS MOC imposed strict lower and upper limits on the apparent velocity of sources between frames. This led to the rejection of any object moving faster than 0.5 deg/day, that is, effectively filtering out most NEAs present in the images (Solano et al. 2014). Finally, the last observing run included in the ADR4 release dated from March 2007, while the SDSS imaging survey lasted until November 2008.
The present article aims to increase the number of asteroids with SDSS astrometry and multi-filter photometry by identifying previously unidentified moving sources and searching for moving sources in frames not included in the latest SDSS MOC release. The article is organized as follows. In Section 2 we describe the approach we take to identify potential moving objects from the SDSS archive, aiming to maximize completeness. We detail the filters applied to the sample to increase its purity (i.e., to minimize the number of false-positive sources) in Section 3. The identification of known sources among these candidates is described in Section 4. The completeness and purity of the catalog are estimated in Section 5. We then address suspicious individual measurements (astrometry or photometry) in Section 6. We classify the objects consistently with the DeMeo et al. (2009) taxonomic classification in Section 7. Finally, we summarize the released asteroid sample in Section 8.

Extraction of Solar System object candidates
As noted earlier, the latest release (ADR4) of the SDSS MOC took place before the end of operations of the SDSS imaging survey. We thus compared the list of 519 runs (the run number identifies a specific scan) included in the ADR4 with the 832 runs of SDSS Data Release 16 (DR16; Ahumada et al. 2019) and the 303 runs of Stripe 82 (Adelman-McCarthy et al. 2008), together known as the SDSS Legacy Survey.
The main SDSS imaging survey covered 7,500 deg² in the northern Galactic cap and 750 deg² in the southern Galactic cap, where each patch of the sky was imaged in five filters in a single visit. The 270 deg² of Stripe 82 (a thin long area within 21h < RA < 04h and -1.25° < DEC < 1.25°; see Figure 1) were repeatedly imaged (about 80 times) as a time-domain survey (Sako et al. 2008; Frieman et al. 2008). Both surveys include 80 common runs. We only kept 756 runs from DR16, rejecting the runs that were affected by bad weather (e.g., clouds, bad seeing) and thus had a quality score below 0.1. In total, we analyzed 938 unique runs, about twice as many as those listed in the ADR4.
Fig. 1 (caption fragment). The background image is generated from the Pan-STARRS stellar catalog (Chambers et al. 2016). The black line represents the ecliptic plane.
The median 5σ depth of SDSS photometric observations, based on the formal uncertainties from point spread function [...]
Contrary to the ADR4 extraction, we did not set an upper limit on the apparent velocity, to avoid the rejection of NEAs. Furthermore, as sources moving faster than 0.5°/day may appear trailed in the images, we did not restrict the object type to stars only, but allowed both stars (type=6) and galaxies (type=3). The larger number of runs, combined with our more relaxed selection criteria to maximize the completeness (the SQL (Structured Query Language) code is provided in Appendix B for reproducibility), leads to a dramatic increase in the number of SSO candidates compared to the ADR4: 4,804,003 in DR16 and 4,071,153 in Stripe 82.
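The relaxed selection described above can be sketched as a simple predicate. This is only an illustration and not the SQL of Appendix B: the dictionary keys `type` and `velocity_deg_per_day` are hypothetical names, the lower velocity limit of 0.05 deg/day is the one quoted later in the text, and whether the limit is inclusive is an assumption.

```python
# Hypothetical field names; the actual SDSS query is the SQL of Appendix B.
STAR, GALAXY = 6, 3      # SDSS object types allowed by the extraction
MIN_VELOCITY = 0.05      # deg/day, lower limit rejecting stationary sources


def is_sso_candidate(source):
    """Return True if a source passes the relaxed selection: star- or
    galaxy-like type, motion above the lower limit, and no upper limit."""
    return (source["type"] in (STAR, GALAXY)
            and source["velocity_deg_per_day"] >= MIN_VELOCITY)


sources = [
    {"type": 6, "velocity_deg_per_day": 0.20},   # typical main-belt rate
    {"type": 3, "velocity_deg_per_day": 1.50},   # fast, possibly trailed NEA
    {"type": 6, "velocity_deg_per_day": 0.01},   # effectively stationary
]
candidates = [s for s in sources if is_sso_candidate(s)]
```

Dropping the upper limit is what lets the second, fast-moving source through; an ADR4-like cut at 0.5 deg/day would have rejected it.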

Rejection of false-positive candidates
The above extraction returned almost 9 million SSO candidates. As the selection criteria aimed at completeness, many if not most candidates could be stars or galaxies (i.e., false-positive identification rather than genuine SSOs). We describe below the successive steps we performed to improve the purity of the sample by removing these false-positive detections.

Identification of background sources with Pan-STARRS
The Pan-STARRS catalog provides a homogeneous, multiband survey that covers the sky north of declination -30° (Figure 1). It probes the sky with similar filters, depth, and seeing quality as the SDSS (Flewelling et al. 2020). The mean 5σ point source limiting sensitivities in the Pan-STARRS stacked images reach up to a 23.3 mag limit. The epochs of observation separated by years between the SDSS and Pan-STARRS provide a practical means to identify fixed-coordinate objects (stars and galaxies) among the SSO candidates.
We used the Pan-STARRS Python search API (MAST; see footnote 2) to compare our catalog of candidates with the Pan-STARRS Data Release 2 (DR2) catalog. We excluded candidates matched within 2″ by Pan-STARRS DR2 objects if the latter were detected at least twice by Pan-STARRS. After this filtering, the number of SSO candidates dropped to 3,238,202.
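The exclusion rule above can be illustrated with a brute-force positional match. This is a minimal sketch that does not call the MAST API: the reference sources are plain dictionaries with hypothetical keys (`ra`, `dec`, `n_detections`), and a real catalog of this size would need a spatial index rather than a double loop.

```python
import math

MATCH_RADIUS_ARCSEC = 2.0   # cross-match radius quoted in the text
MIN_DETECTIONS = 2          # reference source must be detected at least twice


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation from the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def is_stationary(candidate, reference_sources):
    """True if a repeatedly detected reference source lies within the radius."""
    return any(
        ref["n_detections"] >= MIN_DETECTIONS
        and separation_arcsec(candidate["ra"], candidate["dec"],
                              ref["ra"], ref["dec"]) <= MATCH_RADIUS_ARCSEC
        for ref in reference_sources
    )


reference = [
    {"ra": 10.0000, "dec": 0.0, "n_detections": 5},   # genuine star
    {"ra": 10.0100, "dec": 0.0, "n_detections": 1},   # single detection only
]
star_like = {"ra": 10.0002, "dec": 0.0}   # 0.72 arcsec from the genuine star
mover = {"ra": 10.0100, "dec": 0.0}       # matches only the single detection
```

Requiring at least two detections in the reference catalog guards against rejecting an SSO candidate because another moving object happened to be recorded once at that position.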

Exclusion of duplicates
The SDSS PhotoObjAll catalog we queried (Appendix B) contains duplicated measurements in some cases (i.e., the same observation could appear a few times in the catalog). This is a consequence of several factors.
The first is the methodology of SDSS photometry. Identical objects could be measured in different fields because the frames partly overlap. The second is the fact that while the Stripe 82 and DR16 catalogs share some runs, they were processed by different versions of the SDSS photometry pipeline and therefore could have slightly different photometry. The third factor is due to the moving nature of observed SSOs. For a high apparent rate, the same object can be identified as two or more stationary sources (see Subsection 6.3 for a description of our extraction of photometry for fast-moving asteroids such as NEAs). For each duplicated entry, we grouped objects by run number within a 5″ radius and selected the most accurate photometry for each band.

Identification of high-proper-motion stars with Gaia
While the cross-match with Pan-STARRS allows the identification and rejection of most stationary sources, high-proper-motion stars may have moved farther than our cross-match radius of 2″ in 20 years. We selected all stars listed in Gaia DR2 with proper motion above 0.05 arcsec/year, fainter than G=14.0, and with declination above -30°. For each of the 2,278,339 stars we retrieved, we computed its position at the SDSS mid-observing epoch, January 1, 2004 (see Appendix B). We then cross-matched this catalog of high-proper-motion stars with the catalog of SSO candidates. We identified 67 additional stars missed by the Pan-STARRS comparison above and rejected them from the sample.
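Computing a star's position at the SDSS mid-observing epoch amounts to a proper-motion propagation. The sketch below is a simplified linear version, not the procedure of Appendix B: it ignores parallax and radial velocity, assumes the Gaia DR2 reference epoch of J2015.5, and assumes that the right-ascension proper motion already includes the cos(dec) factor, as in the Gaia catalog. The function name and its arguments are illustrative.

```python
import math

GAIA_DR2_EPOCH = 2015.5   # reference epoch of Gaia DR2 positions (Julian year)
SDSS_MID_EPOCH = 2004.0   # January 1, 2004, as adopted in the text
MAS_PER_DEG = 3.6e6


def propagate(ra_deg, dec_deg, pmra_masyr, pmdec_masyr,
              epoch_from=GAIA_DR2_EPOCH, epoch_to=SDSS_MID_EPOCH):
    """Linear proper-motion propagation (no parallax, no radial velocity).
    pmra_masyr is assumed to be mu_alpha * cos(dec), in mas/yr."""
    dt = epoch_to - epoch_from
    dec_new = dec_deg + pmdec_masyr * dt / MAS_PER_DEG
    ra_new = ra_deg + pmra_masyr * dt / (
        MAS_PER_DEG * math.cos(math.radians(dec_deg)))
    return ra_new % 360.0, dec_new


# A star moving 0.1 arcsec/yr northward was 1.15 arcsec farther south in 2004.
ra, dec = propagate(150.0, 20.0, 0.0, 100.0)
shift_arcsec = (20.0 - dec) * 3600.0
```

With the 0.05 arcsec/year threshold quoted above, the same 11.5-year baseline gives a displacement of about 0.6″; stars moving a few times faster than the threshold exceed the 2″ cross-match radius.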

Rejection of dubious frames
Among all SDSS fields, we found frames with a suspiciously high number of SSOs. A visual inspection of these fields revealed issues with the astrometry across the different filters. As a result, sources have different positions in each filter, leading to erroneous classification as moving sources by the SDSS software. We removed 312 such fields from DR16 and 414 from Stripe 82. We also removed all 300 frames of Run 2505, which were totally crowded by Milky Way stars. In this step, we rejected 6,701 candidates from our catalog.

Identification of known Solar System objects
The extraction and selection presented in the previous section resulted in a catalog of 1,257,408 observations. To identify the detected objects, we performed 938,046 (DR16) and 586,983 (Stripe 82) cone-search queries with SkyBoT (the Sky Body Tracker; Berthier et al. 2006), one for each of the frames (13.5′ × 9.0′) included in our catalog. SkyBoT is a Virtual Observatory web service that provides cone-search utilities for SSOs. It is thus a useful tool for seeking and identifying moving objects in astronomical images.
Footnote 2: http://ps1images.stsci.edu/ps1_dr2_api.html
In 812,573 of our cases, a known SSO was predicted by SkyBoT to be located near the source we extracted from the SDSS archive (Section 2). The mean distance between the prediction and the detection is -0.05 ± 0.45″ in right ascension and -0.01 ± 0.22″ in declination (Figure 2). The V-r color, presented in Figure 3, is 0.01 ± 0.24 (i.e., slightly redder than Solar color; Willmer 2018). We associated the SkyBoT prediction with the SDSS measurement whenever the angular distance was smaller than 30″, the |V - r| magnitude difference was smaller than 5, and the difference in apparent motion was within 5″/h.
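The three association criteria can be sketched as a filter over the predictions returned for a frame. This is a hedged illustration only: the dictionary keys are hypothetical, the motion difference is assumed to be the norm of the vector difference of the rates, and keeping the closest accepted prediction is an assumption not stated in the text.

```python
import math

MAX_SEPARATION = 30.0   # arcsec
MAX_MAG_DIFF = 5.0      # |V - r|, mag
MAX_RATE_DIFF = 5.0     # arcsec/h


def best_match(detection, predictions):
    """Return the name of the closest prediction passing the three cuts,
    or None if no prediction qualifies."""
    accepted = []
    for pred in predictions:
        # Small-angle separation, adequate at the 30 arcsec scale.
        d_ra = (pred["ra"] - detection["ra"]) * math.cos(
            math.radians(detection["dec"]))
        d_dec = pred["dec"] - detection["dec"]
        separation = math.hypot(d_ra, d_dec) * 3600.0
        rate_diff = math.hypot(pred["rate_ra"] - detection["rate_ra"],
                               pred["rate_dec"] - detection["rate_dec"])
        if (separation < MAX_SEPARATION
                and abs(pred["v_mag"] - detection["r_mag"]) < MAX_MAG_DIFF
                and rate_diff < MAX_RATE_DIFF):
            accepted.append((separation, pred["name"]))
    return min(accepted)[1] if accepted else None


detection = {"ra": 180.0, "dec": 0.0, "r_mag": 19.0,
             "rate_ra": 30.0, "rate_dec": -10.0}
predictions = [
    {"name": "A", "ra": 180.001, "dec": 0.0005, "v_mag": 19.3,
     "rate_ra": 31.0, "rate_dec": -10.5},    # close, similar motion
    {"name": "B", "ra": 180.002, "dec": 0.0, "v_mag": 19.0,
     "rate_ra": -20.0, "rate_dec": 5.0},     # close, but moving differently
]
match = best_match(detection, predictions)
```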
In 285,114 other cases, a known SSO was predicted by SkyBoT but did not correspond to any of the 1,257,408 sources we extracted (Section 2). For each of these predictions, we extracted the closest source in the SDSS archive. The majority of these predictions have small apparent rates (hence were not marked as moving objects by the SDSS pipeline). We then applied the suite of filters described above (Section 3) and added all these recovered sources to the catalog. In total, we identified 1,036,322 observations of 379,714 known SSOs. The distribution of the number of asteroid observations is presented in Figure 4. Nearly half of the asteroids were observed only once; however, the mean number of observations per asteroid is 2.7. Moreover, 13,882 asteroids have more than ten observations. The catalog contains 1,542,522 observations, including 506,200 observations of moving objects not linked with any known SSO. Most of the identified SSOs are located in the main asteroid belt. All dynamical classes are, however, present in the catalog, including comets (Table 1).

Purity and completeness
We estimated the completeness of our catalog using the association with known SSOs presented above (Section 4). The SSOs predicted by SkyBoT but not present in our catalog provide an estimate of its completeness, albeit not an unbiased one. The current census of SSOs may itself be biased (Mahlke et al. 2018), and a 100% completeness with known SSOs may not necessarily imply a full completeness of sources on sky. As an illustration, about a third of all SSOs reported in the present catalog are not identified. Hence, this comparison provides an estimate of the fraction of objects removed by the different filters we applied in Section 3. We treated the cases of MB asteroids, NEAs, and Kuiper belt objects (KBOs) separately owing to their vastly different apparent motion. In all cases, we studied the completeness as a function of the apparent magnitude (as reported in V by SkyBoT) and motion (in ″/h). We present the estimated completeness in Figure 5. The completeness is typically around 95% until V≈22, where it drops to about 70% at V≈23, and then to 30% at V≈24. Such a completeness is similar to that of the ADR4 (Ivezić et al. 2001).
We estimated the purity of the catalog by visually inspecting RGB (red, green, blue) images made by combining the g, r, and i frames (similar to the insets of Figure 6). We inspected all KBOs, 2,000 randomly selected MB asteroids, and 2,915 NEAs. The purity of the catalog is typically above 97% down to magnitude V≈23, where it suddenly drops (Figure 5). This represents a clear improvement with respect to the ADR4, for which a 94% purity was quoted for MB asteroids, dropping to 50% for both fast- and slow-moving SSOs (Ivezić et al. 2001).

Identification of spurious measurements
The filters described in Section 3 limit the number of false positives among the 1,542,522 observations we report (Section 5). We describe below further tests for identifying potential issues with the astrometry or photometry of measurements in individual bands.

Spurious astrometry from linearity of motion
As the SDSS imaging camera scanned the sky, the same region of the sky was sequentially imaged in r, u, i, z, and g, with a time interval of 17.7 s between each 54 s exposure frame. From the reported position offsets in each band with respect to the position in r, we built the footsteps of each SSO to check the linearity of its motion (Mahlke et al. 2019). We used the Siegel estimator, a linear regression that is robust to outliers (Siegel 1982). We find that 92% of the SSOs have a coefficient of determination, R², larger than 0.9. Considering that an offset position in a single band significantly lowers R², such high values provide a strong indication of linear motion, and hence a confirmation of the genuineness of these SSOs. We used this criterion to identify spurious astrometry in the catalog (see Appendix A). Using g and r as reference (the two deepest filters; Section 2), we predicted the expected position in z, u, and i. We marked as dubious each position in these bands that is located farther than one pixel (0.4″) from the prediction (Figure 6).
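The linearity test can be sketched with a pure-Python repeated-medians (Siegel) fit. This is an illustration under stated assumptions rather than the pipeline actually used: the band times assume the sequence and timing quoted above (71.7 s between consecutive bands), the fit is applied to a single coordinate offset versus time, and the function names are hypothetical.

```python
from statistics import median

# Time of each band relative to r, assuming the r, u, i, z, g sequence and
# 71.7 s (54 s exposure + 17.7 s gap) between consecutive bands.
BAND_TIMES = {"r": 0.0, "u": 71.7, "i": 143.4, "z": 215.1, "g": 286.8}


def siegel_fit(x, y):
    """Repeated-medians (Siegel) estimate of slope and intercept."""
    n = len(x)
    slope = median(
        median((y[j] - y[i]) / (x[j] - x[i]) for j in range(n) if j != i)
        for i in range(n)
    )
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


times = list(BAND_TIMES.values())
clean = [0.002 * t for t in times]   # arcsec, perfectly linear motion
corrupted = list(clean)
corrupted[2] += 1.0                  # one band displaced by 1 arcsec

slope_clean, icpt_clean = siegel_fit(times, clean)
slope_bad, icpt_bad = siegel_fit(times, corrupted)
r2_clean = r_squared(times, clean, slope_clean, icpt_clean)
r2_bad = r_squared(times, corrupted, slope_bad, icpt_bad)
```

In this toy case the robust fit recovers the true rate despite the displaced band, while R² collapses well below 0.9, which is the behavior the criterion relies on.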

Identification of slow-moving SSOs
The extraction of SSO candidates (Section 2) imposes a minimum apparent motion of 0.05°/day to reject all stationary sources, such as stars and galaxies (similar to the ADR4 extraction; Ivezić et al. 2001). This threshold converts to a minimum change in position of 0.6″ between the r and g frames, which provide the largest temporal baseline, 286.8 seconds. This effectively precludes the identification of most moving objects from the outer Solar System (only 41 SSOs with a semimajor axis above 5.5 au were reported by Ivezić et al. 2001).
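The conversion quoted above can be checked with a short worked computation, assuming that g is taken four band steps after r, each step lasting one 54 s exposure plus the 17.7 s interval given in the text:

```latex
\begin{align}
\Delta t_{r \to g} &= 4 \times (54\,\mathrm{s} + 17.7\,\mathrm{s}) = 286.8\,\mathrm{s}, \\
\Delta\theta_{\min} &= 0.05\,\mathrm{deg/day}
  \times \frac{3600''/\mathrm{deg}}{86\,400\,\mathrm{s/day}}
  \times 286.8\,\mathrm{s} \approx 0.60''.
\end{align}
```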
As is visible in Figure 7, however, 216,348 objects do not display apparent motion. These can be slow-moving objects, such as KBOs or asteroids at their turnaround point between direct and retrograde motion. However, the false-positive rate may be higher among these objects because the identification of these sources as moving objects is more prone to uncertainty (the linearity of motion cannot be checked, for instance).
We thus marked 53,980 objects in the catalog (Appendix A) as suspicious: their change in position between g and r is less than 0.6″ and there are no known SSOs predicted within 5 . We, however, did not flag the 162,368 slow-moving SSOs associated with a SkyBoT counterpart (Section 4).

Retrieving the photometry of fast-moving SSOs
As a feature of the SDSS catalog creation, moving objects with a large apparent motion may not be recognized as such (Ivezić et al. 2001). Their successive appearances in the different filters may be cataloged as different SDSS sources. We thus applied an improved source extraction for 9,134 SSOs with an apparent motion above 60″/h. This apparent velocity corresponds to an apparent displacement of 1″ (the mean seeing value) over the 54 s integration of each filter.
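As a quick check of the threshold above, the displacement accumulated during a single exposure at the limiting rate is:

```latex
\begin{equation}
\Delta\theta = 60''/\mathrm{h} \times \frac{54\,\mathrm{s}}{3600\,\mathrm{s/h}} = 0.9'' \approx 1''.
\end{equation}
```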
The extraction procedure consists of extracting, for each band, all sources from the SDSS catalog located within the predicted distance of motion. We then checked these sources against the predicted positions and combined them into one moving source.
For each filter, we replaced the astrometry and photometry from the previous extraction with those of the SDSS source located the closest to the SkyBoT prediction. We then reapplied all the filters described in Section 3.

Spurious photometric uncertainty
As is visible in Figure 8, the distribution of photometric uncertainties in each filter is too complex to be due to photon noise only. There are in particular two spurious behaviors, which are present in all filters. We marked the photometry as suspicious in the catalog (Appendix A) using an empirical threshold whose limiting magnitude, mag_lim, and curvature, σ, are filter dependent and listed in Table 2. The first [...] exponentially decreases to 0 for faint sources. The second marks the photometry with uncertainties too large to be meaningful, including the overrepresented uncertainty value of 0.6 mag. We set the σ curvature by a fit to each band (red curves in Figure 8). As expected, the largest number of observations with suspicious photometry is found in the u and z bands, while almost all observations in the r and i bands have accurate photometry (Table 2).
Fig. 7. Position offset between r and g. The inner black circle represents the lower limit of 0.05°/day that we use to extract SSO candidates. The outer blue circle shows the upper limit of 0.5°/day from the ADR4 that we do not use here.

Identification of spurious colors
We performed a final test on the reported photometry. In a first approximation, SSOs display Solar colors, and a significant departure from these colors can be used to identify potential issues with the photometry not identified in the previous steps. Using r as a reference, we analyzed how colors vary with magnitude (Figure 9). The spread of colors increases with magnitude due to increasing photometric uncertainty. We flagged photometry resulting in a color farther than 3σ from the median color (Appendix A). The fraction of measurements marked as such amounts to 7%, 11%, 1%, and 8% for u, g, i, and z, respectively.
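A magnitude-dependent 3σ clipping of this kind can be sketched as follows. The binning in r magnitude, the bin width, and the use of the scaled median absolute deviation as the σ estimate are all assumptions made for the illustration; the text does not specify how the spread was measured.

```python
from statistics import median


def flag_color_outliers(r_mags, colors, bin_width=1.0, n_sigma=3.0):
    """Flag colors farther than n_sigma from the median of their r-magnitude
    bin. Sigma is ASSUMED to be 1.4826 * MAD (a robust estimate)."""
    bins = {}
    for idx, r in enumerate(r_mags):
        bins.setdefault(int(r // bin_width), []).append(idx)
    flags = [False] * len(colors)
    for members in bins.values():
        values = [colors[i] for i in members]
        center = median(values)
        sigma = 1.4826 * median(abs(v - center) for v in values)
        for i in members:
            flags[i] = abs(colors[i] - center) > n_sigma * sigma
    return flags


# Seven measurements in the same magnitude bin; the last color is discrepant.
r_mags = [20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7]
g_minus_r = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 1.50]
flags = flag_color_outliers(r_mags, g_minus_r)
```

Computing the spread per magnitude bin lets the tolerance widen for faint sources, in line with the increase in color scatter with magnitude noted above.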

Taxonomic classification
We used the photometry to classify the asteroids within a scheme consistent with the widely used Bus-DeMeo taxonomy (DeMeo et al. 2009). We followed the approach in DeMeo & Carry (2013), according to which strict boundaries in the overall spectral slope and depth of the one-micron band are used to classify the SSOs into ten broad complexes: A, B, C, D, K, L, Q, S, V, and X. We, however, introduced several improvements.
Magnitude versus reflectance. We used the SDSS magnitudes, similarly to what was done in Carvano et al. (2010), rather than converting them into spectral reflectance, as was done by DeMeo & Carry (2013). Our aim was to remain as close as possible to the original data. Moreover, this conversion requires the colors of the Sun in the SDSS bands, the estimates of which differ between authors (see, e.g., the SDSS estimates, Holmberg et al. 2006, or Ramírez et al. 2012). We thus converted the class boundaries from DeMeo & Carry (2013, Table 3) into limits in the 3D g-r, r-i, and i-z color space (Table 3, Figure 10). As in DeMeo & Carry (2013), some classes overlap in the 3D color space. We, however, deal differently with these overlaps, as described below.
Accounting for uncertainties. For each observation, we computed the volume it occupies in the 3D color space (g-r, g-i, i-z) based on a 3D Gaussian distribution, whose σ are set to the color uncertainties. We then computed a score for each class, k, based on the volume of the intersection between the 3D Gaussian of each observation and the space occupied by each taxonomic complex (Figure 10), normalized by the Gaussian volume. This score is expressed with the error function, $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt$, where the index j indicates the g-r, i-z, and g-i colors, $a_j$ and $b_j$ are the color boundaries of the complexes, and $\mu_j$ and $\sigma_j$ are the color and uncertainty of the SSO. Hence, for a given observation, the volumes of all intersections sum to one.
Fig. 11. Pseudo reflectance spectra of asteroids observed by the SDSS, grouped by taxonomic class. The distribution of values for each band is represented by whiskers (95% extrema, and the 25, 50, and 75% quartiles). For each, we also represent the associated template spectra of the Bus-DeMeo taxonomy (DeMeo et al. 2009). The Bus-DeMeo spectra do not cover the wavelength range covered by the u filter, so we present a simple linear extrapolation.
A probabilistic approach. These normalized volumes correspond to the probabilities, P k , of pertaining to each taxonomic class. An observation fully fitting inside a given class cuboid will have a probability of 1 of being of that class. Conversely, an observation whose 3D Gaussian overlaps with two classes in a 1/3-2/3 proportion will have a 0.33 probability of being of the first class and a 0.66 probability of being of the second.
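The normalized intersection volume described above can be sketched for an axis-aligned Gaussian, for which the integral over a cuboid factorizes into one error-function term per color: for bounds a_j and b_j, each factor is [erf((b_j - µ_j)/(σ_j√2)) - erf((a_j - µ_j)/(σ_j√2))]/2. This is a reconstruction from the quantities named in the text under the assumption of independent color uncertainties, not a transcription of the published equation, and the class boundaries below are invented for the example (they are not those of Table 3).

```python
import math


def class_probability(colors, sigmas, lower, upper):
    """Fraction of an axis-aligned 3D Gaussian (means `colors`, standard
    deviations `sigmas`) falling inside the cuboid [lower, upper]."""
    prob = 1.0
    for mu, sigma, a, b in zip(colors, sigmas, lower, upper):
        scale = sigma * math.sqrt(2.0)
        prob *= 0.5 * (math.erf((b - mu) / scale) - math.erf((a - mu) / scale))
    return prob


# Two hypothetical classes sharing a boundary at 0.5 in the first color.
CLASS_P = ((0.0, -10.0, -10.0), (0.5, 10.0, 10.0))
CLASS_Q = ((0.5, -10.0, -10.0), (1.0, 10.0, 10.0))

sigmas = (0.05, 0.05, 0.05)
on_boundary = (0.5, 0.0, 0.0)    # straddles the two classes equally
well_inside = (0.25, 0.0, 0.0)   # five sigma away from either boundary

p_boundary_P = class_probability(on_boundary, sigmas, *CLASS_P)
p_boundary_Q = class_probability(on_boundary, sigmas, *CLASS_Q)
p_inside_P = class_probability(well_inside, sigmas, *CLASS_P)
```

An observation sitting exactly on a shared boundary is split evenly between the two classes, whereas one lying several σ inside a cuboid gets a probability very close to 1 for that class.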
Handling of multiple observations. The probabilistic approach offers a straightforward solution to multiple observations of the same object. The total probability for each class (P_k) is the sum of the probabilities for that class over all observations (j), weighted by the uncertainty of each observation (V_σ,j).
Assignment of a class to each asteroid. We assigned to each object its most probable class as derived from all its observations. The sole exception is the unknown class (U), which is only assigned if its probability is strictly equal to 1. Otherwise, whenever U is the most probable (due to poor photometric accuracy for instance), we assigned the second most probable class.
We report in the catalog the probabilities for each class for each observation, together with the most probable class for that observation. We also list the uncertainty, V σ , and the most probable class after the combination of all the observations of a given SSO.
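The combination of several observations and the assignment rule can be sketched as below. The exact weighting is not recoverable from the text, so the inverse-uncertainty weights (1 / V_σ, normalized to sum to one) are an assumption, and the function names and input layout are hypothetical.

```python
def combine_observations(observations):
    """observations: list of (probabilities_by_class, v_sigma) tuples.
    Weights are ASSUMED to be 1 / v_sigma, normalized over observations."""
    total_weight = sum(1.0 / v for _, v in observations)
    combined = {}
    for probs, v in observations:
        for name, p in probs.items():
            combined[name] = combined.get(name, 0.0) + p / v / total_weight
    return combined


def assign_class(combined):
    """Most probable class; 'U' is kept only if its probability equals 1,
    otherwise the second most probable class is returned."""
    ranked = sorted(combined, key=combined.get, reverse=True)
    if ranked[0] == "U" and combined["U"] < 1.0 and len(ranked) > 1:
        return ranked[1]
    return ranked[0]


observations = [
    ({"S": 0.3, "U": 0.7}, 0.05),             # less precise observation
    ({"S": 0.2, "C": 0.2, "U": 0.6}, 0.02),   # more precise observation
]
combined = combine_observations(observations)
assigned = assign_class(combined)
```

Here U is the most probable class but not certain, so the object receives the runner-up class, S, following the exception described above.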
We present in Figure 11 the pseudo reflectance spectra of each class, computed using the Solar colors of Holmberg et al. (2006), compared with the template spectra of the associated Bus-DeMeo classes. The correspondence of the median SDSS spectra with the template spectra validates the approach described above. The spread of values is larger for the SDSS, but the sample size is four orders of magnitude larger than the 371 spectra that define the Bus-DeMeo taxonomy.
Alexey V. Sergeyev and Benoit Carry: A million asteroid observations in the Sloan Digital Sky Survey
We also illustrate the catalog by presenting in Figure 12 the orbital distribution of taxonomic classes. The general trend of S-types dominating the inner belt and C-types the outer belt is clearly visible. The dynamical families are also easy to identify. The Bus-DeMeo classification scheme we applied is adapted for asteroids but not for comets or KBOs, which generally display featureless spectra in the visible, with spectral slopes ranging from X-type to D-type and even redder (Snodgrass et al. 2006; Fulchignoni et al. 2008). We did not apply a different scheme to comets and KBOs, but we present their pseudo reflectance spectra in Figure 13 compared with template spectral classes.

Conclusions
Based on a new extraction of moving objects in the SDSS, and following a suite of filters to minimize contamination, we release a catalog of 1,542,522 entries, consisting of 1,036,322 observations of 379,714 known and unique SSOs together with 506,200 observations of moving sources not linked with any known SSO. The catalog contains the SDSS identification, astrometry, photometry, SSO identification, geometry of observation, taxonomy, and quality flag for each observation. Its content is fully described in Appendix A.
The catalog completeness is estimated to be about 95% and the purity to be above 95% for known SSOs (see Section 5 for details). The present catalog contains 399,502 (85%) of the 471,569 sources released in the ADR4 and 93% of the NEAs and Mars-crossers from Carry et al. (2016). The missing sources are found to be fixed-coordinate sources (identified by comparing their coordinates with the Pan-STARRS catalog using the procedure described in Subsection 3.1) in 58,381 cases. The remaining 13,686 sources were located in fields marked as affected by poor weather conditions, which we excluded from the present analysis (Section 2).