Comparison of machine learning algorithms used to classify the asteroids observed by all-sky surveys

H. Klimczak; D. Oszkiewicz; B. Carry; A. Penttilä; W. Kotlowski; A. Kryszczyńska; E. Wilawer

doi:10.1051/0004-6361/202243889

Volume 667 (November 2022)

A&A, 667 (2022) A10

Open Access

Issue: A&A Volume 667, November 2022
Article Number: A10
Number of pages: 15
Section: Planets and planetary systems
DOI: https://doi.org/10.1051/0004-6361/202243889
Published online: 31 October 2022

H. Klimczak¹,², D. Oszkiewicz¹, B. Carry³, A. Penttilä⁴, W. Kotlowski², A. Kryszczyńska¹ and E. Wilawer¹

¹ Astronomical Observatory Institute, Faculty of Physics, A. Mickiewicz University, Sloneczna 36, 60-286 Poznań, Poland
e-mail: klimczakhm@gmail.com
² Institute of Computing Science, Poznań University of Technology, ul. Piotrowo, Poznań 60-965, Poland
³ Université Côte d'Azur, Observatoire de la Côte d'Azur, CNRS, Laboratoire Lagrange, France
⁴ Department of Physics, University of Helsinki, PO Box 64, 00014 Helsinki, Finland

Received: 27 April 2022
Accepted: 5 September 2022

Abstract

Context. Multifilter photometry from large sky surveys is commonly used to assign asteroid taxonomic types and study various problems in planetary science. To maximize the science output of those surveys, it is important to use methods that best link the spectro-photometric measurements to asteroid taxonomy.

Aims. We aim to determine which machine learning methods are most suitable for taxonomic classification in various sky surveys.

Methods. We used five supervised machine learning classifiers: logistic regression, naive Bayes, support vector machines (SVMs), gradient boosting, and multilayer perceptrons (MLPs). These methods reproduce the Bus-DeMeo taxonomy at different rates, depending on the set of filters used by each survey. For a comprehensive comparison, we report several evaluation metrics (prediction accuracy, balanced accuracy, F1 score, and the Matthews correlation coefficient) for 11 surveys and space missions.
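All four metrics named above can be computed from a confusion matrix. The sketch below is a minimal, dependency-free illustration, not the authors' evaluation code. It assumes that balanced accuracy is the mean per-class recall, that the F1 score is macro-averaged over classes, and that the Matthews correlation coefficient is the standard multiclass generalization; the abstract does not state the averaging schemes. The helper name `classification_scores` and the toy labels are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def classification_scores(y_true: Sequence[str],
                          y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, balanced accuracy, macro F1 and multiclass MCC (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels: List[str] = sorted(set(y_true) | set(y_pred))
    # Confusion matrix: cm[true_label][predicted_label] -> count.
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1

    n = len(y_true)
    correct = sum(cm[k][k] for k in labels)
    true_totals = [sum(cm[k].values()) for k in labels]            # row sums
    pred_totals = [sum(cm[t][k] for t in labels) for k in labels]  # column sums

    recalls, f1s = [], []
    for k, t_k, p_k in zip(labels, true_totals, pred_totals):
        tp = cm[k][k]
        recall = tp / t_k if t_k else 0.0
        precision = tp / p_k if p_k else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        if t_k:  # balanced accuracy averages recall over classes present in y_true
            recalls.append(recall)

    # Multiclass Matthews correlation coefficient from confusion-matrix totals.
    cov_tp = correct * n - sum(t * p for t, p in zip(true_totals, pred_totals))
    cov_pp = n * n - sum(p * p for p in pred_totals)
    cov_tt = n * n - sum(t * t for t in true_totals)
    denom = sqrt(cov_pp * cov_tt)

    return {
        "accuracy": correct / n,
        "balanced_accuracy": sum(recalls) / len(recalls),
        "macro_f1": sum(f1s) / len(f1s),
        "mcc": cov_tp / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy Bus-DeMeo complex labels; not data from the paper.
    truth = ["S", "S", "C", "C", "X"]
    preds = ["S", "C", "C", "C", "X"]
    scores = classification_scores(truth, preds)
    print({k: round(v, 3) for k, v in scores.items()})
    # {'accuracy': 0.8, 'balanced_accuracy': 0.833, 'macro_f1': 0.822, 'mcc': 0.735}
```

Because balanced accuracy weights every class equally, it penalizes a classifier that neglects rare taxonomic types more than plain accuracy does, so the two can diverge on imbalanced samples.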

Results. Among the methods analyzed, the multilayer perceptron and gradient boosting achieved the highest accuracy, and naive Bayes the lowest, in taxonomic prediction across all surveys. We found that selecting the right machine learning algorithm can improve the success rate by more than a factor of two. The best balanced accuracy (~85% for taxonomic type prediction) was found for the Visible and Infrared Survey Telescope for Astronomy (VISTA) and the ESA Euclid mission surveys, whose broadband filters best map the 1 µm and 2 µm olivine and pyroxene absorption bands.

Conclusions. To achieve the highest accuracy in taxonomic type prediction from multifilter photometric measurements, we recommend using gradient boosting and MLPs optimized for each survey. This can substantially improve the overall success rate, particularly compared with naive Bayes. Merging different datasets can further boost the prediction accuracy: for the combination of the Legacy Survey of Space and Time and VISTA surveys, we achieved 90% for taxonomic type prediction.
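One plausible way to realize such a merger, sketched here under stated assumptions, is to join per-asteroid photometric features from two surveys, for instance the Legacy Survey of Space and Time (LSST) and VISTA, on a shared asteroid identifier and concatenate them into a single feature vector for the classifier. The abstract does not describe how the datasets were combined, so the data layout, the function `merge_surveys`, and the colour names below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: asteroid identifier -> {feature name (e.g. a colour): value}.
Photometry = Dict[str, Dict[str, float]]


def merge_surveys(a: Photometry, b: Photometry, label_a: str, label_b: str
                  ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Inner-join two per-asteroid feature tables into one feature matrix.

    Only asteroids present in both tables are kept. Column names are prefixed
    with the survey label so that features from different surveys never collide.
    Missing entries become NaN and would need imputation or removal before
    training most classifiers.
    """
    shared_ids = sorted(a.keys() & b.keys())
    cols_a = sorted({name for feats in a.values() for name in feats})
    cols_b = sorted({name for feats in b.values() for name in feats})
    columns = [f"{label_a}:{c}" for c in cols_a] + [f"{label_b}:{c}" for c in cols_b]
    rows = [
        [a[i].get(c, math.nan) for c in cols_a] + [b[i].get(c, math.nan) for c in cols_b]
        for i in shared_ids
    ]
    return shared_ids, columns, rows


if __name__ == "__main__":
    # Toy values only; not measurements from the paper.
    lsst = {"ast1": {"g-r": 0.5, "r-i": 0.2}, "ast2": {"g-r": 0.6, "r-i": 0.1}}
    vista = {"ast1": {"J-Ks": 0.3}, "ast3": {"J-Ks": 0.4}}
    ids, cols, X = merge_surveys(lsst, vista, "LSST", "VISTA")
    print(ids, cols, X)
    # ['ast1'] ['LSST:g-r', 'LSST:r-i', 'VISTA:J-Ks'] [[0.5, 0.2, 0.3]]
```

An inner join keeps only objects observed by both surveys, trading sample size for wider wavelength coverage per object.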

Key words: minor planets / asteroids: general / methods: data analysis / methods: statistical / surveys

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe-to-Open model.
