Machine learning technique for morphological classification of galaxies from the SDSS

I. B. Vavilova; D. V. Dobrycheva; M. Yu. Vasylenko; A. A. Elyiv; O. V. Melnyk; V. Khramtsov

doi:10.1051/0004-6361/202038981

Volume 648 (April 2021)

A&A, 648 (2021) A122

Issue: A&A Volume 648, April 2021
Article Number: A122
Number of page(s): 14
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202038981
Published online: 26 April 2021

A&A 648, A122 (2021)

I. Photometry-based approach⋆

I. B. Vavilova¹, D. V. Dobrycheva¹, M. Yu. Vasylenko¹,², A. A. Elyiv¹, O. V. Melnyk¹ and V. Khramtsov³

¹ Main Astronomical Observatory of the National Academy of Sciences of Ukraine, 27 Akademik Zabolotny St., Kyiv 03143, Ukraine
e-mail: irivav@mao.kiev.ua
² Institute of Physics of the National Academy of Sciences of Ukraine, 46 avenue Nauka, Kyiv 03028, Ukraine
³ Institute of Astronomy, V.N. Karazin Kharkiv National University, 35 Sumska St., Kharkiv 61022, Ukraine

Received: 20 July 2020
Accepted: 1 February 2021

Abstract

Context. Machine learning methods are effective tools for astronomical tasks that classify objects by their individual features. One promising application is the morphological classification of galaxies at different redshifts.

Aims. We apply a photometry-based approach to SDSS data in order to (1) evaluate five supervised machine learning techniques and identify the most effective one for automated morphological classification of galaxies; (2) test the influence of photometric data on morphological classification; (3) discuss problematic aspects of supervised machine learning and labeling bias; and (4) apply the best-performing machine learning methods to reveal the unknown morphological types of galaxies from the SDSS DR9 at z < 0.1.

Methods. We used several galaxy classification techniques: human labeling, multi-photometry diagrams, naive Bayes, logistic regression, support-vector machine, random forest, and k-nearest neighbors.

Results. We present the results of a binary automated morphological classification of galaxies conducted by human labeling, multi-photometry, and five supervised machine learning methods. We applied this classification to the sample of galaxies from the SDSS DR9 with redshifts of 0.02 < z < 0.1 and absolute stellar magnitudes of −24^m < M_r < −19.4^m. For the analysis we used the absolute magnitudes M_u, M_g, M_r, M_i, M_z; the color indices M_u − M_r, M_g − M_i, M_u − M_g, M_r − M_z; and the inverse concentration index to the center, R50/R90. We determined the ability of each method to predict the morphological type, and examined how each method's accuracy depends on redshift, human labeling, morphological shape, and the overlap of different morphological types among galaxies with the same color indices. We find that the morphology based on the supervised machine learning methods trained over photometric parameters demonstrates significantly less bias than the morphology based on citizen-science classifiers.
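The photometric inputs listed in the Results can be assembled into a single feature vector per galaxy. The following is a minimal Python sketch, not the authors' code: the `GalaxyPhotometry` container, the `feature_vector` and `in_sample` helpers, and the feature ordering are hypothetical names and choices introduced for illustration, and R50 and R90 are assumed to be the radii enclosing 50% and 90% of the galaxy light.

```python
from dataclasses import dataclass


@dataclass
class GalaxyPhotometry:
    """Hypothetical container for the photometric inputs named in the abstract."""
    M_u: float  # absolute magnitudes in the five SDSS bands
    M_g: float
    M_r: float
    M_i: float
    M_z: float
    R50: float  # assumed: radius enclosing 50% of the light
    R90: float  # assumed: radius enclosing 90% of the light


def feature_vector(g):
    """Return the 10 features: 5 magnitudes, 4 color indices, R50/R90."""
    colors = [
        g.M_u - g.M_r,
        g.M_g - g.M_i,
        g.M_u - g.M_g,
        g.M_r - g.M_z,
    ]
    inverse_concentration = g.R50 / g.R90
    return [g.M_u, g.M_g, g.M_r, g.M_i, g.M_z, *colors, inverse_concentration]


def in_sample(g, z):
    """Apply the sample limits quoted in the abstract."""
    return 0.02 < z < 0.1 and -24.0 < g.M_r < -19.4


# Illustrative values only, not taken from the catalog.
example = GalaxyPhotometry(-18.0, -19.5, -20.5, -20.75, -21.0, 2.0, 5.0)
features = feature_vector(example)
```

A vector of this form is what a supervised classifier would take as input for each galaxy; the selection function only mirrors the redshift and M_r limits stated above.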

Conclusions. The support-vector machine and random forest methods, as implemented in the Scikit-learn machine learning library for Python, provide the highest accuracy for the binary morphological classification of galaxies. Specifically, the success rate is 96.4% for the support-vector machine (96.1% for early E and 96.9% for late L types) and 95.5% for random forest (96.7% for early E and 92.8% for late L types). Applying the support-vector machine to the sample of 316 031 galaxies from the SDSS DR9 at z < 0.1 with unknown morphological types, we found 139 659 E and 176 372 L types among them.
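To make the quoted success rates concrete, the sketch below computes an overall and a per-type success rate from true and predicted labels. It assumes that the per-type figures are the fraction of galaxies of each true type that the classifier recovers; the abstract does not spell out the definition, so this is an interpretation. The `success_rates` helper and the toy labels are hypothetical and are not data from the paper.

```python
def success_rates(true_labels, predicted_labels):
    """Return (overall, per_class) fractions of correctly predicted labels.

    per_class maps each true label to the fraction of objects with that
    true label that were predicted correctly (an assumed definition).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    totals = {}
    correct = {}
    for truth, guess in zip(true_labels, predicted_labels):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == guess:
            correct[truth] = correct.get(truth, 0) + 1

    per_class = {label: correct.get(label, 0) / n for label, n in totals.items()}
    overall = sum(correct.values()) / len(true_labels)
    return overall, per_class


# Toy labels for illustration only: E = early type, L = late type.
true_types = ["E", "E", "E", "E", "L", "L", "L", "L"]
predicted_types = ["E", "E", "E", "L", "L", "L", "L", "L"]
overall, per_class = success_rates(true_types, predicted_types)
```

With unequal numbers of E and L galaxies, the overall rate is a weighted average of the per-type rates, which is why it need not sit midway between them.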

Key words: galaxies: general / methods: data analysis / galaxies: statistics / galaxies: photometry / galaxies: spiral / galaxies: elliptical and lenticular, cD

⋆ The catalog is only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/cat/J/A+A/648/A122

© ESO 2021
