An efficient unsupervised classification model for galaxy morphology: Voting clustering based on coding from ConvNeXt large model

Guanwen Fang; Yao Dai; Zesen Lin; Chichun Zhou; Jie Song; Yizhou Gu; Xiaotong Guo; Anqi Mao; Xu Kong

doi:10.1051/0004-6361/202451734

Volume 693 (January 2025)

A&A, 693 (2025) A141

Open Access

Issue: A&A Volume 693, January 2025
Article Number: A141
Number of page(s): 12
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202451734
Published online: 13 January 2025

Guanwen Fang¹,★, Yao Dai¹, Zesen Lin², Chichun Zhou³,★, Jie Song⁴,⁵, Yizhou Gu⁶, Xiaotong Guo¹, Anqi Mao³ and Xu Kong⁴,⁵,★

¹ Institute of Astronomy and Astrophysics, Anqing Normal University, Anqing 246133, PR China
² Department of Physics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong S.A.R., China
³ School of Engineering, Dali University, Dali 671003, PR China
⁴ Department of Astronomy, University of Science and Technology of China, Hefei 230026, China
⁵ School of Astronomy and Space Science, University of Science and Technology of China, Hefei 230026, PR China
⁶ Tsung-Dao Lee Institute, and Key Laboratory for Particle Physics, Astrophysics and Cosmology, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China

★ Corresponding authors: wen@mail.ustc.edu.cn; zhouchichun@dali.edu.cn; xkong@ustc.edu.cn

Received: 31 July 2024
Accepted: 29 November 2024

Abstract

By combining unsupervised and supervised machine learning methods, we have proposed a framework, called USmorph, to carry out automatic classification of galaxy morphologies. In this work, we update the unsupervised machine learning (UML) step by proposing an algorithm based on ConvNeXt large model coding to improve the efficiency of unlabeled galaxy morphology classification. The method can be summarized in three key aspects: (1) a convolutional autoencoder is used for image denoising and reconstruction, and the rotational invariance of the model is improved by polar coordinate extension; (2) a pre-trained convolutional neural network (CNN) named ConvNeXt is used to encode the image data, and the resulting features are further compressed via principal component analysis (PCA) dimensionality reduction; (3) a bagging-based multi-model voting classification algorithm is adopted to enhance robustness. We applied this model to I-band images of a galaxy sample with I_mag < 25 in the COSMOS field. Compared to the original unsupervised method, the number of clustering groups required by the new method is reduced from 100 to 20. In the end, we classified about 53% of the galaxies, significantly improving the classification efficiency. To verify the validity of the morphological classification, we selected massive galaxies with M_* > 10¹⁰ M_⊙ for morphological parameter tests. The relations between the classification results and the physical properties of galaxies on multiple parameter surfaces are consistent with existing evolution models. Our method demonstrates the feasibility of using large model encoding to classify galaxy morphology, which not only improves the efficiency of galaxy morphology classification but also saves time and manpower. Furthermore, in comparison to the original UML model, the enhanced classification performance is more evident in qualitative analysis and the new model has passed a greater number of parameter tests. The enhanced UML method will support the Chinese Space Station Telescope in the future.
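
The voting step in aspect (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a unanimous-agreement rule, which the abstract does not spell out: several clustering models label the same galaxies, their arbitrary cluster IDs are aligned to a reference model, and a galaxy is kept only when all models agree. The function names `align_labels` and `unanimous_vote`, the greedy alignment, and the toy label lists are all hypothetical.

```python
from collections import Counter


def align_labels(reference, labels):
    """Rename clusters in `labels` to the reference cluster they overlap most.

    Greedy one-to-one matching (an assumption made for this sketch):
    the largest overlaps are matched first.
    """
    overlap = Counter(zip(labels, reference))
    mapping = {}
    used = set()
    for (src, ref), _count in overlap.most_common():
        if src not in mapping and ref not in used:
            mapping[src] = ref
            used.add(ref)
    # Clusters left unmatched become -1 so they can never win a vote.
    return [mapping.get(label, -1) for label in labels]


def unanimous_vote(label_sets):
    """Return the agreed label per sample, or None where the models disagree."""
    reference = list(label_sets[0])
    aligned = [reference] + [align_labels(reference, ls) for ls in label_sets[1:]]
    voted = []
    for votes in zip(*aligned):
        agreed = len(set(votes)) == 1 and votes[0] != -1
        voted.append(votes[0] if agreed else None)
    return voted


# Toy cluster assignments from three hypothetical clustering models.
model_a = [0, 0, 0, 1, 1, 1, 2, 2]
model_b = [5, 5, 5, 7, 7, 9, 9, 9]
model_c = [1, 1, 2, 0, 0, 0, 2, 2]

voted = unanimous_vote([model_a, model_b, model_c])
classified_fraction = sum(v is not None for v in voted) / len(voted)
print(voted)                # [0, 0, None, 1, 1, None, 2, 2]
print(classified_fraction)  # 0.75
```

Under such a rule, samples on which the models disagree are left unclassified, which is one way a voting scheme can trade completeness for robustness.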

Key words: Galaxy: general / Galaxy: structure / galaxies: statistics

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.
