| Issue | A&A Volume 690, October 2024 |
|---|---|
| Article Number | A224 |
| Number of page(s) | 16 |
| Section | Numerical methods and codes |
| DOI | https://doi.org/10.1051/0004-6361/202450214 |
| Published online | 11 October 2024 |
Supervised star, galaxy, and QSO classification with sharpened dimensionality reduction
1 Kapteyn Astronomical Institute, University of Groningen, Landleven 12, 9747 AD Groningen, The Netherlands
2 Netherlands Institute for Radio Astronomy (ASTRON), Oude Hoogeveensedijk 4, 7991 PD Dwingeloo, The Netherlands
3 Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, Nijenborgh 9, 9747 AG Groningen, The Netherlands
4 Department of Information and Computing Sciences, Utrecht University, Princetonplein 5, 3584 CC Utrecht, The Netherlands
★ Corresponding author: martenlourens@gmail.com
Received: 2 April 2024 / Accepted: 28 August 2024
Aims. We explored the use of broadband colors to classify stars, galaxies, and quasi-stellar objects (QSOs). Specifically, we applied sharpened dimensionality reduction (SDR)-aided classification to this problem, with the aim of enhancing the separation between clusters when high-dimensional data are projected to lower dimensions, thereby allowing better classification performance and more informative projections.
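A broadband color is the difference between an object's magnitudes in two photometric bands. The sketch below builds every pairwise color from a magnitude table. It assumes the feature set consists of all pairwise differences, which may not match the exact set of colors derived from the CPz catalog; the band names and magnitudes are hypothetical placeholders.

```python
from itertools import combinations

import numpy as np


def broadband_colors(mags, bands):
    """Return all pairwise colors m_a - m_b (band a listed before band b)."""
    # mags: shape (n_objects, n_bands); one magnitude per band (hypothetical input)
    mags = np.asarray(mags, dtype=float)
    pairs = list(combinations(range(len(bands)), 2))
    colors = np.stack([mags[:, a] - mags[:, b] for a, b in pairs], axis=1)
    names = [f"{bands[a]}-{bands[b]}" for a, b in pairs]
    return colors, names


# Hypothetical five-band photometry for a single object
bands = ["u", "g", "r", "i", "z"]
mags = np.array([[19.2, 18.1, 17.6, 17.4, 17.3]])
colors, names = broadband_colors(mags, bands)
print(names[:4])     # ['u-g', 'u-r', 'u-i', 'u-z']
print(colors.shape)  # (1, 10)
```

With n bands this yields n(n-1)/2 colors, which is one way a "large set" of color features can arise from a modest number of bands.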
Methods. The main objective of this work was to apply SDR to large sets of broadband colors derived from the CPz catalog to obtain projections in which the star, galaxy, and QSO clusters are highly separated. SDR achieves this by combining density-based clustering with conventional dimensionality-reduction techniques. To make SDR scalable and able to project new samples onto a previously computed projection, we used a deep neural network trained to reproduce the SDR projections. Subsequently, classification was performed by applying a k-nearest neighbors (k-NN) classifier to the sharpened projections.
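The pipeline described above (sharpen, project, classify with k-NN) can be illustrated with a minimal NumPy sketch. It rests on several assumptions: the sharpening step is a simplified mean-shift-style update that moves each point a fraction `alpha` toward the mean of its k nearest neighbors, in the spirit of SDR's density-based clustering but not necessarily its exact rule; PCA stands in for whichever conventional projection technique is used; and the deep neural network that learns the projection is omitted, so all points are sharpened and projected jointly before a train/test split. All function names and parameter values are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row (excluding the row itself)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]


def sharpen(X, k=10, alpha=0.5, n_iter=10):
    """Assumed sharpening rule: pull each point toward the mean of its k-NN."""
    X = np.array(X, dtype=float)
    for _ in range(n_iter):
        nbr = knn_indices(X, k)
        X = X + alpha * (X[nbr].mean(axis=1) - X)
    return X


def pca_project(X, n_components=2):
    """Project onto the leading principal axes (stand-in projection method)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def knn_classify(P_train, y_train, P_test, k=5):
    """Majority vote among the k nearest training points in the 2-D embedding."""
    d2 = ((P_test[:, None, :] - P_train[None, :, :]) ** 2).sum(axis=-1)
    nbr = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(v).argmax() for v in y_train[nbr]])


rng = np.random.default_rng(0)
# Three synthetic classes (0, 1, 2) in a hypothetical 6-D color space
centers = 8.0 * np.eye(3, 6)
y = np.repeat([0, 1, 2], 60)
X = centers[y] + rng.normal(size=(180, 6))

P = pca_project(sharpen(X))      # sharpened 2-D embedding
train = rng.random(180) < 0.7    # random train/test split
pred = knn_classify(P[train], y[train], P[~train])
print("test accuracy:", (pred == y[~train]).mean())
```

The brute-force distance matrices keep the sketch short but scale as O(N²) in memory, which is exactly the kind of cost that motivates replacing direct SDR with a learned projection on large catalogs.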
Results. Based on a qualitative and quantitative analysis of the embeddings produced by SDR, we find that SDR consistently produces accurate projections with a high degree of cluster separation. A number of projection performance metrics are used to evaluate this separation, including the trustworthiness, continuity, Shepard goodness, and distribution consistency metrics. Using the k-NN classifier and consolidating the results of various data sets, we obtain precisions of 99.7%, 98.9%, and 98.5% for classifying stars, galaxies, and QSOs, respectively. Furthermore, we achieve completenesses of 97.8%, 99.3%, and 86.8%, respectively. In addition to classification, we explore the structure of the embeddings produced by SDR by cross-matching with data from Gaia DR3, Galaxy Zoo 1, and a catalog of specific star formation rates, stellar masses, and dust luminosities. We discover that the embeddings reveal astrophysical information, which allows one to understand the structure of the high-dimensional broadband color data in greater detail.
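Two of the projection metrics and the classification scores named above can be sketched with NumPy. Trustworthiness penalizes points that appear among the k nearest neighbors in the projection without being neighbors in the original space, weighted by their original-space rank; continuity is the same measure with the two spaces swapped. Precision is the fraction of objects assigned to a class that truly belong to it, and completeness is the fraction of a class's true members that are recovered. This brute-force illustration follows the standard Venna and Kaski normalization (valid for k < N/2); it is not the paper's evaluation code, Shepard goodness is not shown, and the label encoding (0, 1, 2 for star, galaxy, QSO) is hypothetical.

```python
import numpy as np


def rank_matrix(X):
    """ranks[i, j] = rank of j among the neighbors of i (1 = nearest, self last)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    order = np.argsort(d2, axis=1)
    n = X.shape[0]
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks


def trustworthiness(X_high, X_low, k=5):
    n = X_high.shape[0]
    r_high, r_low = rank_matrix(X_high), rank_matrix(X_low)
    # Intruders: in the low-D k-NN of i but not in its high-D k-NN
    intruders = (r_low <= k) & (r_high > k)
    penalty = ((r_high - k) * intruders).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


def continuity(X_high, X_low, k=5):
    # Same formula with the roles of the two spaces swapped
    return trustworthiness(X_low, X_high, k)


def precision_completeness(y_true, y_pred, label):
    tp = np.sum((y_pred == label) & (y_true == label))
    precision = tp / np.sum(y_pred == label)
    completeness = tp / np.sum(y_true == label)  # also called recall
    return precision, completeness


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
Y = X[:, :2]  # a crude 2-D "projection": drop three coordinates
print(trustworthiness(X, X))  # 1.0 for the identity map
print(trustworthiness(X, Y), continuity(X, Y))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(precision_completeness(y_true, y_pred, label=1))  # precision 2/3, completeness 1.0
```

Both neighborhood metrics lie in [0, 1], with 1 meaning that no neighborhood relations are broken at the chosen k.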
Conclusions. We find that SDR-aided star, galaxy, and QSO classification performs comparably to another unsupervised learning method using hierarchical density-based spatial clustering of applications with noise (HDBSCAN) but offers advantages in terms of scalability and interpretability. Furthermore, it outperforms traditional color selection methods in terms of QSO classification performance. Overall, we demonstrate the potential of SDR-aided classification to provide an accurate and physically insightful classification of astronomical objects based on their broadband colors.
Key words: methods: data analysis / techniques: photometric / surveys / stars: general / galaxies: active / galaxies: general
© The Authors 2024
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.