| Issue | A&A, Volume 675, July 2023 |
|---|---|
| Article Number | A159 |
| Number of page(s) | 13 |
| Section | Numerical methods and codes |
| DOI | https://doi.org/10.1051/0004-6361/202346770 |
| Published online | 18 July 2023 |
A multi-band AGN-SFG classifier for extragalactic radio surveys using machine learning★
1 Kapteyn Astronomical Institute, University of Groningen, 9747 AD Groningen, The Netherlands
e-mail: jesper.karsten1999@gmail.com; karsten@astro.rug.nl
2 SRON Netherlands Institute for Space Research, Landleven 12, 9747 AD Groningen, The Netherlands
3 Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
4 ASTRON, the Netherlands Institute for Radio Astronomy, Oude Hoogeveensedijk 4, 7991 PD Dwingeloo, The Netherlands
5 Leiden Observatory, Leiden University, PO Box 9513, 2300 RA Leiden, The Netherlands
6 Inter-University Institute for Data Intensive Astronomy, Department of Astronomy, University of Cape Town, 7701 Rondebosch, Cape Town, South Africa
7 Inter-University Institute for Data Intensive Astronomy, Department of Physics and Astronomy, University of the Western Cape, Robert Sobukwe Road, 7535 Bellville, Cape Town, South Africa
8 INAF - Istituto di Radioastronomia, via Gobetti 101, 40129 Bologna, Italy
9 UK Astronomy Technology Centre, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
Received: 28 April 2023 / Accepted: 5 June 2023
Context. Extragalactic radio continuum surveys play an increasingly important role in galaxy evolution and cosmology studies. While radio galaxies and radio quasars dominate at the bright end, star-forming galaxies (SFGs) and radio-quiet active galactic nuclei (AGNs) are more common at fainter flux densities.
Aims. Our aim is to develop a machine-learning classifier that can efficiently and reliably separate AGNs and SFGs in radio continuum surveys.
Methods. We performed a supervised classification of SFGs versus AGNs using the light gradient boosting machine (LGBM) on three LOFAR Deep Fields (Lockman Hole, Boötes, and ELAIS-N1), which benefit from a wide range of high-quality multi-wavelength data and classification labels derived from extensive spectral energy distribution (SED) analyses.
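To make "gradient boosting" concrete for a binary SFG-versus-AGN label, here is a minimal, stdlib-only sketch of the core boosting loop. It is not LightGBM and not the authors' pipeline: it swaps in depth-1 trees (decision stumps) with an exhaustive split search and logistic loss, but uses the same kind of second-order update that LightGBM-style boosters use (each leaf outputs -G/(H + λ), where G and H are the summed gradients and Hessians of the loss in that leaf). The class and function names, the hyper-parameter defaults, and the label convention (1 = AGN, 0 = SFG) are all hypothetical; the features and settings used in the paper are not given in this section.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one feature, one threshold, two leaf values."""
    feature: int
    threshold: float
    left_value: float   # raw-score update when x[feature] <= threshold
    right_value: float  # raw-score update when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _fit_stump(X: List[List[float]], grad: List[float], hess: List[float],
               reg_lambda: float) -> Stump:
    """Exhaustively pick the split with the largest second-order (Newton) gain."""
    g_tot, h_tot = sum(grad), sum(hess)
    # Fallback: a constant stump (x <= inf always holds) if no valid split exists.
    best = Stump(0, math.inf, -g_tot / (h_tot + reg_lambda), 0.0)
    best_gain = -math.inf
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        g_left = h_left = 0.0
        for k in range(len(order) - 1):
            i, nxt = order[k], order[k + 1]
            g_left += grad[i]
            h_left += hess[i]
            if X[i][j] == X[nxt][j]:
                continue  # can only split between distinct feature values
            g_right, h_right = g_tot - g_left, h_tot - h_left
            gain = (g_left ** 2 / (h_left + reg_lambda)
                    + g_right ** 2 / (h_right + reg_lambda)
                    - g_tot ** 2 / (h_tot + reg_lambda))
            if gain > best_gain:
                best_gain = gain
                best = Stump(j, 0.5 * (X[i][j] + X[nxt][j]),
                             -g_left / (h_left + reg_lambda),
                             -g_right / (h_right + reg_lambda))
    return best


@dataclass
class StumpBooster:
    """Binary gradient boosting with logistic loss (hypothetically 1 = AGN, 0 = SFG)."""
    n_rounds: int = 50
    learning_rate: float = 0.1
    reg_lambda: float = 1.0
    base_score: float = 0.0
    stumps: List[Stump] = field(default_factory=list)

    def fit(self, X: List[List[float]], y: List[int]) -> "StumpBooster":
        # Start from the log-odds of the positive-class fraction (clamped away from 0 and 1).
        p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base_score = math.log(p0 / (1 - p0))
        self.stumps = []
        raw = [self.base_score] * len(X)
        for _ in range(self.n_rounds):
            p = [_sigmoid(r) for r in raw]
            grad = [pi - yi for pi, yi in zip(p, y)]  # d(log-loss)/d(raw score)
            hess = [pi * (1 - pi) for pi in p]        # second derivative
            stump = _fit_stump(X, grad, hess, self.reg_lambda)
            self.stumps.append(stump)
            raw = [r + self.learning_rate * stump.predict(x) for r, x in zip(raw, X)]
        return self

    def predict_proba(self, x: Sequence[float]) -> float:
        raw = self.base_score + self.learning_rate * sum(s.predict(x) for s in self.stumps)
        return _sigmoid(raw)

    def predict(self, x: Sequence[float], threshold: float = 0.5) -> int:
        return int(self.predict_proba(x) >= threshold)
```

Each round fits one stump to the current gradients and adds a damped copy of it to the raw score, so later stumps correct the mistakes of earlier ones. In practice one would use the `lightgbm` package (for example `lightgbm.LGBMClassifier`), which adds histogram-based split finding, deeper leaf-wise trees and many more regularisation options; the sketch only illustrates the principle.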
Results. Our trained model has a precision of 0.92±0.01 and a recall of 0.87±0.02 for SFGs. For AGNs, the model performs slightly worse, with a precision of 0.87±0.02 and a recall of 0.78±0.02. These results demonstrate that our trained model can successfully reproduce the classification labels derived from a detailed SED analysis. Model performance decreases towards higher redshifts, mainly because the training samples there are smaller. To make the classifier more adaptable to other radio galaxy surveys, we also investigated how it performs with a poorer multi-wavelength sampling of the SED. In particular, we find that the far-infrared and radio bands are particularly important for the classification. We also find that a higher signal-to-noise ratio in some photometric bands significantly improves model performance. In addition to 150 MHz radio data, our model can also be used with 1.4 GHz radio data. Converting 1.4 GHz radio data to 150 MHz reduces the performance by ~4% in precision and ~3% in recall.
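The precision and recall quoted above are per-class quantities. For SFGs, precision = TP/(TP + FP) is the fraction of sources the model calls SFGs that the SED analysis also labels as SFGs, and recall = TP/(TP + FN) is the fraction of SED-labelled SFGs that the model recovers; the AGN values follow by swapping the roles of the classes. The sketch below computes both, and also shows a typical power-law conversion from 1.4 GHz to 150 MHz flux densities, assuming S_ν ∝ ν^(-α). The spectral index α = 0.7 is an assumption (a commonly adopted value for synchrotron emission), not necessarily the conversion used in the paper, and the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def per_class_precision_recall(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    classes: Tuple[str, ...] = ("SFG", "AGN"),
) -> Dict[str, Dict[str, float]]:
    """Precision and recall for each class, treating it in turn as 'positive'."""
    metrics = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly labelled c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly labelled c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # c that was missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[c] = {"precision": precision, "recall": recall}
    return metrics


def convert_flux_density(s_nu1: float, nu1: float, nu2: float,
                         alpha: float = 0.7) -> float:
    """Scale a flux density from frequency nu1 to nu2 assuming S_nu ∝ nu**(-alpha).

    alpha = 0.7 is an assumed, commonly used synchrotron spectral index.
    Frequencies only need to share a unit (e.g. MHz); the flux unit is preserved.
    """
    return s_nu1 * (nu2 / nu1) ** (-alpha)


# Hypothetical usage: a 1 mJy source at 1400 MHz is roughly 4.8 mJy at 150 MHz for alpha = 0.7.
s150 = convert_flux_density(1.0, 1400.0, 150.0)
```

Because flux density rises steeply towards low frequencies for α > 0, any error in the assumed spectral index propagates directly into the converted 150 MHz flux densities, which is one plausible reason why converted 1.4 GHz data give somewhat lower precision and recall.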
Key words: galaxies: active / methods: data analysis / catalogs
★ The final trained model is publicly available at https://github.com/Jesper-Karsten/MBASC
© The Authors 2023
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.