Volume 649, May 2021
Number of page(s): 17
Section: Catalogs and data
Published online: 13 May 2021
1 National Centre for Nuclear Research, Astrophysics Division, ul. Pasteura 7, 02-093 Warsaw, Poland
2 Center for Theoretical Physics, Polish Academy of Sciences, al. Lotników 32/46, 02-668 Warsaw, Poland
3 Astronomical Observatory of the Jagiellonian University, 31-007 Kraków, Poland
4 Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ, UK
5 Ruhr University Bochum, Faculty of Physics and Astronomy, Astronomical Institute (AIRUB), German Centre for Cosmological Lensing, 44780 Bochum, Germany
6 Argelander-Institut für Astronomie, Auf dem Hügel 71, 53121 Bonn, Germany
7 Department of Astrophysical Sciences, Princeton University, 4 Ivy Lane, Princeton, NJ 08544, USA
8 Leiden Observatory, Leiden University, PO Box 9513, 2300 RA Leiden, The Netherlands
9 School of Physics and Astronomy, Sun Yat-sen University, Guangzhou 519082, Zhuhai Campus, PR China
10 Kapteyn Institute, University of Groningen, PO Box 800 9700 AV Groningen, The Netherlands
Accepted: 16 February 2021
We present a catalog of quasars with their corresponding redshifts derived from the photometric Kilo-Degree Survey (KiDS) Data Release 4. We achieved this by training machine learning (ML) models, using optical ugri and near-infrared ZYJHKs bands, on objects known from Sloan Digital Sky Survey (SDSS) spectroscopy. We define inference subsets from the 45 million objects of the KiDS photometric data limited to 9-band detections, based on a feature space built from magnitudes and their combinations. We show that projections of the high-dimensional feature space onto two dimensions can be successfully used, instead of the standard color-color plots, to investigate the photometric estimations, compare them with spectroscopic data, and efficiently support the process of building a catalog. The model selection and fine-tuning employ two subsets of objects: those randomly selected and the faintest ones, which allowed us to properly fit the bias versus variance trade-off. We tested three ML models: random forest (RF), XGBoost (XGB), and artificial neural network (ANN). We find that XGB is the most robust and straightforward model for classification, while ANN performs the best for combined classification and redshift. The ANN inference results are tested using number counts, Gaia parallaxes, and other quasar catalogs that are external to the training set. Based on these tests, we derived the minimum classification probability for quasar candidates that provides the best purity versus completeness trade-off: p(QSOcand) > 0.9 for r < 22 and p(QSOcand) > 0.98 for 22 < r < 23.5. We find 158 000 quasar candidates in the safe inference subset (r < 22) and an additional 185 000 candidates in the reliable extrapolation regime (22 < r < 23.5). Test-data purity equals 97% and completeness is 94%; the latter drops by 3% in the extrapolation to data fainter by one magnitude than the training set. The photometric redshifts were derived with ANN and modeled with Gaussian uncertainties.
The test-data redshift error (mean and scatter) equals 0.009 ± 0.12 in the safe subset and −0.0004 ± 0.19 in the extrapolation, averaged over a redshift range of 0.14 < z < 3.63 (first and 99th percentiles). The success of the extrapolation challenges the way that models are optimized and applied at the faint data end. The resulting catalog is ready for cosmology and active galactic nucleus (AGN) studies.
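The magnitude-dependent probability cut and the quoted test statistics can be illustrated with a minimal Python sketch. It is not the released pipeline: the function names and the toy numbers are hypothetical, and the redshift error is assumed to be the normalized residual (z_photo − z_spec)/(1 + z_spec), a common convention that the abstract itself does not spell out.

```python
from statistics import mean, pstdev


def is_qso_candidate(r_mag, p_qso):
    """Apply the cuts quoted in the abstract (hypothetical helper name).

    Safe subset:          r < 22          requires p(QSOcand) > 0.9
    Extrapolation regime: 22 < r < 23.5   requires p(QSOcand) > 0.98
    The inequalities are kept strict, exactly as written in the abstract.
    """
    if r_mag < 22.0:
        return p_qso > 0.9
    if 22.0 < r_mag < 23.5:
        return p_qso > 0.98
    return False


def purity_completeness(selected, is_qso):
    """Purity = TP / selected; completeness = TP / true quasars.

    Assumes at least one selected object and at least one true quasar.
    """
    true_pos = sum(1 for s, q in zip(selected, is_qso) if s and q)
    return true_pos / sum(selected), true_pos / sum(is_qso)


def redshift_error_stats(z_photo, z_spec):
    """Mean and scatter of the normalized residual (assumed definition)."""
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_photo, z_spec)]
    return mean(dz), pstdev(dz)


# Toy data, invented purely for illustration (not taken from the catalog).
r_mags = [21.0, 21.5, 22.5, 23.0, 24.0]
p_qsos = [0.95, 0.80, 0.99, 0.95, 0.999]
truth = [True, True, True, False, False]

selected = [is_qso_candidate(r, p) for r, p in zip(r_mags, p_qsos)]
purity, completeness = purity_completeness(selected, truth)
bias, scatter = redshift_error_stats([1.1, 2.0], [1.0, 2.0])
```

In this toy sample the fourth object passes the bright-end threshold of 0.9 but is rejected because it lies in the fainter regime, where the stricter 0.98 cut applies; the last object is rejected regardless of its probability because it is fainter than r = 23.5.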
Key words: methods: data analysis / methods: observational / catalogs / surveys / quasars: general / large-scale structure of Universe
A copy of the catalog is only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr or via http://cdsarc.u-strasbg.fr/viz-bin/cat/J/A+A/649/A81
We publicly release the catalog at https://kids.strw.leidenuniv.nl/DR4/quasarcatalog.php and the code at github.com/snakoneczny/kids-quasars
© ESO 2021