Boost recall in quasi-stellar object selection from highly imbalanced photometric datasets

Giorgio Calderone; Francesco Guarneri; Matteo Porru; Stefano Cristiani; Andrea Grazian; Luciano Nicastro; Manuela Bischetti; Konstantina Boutsia; Guido Cupani; Valentina D’Odorico; Chiara Feruglio; Fabio Fontanot

doi:10.1051/0004-6361/202346625

Volume 683 (March 2024)

A&A, 683 (2024) A34

Issue: A&A Volume 683, March 2024
Article Number: A34
Number of page(s): 14
Section: Catalogs and data
DOI: https://doi.org/10.1051/0004-6361/202346625
Published online: 04 March 2024

A&A, 683, A34 (2024)

The reverse selection method★

Giorgio Calderone¹, Francesco Guarneri¹,², Matteo Porru¹, Stefano Cristiani¹,³,⁴, Andrea Grazian⁵, Luciano Nicastro⁶, Manuela Bischetti¹, Konstantina Boutsia⁷,⁸, Guido Cupani¹,³, Valentina D’Odorico¹,³,⁹, Chiara Feruglio¹ and Fabio Fontanot¹,³

¹ INAF – Osservatorio Astronomico di Trieste, Via G.B. Tiepolo 11, 34143 Trieste, Italy
² Dipartimento di Fisica, Sezione di Astronomia, Università di Trieste, via G.B. Tiepolo 11, 34143 Trieste, Italy
³ IFPU – Institute for Fundamental Physics of the Universe, via Beirut 2, 34151 Trieste, Italy
⁴ INFN – National Institute for Nuclear Physics, via Valerio 2, 34127 Trieste, Italy
⁵ INAF – Osservatorio Astronomico di Padova, Vicolo dell’Osservatorio 5, 35122 Padova, Italy
⁶ INAF – Osservatorio di Astrofisica e Scienza dello Spazio di Bologna, Via P. Gobetti 101, 40129 Bologna, Italy
⁷ Cerro Tololo Inter-American Observatory/NSF's NOIRLab, Casilla 603, La Serena, Chile
⁸ Las Campanas Observatory, Carnegie Observatories, Colina El Pino, Casilla 601, La Serena, Chile
⁹ Scuola Normale Superiore, P.zza dei Cavalieri, 56126 Pisa, Italy

Received: 8 April 2023
Accepted: 12 December 2023

Abstract

Context. The identification of bright quasi-stellar objects (QSOs) is of fundamental importance to probe the intergalactic medium and address open questions in cosmology. Several approaches have been adopted to find such sources in the currently available photometric surveys, including machine learning methods. However, the rarity of bright QSOs at high redshifts compared to other contaminating sources (such as stars and galaxies) makes the selection of reliable candidates a difficult task, especially when high completeness is required.

Aims. We present a novel technique to boost recall (i.e., completeness within the considered sample) in the selection of QSOs from photometric datasets dominated by stars, galaxies, and low-z QSOs (imbalanced datasets).
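For reference, recall in the sense used above can be written in its standard form. The symbols TP (high-z QSOs in the sample that are selected) and FN (high-z QSOs in the sample that are missed) are notation introduced here for illustration and do not appear in the abstract.

```latex
\mathrm{recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}
```

A recall of ~85% therefore means that roughly 85 out of every 100 true high-z QSOs in the considered sample are retained as candidates.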

Methods. Our heuristic method operates by iteratively removing sources whose probability of belonging to a noninteresting class exceeds a user-defined threshold, until the remaining dataset contains mainly high-z QSOs. Any existing machine learning method can be used as the underlying classifier, provided it allows for a classification probability to be estimated. We applied the method to a dataset obtained by cross-matching PanSTARRS1 (DR2), Gaia (DR3), and WISE, and identified the high-z QSO candidates using both our method and its direct multi-label counterpart.
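The iterative removal described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the nearest-centroid classifier is a toy stand-in for whatever probabilistic classifier is actually used, the function names, the 0.9 threshold, the one-dimensional features, and the stopping rule (stop when nothing more is removed or only the target class is left in the training set) are all hypothetical choices made for the example.

```python
import math


def fit_centroids(train):
    """Toy stand-in classifier: one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {lab: -sum((a - b) ** 2 for a, b in zip(x, c))
              for lab, c in centroids.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}


def reverse_selection(train, unlabeled, target, threshold=0.9, max_iter=20):
    """Iteratively drop sources that look confidently like a non-target class.

    `train` is a list of (features, label) pairs, `unlabeled` a list of
    feature tuples. Returns the unlabeled sources that survive every round.
    """
    train, remaining = list(train), list(unlabeled)
    for _ in range(max_iter):
        labels = {label for _, label in train}
        if target not in labels or len(labels) < 2:
            break  # assumed stopping rule: no contaminant class left to model
        model = fit_centroids(train)

        def rejected(x):
            proba = predict_proba(model, x)
            return any(p > threshold for lab, p in proba.items() if lab != target)

        kept_train = [(x, lab) for x, lab in train if not rejected(x)]
        kept_remaining = [x for x in remaining if not rejected(x)]
        if len(kept_train) == len(train) and len(kept_remaining) == len(remaining):
            break  # assumed stopping rule: nothing was removed in this round
        train, remaining = kept_train, kept_remaining
    return remaining


# Hypothetical one-feature dataset: most stars sit near 0, one lies near 6,
# and the target class sits near 10.
train = [((0.0,), "star"), ((0.0,), "star"), ((0.0,), "star"), ((6.0,), "star"),
         ((9.5,), "highz_qso"), ((10.0,), "highz_qso"), ((10.5,), "highz_qso")]
unlabeled = [(0.3,), (6.2,), (9.7,), (10.4,)]
candidates = reverse_selection(train, unlabeled, target="highz_qso")
print(candidates)  # [(9.7,), (10.4,)]
```

In this toy run the first round removes only the obvious contaminants near 0. The source at 6.2 survives that round because the star centroid is dominated by the bulk of the class, and is rejected in the second round once the retrained classifier describes the residual contaminants alone.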

Results. We ran several tests by randomly choosing the training and test datasets, and achieved significant improvements in recall, which increased from ~50% to ~85% for QSOs with z > 2.5, and from ~70% to ~90% for QSOs with z > 3. We also identified 3098 new QSO candidates in a sample of 2.6 × 10⁶ sources with no known classification. We obtained follow-up spectroscopy for 121 candidates, confirming 107 new QSOs with z > 2.5. Finally, a comparison of our QSO candidates with those selected by an independent method based on Gaia spectroscopy shows that the two samples overlap by more than 90% and that both selection methods are potentially capable of achieving a high level of completeness.

Key words: methods: statistical / astronomical databases: miscellaneous / catalogs / surveys / quasars: general

★ Table B.1 is available at the CDS via anonymous ftp to cdsarc.cds.unistra.fr (130.79.128.5) or via https://cdsarc.cds.unistra.fr/viz-bin/cat/J/A+A/683/A34

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.
