Star–galaxy classification in deep LSST data with random forest

M. Gatto; V. Ripepi; M. Bellazzini; C. Tortora; M. Dall’Ora

doi:10.1051/0004-6361/202658903


A&A, 709 (2026) A79


Issue: A&A Volume 709, May 2026
Article Number: A79
Number of page(s): 14
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/202658903
Published online: 05 May 2026


A pilot study on the Data Preview 1 release

M. Gatto¹^★, V. Ripepi¹, M. Bellazzini², C. Tortora¹ and M. Dall’Ora¹

¹ INAF – Osservatorio Astronomico di Capodimonte, Salita Moiariello, 16, 80131 Napoli, Italy
² INAF – Osservatorio di Astrofisica e Scienza dello Spazio, Via Gobetti, 93/3, 40129 Bologna, Italy

^★ Corresponding author

Received: 9 January 2026
Accepted: 24 March 2026

Abstract

Context. The Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) will produce photometric catalogues of unprecedented depth and area, enabling transformative studies of faint stellar systems, including searches for ultra-faint dwarf (UFD) galaxies. A critical challenge for these studies is reliable star–galaxy separation at faint magnitudes, where compact background galaxies increasingly contaminate stellar samples.

Aims. This work aims to assess the performance of supervised machine-learning techniques for star–galaxy separation in LSST-like data, to quantify the relative importance of morphological and photometric information, and to identify the most effective combinations of input features for minimizing galaxy contamination while preserving stellar completeness in the faint regime relevant for UFD searches.

Methods. We applied a Random Forest classifier to observations of the Extended Chandra Deep Field South from LSST Data Preview 1 (DP1), the deepest field observed in DP1. We constructed a curated sample of bona fide stars and galaxies using spectroscopic data, Gaia DR3, and multi-band photometric catalogues. We trained and validated the classifier using several configurations of LSST-based input features, including multi-band colours, the LSST morphological parameter REFEXTENDEDNESS, and photometric uncertainties.
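As an illustration of what one such feature configuration might look like, the following minimal sketch assembles a per-source feature vector from six-band magnitudes. It is not the authors' pipeline: the choice of adjacent-band colours, the use of per-band magnitude uncertainties as the "photometric uncertainties", the input layout (dictionaries keyed by band), and all function and feature names are assumptions made here for illustration.

```python
# Hypothetical feature builder; names and layout are illustrative assumptions,
# not taken from the paper or from the LSST Science Pipelines.
BANDS = ["u", "g", "r", "i", "z", "y"]  # the six LSST filters


def build_features(mags, mag_errs, extendedness=None, use_errors=True):
    """Return (names, values) describing one source.

    mags, mag_errs : dicts mapping band name -> magnitude / uncertainty.
    extendedness   : optional morphological flag (e.g. 0 = point-like).
    use_errors     : whether to append per-band uncertainties as features.
    """
    names, values = [], []

    # Adjacent-band colours: u-g, g-r, r-i, i-z, z-y (an assumed choice).
    for blue, red in zip(BANDS[:-1], BANDS[1:]):
        names.append(f"{blue}-{red}")
        values.append(mags[blue] - mags[red])

    # Optionally expose the photometric uncertainties to the classifier.
    if use_errors:
        for band in BANDS:
            names.append(f"err_{band}")
            values.append(mag_errs[band])

    # Optionally append the morphological parameter.
    if extendedness is not None:
        names.append("extendedness")
        values.append(float(extendedness))

    return names, values
```

Stacking the resulting vectors row by row would give a feature matrix suitable for a standard Random Forest implementation such as scikit-learn's `RandomForestClassifier`; toggling `use_errors` and `extendedness` mimics comparing configurations with and without uncertainties or morphology.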

Results. We find that LSST multi-band photometry alone delivers good star–galaxy separation, significantly outperforming morphology-based classification at faint magnitudes. Colours involving the u band are essential for robust star–galaxy separation. Furthermore, explicitly including photometric uncertainties as input features yields the best overall performance. Across all configurations that include all six LSST filters, galaxy contamination remains negligible over almost the whole magnitude range probed in this work (i.e. r ≲ 27.5 mag).
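For readers who want to reproduce this kind of diagnostic, the sketch below computes stellar completeness and galaxy contamination in magnitude bins. It assumes the commonly used definitions (completeness: fraction of true stars classified as stars; contamination: fraction of objects classified as stars that are actually galaxies); the abstract does not state the exact definitions adopted in the paper, and the function names and label convention (1 = star, 0 = galaxy) are hypothetical.

```python
def completeness_contamination(y_true, y_pred, star_label=1):
    """Stellar completeness and galaxy contamination for one sample.

    Returns NaN for a metric whose denominator is zero.
    """
    true_stars = sum(1 for t in y_true if t == star_label)
    pred_stars = sum(1 for p in y_pred if p == star_label)
    correct = sum(
        1 for t, p in zip(y_true, y_pred)
        if t == star_label and p == star_label
    )
    completeness = correct / true_stars if true_stars else float("nan")
    contamination = (
        (pred_stars - correct) / pred_stars if pred_stars else float("nan")
    )
    return completeness, contamination


def binned_metrics(r_mag, y_true, y_pred, edges):
    """Evaluate both metrics in half-open r-band magnitude bins [lo, hi)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [k for k, m in enumerate(r_mag) if lo <= m < hi]
        comp, cont = completeness_contamination(
            [y_true[k] for k in idx], [y_pred[k] for k in idx]
        )
        rows.append((lo, hi, len(idx), comp, cont))
    return rows
```

Evaluating the metrics per bin rather than globally matters because bright sources dominate most validation samples and would otherwise mask the behaviour in the faint regime relevant to UFD searches.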

Conclusions. Our results demonstrate that supervised machine-learning methods, when combined with LSST multi-band photometry, can effectively suppress galaxy contamination in deep stellar catalogues, ensuring that searches for UFDs are not significantly compromised. Given that the DP1 data are shallower and have poorer seeing than the final LSST survey, our findings should be regarded as a conservative lower limit on the performance achievable with the full 10-year dataset. To facilitate further development, we will publicly release the curated star–galaxy sample used in this work.

Key words: methods: data analysis / methods: statistical / techniques: photometric / surveys / stars: general / galaxies: general

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.
