Machine-learning identification of galaxies in the WISE × SuperCOSMOS all-sky catalogue⋆
1 National Centre for Nuclear Research, ul. Andrzeja Sołtana 7, 05-400 Otwock, Poland
2 Janusz Gil Institute of Astronomy, University of Zielona Góra, ul. Lubuska 2, 65-265 Zielona Góra, Poland
3 Leiden Observatory, Leiden University, PO Box 9513, 2300 RA Leiden, The Netherlands
4 Astronomical Observatory of the Jagiellonian University, ul. Orla 171, 30-244 Kraków, Poland
Received: 22 June 2016
Accepted: 11 August 2016
Context. The two largest all-sky photometric datasets currently available, WISE and SuperCOSMOS, were recently cross-matched to construct a novel photometric redshift catalogue covering 70% of the sky. Galaxies were separated from stars and quasars through colour cuts, which may leave imperfections because different source types can overlap in colour space.
Aims. The aim of the present work is to identify galaxies in the WISE × SuperCOSMOS catalogue through an alternative, machine-learning approach. This allows us to define more complex separations in the multi-colour space than is possible with simple colour cuts, and should provide a more reliable source classification.
Methods. For the automated classification we used the support vector machine (SVM) learning algorithm and employed SDSS spectroscopic sources that we cross-matched with WISE × SuperCOSMOS to construct the training and verification sets. We performed a number of tests to examine the behaviour of the classifier (completeness, purity, and accuracy) as a function of source apparent magnitude and Galactic latitude. We then applied the classifier to the full-sky data and analysed the resulting catalogue of candidate galaxies. We also compared the resulting dataset with the one obtained through colour cuts.
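To illustrate the three quality measures named above, a minimal Python sketch follows. It assumes the usual definitions, which the abstract does not spell out: completeness as the fraction of true galaxies that are recovered, purity as the fraction of predicted galaxies that are true galaxies, and accuracy as the fraction of all sources that are classified correctly. The function names, the toy labels, and the magnitude bin edges are hypothetical and are not taken from the catalogue.

```python
from collections import defaultdict


def classification_metrics(true_labels, predicted_labels, target="galaxy"):
    """Completeness, purity and accuracy for one target class (assumed definitions)."""
    tp = fp = fn = tn = 0
    for true, pred in zip(true_labels, predicted_labels):
        if pred == target and true == target:
            tp += 1  # galaxy correctly identified
        elif pred == target:
            fp += 1  # star or quasar mislabelled as galaxy
        elif true == target:
            fn += 1  # galaxy missed by the classifier
        else:
            tn += 1  # non-galaxy correctly rejected
    total = tp + fp + fn + tn
    nan = float("nan")
    return {
        "completeness": tp / (tp + fn) if tp + fn else nan,
        "purity": tp / (tp + fp) if tp + fp else nan,
        "accuracy": (tp + tn) / total if total else nan,
    }


def metrics_by_magnitude(magnitudes, true_labels, predicted_labels, bin_edges):
    """Evaluate the same metrics separately in half-open magnitude bins [lo, hi)."""
    bins = defaultdict(lambda: ([], []))
    for mag, true, pred in zip(magnitudes, true_labels, predicted_labels):
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
            if lo <= mag < hi:
                bins[(lo, hi)][0].append(true)
                bins[(lo, hi)][1].append(pred)
                break
    return {
        edges: classification_metrics(true, pred)
        for edges, (true, pred) in sorted(bins.items())
    }


# Toy verification set (hypothetical values, for illustration only).
true_labels = ["galaxy", "galaxy", "star", "quasar", "galaxy", "star"]
predicted_labels = ["galaxy", "star", "star", "galaxy", "galaxy", "star"]
magnitudes = [14.2, 15.1, 14.8, 16.5, 16.9, 16.1]

overall = classification_metrics(true_labels, predicted_labels)
binned = metrics_by_magnitude(magnitudes, true_labels, predicted_labels, [14, 16, 18])
```

In this sketch the spectroscopic class plays the role of the true label and the classifier output that of the predicted label; binning by Galactic latitude instead of magnitude would follow the same pattern.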
Results. The tests indicate very high accuracy, completeness, and purity (>95%) of the classifier at the bright end; performance deteriorates for the faintest sources but remains at an acceptable level of ~85%. No significant variation in the classification quality with Galactic latitude is observed. When we applied the classifier to all-sky WISE × SuperCOSMOS data, we found 15 million galaxies after masking problematic areas. The resulting sample is purer than the one produced by applying colour cuts, at the price of a lower completeness across the sky.
Conclusions. The automatic classification is a successful alternative approach to colour cuts for defining a reliable galaxy sample. The identifications we obtained are included in the public release of the WISE × SuperCOSMOS galaxy catalogue.
Key words: methods: data analysis / methods: numerical / astronomical databases: miscellaneous / galaxies: statistics / large-scale structure of Universe
⋆ The public release of the WISE × SuperCOSMOS galaxy catalogue is available from http://ssa.roe.ac.uk/WISExSCOS
© ESO, 2016