Towards automatic classification of all WISE sources

A. Kurcz; M. Bilicki; A. Solarz; M. Krupa; A. Pollo; K. Małek

doi:10.1051/0004-6361/201628142

Volume 592 (August 2016)

A&A, 592 (2016) A25

Issue: A&A Volume 592, August 2016
Article Number: A25
Number of page(s): 18
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/201628142
Published online: 06 July 2016

A&A 592, A25 (2016)

A. Kurcz¹,², M. Bilicki³,²,⁴, A. Solarz⁵,², M. Krupa¹,², A. Pollo¹,⁵,² and K. Małek⁵,²

¹ Astronomical Observatory of the Jagiellonian University, ul. Orla 171, 30-244 Cracow, Poland
e-mail: kurcz.agnieszka@gmail.com
² Janusz Gil Institute of Astronomy, University of Zielona Góra, ul. Szafrana 2, 65-516 Zielona Góra, Poland
³ Leiden Observatory, Leiden University, Niels Bohrweg 2, 2333 CA Leiden, The Netherlands
⁴ Astrophysics, Cosmology and Gravity Centre, Department of Astronomy, University of Cape Town, Rondebosch, South Africa
⁵ National Centre for Nuclear Research, ul. Hoża 69, 00-681 Warszawa, Poland

Received: 15 January 2016
Accepted: 11 April 2016

Abstract

Context. The Wide-field Infrared Survey Explorer (WISE) has detected hundreds of millions of sources over the entire sky. Classifying them reliably is, however, a challenging task owing to degeneracies in WISE multicolour space and low levels of detection in its two longest-wavelength bandpasses. Simple colour cuts are often not sufficient; for satisfactory levels of completeness and purity, more sophisticated classification methods are needed.

Aims. Here we aim to obtain comprehensive and reliable star, galaxy, and quasar catalogues based on automatic source classification in full-sky WISE data. This means that the final classification will employ only parameters available from WISE itself, in particular those which are reliably measured for the majority of sources.

Methods. For the automatic classification we applied a supervised machine learning algorithm, support vector machines (SVM). The algorithm requires a training sample in which the relevant classes are already identified, and we chose the SDSS spectroscopic dataset (DR10) for that purpose. We tested the performance of two kernels used by the classifier, and determined the minimum number of sources in the training set required to achieve stable classification, as well as the minimum dimension of the parameter space. We also tested SVM classification accuracy as a function of extinction and apparent magnitude. The classifier calibrated in this way was finally applied to all-sky WISE data, flux-limited to 16 mag (Vega) in the 3.4 μm channel.
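
For readers unfamiliar with SVM kernels, the two kernel families compared here can be written out explicitly. The following is a minimal sketch of the standard polynomial and radial basis function kernels in plain Python. The function names, the hyperparameter values (`gamma`, `coef0`, `degree`) and the example feature vectors are illustrative assumptions, not the settings or data used in the paper.

```python
import math

def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    """Standard polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """Standard radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Made-up feature vectors for two sources (values are hypothetical):
# (W1 magnitude, W1-W2 colour, differential aperture magnitude)
source_a = (14.2, 0.10, 0.35)
source_b = (15.1, 0.95, 0.60)

similarity_poly = polynomial_kernel(source_a, source_b, gamma=0.01, degree=2)
similarity_rbf = rbf_kernel(source_a, source_b, gamma=0.5)
```

The radial kernel depends only on the distance between two sources in parameter space, whereas the polynomial kernel depends on their inner product. This is why the two can behave differently on the same training set.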

Results. By calibrating on the test data drawn from SDSS, we first established that a polynomial kernel is preferred over a radial one for this particular dataset. Next, using three classification parameters (W1 magnitude, W1−W2 colour, and a differential aperture magnitude) we obtained very good classification efficiency in all the tests. At the bright end, the completeness for stars and galaxies reaches ~95%, deteriorating to ~80% at W1 = 16 mag, while for quasars it stays at a level of ~95% independently of magnitude. Similar numbers are obtained for purity. Application of the classifier to full-sky WISE data and appropriate a posteriori cleaning allowed us to obtain catalogues of star and galaxy candidates that appear reliable. However, the sources flagged by the classifier as “quasars” are in fact dominated by dusty galaxies; they also exhibit contamination from sources located mainly at low ecliptic latitudes, consistent with solar system objects.
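
Completeness and purity of the kind quoted above can be computed per class from true and predicted labels. The sketch below assumes the usual definitions: completeness is the fraction of true members of a class that the classifier recovers, and purity is the fraction of sources assigned to a class that truly belong to it. The function name and the toy labels are hypothetical and are not data from the paper.

```python
def completeness_and_purity(true_labels, predicted_labels, target):
    """Return (completeness, purity) for one class, given paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)  # correctly recovered
    fn = sum(1 for t, p in pairs if t == target and p != target)  # missed members
    fp = sum(1 for t, p in pairs if t != target and p == target)  # contaminants
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    purity = tp / (tp + fp) if (tp + fp) else float("nan")
    return completeness, purity

# Toy example (made-up labels): one star is misclassified as a galaxy.
true_labels = ["star", "star", "galaxy", "galaxy", "quasar", "star"]
predicted_labels = ["star", "galaxy", "galaxy", "galaxy", "quasar", "star"]

star_completeness, star_purity = completeness_and_purity(
    true_labels, predicted_labels, "star"
)
galaxy_completeness, galaxy_purity = completeness_and_purity(
    true_labels, predicted_labels, "galaxy"
)
```

In this toy case the misclassified star lowers the completeness of the star class and the purity of the galaxy class, while leaving star purity and galaxy completeness untouched.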

Key words: methods: data analysis / methods: statistical / astronomical databases: miscellaneous / catalogs / infrared: general / surveys

© ESO, 2016
