Volume 616, August 2018
Number of page(s): 21
Section: Numerical methods and codes
Published online: 28 August 2018
Return of the features
Efficient feature selection and interpretation for photometric redshifts⋆
1 Astroinformatics Group, Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
e-mail firstname.lastname@example.org; email@example.com
2 Zentrum für Astronomie der Universität Heidelberg, Astronomisches Rechen-Institut, Heidelberg, Germany
3 Department of Physics “E. Pancini”, University Federico II, via Cinthia 6, 80126 Napoli, Italy
4 INAF – Astronomical Observatory of Capodimonte, via Moiariello 16, 80131 Napoli, Italy
5 INFN – Section of Naples, via Cinthia 9, 80126 Napoli, Italy
6 Machine Learning Group Image Section, Department of Computer Science, University of Copenhagen, Sigurdsgade 41, 2200 København N, Denmark
Accepted: 23 April 2018
Context. The explosion of data in recent years has generated an increasing need for new analysis techniques to extract knowledge from massive data sets. Machine learning has proved particularly useful for this task. Fully automated methods (e.g. deep neural networks) have recently gained great popularity, even though they often lack physical interpretability. In contrast, feature-based approaches can provide both well-performing models and understandable causalities with respect to the correlations found between features and physical processes.
Aims. Efficient feature selection is an essential tool to boost the performance of machine learning models. In this work, we propose a forward selection method to compute, evaluate, and characterize better-performing features for regression and classification problems. Given the importance of photometric redshift estimation, we adopt it as our case study.
Methods. We synthetically created 4520 features by combining magnitudes, errors, radii, and ellipticities of quasars taken from the Sloan Digital Sky Survey (SDSS). We applied a forward selection process, a recursive method in which a large number of feature sets are tested with a k-nearest-neighbours algorithm, leading to a tree of feature sets. The branches of the feature tree were then used to perform experiments with a random forest, in order to validate the best set with an alternative model.
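As an illustration of the general idea only, the following is a minimal sketch of greedy forward selection scored by a k-nearest-neighbours regressor. It is not the authors' implementation: it follows a single branch rather than building a tree of feature sets, omits the random forest validation, and runs on synthetic toy data. All names (`knn_predict`, `knn_rmse`, `forward_select`, `make_data`), the value k = 5, and the stopping rule are assumptions made for this example.

```python
import random


def knn_predict(train_X, train_y, x, k):
    """Mean target of the k training points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k


def knn_rmse(X_tr, y_tr, X_val, y_val, feats, k):
    """Validation RMSE of a kNN regressor restricted to the columns in feats."""
    tr = [[row[j] for j in feats] for row in X_tr]
    sq = 0.0
    for row, y in zip(X_val, y_val):
        x = [row[j] for j in feats]
        sq += (knn_predict(tr, y_tr, x, k) - y) ** 2
    return (sq / len(y_val)) ** 0.5


def forward_select(X_tr, y_tr, X_val, y_val, max_feats, k=5):
    """Greedily add the feature that lowers validation RMSE the most.

    Assumed stopping rule: stop when no candidate improves the score
    or when max_feats features have been selected.
    """
    selected, best = [], float("inf")
    n_features = len(X_tr[0])
    while len(selected) < max_feats:
        scored = [
            (knn_rmse(X_tr, y_tr, X_val, y_val, selected + [j], k), j)
            for j in range(n_features)
            if j not in selected
        ]
        score, j = min(scored)
        if score >= best:
            break  # no remaining feature helps
        selected.append(j)
        best = score
    return selected, best


# Toy data (hypothetical): 6 uniform features, of which only columns 0 and 3
# carry information about the target; the rest are pure noise.
rng = random.Random(0)


def make_data(n):
    X = [[rng.random() for _ in range(6)] for _ in range(n)]
    y = [row[0] + 2.0 * row[3] + rng.gauss(0.0, 0.01) for row in X]
    return X, y


X_tr, y_tr = make_data(200)
X_val, y_val = make_data(100)
selected, score = forward_select(X_tr, y_tr, X_val, y_val, max_feats=3)
print("selected feature indices:", selected, "validation RMSE:", round(score, 3))
```

In this toy setting the informative columns are picked first because each lowers the validation error far more than any noise column. Keeping several of the best candidates at each step, instead of only the single best, would give a tree of feature sets of the kind described above.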
Results. We demonstrate that the sets of features determined with our approach significantly improve the performance of the regression models when compared to the classic features from the literature. The selected features are unexpected, being very different from the classic features. Therefore, a method to interpret some of them in a physical context is presented.
Conclusions. The feature selection methodology described here is very general and can be used to improve the performance of machine learning models for any regression or classification task.
Key words: methods: data analysis / methods: statistical / galaxies: distances and redshifts / quasars: general
The three catalogues are only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (188.8.131.52) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/616/A97
© ESO 2018