Issue | A&A Volume 662, June 2022
---|---
Article Number | A36
Number of page(s) | 28
Section | Numerical methods and codes
DOI | https://doi.org/10.1051/0004-6361/202142751
Published online | 14 June 2022
Photometric redshift estimation with convolutional neural networks and galaxy images: Case study of resolving biases in data-driven methods
1 Aix-Marseille Univ., CNRS/IN2P3, CPPM, Marseille, France
e-mail: lin@cppm.in2p3.fr
2 UMR TETIS, Univ. Montpellier, France
3 AgroParisTech, Cirad, CNRS, Irstea, Montpellier, France
4 Aix Marseille Univ., CNRS, CNES, LAM, Marseille, France
Received: 25 November 2021
Accepted: 11 February 2022
Deep-learning models have been increasingly exploited in astrophysical studies, but these data-driven algorithms are prone to producing biased outputs that are detrimental to subsequent analyses. In this work, we investigate two main forms of bias: class-dependent residuals and mode collapse. We do this in a case study in which we estimate photometric redshift as a classification problem using convolutional neural networks (CNNs) trained with galaxy images and associated spectroscopic redshifts. We focus on point estimates and propose a set of consecutive steps for resolving the two biases based on CNN models, involving representation learning with multichannel outputs, balancing the training data, and leveraging soft labels. The residuals can be viewed as a function of spectroscopic redshift or of photometric redshift; the biases with respect to these two definitions are incompatible and should be treated individually. We suggest that resolving biases in spectroscopic space is a prerequisite for resolving biases in photometric space. Experiments show that our methods control biases better than benchmark methods, and that they are robust under various implementation and training conditions when high-quality data are available. Our methods hold promise for future cosmological surveys that require tight constraints on biases, and they may be applied to regression problems and other studies that make use of data-driven models. Nonetheless, the bias-variance tradeoff and the requirement of sufficient statistics suggest that we need better methods and optimized data usage strategies.
Key words: galaxies: distances and redshifts / surveys / methods: data analysis / techniques: image processing
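The ingredients named in the abstract above (redshift as a classification over bins, soft labels, a balanced training set, and residuals viewed against either spectroscopic or photometric redshift) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Gaussian form of the soft labels, the inverse-frequency weights used for balancing, the probability-weighted mean as the point estimate, and the residual definition (z_photo - z_spec) / (1 + z_spec) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def soft_labels(z_spec, bin_centers, sigma):
    """Gaussian-smoothed class labels over redshift bins (assumed form).

    Each row sums to 1 and peaks at the bin closest to z_spec.
    """
    d = (bin_centers[None, :] - z_spec[:, None]) / sigma
    w = np.exp(-0.5 * d**2)
    return w / w.sum(axis=1, keepdims=True)

def point_estimate(probs, bin_centers):
    """Point estimate as the probability-weighted mean redshift (assumption)."""
    return probs @ bin_centers

def balancing_weights(z_spec, edges):
    """Inverse-frequency sample weights, so each occupied redshift bin
    carries the same total weight (one simple way to balance the data)."""
    n_bins = len(edges) - 1
    idx = np.clip(np.digitize(z_spec, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    w = 1.0 / counts[idx]
    return w / w.sum()

def mean_residual_per_bin(z_ref, z_photo, z_spec, edges):
    """Mean of (z_photo - z_spec) / (1 + z_spec) in bins of z_ref.

    Pass z_ref=z_spec for the bias in spectroscopic space and
    z_ref=z_photo for the bias in photometric space. Empty bins are NaN.
    """
    resid = (z_photo - z_spec) / (1.0 + z_spec)
    idx = np.digitize(z_ref, edges) - 1
    n_bins = len(edges) - 1
    out = np.full(n_bins, np.nan)
    for k in range(n_bins):
        sel = idx == k
        if sel.any():
            out[k] = resid[sel].mean()
    return out

# Toy data only: synthetic redshifts with Gaussian scatter, not survey data.
rng = np.random.default_rng(0)
z_spec = rng.uniform(0.0, 0.4, 1000)
z_photo = z_spec + rng.normal(0.0, 0.02, 1000)
edges = np.linspace(0.0, 0.4, 9)

bias_in_spec_space = mean_residual_per_bin(z_spec, z_photo, z_spec, edges)
bias_in_photo_space = mean_residual_per_bin(z_photo, z_photo, z_spec, edges)
```

Binning the same residuals by `z_spec` and by `z_photo` generally gives two different curves even for unbiased scatter, which is one way to see why the abstract treats the two definitions separately.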
© Q. Lin et al. 2022
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.