Representation learning for automated spectroscopic redshift estimation

J. Frontera-Pons; F. Sureau; B. Moraes; J. Bobin; F. B. Abdalla

doi:10.1051/0004-6361/201834295

Issue: A&A Volume 625, May 2019
Article Number: A73
Number of page(s): 15
Section: Numerical methods and codes
DOI: https://doi.org/10.1051/0004-6361/201834295
Published online: 15 May 2019

A&A 625, A73 (2019)

J. Frontera-Pons¹,², F. Sureau¹, B. Moraes³,⁴, J. Bobin¹ and F. B. Abdalla³,⁵

¹ Laboratoire AIM, CEA, CNRS, Université Paris-Saclay, Université Paris Diderot, Sorbonne Paris Cité, 91191 Gif-sur-Yvette, France
e-mail: joana.frontera-pons@cea.fr
² DR2I, Institut Polytechnique des Sciences Avancées, 94200 Ivry-sur-Seine, France
³ Department of Physics & Astronomy, University College London, Gower Street, London, WC1E 6BT, UK
⁴ Instituto de Fisica, Universidade Federal do Rio de Janeiro, 21941-972 Rio de Janeiro, Brazil
⁵ Department of Physics and Electronics, Rhodes University, PO Box 94, Grahamstown 6140, South Africa

Received: 21 September 2018
Accepted: 7 March 2019

Abstract

Context. Determining the radial positions of galaxies to high accuracy depends on the correct identification of salient features in their spectra. Classical techniques for spectroscopic redshift estimation make use of template matching with cross-correlation. These templates are usually constructed from empirical spectra or simulations based on the modeling of local galaxies.
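
As a rough illustration of the classical approach, the sketch below cross-correlates a spectrum with a single rest-frame template. It assumes both are sampled on the same uniform log10-wavelength grid, where a redshift becomes a rigid shift of log10(1+z) along the axis. The function name `xcorr_redshift`, the toy template, and the grid step are hypothetical choices made for this example; this is not the Darth Fader pipeline or the authors' code.

```python
import numpy as np

def xcorr_redshift(spectrum, template, dloglam):
    """Estimate z from the lag that maximizes the cross-correlation.

    Both arrays are assumed to share one uniform log10-wavelength grid
    with step `dloglam`; `template` is in the rest frame.
    """
    s = spectrum - spectrum.mean()
    t = template - template.mean()
    corr = np.correlate(s, t, mode="full")
    # In "full" mode the lags run from -(len(t) - 1) to len(s) - 1.
    lags = np.arange(-(len(t) - 1), len(s))
    best_lag = lags[np.argmax(corr)]
    # A shift of best_lag pixels corresponds to log10(1 + z) = best_lag * dloglam.
    return 10.0 ** (best_lag * dloglam) - 1.0

# Toy data (made up for illustration): three Gaussian emission lines.
dloglam = 1e-4
pix = np.arange(2000)
template = sum(a * np.exp(-0.5 * ((pix - c) / 3.0) ** 2)
               for a, c in [(1.0, 200), (0.6, 500), (0.8, 900)])
true_shift = 300
# The lines stay well inside the grid, so np.roll causes no wrap-around here.
spectrum = np.roll(template, true_shift)
z_true = 10.0 ** (true_shift * dloglam) - 1.0
z_est = xcorr_redshift(spectrum, template, dloglam)
```

Practical pipelines typically subtract the continuum first and refine the correlation peak to sub-pixel precision; both steps are left out of this sketch.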

Aims. We propose two new spectroscopic redshift estimation schemes based on new learning techniques for galaxy spectra representation, using either a dictionary learning technique for sparse representation or denoising autoencoders. We investigate how these representations impact redshift estimation.

Methods. We first explored dictionary learning to obtain a sparse representation of the rest-frame galaxy spectra modeling both the continuum and line emissions. As an alternative, denoising autoencoders were considered to learn non-linear representations from rest-frame emission lines extracted from the data. In both cases, the redshift was then determined by redshifting the learnt representation and selecting the redshift that gave the lowest approximation error among the tested values.
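
The selection step described above can be sketched as a grid search over candidate redshifts. This is a minimal sketch under stated assumptions, not the authors' implementation: the representation is a small linear dictionary of rest-frame atoms, ordinary least squares stands in for the sparse coding (or autoencoder) approximation, and candidate redshifts are restricted to integer pixel shifts on a shared uniform log10-wavelength grid. The name `estimate_redshift`, the toy atoms, and all numerical values are hypothetical.

```python
import numpy as np

def estimate_redshift(y, D_rest, dloglam):
    """Return the tested redshift with the lowest approximation error.

    y      : observed spectrum, shape (n_obs,), uniform log10-wavelength grid
    D_rest : rest-frame atoms, shape (n_rest, n_atoms), same grid step;
             ASSUMPTION: its last n_obs rows cover the observed range at z = 0
    """
    n_obs = y.shape[0]
    max_shift = D_rest.shape[0] - n_obs
    errors = np.empty(max_shift + 1)
    for shift in range(max_shift + 1):
        # Redshifting by `shift` pixels moves bluer rest-frame rows into view.
        start = max_shift - shift
        D_z = D_rest[start:start + n_obs]
        # Least squares is a stand-in for the learnt-representation fit.
        coeffs, *_ = np.linalg.lstsq(D_z, y, rcond=None)
        errors[shift] = np.sum((y - D_z @ coeffs) ** 2)
    best = int(np.argmin(errors))
    return 10.0 ** (best * dloglam) - 1.0, errors

# Toy dictionary (made up): three atoms, each with two Gaussian lines.
n_rest, n_obs, dloglam = 600, 400, 1e-4
pix = np.arange(n_rest)
centers = [(250, 330), (280, 420), (360, 450)]
D_rest = np.stack(
    [sum(np.exp(-0.5 * ((pix - c) / 2.0) ** 2) for c in pair) for pair in centers],
    axis=1,
)

# Simulate an observation at a known pixel shift, with mild noise.
true_shift = 120
start = (n_rest - n_obs) - true_shift
rng = np.random.default_rng(0)
y = D_rest[start:start + n_obs] @ np.array([1.0, 0.5, 0.8])
y = y + 0.01 * rng.standard_normal(n_obs)

z_hat, errors = estimate_redshift(y, D_rest, dloglam)
z_true = 10.0 ** (true_shift * dloglam) - 1.0
```

The `errors` array gives the approximation error at every tested shift, so the depth of its minimum relative to the other values can serve as a crude indication of how well constrained the estimate is.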

Results. These methods have been tested on realistic simulated galaxy spectra, with photometry modeled after the Large Synoptic Survey Telescope (LSST) and spectroscopy reproducing properties of the Sloan Digital Sky Survey (SDSS). They were compared to Darth Fader, a robust technique that extracts line features and estimates redshift through eigentemplate cross-correlation. We show that both dictionary learning and denoising autoencoders provide improved accuracy and reliability across all signal-to-noise (S/N) regimes and galaxy types. Furthermore, the former is more robust at high noise levels; the latter is more accurate in high S/N regimes. Combining both estimators improves results at low S/N.

Conclusions. The representation learning framework for spectroscopic redshift analysis introduced in this work offers high performance in feature extraction and redshift estimation, improving on a classical eigentemplates approach. This is a necessity for next-generation galaxy surveys, and we demonstrate a successful application in realistic simulated survey data.

Key words: methods: data analysis / techniques: spectroscopic / galaxies: distances and redshifts

Note to the reader: The author name “J. Bobifn” was a mistake and was corrected to “J. Bobin” on June 3, 2019.

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
