A&A, Volume 650, June 2021
Article Number: A90
Number of page(s): 16
Section: Catalogs and data
DOI: https://doi.org/10.1051/0004-6361/202039675
Published online: 10 June 2021
Mixture models for photometric redshifts
DARK, Niels Bohr Institute, University of Copenhagen, Jagtvej 128, 2200 Copenhagen, Denmark
e-mail: zakieh.ansari@nbi.ku.dk
Received: 14 October 2020
Accepted: 2 March 2021
Context. Determining photometric redshifts (photo-zs) of extragalactic sources to a high accuracy is paramount to measure distances in wide-field cosmological experiments. With only photometric information at hand, photo-zs are prone to systematic uncertainties from the intervening extinction and the unknown underlying spectral energy distribution of different astrophysical sources, leading to degeneracies in modern machine learning algorithms that impact the accuracy of photo-z estimates.
Aims. Here, we aim to resolve these model degeneracies and obtain a clear separation between the intrinsic physical properties of astrophysical sources and extrinsic systematics. Furthermore, we aim to obtain meaningful estimates of the full photo-z probability distributions and their uncertainties.
Methods. We performed a probabilistic photo-z determination using mixture density networks (MDN). The training data set is composed of optical (griz photometric bands) point-spread-function and model magnitudes and extinction measurements from the SDSS-DR15 and WISE mid-infrared (3.4 μm and 4.6 μm) model magnitudes. We used infinite Gaussian mixture models to classify the objects in our data set as stars, galaxies, or quasars, and to determine the number of MDN components to achieve optimal performance.
Results. The fraction of objects that are correctly split into the main classes of stars, galaxies, and quasars is 94%. Furthermore, our method improves the bias of photometric redshift estimation (i.e., the mean Δz = (zp − zs)/(1 + zs)) by one order of magnitude compared to the SDSS photo-z, and it decreases the fraction of 3σ outliers (i.e., objects with |Δz| > 3 × rms(Δz)). The relative root-mean-square systematic uncertainty in our resulting photo-zs is down to 1.7% for benchmark samples of low-redshift galaxies (zs < 0.5).
Conclusions. We have demonstrated the feasibility of machine-learning-based methods that produce full probability distributions for photo-z estimates with a performance that is competitive with state-of-the-art techniques. Our method can be applied to wide-field surveys where extinction can vary significantly across the sky and with sparse spectroscopic calibration samples. The code is publicly available.
Key words: methods: statistical / astronomical databases: miscellaneous / catalogs / surveys
© ESO 2021
1. Introduction
The redshift of an astrophysical object is routinely determined from absorption or emission lines in its spectrum. In the absence of spectroscopic information, its photometric redshift (hereafter photo-z) can be estimated from the apparent luminosity measured in different photometric bands (see e.g., Salvato et al. 2019, for a general review). Accurate photo-zs are needed by wide-field surveys that seek to probe cosmology through the spatial correlations of the matter density field, and are in fact a core limiting factor in the accuracy of these measurements (e.g., Knox et al. 2006).
While large areas of the sky are covered by optical and near-IR imaging surveys, only a minority of objects have observed spectra, and hence secure redshifts from emission or absorption lines. The major problem is the rather narrow wavelength range covered by most photometric bands, which introduces uncertainties and degeneracies in estimating the redshift. The Kilo-Degree Survey (KiDS; de Jong et al. 2013) and Dark Energy Survey (DES; Abbott et al. 2018) collaborations have used a combination of spectroscopic surveys to calibrate photo-zs (e.g., Hoyle et al. 2018; Joudaki et al. 2018), with the ultimate aim of measuring the matter content (Ωm) and the present-day root-mean-square (rms) matter density fluctuations (σ8). The most widely used spectroscopic data sets are zCOSMOS (Lilly & Zcosmos Team 2008), PRIMUS (Coil et al. 2011), VVDS (Garilli et al. 2008), DEEP2 (Newman et al. 2013), VIPERS (Guzzo et al. 2014), GAMA (Driver et al. 2011), and SDSS (Ahn et al. 2012). Albeit to different extents, their completeness is generally affected by a non-trivial pre-selection in colour and morphology, and in some cases by a limited footprint and depth. Hildebrandt et al. (2020) have identified the different photo-z calibrations across DES and KiDS as an explanation for the difference in inferred cosmological parameters, arguing that the uncertainties in photo-zs are one outstanding challenge towards percent-level cosmology from weak lensing.
When only photometric information is available, a three-fold degeneracy between an object's type, its redshift, and the foreground extinction hinders the unambiguous determination of the redshift. Galametz et al. (2017) have quantified this effect explicitly in view of a possible synergy between the ESA-Euclid mission (Amiaux et al. 2019) and the Rubin Legacy Survey of Space and Time (LSST; Amiaux et al. 2019), which should cover more than half of the extragalactic sky to ≳24 mag depth in YJH-bands and ugriz-bands, respectively. Here, we explore a probabilistic approach to compute photo-zs that accounts for the existence of an indefinite number of astrophysical object types and their cross-contamination due to broad-band imaging information. Specifically, we trained a suite of mixture density networks (MDNs; Bishop 1994) to predict the probability distribution of the photo-z of an object with measured magnitudes in multiple photometric bands as well as Galactic extinction. Following the standard nomenclature of machine-learning works, we alternatively refer to the photometric properties (magnitudes and extinction) as features in the rest of this paper. The MDN output is a sum of Gaussian functions in photo-z, whose parameters (i.e., the averages, dispersions, amplitudes) are non-linear combinations of the photometric inputs such as magnitude and extinction. Throughout the paper, we refer to these output Gaussians as ‘branches’. In order to determine the number of branches that are needed to optimally parameterise the photo-z probabilities, we must determine the range of MDN branches that most accurately describes the data set. Hence, we explore infinite Gaussian mixture models (IGMM) on a photometric sample of which about 2% of the sources have spectroscopic redshifts (see Sect. 2.1).
There are two main methods commonly used to estimate photometric redshifts: (i) template fitting and (ii) machine learning algorithms. Template fitting methods specify the relation between synthetic magnitudes and redshift with a suite of spectral templates across a range of redshifts and object classes, through maximum likelihood (e.g., Fernández-Soto et al. 1999) or Bayesian techniques (e.g., Benítez 2000; Brammer et al. 2008; Ilbert et al. 2006). Machine learning methods, using either images or a vector of magnitudes and colours, learn the relation between magnitude and redshift from a training data set of objects with known spectroscopic redshifts. In principle, template fitting techniques do not require a large sample of objects with spectroscopic redshifts for training, and can be applied across different surveys and redshift ranges. However, these methods are computationally intensive and require explicit assumptions on, for example, dust extinction, which can lead to degeneracies in colour-redshift space. Moreover, template fitting techniques are only as predictive as the family of available templates. When large samples of objects with spectroscopic redshifts are available, machine learning approaches such as artificial neural networks (ANNs; e.g., Amaro et al. 2019; Shuntov et al. 2020), k-nearest neighbours (kNN; e.g., Curran 2020; Graham et al. 2018; Nishizawa et al. 2020), tree-based algorithms (e.g., Carrasco Kind & Brunner 2013; Gerdes et al. 2010) or Gaussian processes (e.g., Almosallam et al. 2016) have shown performance similar to or better than that of template fitting methods. However, machine learning algorithms are only reliable within the range of input values of their training data set. Additionally, the lack of sufficiently large high-redshift spectroscopic samples affects the performance of machine learning implementations for photo-z estimates.
Another aspect is the production of photo-z probability distributions given the photometric measurements: while template-based methods can easily produce a probability distribution by combining likelihoods from different object templates, most machine-learning methods in the literature are only trained to produce point estimates, that is, just one photo-z value for each object. For the sake of completeness, we summarise the state-of-the-art (and heterogeneous) efforts in the literature in Table A.1, and the evaluation of their performance metrics in Table A.2. We emphasise that most of the photo-z estimation methods above have been trained and tested purely on spectroscopic samples of different types of galaxies, often in a limited redshift range. Additionally, some of the spectroscopic galaxy samples were entirely simulated.
Here, we explore different kinds of mixture models to produce appropriate photo-z probability distributions that naturally account for the superposition of multiple, a priori unknown classes of astrophysical objects (e.g., stars, galaxies, quasars). There are multiple ways to describe a distribution of such objects in photometry space that consists of, for instance, magnitudes and extinction estimates (see Sect. 2.1) and that is also termed ‘feature space’ following the standard machine-learning terminology.
First, we use an IGMM (Teh 2010) to separate the astrophysical objects in feature space. This approach allows the algorithm to cluster the objects based on all the available photometric information, without forcing it to classify the objects in a pre-determined way. The structure of the photometric (feature) space then defines the number of Gaussian mixture components. Whenever a spectroscopic subsample of different types of astrophysical objects is available, IGMMs allow us to separate this sample into classes, ideally representing each type of object. Second, we train MDNs to predict the photo-z probability distributions of objects in our data set. To find the optimal results, we explore different MDN implementations, which all include the IGMM components and membership probabilities obtained in the first step in addition to the entire photometric (feature) space (Sect. 2.1).
In Sect. 2, we describe our chosen training and test data sets as well as the IGMM and MDN implementations. The accuracy of the classification and the precision of the inferred photo-zs are provided in Sect. 3. In Sect. 5 we discuss our results, their shortcomings, and future improvements to our photo-z estimation, alongside a comparison with other photo-z estimation methods from the literature.
2. Data and methods
To train our machine learning algorithms, we require a data set that contains: (i) morphological information from publicly available object catalogues (e.g., PSF vs. model magnitudes, or a stellarity index), to aid the separation of stars from galaxies and quasars; (ii) a wide footprint on the sky, to cover regions with sufficiently different extinction; (iii) multiband photometry from optical to mid-IR wavelengths, possibly including the u-band; and (iv) a spectroscopic sub-sample of the different types of objects (here: stars, galaxies and quasars).
2.1. Data
Our photometric data set is composed of optical PSF and model griz-band magnitudes, including i-band extinction measurements, from the SDSS-DR15 (Aguado et al. 2019). We combine these SDSS magnitudes with w1mpro and w2mpro magnitudes (hereafter W1, W2) from WISE (Wright et al. 2010). We query the data in CasJobs on the PhotoObjAll table with an SDSS-WISE cross-match, requiring magnitude errors lower than 0.3 mag and i − W1 < 8 mag. Adding g − r, r − i, i − z, z − W1 and W1 − W2 colours leaves us with 22 dimensions to be used by our MDNs. Strictly speaking, the colours are redundant, as they are obtained from the same individual photometric bands. While this introduces many null-value eigenvectors in the IGMM, it enables additional combinations of measurements, which speeds up the MDN computations by detrending the magnitude-magnitude distribution. Our spectroscopic data set (from SDSS-DR15) includes only objects with uncertainties on their spectroscopic redshift (from the SDSS pipelines) smaller than 1%. For one MDN training only, we added u-band PSF as well as model magnitudes. Our complete data set is composed of a photometric and a spectroscopic part: for about 2% of the photometric data set we have spectroscopic information, and we refer to this subset as the ‘spectroscopic data set’. For the IGMM, we have in total 1 022 731 unique sources from PhotoObjAll and WISE, with an additional 11 358 unique galaxies from WiggleZ (Drinkwater et al. 2010) cross-matched with PhotoObjAll and WISE. For the test samples, the spectroscopic data set contains 86 412 unique stars, 83 119 unique galaxies and 75 955 quasars from SpecPhoto and WISE, according to the classification of their spectra by the SDSS pipelines (see Fig. 1).
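The feature construction described above can be sketched in a few lines. The function and column ordering below are illustrative assumptions (a toy 12-dimensional vector), not the exact 22-dimensional feature set queried from CasJobs:

```python
import numpy as np

def build_features(g, r, i, z, w1, w2, ext_i):
    """Stack magnitudes and extinction with the derived colours.

    Hypothetical sketch: a toy feature vector, not the exact
    22-dimensional input (PSF + model magnitudes) used in the paper.
    """
    mags = np.column_stack([g, r, i, z, w1, w2, ext_i])
    # the colours are redundant combinations of the magnitudes, added to
    # help the MDN by detrending the magnitude-magnitude distribution
    colours = np.column_stack([g - r, r - i, i - z, z - w1, w1 - w2])
    return np.hstack([mags, colours])

# toy example with two objects
g = np.array([19.2, 20.1]); r = np.array([18.7, 19.8])
i = np.array([18.4, 19.5]); z = np.array([18.2, 19.3])
w1 = np.array([17.9, 18.8]); w2 = np.array([17.8, 18.6])
ext = np.array([0.05, 0.12])
X = build_features(g, r, i, z, w1, w2, ext)
print(X.shape)  # (2, 12)
```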
Fig. 1. Spectroscopic data set in equatorial coordinates. Data are taken from SDSS-DR15 + WISE, totalling about 245 000 objects: 86 412 stars (yellow), 83 119 galaxies (purple) and 75 955 quasars (green). The entire photometric data set is a sample of about 1 023 000 objects, of which 98% lack spectroscopic redshifts and classification.
Figure 2 shows a general issue that is common in the photo-z literature: by survey construction, the spectroscopic training sets do not reach the same depth as the photometric ones. This highlights the need for techniques that can extrapolate smoothly, and with realistic uncertainties, outside the ranges of a limited spectroscopic training set. Figure 3 shows the redshift distribution of the different classes of objects in the spectroscopic data set. Evidently, galaxies are mainly found at redshifts ≲1, while quasars extend out to redshifts ∼7.
Fig. 2. Histogram of i-band magnitudes of the objects from the photometric data set (blue) and the spectroscopic data sets for stars (yellow), galaxies (purple) and quasars (green), in 0.1 magnitude bins.
Fig. 3. Redshift distribution of the spectroscopic data set. Top panel: galaxies (purple) and quasars (green) in 0.1 redshift bins. Bottom panel: stars (yellow) in 0.0001 redshift bins.
2.2. Infinite Gaussian mixture models
In a Gaussian mixture model (GMM), the density distribution of objects in ‘feature space’ (equivalent to photometric space, see Sect. 2.1) is described by a sum of Gaussian density components. The GMM is a probabilistic model that assumes the data set is drawn from a mixture of Gaussian density functions; each Gaussian distribution is called a component. As the Gaussian distributions are defined in all dimensions of the feature space, they are characterised by a mean vector and a covariance matrix. The feature vector contains the photometric information of each astronomical source. To describe the GMM, whenever needed, we use the notation πk𝒩(x|μk, Σk), where k ∈ {1, …, K} is the component index, and μk, Σk and πk are the mean vector in feature space, the covariance matrix, and the weight of component k, respectively.
Since the GMM is a Bayesian method, it requires multiple sets of model parameters and hyperparameters. The model parameters (means, covariances) change across the Gaussian components, while the hyperparameters are common to all of the Gaussian components, because they describe the priors from which all Gaussian components are drawn. For the GMM, the number of Gaussian components is a fixed hyperparameter.
The IGMM is the GMM case with an undefined number of components, which is optimised by the model itself, depending on the photometric data set used. In particular, the IGMM describes a mixture of Gaussian distributions on the data population with an infinite (countable) number of components, using a Dirichlet process (Teh 2010) to define a distribution on the component weights.
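The Dirichlet-process prior on the component weights can be illustrated with the stick-breaking construction; the following is a minimal numpy sketch, with toy choices for the concentration γ and the truncation level:

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(gamma, n_max):
    """Draw component weights from a truncated Dirichlet process.

    beta_k ~ Beta(1, gamma), pi_k = beta_k * prod_{j<k} (1 - beta_j):
    each component takes a Beta-distributed fraction of the stick left
    over by the previous components. Small gamma puts most of the
    weight on the first few components.
    """
    beta = rng.beta(1.0, gamma, size=n_max)
    leftover = np.concatenate([[1.0], np.cumprod(1.0 - beta)[:-1]])
    return beta * leftover

pi = stick_breaking_weights(gamma=1.0, n_max=100)
print(pi[:3], pi.sum())  # weights decay on average; the sum approaches 1
```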
However, the IGMM requires an initial number of Gaussian density components to be set. Based on the weights given to each component at the end of the model training, it is common practice to exclude the least-weighted components and describe the data population only by the highest-weighted ones. To pursue a fully Bayesian approach, it is advisable to explore a set of model hyperparameters with different initial guesses for the number of components. Like its finite GMM counterpart, each IGMM realisation estimates the membership probability of each data point to each component. Appendix B provides a summary of the IGMM formalism.
For this work, we used the built-in variational IGMM package from the scikit-learn library for our implementations. In practice, the variational optimiser uses a truncated distribution over the component weights with a fixed maximum number of components, known as the stick-breaking representation (Ferguson 1973), together with an expectation-maximisation algorithm (Dempster et al. 1977). To optimise the model and find the best representation of the data set, we explored a range of parameters. We increased the maximum number of allowed Gaussian components from 10 to 100 in increments of 2, and we set the maximum number of expectation-maximisation iterations to 2000. We used a Dirichlet concentration prior (γ) on the component weight distribution of either 0.01, 0.05 or 0.0001 times the number of objects in the training data set. The covariance matrix for each Gaussian component was of type ‘full’, that is, each component has its own general covariance matrix. Furthermore, the prior on the mean distribution of each Gaussian component is defined as the median of the input vectors of the training data set (i.e., magnitudes, extinction).
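A minimal sketch of such a variational IGMM with scikit-learn's BayesianGaussianMixture, run on toy two-dimensional data; the data and the specific hyperparameter values below are illustrative choices that mirror the settings above, not the actual training configuration:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# toy 2D "feature space": two well-separated blobs standing in for object classes
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(8.0, 1.0, (200, 2))])

igmm = BayesianGaussianMixture(
    n_components=10,                       # maximum (truncated) number of components
    covariance_type="full",                # each component has its own covariance matrix
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01 * len(X),  # gamma scaled to the sample size
    max_iter=2000,
    random_state=0,
)
igmm.fit(X)
proba = igmm.predict_proba(X)              # membership probabilities per object
# keep only components above a weight threshold, analogous to the 0.5% cut
effective = int((igmm.weights_ > 0.005).sum())
print(proba.shape, effective)
```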
Whenever needed, each object is assigned to the component to which its membership probability is maximal. In that case, we say that a component contains a data point.
The IGMM provides different possible representations of the same data set for each set of hyperparameters: here, we are interested in finding the optimal number of components that can adequately describe the majority of the data. We therefore introduce a lower threshold on the number of sources that each component contains, and drop the components that fall below it. The threshold is set by considering the size of the photometric sample and the highest value of the Dirichlet γ prior, which is 510 000 for our chosen hyperparameter ranges (see appendix for further details); accordingly, we use 0.5% of the size of the photometric data set as the threshold. Figure 4 shows that the final number of components converges to 48 ± 4, indicating that the models do not need more than 48 ± 4 components to describe the sample. Moreover, in the initial 1:1 ramp-up in the figure the final number of components equals the maximum allowed, so there the model cannot yet adequately describe the data set; this trend breaks at about 44 components. To guide the eye, we determine a regression surface over all the IGMM profiles with a multivariate smoothing procedure. In what follows, we choose 52 components.
Fig. 4. Maximum number of components vs. final number of components for different IGMM realisations, restricted to Gaussian components that contain at least 0.5% of the photometric data. Blue filled circles represent IGMM realisations that needed more than 2000 iterations to converge, while purple filled circles mark IGMM realisations that needed fewer than 2000 iterations. The size of the symbols scales with the three values of the prior on the Dirichlet concentration (γ). The light blue shaded region represents the 99% confidence interval of the regression over the IGMM profiles obtained with a multivariate smoothing procedure.
The first IGMM implementation was fully unsupervised, as it was optimised to only describe the distribution of the objects in feature space. Subsequently, we trained different IGMMs considering additional spectroscopic information available for ≈2% of the photometric sample. In particular, these partially supervised implementations are trained using the entire photometric feature space including either (I) spectroscopic classifications or (II) spectroscopic redshifts or (III) spectroscopic classifications and redshifts. Since the objects with additional spectroscopic information are a small part of the photometric training sample (≈2%), the implementations ensure that the SDSS spectroscopic preselection does not bias the IGMM over the entire photometric sample. Finally, we calculate the membership probabilities to the 52 components for each object in the spectroscopic data set (≈2.45 × 105 objects) from the optimised IGMM. This allows us to assign each object from the spectroscopic sample to one component. Thereafter, we label each of the IGMM components based on the percentage of spectroscopic classes that it contains.
Figure 5 shows the population of objects from the spectroscopic data set and their corresponding IGMM components in g − r vs. z − W1 (upper panel) and W2 vs. W1 − W2 (bottom panel) colour-colour and colour-magnitude diagrams. Each row, from left to right, shows the components assigned to stars, galaxies and quasars in the respective panels.
Fig. 5. Colour-colour and colour-magnitude diagrams. Shown are g − r vs. z − W1 colour-colour diagrams (upper panels) and W2 vs. W1 − W2 colour-magnitude diagrams (bottom panels) for populations of objects from the spectroscopic data set: stars (left column), galaxies (middle column) and quasars (right column). The purple contours correspond to the 68th percentile of each Gaussian IGMM component. The green filled circles correspond to the means μk of the Gaussian components. The grey scale indicates the abundance of the sources in each diagram.
2.3. Mixture density networks
MDNs are a form of ANNs, which are capable of arbitrarily accurate approximations to a function and its derivatives, based on the universal approximation theorem (Hornik 1991). ANNs can be used for regression or classification purposes. ANNs are structured in layers of neurons, where each neuron receives an input vector from the previous layer and outputs a nonlinear function of it that is passed on to the next layer. In MDNs, the aim is to approximate a distribution in the product space of input vectors of the individual sources (fi) and target values (e.g., zs, i) as a superposition of different components. MDNs (Bishop 1994) are trained to optimise the log-likelihood

log ℒ = ∑i log ∑k wk(fi) 𝒩(zs, i | mk(fi), sk²(fi))

by approximating the averages mk(f), amplitudes wk(f) and widths sk(f). Here, N is the number of objects in the spectroscopic data set, while Nc denotes the number of output components (or branches) of the MDN, with i ∈ {1, …, N} and k ∈ {1, …, Nc}.
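The loss optimised by the MDN can be written out directly. The following numpy sketch of the average negative log-likelihood, −log(ℒ)/N, is an illustration of the formula above, not the code of the keras wrapper used in the paper:

```python
import numpy as np

def mdn_nll(z_spec, w, m, s):
    """Average negative log-likelihood, -log(L)/N, over Gaussian MDN branches.

    z_spec : (N,) spectroscopic target redshifts
    w, m, s : (N, Nc) branch amplitudes (summing to 1 per object),
              branch means and branch widths predicted from the features.
    """
    gauss = np.exp(-0.5 * ((z_spec[:, None] - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    likelihood = np.sum(w * gauss, axis=1)  # mixture density at each target
    return -np.mean(np.log(likelihood))

# toy check: two objects, two branches, the dominant branch centred on the truth
z = np.array([0.3, 0.5])
w = np.array([[0.9, 0.1], [0.9, 0.1]])
m = np.array([[0.3, 1.0], [0.5, 1.0]])
s = np.full((2, 2), 0.05)
print(mdn_nll(z, w, m, s))  # low loss when the dominant branch matches the target
```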
Due to the limited information provided by the photometric space, a source of one spectroscopic class at low redshift can be confused with a source of a different class at high redshift. By providing distributions over the full range of redshifts, MDNs can therefore cope with the fact that colours are not necessarily monotonic with redshift (as is the case, e.g., for quasars). To avoid confusing MDN components with IGMM components, we call the MDN components branches in the following.
For the sake of reproducibility, we use a publicly available MDN wrapper around the keras ANN module and a simple MDN architecture. The MDN input layer contains the same photometric features (see Sect. 2.1) along with the membership probabilities from the IGMM, which carry additional information on the object classes (stars, galaxies and quasars). The dimension of the MDN input space is 74, of which 52 are the IGMM membership probabilities and 22 are the feature-space entries. The output layer of the MDN is defined by three neurons for each branch: the average redshift of the branch, the width of the branch, and the membership probability of the source to the branch. The MDN is fully connected, that is, the neurons in one layer are connected to all of the neurons in the next layer. Because the MDN input contains the IGMM membership probabilities, we train, after MDN hyperparameter optimisation, one MDN for each of the four IGMM implementations described in the previous sections.
We randomly split the entire spectroscopic data set (Sect. 2.1), using 80% for training and 20% for validation of the MDN. To optimise the MDN, we explored neural networks with 0 to 3 hidden layers, each with a number of neurons from the set [3, 7, 10, 74, 100, 156, 222, 300, 400, 500, 528, 600, 740], and a number of MDN branches from the set [10, 52, 56, 100, 300]. The standard rectified linear unit (ReLU; Nair & Hinton 2010) and the parametric rectified linear unit (PReLU; He et al. 2015) were used as activation functions. We used ADAM as the optimiser and batch learning with batches of 64 objects, with learning rates of 10−6, 10−5, 10−4 and 10−3 to mitigate local minima of the loss function.
By comparing the training and validation loss of MDNs over the previously defined set of hyperparameters, we find that the optimal set contains one hidden layer with 528 neurons and 10 MDN branches. The activation function is PReLU and the learning rate for the ADAM optimiser is 10−4. Figure 6 shows the loss function, −log(ℒ)/N, for the training and validation data sets, for the MDN optimisation in which the membership probabilities are obtained from the partially supervised IGMM realisation that also considers the spectroscopic classes. As Fig. 6 shows, the learning curve flattens at roughly 300 epochs; to mitigate overfitting, we concluded that 300 epochs are sufficient to train the model. In addition to training MDNs with the redshifts as targets, we tested log(zs) as a target, which led to an improvement in the zp estimation.
Fig. 6. MDN loss (−log(ℒ)/N) as a function of epoch. The losses obtained during MDN training and validation are shown by the blue and orange lines, respectively.
3. Results
We trained an IGMM on the photometric data set (see Sect. 2.1), using the optimal hyperparameters (Sect. 2.2). Thereafter, we linked IGMM components to the three spectroscopic classes using a spectroscopic data set (Sect. 2.1). Finally, we implemented MDNs on the spectroscopic data set using photometric features and membership probabilities from the IGMM to estimate the conditional probability distribution p(zp|f) of photo-z values from the photometric inputs. In this section, we describe the evaluation methods and the resulting classification and photo-z estimations.
3.1. Classification
With our mixture models, we address the common problem of cross-contamination among different classes of objects due to the a priori unknown underlying spectral energy distribution. In the IGMM realisations, each object can belong to each of the components with a probability pi, k = πk𝒩(fi|μk, Σk)/∑l πl𝒩(fi|μl, Σl), which we denote as membership probability in the following. As introduced above (end of Sect. 2.2), the simplest way to assign an object (with feature vector fi) to a component is to choose the component index k for which pi, k is maximised.
To parameterise the accuracy of the classification, we consider the usual quantification of true and false positives and true and false negatives (e.g., Fawcett 2006), and build a confusion matrix to quantify the rate of correct classifications. Figure 7 shows the confusion matrix of the IGMM-based classification for the spectroscopic data set. The true positive rates for stars, galaxies and quasars are 0.97, 0.91 and 0.95, respectively. False positive rates for stars that are true galaxies and quasars are 0.0029 and 0.029. False negative rates for stars that are assigned to galaxies and quasars are 0.031 and 0.019 of all stars, respectively. The accuracy is ≈94%. This means that the IGMM part of our mixture models can clean an extragalactic sample from most stellar contaminants, and broadly separate galaxies from AGN-dominated objects. We compare the performance of our classification to that of an HDBSCAN classification (Campello et al. 2013) of a subset of 39 447 objects from the SDSS-DR14, which achieves an accuracy of ≈95% using ugrizYHJKw1w2-band magnitudes (Logan & Fotopoulou 2020). We find that our method achieves a comparable accuracy even without u-band and YJHK-band magnitudes.
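The bookkeeping behind these rates can be sketched with a hand-rolled confusion matrix; the labels below are toy values used to exercise the definitions, not the actual spectroscopic sample:

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes=3):
    """Rows = spectroscopic class, columns = IGMM-based class (0=star, 1=galaxy, 2=quasar)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    return cm

true = np.array([0, 0, 1, 1, 2, 2, 2])   # toy spectroscopic labels
pred = np.array([0, 0, 1, 2, 2, 2, 2])   # toy IGMM-based labels
cm = confusion_matrix(true, pred)
accuracy = np.trace(cm) / cm.sum()       # fraction of correct classifications
tpr = cm.diagonal() / cm.sum(axis=1)     # true-positive rate per class
print(accuracy, tpr)
```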
Fig. 7. IGMM confusion matrix. The spectroscopic classifications are shown against the IGMM classes for the spectroscopic data set.
Figure 5 demonstrates that the IGMM recognises the main behaviours of stars, galaxies and quasars in colour space and also identifies subclasses that are not highly represented in the spectroscopic sample, such as white dwarfs and brown dwarfs. On the other hand, some components happen to lie in regions of the colour-magnitude-extinction space that are not dominated by only one subclass. The overlap between different object classes in photometry can affect the classification performance and the output of the classification that is then used by the MDN regression. The components corresponding to regions of overlap between different classes are discussed below.
Approximately 30% of the IGMM components, covering ≈15% of the spectroscopic data set and marked in red in Table 1, contain a non-negligible fraction of objects from more than one of the three main classes. Figure 8 shows their position in the same colour-colour and colour-magnitude diagrams as Fig. 5. We refer to these as ‘problematic components’.
Fig. 8. Colour-colour and colour-magnitude diagrams. Shown are g − r vs. z − W1 colour-colour diagrams (upper panels) and W2 vs. W1 − W2 colour-magnitude diagrams (bottom panels) for objects from the spectroscopic data set, by spectroscopic class: stars (left column), galaxies (middle column) and quasars (right column). The purple contours correspond to the 68th percentile of the problematic Gaussian components of the IGMM that are not dominated by objects of just one spectroscopic class. The green filled circles correspond to the means μk of these components. The grey scale indicates the number of sources in each diagram.
Table 1. Percentage of objects from each spectroscopic class (stars, galaxies, quasars) within each IGMM component.
As expected, the problematic components lie at the faint end (with higher magnitude uncertainties in WISE), or in intermediate regions of the colour space between AGN-dominated and galaxy-dominated systems. Additionally, the SDSS spectroscopic classification of some objects is ambiguous, and in some cases the automatic classification (by the SDSS spectral pipelines) is either erroneous or has multiple incompatible entries. These issues occur more frequently for fainter objects, whose spectra have low signal-to-noise ratios. However, since most of the objects are clustered in three main classes which are correctly identified by the IGMM components, uncertain spectroscopic labels are not a significant problem for our calculations.
3.2. Photometric redshifts
Here we discuss the different metrics employed to evaluate the performance of our methods for determining photometric redshifts. Most metrics are based on commonly used statistical methods, as outlined below. The prediction bias is defined as the mean of the weighted residuals, Δz = (zp − zs)/(1 + zs), following the definition in Cohen et al. (2000). The root-mean-square of the weighted residuals is denoted rms(Δz). The fraction of outliers is defined as the fraction of objects with |Δz| > 3 × rms(Δz).
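These metrics take only a few lines to compute; the residuals below are synthetic values used purely to exercise the definitions:

```python
import numpy as np

def photoz_metrics(zp, zs):
    """Bias, rms and 3-sigma outlier fraction of the weighted residuals."""
    dz = (zp - zs) / (1.0 + zs)
    bias = dz.mean()                        # prediction bias
    rms = np.sqrt(np.mean(dz ** 2))         # rms(dz)
    outlier_frac = np.mean(np.abs(dz) > 3.0 * rms)
    return bias, rms, outlier_frac

# synthetic sample: nine good estimates and one catastrophic outlier
zs = np.full(10, 0.2)
dz_true = np.array([0.01] * 9 + [0.5])
zp = zs + (1.0 + zs) * dz_true              # weighted residuals are exactly dz_true
bias, rms, fout = photoz_metrics(zp, zs)
print(bias, rms, fout)  # fout = 0.1: only the catastrophic estimate exceeds 3*rms
```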
For all methods, we excluded objects with spectroscopic redshift errors δzs > 0.01 × (1 + zs). For each source, the MDN determines a full photo-z distribution, which is a superposition of all branches, each with a membership probability, average, and dispersion. If one so-called point estimate is needed, there are at least two options to compute it. One option is the expectation value
weighted across all branches that an object can belong to, according to its branch membership probabilities. Another choice, which we follow here, is the peak μr(fi) of the branch that gives the maximum membership probability of a given object. In what follows, we refer to this redshift value as the ‘peak photo-z’.
In a fully Bayesian framework, one would also consider the maximum-a-posteriori redshift, which in our approach corresponds to the photo-z that maximises the MDN output sum of Gaussians, because all our priors are uniform. Strictly speaking, the maximum-a-posteriori and peak redshifts are not the same if the MDN Gaussian with the highest membership probability lies close to other MDN Gaussians. Here, for the sake of computational convenience, we choose the peak redshift, leaving the comparison with other possible choices (including the maximum-a-posteriori, or non-uniform priors) to future investigation. We also note that this distinction becomes important for objects whose membership probabilities are not clearly dominated by one of the MDN output Gaussians. In the following, we examine the photo-z estimation performance on objects whose membership probability to one of the MDN Gaussians is > 80%, in which case there is almost no difference between the peak redshift and the maximum-a-posteriori redshift.
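The point estimates discussed above can be sketched as follows for a hypothetical MDN output (the branch weights, means, and dispersions below are invented for illustration, not taken from the trained model):

```python
import numpy as np

# Hypothetical MDN output for a single object: branch membership
# probabilities (weights), means, and dispersions of the Gaussians.
weights = np.array([0.75, 0.15, 0.10])
means   = np.array([0.42, 0.95, 2.10])
sigmas  = np.array([0.03, 0.10, 0.30])

# 'Peak' photo-z: mean of the branch with the largest membership probability.
peak_z = means[np.argmax(weights)]

# 'Expectation' photo-z: average over all branches, weighted by membership.
expectation_z = np.sum(weights * means)

def mixture_pdf(z, w, mu, sig):
    """Full photo-z PDF: superposition of the MDN output Gaussians."""
    z = np.atleast_1d(z)[:, None]
    return np.sum(w * np.exp(-0.5 * ((z - mu) / sig) ** 2)
                  / (sig * np.sqrt(2.0 * np.pi)), axis=1)

# Maximum-a-posteriori photo-z (uniform priors): maximise the summed PDF.
grid = np.linspace(0.0, 3.0, 3001)
map_z = grid[np.argmax(mixture_pdf(grid, weights, means, sigmas))]
```

In this example the dominant branch is narrow and well separated, so the peak and maximum-a-posteriori values nearly coincide, while the expectation value is pulled towards the secondary branches — the behaviour described in the text.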
Figure 9 shows the distribution of peak photo-zs (top) and expectation photo-zs (bottom) versus spectroscopic redshifts, zs, for the MDN run with ten branches. One aspect to consider when determining photo-zs in cosmological wide-field imaging surveys is the availability of u-band magnitudes, which are currently available for KiDS but not for DES. The Rubin LSST is expected to deliver u-band photometry at the same depth as KiDS over ≈12 300 deg² (Y1) and ≈14 300 deg² (Y2) (The LSST Dark Energy Science Collaboration 2018). To test this effect, we re-trained one of our mixture models (IGMM spec. class) on a data set that includes u-band PSF and model magnitudes as additional input features (Fig. 10). The bias and root-mean-square residuals are provided in Table 2 for all objects and for galaxies with spectroscopic redshifts zs < 0.3, zs < 0.4, and zs < 0.5. This test yields a lower rms(Δz) and a smaller fraction of 3σ outliers than the same model without u-band magnitudes and can be considered an improvement in accuracy. One reason is that, for low-redshift objects, the u band contains information on the position and strength of the 4000 Å Balmer break, which is not covered by the other, longer-wavelength bands. Furthermore, with respect to the cross-contamination problem, this model also improves the overall confidence level with which an object belongs to a branch. As demonstrated in the bottom panel of Fig. 10, the MDN indeed performs better for objects with increased confidence level.
Fig. 9. Comparison of spectroscopic vs. MDN photometric redshifts. The photometric redshifts are taken from the partially supervised ‘spec. class’ IGMM implementation (as described in Sect. 2.2). The colour scales indicate the number of objects. Top panels: predicted photometric redshifts corresponding to the branches with the highest weights; the sub-panels show, from top to bottom, the weights, dispersions (denoted ‘width’) and residuals. Bottom panels: mean of the predicted photometric redshifts over all branches, weighted by their weights; the lower sub-panel shows the residuals. Left panels: all classes with zspec < 7. Right panels: all galaxies with zspec < 0.3.
Fig. 10. Photo-z performance of different MDN implementations. Top panel: retaining only objects with weightmax > 0.8 membership probability to a MDN branch. Middle panel: including u-band PSF and model magnitudes. Bottom panel: u-band magnitudes and MDN branch weightmax > 0.8. Left column: all types of objects in the spectroscopic data set. Right column: only galaxies with zs < 0.3 from the spectroscopic data set.
MDN performance evaluation, without any clipping for the average and rms, without any threshold on branch membership probabilities.
4. Conclusion
Performance evaluations of MDN photo-zs from different IGMM realisations are summarised in Table 2. Among the realisations that do not use u-band magnitudes, the minimum MDN photo-z bias and rms are reached by the realisation (Spec. class, (zs)) that uses the spectroscopic classifications and spectroscopic redshifts of ≈2% of the data set. This implies that the IGMM can provide a reasonable description of the objects in their feature space even with very limited spectroscopic information. On the other hand, for this realisation the fraction of 3σ outliers is not as low as for other IGMM realisations (Tables 2 and 3), which have a slightly higher bias and rms. For the validation samples restricted to galaxies with spectroscopic redshift < 0.3, the minimum MDN photo-z bias, rms, and fraction of 3σ outliers are achieved by the realisation that includes u-band magnitudes. The MDN performance of different IGMM realisations for objects that belong to one MDN branch with high confidence (i.e., weightmax > 0.8) is summarised in Table 3.

In order to compare our photo-z estimates to those from the SDSS, we select objects from the SDSS SpecObjAll table that have photo-zs (obtained with a kNN interpolation) in the SDSS PhotoObjAll table (38 487 objects). We remark that our MDN yields the full photo-z PDF, so a choice must be made when comparing its results with point estimates from other methods in the literature. For this reason, we follow our previous choice and consider the peak photo-zs. We compare our photo-zs to the SDSS ones for the full sample (38 487 objects) and also for the subset of objects that belong to a MDN branch with high confidence (weightmax > 0.8, 18 355 objects). To account for uncertainties in bias and rms due to the finite sample size, we split the whole sample into five sub-sets, compute the bias and rms on each, and then report their average and standard deviation.
As an alternative, we also evaluate the bias, rms, and fraction of outliers when the MDN is trained with a k-fold cross-validation method (see e.g. Altman & Bland 2005) on the full SDSS training set (≈245 000 objects), where each of the k = 5 folds excludes a fifth of the SDSS photo-z sample. As summarised in Table 4 and shown in Fig. 11, our method improves the bias of photo-z estimates by about one order of magnitude compared to SDSS photo-zs for objects whose photo-zs are estimated at high confidence (i.e., weightmax > 0.8), and it also decreases the rms and the fraction of outliers. All metrics are improved, with the added advantage that the MDN computes photo-zs for all objects (instead of just those with low stellarity) and can also cover the zs > 1 range more accurately than the SDSS kNN. In fact, the SDSS photo-zs hardly exceed zp ≈ 1, while our machinery is trained over a much wider redshift range.
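The five-sub-set evaluation of bias and rms, with their sample scatter, can be sketched as follows (a minimal illustration on mock data with 2% scatter in the weighted residuals, not the actual SDSS sample):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock sample: spectroscopic redshifts, and photo-z predictions whose
# weighted residuals dz = (zp - zs)/(1 + zs) scatter with sigma = 0.02.
n = 1000
z_spec = rng.uniform(0.0, 1.0, n)
z_phot = z_spec + rng.normal(0.0, 0.02, n) * (1.0 + z_spec)

def bias_rms(zp, zs):
    """Mean and root-mean-square of the weighted residuals."""
    dz = (zp - zs) / (1.0 + zs)
    return dz.mean(), np.sqrt(np.mean(dz ** 2))

# Split into five disjoint sub-sets, evaluate each, and report the
# average and standard deviation across sub-sets.
folds = np.array_split(rng.permutation(n), 5)
stats = np.array([bias_rms(z_phot[f], z_spec[f]) for f in folds])
bias_mean, rms_mean = stats.mean(axis=0)
bias_std, rms_std = stats.std(axis=0)
```

The standard deviations across sub-sets quantify the finite-sample uncertainty on the reported bias and rms, as described in the text.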
Fig. 11. Top panel: SDSS spectroscopic redshift vs. SDSS photometric redshift. Bottom panel: spectroscopic redshift vs. photometric redshift (this work). Colour bars indicate the number of sources in the diagrams. Sources are selected by retaining objects with weightmax > 0.8 membership probability to a MDN branch.
MDN performance evaluation exclusively for sources with MDN branch weightmax > 0.8, without any clipping for the average and rms.
Comparison between the photo-z evaluation on all objects from the spectroscopic sample with available SDSS photo-zs.
As a general benchmark, the LSST system science requirements document9 defines three photometric redshift requirements for a sample of four billion galaxies with i < 25 mag within zs < 0.3: first, the rms of the residuals Δz = (zp − zs)/(1 + zs) is lower than 0.02; second, the fraction of 3σ (‘catastrophic’) outliers is lower than 10% of the sample; third, the bias is lower than 0.003.
In our approach, these requirements are met if the MDN peak zp is adopted. The rms(Δz) can be brought down to 0.02 over 0 < zs < 0.5, both for all objects (Table 5) and for ‘high-confidence’ objects with > 0.8 membership probability to a branch (Table 6; called weight in Sect. 2.3). Although our training and evaluation samples from the SDSS do not reach as deep as the Rubin LSST is expected to, our method shows promising results as a starting point. A general remark, which holds for all photo-z estimation methods, is that re-training for the LSST regime may also require deeper photometric and more robust spectroscopic samples. Recently, Beck et al. (2021) used neural networks to classify objects in the Pan-STARRS1 footprint, which is known to have more accurate photometry than the SDSS (Magnier et al. 2013), and evaluated photo-zs on objects with a probability p > 0.8 of being galaxies, obtaining rms(Δz) = 0.03 over 0 < zs < 1. If we follow the same definitions and clipping10 as Beck et al. (2021), we obtain a 1.7–2% relative rms over the 0 < zs < 0.5 redshift range. Our code for re-training and any further evaluation is publicly accessible11.
MDN performance evaluation.
MDN performance evaluation exclusively for sources with MDN branch weightmax > 0.8.
5. Discussion
Adding u-band information, as is the case for the SDSS and will be the case for the LSST, reduces the bias and the fraction of outliers in all the redshift ranges considered. This is also because adding u-band magnitudes sharpens the MDN separation into branches and increases the fraction of objects whose maximum branch weight exceeds 0.8, as can be seen in the bottom panels of Fig. 10.
We remark that throughout this work we simply adopt reddening values in the i band (Ai), which the SDSS provides via a simple conversion of measured E(B − V) values with a Milky Way extinction law and RV = 3.1. Our approach accounts for the systematic uncertainties due to the unknown extinction law by producing probability distributions and associated uncertainties for each photo-z value.
The combined information across the optical and infrared, through the SDSS and WISE magnitudes, helps reduce the overlap between different classes in colour-magnitude space. The WISE depth is not a major limiting factor in the sample completeness as long as samples from the SDSS are considered, but it can affect the completeness significantly for deeper surveys (Spiniello & Agnello 2019). In view of performing the classification and photo-z estimation on the DES, and on the Rubin LSST later on, deeper mid-IR data are needed. The unWISE reprocessing of the WISE cutouts improved upon the original WISE depth (Lang 2014). Further in the future, forced photometry on the unWISE cutouts from wide-field optical and NIR surveys may further increase the mid-IR survey depth (e.g., Lang & Hogg 2014).
In general, separating objects into many sub-classes aids the photo-z regression, as each MDN branch only needs to consider a subset of objects with more homogeneous properties than the whole photometric sample. Furthermore, the approach used in this work lies in the realm of machine learning (and is hence less constrained by choices of templates) while also producing a full output distribution for the photo-z given the available photometric information. Beyond their first implementation in this work, mixture models can easily be adapted to account for missing entries and limited depth, as in the GMM implementation by Melchior & Goulding (2018).
Although it is less prominent in the literature, PRIMUS has been used, for instance, by Hoyle et al. (2018), and Behroozi et al. (2019) suggest that its redshift uncertainties may be smaller than previously thought.
Acknowledgments
This work is supported by a VILLUM FONDEN Investigator grant (project number 16599), a VILLUM FONDEN Young Investigator Grant (project number 25501), and a Villum Experiment Grant (project number 36225). This project is partially funded by the Danish Council for Independent Research under the project “Fundamentals of Dark Matter Structures”, DFF–6108-00570.
References
- Abbott, T. M. C., Abdalla, F. B., Allam, S., et al. 2018, ApJS, 239, 18
- Aguado, D. S., Ahumada, R., Almeida, A., et al. 2019, ApJS, 240, 23
- Ahn, C. P., Alexandroff, R., Allende Prieto, C., et al. 2012, ApJS, 203, 21
- Almosallam, I. A., Lindsay, S. N., Jarvis, M. J., et al. 2016, MNRAS, 455, 2387
- Altman, D. G., & Bland, J. M. 2005, BMJ, 331, 903
- Amaro, V., Cavuoti, S., Brescia, M., et al. 2019, MNRAS, 482, 3116
- Amiaux, J., Scaramella, R., Mellier, Y., et al. 2012, in Space Telescopes and Instrumentation 2012: Optical, Infrared, and Millimeter Wave, SPIE Conf. Ser., 8442, 84420Z
- Beck, R., Szapudi, I., Flewelling, H., et al. 2021, MNRAS, 500, 1633
- Behroozi, P., Wechsler, R. H., Hearin, A. P., & Conroy, C. 2019, MNRAS, 488, 3143
- Benítez, N. 2000, ApJ, 536, 571
- Bishop, C. M. 1994, Mixture Density Networks, Tech. Rep. NCRG/94/004 (Aston University)
- Brammer, G. B., van Dokkum, P. G., & Coppi, P. 2008, ApJ, 686, 1503
- Campello, R. J. G. B., Moulavi, D., & Sander, J. 2013, in Advances in Knowledge Discovery and Data Mining, eds. J. Pei, V. S. Tseng, L. Cao, H. Motoda, & G. Xu (Berlin, Heidelberg: Springer), 160
- Carrasco Kind, M., & Brunner, R. J. 2013, MNRAS, 432, 1483
- Cohen, J. G., Hogg, D. W., Blandford, R., et al. 2000, ApJ, 538, 29
- Coil, A. L., Blanton, M. R., Burles, S. M., et al. 2011, ApJ, 741, 8
- Curran, S. J. 2020, MNRAS, 493, L70
- de Jong, J. T. A., Verdoes Kleijn, G. A., Kuijken, K. H., et al. 2013, Exp. Astron., 35, 25
- Dempster, A. P., Laird, N. M., & Rubin, D. B. 1977, J. R. Stat. Soc.: Ser. B (Methodological), 39, 1
- Drinkwater, M. J., Jurek, R. J., Blake, C., et al. 2010, MNRAS, 401, 1429
- Driver, S. P., Hill, D. T., Kelvin, L. S., et al. 2011, MNRAS, 413, 971
- Fawcett, T. 2006, Pattern Recognit. Lett., 27, 861
- Ferguson, T. S. 1973, Ann. Statist., 1, 209
- Fernández-Soto, A., Lanzetta, K. M., & Yahil, A. 1999, ApJ, 513, 34
- Galametz, A., Saglia, R., Paltani, S., et al. 2017, A&A, 598, A20
- Garilli, B., Le Fèvre, O., Guzzo, L., et al. 2008, A&A, 486, 683
- Gerdes, D. W., Sypniewski, A. J., McKay, T. A., et al. 2010, ApJ, 715, 823
- Görür, D., & Rasmussen, C. E. 2010, J. Comp. Sci. Technol., 25, 653
- Graham, M. L., Connolly, A. J., Ivezić, Ž., et al. 2018, AJ, 155, 1
- Guzzo, L., Scodeggio, M., Garilli, B., et al. 2014, A&A, 566, A108
- He, K., Zhang, X., Ren, S., & Sun, J. 2015, ArXiv e-prints [arXiv:1502.01852]
- Hildebrandt, H., Köhlinger, F., van den Busch, J. L., et al. 2020, A&A, 633, A69
- Hornik, K. 1991, Neural Networks, 4, 251
- Hoyle, B., Gruen, D., Bernstein, G. M., et al. 2018, MNRAS, 478, 592
- Ilbert, O., Arnouts, S., McCracken, H. J., et al. 2006, A&A, 457, 841
- Joudaki, S., Blake, C., Johnson, A., et al. 2018, MNRAS, 474, 4894
- Knox, L., Song, Y.-S., & Zhan, H. 2006, ApJ, 652, 857
- Lang, D. 2014, AJ, 147, 108
- Lang, D., & Hogg, D. W. 2014, ApJ, 151, 36
- Lilly, S., & zCOSMOS Team 2008, The Messenger, 134, 35
- Logan, C. H. A., & Fotopoulou, S. 2020, A&A, 633, A154
- Magnier, E. A., Schlafly, E., Finkbeiner, D., et al. 2013, ApJS, 205, 20
- Melchior, P., & Goulding, A. D. 2018, Astron. Comput., 25, 183
- Nair, V., & Hinton, G. E. 2010, in Proceedings of the 27th International Conference on Machine Learning, ICML’10 (Madison, WI: Omnipress), 807
- Newman, J. A., Cooper, M. C., Davis, M., et al. 2013, ApJS, 208, 5
- Nishizawa, A. J., Hsieh, B. C., Tanaka, M., & Takata, T. 2020, ArXiv e-prints [arXiv:2003.01511]
- Pasquet, J., Bertin, E., Treyer, M., et al. 2019, A&A, 621, A26
- Sadeh, I., Abdalla, F. B., & Lahav, O. 2019, ANNz2: Estimating Photometric Redshift and Probability Density Functions Using Machine Learning Methods
- Salvato, M., Ilbert, O., & Hoyle, B. 2019, Nat. Astron., 3, 212
- Schmidt, S. J., Malz, A. I., Soo, J. Y. H., et al. 2020, MNRAS, 499, 1587
- Shuntov, M., Pasquet, J., Arnouts, S., et al. 2020, A&A, 636, A90
- Spiniello, C., & Agnello, A. 2019, A&A, 630, A146
- Teh, Y. W. 2010, Dirichlet Process, in Encyclopedia of Machine Learning, eds. C. Sammut & G. I. Webb (Boston, MA: Springer US), 280
- The LSST Dark Energy Science Collaboration (Mandelbaum, R., et al.) 2018, ArXiv e-prints [arXiv:1809.01669]
- Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868
Appendix A: Summary of photometric redshifts in the literature
Recent automated approaches to estimate photo-zs.
Comparison of photo-z estimates.
Appendix B: IGMM
The probability density function (PDF) of a Gaussian mixture model with K components is

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1, (B.1)

where x is the data, the mixture weights πk are drawn from a Dirichlet distribution, and 𝒩(x | μk, Σk) is a multivariate Gaussian with mean μk and covariance Σk.
The IGMM is the GMM limit with an infinite number of components, using a Dirichlet process instead of a Dirichlet distribution to define the prior over the mixture distribution. A Dirichlet process is a distribution over distributions, parameterised by a concentration parameter γ and a base distribution G0, where the base distribution is a prior over the locations of the components in parameter space (i.e., Θ = (μ, Σ)). The concentration parameter γ expresses the strength of belief in G0 and affects the component weights (Görür & Rasmussen 2010).
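In practice, a (truncated) Dirichlet-process mixture of this kind can be sketched with scikit-learn's `BayesianGaussianMixture`; this is an illustration of the formalism, not the authors' implementation, and the mock two-dimensional "colour" data and the value of γ below are invented:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Mock 2-D data drawn from three well-separated clusters.
X = np.vstack([
    rng.normal([0.0, 0.0], 0.2, size=(200, 2)),
    rng.normal([2.0, 1.0], 0.2, size=(200, 2)),
    rng.normal([0.0, 3.0], 0.2, size=(200, 2)),
])

# Truncated Dirichlet-process mixture: up to 10 components are allowed,
# but superfluous components receive negligible weight in the posterior.
igmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,  # the concentration parameter gamma
    covariance_type="full",
    max_iter=500,
    random_state=0,
).fit(X)

# Components carrying non-negligible weight.
n_effective = int(np.sum(igmm.weights_ > 0.01))
```

With well-separated mock clusters, the effective number of components settles near the true number (three here), mimicking how the IGMM selects the number of components from the data rather than fixing it in advance.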
Based on Bayes’ rule, the posterior probability that data point xi is assigned to component k is

p(Z_i = k \mid x_i) = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}, (B.2)

where Zi is the latent variable assigning data points to components and the prior over the mixture is the Dirichlet process G ∼ DP(γ, G0). Here πk = Nk/N, where Nk is the effective number of data points assigned to the k-th mixture component. Although the latent variable is not observed directly, the posterior carries information about it.
Using an expectation-maximisation (EM) algorithm to maximise the likelihood with respect to the model parameters involves two steps: an expectation step (E-step) and a maximisation step (M-step). After the model parameters are initialised and the log-likelihood is evaluated, the E-step computes the posterior distribution of Zi with the current parameter values via Eq. (B.2). Writing these posterior probabilities (‘responsibilities’) as γ(z_{ik}) = p(Z_i = k | x_i), the M-step then updates the model parameters as follows:

\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik}) \, x_i, \quad \Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik}) \, (x_i - \mu_k)(x_i - \mu_k)^{\top}, \quad \pi_k = \frac{N_k}{N}, (B.3)

where N_k = \sum_{i=1}^{N} \gamma(z_{ik}). Eventually, the algorithm detects convergence through the lack of significant change, from one iteration to the next, in the log-likelihood

\ln p(X) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k), (B.4)

where πk, the mixture proportion, represents the prior probability that xi belongs to the k-th mixture component.
All Figures
Fig. 1. Spectroscopic data set in equatorial coordinates. Data are taken from SDSS-DR15 + WISE, totalling about 245 000 objects, of which 86 412 are stars (yellow), 83 119 galaxies (purple) and 75 955 quasars (green). The entire photometric data set is a sample of about 1 023 000 objects, of which 98% lack spectroscopic redshifts and classification.

Fig. 2. Histogram of i-band magnitudes of the objects from the photometric (blue) and spectroscopic data sets for stars (yellow), galaxies (purple) and quasars (green), in 0.1 magnitude bins.

Fig. 3. Redshift distribution of the spectroscopic data set. Top panel: galaxies (purple) and quasars (green) in 0.1 redshift bins. Bottom panel: stars (yellow) in 0.0001 redshift bins.

Fig. 4. Maximum number of components vs. final number of components for different IGMM realisations, restricted to Gaussian components that contain at least 0.5% of the photometric data. Blue filled circles represent IGMM realisations that needed more than 2000 iterations to converge, while purple filled circles mark IGMM realisations that needed fewer than 2000 iterations. The symbol size scales with three different values of the prior on the Dirichlet concentration (γ). The light blue shaded region represents the 99% confidence interval of a regression over the IGMM profiles by a multivariate smoothing procedure.

Fig. 5. Colour-colour and colour-magnitude diagrams. Shown are g − r vs. z − W1 colour-colour diagrams (upper panel) and W2 vs. W1 − W2 colour-magnitude diagrams (bottom panel) for populations of objects from the spectroscopic data set: stars (left column), galaxies (middle column) and quasars (right column). The purple contours correspond to the 68th percentile of each Gaussian IGMM component. The green filled circles correspond to the means μk of the Gaussian components. The grey scale indicates the abundance of the sources in each diagram.

Fig. 6. MDN loss (−log(ℒ)/N) as a function of epoch. The losses obtained during MDN training and validation are shown by blue and orange lines, respectively.

Fig. 7. IGMM confusion matrix. The spectroscopic classifications are shown against the IGMM classes of the spectroscopic data set.