Issue 
A&A
Volume 672, April 2023



Article Number  A150  
Number of page(s)  9  
Section  Numerical methods and codes  
DOI  https://doi.org/10.1051/0004-6361/202245369  
Published online  14 April 2023 
Augmenting photometric redshift estimates using spectroscopic nearest neighbours
^{1} Dipartimento di Fisica “Aldo Pontremoli”, Università degli Studi di Milano, Via G. Celoria 16, 20133 Milano, Italy
email: federico.tosone@unimi.it; marina.cagliari@unimi.it
^{2} INAF–Osservatorio Astronomico di Brera, Via Brera 28, 20121 Milano, and Via E. Bianchi 46, 23807 Merate, Italy
^{3} INFN–Sezione di Milano, Via G. Celoria 16, 20133 Milano, Italy
Received: 3 November 2022
Accepted: 2 March 2023
As a consequence of galaxy clustering, close galaxies observed on the plane of the sky should be spatially correlated with a probability that is inversely proportional to their angular separation. In principle, this information can be used to improve photometric redshift estimates when spectroscopic redshifts are available for some of the neighbouring objects. Depending on the depth of the survey, however, this angular correlation is reduced by chance projections. In this work, we implement a deep-learning model to distinguish between apparent and real angular neighbours by solving a classification task. We adopted a graph neural network architecture to tie together photometry, spectroscopy, and the spatial information between neighbouring galaxies. We trained and validated the algorithm on the data of the VIPERS galaxy survey, for which photometric redshifts based on spectral energy distribution are also available. The model yields a confidence level for a pair of galaxies to be real angular neighbours, enabling us to disentangle chance superpositions in a probabilistic way. When objects for which no physical companion can be identified are excluded, all photometric redshift quality metrics improve significantly, confirming that their estimates were of lower quality. For our typical test configuration, the algorithm identifies a subset containing ~75% high-quality photometric redshifts, for which the dispersion is reduced by as much as 50% (from 0.08 to 0.04), while the fraction of outliers reduces from 3% to 0.8%. Moreover, we show that the spectroscopic redshift of the angular neighbour with the highest detection probability provides an excellent estimate of the redshift of the target galaxy, comparable to or even better than the corresponding template-fitting estimate.
Key words: galaxies: distances and redshifts / methods: statistical / methods: data analysis
© The Authors 2023
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model.
1 Introduction
Knowledge of galaxy distances is of the utmost importance for cosmology to reconstruct the underlying 3D dark matter distribution that encapsulates key information about the evolution and matter content of the Universe. On cosmological scales, the most efficient method for estimating distances is through their cosmological redshift, which directly connects to the standard definitions of distance. Sufficiently precise redshift measurements allow us to test the world model through the redshift-distance relation, coupled with standard rulers and standard candles (e.g. Riess et al. 1998; Perlmutter et al. 1998).
Over the past 25 yr, galaxy clustering measurements from large redshift surveys have been able to quantify the universal expansion and growth histories, pinpointing the value of cosmological parameters to high precision (e.g. Tegmark et al. 2006; Colless et al. 2003; Blake et al. 2011; de la Torre et al. 2017; Alam et al. 2017; Pezzotta et al. 2017; Bautista et al. 2021). Even larger redshift surveys are now ongoing (DESI; DESI Collaboration 2016) or are scheduled to start soon (Euclid; Laureijs et al. 2011), with the goal of further refining these measurements to exquisite precision and finding clues for the poorly understood ingredients of the remarkably successful standard model of cosmology.
The redshift is measured from the shift in the position of emission and absorption features identified in galaxy spectra, typically through cross-correlation techniques with reference templates, which capture the full available information (e.g. Tonry & Davis 1979). Despite the considerable advances of multi-object spectrographs over the past 40 yr, collecting spectra for large samples of galaxies remains an expensive task. A cheaper, lower-precision alternative is offered by photometric estimates, that is, by measurements based on multi-band imaging, in which integrated low-resolution spectral information is collected at once for large numbers of objects over large areas. The price to be paid is that of larger measurement errors, together with a number of catastrophic failures, which limit the scientific usage of such photometric redshifts (photo-zs hereafter) to specific applications (e.g. Newman & Gruen 2022). Still, when a sufficient number of photometric bands is available (Benitez et al. 2014; Laigle et al. 2016; Alarcon et al. 2021) or when even information about the ensemble mean spectrum can be obtained (Cagliari et al. 2022), these samples become highly valuable in many respects. Photo-zs are traditionally estimated by fitting template spectral energy distributions (SED) to the measured photometric fluxes (see e.g. Bolzonella et al. 2000; Arnouts et al. 2002; Maraston 2005; Ilbert et al. 2006). Detailed reviews can be found in Salvato et al. (2019), Brescia et al. (2021), and Newman & Gruen (2022).
Since the pioneering work of Collister & Lahav (2004; see also Lahav 1994), who first used artificial neural networks (ANN) to obtain photo-z estimates, machine-learning (ML) algorithms have seen many further applications in this context. These include random forests (Carliles et al. 2010), self-organizing maps (SOM; Masters et al. 2015), and advanced ANNs (Sadeh et al. 2016). A notable recent application uses the full images of galaxies through convolutional neural networks (CNN; Pasquet et al. 2019; Henghes et al. 2022). All these methods provide photo-z estimates by using information that is strictly local, that is, the flux of each object measured in a number of photometric bands, independently of correlations with the other galaxies in the sample.
In the specific case when a photometric survey includes spectroscopic redshifts for a representative subsample spread over the same area, these represent additional information, which can be exploited to obtain improved estimates of the missing redshifts. Since galaxies are spatially clustered, angular neighbours on the sky preserve a degree of redshift correlation, depending on the depth of the catalogue. The deeper the catalogue, the weaker the correlation because the projection is made over a deeper baseline. Still, an angular correlation remains, as can be seen explicitly in Fig. 1, in the data of the VIMOS Public Extragalactic Redshift Survey (VIPERS; Guzzo et al. 2014).
This correlation was exploited, for example, to improve our knowledge of the overall sample redshift distribution (Newman 2008), which is a fundamental quantity for many cosmological investigations such as weak-lensing tomography. With VIPERS, instead, it was used to estimate the galaxy density field to fill the gaps due to missing redshifts (Cucciati et al. 2014). Even more finely, Aragon-Calvo et al. (2015) used the fact that galaxies are typically confined within cosmic web structures to obtain a dramatic improvement in the estimate of photo-zs for ~200 million Sloan Digital Sky Survey galaxies, starting from only about one million spectroscopically measured redshifts.
Our goal with the work presented here has been to optimally retrieve this nonlocal information from the neighbouring objects of a given galaxy building upon a specific class of ML architectures, graph neural networks (GNN). The key property of this class is the ability to combine information from unstructured data based on our priors of the task at hand (Bronstein et al. 2017). The end goal is to obtain an improved estimate of the galaxy redshift.
As shown by Fig. 1, the existing correlation between angular neighbours is strongly diluted by the sea of chance superpositions along the line of sight. Thus, the problem can be more appropriately recast into quantifying the probability that a given angular neighbour (with known redshift) is a physical companion for a given galaxy and thus is closely correlated in redshift as well. Our GNN model, dubbed NezNet, combines the intrinsic features of a target galaxy and a neighbour, that is, their multi-band fluxes, the spectroscopic redshift of the neighbour, and their relative angular distance, to output the probability for the two galaxies to be spatially correlated. We trained and tested NezNet using the spectroscopic sample of VIPERS. We show that discarding targets for which no real physical neighbour is identified with significant probability improves the quality of the associated photo-z catalogue obtained through classic SED fitting, increasing precision and accuracy and reducing the fraction of catastrophic outliers. Moreover, when real neighbours are identified, the redshift of the highest-probability neighbour represents an estimate of the target redshift that is typically more precise than that obtained through the classical SED fitting.
The idea of using GNNs to draw additional redshift information from neighbouring galaxies is not new. Beck & Sadowski (2019) presented preliminary results of an approach based on using only the photometry of a neighbourhood of galaxies, obtaining a 10% improvement on the median absolute deviation of the photo-zs estimated via a single-object-based ML algorithm. The main shortcoming of methods that are based on apparent neighbours lies in the large fraction of chance superpositions, as evident in Fig. 1. Here, we reformulated the problem as a detection task that identifies the physical neighbours of the surrounding spectroscopic objects, also including the neighbour’s spectroscopic information. In this way, we obtain a significant improvement.
The paper is organised as follows. In Sect. 2, we give a brief description of how GNNs work and specify the architecture of our model. In Sect. 3, we describe the properties of VIPERS data and the way we prepared the training set, in particular, how we defined real or apparent neighbouring objects. Section 4 describes how the model is applied to the data and the metrics we used to quantify the performance of the results. Finally, in Sect. 5 we present and discuss our results, and we conclude in Sect. 6.
Fig. 1 Correlation between the galaxy redshift and that of its nth nearest angular neighbour (n = {1,2,3,4}, left to right), as seen in the VIPERS redshift survey data, which cover the range 0.5 < z < 1.2. Clearly, while a tight correlation exists for a number of objects, many other angular pairs just correspond to chance superpositions. 
2 Model
A neural network model can be summarised as a set of nonlinear functions applied to a set of inputs that undergo a linear mapping. Each mapping has many parameters that are optimised through a training process that allows the network model to approximate a wide variety of almost arbitrary functions (LeCun et al. 2015). In its simplest form, a neural network model corresponds to a multilayer perceptron (MLP), also known as dense neural network (Murtagh 1991). For images, neural architectures such as CNN are more suited because they take our a priori knowledge about the data structure into account (O’Shea & Nash 2015).
This reasoning can be pushed further by introducing neural networks for graph representations (Zhou et al. 2018). In this work, we make use of one key aspect of GNN, that is, message passing (Gilmer et al. 2017). To fix ideas, the problem we wish to address is the following: we need to find the spectroscopic galaxies with the highest probability of being close to a galaxy for which only photometric information is available. This can be recast as a classification task for each pair of galaxies, in which our aim is to distinguish between apparent and real neighbours when projected on the plane of the sky.
Intuitively, a model that distinguishes between apparent and real neighbours should be based on the relative difference between galaxy features. A neural network like this can be designed by including a layer of the form $$x_{i}^{\prime} = \sum_{j \in \mathcal{N}(i)} h\left(x_{i},\, x_{i} - x_{j}\right),$$(1)
where x_{i} refers to the array of input features of the node i, 𝒩(i) is the neighbourhood of the same node, and Σ is the aggregation function that sums the outcomes from each pair of nodes. The function h is an MLP that explicitly combines the value of the input feature at the node and the relative difference of that feature with respect to the neighbour. It is worth noting that this GNN is both permutation equivariant and permutation invariant, so that it is not affected by a change in the order of the nodes, that is, the input galaxies.
The complete architecture of our model is illustrated in Fig. 2. Each node is a galaxy, whose inputs (e.g. the photometric measurements) were preprocessed through an MLP before undergoing the message passing of Eq. (1). We restricted ourselves to the case of galaxy pairs, so that the neighbourhood 𝒩(i) includes only one galaxy, and the aggregation function simply sums the features ${x}_{1}^{\prime}+{x}_{2}^{\prime}$. This model can be seen as a trivial version of EdgeConv (Wang et al. 2018), where the adjacency matrix is a 2 × 2 matrix, with 0 entries for diagonal elements and 1 for the off-diagonal elements. Finally, the summed features undergo a last dense layer with a scalar output. All the activation functions are rectified linear units, with the exception of the final layer, where we used a sigmoid, to represent a probability for our classification task.
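As a rough illustration of the pairwise message passing described above, the following NumPy sketch applies Eq. (1) to a two-node graph and sums the updated features. It is not the NezNet implementation: the function names, the single-layer form of h, and the random weights are assumptions made only for this example.

```python
import numpy as np

def relu(a):
    """Rectified linear unit."""
    return np.maximum(a, 0.0)

def h(x_i, x_j, W, b):
    """Toy one-layer MLP acting on [x_i, x_i - x_j] (illustrative weights)."""
    return relu(np.concatenate([x_i, x_i - x_j]) @ W + b)

def pair_message_passing(x1, x2, W, b):
    """Apply Eq. (1) to a two-node graph and sum the updated node features."""
    x1_new = h(x1, x2, W, b)  # the neighbourhood of node 1 is {2}
    x2_new = h(x2, x1, W, b)  # the neighbourhood of node 2 is {1}
    return x1_new + x2_new

# Hypothetical sizes and random weights, for demonstration only.
rng = np.random.default_rng(0)
n_feat, n_out = 4, 3
W = rng.normal(size=(2 * n_feat, n_out))
b = np.zeros(n_out)
x1, x2 = rng.normal(size=n_feat), rng.normal(size=n_feat)

out_12 = pair_message_passing(x1, x2, W, b)
out_21 = pair_message_passing(x2, x1, W, b)
```

Swapping the two galaxies leaves the summed features unchanged, which is the permutation invariance mentioned in the text.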
We call this classification model Nearest-z Network (NezNet). NezNet provides the probability for a pair of galaxies to be real neighbours. The loss function adopted to train NezNet is a standard binary cross entropy, $$\mathcal{L} = -\frac{1}{n}\sum_{i}^{n}\left[y_{i}\log p_{i} + \left(1 - y_{i}\right)\log\left(1 - p_{i}\right)\right],$$(2)
where p_{i} is the output probability of NezNet for each galaxy pair, while y_{i} = 0, 1 is the corresponding training label, and the sum is averaged over the minibatch. To design our model, we made use of the Spektral library^{1} (Grattarola & Alippi 2020), where the EdgeConv layer is conveniently already implemented.
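The binary cross entropy of Eq. (2) can be sketched in a few lines of NumPy. The function name and the clipping constant `eps`, which only guards against taking the logarithm of zero, are assumptions of this example and not part of the original model.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Mean binary cross entropy over a minibatch, as in Eq. (2)."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 (assumption, for numerical safety).
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# An uninformative classifier (p = 0.5 everywhere) gives a loss of log(2).
loss_uninformative = binary_cross_entropy([1, 0], [0.5, 0.5])
```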
Fig. 2 Schematic architecture of NezNet. The input features are first processed by a dense network. Message passing between the two layers through Eq. (1) is then applied to take the relative differences and global values of the features into account. Before the final dense layer, the features are summed and then reprocessed with an MLP to output the score probability of two galaxies being actual neighbours. 
3 Data
We trained and tested our approach on the final data release of VIPERS (Guzzo et al. 2014; Scodeggio et al. 2018), for which the redshift correlation between angular neighbours is shown in Fig. 1. The survey used the VIMOS multi-object spectrograph at the ESO Very Large Telescope to target galaxies brighter than i_{AB} = 22.5 in the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-Wide) catalogue, with an additional (r − i) vs. (u − g) colour preselection to remove objects at z < 0.5. The resulting sample covers the redshift range 0.5 ≲ z ≲ 1.2, with an effective sky coverage of 16.3 deg^{2}, split over the W1 and W4 fields of CFHTLS-Wide. We used only galaxies with secure redshift measurements, as identified by their quality flag, corresponding to a 96.1% confidence level (see Scodeggio et al. 2018).
For each galaxy in the catalogue, the following information was considered: the spectroscopic redshift measurement z_{spec}, the six magnitudes u, g, r, i, z (not to be confused with redshift) and K_{s}, the right ascension α (RA), in radians, and the declination δ (Dec), in radians.
The angular separation on the sky between two objects with RA α_{1} and α_{2} and Dec δ_{1} and δ_{2} is given by the spherical law of cosines (equivalent to the haversine formula), $$\Delta\Theta = \arccos\left(\sin\delta_{1}\sin\delta_{2} + \cos\delta_{1}\cos\delta_{2}\cos\left(\alpha_{1} - \alpha_{2}\right)\right).$$(3)
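A minimal NumPy sketch of the angular separation of Eq. (3), with all angles in radians. The function name is hypothetical, and the clipping of the cosine to [-1, 1] is an added safeguard against rounding errors rather than part of the formula.

```python
import numpy as np

def angular_separation(ra1, dec1, ra2, dec2):
    """Angular separation on the sky (radians) following Eq. (3)."""
    cos_sep = (np.sin(dec1) * np.sin(dec2)
               + np.cos(dec1) * np.cos(dec2) * np.cos(ra1 - ra2))
    # Rounding can push the cosine slightly outside [-1, 1]; clip before arccos.
    return np.arccos(np.clip(cos_sep, -1.0, 1.0))

# Two points on the celestial equator, a quarter of a turn apart in RA.
quarter_turn = angular_separation(0.0, 0.0, np.pi / 2, 0.0)
```

For very small separations the arccos form loses some precision, which is one reason the haversine form is often preferred in practice.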
We selected the parent photometric sample by applying the same VIPERS colour and magnitude cuts defined above, so as to be fully coherent with the spectroscopic data.
4 Application
We set up a training set from the VIPERS W1 galaxy catalogue. We randomly selected about 3 × 10^{4} target galaxies, whose spectroscopic redshift during training was ignored. For each of them, we identified the first n_{NN} angular nearest neighbours as defined by Eq. (3), which we called spectroscopic galaxies because their spectroscopic redshift information was used in our model. Each of these spectroscopic neighbours was associated with the same target galaxy, but the pairs can be considered as independent from one another in our model. Each angular pair was assigned label 1 when it was a real physical pair, otherwise, it was assigned a 0. The training set was thus made of galaxy pairs.
A target galaxy of a pair can also be the nearest neighbour of another target galaxy in another pair. We made this choice in order to maximise the number of training examples available in W1. Our final tests on the W4 catalogue show that this does not lead to any overfitting of VIPERS data, as the model generalises well. We note that this setting assumes a ratio of spectroscopic to photometric objects of 1 : 1. In the Conclusions section (Sect. 6), we also confirm these results in the more realistic case in which the number of spectroscopic redshifts used for training is a fraction of the number of photometric objects.
The definition of a real neighbour is arbitrary; it is reasonable to consider that two angular neighbours form a physical pair when their spectroscopic separation is smaller than a given threshold, $$\Delta z\left(1 + z_{\text{spec}}\right).$$(4)
This means that in setting up the training data, there are two hyperparameters, the number of nearest neighbours n_{NN} to be considered, and the spectroscopic separation Δz. As we show below, these two hyperparameters can affect the results significantly, and it is thus relevant to set them up wisely, depending on the specific survey.
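The construction of labelled pairs for a single target can be sketched as follows. This is only an illustration under stated assumptions: the function name and the toy catalogue are hypothetical, the neighbour search is brute force, and the threshold of Eq. (4) is scaled with the redshift of the target galaxy, a choice the text does not specify.

```python
import numpy as np

def label_pairs(ra, dec, z_spec, target, n_nn, delta_z):
    """Return the n_nn nearest angular neighbours of `target` and their 0/1 labels."""
    cos_sep = (np.sin(dec[target]) * np.sin(dec)
               + np.cos(dec[target]) * np.cos(dec) * np.cos(ra[target] - ra))
    sep = np.arccos(np.clip(cos_sep, -1.0, 1.0))
    sep[target] = np.inf  # exclude the target itself from its neighbours
    neighbours = np.argsort(sep)[:n_nn]
    # Assumption: the threshold Delta z (1 + z_spec) uses the target redshift.
    threshold = delta_z * (1.0 + z_spec[target])
    is_real = np.abs(z_spec[neighbours] - z_spec[target]) < threshold
    return neighbours, is_real.astype(int)

# Toy catalogue (hypothetical values): four galaxies on the equator.
ra = np.radians([10.0, 10.01, 10.02, 10.5])
dec = np.radians([0.0, 0.0, 0.0, 0.0])
z_spec = np.array([0.70, 0.71, 1.10, 0.70])

neighbours, labels = label_pairs(ra, dec, z_spec, target=0, n_nn=2, delta_z=0.08)
```

In this toy case the closest neighbour is a physical companion, while the second is a chance superposition at a much higher redshift.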
For each galaxy in the pairs, the input features of the nodes in NezNet are the photometry, the spectroscopy, and the angular position, as listed in Sect. 3. For the target galaxy, we always set z_{spec} = 0, so that the model considered it as a missing feature, while providing its value for the neighbouring galaxy. Magnitudes were normalised to the range [0,1], as computed over the whole VIPERS dataset. The angular inputs were provided in terms of relative distance with respect to the target galaxy, so that ΔΘ = 0 for the latter, while for the neighbour, it corresponded to Eq. (3). By adopting this choice, we guaranteed that the model has translational invariance.
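The node features of a pair could be assembled along the following lines. This is a sketch under assumptions: the function names are hypothetical, the per-band minimum and maximum magnitudes are taken as given inputs, and the angular inputs are left out for brevity.

```python
import numpy as np

def normalise_magnitudes(mags, mag_min, mag_max):
    """Rescale magnitudes to [0, 1] using per-band extrema of the full dataset."""
    return (mags - mag_min) / (mag_max - mag_min)

def build_pair_features(mags_target, mags_neigh, z_neigh, mag_min, mag_max):
    """Stack the features of the target (z_spec set to 0) and of the neighbour."""
    target = np.append(normalise_magnitudes(mags_target, mag_min, mag_max), 0.0)
    neigh = np.append(normalise_magnitudes(mags_neigh, mag_min, mag_max), z_neigh)
    return np.stack([target, neigh])

# Hypothetical two-band example.
mag_min = np.array([20.0, 20.0])
mag_max = np.array([24.0, 22.0])
features = build_pair_features(np.array([22.0, 21.0]), np.array([23.0, 20.0]),
                               0.8, mag_min, mag_max)
```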
Another tested option (see Sect. 6) is to use the relative distance in the two sky coordinates RA and Dec as input variables instead of the angular separation of the two galaxies. This choice arises because the surface distribution of the sample is not rotationally invariant on the sky because of the technical setup of the slits in the VIMOS focal plane, with the spectral dispersion oriented along the declination direction. As spectra must not overlap on the detector, targets need to be separated in Dec much more than in RA. As a result, the minimum separation is ~1.9 arcmin in Dec and 5 arcsec in RA. More details can be found in Bottini et al. (2005) and Pezzotta et al. (2017, see their Sect. 4.1). Our experiments show that providing the model with the angular separation ΔΘ introduces a bias in the redshift metrics, which is not observed when the relative separations along RA and Dec are given. In general, however, we find that the separation information does not significantly improve the classifier, and for this reason, we did not use it in our final model. Spatial information instead comes only from the number of nearest neighbours considered.
The other hyperparameters of the model, that is, the batch size, number of neurons, and learning rate, have a far weaker impact than Δz and n_{NN}, and were set to fiducial values: a batch size of 32, a learning rate of 0.001, and a total number of parameters of the order of a few thousand. We find little difference in the output metrics of the redshift estimates when the complexity of the model is increased, or when the batch size and the learning rate are changed around these fiducial values.
NezNet gives as output the probability for two galaxies to be real neighbours. As each target galaxy corresponds to n_{NN} independent pairs, we can select the neighbour with the highest probability among them. If this probability is below the classification threshold set to define a positive case, we conclude that there is no physical neighbour for that target galaxy in the catalogue. This implies that the target has a high probability of being an outlier in terms of its properties when compared to its neighbours. Removing these objects from the final catalogue significantly improves the metrics when comparing photo-z and spectroscopic measurements. In particular, the reduction in the number of catastrophic redshifts confirms our assumption. Finding a true neighbour instead reinforces the confidence in the photo-z. At the same time, the spectroscopic redshift of the neighbour in this case is typically an even better estimate of the target redshift than the SED-estimated photo-z. These tests are discussed in the following section.
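The selection step described above can be sketched as follows, assuming the pair probabilities and neighbour redshifts are stored as arrays of shape (number of targets, n_NN). The function name and the threshold value of 0.5 are placeholders for illustration, not the values used in the paper.

```python
import numpy as np

def best_neighbour_redshift(probs, z_neigh, threshold=0.5):
    """Pick the highest-probability neighbour of each target.

    Returns the neighbour redshift (NaN where no neighbour passes the
    threshold) and a boolean mask of the retained targets.
    """
    probs = np.asarray(probs, dtype=float)
    z_neigh = np.asarray(z_neigh, dtype=float)
    rows = np.arange(probs.shape[0])
    best = np.argmax(probs, axis=1)
    keep = probs[rows, best] >= threshold
    z_est = np.where(keep, z_neigh[rows, best], np.nan)
    return z_est, keep

# Two targets with three candidate neighbours each (hypothetical numbers).
probs = [[0.1, 0.9, 0.3], [0.2, 0.1, 0.4]]
z_neigh = [[0.6, 0.7, 0.8], [0.9, 1.0, 1.1]]
z_est, keep = best_neighbour_redshift(probs, z_neigh)
```

The mean of `keep` corresponds to the fraction of retained data, and targets with `keep == False` are the ones discarded from the cleaned catalogue.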
The quantitative comparison between NezNet results, spectroscopic measurements ${z}_{\text{spec}}^{\left(i\right)}$, and SED-fitting estimated photo-zs was performed using the metrics defined in Salvato et al. (2019). These are the precision (i.e. the dispersion of the estimated values), $$\sigma = \sqrt{\frac{1}{N}\sum_{i}^{N}\left(\frac{z_{\text{spec}}^{(i)} - z^{(i)}}{1 + z_{\text{spec}}^{(i)}}\right)^{2}},$$(5)
the bias $$b = \frac{1}{N}\sum_{i}^{N}\left(z_{\text{spec}}^{(i)} - z^{(i)}\right),$$(6)
and the absolute bias $$\left|b\right| = \frac{1}{N}\sum_{i}^{N}\left|z_{\text{spec}}^{(i)} - z^{(i)}\right|,$$(7)
quantifying systematic deviations. Finally, the outliers are defined as objects for which $$\left|z_{\text{spec}}^{(i)} - z^{(i)}\right| \ge 0.15\left(1 + z_{\text{spec}}^{(i)}\right).$$(8)
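The four metrics of Eqs. (5)-(8) can be computed with a short NumPy function. The function name and the dictionary keys are choices made for this sketch.

```python
import numpy as np

def photoz_metrics(z_spec, z_est):
    """Precision, bias, absolute bias, and outlier fraction of redshift estimates."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_est = np.asarray(z_est, dtype=float)
    dz = z_spec - z_est
    return {
        "sigma": np.sqrt(np.mean((dz / (1.0 + z_spec)) ** 2)),  # Eq. (5)
        "bias": np.mean(dz),                                     # Eq. (6)
        "abs_bias": np.mean(np.abs(dz)),                         # Eq. (7)
        "outlier_fraction": np.mean(np.abs(dz) >= 0.15 * (1.0 + z_spec)),  # Eq. (8)
    }

# Two objects at z_spec = 1: one good estimate and one catastrophic failure.
metrics = photoz_metrics([1.0, 1.0], [0.9, 1.4])
```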
All the results presented in the following section were obtained by applying the trained NezNet to a test catalogue built in a similar fashion to W1, randomly selecting about 2 × 10^{4} galaxies from the twin W4 field of VIPERS.
Finally, in the following discussion about our classifier, we use the notion of the true positive rate (TPR), which is the fraction of correctly predicted positive examples with respect to all the real positive examples. It is defined as $$\text{TPR}=\frac{\text{TP}}{\text{TP}+\text{FN}},$$(9)
where TP stands for true positives and FN stands for false negatives. Similarly, we can define the false positive rate (FPR), which is the fraction of negative examples classified as positives with respect to all the real negative examples, which reads $$\text{FPR}=\frac{\text{FP}}{\text{FP}+\text{TN}},$$(10)
where FP stands for false positives and TN stands for true negatives.
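For a given classification threshold, the rates of Eqs. (9) and (10) follow directly from the confusion counts. The sketch below uses a hypothetical function name and toy labels.

```python
import numpy as np

def tpr_fpr(y_true, p, threshold):
    """True and false positive rates at a given probability threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(p, dtype=float) >= threshold
    tp = np.sum(pred & y_true)
    fn = np.sum(~pred & y_true)
    fp = np.sum(pred & ~y_true)
    tn = np.sum(~pred & ~y_true)
    return tp / (tp + fn), fp / (fp + tn)

# Four pairs: two real neighbours and two chance superpositions.
tpr, fpr = tpr_fpr([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], threshold=0.5)
```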
Fig. 3 Central galaxy spectroscopic redshift versus its photometric redshift measured with and without NezNet. The left panel shows the distribution of photometric vs. spectroscopic estimates in the original data. In the middle panel, we show the same distribution after removing the galaxies with low score probability from the catalogue (fr stands for the fraction of retained data). Finally, the right panel shows redshift estimates by assigning the spectroscopic redshift of the neighbour with the highest detection probability to the target galaxy. The model was trained with n_{NN} = 30 and Δz = 0.08. 
Fig. 4 Same as Fig. 3, but the model was trained with the higher Δz = 0.15, while n_{NN} = 30 is the same as before. 
5 Results
As explained in the previous section, NezNet can be used to simply clean a photo-z sample by discarding low-probability neighbours or to provide an alternative redshift estimate derived from the highest-probability neighbour. This is demonstrated on the test catalogue in Fig. 3 for a model trained using the hyperparameters Δz = 0.08 and n_{NN} = 30. In addition to the VIPERS spectroscopic redshifts, this comparison also includes the original photo-zs estimated by Moutard et al. (2016) using standard SED fitting. For these and all following results, angular information (i.e. the separation of the two objects on the sky) was not used as an input variable. The reason for this was already mentioned in the previous section, and is discussed again in more detail below.
Figure 3 shows that by simply dismissing the outliers as identified by NezNet, all the metrics improve significantly (central panel). Moreover, when the best neighbour redshifts are adopted for the target galaxies (right panel), we obtain metrics that are comparable to or even better than those of the cleaned photo-z sample. It is worth noting that in this case, the plot shows a characteristic checkerboard pattern because the spectroscopic redshift striping is reflected, as spectroscopic redshifts are now assigned to target photometric objects.
Figure 3 also shows the limits of the method. Comparing the left panel with the other two, we can note that NezNet tends to cut off the high-redshift tail of the distribution. This is easily understood considering the magnitude-limited (i_{AB} < 22.5) character of the sample used here, which becomes very sparse at z ≳ 1, where only rare luminous galaxies are present. This means that the model becomes intrinsically less efficient because fewer real physical neighbours are available both for the training and for inference, as is also evident from the density of points at high redshift in Fig. 1. Devising a different loss function to upweight the few physical pairs in this regime might improve the classification task, but an intrinsic limit to the method clearly exists when the density of the sample decreases.
Figure 4 shows the same set of plots, but using a higher value for the spectroscopic separation in the training, that is, Δz = 0.15. As expected, allowing for a larger separation in the definition of real angular neighbours discards fewer data. Conversely, there is in general a lower precision and a small increase in the fraction of outliers.
In principle, using a stricter Δz could remove even more outliers, retaining only pairs that are closer in redshift and leading to a smaller, but more precise subsample. We explore this dependence in Fig. 5. Overall, this method is always able to clean poor estimates from the sample, but at the price of discarding many data points. The minor improvement in precision probably does not justify the use of Δz < 0.08 in the case of VIPERS, because more than half of the sample is excluded.
It is apparent that the hyperparameter Δz is very relevant for the quality of the classifier. This is made clear by the receiver operating characteristic (ROC) curve in Fig. 6, which shows the TPR (Eq. (9)) against the FPR (Eq. (10)), and has been computed from the target galaxies in the test catalogue by considering their neighbour with the highest probability. In general, the area under the curve (AUC) is higher for the better classifier. Increasing Δz increases the AUC, which would tend to unity for very high values of this parameter, as all galaxies would then be considered real neighbours. However, our ultimate goal is not to increase the performance of the classifier per se, but to improve the metrics of our redshift estimates. These show that Δz ≳ 0.08 represents the best choice for VIPERS.
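An ROC curve and its AUC can be obtained by sweeping the classification threshold over the predicted probabilities. The sketch below is a plain NumPy version with hypothetical function names; it integrates the curve with the trapezoidal rule and assumes that both classes are present in the labels.

```python
import numpy as np

def roc_curve_points(y_true, p):
    """FPR and TPR for thresholds swept from high to low probability."""
    y_true = np.asarray(y_true, dtype=bool)
    p = np.asarray(p, dtype=float)
    # Start above every score so that the curve begins at (0, 0).
    thresholds = np.concatenate([[np.inf], np.unique(p)[::-1]])
    tpr = np.array([np.mean(p[y_true] >= t) for t in thresholds])
    fpr = np.array([np.mean(p[~y_true] >= t) for t in thresholds])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

fpr, tpr = roc_curve_points([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
area = auc(fpr, tpr)
```

An AUC of 0.5 corresponds to a random classifier, while a value of 1 corresponds to a perfect separation of real and apparent neighbours.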
The other hyperparameter of NezNet, that is, n_{NN}, the number of nearest neighbours considered in the training, has a weaker impact on the classifier. We show this in Fig. 7, where each ROC curve corresponds to a model trained with a different n_{NN}, but all with the same Δz. A drastic change in n_{NN} does not correspond to comparable changes in the AUC. However, n_{NN} has a large impact on the redshift estimates, as Fig. 8 shows. A larger number of angular neighbours increases the probability of finding a physical pair, as is shown by the metrics in Fig. 7. We also experimented with a higher value of n_{NN} up to 50, but found no further gain with respect to using n_{NN} = 30. The redshift metrics start to saturate to the optimal values already above n_{NN} = 10.
As a further test, we also computed the gradients of the predictions with respect to their input variables to detect the most relevant ones, as shown in Fig. 9. It is interesting to see that the neighbour redshift is a relevant input, as expected, and some of the photometric bands are even more relevant. This confirms the intuition that the photometric information of the neighbours does indeed provide additional information about the relative distance from the target. In this plot, we also show results for the case when the angular separation is considered as one of the input variables. These results show that the angular separation ΔΘ between the target and the neighbour does affect the predictions. This manifests itself as a bias in the redshift estimates, as visible in Fig. 10: in this case, NezNet systematically favours neighbours that are closer to us than the target, increasing the value of the bias b (Eq. (6)). We also tested what happens when the angular separation information is instead given in terms of the relative difference in the angular coordinates RA and Dec of the two galaxies. In this case, the bias disappears and the results are comparable to the standard case in which no angle information is provided. However, in this case, the two parameters clearly have smaller gradients than when ΔΘ alone is considered, which suggests that they do not in fact contribute to the predictive power of the model. For these reasons, the angular separation is not considered as an input variable in our final results.
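The gradient-based relevance test can be illustrated with a toy example. In the sketch below, central finite differences stand in for the automatic differentiation a deep-learning framework would provide, and a simple logistic function replaces the trained network; the function names, weights, and inputs are all assumptions of this example.

```python
import numpy as np

def mean_abs_gradients(model, X, eps=1e-5):
    """Average absolute gradient of the model output with respect to each input."""
    X = np.asarray(X, dtype=float)
    grads = np.zeros_like(X)
    for k in range(X.shape[1]):
        step = np.zeros(X.shape[1])
        step[k] = eps
        # Central finite difference along input feature k.
        grads[:, k] = (model(X + step) - model(X - step)) / (2.0 * eps)
    return np.mean(np.abs(grads), axis=0)

# Toy stand-in for a trained classifier: a logistic model with fixed weights.
weights = np.array([2.0, 0.0, -0.5])

def toy_model(X):
    return 1.0 / (1.0 + np.exp(-(X @ weights)))

X = np.random.default_rng(1).normal(size=(100, 3))
relevance = mean_abs_gradients(toy_model, X)
```

The feature with zero weight has a vanishing gradient, while the relevance of the other two follows the magnitude of their weights.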
One of the novelties of NezNet is the message passing between node features. This is where GNNs differ from a standard ANN, where all input variables of both galaxies would be provided directly to dense layers. We also experimented with a simpler graph model, closely resembling the architecture of NezNet, but without message passing. The input features were processed independently by MLP layers for each node (we tried using either just one or several layers). The new architecture is as in Fig. 2, with the exception of h function blocks, which are now substituted with new MLP blocks, without applying any message passing. The ${x}_{i}^{\prime}$ features are summed by the aggregation function, and the summed features are mapped to the output probability through final dense layers with sigmoid activation output, just like in the model with message passing. This kind of model, which maintains the permutation invariance property of a graph, is often referred to as a deep set (Zaheer et al. 2017). We find that this simple model still works remarkably well and is comparable to NezNet in general. However, it systematically cuts off the high-redshift tail of the catalogue (Fig. 11), even though the overall metrics remain good.
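The deep-set variant can be sketched in the same spirit: each node is processed by the same MLP on its own features, with no dependence on the other node, and the outputs are summed before a final sigmoid layer. The single-layer MLP, the function names, and the random weights are assumptions made for illustration only.

```python
import numpy as np

def relu(a):
    """Rectified linear unit."""
    return np.maximum(a, 0.0)

def deep_set_pair(x1, x2, W, b, w_out, b_out):
    """Deep-set score for a pair: per-node MLP, sum, then a sigmoid output."""
    summed = relu(x1 @ W + b) + relu(x2 @ W + b)  # no message passing
    return 1.0 / (1.0 + np.exp(-(summed @ w_out + b_out)))

# Hypothetical sizes and random weights, for demonstration only.
rng = np.random.default_rng(2)
n_feat, n_hidden = 4, 3
W, b = rng.normal(size=(n_feat, n_hidden)), np.zeros(n_hidden)
w_out, b_out = rng.normal(size=n_hidden), 0.0
x1, x2 = rng.normal(size=n_feat), rng.normal(size=n_feat)

score_12 = deep_set_pair(x1, x2, W, b, w_out, b_out)
score_21 = deep_set_pair(x2, x1, W, b, w_out, b_out)
```

The score is still invariant under a swap of the two galaxies, but the relative feature differences are no longer an explicit input to the per-node MLP.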
Fig. 5 Redshift estimates derived from the best nearest neighbour for various Δz at fixed n_{NN} = 30. Increasing the redshift separation used to define physical neighbours degrades the quality metrics, but increases the fraction of data that are not dismissed from the catalogue.
Fig. 6 ROC curve for a varying redshift threshold Δz at fixed n_{NN} = 30. The performance of our classifier (AUC) improves as the definition of a true neighbour becomes less strict. The probability that an angular neighbour is a physical neighbour increases at larger Δz, which is also reflected by the high detection threshold (thr).
Fig. 7 ROC curve for a varying number of nearest neighbours n_{NN} at fixed Δz = 0.08. Increasing the number of neighbours given as input to the training seems to make the training more difficult. However, this test of the classifier does not reflect the quality of the final redshift estimate, as Fig. 8 shows.
Fig. 8 Redshift estimates based on the best nearest neighbour for various n_{NN} at fixed Δz = 0.08. Increasing the number of nearest neighbours for each target improves the performance of NezNet in estimating redshifts, as it increases the probability that physical pairs are considered.
Fig. 9 Average absolute values of the gradients of NezNet with respect to the input features of the neighbours. For each target, we only considered the neighbour with the highest probability. 
Fig. 10 Results of redshift estimates for the target galaxies, in the case where the angular separation (Eq. (3)) is an explicit input of the model. Many galaxies are assigned values slightly lower than the real spectroscopic value, resulting in a large bias b. Currently, we do not have an explanation for this observed effect.
Fig. 11 Comparison of the redshift distribution for the predictions of NezNet, and a simpler graph model without message passing. While the latter performs reasonably well in general, it tends to cut the tail of the distribution. 
6 Conclusions
We have presented a new ML model, dubbed NezNet, which for a pair of galaxies takes as input their measured fluxes in a number of bands together with the redshift of one of the two galaxies. NezNet is capable of probabilistically learning whether their redshift distance is below a given threshold Δz, which is set as a hyperparameter of the model. The angular separation between the galaxies is implicit in the training set, as for every target galaxy we select its first n_{NN} angular neighbours (another hyperparameter), but it can be an explicit input variable of the model. The backbone of the model is a GNN, a class of neural networks based on message passing and the aggregation of features (Fig. 2). This message passing is explicitly performed as a relative difference between features (Eq. (1)).
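The construction of the training pairs summarised above can be sketched as follows. This is a brute-force illustration rather than the pipeline used in the paper: the function names are hypothetical, the separation is computed with the haversine formula, and the label assumes that the redshift distance is the plain absolute difference |z_target - z_neighbour|.

```python
import numpy as np

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + np.cos(dec1) * np.cos(dec2) * sin_dra**2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))

def build_pairs(ra, dec, z, n_nn, delta_z):
    """Indices of the n_nn nearest angular neighbours of every target, and
    binary labels (1 if |z_target - z_neighbour| < delta_z, else 0)."""
    ra, dec, z = (np.asarray(v, dtype=float) for v in (ra, dec, z))
    sep = angular_separation(ra[:, None], dec[:, None], ra[None, :], dec[None, :])
    np.fill_diagonal(sep, np.inf)  # a galaxy is not its own neighbour
    neigh = np.argsort(sep, axis=1)[:, :n_nn]
    labels = (np.abs(z[:, None] - z[neigh]) < delta_z).astype(int)
    return neigh, labels

# Mock catalogue of four galaxies (coordinates in degrees)
ra = [30.00, 30.01, 30.02, 30.50]
dec = [-5.00, -5.00, -5.01, -5.20]
z = [0.70, 0.72, 1.05, 0.71]
neigh, labels = build_pairs(ra, dec, z, n_nn=2, delta_z=0.08)
```

For a survey-sized catalogue the full separation matrix would be too large, and a tree-based neighbour search would be a more practical choice.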
NezNet outputs the score probability for a galaxy pair to be real neighbours. This information can be used in two ways. On the one hand, if none of the n_{NN} nearest neighbours is identified as a physical neighbour, the target galaxy can be considered an outlier in terms of its properties. This may suggest that it is an interloper, that is, a foreground or background object with respect to the volume sampled by the spectroscopic sample we used for the comparison. It should therefore be discarded from any sample that aims to cover the same redshift range as the spectroscopic catalogue, for instance, via photometrically estimated redshifts. We have verified this using the VIPERS catalogue. On the other hand, if a physical neighbour is identified, the target galaxy can be assigned the spectroscopic redshift of the highest-scoring galaxy among the n_{NN} angular neighbours, which provides an independent estimate of its redshift.
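The two uses of the score described above can be sketched in a few lines. The function name, the mock numbers, and the threshold of 0.5 are assumptions made for illustration; in practice the detection threshold would be chosen from the classifier performance (for instance from the ROC curve).

```python
import numpy as np

def assign_neighbour_redshift(scores, z_neigh, threshold=0.5):
    """
    scores:  (n_targets, n_nn) classifier probabilities for each pair
    z_neigh: (n_targets, n_nn) spectroscopic redshifts of the neighbours
    Returns the redshift of the highest-scoring neighbour (NaN where no
    neighbour reaches the threshold) and a boolean mask of retained targets.
    """
    scores = np.asarray(scores, dtype=float)
    z_neigh = np.asarray(z_neigh, dtype=float)
    best = scores.argmax(axis=1)             # best neighbour of each target
    rows = np.arange(scores.shape[0])
    retained = scores[rows, best] >= threshold
    z_est = np.where(retained, z_neigh[rows, best], np.nan)
    return z_est, retained

# Mock scores and neighbour redshifts for three targets with n_nn = 3
scores = [[0.10, 0.92, 0.40],
          [0.05, 0.20, 0.11],
          [0.70, 0.65, 0.30]]
z_neigh = [[0.55, 0.81, 1.02],
           [0.60, 0.95, 0.72],
           [0.66, 0.67, 1.10]]
z_est, retained = assign_neighbour_redshift(scores, z_neigh)
fraction_retained = retained.mean()
```

Here the second target has no neighbour above the threshold, so it is flagged as a likely interloper and receives no neighbour-based redshift, while the other two inherit the redshift of their best-scoring neighbour.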
These results are summarised in Figs. 3 and 4: when outliers as detected by NezNet are discarded, all the metrics of the sample improve considerably. Moreover, the NezNet redshift estimates are comparable to or superior in precision to SED-based photometric redshifts, depending on the values chosen for the hyperparameters. Increasing Δz increases the goodness of the classifier (Fig. 6), as well as the fraction of retained data (Fig. 5). Changing n_{NN} has a smaller impact on the classifier (Fig. 7), although it significantly affects the redshift quality metrics because a large enough n_{NN} improves the probability of detecting a real neighbour; a value n_{NN} ~ 30 is optimal in the case of VIPERS (Fig. 8).
It is often the case that the part of the parent photometric sample without a spectroscopic measurement has a higher density than the spectroscopic sample. Indeed, VIPERS has a spectroscopic surface density of Σ ~ 6 × 10^{3}/deg^{2}, to be compared with the photometric surface density Σ_{ph} ~ 45 × 10^{3}/deg^{2}. For this reason, we tested NezNet by varying the surface density of the spectroscopic sample used during training. We achieved this by repeating the training procedure on a uniformly subsampled catalogue extracted from W1. The test was performed on W4 without any subsampling, so that we tested the effectiveness of NezNet trained on a lower-density catalogue. Figure 12 shows that NezNet keeps its effectiveness even when using a subsample of one-eighth of the original spectroscopic density Σ, similar to the VIPERS ratio of spectroscopic to photometric objects.
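The density test can be sketched as a uniform random subsampling of the training catalogue. The function name, the catalogue size, and the fractions other than one-eighth are assumptions made for illustration; only the two surface densities are taken from the text.

```python
import numpy as np

def subsample_uniform(n_objects, fraction, seed=0):
    """Sorted indices of a uniform random subsample, drawn without replacement."""
    rng = np.random.default_rng(seed)
    n_keep = int(round(n_objects * fraction))
    return np.sort(rng.choice(n_objects, size=n_keep, replace=False))

sigma_spec = 6e3   # spectroscopic surface density quoted in the text [deg^-2]
sigma_phot = 45e3  # photometric surface density quoted in the text [deg^-2]
ratio = sigma_spec / sigma_phot  # about 0.13, close to one-eighth

n_w1 = 40_000  # hypothetical size of the training catalogue
subsamples = {f: subsample_uniform(n_w1, f) for f in (1 / 2, 1 / 4, 1 / 8)}
```

The subsampling would have to be applied before the neighbour search, so that the n_{NN} angular neighbours of each target are drawn from the diluted catalogue.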
This suggests that NezNet could also have interesting potential in the context of future experiments, such as Euclid or the NASA Nancy Grace Roman mission (Akeson et al. 2019). These slitless spectroscopic surveys will naturally deliver overlapping photometric and spectroscopic data, which can be combined using NezNet to improve photometric redshift estimates.
It is worth stressing that some details of the results presented here depend on the specific features of VIPERS and its parent CFHTLS photometric sample. Some of them may have been advantageous, but others could have penalised the success of the method. For example, the slit-placement constraints in VIPERS limit the ability to target close galaxy pairs: they introduce a shadow in the layout of a VIMOS pointing (see Fig. 6 of Guzzo et al. 2014) and impose a lower limit on the separation of observable galaxy pairs (see Sect. 4). This means that the training sample of NezNet was not ideal in our analysis, because many of the missed angular pairs were surely also physical pairs. This increases our confidence in the obtained results, because it shows that the method still delivers very useful results for samples characterised by small-scale incompleteness, as is typical of surveys built using fibre or multi-slit spectrographs. In the case of the VIPERS data, an interesting exercise in this respect would be to use as training sample the data from the VLT-VIMOS Deep Survey (VVDS; Le Fèvre et al. 2005), which used the same spectrograph but with repeated passes over the same area of 0.5 deg^{2}, substantially mitigating the proximity bias. We leave this exercise for future work.
Fig. 12 Redshift estimates based on the best nearest neighbour, obtained by uniformly subsampling the W1 catalogue, at fixed n_{NN} = 30 and Δz = 0.08. The titles of the panels refer to the surface density of spectroscopic objects of W1 used for training, with Σ referring to the complete W1 sample. Except for minor fluctuations in the redshift statistics, NezNet maintains a performance similar to the case without subsampling. The only noticeable trend is the fraction of central galaxies for which a physical pair is found, which decreases for lower densities. This could be due to the decreasing number of available training data. The percentage of real physical neighbours for a central galaxy, which decreases only slightly from Σ to Σ/8, remains around 40% and explains why NezNet is still effective. 
Acknowledgements
We thank Davide Bianchi for useful suggestions during the development of this work. FT and MSC are thankful to Daniele Grattarola for insightful discussions on GNNs and the use of the Spektral library. We thank the anonymous referee for his comments and suggestions. FT and LG acknowledge financial support by grant MUR PRIN 2017 ‘From Darklight to Dark Matter’, grant no. 20179P3PKJ. LG and MSC acknowledge financial support from the Italian Space Agency, ASI agreement no. I/023/12/0.
References
Akeson, R., Armus, L., Bachelet, E., et al. 2019, ArXiv e-prints [arXiv:1902.05569]
Alam, S., Ata, M., Bailey, S., et al. 2017, MNRAS, 470, 2617
Alarcon, A., Gaztanaga, E., Eriksen, M., et al. 2021, MNRAS, 501, 6103
Aragon-Calvo, M. A., van de Weygaert, R., Jones, B. J. T., & Mobasher, B. 2015, MNRAS, 454, 463
Arnouts, S., Moscardini, L., Vanzella, E., et al. 2002, MNRAS, 329, 355
Bautista, J. E., Paviot, R., Vargas Magaña, M., et al. 2021, MNRAS, 500, 736
Beck, R., Sadowski, P., Glaser, Y., & Szapudi, I. 2019, Refined Redshift Regression in Cosmology with Graph Convolution Networks, https://ml4physicalsciences.github.io/2019/files/NeurIPS_ML4PS_2019_80.pdf
Benitez, N., Dupke, R., Moles, M., et al. 2014, ArXiv e-prints [arXiv:1403.5237]
Blake, C., Brough, S., Colless, M., et al. 2011, MNRAS, 415, 2876
Bolzonella, M., Miralles, J. M., & Pelló, R. 2000, A&A, 363, 476
Bottini, D., Garilli, B., Maccagni, D., et al. 2005, PASP, 117, 996
Brescia, M., Cavuoti, S., Razim, O., et al. 2021, Front. Astron. Space Sci., 8
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. 2017, IEEE Signal Process. Mag., 34, 18
Cagliari, M. S., Granett, B. R., Guzzo, L., et al. 2022, A&A, 660, A9
Carliles, S., Budavári, T., Heinis, S., Priebe, C., & Szalay, A. S. 2010, ApJ, 712, 511
Colless, M., Peterson, B. A., Jackson, C., et al. 2003, ArXiv e-prints [arXiv:astro-ph/0306581]
Collister, A. A., & Lahav, O. 2004, PASP, 116, 345
Cucciati, O., Granett, B. R., Branchini, E., et al. 2014, A&A, 565, A67
de la Torre, S., Jullo, E., Giocoli, C., et al. 2017, A&A, 608, A44
DESI Collaboration (Aghamousa, A. et al.) 2016, ArXiv e-prints [arXiv:1611.00036]
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O., & Dahl, G. E. 2017, ArXiv e-prints [arXiv:1704.01212]
Grattarola, D., & Alippi, C. 2020, ArXiv e-prints [arXiv:2006.12138]
Guzzo, L., Scodeggio, M., Garilli, B., et al. 2014, A&A, 566, A108
Henghes, B., Thiyagalingam, J., Pettitt, C., Hey, T., & Lahav, O. 2022, MNRAS, 512, 1696
Ilbert, O., Arnouts, S., McCracken, H. J., et al. 2006, A&A, 457, 841
Lahav, O. 1994, Vistas Astron., 38, 251
Laigle, C., McCracken, H. J., Ilbert, O., et al. 2016, ApJS, 224, 24
Laureijs, R., Amiaux, J., Arduini, S., et al. 2011, ArXiv e-prints [arXiv:1110.3193]
Le Fèvre, O., Vettolani, G., Garilli, B., et al. 2005, A&A, 439, 845
LeCun, Y., Bengio, Y., & Hinton, G. 2015, Nature, 521, 436
Maraston, C. 2005, MNRAS, 362, 799
Masters, D., Capak, P., Stern, D., et al. 2015, ApJ, 813, 53
Moutard, T., Arnouts, S., Ilbert, O., et al. 2016, A&A, 590, A102
Murtagh, F. 1991, Neurocomputing, 2, 183
Newman, J. A. 2008, ApJ, 684, 88
Newman, J. A., & Gruen, D. 2022, ARA&A, 60, 363
O’Shea, K., & Nash, R. 2015, ArXiv e-prints [arXiv:1511.08458]
Pasquet, J., Bertin, E., Treyer, M., Arnouts, S., & Fouchez, D. 2019, A&A, 621, A26
Perlmutter, S., Aldering, G., della Valle, M., et al. 1998, Nature, 391, 51
Pezzotta, A., de la Torre, S., Bel, J., et al. 2017, A&A, 604, A33
Riess, A. G., Filippenko, A. V., Challis, P., et al. 1998, AJ, 116, 1009
Sadeh, I., Abdalla, F. B., & Lahav, O. 2016, PASP, 128, 104502
Salvato, M., Ilbert, O., & Hoyle, B. 2019, Nat. Astron., 3, 212
Scodeggio, M., Guzzo, L., Garilli, B., et al. 2018, A&A, 609, A84
Tegmark, M., Eisenstein, D. J., Strauss, M. A., et al. 2006, Phys. Rev. D, 74, 123507
Tonry, J., & Davis, M. 1979, AJ, 84, 1511
Wang, Y., Sun, Y., Liu, Z., et al. 2018, ArXiv e-prints [arXiv:1801.07829]
Zaheer, M., Kottur, S., Ravanbakhsh, S., et al. 2017, ArXiv e-prints [arXiv:1703.06114]
Zhou, J., Cui, G., Hu, S., et al. 2018, ArXiv e-prints [arXiv:1812.08434]
All Figures
Fig. 1 Correlation between the galaxy redshift and that of its nth nearest angular neighbour (n = {1,2,3,4}, left to right), as seen in the VIPERS redshift survey data, which cover the range 0.5 < z < 1.2. Clearly, while a tight correlation exists for a number of objects, many other angular pairs just correspond to chance superpositions. 

Fig. 2 Schematic architecture of NezNet. The input features are first processed by a dense network. Message passing between the two layers through Eq. (1) is then applied to take the relative differences and global values of the features into account. Before the final dense layer, the features are summed and then reprocessed with an MLP to output the score probability of two galaxies being actual neighbours. 

Fig. 3 Central galaxy spectroscopic redshift versus its photometric redshift measured with and without NezNet. The left panel shows the distribution of photometric vs. spectroscopic estimates in the original data. In the middle panel, we show the same distribution after removing the galaxies with low score probability from the catalogue (fr stands for the fraction of retained data). Finally, the right panel shows redshift estimates by assigning the spectroscopic redshift of the neighbour with the highest detection probability to the target galaxy. The model was trained with n_{NN} = 30 and Δz = 0.08. 

Fig. 4 Same as Fig. 3, but the model was trained with the higher Δz = 0.15, while n_{NN} = 30 is the same as before. 
