Deep learning for Sunyaev-Zel’dovich detection in Planck



Introduction
In the past decades, new statistical developments have begun to play an important role in data reduction and data analysis. In particular, studies involving machine-learning algorithms have increased exponentially because these algorithms are very efficient at identifying commonalities in large amounts of data, as well as at detecting very faint and/or very complex patterns. The machine-learning domain comprises two main families of algorithms: the unsupervised and the supervised families. In the first case, algorithms are designed to work on unlabelled data. This is the case for clustering algorithms such as k-means (MacQueen 1967), soft k-means (Bezdek 1981), or Gaussian mixture models (GMM, Dempster et al. 1977), for density estimation algorithms such as generative adversarial networks (GAN, Goodfellow et al. 2014), and for dimensionality reduction algorithms such as autoencoders (Kramer 1991), self-organizing maps (SOM, Kohonen 1982), or principal curves and manifold learning (Hastie & Stuetzle 1989). In the second case, machine-learning algorithms are designed to estimate properties or labels based on inputs and outputs that are both provided by the user. The user must in this case have perfect knowledge of the reference labels or properties used as outputs in the training catalogue. This type includes algorithms such as artificial neural networks (ANN, White & Rosenblatt 1963), random forests (RF, Ho 1995), and support vector machines (SVM, Hearst 1998). Some algorithms with very complex architectures of superposed layers fall into the category of deep-learning (DL) algorithms, as is the case for ANNs and convolutional neural networks (CNN, Fukushima 1980).
Machine-learning algorithms have previously been applied successfully in astronomy, astrophysics, and cosmology (see e.g. Baron 2019, for a review). For example, unsupervised algorithms are used to reconstruct the cosmic web (e.g. Bonnaire et al., in prep.), while supervised machine-learning algorithms, such as ANN or RF, have been used to estimate galaxy redshifts or galaxy types (e.g. Bilicki et al. 2014, 2016; Krakowski et al. 2016; Siudek et al. 2018; Bonjean et al. 2019), to estimate spectral properties of sources (e.g. Ucci et al. 2018), to classify sources (e.g. Aghanim et al. 2015), and to search for variable stars (e.g. Pashchenko et al. 2018), to give a limited list of example applications. More sophisticated machine-learning algorithms, such as DL algorithms, widely improve the results compared to those obtained with physical models. In most cases, the computation time required to estimate the results is also significantly reduced. For instance, DL algorithms have previously been used to estimate galaxy morphologies and redshifts (e.g. Huertas-Company et al. 2015; Pasquet et al. 2019; Boucaud et al. 2020), to fit galaxy surface brightness profiles (e.g. Tuccillo et al. 2018), to compare galaxy surveys (e.g. Domínguez Sánchez et al. 2019), to detect cosmic structures (e.g. Aragon-Calvo 2019), to learn structure formation from initial conditions (e.g. Lucie-Smith et al. 2018; He et al. 2019), or to generate fast cosmic web simulations (e.g. Rodríguez et al. 2018; Ullmo et al., in prep.). For galaxy cluster studies, ML algorithms have been used extensively and successfully to compute cluster properties, such as masses (e.g. Ntampaka et al. 2015, 2016; Green et al. 2019; Calderon & Berlind 2019; Ho et al. 2019). These very powerful algorithms may therefore help us in the near future to handle the huge amount of data the community is collecting, and to answer some open questions in astrophysics and cosmology.
Today, the quest for the missing baryons remains one of the greatest challenges in cosmology (Fukugita et al. 1998; Cen & Ostriker 1999; Shull et al. 2012; de Graaff et al. 2019). Baryons are expected to hide in the cosmic filaments between the nodes of the cosmic web (i.e. the galaxy clusters) in the form of a warm-hot inter-galactic medium (WHIM), heated to temperatures of about 10^5 K < T < 10^7 K. These baryons are in a phase that is just too hot to be detectable by molecular gas tracers and not hot enough to be detectable in X-rays.
The Sunyaev-Zel'dovich effect (SZ, Sunyaev & Zeldovich 1970, 1972), that is, the inverse Compton scattering of cosmic microwave background (CMB) photons by the free electrons in hot ionised gas, which redistributes the photon energies, is an ideal candidate for detecting the WHIM. This effect can be quantified by the Compton parameter,

y = (σ_T k_B / m_e c^2) ∫ n_e(l) T_e(l) dl,

where σ_T is the Thomson cross-section, m_e is the mass of the electron, c is the speed of light, k_B is the Boltzmann constant, and n_e(l) and T_e(l) are the density and temperature of the free electrons along the line of sight, respectively. Being proportional to n_e × T_e, the SZ emission is more sensitive to gas with lower pressure than the X-ray emission from the Bremsstrahlung process, which is sensitive to n_e^2. The Planck satellite (Tauber et al. 2010) has provided ideal data to capture the signature of the SZ effect because its spectral coverage extends from 30 GHz to 857 GHz: the SZ effect results in a negative emission with respect to the CMB spectrum below 217 GHz and a positive emission above. Based on two component-separation techniques, namely the needlet internal linear combination (NILC, Remazeilles et al. 2011) and the modified internal linear combination algorithm (MILCA, Hurier et al. 2013), the Planck collaboration has constructed full-sky maps of the SZ Compton parameter y at a resolution of 10 arcmin, using the six frequencies of HFI (Planck Collaboration XXII 2016). The first statistical detections of the WHIM in its hottest phase, and of part of the missing baryons at low redshift, were made using the SZ maps from Planck (Tanimura et al. 2019a,b; de Graaff et al. 2019). The Planck collaboration has also applied matched multi-filters to the six Planck HFI frequency maps to detect hundreds of new galaxy clusters through the SZ effect. These galaxy clusters were later confirmed in the optical (e.g. Planck Collaboration XXVII 2016; Streblyanska et al. 2019).
Since then, some studies have shown promising results by increasing the number of Planck SZ cluster sources to about 3500 using new approaches, such as artificial neural networks (Hurier et al. 2017), or by combinations with other wavelengths, for instance in X-ray with the Röntgensatellit (ROSAT; Tarrío et al. 2019). Other studies have aimed at producing new higher resolution and lower noise SZ maps by combining data from Planck and from the Atacama Cosmology Telescope (ACT; Aghanim et al. 2019). New detections of individual clusters or of stacked diffuse gas are still ongoing, which shows that the full potential of the Planck data has not yet been completely exploited.
Studies deriving SZ catalogues or maps have combined the data with prior knowledge (e.g. assuming a generalised Navarro-Frenk-White profile, GNFW, Nagai et al. 2007; Arnaud et al. 2010) or have degraded the resolutions to the largest angular beam for homogeneity (Planck Collaboration XXII 2016). These knowledge priors may bias the results and prevent us from exploiting the full potential of the Planck data. In this study, we propose the application of DL algorithms to the Planck data to detect SZ sources with low signal-to-noise ratios by training on SZ sources with high signal-to-noise ratios, that is, galaxy clusters. We present the data we used for the training in Sect. 2 and the learning procedure in Sect. 3. In Sects. 4 and 4.4, we present the different results, and we summarise the work in Sect. 5.
Planck data
The Planck collaboration has provided the community with nine full-sky frequency maps, all publicly available in the Planck Legacy Archive 1 in HEALPIX format (Górski et al. 2005). Because the beam size increases with decreasing frequency (especially for the LFI frequencies), and because the HFI frequencies encompass the spectrum of the SZ effect well, we decided to work only with the six HFI frequency maps, that is, from 100 to 857 GHz.
We also used the latest MILCA Planck y map, from the 2015 release (Planck Collaboration XXII 2016). This map is publicly available 2 in HEALPIX format with n_side = 2048 and a pixel size of θ_pix = 1.7 arcmin.

PSZ2 cluster catalogue
The Planck collaboration has applied matched multi-filters to the six Planck HFI frequency maps, filtering with a GNFW pressure profile model and taking into account the beam at each frequency and the spectral dependence of the SZ effect. They implemented three different algorithms for cluster detection: two implementations of the matched multi-filter (MMF1, Herranz et al. 2002, and MMF3, Melin et al. 2006), and PowellSnakes (PwS, Carvalho et al. 2009, 2012). With the three methods combined, they detected 1653 galaxy cluster candidates with a signal-to-noise ratio greater than 4.5σ (Planck Collaboration XXVII 2016). The purity of the catalogue is 83% (Planck Collaboration XXVII 2016), implying about 300 false detections, which are infrared or CO residual sources. For the confirmed galaxy clusters with measured redshifts, masses M_500 are provided in the catalogue, estimated following Planck Collaboration XXIX (2014). The mass and redshift distributions of the PSZ2 clusters with confirmed redshifts are shown in Fig. 1.

MCXC cluster catalogue
Galaxy clusters can also be detected through their hot gas in the X-rays, via the Bremsstrahlung emission. The ROSAT All-Sky Survey (RASS, Truemper 1982) is the only full-sky survey in X-rays to date. Galaxy clusters detected based on ROSAT were combined to build a meta-catalogue: the Meta-Catalogue of X-ray detected Clusters (MCXC, Piffaretti et al. 2011). The MCXC combines galaxy clusters from RASS-based catalogues, i.e. the Northern ROSAT All-Sky Survey (NORAS, Böhringer et al. 2000), the ROSAT-ESO Flux Limited X-ray Survey (REFLEX, Böhringer et al. 2004), the ROSAT brightest cluster sample (BCS, Ebeling et al. 1998), galaxy clusters around the South Galactic Pole (SGP, Cruddace et al. 2002), galaxy clusters around the North Ecliptic Pole (NEP, Henry et al. 2006), the Massive Cluster Survey (MACS, Ebeling et al. 2001), and the Clusters In the Zone of Avoidance (CIZA, Ebeling et al. 2002), and from ROSAT serendipitous catalogues, i.e. the 160 square degree ROSAT Survey catalogue (160SD, Mullis et al. 2003), the 400 square degree ROSAT Cluster Survey catalogue (400SD, Burenin et al. 2007), the bright SHARC survey cluster catalogue (Romer et al. 2000), the Southern SHARC catalogue (Burke et al. 2003), the WARPS survey catalogues (Perlman et al. 2002; Horner et al. 2008), and the Einstein Extended Medium Sensitivity Survey catalogue (EMSS, Gioia et al. 1990). The MCXC provides a mass M_500, a radius R_500, and a redshift z for 1743 galaxy clusters over the whole sky. The mass and redshift distributions of the MCXC clusters are shown and compared to other SZ and optical catalogues in Fig. 1. The MCXC mainly contains lower-mass clusters than the PSZ2 or the RedMaPPer cluster catalogues.

RedMaPPer
The Red-sequence Matched-filter Probabilistic Percolation (RedMaPPer, Rykoff et al. 2014) is an algorithm developed to detect clusters in large optical galaxy surveys, such as the SDSS or the Dark Energy Survey 3 (DES, The Dark Energy Survey Collaboration 2005). Based on the detection of over-densities of red-sequence galaxies, the algorithm provides positions and redshift probability distributions for the detected clusters, together with membership probabilities assigned to galaxies, and a richness λ related to the number of galaxies in the clusters. Rykoff et al. (2014) successfully applied RedMaPPer to the SDSS DR8 spectroscopic galaxies and detected 25 325 galaxy clusters in the redshift range 0.08 < z < 0.55, over approximately 10 500 square degrees on the sky. The RedMaPPer catalogue has been extensively studied at different wavelengths (e.g. Saro et al. 2015; Hurier & Angulo 2018; Geach & Peacock 2017), allowing the confirmation of the galaxy clusters and measurements of their properties (e.g. their masses). The redshift z and mass M_500 distributions are shown in Fig. 1, compared with the other catalogues of galaxy clusters detected at different wavelengths that are presented in this section. For this figure, the scaling relation between the richness λ and the mass M_500 from Saro et al. (2015) was used to compute the mass estimates of the clusters.

Learning procedure
We trained a DL algorithm on the Planck HFI frequency maps to detect SZ emission with a low signal-to-noise ratio by training on SZ emission with a high signal-to-noise ratio, which comes from the hot gas in galaxy clusters. To do so, we chose as inputs of the ML algorithm small patches of the Planck HFI frequency maps, and as outputs, segmentation maps that show the positions of the clusters in the patches. The trained model thus provides an SZ prediction map, with values between 0 and 1, that can be compared with known clusters in a test sample, or with the Planck MILCA SZ map.

Catalogue of clusters
We selected three catalogues of galaxy clusters to construct the segmentation maps that are used as output data for the training of the ML algorithm. First, we chose the PSZ2 catalogue of clusters, to start with the very simplest case: learning Planck with Planck.
To ensure the purity of the catalogue, we selected the 1094 PSZ2 sources that are confirmed galaxy clusters, that is, clusters with measured redshifts. We call this cleaned catalogue the Planck_z catalogue, and the catalogue of remaining candidates the Planck_no-z catalogue. Second, we chose the MCXC catalogue of X-ray clusters because it contains clusters with lower mass. We selected the MCXC clusters that are not included in the Planck catalogue to construct a catalogue of less massive clusters. In the following, we call the 1193 galaxy clusters of the MCXC catalogue that are not included in the PSZ2 catalogue the MCXCwP catalogue. Finally, we selected clusters from the optical RedMaPPer cluster catalogue to test the limits of the model and to try to detect very low SZ signals. This catalogue contains clusters with lower mass and higher redshift. We used RedMaPPer with different selections in richness λ (related to the number of galaxies in the clusters), that is, with different selections in mass. We denote with RM_i the selection of the RedMaPPer clusters with the criterion λ > i. In the following, we use the RM_50 and the RM_30 cluster catalogues. We list a summary of the different catalogues in Table 1 (number of sources and a short description of the five catalogues we used to train the U-net model: the Planck_z, the Planck_no-z, the MCXCwP, the RM_50, and the RM_30 catalogues).

Training set and test set
We used HEALPIX with n_side = 2 to split the sky into 48 tiles with equal sizes of 860 square degrees each. The seventh tile was arbitrarily chosen to define a test area. This area is centred on the position (l = 112.5°, b = 41.81°). It contains 40 clusters from the Planck_z catalogue, 18 from the Planck_no-z catalogue, and 50 from the MCXCwP catalogue. This area, and especially the clusters inside it, were used as a test area and a test sample, and none of its 860 square degrees was seen by the model during the training. The training set was based on patches (projected as described in the next section) extracted from the remaining 47 tiles of the sky.

Planck patches and segmentations
We extracted from the Planck HFI frequency maps n = 100 000 multi-channel patches of 64 × 64 pixels with a resolution of θ_pix = 1.7 arcmin (giving a FoV of 1.83° × 1.83°). These patches were chosen in the sky with random positions and random orientations, but each of the 100 000 patches contained at least one galaxy cluster of the cluster catalogue chosen as output. To construct a training sample and a validation sample, which must be independent, 10 000 of the 100 000 patches were chosen so that their positions fell into the pixel numbers 10, 39, and 42 of the HEALPIX map with n_side = 2, which splits the sky into 48 tiles with equal sizes of 860 square degrees (the same method as for the construction of the test area). The positions of the validation sample were split into three regions of the sky (i.e. pixels 10, 39, and 42) to take into account the noise in the Planck maps, which is not uniform. Thus 10% of the entire sample was set aside as the validation sample, in regions that were not seen by the U-net during the training process. We then constructed the segmentation maps associated with the 100 000 patches by drawing circles at the cluster positions. The pixels at the cluster positions were set to 1, while the pixels in all other regions were set to 0. The diameter of the circles marking the galaxy cluster positions was arbitrarily set to 5 arcmin, which is the size of the smallest beam in the Planck HFI frequency maps (at 857 GHz). This fixed size acts like a filter and probably induces a bias in the reconstruction of the SZ sources in the SZ prediction maps, preventing us from computing any reliable flux. However, the information up to a resolution of 5 arcmin can be learned by the network. We note that changing this diameter may change the results, but we did not study this effect in this proof-of-concept paper.
Finally, the dimension of the input data was 100 000 × 64 × 64 × 6 pixels, and the dimension of the output data was 100 000 × 64 × 64 × 1 pixels.
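The construction of one output segmentation map can be sketched as follows (a minimal illustration in Python/NumPy; the function name and the choice of a single central cluster are ours, not from the paper):

```python
import numpy as np

def make_segmentation_map(cluster_pix_xy, npix=64, pix_arcmin=1.7,
                          diameter_arcmin=5.0):
    """Binary segmentation map for one 64x64 patch: pixels within a
    5-arcmin-diameter disk around each cluster position are set to 1,
    all other pixels to 0."""
    seg = np.zeros((npix, npix))
    radius_pix = 0.5 * diameter_arcmin / pix_arcmin  # ~1.47 pixels
    yy, xx = np.mgrid[0:npix, 0:npix]
    for (x0, y0) in cluster_pix_xy:
        mask = (xx - x0) ** 2 + (yy - y0) ** 2 <= radius_pix ** 2
        seg[mask] = 1.0
    return seg

# One cluster at the patch centre:
seg = make_segmentation_map([(32, 32)])
```

Stacking 100 000 such maps along a first axis yields the 100 000 × 64 × 64 × 1 output array quoted above.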

Data pre-processing
To successfully apply the DL algorithm to the Planck HFI frequency maps, a pre-processing of the data is needed. The mean of the input data and its standard deviation should be of the order of unity because ML algorithms produce better results for this range of values. However, in the Planck maps, a large variety of sources are detected that produce signals with very different spectral responses (e.g. radio sources that are bright at low frequencies, and dust sources that are bright at high frequencies). The pixel distributions of the Planck HFI frequency maps are thus highly non-Gaussian, preventing a simple normalisation of the maps by their means and their standard deviations. Here, we chose an approach that optimises the capture of deviations from the CMB spectrum at the scale of the CMB fluctuation values (i.e. the secondary anisotropies, in particular the SZ effect). The primary CMB fluctuations are themselves Gaussian distributed (Planck Collaboration XVII 2016), but any external source (other than the CMB anisotropies) adds positive emission to the Planck frequency maps. This produces an asymmetric distribution, extended towards the right part of the pixel distributions. Therefore, we fitted a Gaussian to the left part of each distribution, up to its statistical mode (the value that appears most often), as shown in the left panel of Fig. 2. This part of the pixel distribution must contain the noise and the CMB fluctuations at each frequency. We then normalised each map by the mean and the standard deviation of the fitted Gaussian. This approach optimises the use of DL algorithms on the Planck frequency maps, especially for the study of the CMB fluctuations at each frequency, and thus of the SZ effect. The pixel distributions of each of the pre-processed HFI frequency maps are shown in the right panel of Fig. 2.
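This pre-processing can be sketched as follows (our own simplification: instead of an explicit Gaussian fit, we mirror the left half of the pixel distribution around its mode, which gives the same mean and, for a Gaussian left part, the same standard deviation):

```python
import numpy as np

def normalise_map(pixels, nbins=1000):
    """Normalise a frequency map by the parameters of the Gaussian
    (noise + CMB) part of its pixel distribution, estimated from the
    left half of the histogram, up to the statistical mode."""
    hist, edges = np.histogram(pixels, bins=nbins)
    mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    left = pixels[pixels <= mode]                         # left part, up to the mode
    mirrored = np.concatenate([left, 2.0 * mode - left])  # symmetrise around the mode
    return (pixels - mirrored.mean()) / mirrored.std()

# Example: Gaussian noise plus rare bright positive sources
rng = np.random.default_rng(0)
pix = np.concatenate([rng.normal(3.0, 2.0, 100_000),
                      rng.exponential(50.0, 2_000)])
norm = normalise_map(pix)
```

After normalisation, the bulk of the distribution is centred near zero with unit width, while the bright positive sources remain as a long right tail.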

U-net architecture
We trained a CNN on the Planck HFI frequency maps to detect sources with a low signal-to-noise ratio through the high signal-to-noise SZ emission coming from the hot gas inside galaxy clusters. CNNs are indeed very efficient at encoding information on extended objects that are invariant under translation or rotation, as is the case for SZ sources in the Planck maps. Moreover, the U-net architecture, based on CNNs, is one of the most efficient ways to reconstruct segmentation images (Ronneberger et al. 2015); for example, the U-net architecture won the 2015 cell-tracking challenge of the international symposium on biomedical imaging, applied to biomedical image segmentation. For these reasons, we chose this network architecture to perform the training. We trained the U-net by choosing small patches of the Planck HFI frequency maps as input and segmentation maps showing the positions of the clusters in the patches as output. The model thus provides an SZ prediction map, with values between 0 and 1, that we can compare with the known clusters in a test sample, or with the Planck MILCA SZ map.
The architecture of the U-net is symmetric and is composed of two parts: a contracting part and an expansive part. These parts encode and decode the information. The encoder is composed of different layers of convolutions that filter the input images with convolutional kernels. The weights of the kernels are learned during the training, so that the encoder itself can represent the information contained in the data in a reduced and optimal dimensional space. The scales of the captured features increase with the layers. The second block, or decoder, deconvolves the convolved maps and communicates with the smaller scales to reconstruct forms or objects. Its architecture makes this network very efficient in identifying objects and in reconstructing and classifying them in segmentation maps. This network has previously been successfully applied in astrophysics, for example to mimic numerical simulations, detect cosmic structures, or compute fluxes in blended galaxies (e.g. He et al. 2019; Aragon-Calvo 2019; Boucaud et al. 2020).
The different blocks of the encoder in the U-net are based on a typical convolutional network architecture: two 3 × 3 convolution layers, each followed by a rectified linear unit (ReLU) activation function, φ(x) = max(0, x), that quantifies the significance of the information learned by the kernels. Then, a max pooling layer (or downsampling) is added to reduce the dimensions of the convolved images. The blocks are repeated five times, and the number of filters is doubled at each downsampling because the complexity of the possible features increases with decreasing scales. The convolved images at the output of one block are the input images of the next block. The decoder is also composed of a succession of blocks: a 2 × 2 up-convolution, with the number of filters decreasing at each block, a concatenation with the convolved images of the same dimension from the encoder part, followed by two 3 × 3 convolutions with a ReLU activation function. A last 1 × 1 convolution layer is added at the end of the network, for a total of 23 layers. We also added a dropout of 0.2 after each convolutional layer, which helps the network to learn the most relevant information in the different filters more efficiently. The number of filters at the beginning of the network is related to the complexity of the different recovered features. Because the Planck maps we used do not contain many complex spatial features, we started with 8 kernels and increased their number to 128 in the last layer of the encoder. As a last activation function, we used a sigmoid function, s(x) = 1 / (1 + e^(−x)), to convert the maps at the output of the network into a probability map bounded between 0 and 1. We used the computationally efficient Adam optimizer to update the network weights, with a learning rate of lr = 10^(−4), to iteratively adapt the precision of the back-propagated errors. We used a binary cross-entropy loss function to compute the errors between the prediction map output by the network and the segmentation maps we used as outputs.
Training was performed with a batch size of 20 and a patience condition of 12 epochs without improvement of the loss (computed on the validation set, which was not used in the training) before the training was stopped. All the parameters on which the training depends (i.e. dropout rate, kernel size, number of filters, and initial learning rate) were varied across different trainings of the U-net until we found a configuration that learned the SZ features.
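The architecture and training settings above can be sketched in Keras (a minimal sketch under stated assumptions: five levels with 8 to 128 filters, 3 × 3 convolutions with ReLU and 0.2 dropout, 2 × 2 up-convolutions, a final 1 × 1 sigmoid layer, Adam with lr = 10^-4, and binary cross-entropy; the exact layer layout and padding of the published model may differ):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, n_filters):
    # Two 3x3 convolutions with ReLU, each followed by a 0.2 dropout
    for _ in range(2):
        x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
        x = layers.Dropout(0.2)(x)
    return x

def build_unet(shape=(64, 64, 6)):
    inputs = layers.Input(shape)
    skips, x = [], inputs
    for f in (8, 16, 32, 64):            # encoder: filters doubled at each downsampling
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 128)               # bottleneck: 128 filters
    for f, skip in zip((64, 32, 16, 8), reversed(skips)):   # decoder
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skip])                   # skip connection
        x = conv_block(x, f)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # SZ prediction map in [0, 1]
    model = Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy")
    return model

model = build_unet()
# Training (not run here) would use e.g.:
# model.fit(x, y, batch_size=20, validation_data=(xv, yv),
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=12)])
```

The model maps a 64 × 64 × 6 multi-frequency patch to a 64 × 64 × 1 prediction map.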

Results
We investigated two cases. First, by training with the Planck_z clusters, we studied the possibility of detecting more clusters in the Planck maps with deep learning. We show the results hereafter. Second, we trained three other models based on four different cluster catalogues in the output segmentation maps: we successively added MCXCwP, RM 50 , and RM 30 . Even though these clusters are not included in the Planck catalogue, they show statistical evidence of SZ emission (as seen from the positive fluxes in Fig. 7, which we describe below). This was done to push the training towards fainter SZ sources.

Results on the test area
To train the algorithm to detect SZ sources with a low signal-to-noise ratio in the Planck HFI frequency maps, we started with the simplest possible case: learning Planck with Planck. The segmentation maps we used as outputs were generated based on the Planck_z catalogue to ensure an absolutely pure catalogue of galaxy clusters. The U-net was successfully trained in about 3 h on an NVidia Tesla K80 GPU, over more than 30 epochs. Based on the trained model, full-sky maps of the SZ prediction p were constructed.
To estimate the performance of the model and ensure that the U-net has learned to detect SZ sources, we compared the SZ prediction map in the test area with the test catalogues described in Sect. 3.1.2. To detect galaxy clusters, we simply defined the clusters as areas with a prediction index p greater than a threshold p_max. For each area recovered above p_max, we computed the position as the barycentre of the pixels. This detection method is very simplistic and not optimal, but it is still efficient enough to roughly check the overall consistency. We cross-matched the sources detected with this method with the three catalogues Planck_z, Planck_no-z, and MCXCwP, and studied the recovered clusters as a function of the detection threshold p_max. The recovered clusters in the three catalogues are shown in the left panel of Fig. 3, together with the detected sources and potential SZ source candidates that are neither in the Planck catalogue nor in the MCXC catalogue in the right panel. For a threshold p_max = 0.1, all the Planck_z clusters are recovered, together with 89% of the Planck_no-z clusters and 12% of the MCXCwP clusters. For the same threshold, 218 sources do not belong to any of the mentioned catalogues.
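This simple detection step can be sketched as follows (our own minimal implementation with scipy.ndimage; the paper does not specify the exact connected-component routine used):

```python
import numpy as np
from scipy import ndimage

def detect_sources(pred_map, p_max=0.1):
    """Group connected pixels with prediction index p > p_max and return
    the barycentre (unweighted centroid, in pixel coordinates) of each group."""
    above = (pred_map > p_max).astype(float)
    labels, n = ndimage.label(above)
    return ndimage.center_of_mass(above, labels, range(1, n + 1))

# Toy prediction map with two well-separated detections
pred = np.zeros((64, 64))
pred[10:13, 10:13] = 0.8
pred[40:42, 50:52] = 0.3
positions = detect_sources(pred)
```

On the full-sky prediction map, the resulting barycentres are then cross-matched with the cluster catalogues.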
We investigated the nature of the 218 sources detected in the test area with the threshold p_max = 0.1 that do not belong to any of the catalogues by stacking them in the Planck MILCA SZ map and in the Planck HFI frequency maps. We looked for indications of galaxy cluster signatures and wished to ensure that the sources did not statistically correspond to obvious point sources that might have contaminated the model (e.g. infrared point sources). The results of the stack are shown in Fig. 4. A statistical presence of SZ emission is suggested by a y emission in the Planck MILCA SZ map and by the signature of the SZ effect in the centre of the HFI frequency maps (i.e. negative emission in the 100 and 143 GHz maps and positive emission at frequencies above 217 GHz). The SZ sources may host dust, as suggested by the excess of signal that is also seen in the centre of the 217 GHz map. A bright infrared source is also seen in the stacked HFI frequency maps (in the lower part of the quadrants), with an intensity that increases with frequency, together with a complex background in the 100, 143, and 217 GHz stacked maps that mainly comes from the CMB.

Results on the full sky
After checking the results of the U-net in the test area, we applied the same detection method to the full-sky SZ prediction map, with a detection threshold of p_max = 0.1 in order to recover the maximum number of Planck_z clusters. We detected 20 204 sources in the full-sky map with the U-net with p_max = 0.1. We compared the detections with the three catalogues of known galaxy clusters: Planck_z, Planck_no-z, and MCXCwP. Among the 20 204 detected sources, 98.5% of the Planck_z clusters are recovered, together with 76.4% of the Planck_no-z clusters and 20.8% of the MCXCwP clusters. Moreover, 11 clusters are identified by ACT (Hasselfield et al. 2013) and 98 clusters are identified by the South Pole Telescope (SPT; Bleem et al. 2015) but are not included in the Planck PSZ2 catalogue. This means that 18 415 sources do not belong to any of the catalogues. We investigated the nature of the sources detected with the U-net. First, we cross-matched the sample of 18 415 sources with Planck point sources. Only 6.1% are matched within a cross-match radius of 5 arcmin with the positions of the Planck catalogue of galactic cold cores, and only 0.2% are matched with the positions of the Planck sources identified at 353 GHz.
Fig. 5. Stack of the 18 415 sources detected with the U-net that do not belong to the Planck or the MCXC catalogue in 16 different maps, to determine statistical counterparts of galaxy cluster components in different wavelengths: the SZ MILCA y map, the six Planck HFI frequency maps, the IRIS map at 100 µm, the CMB lensing map, the four galaxy over-density maps based on all, passive, transitioning, and active populations of galaxies from the WISExSCOS photometric redshift catalogue (in the redshift range 0.1 < z < 0.3), and the ROSAT X-ray map. Each of them shows statistical indications of galaxy cluster emission. Patches are 3.4° × 3.4° in order to show the large-scale contribution around the sources.
Second, we stacked 16 maps in different wavelengths at these positions, each of them potentially probing a different galaxy cluster counterpart. Some of the 16 maps are also based on Planck data and thus are not independent, but some are independent and may show indications of galaxy cluster counterparts at other wavelengths, that is, in the near-infrared (where the galaxies emit) and in X-rays (where the same gas emits as is detected with the SZ). The 16 maps are the Planck SZ MILCA map, the 6 Planck HFI frequency maps, the IRIS map at 100 µm (Miville-Deschênes & Lagache 2005), the CMB lensing map (based on Planck; Planck Collaboration VIII 2020), 4 galaxy over-density maps of all galaxies (called GAL ALL), passive galaxies (called GAL P), transitioning galaxies (called GAL T), and active galaxies (called GAL A) following Bonjean et al. (2019), a star formation rate density map (called SFR) and a stellar mass density map (called Mstar) constructed with the method of Bonjean et al. (2019), and finally, the ROSAT X-ray map (ByoPiC 4 product). The Planck maps were masked using the Planck Catalogue of Compact Sources (PCCS, Planck Collaboration XXVI 2016), and the ROSAT map was masked at the point sources detected in ROSAT, Chandra, and XMM-Newton (Boller et al. 2016; Evans et al. 2010; Rosen et al. 2016, respectively). The results of the 16 stacks are shown in Fig. 5.
The stacks show indications of statistical emission from galaxy clusters. A statistical indication of the dark matter potential is seen in the CMB lensing map. In the galaxy over-density and galaxy property density maps, a statistical indication of a passive galaxy over-density is seen in each of the maps, suggesting that some of the detected SZ sources might be galaxy clusters in the redshift range 0.1 < z < 0.3 (i.e. the redshift range of the catalogue of galaxies used to construct the galaxy density maps). In the six Planck HFI frequency maps, a complex background is seen that correlates with the IRIS map at 100 µm, which traces the dust. This suggests that the background in the Planck frequency maps is due to the emission of the dust in our Galaxy, and thus that the sources detected with the U-net preferentially lie in areas without this foreground dust contamination. In the very centre of the HFI maps, inside the hole produced by the background, a statistical SZ signature is seen, with a decrement at 100 and 143 GHz and an increment in the 353 GHz map, decreasing up to the 857 GHz map. A statistical indication of Bremsstrahlung emission is also seen in the stacked ROSAT map. Each of these stacks shows statistical indications of cluster counterparts at different wavelengths, and thus some of the sources detected with the U-net might be actual galaxy clusters. To ensure that setting the threshold p_max to 0.1 is acceptable, we also stacked the sources detected with the U-net in the X-ray ROSAT map, in bins of their maximum associated prediction index p. The bins were chosen so that they contained the same number of sources, that is, 3683 each. The stacked radial profiles measured in the ROSAT map are shown in Fig. 6, with error bars computed with the bootstrap method, which resamples the profiles in each bin.
The background of each of the 18 415 sources detected with the U-net that belong neither to the Planck nor to the MCXC catalogue was individually subtracted from their maps to take the galactic plane into account. Figure 6 also shows the point spread function (PSF) of ROSAT, with an effective beam of about 2 arcmin, normalised to the profile of the first bin.
This result shows a statistical indication of X-ray emission, possibly from actual galaxy clusters, in each bin of associated prediction index p, with an amplitude that decreases with p. Moreover, the X-ray emission detected in the five bins is diffuse, as shown by the radial extent of the profiles, which is larger than the ROSAT PSF. The bootstrap method ensures the significance of the profiles by resampling the profiles in each bin. Even in the lowest prediction index bin, 0.1 < p < 0.12, diffuse X-ray emission is observed. This suggests that low prediction indices output by the U-net may indicate potential SZ emission with a low signal-to-noise ratio, such as from low-pressure, low-mass, or high-z galaxy clusters.
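The bootstrap error bars on the stacked radial profiles can be sketched as follows; `bootstrap_profile_errors` is an illustrative reconstruction (the name and array shapes are assumptions, not the authors' code): sources are resampled with replacement, the mean profile is recomputed for each resample, and the scatter of these means gives the uncertainty per radial bin.

```python
import numpy as np

def bootstrap_profile_errors(profiles, n_boot=1000, seed=0):
    """Bootstrap uncertainty on a stacked radial profile.

    `profiles` is an (n_sources, n_radial_bins) array of individual
    radial profiles. Returns the mean profile and, as error bars, the
    standard deviation of the mean over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = profiles.shape[0]
    means = np.empty((n_boot, profiles.shape[1]))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample sources with replacement
        means[i] = profiles[idx].mean(axis=0)
    return profiles.mean(axis=0), means.std(axis=0)
```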

Scaling relation
These results show that the U-net trained on SZ sources with a high signal-to-noise ratio (i.e. Planck clusters) has learned the frequency dependence and the spatial features of the SZ effect in the six Planck HFI frequency maps, and that some of the sources detected with the U-net that belong neither to the Planck nor to the MCXC catalogue might be actual galaxy clusters, even in the lowest bin of associated prediction index p. The stacked profiles in the X-ray map show that the amplitude of the signal and p appear to be related. If this is confirmed, p should be related in some way to the SZ flux in the MILCA y map. We analysed the associated prediction index p of the 20 204 sources detected with the U-net in the full sky by stacking the Planck SZ MILCA map at their positions in different bins of p. The results are shown in Fig. 7, where the blue points are the fluxes (computed with aperture photometry) in the stacked Planck SZ MILCA maps as a function of the mean prediction index p in the bins. The stacked flux (also computed with aperture photometry) in the Planck SZ MILCA map of the Planck PSZ2 clusters is displayed in orange for comparison, together with the stacked flux of the MCXC clusters in green and the stacked fluxes of the RedMaPPer clusters selected in richness in red and purple. All the errors are computed with the bootstrap method. This result suggests that the maximum value of the prediction index p output by the U-net might be statistically related to the integrated flux in the Planck SZ MILCA map. With further investigation, a scaling law might therefore be estimated in the future to translate the maps from p units into y units. Figure 7 also shows that the sources detected with the U-net have average fluxes between those of the MCXC and the RedMaPPer clusters.
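Aperture photometry on a stacked patch, as used for the fluxes in Fig. 7, amounts to summing the signal within a circular aperture and subtracting a background level estimated in a surrounding annulus. The sketch below (`aperture_flux`, with radii in pixels; the name and parameters are hypothetical) shows the principle.

```python
import numpy as np

def aperture_flux(patch, r_ap, r_in, r_out):
    """Background-subtracted aperture photometry on a 2D stacked patch.

    Sum the signal inside radius `r_ap` and subtract the mean background
    estimated in the annulus r_in < r <= r_out, scaled by the aperture
    area (all radii in pixels)."""
    ny, nx = patch.shape
    y, x = np.indices((ny, nx))
    r = np.hypot(y - (ny - 1) / 2, x - (nx - 1) / 2)  # distance to centre
    in_ap = r <= r_ap
    in_ann = (r > r_in) & (r <= r_out)
    background = patch[in_ann].mean()
    return patch[in_ap].sum() - background * in_ap.sum()
```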
This suggests that the sources detected with the U-net, if they are actual clusters, may contain less massive SZ sources at low redshift or SZ sources at higher redshift, so that the stellar mass and redshift distributions of the sources detected with the U-net lie between the distributions of the MCXC and the RedMaPPer clusters shown in Fig. 1. However, because the SZ effect is a projected effect, there is a degeneracy between the redshift and the mass of individual clusters. We did not investigate the range of redshifts and masses of the newly detected sources in detail here, but focused instead on the potential of applying DL algorithms to the Planck data.

Diffuse SZ emission
Motivated by the construction of a new SZ map that is more sensitive to SZ emission with a lower signal-to-noise ratio, we trained three other U-nets, choosing SZ reference galaxy cluster catalogues with lower signal-to-noise ratios to construct the segmentation maps of the training catalogue. In practice, we successively added the MCXCwP clusters, the RM 50 clusters, and finally the RM 30 clusters to the Planck_z clusters. We thus had four U-net models in total (the first was presented in the previous section).
We generated four SZ prediction full-sky maps based on these four trained U-net models. To illustrate the potential recognition of diffuse gas, we arbitrarily focused on four regions around large-scale structures that were previously identified as containing diffuse SZ signal: the Shapley super-cluster, the galaxy cluster pair A399-A401, the Coma super-cluster, and the Leo super-cluster. Patches extracted from the SZ prediction maps derived from the four models around these structures are shown in Fig. 8, together with patches of the Planck SZ MILCA map for visual comparison.
When the MCXCwP clusters are added, the diffuse gas around the super-clusters and the bridge of matter between A399 and A401 are recovered. For the models including the RedMaPPer clusters, potential indications of large-scale structures connecting the clusters are seen. The SZ prediction maps obtained with the U-net are visually very similar to the MILCA SZ map, but they seem less noisy and better resolved. These preliminary results are promising for the detection and characterisation of the diffuse gas in large-scale structures.

Discussion and summary
A U-net trained on the Planck HFI frequency maps to recognise the spatial and spectral features of the SZ signatures with a high signal-to-noise ratio produced by the hot gas in known galaxy clusters is able to recognise Planck clusters and shows promising results for the potential detection of SZ sources with a lower signal-to-noise ratio. In the most conservative case, when we trained with the Planck_z catalogue of 1094 sources, 200 MCXC clusters that are not in the Planck catalogue were recovered, together with more than 18 000 new potential sources detected with the U-net above a detection threshold p max = 0.1. This confirms the results obtained by Hurier et al. (2017) and Tarrío et al. (2019), who detected about twice as many galaxy clusters as in the Planck catalogue using Planck data. Although the detection threshold was set to a low value, the different stacks of the 18 415 sources detected with the U-net, in bins of their associated SZ prediction index, show indications of galaxy cluster signatures. The statistical presence of dark matter halos is suggested by the stacked CMB lensing map, the statistical presence of hot gas is suggested by the stacked Planck SZ MILCA map and by the X-ray ROSAT map, and the statistical presence of a passive galaxy over-density is suggested by the stacked galaxy density maps (constructed with the value-added WISExSCOS photometric redshift catalogue). With more investigation, a catalogue of potential galaxy cluster candidates might be constructed with this approach in the future. These potential candidates might be compared to the next generation of full-sky surveys, allowing the confirmation of the presence of diffuse gas (e.g. with SRG/eROSITA in the X-rays). Galaxy clusters or complex structures newly detected with this method are expected at intermediate or high redshift (i.e. z ≥ 0.5).
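The thresholding step that turns a U-net prediction map into a list of detected sources can be sketched as follows: connected regions of non-zero prediction are isolated, and a region is kept as a source if its peak prediction index exceeds p max. The function below is a hypothetical reconstruction of this step, not the authors' pipeline; it works on a plain 2D array with 4-connectivity.

```python
import numpy as np

def detect_sources(pred_map, p_max=0.1):
    """Return (peak value, peak position) for each connected region of the
    prediction map whose maximum prediction index reaches `p_max`."""
    visited = np.zeros(pred_map.shape, dtype=bool)
    nonzero = pred_map > 0
    sources = []
    for start in zip(*np.nonzero(nonzero)):
        if visited[start]:
            continue
        # Flood-fill the connected region of non-zero pixels (4-connectivity).
        stack, region = [start], []
        visited[start] = True
        while stack:
            r, c = stack.pop()
            region.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < pred_map.shape[0] and 0 <= cc < pred_map.shape[1]
                        and nonzero[rr, cc] and not visited[rr, cc]):
                    visited[rr, cc] = True
                    stack.append((rr, cc))
        peak_pos = max(region, key=lambda p: pred_map[p])
        if pred_map[peak_pos] >= p_max:
            sources.append((float(pred_map[peak_pos]), peak_pos))
    return sources
```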
By focusing on areas around multiple-cluster systems, we have shown that DL models can be used to reconstruct an SZ map that is more sensitive to SZ emission with a lower signal-to-noise ratio. Although a quantitative study is still to be performed in a future analysis, this proof-of-concept study shows the potential of applying DL algorithms to Planck data.
Furthermore, the U-net and the method presented in this study can in principle also be applied to any component separation in the Planck data, for example radio emission, dust emission, CO emission, and the CMB. We are currently investigating the training of more complex DL architectures for multi-class classification in order to perform multi-component separation.
To further improve the results, the higher resolution ACT and SPT maps could be included in the training process; the results could then be compared to those of other studies that combined these two missions with the Planck data (e.g. Chown et al. 2018; Aghanim et al. 2019).