Open Access
A&A, Volume 687, July 2024
Article Number: A238
Number of pages: 28
Section: Cosmology (including clusters of galaxies)
DOI: https://doi.org/10.1051/0004-6361/202449447
Published online: 16 July 2024

© The Authors 2024

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1. Introduction

Galaxy clusters sit at the most massive nodes of the cosmic web. They are the last structures to form in cosmic evolution, growing by accreting groups and smaller structures. Their distribution is sensitive to the underlying cosmological model, and for this reason they are recognised as a key cosmological probe (Clerc & Finoguenov 2023). Being rare objects, their census requires surveys spanning a large fraction of the sky with sensitive instrumentation. The diffuse X-ray emitting gas filling the megaparsec-wide intracluster medium (ICM) signposts virialised halos and enables their discovery out to large distances. In the context of X-ray cluster studies, accurate knowledge of the selection function has been required for cosmological abundance studies (e.g. Böhringer & Chon 2015; Mantz et al. 2015; Finoguenov et al. 2020; Garrel et al. 2022), cluster clustering studies (e.g. Marulli et al. 2018; Lindholm et al. 2021), scaling relation studies (e.g. Pacaud et al. 2006; Mantz 2019; Bahar et al. 2022), and for the investigation of extreme objects (e.g. Hoyle et al. 2012).

The eROSITA instrument on board the Spectrum Roentgen Gamma Mission (SRG/eROSITA, Predehl et al. 2021) has surveyed the entire sky during its first six months of operations (Merloni et al. 2024), collecting enough photons to discover several thousand galaxy clusters in the western Galactic hemisphere (Bulbul et al. 2024). These extended sources are identified in multi-band optical surveys and their redshifts are measured, hence providing their distances to us observers. The first catalogues of clusters discovered in the first eROSITA All-Sky Survey (eRASS1) western Galactic hemisphere (Bulbul et al. 2024; Kluge et al. 2024) support detailed individual cluster studies (e.g. Liu et al. 2023; Veronica et al. 2024) and population studies revealing for the first time the large-scale evolution of clusters and groups up to z ≃ 1 and beyond. Among them, the cosmological analyses particularly stand out, being a driver of the eROSITA mission design and of the construction of cluster catalogues. The first cosmological results based on cluster number counts are presented in Ghirardini et al. (2024) and Artis et al. (2024). In this article, we explain how the selection function model supporting these results was constructed. In fact, virtually any cluster population study based on the published eRASS1 catalogues needs to account for incompleteness at some stage. It is indeed inefficient, if not ill-defined, to require complete samples for population analyses (Rix et al. 2021); modelling incompleteness is in general more fruitful and, in fact, rather inevitable. The current blooming of large astronomical surveys fosters the development of accurate selection models with powerful statistical methods. The recent Gaia survey is an illustrative instance, where selection functions for the parent catalogue (Boubert & Everall 2020) and for its subsets (Rix et al. 2021; Boubert & Everall 2021) are modelled with detailed accounting for the magnitudes, positions, and parallaxes (and their uncertainties) of the stars.

This paper is organised as follows. We first present the detailed motivation for this work in Sect. 2, where we also introduce our notations. We present the twin eRASS1 simulations in Sect. 3, in particular we discuss our procedure matching detections and simulated clusters. We describe a first class of selection function models uniquely relying on X-ray count profiles in Sect. 4, highlighting some salient features of cluster selection in the observable space. In Sect. 5 we introduce a second class of models relying on intermediate variables. We show the outcome of a validation of our intermediate models using external catalogues (X-ray and millimetre-band detections) in Sect. 6. We discuss our results in Sect. 7 and summarise our findings in Sect. 8.

Unless stated otherwise, we assume a flat ΛCDM cosmological model with parameters from Planck Collaboration XIII (2016). We use ln for natural logarithm, and the base ten logarithm is written as log10. The notation 𝒩(μ;σ) indicates a normal distribution centred on μ with standard deviation σ. We use the ″ symbol for arcsecond units.

2. Motivation

Let us follow an illustrative example based on cluster count cosmology studies, although this demonstration may be generalised to other kinds of population analyses. Most cosmology analyses involve a numerical likelihood:

$$ L(\mathrm{model}) \propto p\left(\mathrm{data} \mid \mathrm{model}\right). $$

This quantity represents the probability that a set of observed data derives from a certain model. One assumes the model to be dependent on a set of parameters Θmodel fully describing it (see Table 1):

$$ L(\boldsymbol{\Theta}_{\rm model}) \propto p\left(\mathrm{data} \mid \boldsymbol{\Theta}_{\rm model}\right). $$(1)

Table 1.

Glossary of symbols and conventions used throughout this paper.

Posterior distributions p(Θmodel|data) are the final products of cosmology analyses, obtained by applying Bayes' theorem to the likelihood (Eq. (1)). Most often the required integrals are computed by repeatedly sampling the parameter space, formulating correspondingly many queries to the likelihood function L. A practical consequence is that likelihood evaluation must be computationally efficient.

In experiments involving counts of galaxy clusters – which include cluster abundance and cluster clustering studies – objects are grouped in bins of one or several measured quantities. The bins are drawn in a predefined parameter space Θbin: measured mass, redshift, flux, pairwise distances, etc. The likelihood functions L combine observed numbers $\hat{N}_j$ and model-predicted numbers nj by means of a statistic (j is an index for the bins). A well-known example is the Poisson log-likelihood (see also Cash 1979), adequate for shot-noise-dominated bins. The size of the bins is part of the likelihood design, and it is usual practice to consider infinitesimally small bins containing $\hat{N}_j = 0$ or 1 object. In such a case, nj transforms into dn/dΘbin. In the presence of additional variance terms, the Poisson likelihood must be modified. For instance, Garrel et al. (2022) present a likelihood accounting for sample variance (e.g. Hu & Kravtsov 2003). This component is an important source of variance when dealing with large numbers of objects (relative to the Poisson noise, which dominates in small samples).
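The Poisson (Cash) log-likelihood mentioned above, and its unbinned limit for infinitesimal bins containing 0 or 1 object, can be sketched as follows. This is a minimal illustration; the function names are ours, not from the eRASS1 pipeline:

```python
import numpy as np

def poisson_loglike(n_obs, n_model):
    """Binned Poisson log-likelihood for counts, dropping the
    model-independent ln(N!) term (the Cash 1979 statistic)."""
    n_obs = np.asarray(n_obs, dtype=float)
    n_model = np.asarray(n_model, dtype=float)
    return np.sum(n_obs * np.log(n_model) - n_model)

def unbinned_poisson_loglike(rate_at_objects, n_total_model):
    """Limit of infinitesimal bins holding 0 or 1 object: the sum of
    the log rate density dn/dTheta_bin evaluated at each observed
    object, minus the total predicted number of objects."""
    return np.sum(np.log(rate_at_objects)) - n_total_model
```

Maximising either expression over model parameters is equivalent, since they differ only by model-independent constants.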

A model should then predict the value of nj. In the ideal case of unrestricted computational power, excellent fidelity may be achieved by means of end-to-end models of the eRASS1 sky, retrieving the number nj from the simulated catalogues. However, it is prohibitive to run one new end-to-end simulation each time a value is queried by the likelihood function (Eq. (1)). The cosmological analysis process necessarily resorts to approximations, interpolations, and emulators. For instance, a parameterised halo mass function (HMF) model would replace series of full N-body simulations; ICM scaling relations replace detailed hydrodynamical simulations; while a selection function model replaces mock image generation and processing.

We write p(I|Θsel) for the probability of selecting an object in the sample given the values of the parameters Θsel. These values must be a prediction of the model. The expression for the modelled counts nj may be written formally as:

n j = d Θ bin d Θ sel d Θ model 1 ( Θ bin j ) × p ( I | Θ sel ) p ( Θ bin , Θ sel | Θ model ) d n d Θ model · $$ \begin{aligned} n_j &= \int {\mathrm{d}} {\boldsymbol{\Theta }_{\rm bin}} {\mathrm{d}} {\boldsymbol{\Theta }_{\rm sel}} {\mathrm{d}} {\boldsymbol{\Theta }_{\rm model}} \boldsymbol{1}({\boldsymbol{\Theta }_{\rm bin}} \in j) \nonumber \\& \times p\left({I}|{{\boldsymbol{\Theta }_{\rm sel}}}\right) p\left({{\boldsymbol{\Theta }_{\rm bin}}, {\boldsymbol{\Theta }_{\rm sel}}}|{{\boldsymbol{\Theta }_{\rm model}}}\right) \frac{{\mathrm{d}} n}{{\mathrm{d}} {\boldsymbol{\Theta }_{\rm model}}}\cdot \end{aligned} $$(2)

In the latter expression we have introduced dn/dΘmodel, the predicted number distribution of galaxy clusters under a certain model assumption. The indicator function 1 ensures that only those objects with Θbin taking values in the bin indexed by j are counted. The three sets of parameters may share part or all of their members.
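Equation (2) can be illustrated with a minimal Monte Carlo sketch. We assume a toy one-dimensional number density dn/dθ and a toy selection probability p(I|θ), both entirely hypothetical; the predicted count in a bin is then the integral of their product over the bin:

```python
import numpy as np

rng = np.random.default_rng(0)

def dn_dtheta(theta):
    """Toy predicted number density of objects (hypothetical)."""
    return 1000.0 * np.exp(-theta)

def p_select(theta):
    """Toy selection probability p(I | theta): a smooth step
    (hypothetical, standing in for the selection function model)."""
    return 1.0 / (1.0 + np.exp(-4.0 * (theta - 1.0)))

# Monte Carlo estimate of n_j = \int_bin p(I|theta) dn/dtheta dtheta
# for a bin theta in [1, 2], via uniform sampling over the bin.
lo, hi = 1.0, 2.0
theta = rng.uniform(lo, hi, size=200_000)
n_j = (hi - lo) * np.mean(p_select(theta) * dn_dtheta(theta))
```

In a real analysis the integral runs over all members of Θbin, Θsel, and Θmodel, and the density dn/dΘmodel comes from the HMF and scaling relation models; the structure of the computation is the same.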

An important preparatory task in the cosmological analysis consists in carefully selecting Θbin and Θsel so that the model is accurate and the modelling effort is well balanced between p(I|Θsel) and p(Θbin, Θsel|Θmodel). Depending on how close Θsel is to purely observable quantities, selection function models involve a varying amount of astrophysical modelling. It is also important to recognise that some aspects of selection may result from human intervention and are thus difficult to model with analytic formulae. A certain degree of empiricism may then be introduced in building selection models. We discuss several options in this paper and compare their advantages and shortcomings.

3. The eROSITA galaxy cluster survey and its simulated twin

3.1. The eRASS1 galaxy cluster samples

The primary eRASS1 cluster catalogue in the western Galactic hemisphere (hereafter eRASS1-primary or eRASS1-main) is described in detail in Bulbul et al. (2024) and Kluge et al. (2024). This catalogue builds upon the eRASS1 X-ray source detection list in the soft band (0.2–2.3 keV) presented in Merloni et al. (2024). The source detection is achieved with the eROSITA Science Analysis Software (eSASS, version 010), and it incorporates two essential steps in its final stages (see Brunner et al. 2022, for further information). First, sources are detected by searching for counts significantly above the background level; those sources are then individually characterised using a point-spread function (PSF) fitting algorithm. The detection likelihood (DET_LIKE_0 in the parent catalogue, hereafter shortened to ℒdet) and the extent likelihood (EXT_LIKE, hereafter ℒext) are among the most relevant parameters for selecting extended sources. In particular, the eRASS1-primary catalogue is compiled with a low threshold of ℒext > 3 to maximise completeness. A rigorous cleaning process is applied to the X-ray source catalogue to ensure that the images used for extended source identification are clean and uncontaminated. A series of additional cleaning steps removes spurious split detections; we assume their impact on completeness is null, as suggested by the number of secondary matches found in the digital twin matching the fraction of split sources in real data (Seppi et al. 2022, and next section).

The eRASS1 cosmology sample (hereafter eRASS1-cosmo) is described in detail in Bulbul et al. (2024). It is compiled with a more conservative cut on the measured parameter, ℒext > 6, compared to the eRASS1-primary sample, in order to maximise purity. Only galaxy clusters with measured photometric redshifts between 0.1 and 0.8 are selected, keeping only the areas of the sky where the photometry can be applied uniformly for redshift measurements, namely the Legacy DR10-south area (Kluge et al. 2024). The cosmology sample comprises 5259 identified clusters with a low contamination level, below 5%.

This paper focuses only on X-ray selection function models. However, both the eRASS1-primary and eRASS1-cosmo catalogues are derived from extensive optical follow-up designed for identification and redshift measurement. This identification may act as an additional selection filter impacting completeness and purity estimates and should be accounted for in science analyses with these catalogues (see Ghirardini et al. 2024). Most of the discussion in the paper neglects this effect, and we refer to Kluge et al. (2024) for a detailed description of the optical selection function.

3.2. The eRASS1 digital twin

Understanding selection effects requires mock observations reproducing as many of the characteristics of the actual data as possible. We briefly summarise here the main features of the eRASS1 digital twin described in Seppi et al. (2022). The parent halo catalogue originates from the UNIT1i dark-matter simulations (Chuang et al. 2019). A full-sky light cone is created by replicating shells of the individual snapshots of the simulation (Comparat et al. 2020). The large, albeit finite, box size prevents simulating the very nearby, massive halos, which constitute a tiny fraction of the sources in eRASS1 and require a separate treatment. X-ray sources are associated with the light cone using models for the emission of AGN (Comparat et al. 2019) and galaxy clusters and groups (Comparat et al. 2020). The cluster model is generated from a set of real clusters by accounting for the covariances between the surface brightness profile, halo mass, temperature, and redshift. The baryon profiles are painted onto halos based on observations; the predicted emissivity profiles are therefore representative of observations present in the literature, possibly biased against low-luminosity and low-mass objects. Low-mass groups (M500c < 5 × 10¹³ M⊙) therefore require a flux rescaling correction (see the appendix of Seppi et al. 2022). This mass-dependent correction is derived by displacing simulated objects along the observed stellar mass–X-ray luminosity relation of groups and clusters (Anderson et al. 2015; Comparat et al. 2022; Zhang et al. 2024). Stellar masses for distinct haloes are predicted via the stellar-to-halo mass relation (and its scatter) from Moster et al. (2013); see Eq. (1) in Comparat et al. (2019). The correction factor so obtained is negligible at high mass, while it can reach down to 0.1 at 10¹³ M⊙. This correction procedure preserves the scatter in X-ray luminosity at fixed halo mass. The approach is validated a posteriori by the good agreement between the observed and simulated galaxy cluster flux distributions.

The AGN model (Comparat et al. 2019) reproduces by construction the observed hard X-ray AGN luminosity function from Aird et al. (2015) and the number density as a function of AGN flux at redshifts 0 < z < 4. AGN are associated with dark matter halos and subhalos by combining a stellar-to-halo mass relation with a duty cycle, and an abundance matching between stellar mass and X-ray luminosity. The observed rate of AGN in galaxies as a function of stellar mass and redshift (i.e. the AGN duty cycle, Georgakakis et al. 2017) is thus naturally reproduced. As the algorithm matches halos to AGN sources, a parametric scatter in stellar mass at fixed AGN luminosity is tuned to a value ensuring that the distribution of specific accretion rates follows observed data. Since AGN and clusters are painted onto halos drawn from the same N-body simulation, these two populations are spatially correlated. Except for the link to stellar mass, our simulations do not account for environmental AGN dependence, nor for a dependence on the star formation rate of galaxies (e.g. Aird et al. 2018). Despite being simple, this model produces AGN halo occupation distributions in good agreement with eFEDS data in group- and cluster-size halos, in particular for central galaxies in those halos (Comparat et al. 2023). Brightest central galaxies (BCGs) are not treated separately from other galaxies in our simulation. Because they are hosted in higher-mass halos, the stellar mass associated with BCGs is higher than that of average field or satellite galaxies; so is their associated AGN X-ray luminosity. In practice, owing to the scatter in the abundance matching procedure and to the low duty cycle of high-LX AGN, it is rare to find a luminous AGN at the centre of a simulated cluster. In fact, only 0.4% of the M > 10¹⁴ M⊙ simulated halos host a central AGN more X-ray luminous than the cluster itself.

A simulated half-sky twin contains more than 10⁶ simulated halos up to z ∼ 1.5. Besides AGN and clusters, foreground X-ray emitting stars and Galactic foregrounds also have their share of simulated photons. A total of 373 316 simulated stars have a flux assigned according to the X-ray flux cumulative distribution measured in eFEDS (Schneider et al. 2022); their positions are selected at random among true Gaia DR2 positions to follow the observed density of stars. Diffuse foregrounds mainly include emission from the local hot bubble, the eROSITA bubbles, and the Galactic halo (Predehl et al. 2020; Ponti et al. 2023; Locatelli et al. 2024; Zheng et al. 2024a,b). Normal galaxies, whose X-ray emission is driven by hot gas and X-ray binaries, also contribute to the unresolved background component. The sky density of X-ray emitting galaxies is well below that of AGN and stars at the limiting flux of eRASS1 (Lehmer et al. 2012); hence we do not simulate these sources individually, but rather consider their contribution entirely as part of the eRASS1 unresolved background component. Our practical implementation relies on resampling real eRASS1 background events obtained after conservatively masking each eRASS1 detected source region. This approach enables the generation of a spatially varying background (at an angular resolution of 0.9 deg) that correctly reproduces actual eRASS1 data.

The SIXTE software (Dauser et al. 2019) generates mock event lists and images of these photons, as would be seen by eROSITA in the eRASS1 survey. In particular, instrumental characteristics and scanning laws enable reproducing the response of the telescopes to the simulated X-ray sky. In order to examine variability due to stochastic Poisson noise, several tens of simulations are produced (using the same X-ray parent population). Event files are processed in tiles of size 3.6 × 3.6 deg², in a similar way as for real data. In particular, the eSASS (version eSASSusers_201009) source finding algorithm runs over each tile and delivers a list of detections with associated measurements ℒdet and ℒext.

All clusters simulated in a twin mock are associated with a set of properties, such as mass M500, true redshift z, X-ray luminosity LX, flux fX, or count rate CR (all within R500, see Table 1). These “true” properties will serve to establish selection models in Sect. 5. However, we consider it important at this stage to recall that such properties are “labels” associated with simulated sources. The link between those labels and the mock events obviously depends on the models imprinted in the twin simulation.

3.3. Matching input and detected sources

The next step consists in attributing a binary flag to each simulated halo stating whether it is selected in the cluster sample. This operation relies on matching the input catalogue to entries in the detection list. We explore and compare two different procedures. The first method, hereafter “photon-based”, takes advantage of the SIXTE simulator as it individually tracks simulated CCD events back to the source that emitted the photon. The second method, hereafter “position-based”, uses positions and sizes of sources on sky, aided with prior knowledge of the flux distribution of selected sources.

3.3.1. Photon-based matching

The photon-based matching is in fact our baseline method. We refer the reader to Seppi et al. (2022) for a comprehensive explanation of the procedure. The input object contributing the most photons to the counts making up a detection is identified as the best matching counterpart. If a detection is split into several sources by the detection software, preference is again given to the source comprising the most photons originating from the identified counterpart. This enables assigning a unique match (ID_Uniq) to a given detection and to a given simulated object. This is the definition we take for matches in the following. Sources with a unique match are flagged as selected (Imain = 1). We note that this procedure involves sources of all types: simulated AGN, clusters, and stars; and detected point-like and extended sources.

3.3.2. Positional matching

The positional matching technique is a two-way match between the input and detection catalogues. It was employed in the context of establishing a selection function for the eROSITA Final Equatorial-Depth Survey (eFEDS, see Liu et al. 2022). This technique does not assume a fixed cross-matching radius; instead, it takes into account the sizes of sources. Each simulated halo is listed with a central coordinate and an angular extent (we take 10% of the virial radius). Each extended detection is reported with a coordinate, a 1-σ error circle, and an angular extent (we take the core radius of the best-fit β-model). We combine source extents with the source positional uncertainty; effectively, this is similar to spreading the location of the source over a small region. We use the NWAY algorithm (Salvato et al. 2018), since it is well suited to cross-matching catalogues in the presence of positional uncertainties. We first take the input catalogue as reference and look for matches in the detection catalogue. For each reference source, the algorithm returns a list of all matches located within a large buffer region of radius 3 arcmin. Each of the matches is assigned a probability pi that it is indeed a valid association, based on its proximity and positional uncertainties. A second value, pany, is computed, representing the probability for the input source to have at least one counterpart among the detections. The probabilities pi and pany account for chance associations, through the source density estimated over a degree-scale region. Sorting matches by the value of pi provides a ranking of the most likely counterparts. More often than not, our catalogues enclose complex configurations, with multiple input sources projected along neighbouring lines of sight and detections split into multiple sources. To deal with these effects we update the probabilities with a prior distribution of the flux of selected sources, as enabled by the NWAY formalism. The exact shape of this prior is unimportant in this context, and we simply take it from the simpler pre-launch selection functions derived in Clerc et al. (2018). As a consequence, NWAY preferentially up-weights the probability of brighter sources. Converting probabilities into binary flags requires setting thresholds on pi and pany. Our experiments showed that the result is rather insensitive to their exact values. For this first pass (input catalogue as reference) we keep only pairs with pany > 0.9 and pi > 0.1. Pairs with the highest pi among all associations are called primary matches. We repeat the above procedure after exchanging the roles of the input and detection catalogues. For this second pass (detection catalogue as reference) we do not input any prior, and we choose pany > 0.5 and pi > 0.1 to select valid pairs and primary matches. Finally, matches that are primary in both runs are selected as solid matches. An input source is flagged as a selected cluster (Imain = 1) if it belongs to a solid match. Such a double matching procedure is more conservative than a single-pass procedure, since a primary-matched source is no longer available to form a second pair.
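Once each pass has produced its primary matches, the two-way "solid match" criterion reduces to a simple rule: keep only the pairs that are primary in both directions. A minimal sketch of that final step (a simplified stand-in for the NWAY-based procedure, not NWAY itself; the identifiers are hypothetical):

```python
def solid_matches(primary_ab, primary_ba):
    """Keep pairs that are primary matches in both passes.

    primary_ab: dict mapping input-source id -> its primary detection id
    primary_ba: dict mapping detection id -> its primary input-source id
    Returns the set of (input, detection) pairs that are primary both
    ways, i.e. the 'solid matches' of the two-way procedure.
    """
    return {(src, det) for src, det in primary_ab.items()
            if primary_ba.get(det) == src}

# A detection claimed by two input sources can be the primary of at
# most one of them in the reverse pass, so the other source stays
# unmatched -- this is what makes the double match conservative:
ab = {"halo1": "detA", "halo2": "detA", "halo3": "detB"}
ba = {"detA": "halo1", "detB": "halo3"}
# solid_matches(ab, ba) -> {("halo1", "detA"), ("halo3", "detB")}
```

Here "halo2" is discarded even though it has a primary match, because "detA" prefers "halo1" in the reverse pass.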

3.3.3. Comparison of matching techniques

Figure 1 illustrates the comparison between both matching procedures. They agree well over a wide range of fluxes. Deviations are visible at low flux, that is, below a few 10⁻¹⁴ erg s⁻¹ cm⁻². One possible cause (not the only one) stems from the presence of a bright AGN near a relatively dim cluster. An example is shown in Fig. 2. In photon-based matching, even if there is an extended detection nearby, it is matched to the input simulated AGN due to the large amount of photons (and events) it deposits in the vicinity of the extended detection. In the position-based procedure, the extended source is matched to the input cluster, since this procedure does not include input AGN sources in the analysis. The problem of characterising a source as detected is ill-posed in such a case. Another subtle difference in the treatment of the catalogues plays a role, due to the definition of an extended detection. The position-based matching takes as input the list of sources with extent likelihood greater than a predefined threshold (6 for the cosmology sample). The photon-based matching takes as input the entire list of sources, associates a detection in the list, and only then flags as selected those halos with a detection having extent likelihood above the threshold. Reversing the order of operations has an impact in crowded regions with multiple split sources, or multiple halos along neighbouring lines of sight.

thumbnail Fig. 1.

Comparing the outcome of two matching procedures relating simulated halos (plain black histogram) to detected sources (photon-based matching: dashed blue line; position-based matching: thin orange line). A simulated halo is flagged as selected (Imain = 1) whenever it is (solidly) matched to an extended source in the detection catalogue. The details of the matching algorithms impact the flux distribution of detected sources, especially at fluxes below a few 10⁻¹⁴ erg s⁻¹ cm⁻², with deviations up to a factor of 2 in certain bins.

thumbnail Fig. 2.

Example case of blending, where a bright AGN drives the detection and the presence of a faint extended source favours its classification as extended. This figure is a cut-out of a simulated count image, smoothed with a Gaussian. The red 2- and 5-arcmin circles are centred on a faint simulated cluster (filled triangle symbol) with flux ∼5 × 10⁻¹⁴ erg s⁻¹ cm⁻². Detected sources appear as blue squares. A detection close to the cluster centre is classified as extended (magenta circle). It also coincides with a bright simulated AGN (orange “x”). The position-based matching algorithm considers the simulated cluster as selected (Imain = 1), contrary to the photon-based matching procedure. Two other faint simulated clusters are shown with open triangles; they have no impact on the detection.

The comparison exercise described in this section serves to bracket the uncertainty associated with the matching procedure. In practice, model uncertainties in the low-count regime are expected to exceed such variations. The photon-based matching is taken as the baseline for the rest of the paper.

3.4. Training and test samples

Each realisation of the twin eRASS1 sky comprises about 10⁶ halos, with less than 4% being detected (IeRASS1 = 1) and less than 1% being selected (Imain = 1) according to the matching algorithm. The sample is split into two parts: two thirds are saved for training a model; the other third is left untouched and serves to test the validity of the model. Splitting is performed after shuffling the list of halos at random.

We also create supersets of the simulated sky by concatenating training sets of eleven realisations, and randomly shuffling their content. Although the realisations are not independent stricto sensu (they share the exact same population of objects), this procedure helps in reducing the impact of photon noise. Each super-training set is 11 × 2/3 ≃ 7 times larger than the actual eRASS1 sky.
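The splitting and superset construction described above can be sketched as follows (a minimal illustration; the random seed and function names are ours):

```python
import numpy as np

rng = np.random.default_rng(42)

def split_train_test(n_halos, train_frac=2 / 3, rng=rng):
    """Shuffle halo indices and split them: two thirds for training,
    one third kept untouched for testing, as described above."""
    idx = rng.permutation(n_halos)
    n_train = round(train_frac * n_halos)
    return idx[:n_train], idx[n_train:]

def make_superset(training_sets, rng=rng):
    """Concatenate the training sets of several realisations and
    shuffle the result.  With eleven realisations, the superset is
    about 11 x 2/3 = 7 times the size of the actual eRASS1 sky."""
    merged = np.concatenate(training_sets)
    rng.shuffle(merged)
    return merged
```

Note that the realisations share the same parent population, so the superset reduces photon noise but not the cosmic variance of the underlying halo catalogue.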

4. Selection models with surface brightness profiles

Equation (2) involves three important factors that require modelling. The task of modelling the selection function consists in building p(I|Θsel); this is the main interest of this paper. The choice of Θsel usually results from a compromise between the precision of the selection function and the complexity of the model p(Θbin, Θsel|Θmodel).

A natural choice for the selection parameters Θsel is the collection of pixel values that form the images of clusters. Then p(I|image) is obtained by feeding eSASS with an image and applying thresholds on the values (ℒext, EXT) returned by the algorithm; because eSASS is deterministic, p(I|image) takes either the value 0 or 1. The selection function model is thus very precise – it is actually the most precise model, since it exactly reproduces the actual data processing. However, analysing a typical 10′×10′ image with eSASS takes about three seconds on standard CPUs. Computing a likelihood (Eq. (1)) for about 12 000 clusters would then amount to about ten CPU-hours, which is unrealistically long for a cosmological analysis. To this cost must be added the time needed to compute a model cluster image given Θmodel.

Realising that galaxy cluster images are almost circularly symmetric, we may accept a loss of precision in order to reduce the complexity of modelling a cluster. Let us consider the number of photon hits (counts) deposited by sources onto the detectors, split into several annular sky apertures around the centre of a putative dark matter halo. We emphasise that the number of counts should be a prediction of the model, p(counts|Θmodel). It is not the outcome of a measurement process – that is available for detected sources only. It is not within the scope of this paper to discuss how such a generative model is constructed; we simply assume it exists. For constructing our selection model we rely on the twin simulations, which keep a record of the origin of each count deposited in the image.
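As an illustration, counts can be accumulated in annular apertures around a halo centre directly from an event list with a flat-sky histogram. This is a sketch only: the twin simulations instead use the recorded origin of each count, and the function below is hypothetical:

```python
import numpy as np

def counts_in_annuli(ev_ra, ev_dec, ra0, dec0, edges_arcsec):
    """Histogram event positions into annular apertures around
    (ra0, dec0), all coordinates in degrees.

    Small-angle, flat-sky approximation (the RA offset is scaled by
    cos(dec)); adequate for apertures of a few arcminutes.
    `edges_arcsec` lists the radial boundaries in arcseconds.
    """
    dra = (np.asarray(ev_ra) - ra0) * np.cos(np.deg2rad(dec0))
    ddec = np.asarray(ev_dec) - dec0
    r_arcsec = np.hypot(dra, ddec) * 3600.0
    counts, _ = np.histogram(r_arcsec, bins=edges_arcsec)
    return counts
```

With the radial boundaries of Sect. 4.2 (`[0, 20, 40, 60, 90, 120, 150, 180]`), one such call per source component yields the annular counts entering the feature vector.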

4.1. Illustration with a single feature: the 90″ cluster counts

Let us initiate our procedure by involving only one feature, namely the number of cluster counts received in a circular aperture around the (RA, Dec) centre of the simulated halo. The aperture radius is set to R = 90″. We transform the counts (integer values N) with the following:

$$ \eta = \ln\left(1 + \frac{\Sigma}{c}\right). $$(3)

We set Σ = N/(πR²) and c = 0.0002 counts arcsec⁻². In our training set, N ranges from 0 to 3194 counts within the 90″ aperture. Figure 3 displays the histogram of η associated with all halos in the training set, and with those flagged as selected (Imain = 1). The ratio of the two histograms provides an empirical estimate of the probability of detecting a halo given the number of counts deposited on the detectors. The empirical probability follows a characteristic “S-shaped” curve. We perform a logistic regression to fit a model to the probability of detecting a cluster as extended (ℒext > 3) given the value of η. We use the scikit-learn implementation (Pedregosa et al. 2011) with the Broyden–Fletcher–Goldfarb–Shanno optimizer and fit two coefficients, namely the intercept w0 and the slope wη of the linear model f(η) = w0 + wηη. The model probabilities are such that

$$ p\left(I_{\mathrm{main}} \mid \eta\right) = \left(1 + e^{-f(\eta)}\right)^{-1}. $$(4)

thumbnail Fig. 3.

Distribution of clusters in the training set as a function of Σ, the average surface brightness within a 90″ radius around their centre (units: counts arcsec⁻²). The x-axis is rescaled with c = 2 × 10⁻⁴ counts arcsec⁻². Top panel: histogram of all simulated clusters (blue) and histogram of the subset of those found as extended by eSASS (orange). Bottom panel: dots indicate the ratio of the histograms (empirical selection rate). Error bars are the 68% confidence range estimated according to Appendix C. The vertical dashed line indicates the transition $\hat{\eta} = 1.94$ of the logistic model described by Eqs. (3) and (4), corresponding to N ≃ 30 counts.

The cost function to minimise during the fitting procedure is the log-loss 𝒞(w0, wη); its expression is given in Eq. (A.2).

Thanks to the one-to-one relation between η and N (Eq. (3)), we thus obtain a model p(Imain|N). The best-fit coefficients can be interpreted as follows: wη governs the sharpness of the transition in the S-shaped curve, and $\hat{\eta} = -w_0/w_\eta$ is the value at which this transition occurs. In our experiment, we find wη = 4.59 ± 0.04 and w0 = −8.92 ± 0.06, leading to $\hat{\eta} = 1.94$. This corresponds to a transition of the S-shaped curve taking place at N ≃ 30 cluster counts in the 90″ circular region. The positivity of wη reflects the fact that the detection probability increases with the number of counts N. The computation of coefficient uncertainties is detailed in Appendix A.
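As a concrete illustration, the fit of the logistic model in Eqs. (3) and (4) can be reproduced with scikit-learn on a mock training set; the counts and detection flags below are synthetic stand-ins, not the data used in this work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set: counts N in the 90" aperture
# and detection flags drawn from a known S-shaped curve.
R, c = 90.0, 2e-4                       # aperture radius (arcsec), scale (counts arcsec^-2)
N = rng.integers(0, 3000, size=20000)   # aperture counts
eta = np.log1p(N / (np.pi * R**2) / c)  # Eq. (3): eta = ln(1 + Sigma/c)
p_true = 1.0 / (1.0 + np.exp(-4.6 * (eta - 1.94)))  # assumed "true" selection curve
I_main = rng.random(N.size) < p_true

# Logistic regression with a BFGS-family solver; a large C disables regularisation.
clf = LogisticRegression(solver="lbfgs", C=1e6)
clf.fit(eta.reshape(-1, 1), I_main)
w0, w_eta = clf.intercept_[0], clf.coef_[0, 0]
eta_hat = -w0 / w_eta  # transition of the S-shaped curve of Eq. (4)
```

On this mock set the fit recovers the injected transition, $\hat{\eta} \simeq 1.94$, that is N ≃ 30 counts in the aperture.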

4.2. A model using light profiles from all sources

The model presented so far involves only one feature and is too simplistic to explain subtle variations in the probability of selecting a given cluster. The model improves by involving more of the quantities available from the twin simulations, namely the number of detector hits deposited by each of the five sky source components: the cluster of interest, active galactic nuclei (AGN), stars, the back- and foreground, and other clusters. Each of these contributes to the count image and deposits photons into seven annular regions around the central coordinate of the cluster. The radial boundaries (in arcseconds) are 0–20, 20–40, 40–60, 60–90, 90–120, 120–150 and 150–180. For each simulated cluster, we thus construct a 5 × 7 = 35-element feature vector η = {ηj}j = 1…35 and perform logistic regression, involving a total of 36 coefficients {wj}j = 0…35 (one for each feature, plus the intercept). Introducing Σj, the average surface brightness of one component in one annulus, we write the following:

$$ \begin{aligned} f({\eta }) = { w}_0 + \sum _{j=1}^{35} { w}_j {\eta }_j, \quad \mathrm{with}\ \ {\eta }_j = \ln \left( 1 + \frac{{\Sigma }_j}{c}\right). \end{aligned} $$(5)

One may interpret the value of the coefficients wj as the sensitivity of the detection rate p(Imain|η) to a small variation of the surface brightness of one component in one annulus. Indeed we have

$$ \begin{aligned} \left.\frac{\partial p}{\partial {\eta }_j}\right|_{{\eta }_k} = p(1-p) { w}_j, \end{aligned} $$

where the derivative is performed at constant ηk (k ≠ j). It yields the following:

$$ \begin{aligned} \left.\frac{\partial p}{\partial {\Sigma }_j}\right|_{{\Sigma }_k} = \frac{p(1-p)}{{\Sigma }_j+c} { w}_j. \end{aligned} $$(6)

In other words, with everything else held fixed, a small relative variation ϵ = ΔΣj/Σj of the surface brightness of one component in one given annulus leads to a variation of the detection probability proportional to wjϵ. In particular, positive coefficients correspond to features whose increase favours the selection of a cluster, and conversely for negative coefficients. Figure 4 conveniently represents the values of the 35 coefficients wj (j ≠ 0) split by component and annulus; the remaining coefficient, not shown in this figure, is w0 = −8.82 ± 0.08.
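The sensitivity identity above is easy to verify numerically; the coefficients and feature values below are purely illustrative (only the intercept w0 = −8.82 is taken from the text), not the fitted eRASS1 values.

```python
import numpy as np

# Illustrative (not fitted) coefficients: only the intercept matches the text.
w0 = -8.82
w = np.linspace(-0.5, 0.5, 35)   # hypothetical w_j for the 35 features
eta = np.full(35, 2.0)           # hypothetical feature vector

def prob(eta_vec):
    """Logistic model p(Imain|eta) of Eqs. (4) and (5)."""
    f = w0 + w @ eta_vec
    return 1.0 / (1.0 + np.exp(-f))

# Central finite difference with respect to one feature j, all others fixed.
j, h = 4, 1e-6
e = np.zeros(35)
e[j] = h
fd = (prob(eta + e) - prob(eta - e)) / (2 * h)

p = prob(eta)
analytic = p * (1 - p) * w[j]    # right-hand side of the identity
```

The finite-difference estimate agrees with p(1 − p)wj to numerical precision, as expected for a logistic model.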

Fig. 4.

Representation of the 35 coefficients wj of a logistic regression model p(Imain|counts), trained to predict whether a cluster was selected in the primary cluster sample. The 35 features are the surface brightness in 7 radial annuli for each of the 5 components indicated in the legend (counts from the galaxy cluster of interest, from neighbouring AGN, fore- and background counts, counts from foreground stars, and counts from other neighbouring galaxy clusters). A high absolute value of a coefficient indicates a high importance of the associated feature (Eq. (6)).

Keeping in mind that the wj are proportional to the marginal increase of the detection probability, comparing coefficient values provides an approximate way of assessing the “importance” of a feature in promoting the detectability of a cluster. For instance, it appears that a small increase of surface brightness in the 20−40″ annulus of a given cluster has the strongest impact on the detectability of this cluster. The fact that it is twice as important as the 0−20″ surface brightness is not surprising. A detected source (IeRASS1 = 1) must be classified as extended in order to be selected (Imain = 1); from this perspective, it is much more profitable to increase the counts beyond the eRASS point-spread function radius (∼30″). This argument appears more clearly in Fig. B.1, which shows the coefficients wj of a model trained to predict the detection (not the selection) of a cluster. Clearly, in this case the central 20″ surface brightness is the feature contributing most to the detectability of a cluster.

The effect of star-emitted photons is barely constrained, mostly owing to their paucity in the simulation. AGN photons within one arcminute of the centre tend to marginally increase the detection rate, while those located beyond 1 arcmin tend to decrease it. This is because the source detection algorithm masks such regions, removing photons useful for cluster detection. An excess of photons from the instrumental and astrophysical fore- and background within 1.5′ of a cluster tends to increase its probability of being selected, while at larger distances it decreases this probability by a large factor. The impact of neighbouring clusters is similar, although less pronounced.

4.3. Internal validation of the selection models

The previous interpretation of the wj coefficients in terms of marginal probabilities should not obscure the fact that it is only relevant to a specific model, here a logistic model, which is not a perfect classifier. We assess the performance of this model on the held-out test sample. Figure 5 represents the so-called “precision-recall” curve. This curve is obtained by scanning the values of a threshold used to convert probabilities p(Imain|counts) into a binary classification Imain. Once this threshold is set, one retrieves the number of true positives (TP), false positives (FP) and false negatives (FN) by comparison with the actual detection flag in the test sample. The higher the recall1 TP/(TP + FN) and the precision TP/(TP + FP), the better the performance of a model. For any value of the threshold, the model using 35 surface brightness indicators performs significantly better than the model using only the 1.5-arcmin cluster counts as input. We also display the curve obtained from a logistic model using two features, namely the 1.5-arcmin cluster and background counts. Its performance lies between those of the simple and the sophisticated (7 × 5) models. This agrees with common sense, in that adding more detailed information yields more precise models.
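A precision-recall curve of this kind can be traced with scikit-learn; the detection flags and model probabilities below are synthetic placeholders for the test sample, not eRASS1 outputs.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(1)

# Synthetic test sample: true selection flags and model probabilities.
y_true = rng.random(5000) < 0.3
p_model = np.clip(np.where(y_true, 0.7, 0.3) + rng.normal(0.0, 0.2, 5000), 0.0, 1.0)

# Scanning the decision threshold yields one (recall, precision) pair per value.
precision, recall, thresholds = precision_recall_curve(y_true, p_model)
ap = average_precision_score(y_true, p_model)  # single-number summary of the curve
```

The average precision summarises the whole curve and is the quantity quoted when comparing models.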

Fig. 5.

Performance of three different logistic models p(Imain|counts) to predict a cluster as selected given the values of surface brightness in radial bins. The plain black line is obtained with a model using surface brightness values in 7 annuli for each of the 5 sky components. The dot-dashed line is for a model taking as input both cluster and background counts within 90″. The dashed grey line stands for a model using as input only the 90″ cluster counts. Each precision-recall curve results from model evaluations on the test sample. It is obtained by varying the threshold over which a cluster should be considered as selected and counting the number of true positives (TP), false negatives (FN) and false positives (FP).

A second test is shown in Fig. 6. This evaluation is performed by binning test clusters by the value of p(Imain|counts) returned by the model. The actual fraction of objects selected in the sample is reported on the vertical axis. The 35-feature model produces the plain envelope, in satisfactory agreement with the one-to-one line. This result indicates that the model delivers reliable probabilities in a statistical sense. The probabilities from the model using only one cluster count indicator (round dots and dashed envelope) are less reliable. This is reflected in Fig. 3, where the S-shaped curve does not reach unity around values of η ≃ 2.5−4 (corresponding to N ≃ 60−270 counts in the 90″ aperture).
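This binning test can be sketched with scikit-learn's calibration_curve; here a perfectly calibrated mock model is used, so the points fall on the one-to-one line by construction.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(2)

# Mock model that is perfectly calibrated: flags are drawn at the predicted rate.
p_model = rng.uniform(0.0, 1.0, size=20000)
y_true = rng.random(p_model.size) < p_model

# Actual fraction of selected objects in each bin of predicted probability.
frac_selected, mean_predicted = calibration_curve(y_true, p_model, n_bins=10)
```

For a reliable model the binned fractions track the mean predicted probabilities, up to binomial scatter.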

Fig. 6.

Reliability of two different logistic models p(Imain|counts) in predicting the probability of a cluster being selected given the values of surface brightness in radial bins. Each series of points is obtained with the test sample, by comparing the probabilities predicted by the model (horizontal axis) with the actual fraction of selected objects (vertical axis) in each bin of probability. The envelopes materialise the 68% uncertainties (Appendix C). An ideal model would align along the 1:1 curve (thin dotted line). Black squares and plain lines correspond to the model taking 7 × 5 parameters as input, grey dots and dashed lines to the model using only cluster counts within 90″ to make its prediction.

Despite its apparent simplicity, a linear logistic model using circularly averaged surface brightness as training features does output probabilities that are representative of the actual selection probabilities. It is therefore well-suited to selection function problems involving populations of clusters. However, an average precision of 77% (Fig. 6) indicates that it is not well-suited to classification on an individual basis. This performance floor is due to the use of circularly averaged profiles, hence neglecting two-dimensional effects (e.g. blending, masking, gradients) in the selection process. Furthermore, logistic regression is a simple linear model. An interesting development would involve more sophisticated models such as neural networks (e.g. building upon the multi-layer perceptron). Their use may add the right level of non-linearity and complexity to accurately reproduce the selection process, at the expense of clarity and simplicity.

A selection function model based on surface brightness as above is attractive due to its use of observable features. In the context of cosmological studies (as idealised in Eq. (2)), it is necessary to construct a second, independent model able to predict the radial surface brightness profile of five components: clusters, AGN, stars, background and neighbouring clusters. Such a complex model p(counts|Θmodel) represents a computational bottleneck.

5. Selection models with intermediate variables

We may instead choose Θsel so as to reduce this complexity, at the cost of a less precise selection model. We consider variables (i.e. selection model features) such as cluster mass, redshift, luminosity, size, central emissivity, etc. We also consider the exposure time, background level and galactic hydrogen column density. For some of these variables (e.g. flux, luminosity) it is more convenient to manipulate their logarithmic values.

Having fixed the set of selection variables Θsel, we build a model assuming that the detection probability varies slowly over the range of interest of these (rescaled) features. However, we make no prior assumption on the exact shape, nor on the monotonicity, of the function. A kernel-based model is well-suited to this problem, as it reflects the fact that similar-looking clusters should have similar detection probabilities. We use Gaussian process classification (GPC) to build our model, using the SVGP implementation in the GPy package (GPy 2012). Details on the procedure and algorithm are discussed in Appendix D. The very nature of GPC allows us to issue not only an estimate of p(I|Θsel), but also a range of statistical uncertainty. Our implementation does not extrapolate: the model returns a default value (p = 0.5) when Θsel lies outside the domain covered by the training set. For the purpose of cosmological analysis this is not deemed an issue, as long as care is taken in handling the integration bounds in Eq. (2).
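A minimal sketch of the approach, using scikit-learn's GaussianProcessClassifier as a simple stand-in for the SVGP implementation in GPy; the features and flags are synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)

# Synthetic rescaled features (e.g. log-luminosity and redshift) and flags
# with a smooth detection transition along the first feature.
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = rng.random(300) < 1.0 / (1.0 + np.exp(-10.0 * (X[:, 0] - 0.5)))

# RBF kernel: similar-looking clusters get similar detection probabilities.
gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=0.3))
gpc.fit(X, y)

# Predicted selection probabilities below and above the transition.
p_low, p_high = gpc.predict_proba([[0.1, 0.5], [0.9, 0.5]])[:, 1]
```

The kernel encodes the smoothness assumption without imposing a parametric shape or monotonicity on the selection function.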

In the following, we highlight three models relevant to this paper. In particular, we demonstrate the merits of a model based on the count rate CR, which was selected for the eRASS1 cosmological analysis (Ghirardini et al. 2024). In Appendix E we describe several other models, notably those including cluster morphology among their input parameters.

5.1. Mass and redshift

Setting Θsel = {M500c, z} allows us to gain insight into the selection process in the fundamental parameter space of cluster cosmology. Figure 7 is a visualisation of a model adjusted to predict the selection of a cluster with extent likelihood above 3. The probability of selecting a cluster located at z = 0.3 monotonically increases with mass, with a transition taking place at M500c ≃ 5 × 1014 M⊙.

Fig. 7.

Representation of the model p(Imain|M500c, z) predicting a cluster to be detected and selected with an extent likelihood above three. The explanatory variables (features) are labels attached to simulated clusters in the twin simulation, standing for the galaxy cluster mass M500c and the cosmological redshift. The bottom-left panel represents contours of equal detection probability, labelled in steps of 0.1. Both one-dimensional curves (top-left and bottom-right panels) are slices through the function displayed in the bottom-left corner, at fixed z = 0.3 and M500c = 3 × 1014 M⊙ (indicated with dotted lines).

Despite its formal interest, this model is not of practical use in the context of cosmological studies. The two features entering this model (mass and redshift) are mere labels attached to clusters in the twin simulation. The model is therefore valid only under the assumptions of the physical models implemented in the simulation. In particular, it is only valid for the set of scaling relations and cosmological parameters (Planck Collaboration XIII 2016) imprinted in the simulation.

5.2. Luminosity and redshift

Setting Θsel = {LX, z} advantageously removes the strong dependence of the model upon the M500 → LX relation imprinted in the twin simulation. It is then up to the cosmologist to model the relation p(LX|Θmodel) with (most likely) parameterised scaling laws. A visual representation of the model is shown in Fig. 8. As expected, clusters that are brighter and closer to us are more likely to be selected.

Fig. 8.

Representation of the model p(Imain|LX, z) predicting a cluster to be detected and selected with an extent likelihood above three. The explanatory variables (features) are labels attached to simulated clusters in the twin simulation, standing for the galaxy cluster luminosity LX measured in the 0.5–2 keV energy band in the cluster rest-frame (units erg s−1, rescaled in log10) and for the cosmological redshift z. Other details are similar to those in Fig. 7.

Although useful, this model cannot account for variations of the detection probability as a function of sky position: galactic absorption, background and exposure time values all have an impact on cluster detection. Offering these degrees of freedom in the selection function allows one to detach from the assumptions imprinted in the twin simulation. We have trained a second model with those three additional features, using the GPC formalism. Figure 9 is a partial visualisation of the model output, all quantities but the exposure time Texp being fixed: z = 0.3, NH = 3 × 1020 cm−2, background brightness 5.2 × 10−15 erg s−1 cm−2 arcmin−2 in the 0.3–2.3 keV energy band, and LX = 1043 or 1044 erg s−1. The dependence of the model output on the exposure time appears clearly in this figure; the model is thus more precise in delivering a selection probability for a given cluster.

Fig. 9.

Representation of the models p(Imain|LX, z, NH, Texp, bkg) and p(Icosmo|LX, z, NH, Texp, bkg) for fixed values of cluster redshift z, two different values for the luminosity LX, and local galactic absorption NH, in addition to a nominal background level and a range of exposure times Texp (along the x-axis). The shaded regions indicate the approximate 68% confidence range output of the model.

5.3. Unabsorbed count rate and redshift

In the course of the development of the eROSITA cosmology pipeline, trade-off discussions led to setting Θsel as a five-component feature vector, namely: the cluster redshift z, the sky position-dependent quantities ℋ ≡ (NH, Texp and background) and the cluster CR. Here, CR represents the unabsorbed, 0.2–2.3 keV survey-averaged count rate collected within an R500 aperture and unaffected by shot noise. It derives from the formula:

$$ \begin{aligned} CR = \frac{L_{X}}{4 \pi d_L(z)^2\, \mathrm{ECF}(z, k_B T, N_H=0)} \ \ \ (\mathrm{units\,s}^{-1}). \end{aligned} $$(7)

In this expression, ECF is the energy conversion factor from the 0.5–2 keV band in the cluster rest-frame to the 0.2–2.3 keV band in the observer frame. It is computed for an APEC emission spectrum at temperature kBT and redshift z, unabsorbed (hence NH = 0) and folded through the survey-averaged instrumental response of the 7 telescopes. This quantity is readily available for each simulated cluster in the twin simulation, since we know its redshift and temperature. The luminosity distance dL is computed using the same cosmology as in the twin simulation. Figure 10 represents a few slices through the multi-dimensional model p(Icosmo|CR, z, NH, Texp, bkg), fixing the values of NH and the background level as previously, for two different redshift values and a range of count-rate (CR) values. The dependence on count rate shows the usual S-shaped curve, and larger exposure times imply higher sensitivities. The slight dependence on redshift is less intuitive: all other quantities being fixed, a source at higher redshift is more compact, thus its surface brightness is concentrated over a smaller area, making its detection comparatively easier.
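Once the ECF and the luminosity distance are known, Eq. (7) is simple arithmetic. The values below are illustrative only, and the ECF unit convention (erg cm−2 per count) is an assumption of this sketch.

```python
import math

def count_rate(L_X, d_L_cm, ecf):
    """Eq. (7): unabsorbed, survey-averaged count rate (s^-1).

    L_X    -- rest-frame 0.5-2 keV luminosity [erg s^-1]
    d_L_cm -- luminosity distance [cm]
    ecf    -- ECF(z, kT, NH=0), here taken in [erg cm^-2 per count]
              (an assumed convention for this sketch)
    """
    return L_X / (4.0 * math.pi * d_L_cm**2 * ecf)

GPC_CM = 3.0857e27  # centimetres per gigaparsec

# Illustrative numbers: a 1e44 erg/s cluster at d_L ~ 1.6 Gpc (roughly z ~ 0.3),
# with a hypothetical ECF of 1e-12 erg/cm^2 per count.
cr = count_rate(1e44, 1.6 * GPC_CM, ecf=1e-12)
```

With these illustrative inputs the count rate comes out at a few tenths of a count per second, a plausible order of magnitude for a survey-mode detection.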

Fig. 10.

Representation of a model p(Icosmo|CR,z,NH,Texp,bkg) for two values of cluster redshift z = 0.2 and z = 0.5, fixed local galactic absorption NH, a nominal background level, two values of exposure times Texp (units s), and a range of count-rate values (x-axis). The shaded regions indicate the approximate 68% confidence range output of the model.

The selection model obtained this way is quasi-independent of the cosmology imprinted in the twin simulation. It is also capable of reflecting the variations of the selection depth over the sky. The cluster model is relatively simple (one cluster is represented by one count rate and one redshift), which makes the computation of p(CR, z|Θmodel) fast and easy in the likelihood (Eq. (2)). However, this simplicity comes at a cost, since all variations of the selection due, for example, to cluster morphology are marginalised over, relying on the distribution of morphologies in our twin simulation.

5.4. Internal validation of the selection models

We now quantify the absolute performance of the models relying on intermediate variables Θsel. In doing so, we also assess the relative performance of models involving various input parameters. We use our test sample for these tests and compare the predicted model outcomes to the actual detection flags in the sample. As in Sect. 4, we highlight two performance tests: the precision-recall curve and the calibration curve.

Figure 11 compares five of our models predicting the presence of a cluster in the eRASS1-main sample. They differ from each other by their input parameters (features Θsel). The model involving only mass and redshift has quite a low precision and recall (average precision of 34%). Predicting the detectability of a cluster indeed requires a more detailed characterisation. Changing to a model taking luminosity and redshift notably improves the overall performance, since luminosity is more tightly linked to photon counts than mass. However, it still has an average precision of 56%, which remains unsatisfactory for detailed analyses. Including the sky position-dependent parameters ℋ = (NH, Texp, bkg) dramatically boosts the classification performance (average precision of 67%). As expected, adding information on the local background, exposure time and foreground absorption increases the capability of the model. An additional performance gain is obtained by adding a morphological parameter, here the EM0 parameter (as presented in Appendix E). The average precision increases to 71%, closer to that obtained with the 7 × 5 surface brightness model depicted in Fig. 5. Finally, a model involving the (absorbed) flux, the apparent R500 radius and ℋ shows good performance, with an average precision reaching 68%.

A similar assessment is provided in Fig. 12, now for models predicting the presence of a cluster in the eRASS1 cosmology sample. The baseline model used for the eRASS1 cosmological analysis relies on CR (count-rate) and redshift, as well as on sky local information. Its good performance (average precision 70%) is comparable to a model involving luminosity and redshift. Again, adding morphological information (here via EM0, see Appendix E) provides a noticeable enhancement of the model precision (average precision 74%).

Fig. 11.

Precision-recall curves obtained with selection models trained to predict the selection of a cluster in the primary sample. The input parameters vary from one model to the other, as indicated in the legend. A model involving luminosity, redshift, morphological features and local sky information performs best (yellow dot-dashed curve).

Fig. 12.

Similar to Fig. 11, but comparing three models predicting the presence of a cluster in the cosmology sample. The model represented with the black curve corresponds to the baseline selection model used in the analysis of Ghirardini et al. (2024).

We now assess the reliability of the probabilistic output of the models. In Fig. 13 we compare the same five models for the primary catalogue selection function. All curves lie very close to the one-to-one line: the probability outputs reflect well the actual fraction of positive detections in the test catalogue. The results for the cosmology sample classifiers appear in Fig. 14; they all align very well along the one-to-one line.

Fig. 13.

Evaluating the reliability of five models predicting the probability of a cluster to appear in the eRASS1 primary cluster catalogue (x-axis). The y-axis represents the actual fraction of selected objects in the test sample. The bottom panel shows the residuals (difference) of each curve with respect to the one-to-one line. Shaded regions indicate 68% confidence intervals in each bin of probability.

Fig. 14.

Similar to Fig. 13, but comparing three models predicting the presence of a cluster in the cosmology sample.

The tests depicted above illustrate the good prediction performance of the models, in the sense that the probability outputs correctly reflect the actual selection probabilities. This performance indicator is valid for the ensemble of clusters in the simulated eRASS1 sky. In the course of our model production efforts, we have monitored several other indicators in order to assess the performance and reliability of the models. Among them were: the receiver operating characteristic (ROC) curve, comparing recall and fall-out rates FP/(FP + TN); the detection-error tradeoff (DET) curve, comparing fall-out and miss rates FN/(FN + TP); and the correlation coefficient suggested by Matthews (1975). Altogether, these performance indicators helped in selecting the best models to be used in the cosmological analysis.

6. External validation of the selection function

We have described several models and validated their performance on a simulated test set. This test set is produced in a very similar manner to the training sample, by splitting the original twin simulation set in two parts. Additional tests are required using new samples independent of the twin simulation. We aim to test the eRASS1 selection models with (i) a sample of clusters selected from a much deeper dataset (eFEDS) and (ii) two SZ-selected samples from South Pole Telescope surveys of different depths: SPTpol-100d and the SPT-Early Cluster Sample (SPT-ECS). Selection biases affecting these samples must be taken into account; moreover, we have to link their measured properties to the eRASS1 selection variables. We present our formalism in the most generic manner, then apply it to the specific cases of eFEDS and the SPT samples.

6.1. General presentation of the formalism

Given two surveys, R taken as a reference and the other, T, to be tested, we wish to validate our selection function models for both R and T. We do so by comparing the populations selected by each of the surveys. In the optimal case where R is much more complete (e.g. deeper) than T, the problem does not require precise knowledge of the selection process leading to catalogue R. In general, both catalogues comprise a biased selection of the true underlying population of clusters, which we take as a prior in the following analysis.

Extending the formalism laid out in previous work (e.g. Grandis et al. 2020) we calculate for each object i in the reference catalogue an associated probability:

$$ \begin{aligned} P_i = p\left(I_{T}^{i}|I_{{R}}^i, {\boldsymbol{\Theta }}_{\mathrm{meas}}^{i}\right). \end{aligned} $$(8)

There, IT and IR are random boolean variables indicating whether an object is listed in catalogue T and R, respectively. Θmeas is a collection of measured properties attached to the cluster: its measured flux, redshift, luminosity, extent, etc. are likely candidates. Using Bayes’ rule, Pi can be written as the ratio of two joint probability distributions: the denominator is interpreted as the number of objects expected in the reference catalogue with measured properties $\boldsymbol{\Theta}_{\mathrm{meas}}^{i}$; the numerator as the number of objects expected in both the reference and test catalogues with these same measured properties. In this perspective, Pi is a model for the fraction of reference objects listed with some measured properties $\boldsymbol{\Theta}_{\mathrm{meas}}^{i}$ that are also present in the test sample. Introducing Θmodel and Θsel enables a hierarchical Bayesian expansion of the joint probabilities into elementary factors. The numerator specifically involves the conditional probability distribution:

$$ \begin{aligned} p\left(I_{{T}}^i, I_{{R}}^i, {\boldsymbol{\Theta }}_{\rm meas}^{i}|{\boldsymbol{\Theta }_{\rm model}}\right)&= \int {\mathrm{d}} {\boldsymbol{\Theta }_{\mathrm{sel}, R}} {\mathrm{d}} {\boldsymbol{\Theta }_{\mathrm{sel}, T}} {\mathrm{d}} \boldsymbol{\alpha } \\&\times p\left({\boldsymbol{\Theta }_{\mathrm{sel}, T}}, {\boldsymbol{\Theta }_{\mathrm{sel}, R}}, \boldsymbol{\alpha }|{\boldsymbol{\Theta }_{\rm model}}\right) \\&\times p\left(I_{{R}}^i|{\boldsymbol{\Theta }_{\mathrm{sel}, R}}\right) p\left(I_{{T}}^i|{\boldsymbol{\Theta }_{\mathrm{sel}, T}}\right) p\left({\boldsymbol{\Theta }}_{\rm meas}^{i}|\boldsymbol{\alpha }\right), \end{aligned} $$

and similarly for the denominator:

$$ \begin{aligned} p\left(I_{{R}}^i, {\boldsymbol{\Theta }}_{\rm meas}^{i}|{\boldsymbol{\Theta }_{\rm model}}\right)&= \int {\mathrm{d}} {\boldsymbol{\Theta }_{\mathrm{sel}, R}} {\mathrm{d}} \boldsymbol{\alpha }\, p\left({\boldsymbol{\Theta }_{\mathrm{sel}, R}}, \boldsymbol{\alpha }|{\boldsymbol{\Theta }_{\rm model}}\right) \\&\times p\left(I_{{R}}^i|{\boldsymbol{\Theta }_{\mathrm{sel}, R}}\right) p\left({\boldsymbol{\Theta }}_{\rm meas}^{i}|\boldsymbol{\alpha }\right). \end{aligned} $$

The last two expressions are valid as long as selection in the test survey depends only on the variable Θsel, T and selection in the reference survey depends only on the variable Θsel, R. We have introduced a new, intermediate variable α to reflect the dependence of the measurement vector on intrinsic cluster properties. For instance, if the measurement vector is the X-ray flux, α may represent the cluster luminosity and redshift. The interest of this decomposition lies in its involving the selection functions p(I|Θsel), which are tabulated for each survey. It also accounts for covariances in the selection and measurement processes, through the multivariate distribution of (Θsel, T, Θsel, R, α) given the values of Θmodel.

The probability Pi (Eq. (8)) is the main outcome of this model. It is calculated independently for each object listed in the reference catalogue. By comparing this probability to the actual presence of a match in the test catalogue, we obtain powerful diagnostics on the validity of the models and on the presence of outliers in the population. In particular, we denote as “surprising detections” those objects listed in R and detected in the test catalogue despite their low forecast probability ($P_i < 0.025$ and $I_{T}^i = 1$). We denote as “missed objects” those listed in R and not detected in the test catalogue despite their high forecast probability ($P_i > 0.925$ and $I_{T}^i = 0$).
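With the Pi values in hand, flagging surprising detections and missed objects reduces to two boolean cuts; the probabilities and match flags below are illustrative.

```python
import numpy as np

# Illustrative forecast probabilities P_i and match flags I_T^i.
P = np.array([0.01, 0.50, 0.99, 0.02, 0.95])
I_T = np.array([1, 1, 1, 1, 0])

surprising = (P < 0.025) & (I_T == 1)  # detected despite a low forecast
missed = (P > 0.925) & (I_T == 0)      # undetected despite a high forecast
```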

Our procedure to find matches between the reference and the test catalogues relies on a simple positional match. For efficiency reasons, we trim the reference catalogue to the approximate sky footprint of the test sample and conversely. We then search for symmetric 2-arcmin matches between the samples, keeping only the best match, that is, the one at the closest distance. Such a procedure does not require redshifts to be identical in both catalogues. Our choice of the maximal separation corresponds to the most sensible cross-matching radius for the surveys and instruments considered in this study (Bulbul et al. 2022).
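The symmetric best-match logic can be sketched as follows, using a flat-sky, small-angle separation; the coordinates are hypothetical, and a production implementation would rather rely on a spherical-geometry library.

```python
import numpy as np

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation (arcsec) between positions given in degrees."""
    dra = (ra1 - ra2) * np.cos(np.deg2rad(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return np.hypot(dra, ddec) * 3600.0

def symmetric_best_match(cat_ref, cat_test, max_sep=120.0):
    """Keep pairs that are mutual nearest neighbours within max_sep arcsec."""
    ra_r, dec_r = cat_ref
    ra_t, dec_t = cat_test
    sep = angular_sep_arcsec(ra_r[:, None], dec_r[:, None],
                             ra_t[None, :], dec_t[None, :])
    best_t = sep.argmin(axis=1)  # closest test source for each reference source
    best_r = sep.argmin(axis=0)  # closest reference source for each test source
    return [(i, int(j)) for i, j in enumerate(best_t)
            if best_r[j] == i and sep[i, j] <= max_sep]

# Hypothetical catalogues: the second reference source has no counterpart.
ref = (np.array([10.0, 10.5]), np.array([-1.0, -1.0]))
test = (np.array([10.0002, 12.0]), np.array([-1.0001, -1.0]))
matches = symmetric_best_match(ref, test)
```

Requiring the match to be mutual ("symmetric") prevents one bright source from being paired with several counterparts.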

Beyond the interpretation of individual Pi values, we also base our evaluation on three aggregated diagnostics. First, from repeated Bernoulli samples of the Pi values, we obtain a distribution for the expected number of matching entries between both catalogues, and its associated uncertainty. Comparing the actual number of matches to this distribution provides a bulk validation test of the model. The second diagnostic consists in grouping clusters by their Pi values in bins of finite width, and in computing the actual fraction of matches in each bin. This diagnostic informs us about the reliability of the Pi values, hence about the underlying model. Our third diagnostic empirically tests for leakage in our model, by introducing two numbers representing the fraction of objects that should be in the test catalogue but are not (δ1), and the fraction of objects that should not be in the test catalogue but are present (δ2). We compute posterior distributions for δ1 and δ2 by sampling the following likelihood (a product of independent Bernoulli likelihoods):

$$ \begin{aligned} \mathcal{L} (\delta _1, \delta _2) &= \prod _{i\, \in \, R} \left[ P_i (1-\delta _1) + (1-P_i) \delta _2 \right]^{I_{{T}}^i}\nonumber \\&\times \left[ P_i \delta _1 + (1-P_i)(1-\delta _2) \right]^{1-I_{{T}}^i}. \end{aligned} $$(9)

The product runs over all clusters in the reference catalogue, and $I_{T}^i$ equals 1 if the cluster is matched in the test catalogue and 0 otherwise. If the posterior distribution for δ1 significantly excludes zero, it indicates that some of the clusters predicted to be in the test catalogue escape our model and are not detected for some reason. Similarly, a distribution of δ2 significantly away from zero hints that our model does not properly capture the number of undetected objects. In this computation we assume flat prior distributions for δ1 and δ2, bounded to the [0, 1] interval. We sample the posterior with the Markov chain Monte Carlo algorithm provided in the emcee package (Foreman-Mackey et al. 2013).
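The likelihood of Eq. (9) is straightforward to evaluate; the sketch below maximises it on a coarse grid for a mock catalogue with no leakage (the full analysis instead samples the posterior with emcee).

```python
import numpy as np

rng = np.random.default_rng(4)

def log_like(delta1, delta2, P, I_T):
    """Logarithm of Eq. (9): Bernoulli terms with leakage fractions delta1, delta2."""
    p_match = P * (1.0 - delta1) + (1.0 - P) * delta2  # probability that I_T = 1
    p_match = np.clip(p_match, 1e-12, 1.0 - 1e-12)
    return np.sum(I_T * np.log(p_match) + (1 - I_T) * np.log(1.0 - p_match))

# Mock reference catalogue with no leakage (true delta1 = delta2 = 0):
# matches are drawn exactly at the forecast rate P_i.
P = rng.uniform(0.0, 1.0, size=3000)
I_T = (rng.random(P.size) < P).astype(int)

# Coarse grid maximisation over (delta1, delta2).
grid = np.linspace(0.0, 0.3, 31)
ll = np.array([[log_like(d1, d2, P, I_T) for d2 in grid] for d1 in grid])
i1, i2 = np.unravel_index(ll.argmax(), ll.shape)
d1_hat, d2_hat = grid[i1], grid[i2]
```

On this leakage-free mock, the likelihood peaks near δ1 = δ2 = 0, as expected.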

Using the formalism depicted in this section, we have run several study cases:

  1. eFEDS as reference, eRASS1-cosmo as test (Sect. 6.2);

  2. eRASS1-cosmo as reference, eFEDS as test (Sect. 6.3);

  3. eFEDS as reference, eRASS1-primary as test (Sect. 6.4);

  4. SPTpol-100d as reference, eRASS1-cosmo as test (Sect. 6.5);

  5. eRASS1-cosmo as reference, SPTpol-100d as test (Sect. 6.6);

  6. SPTpol-ECS as reference, eRASS1-cosmo as test (Sect. 6.7);

  7. eRASS1-cosmo as reference, SPTpol-ECS as test (Sect. 6.8).

Table 2 provides a summary of the setup for each test, along with the number of matched entries between the catalogues.

Table 2.

Setup summary of the external validation tests.

6.2. Testing the eRASS1 cosmological cluster sample against eFEDS

The deep eFEDS survey consists of a 140 deg2 area scanned by eROSITA during the CalPV phase, with an exposure about 10 times that of the average eRASS1 survey. Specifically, the average net exposure of eFEDS is about 1200 s, while eRASS1 reaches a median of 80 s in the same field (in both cases accounting for vignetting).

We opt for a population model described by Θmodel = (M500, z) representing the (true) mass within R500c and the (true) redshift of halos. The prior distribution p(Θmodel) is the halo mass function dn/dM500dz. Its exact shape in this specific experiment is not of critical relevance, given the wide sensitivity difference between eFEDS and eRASS1. We assume it follows the fit of Tinker et al. (2008).

The eFEDS cluster sample (Liu et al. 2022) comprises 542 candidate galaxy groups and clusters, of which 531 lie within the eRASS1 catalogue footprint. Each entry is associated with a measured redshift zeFEDS (column z in the catalogue) and with an estimated 0.5–2 keV flux within a 300 kpc aperture (column F_300kpc, expressed in units erg s−1 cm−2). These two elements constitute our vector of measured features Θmeas. We construct a generative model for flux and redshift that depends on α = (LX, z, M500), where LX is the cluster (true) 0.5–2 keV luminosity integrated in a cylinder of radius R500. Our generative model further assumes independence of the flux and redshift measurements, that is:

$$ \begin{aligned} p\left(\boldsymbol{\Theta}_{\rm meas}|\boldsymbol{\alpha}\right) = p\left(z_{\rm eFEDS}|z\right) \times p\left(\ln \mathtt{F\_300kpc}|L_X, z, M_{500}\right). \end{aligned} $$(10)

We take redshift measurements to be unbiased, with Gaussian scatter at fixed z assumed equal to 0.01(1 + z):

$$ \begin{aligned} p\left(z_{\rm eFEDS}|z\right) \sim \mathcal{N}\left(z;\, 0.01(1+z)\right). \end{aligned} $$(11)

We suppose a log-normally distributed flux F_300kpc given the values of α, that is:

$$ \begin{aligned} p\left(\ln \mathtt{F\_300kpc}|L_X, z, M_{500}\right) \sim \mathcal{N}\left(\ln \mu_{\mathtt{F\_300kpc}};\, \sigma_{\ln \mathtt{F\_300kpc}}\right). \end{aligned} $$(12)

In this expression, both μF_300kpc and σlnF_300kpc depend on (LX, z, M500) as follows. We first convert any set of values (LX, z) into a 0.5–2 keV flux within R500 by assuming an APEC plasma model with temperature set by a standard luminosity–temperature relation (Bulbul et al. 2019). Foreground absorption is taken into account using a position-dependent value of the hydrogen column density (HI4PI Collaboration 2016). Aperture correction (from R500 to 300 kpc) requires knowledge of an emissivity profile for the ICM. We take an isothermal, iso-metallicity gas density profile obtained from local X-COP clusters (Ghirardini et al. 2019). The aperture correction thus depends solely on the value of R500 (itself derived from M500 and z). We assume unbiased flux measurements and set μF_300kpc to the value of this aperture-corrected flux. We build a simple measurement-error model in the form of a power law with an extra term ϵi:

$$ \begin{aligned} \sigma_{\ln \mathtt{F\_300kpc}} = 0.14 \left(\frac{\mu_{\mathtt{F\_300kpc}}}{10^{-13}\,\mathrm{erg\,s^{-1}\,cm^{-2}}}\right)^{-0.47} + \epsilon^i. \end{aligned} $$(13)

Numerical coefficients in this expression were obtained by fitting a power-law model to the flux uncertainties reported in Liu et al. (2022). In order to account for cluster-to-cluster variability of the error, the extra term ϵi is specific to each cluster (labelled by i): it is defined as the deviation of the reported error from the value predicted by the empirical power-law model. The eFEDS cluster catalogue contains some clusters for which only an upper limit is reported on the measured 300 kpc flux. For these objects, we still assume a log-normal model. Its central value μF_300kpc is taken to be half of the upper-limit value; the dispersion is such that ϵi is the deviation of the reported upper limit from the fixed value of 2.5 × 10−14 erg s−1 cm−2 (the median of all upper limits in the catalogue). Although imperfect, the generative model depicted here should capture the main trends underlying the flux measurement process. It aims at predicting the reported flux F_300kpc and redshift zeFEDS of an eFEDS-detected cluster for any set of (true) values α = (LX, z, M500).
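As an illustration of the error model of Eq. (13), the sketch below evaluates the power law and recovers the per-cluster offsets ϵi from a few reported uncertainties; the fluxes and errors here are invented for the example, not taken from the eFEDS catalogue:

```python
import numpy as np

def sigma_lnF_model(mu_F):
    """Power-law error model of Eq. (13), without the per-cluster term eps_i."""
    return 0.14 * (mu_F / 1e-13) ** -0.47

# Hypothetical reported fluxes and ln-flux uncertainties for three
# eFEDS-like clusters (illustrative values only).
flux = np.array([2e-14, 1e-13, 5e-13])          # erg / s / cm^2
sigma_reported = np.array([0.40, 0.16, 0.07])   # reported ln-flux uncertainties

# Per-cluster offset eps_i: deviation of the reported error from the power law
eps = sigma_reported - sigma_lnF_model(flux)

# The generative model then uses sigma = power-law prediction + eps_i,
# which reproduces each reported error exactly by construction.
sigma_model = sigma_lnF_model(flux) + eps
```

By construction the model interpolates the catalogue errors exactly while still scaling smoothly with flux for any hypothetical (true) cluster brightness.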

The reference survey selection function p(IRi|Θsel, R) is the eFEDS selection function p(IeFEDS|LX, z, EM0, Texp). It is constructed as described in Liu et al. (2022), with the addition of the morphological parameter EM0 characterising the central emissivity of the clusters (Appendix E). The exposure time Texp varies from one cluster to the other and its value is read from the eFEDS exposure map (Brunner et al. 2022).

The eRASS1 selection p(Icosmo|Θsel, T) is the main model ingredient we wish to validate in this work. We present results for the model used in the eRASS1 cosmological analysis (Ghirardini et al. 2024, see also Fig. 10), involving the count rate CR, the redshift z, as well as position-dependent, cluster-independent selection features ℋi. These features are the Galactic absorption column density, the local background brightness and the exposure time; their values are read from static sky maps at each cluster position. The eRASS1 cosmology sample is restricted to clusters with eRASS1-measured redshift 0.1 < zλ < 0.8. We assume Gaussian-distributed errors for the eRASS1 redshifts with standard deviation σz/(1 + z) = 0.015, see also Eq. (14). The criterion IN_ZVLIM = True, recommended by Kluge et al. (2024), is implemented in our model through a sky mask.

The last ingredient required in the model is the distribution p(Θsel, T,Θsel, R,α|Θmodel). Given the assumptions enumerated in this section, it reduces to p(LX|M500,z). We take the scaling relation of Bulbul et al. (2019) relating the core-included 0.5–2 keV luminosity within R500 to the M500 mass and redshift z. It follows a log-normal distribution with scatter 0.27 in the luminosity direction. In practice, this operation implies computing a probabilistic weight on a three-dimensional grid representing the values of mass, redshift and luminosity.
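The grid-based weighting can be sketched as follows, with a log-normal p(LX|M500, z) of scatter 0.27 evaluated on a luminosity grid; the power-law coefficients in mean_lnL are hypothetical stand-ins for the Bulbul et al. (2019) best-fit values:

```python
import numpy as np

SCATTER_LNL = 0.27  # log-normal scatter in the luminosity direction (as quoted)

def mean_lnL(M500, z):
    """Illustrative L_X-M500-z power law (hypothetical coefficients,
    standing in for the Bulbul et al. 2019 best fit)."""
    return np.log(1e44) + 1.6 * np.log(M500 / 3e14) + 2.0 * np.log(1 + z)

# One-dimensional luminosity grid; in the paper this is one axis of a
# three-dimensional (mass, redshift, luminosity) grid.
lnL_grid = np.linspace(np.log(1e42), np.log(1e46), 200)
M500, z = 2e14, 0.3

# Probabilistic weight of each luminosity cell given (M500, z)
w = np.exp(-0.5 * ((lnL_grid - mean_lnL(M500, z)) / SCATTER_LNL) ** 2)
w /= w.sum()  # normalise so the weights form a discrete probability
```

The normalised weights are then used to marginalise any luminosity-dependent quantity over the scatter of the scaling relation.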

Putting together the model ingredients, we are able to predict, for each cluster in eFEDS associated with a given flux and redshift measurement, its probability of appearing in the eRASS1 cosmological catalogue of clusters. By performing a 2 arcmin match between the eFEDS and eRASS1 catalogues, we find 35 actual clusters in common between both samples (see also Bulbul et al. 2024); 90% of the matches are separated by less than 30″.
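Schematically, the combination of ingredients into a per-cluster probability can be sketched as below: the posterior over latent parameters of a reference-detected cluster is formed from the prior, the measurement likelihood and the reference selection function, and Pi is the expectation of the test selection probability under that posterior. All ingredient functions here are hypothetical smooth placeholders, and the grid is reduced to luminosity only for compactness:

```python
import numpy as np

# Latent-parameter grid: luminosity only; in the paper the grid spans
# (mass, redshift, luminosity).
lnL = np.linspace(np.log(1e42), np.log(1e46), 300)

# Placeholder ingredients (all hypothetical smooth functions):
prior = np.exp(-0.5 * ((lnL - np.log(1e44)) / 1.0) ** 2)       # p(alpha)
meas_like = np.exp(-0.5 * ((lnL - np.log(3e43)) / 0.3) ** 2)   # p(Theta_meas | alpha)
sel_ref = 1 / (1 + np.exp(-(lnL - np.log(5e42))))              # p(I_R | alpha), deep survey
sel_test = 1 / (1 + np.exp(-(lnL - np.log(8e43))))             # p(I_T | alpha), shallow survey

# Posterior over latent parameters for a reference-detected cluster,
# then P_i = expectation of the test selection probability under it.
post = prior * meas_like * sel_ref
post /= post.sum()
P_i = np.sum(post * sel_test)
```

With a faint measured flux (here centred well below the shallow-survey threshold), the resulting Pi is intermediate rather than close to unity, as for the low-flux eFEDS clusters discussed below.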

Figure 15 summarises the outcome of this exercise. As expected, low-flux clusters are associated with low Pi values. Reassuringly, none of the “bright” (more precisely, high-Pi) eFEDS clusters is missed in the eRASS1 cosmology sample. The model predicts 19.5 ± 3.2 eFEDS clusters to be in the eRASS1 cosmology sample, while the actual number is 35 (that is, a 5-σ underestimate). We find that the empirical parameters δ1 and δ2 have posterior distributions compatible with zero. However, the Pi values appear systematically lower than the actual fraction of matches in each bin of Pi values (Fig. 16). Overall these results point to a slight mismatch between the predicted detection rates and the actual number of matches. We discuss this issue later in the paper (Sect. 7.2).
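The first aggregated diagnostic (repeated Bernoulli samples of the Pi values) can be sketched as follows, using synthetic Pi values in place of the real model outputs; the Monte-Carlo mean and scatter agree with the Poisson-binomial moments ΣPi and sqrt(ΣPi(1−Pi)):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical P_i values for a reference sample (stand-ins for the 531
# eFEDS probabilities; the real values come from the full model).
P = rng.beta(0.5, 5.0, size=531)

# Repeated Bernoulli draws give the distribution of the number of matches
draws = (rng.uniform(size=(10000, P.size)) < P).sum(axis=1)
mc_mean, mc_std = draws.mean(), draws.std()

# Analytic cross-check: Poisson-binomial mean and standard deviation
mu = P.sum()
sd = np.sqrt((P * (1.0 - P)).sum())
print(f"predicted matches: {mc_mean:.1f} +/- {mc_std:.1f}")
```

The quoted prediction of the form "19.5 ± 3.2 clusters" is exactly a summary of such a distribution, to be compared with the observed number of matches.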

thumbnail Fig. 15.

Testing the models with eRASS1-cosmo as a test catalogue, and eFEDS as a reference catalogue. Each dot corresponds to one of the 531 eFEDS clusters in the plane of measured 300 kpc flux (F_300kpc in units erg s−1 cm−2) and measured redshift (zeFEDS). eFEDS clusters with only flux upper limits are placed at the bottom with star symbols. Black circles indicate clusters also found in the eRASS1 cosmology sample. Colour reflects the model probability Pi (Eq. (8)) computed for each eFEDS cluster.

thumbnail Fig. 16.

Testing the models with eRASS1-cosmo as a test catalogue, and eFEDS as a reference catalogue. The 531 eFEDS entries are grouped in bins of the model output probabilities Pi (horizontal axis). In each bin, the actual fraction of matches in the eRASS1 catalogue is reported on the vertical axis. Error bars represent the estimated 68% confidence range (Appendix C). The dotted line indicates the one-to-one relation that a perfectly reliable model would follow. In general, the model predicts slightly lower probabilities than actually observed.

6.3. Testing the eFEDS cluster sample against eRASS1 cosmology sample

By exchanging the roles of eFEDS and eRASS1 in the test presented above, we can use the eRASS1 cosmology catalogue as a reference sample and test the eFEDS selection model. The interest of this operation is limited to sanity checks, because eRASS1 is 15 times shallower than eFEDS in their overlapping area. We consider only the 85 eRASS1 clusters from the cosmology sample located in the fiducial eFEDS area, bounded in Right Ascension by 8 h and 10 h and in Declination by −5 and +8 deg.

To describe each of the eRASS1 detections, we constructed a model for the flux and redshift measurements of each entry. We assume independence of the estimated redshift zλ (column Z_LAMBDA in the eRASS1 cosmology catalogue) and the estimated R500 flux in the band 0.5–2 keV (F500_0520, units erg s−1 cm−2) and we form the model:

$$ \begin{aligned} p\left(z_{\lambda}|z\right) \sim \mathcal{N}\left(z;\, 0.015(1+z)\right). \end{aligned} $$(14)

This model expresses normally distributed redshift uncertainties with the error increasing with redshift. For flux measurements, we assume a log-normal distribution:

$$ \begin{aligned} p\left(\ln \mathtt{F500\_0520}|L_X, z\right) \sim \mathcal{N}\left(\ln \mu_{\mathtt{F500\_0520}};\, \sigma_{\ln \mathtt{F500\_0520}}\right). \end{aligned} $$(15)

In this equation, μF500_0520 and σlnF500_0520 depend on LX, the true luminosity within R500, and on the true redshift z. Assuming unbiased flux measurements, μF500_0520 is obtained by converting the cluster rest-frame 0.5–2 keV luminosity into the observer-frame 0.5–2 keV flux, assuming an isothermal ICM and a position-dependent hydrogen column density taken from the HI4PI survey (HI4PI Collaboration 2016). As for the uncertainty model, we proceed by fitting a simple relation linking the catalogue fluxes and errors reported in Bulbul et al. (2024). We take:

$$ \begin{aligned} \sigma_{\ln \mathtt{F500\_0520}} = 0.48 \left(\frac{\mu_{\mathtt{F500\_0520}}}{10^{-13}\,\mathrm{erg\,s^{-1}\,cm^{-2}}}\right)^{-0.51} + \epsilon^i. \end{aligned} $$(16)

The quantity ϵi is specific to each cluster in the eRASS1 sample and equals the deviation of the catalogue uncertainty relative to the power-law model.

Figure 17 illustrates the outcome of this comparison exercise. The model predicts 35.7 ± 1.3 eRASS1 clusters to be found in eFEDS. The actual number of matches is 36. A number of eRASS1 clusters fall in the non-exposed eFEDS area, hence their value Pi = 0. Most of the clusters covered by eFEDS have very high probabilities of being detected (Pi ≃ 1), as a consequence of its much deeper exposure. Reassuringly, the eFEDS survey does not miss any eRASS1 cluster, highlighting in turn the low level of spurious contamination in the eRASS1 cosmology sample. In particular, it is not considered problematic that the two clusters 1eRASS J094023.3+022824 and 1eRASS J084147.8−031154 have no corresponding match in eFEDS despite their respective predicted values Pi = 0.79 and 0.55. At their locations on the sky, the eFEDS survey is very shallow (with 140 s and 10 s effective exposure time, respectively) and the eFEDS selection model is not well calibrated in this regime.

thumbnail Fig. 17.

Similar to Fig. 15, but now taking eRASS1-cosmo as the reference sample (85 entries) and eFEDS as the test sample. The vertical axis represents F500_0520, the 0.5–2 keV flux within R500 reported in the eRASS1 catalogue (units 10−14 erg s−1 cm−2). The horizontal axis is the catalogue redshift zλ. Clusters falling in zones of zero eFEDS exposure have vanishing probability (Pi = 0). Other eRASS1 clusters are very likely to be detected in eFEDS (Pi ≃ 1); and indeed they are, as shown by the black circles. The isolated point at the bottom of the figure only has a flux upper limit of 1.5 × 10−13 erg s−1 cm−2.

The model correctly explains the outcome of matching eFEDS to the eRASS1 cosmology catalogue. There is no evidence for a bias in the value of the probabilities Pi; most of them are close to zero or one, in agreement with the cluster being absent or present in the eFEDS catalogue (respectively). The values of δ1 and δ2 are themselves compatible with zero, indicating again consistency between the model and the present catalogues being tested.

6.4. Testing the eRASS1-main sample against eFEDS

We now turn to a test using the extended version of the eRASS1 catalogue, comprising all X-ray clusters with ℒext > 3 (Bulbul et al. 2024). In contrast to the cosmology sample, this catalogue does not apply any redshift subselection. The model we construct is very similar to that presented in Sect. 6.2. In particular, the distributions of eFEDS flux and redshift measurements are identical to those depicted in Eqs. (10)–(13). The main modification consists in using a model selection function p(ITi = Imaini|Θsel, T). We take as selection variables the count rate CR, the true redshift z and the sky position-dependent parameters ℋ.

Our model predicts 30.5 ± 4.2 matches, while the actual number is 61 (hence a larger than 7-σ difference). We show one diagnostic plot in Fig. 18, comparing the Pi values output by the model with the actual fraction of matches in the eRASS1 catalogue. This validation experiment using the eRASS1-main sample reveals a larger discrepancy than that obtained in the experiment described in Sect. 6.2 using the eRASS1 cosmology catalogue. The model predicts fewer clusters to be found in eRASS1 than actually observed, due to underestimated Pi values. This is confirmed by the posterior distribution of (δ1, δ2) shown in Fig. 19, which lies significantly away from the zero-point. Seven objects qualify as “surprises” (Pi < 0.025 and matched). According to our model, they are too faint to appear in the eRASS1 sample. For two cases (1eRASS J093024.6−020635 and 1eRASS J084558.5+031340), the deeper eFEDS data reveal point sources near the cluster emission; these were correctly excised to provide an eFEDS flux measurement, but in the eRASS1 data they are indistinguishable from the cluster. We do not find any cluster missed by eRASS1 according to this model. Overall, this result indicates that our selection function for the primary cluster sample is less well understood than that for the cosmology sample. Alternatively, our generative model for the eFEDS flux F_300kpc may fail to exactly reproduce the measurement process described in the eFEDS cluster catalogue paper (Liu et al. 2022). We return to this issue in Sect. 7.

thumbnail Fig. 18.

Similar to Fig. 16, but using the primary sample of clusters eRASS1-main as a test sample, instead of the eRASS1 cosmology sample.

thumbnail Fig. 19.

Posterior distribution for the pair of parameters (δ1, δ2) introduced in Eq. (9), for the experiment using eRASS1-main as a test and eFEDS as a reference sample. The histograms represent the marginalised posterior distributions, with mean and 68% range overprinted. The contours represent the equivalent 0.5-, 1-, 1.5- and 2-σ levels of the joint posterior. Clearly the parameter δ2 departs from zero, indicating that a fraction of the systems predicted not to be detected leaks into the actual eRASS1 sample.

6.5. Testing the eRASS1 cosmological cluster sample against SPTpol-100d

Huang et al. (2020) present a catalogue of 89 galaxy clusters discovered via the Sunyaev–Zeldovich (SZ) effect in the SPTpol-100d field, located at RA = 23h30m and Dec = −55 deg. In the area bounded by 23h < RA < 0h and −60 deg < Dec < −55 deg, there are 65 SPTpol clusters at z > 0.25. This lower redshift limit is chosen to account for the growing incompleteness of the SPT catalogue in the low-z regime. In all that follows, we force the SPTpol-100d selection function to take the value zero at redshifts below z = 0.25. At higher redshifts, the selection function model takes an analytic form involving the SZ signal-to-noise parameter ξ and the unbiased significance ζSPTpol (see Appendix F). The measurement vector Θmeas is taken from the SPTpol-100d catalogue, namely the xi and redshift columns, standing respectively for ξ and the measured redshift zSPT. We again assume approximate independence of both measurements. We assume unbiased redshift measurements with Gaussian uncertainties 0.02(1 + z). We use again Eq. (F.2) to generate the distribution for ξ, now seen as a measurement feature.

Using the selection model for the eRASS1 cosmology sample (Sect. 6.2) requires incorporating the 0.5–2 keV luminosity LX as a latent variable. Consequently, we need to construct a joint model for the distribution of luminosities and ζSPTpol. We combine the LX − M500 − z scaling relation mentioned previously (Bulbul et al. 2019) and the ζ − M500 − z relation of Huang et al. (2020) into a bivariate log-normal distribution with covariance matrix:

$$ \begin{aligned} \mathrm{Cov}\left(\ln L_X, \ln \zeta_{\rm SPTpol}\right) = \begin{pmatrix} \sigma_{\ln L_X}^2 & \rho\,\sigma_{\ln L_X}\sigma_{\ln \zeta} \\ \rho\,\sigma_{\ln L_X}\sigma_{\ln \zeta} & \sigma_{\ln \zeta}^2 \end{pmatrix}. \end{aligned} $$

Here, σlnLX and σlnζ are the log-normal scatters of both distributions at fixed mass M500 and redshift z (with values 0.27 and 0.18, respectively). The value of ρ ranges between −1 and 1 and reflects the correlated scatter between the SZ signal-to-noise and the X-ray luminosity at fixed mass and redshift. We take a fiducial value ρ = 0.2 and investigate the impact of varying this parameter in Sect. 7.
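The bivariate scatter model can be sketched by drawing correlated deviations from this covariance matrix, using the quoted values σlnLX = 0.27, σlnζ = 0.18 and ρ = 0.2:

```python
import numpy as np

rng = np.random.default_rng(2)

# Scatters and correlation coefficient quoted in the text
sigma_lnL, sigma_lnzeta, rho = 0.27, 0.18, 0.2

cov = np.array([
    [sigma_lnL**2, rho * sigma_lnL * sigma_lnzeta],
    [rho * sigma_lnL * sigma_lnzeta, sigma_lnzeta**2],
])

# Correlated deviations of (ln L_X, ln zeta) around their mean
# scaling-relation predictions (the means themselves are not needed here).
dev = rng.multivariate_normal([0.0, 0.0], cov, size=200000)
emp_rho = np.corrcoef(dev.T)[0, 1]
print(f"empirical correlation: {emp_rho:.2f}")
```

A positive ρ makes clusters that are X-ray bright for their mass slightly more likely to also be SZ bright, which propagates into the joint detection probabilities.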

The model predicts 11.4 ± 2.4 matches, while the actual number of matches is 8. Figure 20 shows the distribution of the Pi values output by the model in the plane of the measured SPT signal-to-noise and redshift. A small number of SPTpol-100d clusters have Pi values high enough to be likely detectable in eRASS1. In fact, the matched instances are in general those with high Pi values.

thumbnail Fig. 20.

Similar to Fig. 15, but using the cosmology sample of eRASS1 clusters as a test sample and the SPTpol-100d catalogue as a reference sample (65 objects at z > 0.25, coloured dots). The vertical axis represents the column xi in the SPTpol catalogue, standing for the measured cluster signal-to-noise (unitless). The horizontal axis represents the measured redshift in the SPT catalogue. The model predicts only a handful of eRASS1 detections among these objects, consistent with the observed number of 8 matches.

Figure 21 demonstrates the fair agreement between the 65 Pi values output by the model and the actual fraction of matches (in bins of Pi values). From these diagnostics we conclude that there is no evidence for inconsistencies either in the catalogues or in the model. In particular, the selection function models account correctly for the observed matches between SPTpol-100d and eRASS1-cosmo.

thumbnail Fig. 21.

Similar to Fig. 16, but using the cosmology sample of eRASS1 clusters as a test sample, and the SPTpol-100d cluster sample as a reference. Within the large uncertainties, there is good agreement between the predicted eRASS1 detection probability and the actual fraction of matched systems.

6.6. Testing the SPTpol-100d cluster sample against eRASS1-cosmo

The previous test is reversed by exchanging the roles of the SPTpol-100d catalogue and the eRASS1 cosmology sample. We thus test the SPTpol catalogue, taking eRASS1 as the reference. There are 19 eRASS1 clusters in the common footprint between both surveys. Construction of the model involves both selection functions and the same joint probability distribution p(LX,ζSPTpol|M500,z) as described in Sect. 6.5. We also use a generative model similar to that of Sect. 6.3 for the flux F500_0520 and redshift zλ listed in the eRASS1 catalogue. Combining the 19 Pi values returned by our model, we predict 7.0 ± 1.3 matches, while the actual number is 8, indicating consistency between the results and the predictions. We have also checked the adequacy between the Pi values predicted by the model and the actual fraction of matches in each bin. Despite its statistical limitations, this experiment demonstrates good behaviour of the model in the high-redshift, intermediate-mass range.

Two cases deserve further examination. The eRASS1 cluster 1eRASS J230029.2−510022 is measured at zλ = 0.37 with F500_0520 = 3.4 × 10−13 erg s−1 cm−2 and should be massive enough to appear in the SPTpol survey; indeed, our model predicts Pi = 0.90. It is actually missed by SPTpol, likely due to its location close to the very edge of the surveyed field (at 23h Right Ascension). Similarly, the eRASS1 cluster 1eRASS J231306.7−550417, measured at zλ = 0.31 with F500_0520 = 3.4 × 10−13 erg s−1 cm−2, is not detected by SPTpol despite a model-predicted Pi = 0.72. This is due to a sky mask covering a point source brighter than 6 mJy located at a position coinciding with the galaxy cluster (Bleem, L., priv. comm.).

6.7. Testing the eRASS1 cosmological cluster sample against SPTpol-ECS

The model described for the SPTpol-100d catalogue is straightforwardly adapted to the SPTpol-ECS catalogue presented in Bleem et al. (2020). The ECS sample is collected over a 2770 deg2 area in two regions of the southern hemisphere. A total of 163 clusters with significance ξ > 5 and located at redshift z > 0.25 fall within the footprint of the eRASS1 cosmology sample. The ECS survey is shallower than SPTpol-100d, hence we expect a majority of these clusters to be found in the eRASS1 sample, up to z = 0.8.

Our model follows exactly the same steps as in Sect. 6.5, except that the unbiased significance parameter governing SZ detections is denoted ζECS. Its logarithm scales as a power law with mass, as described in Eq. (4) of Bleem et al. (2020). In particular, we multiply the normalisation ASZ of this scaling relation by a factor γfield specific to each sub-field of the SPT-ECS survey. An additional multiplicative factor γECS = 1.124 rescales all values of ASZ, reflecting calibration updates (Table 1 and Sect. 5 in Bleem et al. 2020). We keep a fixed value ρ = 0.2 for the correlated scatter between lnLX and lnζECS. There are 103 matching instances between both catalogues, based on a 2′ positional match. Our model predicts 103 ± 4 matches, in excellent agreement with the observed number. Figure 22 shows the distribution of the 163 entries in the plane of (measured) significance and redshift. Most of the SPT-ECS clusters at z < 0.8 are detected by eRASS1 (black circles). There is no obvious missing cluster among the 163 SPT detections. However, we identified one “surprising” detection, SPT-CLJ0333-3707 (black “X” in the figure, Pi ≃ 0). It is listed in the ECS catalogue with significance 5.1 and photometric redshift zSPT = 1.05 ± 0.04. The matching instance in eRASS1 is 1eRASS J033323.9−370744, listed at zλ = 0.52 ± 0.01. This cluster is one of the 3% ambiguous cases discussed in Kluge et al. (2024), where two unrelated clusters overlap along the same line of sight.

thumbnail Fig. 22.

Similar to Fig. 20, but using 163 clusters in SPTpol-ECS catalogue as a reference sample and the eRASS1 cosmology sample as a test.

Figure 23 demonstrates the reliability of the Pi values predicted by the model. We conclude that there is satisfactory agreement between the model and the observed matches and non-matches in the two catalogues.

thumbnail Fig. 23.

Similar to Fig. 21, but obtained by using the SPT-ECS sample as a reference (163 entries) and using the cosmology sample of eRASS1 clusters as a test.

6.8. Testing the SPTpol-ECS sample against eRASS1 cosmology sample

We conclude this series of tests by reversing the experiment of Sect. 6.7. There are 958 eRASS1 clusters in the cosmology catalogue falling within the sky footprint of the SPTpol-ECS survey. The model takes F500_0520 and zλ as measurement features and assumes the same generative distributions as in Sect. 6.3.

Figure 24 shows the outcome of this calculation. Our model predicts 65.4 ± 5.8 matches, significantly fewer than the 101 matches observed in the common footprint. In particular, we find three unexpected SPT-ECS detections (Pi < 0.025). Secondary components along the line of sight of these three clusters may be responsible for the discrepant mass estimates, either by boosting the SZ significance or by decreasing the measured X-ray flux. These cases are discussed in Appendix G.

thumbnail Fig. 24.

Similar to Fig. 17, but using 958 eRASS1 clusters from the cosmology sample as a reference, and the SPTpol-ECS catalogue as a test sample. Star symbols at the bottom of the figure indicate eRASS1 objects with no flux measurement, possibly with an upper flux limit.

Figure 25 assesses the reliability of the Pi values returned by the model. The predicted probabilities are generally lower than the actual fraction of matches. Several causes may contribute to this shift: our fiducial LX − M500 − z relation (here taken from Bulbul et al. 2019) predicting too high X-ray luminosities at fixed halo mass; an underestimate of the eRASS1 flux measurements; or an overestimate of some SPT significance values ξ due, for instance, to Malmquist bias. As discussed in Sect. 7.3, progress in this direction requires the implementation of a self-consistent ICM model propagated to the X-ray and SZ selection functions. We therefore defer detailed discussion of this apparent offset to future work, in particular through the use of deeper SPT-SZ cluster samples (e.g. SPT-3G).

thumbnail Fig. 25.

Similar to Fig. 16, but obtained by using the eRASS1 cosmology sample (958 entries) as a reference and using SPTpol-ECS as a test sample.

7. Discussion

7.1. Linking our models with ℒext and ℒdet

In processing real and simulated data, eSASS assigns several measured quantities to each detected source. Among the most relevant for understanding selection effects are the detection (ℒdet) and extent (ℒext) likelihoods. Applying thresholds on these quantities defines a new sample, characterised by a balance between contamination and completeness (e.g. Seppi et al. 2022). In the specific case of the eRASS1 cluster catalogue, a source is detected if it shows ℒdet greater than 5. The threshold on ℒext for selecting this source as a cluster is either 3 or 6, the latter being chosen for the cosmology sample.

Such parameters are absent from all models discussed in this paper, both as inputs and as outputs of their predictions. This may seem at odds with studies (e.g. Rix et al. 2021) suggesting that selection functions be modelled based only on cuts in catalogue parameters (here collectively designated by the generic name ξX). In fact, let us acknowledge that:

$$ \begin{aligned} p\left(I|\boldsymbol{\Theta}_{\rm model}\right) = p\left(\xi_X > \xi_X^{\rm min}|\boldsymbol{\Theta}_{\rm model}\right) = \int_{\xi_X^{\rm min}}^{+\infty} \mathrm{d}\xi_X\, p\left(\xi_X|\boldsymbol{\Theta}_{\rm model}\right). \end{aligned} $$

Building a selection model then translates into constructing a distribution of ξX given a set of model parameters. This may involve auxiliary variables (such as observed flux, see e.g. Grandis et al. 2020) and a combination of empirical scaling laws between ξX and those variables, involving some parameterised form of scatter. An appealing method consists in letting all or a subset of these parameters free in the scientific analysis, together with the cosmological parameters (e.g. Vanderlinde et al. 2010). These scaling relation parameters would then be “learnt” directly from the data themselves; they would self-adjust to catalogue parameters and contribute to increasing the global cosmology likelihood, via calibration through weak gravitational lensing or other mass proxies. In this approach, the exact shape of the distributions (e.g. log-normal or Gaussian scatter) is a strong prior assumption. In the absence of other information (e.g. external or simulated catalogues), we extrapolate our knowledge of brighter systems to fainter ones, potentially leading to interpretation errors close to the detection limit (Gallagher & Smeenk 2023). We may even imagine extreme situations in which these assumptions are not verifiable, for example if the unimodal scatter in scaling relations does not account for a separate class of faint, flat surface-brightness systems that escapes detection in our catalogue.
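The threshold integral can be sketched for a toy case where ξX is log-normally distributed with a power-law median scaling with mass; every coefficient below is hypothetical, chosen only to illustrate how a selection probability rises from 0 to 1 across the detection limit:

```python
import numpy as np
from math import erf, sqrt

def p_detect(mass, xi_min=5.0, A=6.0, B=1.5, scatter=0.2, M_pivot=3e14):
    """P(I | M) = P(xi_X > xi_min | M) for a log-normal xi_X whose median
    scales as a power law with mass (all coefficients hypothetical)."""
    ln_med = np.log(A) + B * np.log(mass / M_pivot)
    # Survival function of the log-normal evaluated at the threshold xi_min
    zscore = (np.log(xi_min) - ln_med) / scatter
    return 0.5 * (1.0 - erf(zscore / sqrt(2.0)))

# Detection probability rises monotonically with mass across the limit
masses = [1e14, 3e14, 1e15]
probs = [p_detect(m) for m in masses]
print([round(p, 3) for p in probs])
```

In this approach the scatter and power-law coefficients play the role of the free scaling-relation parameters that could be fitted jointly with cosmology.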

A safer alternative may consist in constructing a model for ξX directly from ℒext and ℒdet as recorded in simulated datasets. There are also difficulties associated with this approach. Most notably, only detected (resp. selected) objects have an associated value of ℒdet (resp. ℒext); a replacement value needs to be assigned to those simulated systems missing a detection. One may also attempt to run the source detection algorithm with lower thresholds (detlikemin and extlikemin in eSASS) in order to assign a value to fainter objects; however, this has a number of non-trivial algorithmic side-effects involving masking, source splitting, background estimates, etc. As a consequence, there is no guarantee that the samples selected with ℒdet > 5 in both runs (the one with detlikemin = 5 and the other with detlikemin < 5) are truly identical.

Interestingly, the logistic regression formalism presented in Sect. 4 uses a real-valued function f, whose expression is given in Eq. (5). In our analysis, f is a latent variable without predefined physical meaning. The selection probability monotonically increases with f through the sigmoid function of Eq. (4). As a useful cross-check, we show the correlation between f and the values of ℒdet and ℒext for the logistic regression involving 35 features (Fig. 26). Recalling that only binary flags were exposed to the model in the training phase, it is reassuring that the probability output is driven by a quantity behaving similarly to the detection and extent likelihoods.
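A minimal sketch of this latent-variable construction, with random stand-ins for the trained weights (the actual coefficients come from the logistic regression of Sect. 4):

```python
import numpy as np

def latent_f(eta, w0, w):
    """Latent variable f(eta) = w0 + sum_j w_j eta_j, as in Eq. (5)."""
    return w0 + eta @ w

def sigmoid(f):
    """Selection probability of Eq. (4): monotonically increasing in f."""
    return 1.0 / (1.0 + np.exp(-f))

# Toy example: 35 binary surface-brightness flags per cluster.
# The weights are random stand-ins for the trained coefficients.
rng = np.random.default_rng(3)
w0, w = -2.0, rng.normal(0, 0.5, size=35)
eta = rng.integers(0, 2, size=(4, 35)).astype(float)

f = latent_f(eta, w0, w)
p = sigmoid(f)
order = np.argsort(f)
```

Because the sigmoid is monotone, ranking clusters by f is equivalent to ranking them by selection probability, which is why a correlation between f and ℒdet or ℒext is meaningful.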

Fig. 26.

Relation between the latent variable f(η) = w0 + ∑jwjηj, calculated for each simulated cluster in the test sample after performing logistic regression with 35 surface brightness features, and the values of ℒdet and ℒext as provided by eSASS for those matched to a detection. Each dot stands for a simulated cluster; density contours ease visualisation in crowded regions. Undetected clusters have no associated ℒdet value (red histogram in the lower panel). Clusters detected but not categorised as extended have zero value for ℒext (orange histogram). The apparent correlation between f and both measurements indicates that the model has “discovered” the importance of ℒdet and ℒext in the selection process.
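The mechanics described above can be sketched with scikit-learn, where `decision_function` exposes the latent variable f and `predict_proba` applies the sigmoid to it. The toy features below are random stand-ins for the 35 binarised surface-brightness flags, not the actual training data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy stand-ins for the 35 binarised surface-brightness flags eta_j;
# the detection outcome is driven by a hidden linear score (illustrative).
n, m = 5000, 35
eta = rng.integers(0, 2, size=(n, m)).astype(float)
true_w = rng.normal(0.0, 0.5, size=m)
f_true = eta @ true_w - true_w.sum() / 2
detected = rng.random(n) < 1 / (1 + np.exp(-f_true))

clf = LogisticRegression(max_iter=2000).fit(eta, detected)

# decision_function returns the latent f(eta) = w0 + sum_j w_j eta_j;
# predict_proba applies the sigmoid to it, as in Eqs. (4) and (5).
f = clf.decision_function(eta)
p = clf.predict_proba(eta)[:, 1]
```

Plotting `f` against an external quality measure (here, ℒdet or ℒext) then reproduces the kind of diagnostic shown in Fig. 26.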

We find a similar result with the mean of the latent variable f governing the predictions of the Gaussian process classifier (see Appendix D). Figure 27 in particular shows the a posteriori correlation between ⟨f⟩ and the detection and extent likelihoods obtained in a test sample of simulated clusters.

Fig. 27.

Similar to Fig. 26, but replacing the x-axis by the mean of the latent variable implicit in the Gaussian process classifier p(Icosmo|CR,z,ℋ).

7.2. Potential underestimate of the eRASS1 selection model

Our external validation results in Sect. 6 suggest an inconsistency between the predicted number of eRASS1-detected objects among the eFEDS sample and the actual number of matching entries. The difference is about 5-σ when considering the eRASS1 cosmology sample and above 7-σ with the primary eRASS1 cluster sample. Our validation against the much brighter SPT-selected samples does not reveal any such inconsistency. This suggests that our model somehow fails at describing the fainter population of eFEDS clusters.

7.2.1. Uncertainties in the generative model

We may first question our simple generative model for eFEDS-measured fluxes. Our model for F_300kpc (Eqs. (12) and (13)) assumes unbiased flux measurements regardless of the brightness of the source. If the reported eFEDS fluxes (Liu et al. 2022) of faint systems were known to be biased low for some reason, our models would predict a higher number of matched entries in the eRASS1 sample and we would conclude that there is no discrepancy. On the one hand, eFEDS luminosity measurements appear to agree with XMM-Newton measurements (e.g. Turner et al. 2022), as also shown by the luminosity functions in Liu et al. (2022). On the other hand, we have found a 15% offset between eRASS-measured fluxes and Chandra measurements, the latter showing higher values, likely attributable to calibration issues (Bulbul et al. 2024).

We obtained very similar predictions, and thus similarly discrepant results, when modelling the measured R500 X-ray luminosity of eFEDS clusters (column Lx500 in the catalogue of Bahar et al. 2022) in lieu of the measured flux. This operation somewhat simplifies our generative model, and it suppresses any potential pitfall in deriving aperture corrections and spectrally dependent K-corrections. There we assumed unbiased, normally distributed luminosity measurements, with an empirical uncertainty model following:

$$ \begin{aligned} \sigma_{\mathtt{Lx500}} = 0.2 \times 10^{43}\,\mathrm{erg\,s}^{-1} \left(\frac{L_{X}}{10^{43}\,\mathrm{erg\,s}^{-1}} \right)^{0.9} + \epsilon^i. \end{aligned} $$(17)

In this expression ϵi is a constant for each eFEDS cluster indexed by i. We obtained ϵi by subtracting the value predicted by the power-law term of the above equation (i.e. without ϵi) from the uncertainty reported in the catalogue of Bahar et al. (2022).
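The derivation of the per-cluster offsets ϵi can be sketched as follows. The catalogue values below are hypothetical placeholders, not taken from Bahar et al. (2022):

```python
import numpy as np

def epsilon_offsets(lx, sigma_cat):
    """Offsets eps_i between the catalogued luminosity uncertainties and
    the power-law term of Eq. (17), evaluated per cluster (erg/s)."""
    sigma_model = 0.2e43 * (lx / 1e43) ** 0.9
    return sigma_cat - sigma_model

# Hypothetical placeholder values, in erg/s:
lx = np.array([0.5e43, 2.0e43, 8.0e43])
sigma_cat = np.array([0.15e43, 0.45e43, 1.4e43])
eps = epsilon_offsets(lx, sigma_cat)
```

Adding `eps` back to the power-law prediction reproduces the reported uncertainty exactly, which is the intended behaviour of the ϵi term.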

7.2.2. Uncertainties in trained models

Our selection models inherently embed some uncertainty, due to the limited size of the training sample and to the complexity of optimising many hundreds of Gaussian process hyperparameters at once (Appendix D). We took advantage of our multiple Poisson realisations of the eRASS1 twin simulations to produce seven selection models, each relying on a distinct training set as large as eight times the actual eRASS1 sky. We folded these selection functions through our eFEDS/eRASS1 comparison algorithm. In all cases we recovered the under-prediction of eRASS1 clusters, the discrepancy varying between 5 and 7-σ among the seven models. The variability and uncertainty attached to the training phase are thus too small to explain the observed discrepancy.

7.2.3. Uncertainties in the training sample

We now discuss whether our eRASS1 training set may be biased in some way, which would offset the models from the truth. Such offsets would not be visible in our internal validation tests (Sect. 5.4) since the test sample is constructed in exactly the same way as the training sample.

Our baseline procedure associating simulated clusters to detected sources leads to fewer selected halos than the positional matching procedure (see Sect. 3.3). This is especially visible at cluster fluxes around 10−14 erg s−1 cm−2, where the number of matched sources can differ by up to a factor of two (Fig. 1). We have trained a selection function model with a new set of detection flags constructed with the alternative matching technique. This model indeed predicts slightly higher detection probabilities at low flux, in accordance with the update in the training set. However, the 5–7-σ discrepancy remains in the comparison with the eRASS1 clusters detected in eFEDS. This is not surprising, since the eRASS1 flux limit in the eFEDS field is around 10−13 erg s−1 cm−2, well above the regime where the two matching methods depart from each other (Fig. 1).

After completion of this work, we found minor differences between the eSASS software configuration parameters used in processing the simulations (version eSASSusers_201009, Seppi et al. 2022) and those used in processing the actual eRASS1 data (version 010, Merloni et al. 2024). These changes mainly impact the background map creation and the splitting of sources into multiple entries. In particular, in real data the ermldet algorithm attempts to split detections into two sources only if the number of counts is greater than 25, while in our simulations no specific threshold was set on this value. We have reprocessed a subset of the twin simulation with the updated configuration. We find tentatively higher detection probabilities than in the original mock catalogue, which may explain part of the deficit in the predicted number of eRASS1 sources. However, this finding relies on only a subset of the entire twin simulations, which is insufficient to settle the issue firmly.

Our final test consists of using a selection model that explicitly involves the cluster size R500 among its features (Fig. E.3). By using a selection model in the form p(Imain|fX,R500,ℋ), we no longer rely on the distribution of sizes imprinted in the simulation. Rather, we assume our LX − M500 relation is correct and we distribute our modelled flux within the radius associated with M500. Folding this new model into our eFEDS/eRASS1 analysis, we find better agreement between predictions and the actual number of matches. Using this alternative selection model instead of our baseline model, the discrepancy reduces from 7- to 5-σ for the eRASS1-main catalogue and from 5- to 3-σ for the cosmology catalogue. This finding corroborates Fig. 6 in Comparat et al. (2020), showing that the average R500 radius of our simulated clusters at fixed luminosity is larger than in several published samples4.

7.2.4. Sensitivity of the external validation test

In this section we have attempted to address the apparent mismatch between the number of eRASS1 systems predicted in the eFEDS field and the actual number of matches. We could not identify one single cause for the discrepancy, rather we have found several potential difficulties. Overall, our experiments reveal the high sensitivity of our external validation tests to small variations in the model assumptions. They also highlight the amount of effort needed to properly understand selection effects at low fluxes and close to the detection threshold.

7.3. X-ray and SZ correlated detection

In Sect. 6 we have introduced a joint distribution for the X-ray luminosity and SZ significance p(lnLX,lnζ|M500,z) as a bivariate normal distribution with correlation coefficient ρ = 0.2. We now investigate its impact on the results. We varied ρ between −0.8 and 0.8, while repeating the experiment with the SPT-ECS sample as reference R and the eRASS1-cosmo sample as test T (Sect. 6.7). We form the quantity:

$$ \begin{aligned} B = -2 \ln \mathcal{L} = -2 \sum_{i \in R} \left[ I_{T}^i \ln P_i + (1-I_{T}^i) \ln (1-P_i) \right]. \end{aligned} $$(18)

Its minimum Bmin is obtained by varying ρ, all other model parameters being held fixed at their (supposedly) true values. We removed the cluster SPT-CLJ0333-3707, whose redshift measurements disagree with the matched entry 1eRASS J033323.9−370744 (Sect. 6.7). There are 162 entries entering the sum. Following Cash (1979), B − Bmin approximately follows a χ2 distribution with one degree of freedom. Figure 28 shows the curve B − Bmin for various values of ρ, from which we derive an approximate 68% confidence interval −0.4 < ρ < 0.8. Consistent with previous studies (e.g. de Haan et al. 2016; Dietrich et al. 2019), our data do not favour strongly correlated or anti-correlated scatter between the X-ray and SZ signals.
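The interval construction around Eq. (18) can be sketched as follows. Here `probs_of_rho` is a hypothetical stand-in for the full population model of Sect. 6.7, which returns the Pi at a given ρ:

```python
import numpy as np

def bernoulli_B(indicator, p):
    """Combined Bernoulli statistic B = -2 ln L of Eq. (18), for match
    indicators I_T^i and model probabilities P_i."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -2.0 * np.sum(indicator * np.log(p)
                         + (1 - indicator) * np.log(1 - p))

def confidence_interval(rhos, indicator, probs_of_rho, delta=1.0):
    """Scan rho, evaluate B, and keep the values with B - Bmin < delta;
    delta = 1 gives ~68% coverage for a chi^2 with one degree of freedom."""
    B = np.array([bernoulli_B(indicator, probs_of_rho(r)) for r in rhos])
    keep = B - B.min() < delta
    return rhos[keep].min(), rhos[keep].max()
```

Because B is evaluated on a grid, the returned bounds are accurate only to the grid spacing; a finer scan around the minimum refines them.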

Fig. 28.

Value of the combined Bernoulli negative log-likelihood (Eq. (18)) in the comparison of the SPT-ECS sample (reference) and the eRASS1 cosmology sample (test) for various values of the correlation coefficient ρ. This coefficient expresses the level of covariance between lnLX and lnζ at fixed halo mass and redshift. Assuming all other model parameters are correct, the constraint obtained from the catalogue comparison exercise depicted in this paper is relatively loose: ρ ≃ 0.2 ± 0.6 (68% confidence level), as deduced from $(B-B_{\rm min}) \sim \chi^2_1$.

The fiducial value ρ = 0.2 chosen in Sect. 6 is in fact the best fit, with Bmin = 99.9. We have checked the goodness of fit by performing multiple random Bernoulli draws from the Pi values returned by the best-fit model; only 24% of the simulations provide a value of B smaller than Bmin. Given the many other assumptions made (e.g. fixed scaling relation parameters), we cannot firmly exclude specific values of ρ from our dataset. A comparison involving larger samples may put stronger constraints on ρ through this method.
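The goodness-of-fit check by random Bernoulli draws amounts to a parametric bootstrap, which can be sketched as follows (the Pi values used in the test are placeholders):

```python
import numpy as np

rng = np.random.default_rng(7)

def goodness_of_fit(indicator, p, n_draws=10_000):
    """Fraction of random Bernoulli realisations of the model probabilities
    P_i whose B value (Eq. 18) falls below the observed one."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    def B(I):
        return -2.0 * np.sum(I * np.log(p) + (1 - I) * np.log(1 - p))
    B_obs = B(indicator)
    draws = rng.random((n_draws, p.size)) < p   # one Bernoulli draw per P_i
    B_sim = np.array([B(d) for d in draws])
    return np.mean(B_sim < B_obs)
```

A returned fraction far from either 0 or 1 (such as the 24% quoted above) indicates that the observed B is unexceptional under the best-fit model.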

It is interesting to note that ρ is intimately related to the ICM distribution of clusters in the reference survey. A physical link should exist between LX and ζ that would allow computing ρ, or at least a prior distribution for it, without using the halo mass M500c. However, our implementation of the SPT-SZ selection function through ζ forces us to rely on an empirical scaling relation with mass, which prevents us from investigating further in this direction.

8. Conclusions

This work presents an investigation of X-ray selection function models aimed at describing the completeness of the primary and cosmological eRASS1 cluster samples. Driven by our astrophysical and cosmological objectives, we have developed a framework inspired by classification problems in statistics and especially machine learning research. Such empirical models are well suited for our purpose since the selection of sources in astronomical catalogues results (at least in part) from human decisions, whose detailed physical modelling may be cumbersome. The models we selected are intentionally simple, enabling them to be interpreted and explained.

Our main findings are summarised as follows:

  1. We have performed extensive twin simulations of the eRASS1_DE sky under certain cosmological and physical assumptions. These simulations reproduce, to a large extent, the chain of selection filters along the path from sources to final catalogues.

  2. We have identified which simulated halos were detected and selected by means of custom matching algorithms. One algorithm relies on tracking the origin of simulated photons, the other on Bayesian positional matching. We have highlighted relevant differences in the outcome of the two algorithms, mostly below cluster fluxes of a few times 10−14 erg s−1 cm−2.

  3. By means of a simple logistic (linear) model, we have asked which surface brightness features govern the detectability and selection of clusters. In particular, we find that the central 20″ counts help in detecting a cluster, but not so much in characterising it as extended. Cluster counts beyond approximately 1.5′ from the cluster centre play a minor role in the selection of clusters. This model may be seen as a simplified emulator of the eSASS source detection algorithm.

  4. Acknowledging that the selection function based on surface brightness features requires an important modelling effort on the cosmological likelihood side, we have developed models relying on intermediate variables. We identified a hierarchy of models, some strongly depending on assumptions made in the twin simulations, for example p(I|M500,z), and others less dependent on such assumptions, for example p(I|CR,z,Texp,NH,bkg). Using the latter class of models comes at a cost, namely a heavier modelling effort in scientific analyses.

  5. We have tested all our models with leftover simulation samples. In particular, we established a ranking of models based on their precision, their sensitivity (also known as recall), and the reliability of their predicted probabilities. Models accounting for at least one morphological feature (e.g. the cluster R500 radius or its central emission measure EM0) perform better.

  6. Using real data, we have tested our models by means of a custom population model. In a comparison run involving eFEDS or SZ-based surveys, we find no obvious cluster missing from our eRASS1 samples. Conversely, there is no eRASS1 cluster in the cosmology sample that is conspicuously missing from the other tested surveys, in agreement with the known low contamination of the sample (Seppi et al. 2022; Ghirardini et al. 2024).

  7. While comparison with the (brighter) SPT-SZ population reveals no specific issue, we find more eRASS1 clusters in the eFEDS sample than our models would predict. We have highlighted a series of putative causes for the discrepancy. We will pursue work to deepen our understanding of the selection of faint clusters.

The numerical deliverables of this work have been used directly in other eRASS1 cosmology papers. The eRASS1 group population analysis (Bahar et al. 2024) considers the model depicted in Sect. 5.2. The baseline model presented in Sect. 5.3 feeds the main cosmological analysis based on cluster abundance (Ghirardini et al. 2024; Artis et al. 2024) and on cluster clustering (Seppi et al. 2024). This model also serves to estimate sample completeness and to assign total masses to catalogued clusters (Bulbul et al. 2024). Other models presented in this paper have indirectly fed these analyses by providing a critical, quantitative assessment of our understanding of X-ray cluster selection in the context of eRASS1.

In future work we will refine the simulation set with updated cluster models and halo simulations. An alternative assessment of selection systematics may be gained through the use of “inject-and-retrieve” methods (e.g. Pacaud et al. 2006; Suchyta et al. 2016; Kong et al. 2020), where mock cluster images are distributed in real data. Moreover, the ultimate validation of selection models should rely on a performance metric that directly relates to cosmological parameters, especially since our purpose is to constrain those parameters from data. Developing multiple eRASS simulations spanning a range of cosmological models is a promising route in this direction, although computationally intensive. Interestingly, forward models of the observed large-scale structure using effective field theory (EFT) have progressed tremendously (e.g. Jasche & Lavaux 2019; Schmidt 2021a). These models predict the cosmological matter density field at a percent (and even sub-percent) accuracy. In the future, these models will enable tight constraints on cosmological parameters by using not only linear but also mildly non-linear information contained in the density field (Schmidt 2021b; Kostić et al. 2023) and by marginalising fully over the selection function. Finally, pursuing the comparison exercise depicted in this paper with multiple other surveys across a range of wavelengths will provide interesting constraints on model parameters and reveal any significant population outliers worth a deep follow-up study.


1

Recall (also known as sensitivity) and precision are respectively equivalent to completeness and purity, defined with respect to the true classification in the test sample. However, we prefer using the former terminology in order to avoid possible confusion with the usual measures of completeness and purity of a certain sample, defined with respect to the entire population of clusters.

2

Border effects due to more or less restrictive sky masking lead to a different number of matches than mentioned in the previous section (35 matches). This effect is explicitly taken into account in our model.

3

We multiply their value of ASZ by γfield = 2.66, as prescribed by the authors. We also account for a misprinted h in their Eq. (10) (Bocquet, S., priv. comm.).

4

Different cosmological models between samples and simulations entering this figure may induce deviations of a few percent in the luminosity and angular distance values, which is not enough to account for the shift.

5

This number reduces to n = 4.8 × 105 when training models for the cosmology sample, as we restrict simulated clusters to the range 0.05 < z < 0.85.

Acknowledgments

The authors thank the referee for a careful reading of the manuscript. This work is based on data from eROSITA, the soft X-ray instrument aboard SRG, a joint Russian-German science mission supported by the Russian Space Agency (Roskosmos), in the interests of the Russian Academy of Sciences represented by its Space Research Institute (IKI), and the Deutsches Zentrum für Luft- und Raumfahrt (DLR). The SRG spacecraft was built by Lavochkin Association (NPOL) and its subcontractors, and is operated by NPOL with support from the Max Planck Institute for Extraterrestrial Physics (MPE). The development and construction of the eROSITA X-ray instrument was led by MPE, with contributions from the Dr. Karl Remeis Observatory Bamberg & ECAP (FAU Erlangen-Nuernberg), the University of Hamburg Observatory, the Leibniz Institute for Astrophysics Potsdam (AIP), and the Institute for Astronomy and Astrophysics of the University of Tübingen, with the support of DLR and the Max Planck Society. The Argelander Institute for Astronomy of the University of Bonn and the Ludwig Maximilians Universität Munich also participated in the science preparation for eROSITA. The eROSITA data shown here were processed using the eSASS software system developed by the German eROSITA consortium. N. Clerc acknowledges financial support from CNES. E. Bulbul, A. Liu, V. Ghirardini, C. Garrel, S. Zelmer, and X. Zhang acknowledge financial support from the European Research Council (ERC) Consolidator Grant under the European Union’s Horizon 2020 research and innovation program (grant agreement CoG DarkQuest No 101002585). Some of the results in this paper have been derived using the following packages: scikit-learn (Pedregosa et al. 2011), MOCpy (https://github.com/cds-astro/mocpy/) (Fernique et al. 2014), GPy (https://github.com/SheffieldML/GPy) (GPy 2012), astropy (Astropy Collaboration 2022), matplotlib (Hunter 2007), healpy and HEALPix (http://healpix.sourceforge.net) (Górski et al. 2005; Zonca et al. 
2019), HEASoftpy (https://heasarc.gsfc.nasa.gov/lheasoft/heasoftpy/), climin (https://github.com/BRML/climin) (Bayer et al. 2015).

References

  1. Agresti, A., & Coull, B. A. 1998, TAS, 52, 119 [Google Scholar]
  2. Aird, J., Coil, A. L., Georgakakis, A., et al. 2015, MNRAS, 451, 1892 [Google Scholar]
  3. Aird, J., Coil, A. L., & Georgakakis, A. 2018, MNRAS, 474, 1225 [NASA ADS] [CrossRef] [Google Scholar]
  4. Anderson, M. E., Gaspari, M., White, S. D. M., Wang, W., & Dai, X. 2015, MNRAS, 449, 3806 [NASA ADS] [CrossRef] [Google Scholar]
  5. Artis, E., Ghirardini, V., Bulbul, E., et al. 2024, A&A, submitted [arXiv:2402.08459] [Google Scholar]
  6. Astropy Collaboration (Price-Whelan, A. M., et al.) 2022, ApJ, 935, 167 [NASA ADS] [CrossRef] [Google Scholar]
  7. Bahar, Y. E., Bulbul, E., Clerc, N., et al. 2022, A&A, 661, A7 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  8. Bahar, Y. E., Ghirardini, V., Sanders, J. S., et al. 2024, A&A, submitted [arXiv:2401.17276] [Google Scholar]
  9. Bayer, J., Osendorfer, C., Diot-Girard, S., Rueckstiess, T., & Urban, S. 2015, TUM, Tech. Rep., https://github.com/BRML/climin [Google Scholar]
  10. Bleem, L. E., Bocquet, S., Stalder, B., et al. 2020, ApJS, 247, 25 [Google Scholar]
  11. Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. 2017, J. Am. Stat. Assoc., 112, 859 [CrossRef] [Google Scholar]
  12. Böhringer, H., & Chon, G. 2015, A&A, 574, L8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  13. Boubert, D., & Everall, A. 2020, MNRAS, 497, 4246 [NASA ADS] [CrossRef] [Google Scholar]
  14. Boubert, D., & Everall, A. 2021, MNRAS, 510, 4626 [Google Scholar]
  15. Brunner, H., Liu, T., Lamer, G., et al. 2022, A&A, 661, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  16. Bulbul, E., Chiu, I. N., Mohr, J. J., et al. 2019, ApJ, 871, 50 [Google Scholar]
  17. Bulbul, E., Liu, A., Pasini, T., et al. 2022, A&A, 661, A10 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  18. Bulbul, E., Liu, A., Kluge, M., et al. 2024, A&A, 685, A106 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  19. Cash, W. 1979, ApJ, 228, 939 [Google Scholar]
  20. Chuang, C.-H., Yepes, G., Kitaura, F.-S., et al. 2019, MNRAS, 487, 48 [NASA ADS] [CrossRef] [Google Scholar]
  21. Clerc, N., & Finoguenov, A. 2023, in Handbook of X-ray and Gamma-ray Astrophysics, eds. C. Bambi, & A. Santangelo, 123 [Google Scholar]
  22. Clerc, N., Ramos-Ceja, M. E., Ridl, J., et al. 2018, A&A, 617, A92 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  23. Comparat, J., Merloni, A., Salvato, M., et al. 2019, MNRAS, 487, 2005 [Google Scholar]
  24. Comparat, J., Eckert, D., Finoguenov, A., et al. 2020, Open J. Astrophys., 3, 13 [Google Scholar]
  25. Comparat, J., Truong, N., Merloni, A., et al. 2022, A&A, 666, A156 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  26. Comparat, J., Luo, W., Merloni, A., et al. 2023, A&A, 673, A122 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  27. Dauser, T., Falkner, S., Lorenz, M., et al. 2019, A&A, 630, A66 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  28. Debackere, S. N. B., Hoekstra, H., Schaye, J., Heitmann, K., & Habib, S. 2022, MNRAS, 515, 3383 [NASA ADS] [CrossRef] [Google Scholar]
  29. de Haan, T., Benson, B. A., Bleem, L. E., et al. 2016, ApJ, 832, 95 [NASA ADS] [CrossRef] [Google Scholar]
  30. Dietrich, J. P., Bocquet, S., Schrabback, T., et al. 2019, MNRAS, 483, 2871 [Google Scholar]
  31. Fernique, P., Boch, T., Donaldson, T., et al. 2014, MOC – HEALPix Multi-Order Coverage Map Version 1.0, IVOA Recommendation 02 June 2014 [Google Scholar]
  32. Finoguenov, A., Rykoff, E., Clerc, N., et al. 2020, A&A, 638, A114 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  33. Foreman-Mackey, D., Hogg, D. W., Lang, D., & Goodman, J. 2013, PASP, 125, 306 [Google Scholar]
  34. Gallagher, S. C., & Smeenk, C. 2023, in What’s in a Survey? Simulation-Induced Selection Effects in Astronomy, eds. N. Mills Boyd, S. De Baerdemaeker, K. Heng, & V. Matarese (Cham: Springer International Publishing), 207 [Google Scholar]
  35. Garrel, C., Pierre, M., Valageas, P., et al. 2022, A&A, 663, A3 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  36. Georgakakis, A., Aird, J., Schulze, A., et al. 2017, MNRAS, 471, 1976 [NASA ADS] [CrossRef] [Google Scholar]
  37. Ghirardini, V., Eckert, D., Ettori, S., et al. 2019, A&A, 621, A41 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  38. Ghirardini, V., Bulbul, E., Artis, E., et al. 2024, A&A, submitted [arXiv:2402.08458] [Google Scholar]
  39. Górski, K. M., Hivon, E., Banday, A. J., et al. 2005, ApJ, 622, 759 [Google Scholar]
  40. GPy 2012, GPy: A Gaussian Process Framework in Python, http://github.com/SheffieldML/GPy [Google Scholar]
  41. Grandis, S., Klein, M., Mohr, J. J., et al. 2020, MNRAS, 498, 771 [NASA ADS] [CrossRef] [Google Scholar]
  42. Hensman, J., Matthews, A., & Ghahramani, Z. 2015, in Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, eds. G. Lebanon, & S. V. N. Vishwanathan (San Diego, California, USA: PMLR), Proceedings of Machine Learning Research, 38, 351 [Google Scholar]
  43. HI4PI Collaboration (Ben Bekhti, N., et al.) 2016, A&A, 594, A116 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  44. Hoyle, B., Jimenez, R., Verde, L., & Hotchkiss, S. 2012, JCAP, 2012, 009 [CrossRef] [Google Scholar]
  45. Hu, W., & Kravtsov, A. V. 2003, ApJ, 584, 702 [Google Scholar]
  46. Huang, N., Bleem, L. E., Stalder, B., et al. 2020, AJ, 159, 110 [Google Scholar]
  47. Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]
  48. Jasche, J., & Lavaux, G. 2019, A&A, 625, A64 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  49. Kluge, M., Comparat, J., Liu, A., et al. 2024, A&A, in press, https://doi.org/10.1051/0004-6361/202349031 [Google Scholar]
  50. Kong, H., Burleigh, K. J., Ross, A., et al. 2020, MNRAS, 499, 3943 [NASA ADS] [CrossRef] [Google Scholar]
  51. Kostić, A., Nguyen, N.-M., Schmidt, F., & Reinecke, M. 2023, JCAP, 2023, 063 [Google Scholar]
  52. Lehmer, B. D., Xue, Y. Q., Brandt, W. N., et al. 2012, ApJ, 752, 46 [NASA ADS] [CrossRef] [Google Scholar]
  53. Lindholm, V., Finoguenov, A., Comparat, J., et al. 2021, A&A, 646, A8 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  54. Liu, A., Bulbul, E., Ghirardini, V., et al. 2022, A&A, 661, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  55. Liu, A., Bulbul, E., Ramos-Ceja, M. E., et al. 2023, A&A, 670, A96 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  56. Locatelli, N., Ponti, G., Zheng, X., et al. 2024, A&A, 681, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  57. Mantz, A. B. 2019, MNRAS, 485, 4863 [NASA ADS] [CrossRef] [Google Scholar]
  58. Mantz, A. B., von der Linden, A., Allen, S. W., et al. 2015, MNRAS, 446, 2205 [Google Scholar]
  59. Marulli, F., Veropalumbo, A., Sereno, M., et al. 2018, A&A, 620, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  60. Matthews, B. 1975, Biochimica et Biophysica Acta (BBA)– Protein Structure, 405, 442 [CrossRef] [Google Scholar]
  61. Merloni, A., Lamer, G., Liu, T., et al. 2024, A&A, 682, A34 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  62. Moster, B. P., Naab, T., & White, S. D. M. 2013, MNRAS, 428, 3121 [Google Scholar]
  63. Newcombe, R. G. 1998, Stat. Med., 17, 857 [CrossRef] [Google Scholar]
  64. Pacaud, F., Pierre, M., Refregier, A., et al. 2006, MNRAS, 372, 578 [NASA ADS] [CrossRef] [Google Scholar]
  65. Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, J. Mach. Learn. Res., 12, 2825 [Google Scholar]
  66. Planck Collaboration XIII. 2016, A&A, 594, A13 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  67. Ponti, G., Zheng, X., Locatelli, N., et al. 2023, A&A, 674, A195 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  68. Predehl, P., Sunyaev, R. A., Becker, W., et al. 2020, Nature, 588, 227 [CrossRef] [Google Scholar]
  69. Predehl, P., Andritschke, R., Arefiev, V., et al. 2021, A&A, 647, A1 [EDP Sciences] [Google Scholar]
  70. Rasmussen, C. E., & Williams, C. K. I. 2006, Gaussian Processes for Machine Learning (MIT Press) [Google Scholar]
  71. Rix, H.-W., Hogg, D. W., Boubert, D., et al. 2021, AJ, 162, 142 [NASA ADS] [CrossRef] [Google Scholar]
  72. Salvato, M., Buchner, J., Budavári, T., et al. 2018, MNRAS, 473, 4937 [Google Scholar]
  73. Schmidt, F. 2021a, JCAP, 2021, 033 [CrossRef] [Google Scholar]
  74. Schmidt, F. 2021b, JCAP, 2021, 032 [CrossRef] [Google Scholar]
  75. Schneider, P. C., Freund, S., Czesla, S., et al. 2022, A&A, 661, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  76. Seppi, R., Comparat, J., Bulbul, E., et al. 2022, A&A, 665, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  77. Seppi, R., Comparat, J., Ghirardini, V., et al. 2024, A&A, 686, A196 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  78. Suchyta, E., Huff, E. M., Aleksić, J., et al. 2016, MNRAS, 457, 786 [Google Scholar]
  79. Tinker, J., Kravtsov, A. V., Klypin, A., et al. 2008, ApJ, 688, 709 [Google Scholar]
  80. Turner, D. J., Giles, P. A., Romer, A. K., et al. 2022, MNRAS, 517, 657 [NASA ADS] [CrossRef] [Google Scholar]
  81. Vanderlinde, K., Crawford, T. M., de Haan, T., et al. 2010, ApJ, 722, 1180 [CrossRef] [Google Scholar]
  82. Veronica, A., Reiprich, T. H., Pacaud, F., et al. 2024, A&A, 681, A108 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  83. Vollset, S. E. 1993, Stat. Med., 12, 809 [CrossRef] [Google Scholar]
  84. Wilson, E. B. 1927, J. Am. Stat. Assoc., 22, 209 [CrossRef] [Google Scholar]
  85. Zeiler, M. D. 2012, arXiv e-prints [arXiv:1212.5701] [Google Scholar]
  86. Zhang, Y., Comparat, J., Ponti, G., et al. 2024, A&A, submitted [arXiv:2401.17308] [Google Scholar]
  87. Zheng, X., Ponti, G., Locatelli, N., et al. 2024a, A&A, submitted [arXiv:2401.17310] [Google Scholar]
  88. Zheng, X., Ponti, G., Freyberg, M., et al. 2024b, A&A, 681, A77 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  89. Zonca, A., Singer, L., Lenz, D., et al. 2019, J. Open Source Softw., 4, 1298 [Google Scholar]

Appendix A: Uncertainties on logistic regression coefficients

Considering a galaxy cluster indexed by i among n simulated systems in the training set, we denote by ηi the vector of its m associated features. Simulations indicate whether this cluster is detected as extended (Ii = 1) or not (Ii = 0). Our implementation of logistic regression writes:

$$ \begin{aligned} p\left(I_i|\boldsymbol{\eta}_i\right) = \varphi\left( f(\boldsymbol{\eta}_i) \right) = \varphi\left({\boldsymbol{w}^{\prime}}^{\intercal} \boldsymbol{\eta}^{\prime}_i\right). \end{aligned} $$(A.1)

We used $\varphi(t) = (1+e^{-t})^{-1}$ and $\boldsymbol{w}^{\prime} = (w_0, w_1, w_2, \dotsc, w_m)$ the vector of coefficients (w0 is the intercept of the underlying linear model). We have defined $\boldsymbol{\eta}^{\prime}_i = (1, \eta_{i,1}, \eta_{i,2}, \dotsc, \eta_{i,m})$.

The fit is performed by minimising the following cost function (known as log-loss) over all possible values of w′, where we have written qi = p(Ii|ηi):

$$ \begin{aligned} \mathcal{C}(\boldsymbol{w}^{\prime}) = \sum_{i=1}^n \left(-I_i \ln(q_i) - (1 - I_i) \ln(1 - q_i)\right). \end{aligned} $$(A.2)

A formula for the Hessian matrix H derives by noticing that:

$$ \begin{aligned} \partial q_i/\partial w_j = q_i (1-q_i)\, \eta^{\prime}_{i,j}. \end{aligned} $$(A.3)

We obtain:

$$ \begin{aligned} H_{jk} = \frac{\partial^2 \mathcal{C}}{\partial w_j\, \partial w_k} = \sum_{i=1}^{n} q_i (1-q_i)\, \eta^{\prime}_{i,j}\, \eta^{\prime}_{i,k}. \end{aligned} $$(A.4)

We obtain an estimate of the covariance matrix of the best-fit coefficients w′ by inverting the (m + 1)×(m + 1) matrix H. This approximation is roughly valid as long as the best-fit model corresponds to the minimum of the cost function, and in the limit of large n.
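The covariance computation of Eqs. (A.2)–(A.4) can be sketched as follows, assuming the intercept column is prepended to the feature matrix as in the definition of η′i (the helper name is ours):

```python
import numpy as np

def logloss_covariance(eta, w):
    """Covariance of logistic-regression coefficients from the inverse of
    the Hessian of the log-loss, Eqs. (A.2)-(A.4). `eta` is the (n, m)
    feature matrix without the intercept column; `w` is the fitted (m+1,)
    coefficient vector w'."""
    X = np.hstack([np.ones((eta.shape[0], 1)), eta])  # rows are eta'_i
    q = 1.0 / (1.0 + np.exp(-X @ w))                  # q_i = p(I_i | eta_i)
    H = (X * (q * (1 - q))[:, None]).T @ X            # Eq. (A.4)
    return np.linalg.inv(H)                           # approximate covariance

# One-sigma uncertainties on the coefficients are the square roots of the
# diagonal of the returned matrix.
```

This is the standard observed-information approximation; it degrades when features are strongly collinear, since H then becomes nearly singular.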

Appendix B: Logistic regression predicting detected clusters

We have shown results in Sect. 4 for a model trained to predict whether a cluster is selected (that is, detected and characterised as extended), given 7 × 5 surface brightness features (7 radial ranges and 5 components). We repeat this exercise with a model trained to predict whether a cluster is detected. Figure B.1 shows the 35 coefficients associated with each of these features. As discussed in Sect. 4, these coefficients can be interpreted as the sensitivity of the detection to a small variation in a given feature. Fig. 4 and Fig. B.1 differ most notably in the coefficients associated with the cluster emission (black crosses). The central 0 − 20″ surface brightness plays a major role in the detection model. Counts deposited at radii larger than 2 arcmin are relatively more important for detecting than for selecting a cluster.

Fig. B.1.

Similar to Fig. 4, but for a logistic model p(IeRASS1|counts), trained to predict whether a cluster is detected among all simulated clusters.

Appendix C: Confidence range for proportions n/N

Some figures in the paper incorporate uncertainties on a ratio n/N, with n and N both integers and 0 ≤ n ≤ N. We estimated the 68% confidence range [L, U] by means of the score interval (Wilson 1927) with continuity correction (Vollset 1993):

$$ \begin{aligned} L = \frac{n_{-} + \lambda ^2/2}{N + \lambda ^2} - \lambda \frac{\sqrt{n_{-} - n_{-}^2/N + \lambda ^2/4}}{N+\lambda ^2},\end{aligned} $$(C.1)

$$ \begin{aligned} U = \frac{n_{+} + \lambda ^2/2}{N + \lambda ^2} + \lambda \frac{\sqrt{n_{+} - n_{+}^2/N + \lambda ^2/4}}{N+\lambda ^2}, \end{aligned} $$(C.2)

where $n_{-} = n - 1/2$ and $n_{+} = n + 1/2$. The bounds are set to L = 0 if n = 0 and to U = 1 if n = N. The value for λ is the 1 − α/2 quantile of the normal distribution; in our case we set λ = 1 for α = 0.32 (hence, we show the 68% confidence intervals). Without continuity correction, we would have used n in place of $n_{-}$ and $n_{+}$ in those formulas.

These intervals should be adequate to represent uncertainties in the proportion n/N (Newcombe 1998), although the confidence interval may be too conservative in general (Agresti & Coull 1998). We emphasise that we do not use such uncertainties while computing our models, nor in performing calculations. They only serve for visualisation purposes.
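For reference, Eqs. (C.1)–(C.2) translate directly into code; the sketch below is an illustration, not the paper's code, and returns the ~68% interval for λ = 1:

```python
import math

def wilson_ci(n, N, lam=1.0):
    """Score interval with continuity correction for a proportion n/N.

    lam is the 1 - alpha/2 normal quantile; lam = 1.0 gives the ~68% interval.
    """
    n_minus, n_plus = n - 0.5, n + 0.5
    denom = N + lam**2
    if n == 0:                                   # lower bound pinned at 0
        L = 0.0
    else:
        L = (n_minus + lam**2 / 2) / denom \
            - lam * math.sqrt(n_minus - n_minus**2 / N + lam**2 / 4) / denom
    if n == N:                                   # upper bound pinned at 1
        U = 1.0
    else:
        U = (n_plus + lam**2 / 2) / denom \
            + lam * math.sqrt(n_plus - n_plus**2 / N + lam**2 / 4) / denom
    return max(0.0, L), min(1.0, U)
```

For instance, `wilson_ci(5, 10)` returns an interval bracketing the observed proportion 0.5.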

Appendix D: Gaussian process classification

The selection function models presented in Sect. 5 rely on Gaussian process classification (GPC). To overcome the well-known computational bottleneck of standard GPC we applied the Stochastic Variational Gaussian Process (SVGP) algorithm (Hensman et al. 2015) implemented in the GPy library (GPy 2012). This section briefly presents the main characteristics of the method and some relevant references. For an application in the context of cluster astrophysics, see e.g. Debackere et al. (2022); in particular, their Appendix B concisely describes the method for a binomial likelihood (we use a Bernoulli likelihood instead).

D.1. Defining a Gaussian process prior

We refer the reader to Rasmussen & Williams (2006) for a pedagogical introduction to GPC and for the notations followed in this Appendix. The principle of Gaussian process binary classification builds upon a latent function f(x) of the m features x = {xi}i = 1⋯m, and on the assumption of a Gaussian process (GP) prior on f. Values of the latent function are then mapped to probability values ranging between 0 and 1 by means of a bijective link function. In our specific case we choose the inverse probit as the link function, that is, the cumulative Gaussian distribution Φ(f). Other sigmoid-shaped link functions would also be suitable (e.g. the inverse logit).

By definition, any finite subset of random variables constituting a Gaussian process follows a joint multivariate normal distribution. The mean function μ(x) and the covariance function k suffice to fully specify the Gaussian process. We write:

$$ \begin{aligned} f \sim \mathcal{GP} \left( \mu (\boldsymbol{x});\, k(\boldsymbol{x}, \boldsymbol{x^{\prime }}) \right).\end{aligned} $$(D.1)

Using 𝔼 to denote the expectation value of a random variable, we have defined:

$$ \begin{aligned} \mu (\boldsymbol{x}) = \mathbb{E} \left[ f(\boldsymbol{x}) \right],\end{aligned} $$(D.2)

$$ \begin{aligned} k(\boldsymbol{x}, \boldsymbol{x^{\prime }}) = \mathbb{E} \left[ \left(f(\boldsymbol{x}) - \mu (\boldsymbol{x}) \right) \left(f(\boldsymbol{x^{\prime }}) - \mu (\boldsymbol{x^{\prime }}) \right) \right]. \end{aligned} $$(D.3)

In our setup the mean function μ is identically zero and the covariance function is the Radial Basis Function (RBF) kernel, also known as the stationary squared exponential kernel. The latter choice ensures that the underlying Gaussian process is extremely smooth. One introduces m hyperparameters li in the definition of the RBF kernel, setting the length-scales of variation along each feature dimension:

$$ \begin{aligned} k(\boldsymbol{x}, \boldsymbol{x^{\prime }}) = \sigma ^2 \exp \left[ -\frac{1}{2} \sum _{i=1}^m \left(\frac{x_i - x^{\prime }_i}{l_i} \right)^2 \right]. \end{aligned} $$(D.4)

In this expression σ is an additional hyperparameter governing the amplitude of the correlation. The RBF covariance function is intuitively easy to understand: pairs of points close to each other in the m-dimensional feature space are associated with strongly covariant random variables, with covariance close to σ2; points far from each other are essentially uncorrelated, with covariance close to zero. The length-scales govern the typical correlation cut-off distance along each dimension.
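Equation (D.4) is simple to transcribe; the NumPy sketch below (with illustrative values, not the GPy implementation) builds the covariance matrix for a small set of points:

```python
import numpy as np

def rbf_kernel(X, Xp, sigma, lengthscales):
    """RBF kernel of Eq. (D.4) between the rows of X (n x m) and Xp (p x m)."""
    l = np.asarray(lengthscales, dtype=float)
    diff = (X[:, None, :] - Xp[None, :, :]) / l      # pairwise (x_i - x'_i) / l_i
    return sigma**2 * np.exp(-0.5 * np.sum(diff**2, axis=-1))

# Three points in a 2-D feature space with unit length-scales.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [10.0, 10.0]])
K = rbf_kernel(X, X, sigma=2.0, lengthscales=[1.0, 1.0])
# The diagonal equals sigma^2 = 4, and the distant third point is
# essentially uncorrelated with the first two.
```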

Given a GP prior and a set of n* test points $X^* = (\boldsymbol{x}^*_1, \boldsymbol{x}^*_2,\dotsc,\boldsymbol{x}^*_{n^*})$, one may generate a random vector f* by sampling the multivariate normal distribution f* ∼ 𝒩(0; K(X*, X*)). Here K(X*, X*) is a shorthand notation for the n* × n* covariance matrix whose coefficients are the $k(\boldsymbol{x}^*_p, \boldsymbol{x}^*_q)$.

D.2. Predicting and training in standard GPC

To refine our predictions, we wish to condition the GP on a set of known measurements (observations), also named the training set in this paper. A set of n known input data points X = {xi}i = 1⋯n is associated with n binary observations I = {Ii}i = 1⋯n, each Ii taking the value 0 or 1 (compliant with our notations in Table 1). The ensemble (X, I) constitutes the training set. We assume a Bernoulli likelihood to describe the data given the model values, that is:

$$ \begin{aligned} p\left(I_i|f_i\right) = \Phi (f_i)^{I_i} \left(1-\Phi (f_i)\right)^{1-I_i}. \end{aligned} $$(D.5)

Given a new test point x*, the model should provide a probabilistic prediction, interpreted as the cluster detection probability given our knowledge of the training dataset. Our desired model output writes:

$$ \begin{aligned} p\left(I^*|X, \boldsymbol{I}, \boldsymbol{x}^*\right) = \int \Phi (f^*)\, p\left(f^*|X, \boldsymbol{I}, \boldsymbol{x}^*\right) \mathrm{d} f^*. \end{aligned} $$(D.6)

It is therefore required to obtain a distribution of the latent function f* at the point x*. We have that:

$$ \begin{aligned} p\left(f^*|X, \boldsymbol{I}, \boldsymbol{x}^*\right) = \int p\left(f^*|X, \boldsymbol{x}^*, \boldsymbol{f}\right) p\left(\boldsymbol{f}|X, \boldsymbol{I}\right) \mathrm{d} \boldsymbol{f}. \end{aligned} $$(D.7)

The first term in the integral derives from the GP prior directly. It can be expressed in a standard way by conditioning the joint Gaussian distribution p(f,f*|X,x*) on the latent function variables f:

$$ \begin{aligned} f^* \,|\, X, \boldsymbol{x}^*, \boldsymbol{f} \sim \mathcal{N} \left( \boldsymbol{k}^{* \intercal } K^{-1} \boldsymbol{f};\; k(\boldsymbol{x}^*, \boldsymbol{x}^*) - \boldsymbol{k}^{* \intercal } K^{-1} \boldsymbol{k}^* \right) . \end{aligned} $$(D.8)

We have defined K ≡ K(X, X) and k* ≡ K(x*, X), both obtained through the RBF kernel (Eq. D.4). If the number of training points n is large, the inversion of the n × n matrix K may be cumbersome; this represents a common numerical bottleneck in standard GP analysis. The term p(f|X,I) = p(I|f)p(f|X)/p(I|X) is the posterior over the latent variables. It is generally not analytically tractable and must often be approximated.

In GP classification, the marginal likelihood (or evidence) p(I|X) = ∫p(I|f,X)p(f|X)df plays an important role. The hyperparameters of the model (including those from the kernel) are optimally found by maximising the value of the marginal likelihood. Throughout this paper, we identify this operation with the ‘training phase’ of a GP classifier. In our case, an approximation is again needed as the marginal likelihood does not have a simple analytic expression.
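The conditioning step of Eq. (D.8) and the prediction of Eq. (D.6) can be sketched in one dimension. The toy latent values below stand in for the (normally inferred, not observed) posterior over f, and the closed-form Gaussian average of the probit link, Φ(μ*/√(1 + σ*²)), is a standard result (Rasmussen & Williams 2006); every numeric value here is invented for illustration:

```python
import numpy as np
from math import erf, sqrt

def rbf(X, Xp, sigma=1.0, l=1.0):
    """1-D RBF kernel (Eq. D.4 with m = 1)."""
    d = X[:, None] - Xp[None, :]
    return sigma**2 * np.exp(-0.5 * (d / l)**2)

Phi = lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0)))   # inverse-probit link

# Toy 1-D training inputs and latent values (in reality f is inferred).
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
f = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

x_star = 0.5
K = rbf(X, X) + 1e-10 * np.eye(len(X))             # jitter for stability
k_star = rbf(np.array([x_star]), X)[0]

# Eq. (D.8): conditional mean and variance of f* at the test point.
mu_star = k_star @ np.linalg.solve(K, f)
var_star = rbf(np.array([x_star]), np.array([x_star]))[0, 0] \
    - k_star @ np.linalg.solve(K, k_star)

# Eq. (D.6) with a probit link: the Gaussian integral has a closed form.
p_detect = Phi(mu_star / sqrt(1.0 + max(var_star, 0.0)))
```

Note how the predictive variance is the prior variance reduced by the term k*ᵀK⁻¹k*: conditioning on training data can only shrink the uncertainty on f*.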

D.3. Principles of the selected SVGP algorithm

Several works propose schemes to approximate these distributions and make the associated computations numerically tractable. We only briefly outline here the principles of the SVGP algorithm used in our modelling. Details and related references are found in Hensman et al. (2015).

It is useful to introduce a new set of hyperparameters made of k pairs of input data points and their associated output values (Z, u). The k × m coordinates of these points, called inducing inputs, are optimised together with the other hyperparameters during the training phase. Loosely speaking, these points approximately 'summarise' the GP at the most relevant locations. The second stage of approximation relies on variational bounds (e.g. Blei et al. 2017, for a review), leading to the following inequality:

$$ \begin{aligned} \ln p\left(\boldsymbol{I}|X\right) \ge \sum _{i=1}^{n} \mathbb{E} _{q(f_i)} \left[ \ln p\left(I_i|f_i\right) \right] - \mathrm{KL} \left[ q(\boldsymbol{u}) \,||\, p(\boldsymbol{u}) \right]. \end{aligned} $$(D.9)

In this expression KL represents the Kullback-Leibler divergence, which measures the proximity between the two densities q(u) and p(u). The distribution q(u) belongs to a predefined family of distributions. Selecting a suitable family is at the core of variational inference, since the inference amounts to finding the most suitable q within this family; in practice, this procedure is equivalent to solving an optimisation problem. In our case the selected family is that of multivariate normal distributions, hence q(u) ∼ 𝒩(m; S). This formulation requires the introduction of an additional set of hyperparameters indexing the members of the family: the vector m, with k coefficients, and the symmetric matrix S, with k(k + 1)/2 independent coefficients. The algorithm specifically uses the lower Cholesky factorisation S = LL⊺ for numerical stability reasons. The expectation under the sum sign is taken with respect to the marginals q(fi) of the distribution q(f), defined such that q(f) ≡ ∫p(f|u)q(u)du. The computation of p(f|u) follows directly from the GP prior assumption, in a similar way as in Eq. D.8 (replacing f with u and f* with f).

In the training phase, the SVGP algorithm consists in finding the set of hyperparameters maximising the bound on the marginal likelihood, that is the right-hand side of Eq. D.9. Gradient descent algorithms are well-suited to such optimisation tasks. Our algorithm uses stochastic optimisation techniques, by drawing random mini-batches of the input data in a large number of successive steps.

The interest of the method depicted here lies in its reduced numerical complexity, as compared to standard GPC. Ensuring that the number of inducing points k is much smaller than the training set size (i.e. k ≪ n) puts most of the cost on the calculation of the expectation $\mathbb{E}_{q(f_i)}\left[\dotsc\right]$ and of the gradients with respect to the hyperparameters. In particular, there is no longer any need to invert an n × n matrix. Model predictions are also numerically efficient, thanks to the approximation of the posterior p(f, u|X, I) → p(f|u)q(u). In contrast to the standard GPC method (Eq. D.7), computing a prediction does not require knowledge of the training set values (X, I); only the inducing points (along with all other hyperparameters) are needed. Saving a trained model to disk is thus very memory efficient.

It is important to recall here that several approximations lead to the method outlined above, which results from a compromise between efficiency and accuracy. In particular, Blei et al. (2017) discuss open problems related to variational inference and to the use of the KL divergence to set a lower bound on the evidence. For our own application we rely on a posteriori validation of the models (e.g. Sect. 5.4). In general, the expectation values of the GPC models reflect the cluster detection rates well, but we find the associated variance to be slightly underestimated.

D.4. Setup of our GPC model construction

Let us now relate the previous notations to the setup of our numerical experiments, as described in Sect. 5. The number of training points is the number of simulated clusters in the training set, n = 5.7 × 10⁵. The number of features ranges from m = 2 (e.g. for models based on M500 and z) to m = 6 (e.g. for models with LX, z, EM0, NH, Texp and bkg). We choose a small number of inducing points, k = 30, reaching an empirical trade-off between representativeness of the input data space, computational efficiency and smoothness of the model. The baseline model for cosmological studies has m = 5; this leads to optimising 651 hyperparameters. This number decomposes into: 6 for the kernel (σ and the five length-scales), 30 × 5 coordinates of the inducing points Z, 30 for the values of the m parameter and 30 × 31/2 for the coefficients of the L matrix. Before training, each feature in the training set X is rescaled to the [0, 1] interval using simple functions (e.g. linear or logarithmic), selected according to the dynamic range and distribution of each feature.
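The bookkeeping of the 651 hyperparameters can be checked directly (a trivial transcription of the decomposition given above):

```python
# Baseline model: m = 5 features, k = 30 inducing points.
m, k = 5, 30

kernel_pars = 1 + m               # sigma plus the five length-scales
inducing_pars = k * m             # coordinates of the inducing inputs Z
variational_mean = k              # coefficients of the vector m in q(u)
cholesky_pars = k * (k + 1) // 2  # independent coefficients of the factor L

total = kernel_pars + inducing_pars + variational_mean + cholesky_pars
# 6 + 150 + 30 + 465 = 651
```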

Given the large number of free hyperparameters, we used the ADADELTA algorithm (Zeiler 2012) as implemented in the climin library. Our training set is extremely unbalanced, with only a few percent of the simulated clusters listed as detected or selected. In this context, we have found that stochastic optimisation provides satisfactory results only if the mini-batches are large enough. In our case we chose random mini-batches containing 2¹⁶ = 65536 simulated clusters, of which a few hundred are detected systems. Only about 7 distinct mini-batches are therefore available. We secured a sufficiently high number of iterations (typically of order 10⁴) before stopping the training phase. In the course of our model production, we noticed that some models converged to undesired solutions, as revealed by unusual best-fit length-scales. This behaviour was in many cases due to unlucky draws of the first mini-batches, containing an uncommon fraction of detected systems. In such failure cases, relaunching the training with a fresh random seed would generally yield a satisfactory solution.

The training phase of a classifier typically took a few hours on a standard multi-core machine. Once finalised, each model went through the list of diagnostic tests described in Sect. 5.4. Each satisfactory model was shared within the eRASS1 collaboration after dumping to disk the values of the hyperparameters and of the factors rescaling the features to the [0, 1] interval. Prediction of the probabilistic values p(I*|x*) is a very quick operation, even for large numbers of test points x* (Eq. D.6). This is in fact a requirement of the cosmological modelling (Ghirardini et al. 2024, and Eq. 2), which needs a computationally efficient response from the selection function models. A crude estimate of the uncertainties on the model outputs may be obtained by folding the ±1σ envelope of the latent function posterior (given by the variance in Eq. D.8) into the link function Φ.

Appendix E: Additional selection models with intermediate variables

This section presents additional models discussed throughout this paper, constructed from intermediate variables as discussed in Sect. 5.

E.1. Absorbed flux and redshift

The absorbed R500 flux in the 0.5–2 keV band is an alternative to the unabsorbed count rate. It has units erg s⁻¹ cm⁻² and is available for all simulated clusters from the twin simulation. Fig. E.1 shows slices through this model p(Imain|fX, z, NH, Texp, bkg). It behaves very similarly to a selection expressed with CR (Fig. 10).

Fig. E.1.

Representation of a model p(Imain|fX,z,NH,Texp,bkg) for fixed values of cluster redshift z = 0.2 and z = 0.5, local galactic absorption NH, a nominal background level, two values of exposure times Texp (units s) and a range of flux values (x-axis). The shaded regions indicate the approximate 68% confidence range output of the model.

E.2. Integrating morphological features

Closer inspection of the twin simulations reveals that cluster flux (or count rate, or luminosity) is not the only determinant of the detection probability. Its extent, characterised by R500, and its central emission measure, characterised by EM0, also play a secondary role. We have (see also Comparat et al. 2020):

$$ \begin{aligned} EM(R) = - \log _{10} \int _{- \infty }^{+\infty } n_e n_p \,\mathrm{d} l. \end{aligned} $$(E.1)

The integration runs along the line of sight (coordinate l, expressed in Mpc) at projected distance R from the cluster centre; ne and np are the electron and proton densities in the galaxy cluster (expressed in cm⁻³). For numerical stability reasons, the central emission measure EM0 is defined as the average of EM(R) in the radial range 0 < R < 0.025 R500, and the integration bounds are set to ±20 R500 instead of infinity.
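A possible numerical transcription of Eq. (E.1) is sketched below, using a hypothetical isothermal beta-model for the gas density; the profile, its parameters and the ratio n_p = n_e/1.2 are assumptions made for illustration, not values from the paper:

```python
import numpy as np

def em_profile(R, R500, n0=1e-3, r_c=0.15, beta=2.0 / 3.0):
    """EM(R) of Eq. (E.1) for an assumed beta-model density profile.

    R, r_c and R500 in Mpc; n0 in cm^-3. The line-of-sight integration is
    truncated at +/- 20 R500, as in the text.
    """
    l = np.linspace(-20.0 * R500, 20.0 * R500, 4001)   # line-of-sight coordinate
    r = np.sqrt(R**2 + l**2)
    n_e = n0 * (1.0 + (r / r_c)**2) ** (-1.5 * beta)
    integrand = n_e * (n_e / 1.2)                      # n_e * n_p (assumed ratio)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(l))
    return -np.log10(integral)                         # Eq. (E.1)

R500 = 1.0
# Central emission measure EM0: average of EM(R) over 0 < R < 0.025 R500.
radii = np.linspace(1e-4, 0.025 * R500, 50)
EM0 = np.mean([em_profile(R, R500) for R in radii])
```

With the sign convention of Eq. (E.1), flatter (fainter) profiles yield larger EM values, consistent with the discussion below.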

Morphological features intervene non-trivially in the selection function. At fixed R500 luminosity, a larger cluster size R500 dilutes the photons over a larger area, thus making detection of faint objects increasingly difficult for eSASS; very bright extended objects are affected differently, since their detection is split into a number of smaller sources, not necessarily classified as extended by eSASS and requiring thorough cleaning (Bulbul et al. 2024). Similarly, low values of EM0 indicate peaked surface brightness profiles, decreasing the probability of selecting a cluster as extended; large values of EM0 indicate very flat cores, and thus difficulties in detecting such faint objects against the background. These trends clearly also depend on the background level, cluster flux, redshift and exposure time. The GPC formalism comes in particularly handy for constructing a model without imposing strong priors on the exact role of these parameters. Fig. E.2 shows four slices through this multi-dimensional model, as a function of the parameter EM0, for two different cluster luminosities and two different exposure times.

Fig. E.2.

Similar to Fig. E.1, but representing the outcome of a model p(Imain|LX,z,EM0,NH,Texp,bkg) for fixed values of cluster redshift z, two different values for the luminosity LX, and local galactic absorption NH, in addition to a nominal background level and two values of exposure time Texp. The shaded regions indicate the approximate 68% confidence range output of the model.

Fig. E.3 shows the dependence on R500 at fixed flux. The larger the apparent cluster extent (here the projected R500 in arcmin), the lower the probability of selecting the cluster in the sample. This is a consequence of diluting the photons emitted by the cluster over a larger area, which decreases the average surface brightness and therefore the ability to distinguish extended emission from background fluctuations.

Fig. E.3.

Similar to Fig. E.1, but representing the outcome of a model p(Imain|fX,R500,NH,Texp,bkg) for two different values for the flux fX, fixed local galactic absorption NH, a nominal background level, and two values of exposure time Texp. The shaded regions indicate the approximate 68% confidence range output of the model.

Appendix F: Model for the SPT-SZ selection

The SPTpol-100d selection function depends on a single parameter ζSPTpol. We write:

$$ \begin{aligned} p\left(I_{\mathrm{SPTpol}}|\zeta _{\mathrm{SPTpol}}\right) = \int ^{u(\zeta )}_{-\infty } \left( 2 \pi \right)^{-1/2} e^{-x^2/2} \,\mathrm{d} x. \end{aligned} $$(F.1)

The upper integration bound is a function of ζSPTpol, namely $u({\zeta_{\rm SPTpol}}) = \left(3+{\zeta_{\rm SPTpol}}^2\right)^{1/2} - 4.5$. This expression reflects the fact that the SPTpol-100d selection is a simple threshold on the signal-to-noise parameter, ξ > 4.5. The parameter ξ is biased and normally distributed with unit standard deviation (Huang et al. 2020):

$$ \begin{aligned} p\left(\xi |\zeta _{\mathrm{SPTpol}}\right) \sim \mathcal{N} \left( \sqrt{\zeta _{\mathrm{SPTpol}}^2+3};\, 1 \right). \end{aligned} $$(F.2)

The expression for the SPT-ECS selection function writes in a similar manner (Bleem et al. 2020). There, ζSPTpol is replaced by ζECS, both quantities scaling with mass as power-laws, albeit with a different normalisation.
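Equation (F.1) is just the standard normal CDF evaluated at u(ζSPTpol); a direct transcription for illustration, using the error function from the standard library:

```python
from math import erf, sqrt

def p_spt_detect(zeta):
    """SPTpol-100d selection function of Eq. (F.1): P(xi > 4.5) for the
    biased, unit-variance signal-to-noise xi of Eq. (F.2)."""
    u = sqrt(3.0 + zeta**2) - 4.5
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))   # standard normal CDF at u
```

The probability rises steeply around ζ ≈ √(4.5² − 3) ≈ 4.2, where the mean of ξ crosses the 4.5 threshold.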

Appendix G: Unexpected SPT-ECS detections

The analysis presented in Sect. 6.8 revealed three eRASS1 clusters whose X-ray flux is too low for them to be expected in the SPT-ECS sample. Our model indeed predicts a probability Pi = p(ISPT − ECS|Icosmo, F500_0520, zλ) < 0.025 for each of these entries:

  • 1eRASS J021731.8-320002 (Fig. G.1) is listed at zλ = 0.352 ± 0.009 with flux (2.1 ± 0.4) × 10⁻¹⁴ erg s⁻¹ cm⁻² and extent likelihood 14.4. The estimated X-ray mass is M500,X ≃ 3 × 10¹⁴ M⊙. The matched SPT-ECS detection is at zSPT = 0.354 ± 0.009 (from Magellan/FourStar) and ξ = 5.1. Its estimated SZ mass is M500,SPT ≃ 4 × 10¹⁴ M⊙.

  • 1eRASS J034733.3-333351 (Fig. G.2) is listed at zλ = 0.465 ± 0.009 with flux $8_{-3}^{+10}\times 10^{-14}$ erg s⁻¹ cm⁻² and extent likelihood 7.5. The estimated X-ray mass is M500,X ≃ 2 × 10¹⁴ M⊙. The matched SPT-ECS detection is at zSPT = 0.45 ± 0.01 (photometric, from Magellan/FourStar) and ξ = 5.4. Its estimated mass is M500,SPT ≃ 4 × 10¹⁴ M⊙.

  • 1eRASS J045420.6-373708 (Fig. G.3) is listed at zλ = 0.513 ± 0.008 without any flux measurement or upper limit, and with extent likelihood 6.3. The corresponding SPT detection is at zSPT = 0.509 ± 0.006 (photometric) and significance ξ = 5.9, with estimated mass M500,SPT ≃ 4.8 × 10¹⁴ M⊙.

Fig. G.1.

Cluster 1eRASS J021731.8-320002. The image shows an optical cutout (DR10 Legacy survey) and eRASS1 X-ray contours overlaid.

Fig. G.2.

Cluster 1eRASS J034733.3-333351.

Fig. G.3.

Cluster 1eRASS J045420.6-373708.

All Tables

Table 1.

Glossary of symbols and conventions used throughout this paper.

Table 2.

Setup summary of the external validation tests.

All Figures

Fig. 1.

Comparing the outcome of two matching procedures relating simulated halos (plain black histogram) to detected sources (photon-based matching is a dashed blue line, and position-based matching is a thin orange line). A simulated halo is flagged as selected (Imain = 1) whenever it is (solidly) matched to an extended source in the detection catalogue. The details of the matching algorithms impact the flux distribution of detected sources, especially at fluxes below a few 10⁻¹⁴ erg s⁻¹ cm⁻², with deviations up to a factor 2 in certain bins.

Fig. 2.

Example case of blending, where a bright AGN drives the detection and the presence of a faint extended source favours its classification as extended. This figure is a cut-out of a simulated count image, smoothed with a Gaussian. The red 2- and 5-arcmin circles are centred on a faint simulated cluster (filled triangle symbol) with flux ∼5 × 10⁻¹⁴ erg s⁻¹ cm⁻². Detected sources appear as blue squares. A detection close to the cluster centre is classified as extended (magenta circle). It also coincides with a bright simulated AGN (orange “x”). The position-based matching algorithm does consider the simulated cluster as selected (Imain = 1), contrary to the photon-based matching procedure. Two other faint simulated clusters are shown with open triangles; they have no impact on the detection.

Fig. 3.

Distribution of clusters in the training set as a function of Σ (units counts arcsec⁻²), the average surface brightness within a 90″ radius around their centre. The x-axis is rescaled with c = 2 × 10⁻⁴ arcsec⁻². Top panel: histogram of all simulated clusters (blue) and histogram of the subset of those found as extended by eSASS (orange). Bottom panel: dots indicate the ratio of the histograms (empirical selection rate). Error bars are the 68% confidence range estimated according to Appendix C. The vertical dashed line indicates the transition $\hat{\eta}=1.94$ in the logistic model described by Eqs. (3) and (4), corresponding to N ≃ 30 counts.

Fig. 4.

Representation of the 35 coefficients wj of a logistic regression model p(Imain|counts), trained to predict whether a cluster was selected in the primary cluster sample. The 35 features are surface brightness values in 7 radial annuli, associated with the 5 components indicated in the legend (counts from the galaxy cluster of interest, from neighbouring AGN, fore- and background counts, counts from foreground stars, counts from other neighbouring galaxy clusters). A high absolute value of a coefficient indicates a high importance of the associated feature (Eq. (6)).

Fig. 5.

Performance of three different logistic models p(Imain|counts) to predict a cluster as selected given the values of surface brightness in radial bins. The plain black line is obtained with a model using surface brightness values in 7 annuli for each of the 5 sky components. The dot-dashed line is for a model taking as input both cluster and background counts within 90″. The dashed grey line stands for a model using as input only the 90″ cluster counts. Each precision-recall curve results from model evaluations on the test sample. It is obtained by varying the threshold over which a cluster should be considered as selected and counting the number of true positives (TP), false negatives (FN) and false positives (FP).

Fig. 6.

Reliability of two different logistic models p(Imain|counts) to predict the probability of a cluster being selected given the values of surface brightness in radial bins. Each series of points is obtained with the test sample, by comparing probabilities predicted by the model (horizontal axis) with the actual fraction of selected objects (vertical axis) in each bin of probability. The envelopes materialise the 68% uncertainties (Appendix C). An ideal model would align along the 1:1 curve (thin dotted line). Black squares and plain lines correspond to the model taking 7 × 5 parameters as input; grey dots and dashed lines to the model using only cluster counts within 90″ to make its prediction.

Fig. 7.

Representation of the model p(Imain|M500c, z) predicting a cluster to be detected and selected with an extent likelihood above three. The explanatory variables (features) are labels attached to simulated clusters in the twin simulation, standing for galaxy cluster M500c mass and cosmological redshift. The bottom-left panel represents contours of equal detection probability, labelled in steps of 0.1. Both one-dimensional curves (top-left and bottom-right panels) are slices through the function displayed in the bottom-left corner, at fixed z = 0.3 and M500c = 3 × 10¹⁴ M⊙ (indicated with dotted lines).

Fig. 8.

Representation of the model p(Imain|LX, z) predicting a cluster to be detected and selected with an extent likelihood above three. The explanatory variables (features) are labels attached to simulated clusters in the twin simulation, standing for the galaxy cluster luminosity LX measured in the 0.5–2 keV energy band in the cluster rest-frame (units erg s⁻¹, rescaled in log₁₀) and for the cosmological redshift z. Other details are similar to Fig. 7.

Fig. 9.

Representation of the models p(Imain|LX, z, NH, Texp, bkg) and p(Icosmo|LX, z, NH, Texp, bkg) for fixed values of cluster redshift z, two different values for the luminosity LX, and local galactic absorption NH, in addition to a nominal background level and a range of exposure times Texp (along the x-axis). The shaded regions indicate the approximate 68% confidence range output of the model.

Fig. 10.

Representation of a model p(Icosmo|CR,z,NH,Texp,bkg) for two values of cluster redshift z = 0.2 and z = 0.5, fixed local galactic absorption NH, a nominal background level, two values of exposure times Texp (units s), and a range of count-rate values (x-axis). The shaded regions indicate the approximate 68% confidence range output of the model.

Fig. 11.

Precision-recall curves obtained with selection models trained to predict the selection of a cluster in the primary sample. The input parameters vary from one model to the other, as indicated in the legend. A model involving luminosity, redshift, morphological features and local sky information performs best (yellow dot-dashed curve).

Fig. 12.

Similar to Fig. 11, but comparing three models predicting the presence of a cluster in the cosmology sample. The model represented with the black curve corresponds to the baseline selection model used in the analysis of Ghirardini et al. (2024).

Fig. 13.

Evaluating the reliability of five models predicting the probability of a cluster to appear in the eRASS1 primary cluster catalogue (x-axis). The y-axis represents the actual fraction of selected objects in the test sample. The bottom panel shows the residuals (difference) of each curve with respect to the one-to-one line. Shaded regions indicate 68% confidence intervals in each bin of probability.

Fig. 14.

Similar to Fig. 13, but comparing three models predicting the presence of a cluster in the cosmology sample.

Fig. 15.

Testing the models with eRASS1-cosmo as a test catalogue, and eFEDS as a reference catalogue. Each dot corresponds to one of the 531 eFEDS clusters in the plane of measured 300 kpc flux (F_300kpc in units erg s⁻¹ cm⁻²) and measured redshift (zeFEDS). eFEDS clusters with only flux upper limits are placed at the bottom with star symbols. Black circles indicate clusters also found in the eRASS1 cosmology sample. Colour reflects the model probability Pi (Eq. (8)) computed for each eFEDS cluster.

Fig. 16.

Testing the models with eRASS1-cosmo as a test catalogue, and eFEDS as a reference catalogue. The 531 eFEDS entries are grouped in bins of the model output probabilities Pi (horizontal axis). In each bin, the actual fraction of matches in the eRASS1 catalogue is reported on the vertical axis. Error bars represent the estimated 68% confidence range (Appendix C). The dotted line indicates the one-to-one relation, which a perfectly reliable model would follow. In general, the model predicts slightly lower probabilities than actually observed.

Fig. 17.

Similar to Fig. 15, but now taking eRASS1-cosmo as the reference sample (85 entries) and eFEDS as the test sample. The vertical axis represents F500_0520, the 0.5–2 keV flux within R500 reported in the eRASS1 catalogue (units 10−14 erg s−1 cm−2). The horizontal axis is the catalogue redshift z_λ. Clusters falling in zones of zero eFEDS exposure have vanishing probability (Pi = 0). Other eRASS1 clusters are very likely to be detected in eFEDS (Pi ≃ 1); in fact they are, as shown by the black circles. The isolated point at the bottom of the figure only has a flux upper limit of 1.5 × 10−13 erg s−1 cm−2.

Fig. 18.

Similar to Fig. 16, but using the primary sample of clusters eRASS1-main as a test sample, instead of the eRASS1 cosmology sample.

Fig. 19.

Posterior distribution for the pair of parameters (δ1, δ2) introduced in Eq. (9), for the experiment using eRASS1-main as a test and eFEDS as a reference sample. The histograms represent the marginalised posterior distributions, with mean and 68% range overprinted. The contours represent the 0.5-, 1-, 1.5- and 2-σ equivalent levels of the joint posterior. Clearly the parameter δ2 departs from zero, indicating that a fraction of the systems we predict not to be detected leaks into the actual eRASS1 sample.

Fig. 20.

Similar to Fig. 15, but using the cosmology sample of eRASS1 clusters as a test sample, and the SPTpol-100d catalogue as a reference sample (65 objects at z > 0.25, coloured dots). The vertical axis represents the column xi in the SPTpol catalogue, standing for the measured cluster signal-to-noise (unitless). The horizontal axis represents the measured redshift in the SPT catalogue. The model predicts only a handful of eRASS1 detections among these objects, consistent with the 8 matches actually observed.

Fig. 21.

Similar to Fig. 16, but using the cosmology sample of eRASS1 clusters as a test sample, and the SPTpol-100d cluster sample as a reference. Within the large uncertainties, there is good agreement between the predicted eRASS1 detection probability and the actual fraction of matched systems.

Fig. 22.

Similar to Fig. 20, but using 163 clusters in the SPTpol-ECS catalogue as a reference sample and the eRASS1 cosmology sample as a test.

Fig. 23.

Similar to Fig. 21, but obtained by using the SPT-ECS sample as a reference (163 entries) and using the cosmology sample of eRASS1 clusters as a test.

Fig. 24.

Similar to Fig. 17, but using 958 eRASS1 clusters from the cosmology sample as a reference, and the SPTpol-ECS catalogue as a test sample. Star symbols at the bottom of the figure indicate eRASS1 objects with no flux measurement, possibly only an upper flux limit.

Fig. 25.

Similar to Fig. 16, but obtained by using the eRASS1 cosmology sample (958 entries) as a reference and using SPTpol-ECS as a test sample.

Fig. 26.

Relation between the latent variable f(η) = w0 + ∑jwjηj calculated for each simulated cluster in the test sample after performing logistic regression with 35 surface brightness features, and the values of ℒdet and ℒext as provided by eSASS for those matched to a detection. Each dot stands for a simulated cluster; density contours ease visualisation in crowded regions. Undetected clusters have no associated ℒdet value (red histogram in the lower panel). Clusters detected but not categorised as extended have zero value for ℒext (orange histogram). The apparent correlation between f and both measurements indicates that the model has “discovered” the importance of ℒdet and ℒext in the selection process.
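The latent variable f(η) = w0 + ∑j wj ηj of such a logistic model can be reproduced with a minimal gradient-descent fit. The sketch below uses five synthetic Gaussian features as stand-ins for the 35 surface-brightness features of the paper; it is not the authors' implementation.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Plain gradient-descent logistic regression; returns the bias w0
    and weights w so that f = w0 + X @ w is the latent variable."""
    n, d = X.shape
    w = np.zeros(d)
    w0 = 0.0
    for _ in range(n_iter):
        f = w0 + X @ w
        p = 1.0 / (1.0 + np.exp(-f))      # sigmoid of the latent variable
        w -= lr * (X.T @ (p - y)) / n     # mean log-loss gradient in w
        w0 -= lr * (p - y).mean()         # mean log-loss gradient in w0
    return w0, w

# Synthetic features and labels drawn from a known logistic model
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 5))
true_f = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-true_f))).astype(float)

w0, w = fit_logistic(X, y)
latent = w0 + X @ w                        # f(eta) for each object
```

Plotting `latent` against an external quality measure (as ℒdet and ℒext in the figure) then reveals which quantities the classifier has implicitly learned.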

Fig. 27.

Similar to Fig. 26, but replacing the x-axis by the mean of the latent variable implicit in the Gaussian process classifier p(Icosmo|CR,z,ℋ).

Fig. 28.

Value of the combined Bernoulli negative log-likelihood (Eq. (18)) in the comparison of the SPT-ECS sample (reference) and the eRASS1 cosmology sample (test) for various values of the correlation coefficient ρ. This coefficient expresses the level of covariance between lnLX and lnζ at fixed halo mass and redshift. Assuming all other model parameters are correct, the constraint obtained from the catalogue comparison exercise described in this paper is relatively loose: ρ ≃ 0.2 ± 0.6 (68% confidence level), as deduced from (B − B_min) ∼ χ²₁.
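The likelihood profile shown here can be mimicked in a few lines of NumPy. The sketch below evaluates a combined Bernoulli negative log-likelihood on an invented toy detection model (the `model_p` function and its ρ dependence are purely illustrative, not the scaling-relation model of the paper) and extracts the 68% range from the (B − B_min) ∼ χ²₁ criterion quoted in the caption.

```python
import numpy as np

def bernoulli_nll(p, detected):
    """Combined Bernoulli negative log-likelihood of the detection
    flags given the model probabilities p (an Eq. (18)-style sum)."""
    p = np.clip(p, 1e-10, 1 - 1e-10)
    return -np.sum(detected * np.log(p) + (1 - detected) * np.log(1 - p))

# Toy setup: the detection probability is *assumed* to shift linearly
# with rho, and the flags are drawn at a true value rho = 0.2.
rng = np.random.default_rng(3)
base = rng.uniform(0.1, 0.9, 4000)

def model_p(rho):
    return np.clip(base + 0.4 * rho * (base - 0.5), 0.01, 0.99)

detected = rng.uniform(size=4000) < model_p(0.2)

# Profile B(rho) over a grid and locate the minimum and 68% range
grid = np.linspace(-1.0, 1.0, 201)
B = np.array([bernoulli_nll(model_p(r), detected) for r in grid])
rho_hat = grid[np.argmin(B)]
inside = grid[B - B.min() <= 1.0]   # (B - B_min) ~ chi^2_1 criterion
```

With weakly informative data the resulting interval is broad, as in the ρ ≃ 0.2 ± 0.6 constraint of the figure.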

Fig. B.1.

Similar to Fig. 4, but for a logistic model p(IeRASS1|counts), trained to predict whether a cluster is detected among all simulated clusters.

Fig. E.1.

Representation of a model p(Imain|fX,z,NH,Texp,bkg) for fixed values of cluster redshift z = 0.2 and z = 0.5, local galactic absorption NH, a nominal background level, two values of exposure times Texp (units s) and a range of flux values (x-axis). The shaded regions indicate the approximate 68% confidence range output of the model.

Fig. E.2.

Similar to Fig. E.1, but representing the outcome of a model p(Imain|LX,z,EM0,NH,Texp,bkg) for fixed values of cluster redshift z, two different values for the luminosity LX, and local galactic absorption NH, in addition to a nominal background level and two values of exposure time Texp. The shaded regions indicate the approximate 68% confidence range output of the model.

Fig. E.3.

Similar to Fig. E.1, but representing the outcome of a model p(Imain|fX,R500,NH,Texp,bkg) for two different values for the flux fX, fixed local galactic absorption NH, a nominal background level, and two values of exposure time Texp. The shaded regions indicate the approximate 68% confidence range output of the model.

Fig. G.1.

Cluster 1eRASS J021731.8-320002. The image shows an optical cutout (DR10 Legacy survey) and eRASS1 X-ray contours overlaid.

Fig. G.2.

Cluster 1eRASS J034733.3-333351.

Fig. G.3.

Cluster 1eRASS J045420.6-373708.

