Toward the low-scatter selection of X-ray clusters: cluster detection by outskirts for eROSITA

Context. One key ingredient in using galaxy clusters as a precision cosmological probe in large X-ray surveys is to understand selection effects. The dependence of the X-ray emission on the square of the gas density leads to a predominant role of cluster cool cores in the detection. The contribution of cool cores to the X-ray luminosity does not scale with cluster mass and cosmology and therefore affects the use of X-ray clusters in producing cosmological constraints. Aims. One of the main science goals of the extended ROentgen Survey with an Imaging Telescope Array (eROSITA) mission is to constrain cosmology with a wide X-ray survey. We propose an eROSITA galaxy cluster detection scheme that avoids the use of X-ray cluster centers in detection. We calculate theoretical expectations and characterize the performance of this scheme by simulations. Methods. We perform Monte-Carlo simulations of the upcoming eROSITA mission, including known foreground and background components. Performing realistic simulations of point sources in survey mode we search for spatial scales where the extended signal is uncontaminated by the point-source flux. We derive a combination of scales and thresholds, which result in a clean extended source catalog. We design the output of the cluster detection which enables calibrating the core-excised luminosity using external mass measurements. We provide a way to incorporate the results of this calibration in the production of final core-excised luminosity. Results. Similarly to other galaxy cluster detection pipelines, we sample the flux – cluster core radius detection space of our method and find many similarities with the pipeline used in the 400d survey. Both detection methods require large statistics on compact clusters, in order to reduce the contamination from point sources. 
The benefit of our pipeline consists in the sensitivity to the outer cluster shapes, which are characterized by large core sizes with little cluster to cluster variation at a fixed cluster total mass. Conclusions. Galaxy cluster detection by outskirts improves the cluster characterization using eROSITA survey data and is expected to yield well characterized cluster catalogs having simple selection functions.


Introduction
The expansion and structure formation history of the Universe is imprinted on the spatial distribution and number density of its largest collapsed entities, galaxy clusters. This makes galaxy clusters powerful probes for constraining cosmological parameters such as the dark energy equation of state (e.g., Vikhlinin et al. 2009; Allen et al. 2011 for a review). Among others, X-ray observations of galaxy clusters are of particular interest because they trace the bulk of the baryonic component, the hot intracluster medium (ICM). With the launch of the extended ROentgen Survey with an Imaging Telescope Array (eROSITA, Merloni et al. 2012; Predehl et al. 2018) in July 2019, X-ray astronomy ushers in a new era. As the primary instrument of the Russian-German Spektrum-Roentgen-Gamma (SRG) mission, eROSITA will perform eight all-sky surveys within four years. The unprecedented survey speed and capability over a wide range of energies mean that the final all-sky survey will be ∼20-30 times deeper than that of its predecessor (the ROSAT all-sky survey, Voges et al. 1999) in the 0.5-2 keV energy range and will provide the first ever imaging all-sky survey in the 2-10 keV energy band. With the expected detection of $10^5$ galaxy clusters (Pillepich et al. 2012), eROSITA will place tight constraints on the dark energy equation of state, among others.
Send offprint requests to: F. Käfer, e-mail: fkaefer@mpe.mpg.de
Understanding selection effects is an essential but complicated requirement for precision cosmology. Determining the selection function is especially complex for extended X-ray sources because the detection probability and proper classification depend on their morphology, for example (Eckert et al. 2011; Rossetti et al. 2016, 2017; Andrade-Santos et al. 2017; Lovisari et al. 2017). The cluster outskirts (0.2-0.8 $r_{500}$) are found to evolve with redshift in a self-similar fashion (McDonald et al. 2017; Käfer et al. 2019) and exhibit low scatter (Ghirardini et al. 2018; Käfer et al. 2019). Therefore, cluster samples that are selected based on the properties of cluster outskirts will closely trace the selection by cluster mass and reduce the systematics of cluster use in cosmological studies. Another important aspect of detailed image decomposition consists of the removal of point sources. In the extragalactic sky, the X-ray point-source population is dominated by active galactic nuclei (AGN). Active galactic nuclei cause false detections through the noise in the realization of their photon distribution. In addition, they contribute to the total flux of the cluster because the AGN halo occupation distribution extends to high masses, especially at high redshifts (Allevato et al. 2012; Oh et al. 2014). The importance of AGN in contaminating cluster fluxes of eROSITA observations has been highlighted by Biffi et al. (2018).
arXiv:1912.01024v2 [astro-ph.CO] 11 Dec 2019. A&A proofs: manuscript no. AA_2019_36131
Spatial filtering of X-ray images to describe the emission that is produced on different spatial scales has been introduced by Starck & Bijaoui (1991) and was successfully applied for source detection in cluster cosmology (Vikhlinin et al. 1998; Pacaud et al. 2006). Finoguenov et al. (2009, 2010b, 2015), Erfanianfar et al. (2013), Mirkazemi et al. (2015), and Gozaliasl et al. (2019) applied the method to detect groups and clusters of galaxies using only the large scales of the X-ray emission. In this paper, we present the adaptation of the wavelet decomposition method for eROSITA.
The paper contains the characterization of the eROSITA point-spread function, simulations of eROSITA observations of the extragalactic fields, calibration of the point-source model, description of the cluster detection pipeline, and its characterization using synthetic simulations.

eROSITA and the eROSITA simulator
eROSITA is a new X-ray telescope that was launched in July 2019 on board the SRG. The full description of the telescope can be found in Predehl et al. (2018).

Point-spread function
The point-spread function (PSF) of an X-ray telescope describes its ability to focus photons. The image produced by a point source is blurred, mostly as a result of misalignments and microroughness of the instrument's grazing-incidence mirrors or because of their support structures. The shape and size of the PSF depend, among others, on the photon energy and its distance from the optical axis. The current eROSITA PSF model is based on measurements made at the PANTER X-ray test facility, where the PSF is sampled on an 11 × 11 grid, plus an additional central 6 × 6 grid to increase the small off-axis angle density. Each grid is spaced by 6′, and the two grids are displaced by 3′ with respect to one another. The energy dependence is sampled using X-ray emission lines at photon energies of 0.3, 0.9, 1.5, 3.0, 4.5, 6.4, and 8.0 keV. The PSF image at each position and energy is described by shapelets (Refregier 2003), that is, by a linear image decomposition into a series of differently shaped basis functions of characteristic scales. The shapelet description is a convenient way to compress the PSF information into a few coefficients. Two different scale parameters with individual shapelet coefficients are used in order to reproduce the complex behavior of the PSF core on small scales and the PSF wings on large scales. Each of the seven eROSITA mirror modules is made out of 54 nested Wolter-I type (Wolter 1952) shells and has its individual PSF measurements. However, in the current implementation of the X-ray telescope simulator (Sect. 2.4), the PSFs of all seven modules are assumed to be the same, using only the shapelet reconstruction of flight module number 2. We note that the eROSITA PSF will be different in orbit, for example, due to shaking of the telescope during launch or temperature and gravitational effects. During the performance-verification and all-sky survey phases, the eROSITA PSF will be determined and calibrated against ground-based measurements.

Point sources and background components
We followed the recipe of Clerc et al. (2018) and used SIXTE (Dauser et al. 2019, see Sect. 2.4) to simulate eROSITA fields containing AGN and unresolved X-ray background. Individual AGNs were drawn from a luminosity function down to a field exposure time-dependent flux threshold and uniformly distributed in a field. Thus spatial clustering of AGNs and spatial correlations between AGNs and galaxy clusters are not considered; this is the topic of a future study. The AGN spectra of the low-luminosity tail of the distribution were stacked and redistributed uniformly to construct an unresolved X-ray background component. Emission of the hot plasma in the halo and disk of our Galaxy was simulated using a double MEKAL model (Mewe et al. 1985, 1986; Liedahl et al. 1995) with temperatures of 0.081 keV and 0.204 keV (Lumb et al. 2002). In addition, a nonvignetted eROSITA instrument particle background component according to the expected radiation level at the Lagrange point L2 was simulated (Tenzer et al. 2010).

Extended objects
We here focus on the detection of extended sources. To compare our results to previous studies, we characterize the spatial flux distributions by spherically symmetric β-models (Cavaliere & Fusco-Femiano 1978) with β = 2/3 on a discrete grid of core radii. The cluster emission was characterized by a partially absorbed Astrophysical Plasma Emission Code (APEC, Brickhouse et al. 2000) model with a fixed abundance of 0.3 $Z_\odot$ (Anders & Grevesse 1989) and a survey-field-dependent Galactic column density of hydrogen. The Galactic absorption was described by a phabs model (Balucinska-Church & McCammon 1992) and was fixed to $3 \times 10^{20}$ cm$^{-2}$, $8.8 \times 10^{20}$ cm$^{-2}$, and $6.3 \times 10^{20}$ cm$^{-2}$ for the equatorial, intermediate, and deep field, respectively (see Sect. 2.5). Cluster temperatures, redshifts, and fluxes were sampled on a grid and ranged between 1-5 keV, 0.05-1.2, and $2 \times 10^{-15}$-$5 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$, respectively.
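The β = 2/3 profile used for the simulated clusters has a convenient closed form for the projected flux inside a given radius, which the following minimal Python sketch illustrates. The function names and the unit central surface brightness are illustrative choices and are not taken from the simulation setup; radii and core radii only need to share the same angular unit.

```python
import math

def beta_model_sb(r, r_core, s0=1.0, beta=2.0 / 3.0):
    """Projected beta-model surface brightness at radius r."""
    return s0 * (1.0 + (r / r_core) ** 2) ** (-3.0 * beta + 0.5)

def enclosed_fraction(radius, r_core):
    """Fraction of the total beta = 2/3 model flux projected within `radius`.

    For beta = 2/3 the profile is S0 * (1 + x^2)^(-3/2) with x = r / r_core.
    Its area integral out to x is 2*pi*S0*r_core^2 * (1 - 1/sqrt(1 + x^2)),
    and the total flux is 2*pi*S0*r_core^2.
    """
    x = radius / r_core
    return 1.0 - 1.0 / math.sqrt(1.0 + x * x)

def annulus_fraction(r_in, r_out, r_core):
    """Fraction of the total flux projected between r_in and r_out."""
    return enclosed_fraction(r_out, r_core) - enclosed_fraction(r_in, r_core)

# Illustration: flux fraction in a 1-4 annulus for a few core radii
# (all values in the same angular unit).
for r_core in (0.25, 0.5, 1.0, 2.0):
    print(r_core, round(annulus_fraction(1.0, 4.0, r_core), 3))
```

The closed form only holds for β = 2/3; other slopes would require a numerical integration of `beta_model_sb`.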

X-ray telescope simulator
The simulations of the extragalactic eROSITA sky were performed using the Monte Carlo based SIXTE simulator (Dauser et al. 2019). A sample of photons was produced based on the effective area of the instrument and input source characteristics, for example, a source spectrum, or if necessary, a model of the extent. These photons were virtually propagated through the instrument simulator. Based on the telescope specifications, a list of impact times, positions, and energies of the photons was produced. The final output event list was then created by simulating the read-out characteristics. The simulator already provides an implementation of the eROSITA characteristics described by the PSF, vignetting, response matrix files, and ancillary response files.

eROSITA mission planning and survey fields
We assumed a simple survey strategy for the four-year all-sky survey, where the scanning axis is pointed toward the Sun and eROSITA scans one great circle every four hours (Merloni et al. 2012). One full coverage of the sky is achieved every half year. We note that the final survey strategy will be more complicated due to additional constraints. Since the attitude file we used was created, the movable antenna was replaced by a fixed antenna, thus the spacecraft needs to perform compensating motions to maintain the angular constraints with respect to the Earth and the Sun. In addition, the antenna opening angle and the spacecraft-Sun-vector constraints were changed. This leads to a more inhomogeneous exposure in ecliptic longitude, among others.
We studied three 3.6° × 3.6° sky tiles with approximately 2 ks, 4 ks, and 10 ks exposure. We refer to these fields as equatorial, intermediate, and deep, respectively. Taking vignetting into account, the median net exposures of the fields were roughly halved, that is, approximately 1 ks, 2.5 ks, and 6 ks, respectively. The equatorial field shows a uniform exposure, but the deep field has a large exposure gradient (Clerc et al. 2018).

Source detection and characterization
The standard technique when source catalogs are created is to split source detection and characterization because different optimized software packages perform better on the individual tasks. After the initial detection, a maximum likelihood (ML) source characterization is used to separate extended and point-like sources, based on the value and the significance of the extent (Vikhlinin et al. 1998; Burenin et al. 2007; Pacaud et al. 2016; Clerc et al. 2018). The approach of splitting detection and characterization is also implemented in the standard eROSITA data-processing pipeline based on the eROSITA Science Analysis Software System (eSASS). The forward-fitting routine employed by the ML fitting ensures the best sensitivity toward detecting an object with the assumed characteristics. However, the assumed symmetric β = 2/3 model is too simplistic for many extended sources. The goal of our investigation is to provide a framework that selects extended sources based on their extended emission rather than relying on a blind fitting method. Our galaxy cluster detection scheme is physically motivated and sensitive to the outer self-similar cluster regions. This ensures cluster selection from the point of view of best cluster characterization because the outer cluster regions show less scatter at a given cluster mass.

Wavelet decomposition method
The general idea of wavelet decomposition is the isolation of differently sized structures by convolving the input image with kernels of variable scales. Starting with the smallest scale, significant emission is subtracted before continuing on the next larger scale. This allows us to model point-source emission based upon its detection on scales that are unresolved or are the size of the PSF. The angular sizes of these scales depend on instrumental and observational characteristics and can vary from arcseconds for the Chandra observatory to arcminutes for ROSAT all-sky survey data. We refer to these small scales as point-like emission detection (PED) scales, and greater scales are labeled extended emission detection (EED) scales. The removal of point sources based on the PED-scale detection and a PSF model prior to running the wavelet decomposition on EED scales is a natural step within the philosophy of wavelet decomposition and was introduced by Finoguenov et al. (2009). Following this approach, the general concept of our algorithm is to detect point sources and extended sources separately. An overview of the general steps of our procedure is as follows:
[1.] The first step is the calibration of point-source modeling, which means addressing which angular sizes the PED scales have in the particular science case. We only used the simulated image of point sources, which contains resolved and unresolved sources. The background level was determined by iterating the detection of point-source emission and excising point sources from the background estimates, as was done for XMM-Newton and Chandra in Finoguenov et al. (2015). Next, we modeled the instrument PSF with a sum of Gaussians without assuming any prior knowledge about its shape. This has the advantage of being robust and fast to implement. We modeled the PSF up to a scale on which the emission is almost free of point-source contamination. These scales are defined as PED scales.
Using the converged background estimate, we ran the detection of point sources on the PED scales to obtain a wavelet image of resolved point sources. This point-source image was smoothed with Gaussians of different widths to obtain fitting templates. We fit these templates to the wavelet-subtracted image to derive the amplitude of the image that best describes the residuals. We did this by cross-correlating the maps in order to take the covariance of the templates into account. The results are individual normalization coefficients for the used templates. These normalization coefficients were used to model the PSF effect in the simulations that contain extended sources. We note that including actual PSF measurements might improve the description of the PSF wings, which cannot be characterized by our approach of combining several Gaussians. The point-source subtraction technique has proven to be very efficient in deep X-ray fields and has also allowed the separation of extended sources due to inverse Compton scattering of the cosmic microwave background photons on the relativistic plasma of radio jets (Finoguenov et al. 2010a;Jelić et al. 2010). The detection threshold is the level in the convolved image above which the peaks are statistically significant. For the purpose of subtracting or modeling point-source contamination, the detection thresholds considered extended to 3σ (Vikhlinin et al. 1998).
[2.(a/b)] After the calibration of the point-source model, we ran the detection of point sources on the realistic simulations, which contain both point and extended sources. We smoothed the resulting point-source wavelet image with Gaussians of the same widths as in the calibration. These templates were multiplied by the normalization coefficients obtained in the calibration and were added to the unsmoothed wavelet image to model the point-source emission on the PED scales.
[2.(c/d)] To preserve the Poisson statistics, we added the point-source model to the background estimate and searched for residual signal over the background plus point-source model to detect and catalog extended objects. As a result, we obtained maps that were free of point-source emission. The maps retain the spatial shape of the extended source emission, such that ellipticity can be measured, for example. In addition, the maps allow for a simple visual characterization of the detected emission. This can be a complicated task, for instance, if the cluster does not look like a β-model because of extended source confusion. Furthermore, maps obtained by different satellites can be combined, as was done for Chandra and XMM-Newton observations in Finoguenov et al. (2015). The choice of the detection thresholds for cataloging the extended sources depends on the objective, and they were adjusted to the desired level of completeness and purity of the catalog, as discussed in Sect. 6.1. Typically, the detection thresholds were at least 4σ.
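The template fit in step 1 can be illustrated with a generic linear least-squares sketch. It assumes that exactly two smoothed templates are fitted to the residual image and that all maps are given as flat lists of pixel values; the function name and the toy pixel values are hypothetical, and the Gaussian smoothing that produces the templates is omitted. The cross terms between the templates are what account for their covariance.

```python
def fit_two_templates(residual, template_a, template_b):
    """Least-squares amplitudes (a, b) with a*template_a + b*template_b ~ residual.

    All inputs are flat sequences of pixel values of equal length.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    aa = dot(template_a, template_a)
    bb = dot(template_b, template_b)
    ab = dot(template_a, template_b)  # covariance term between the templates
    ar = dot(template_a, residual)    # cross-correlation with the residual
    br = dot(template_b, residual)
    det = aa * bb - ab * ab
    if det == 0.0:
        raise ValueError("templates are linearly dependent")
    # Solve the 2x2 normal equations by Cramer's rule.
    return (ar * bb - br * ab) / det, (br * aa - ar * ab) / det

# Toy check: build a residual from known amplitudes and recover them.
t_a = [1.0, 2.0, 0.0, 1.0]
t_b = [0.0, 1.0, 1.0, 3.0]
toy_residual = [0.47 * x + 0.1 * y for x, y in zip(t_a, t_b)]
print(fit_two_templates(toy_residual, t_a, t_b))
```

With more than two templates the same normal equations would be solved with a general linear-algebra routine instead of the explicit 2 × 2 inverse.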
The goal of our pipeline is to select sources based on the extended emission, compared to selecting sources based on a symmetric β = 2/3 model fit to some angular range. Thus, our catalogs include sources with a greater variety of shapes. This detection scheme has obvious benefits at low-mass halos, such as galaxy groups, because they exhibit a wide variety of X-ray morphologies (e.g., Finoguenov et al. 2006, 2007). From the point of view of source selection, the effect of contaminating sources is very different between this pipeline and classical wavelets. Here, the ability to detect and select a cluster as an extended source might be reduced due to the large noise caused by point-source induced background, while in other methods, the source might be classified as a cluster because of the point-source contribution to the total flux. If we were to only keep the emission above the selected detection threshold, we would discard the bulk of the source flux. Wavelets provide a secondary filtering threshold for estimating the region around the detected maximum where significant flux is detected. A lower filtering threshold compared to the detection threshold therefore minimizes the loss of source flux by keeping a larger region around the detected maximum. This region can be used in the flux estimation of the source in the point-source-subtracted map. However, setting the filtering threshold too low has the drawback of potentially including secondary peaks within the region around the main peak, which would normally not be detected. These secondary peaks might increase the number of spurious detections. Flux measurements within a wavelet reconstructed region have been extensively tested in Connelly et al. (2012). Together with the source flux, the detection efficiency of galaxy clusters depends on their extent. 
To achieve a comparison to previous studies, we considered the performance of our pipeline by adopting the same framework as for β-model profiles. Within the β-model approach, the extent is characterized by the value of the core radius. A discussion to extend the existing β-model tools to capture the wide variety of expected source shapes is beyond the scope of this paper.
In addition to the standard β-model characterization, our pipeline can be calibrated using any set of cluster characterization. In addition, the catalog of extended sources can be fed back into the β-model-extent fitting routine to identify why certain sources are lacking from the cluster list. This approach is similar to the XMM-XXL survey pipeline (Pacaud et al. 2006). We note that compared to our method, the flux estimate of the XMM-XXL pipeline includes potential excess cool-core emission.
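The interplay between the detection threshold and the lower filtering threshold can be sketched in one dimension. The sketch below is a simplified illustration rather than the wavelet implementation itself: it assumes a list of per-pixel significances (in units of σ), and the function name and toy profile are hypothetical.

```python
def grow_regions(significance, detect_thr, filter_thr):
    """Return (start, stop) index ranges around pixels above detect_thr.

    Each region is grown outward from a detected pixel while the
    significance stays at or above filter_thr; `stop` is exclusive.
    """
    if filter_thr > detect_thr:
        raise ValueError("filtering threshold must not exceed detection threshold")
    regions = []
    n = len(significance)
    i = 0
    while i < n:
        if significance[i] < detect_thr:
            i += 1
            continue
        start = i
        while start > 0 and significance[start - 1] >= filter_thr:
            start -= 1
        stop = i + 1
        while stop < n and significance[stop] >= filter_thr:
            stop += 1
        regions.append((start, stop))
        i = stop  # continue the search behind the grown region
    return regions

# Toy significance profile with a main peak and a weaker secondary peak.
profile = [0.5, 1.7, 2.0, 4.5, 3.0, 1.2, 0.3, 1.8, 2.5, 1.0]
print(grow_regions(profile, 4.0, 3.0))  # compact region around the main peak
print(grow_regions(profile, 4.0, 1.6))  # larger region, more source flux kept
print(grow_regions(profile, 4.0, 0.2))  # too low: swallows the secondary peak
```

The last call shows the drawback mentioned above: with a very low filtering threshold, the region around the main peak extends over the secondary peak, which would not have been detected on its own.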
Adjusting the detection pipeline to eROSITA
As described in Sect. 3.1, the proposed source detection algorithm needs to be tuned to the characteristics of the particular observation. In this section, we focus on how to adapt the general framework to eROSITA. Currently, our training is limited to the pre-flight calibration, and a further tuning of the pipeline is required in-flight. Compared to similar pipelines for Chandra and XMM-Newton, we have not yet addressed the minor deficiencies associated with wavelet flux redistribution between adjacent scales. This will be accomplished as a part of the in-flight calibration and will serve to reduce the root mean square of the residual image. Right now, we propose an effective scheme of the procedure and apply it to current eROSITA-survey mock observations. We follow this path because the incorporation of in-orbit PSF calibration data into the software analysis has proven to be time consuming in our experience.
Similar to Chandra and XMM-Newton, the off-axis degradation of the eROSITA PSF is driven by the fact that the detector plane is out of focus. Thus, we can directly apply our experience with developing the source detection algorithm for Chandra and XMM-Newton to eROSITA. However, the eROSITA maximum degradation in terms of the half-energy width is 20% (Predehl et al. 2010), which is lower than for XMM-Newton and far lower compared to Chandra. The eROSITA PSF does not have a core, but a typical survey half-energy width of 28″ at 1 keV. In scanning mode, the eROSITA PSF is roughly uniform across sky tiles. The detector pixel size corresponds to 9.6″, and sky tiles are rebinned into images with 4″ pixel size. In our simulations, the impact position of each photon is known. In the eROSITA survey, the rebinning will be made by reconstructing split events using the charge division among adjacent pixels, allowing for subpixel resolution (Dennerl et al. 2012). We detect sources in eROSITA-survey images in the 0.5-2 keV energy band. Events are not split or selected based on their off-axis angles.
[1.] First, we study the limitations of the point-source-model process on the eROSITA cluster detection. The goal is to answer the question of whether we can reliably model the point-source contribution and to define the angular scales required for this. The angular scales on which point sources are first detected are a strong function of the survey depths, instrument PSF, and assumed background. A discussion of the effects is presented in Mirkazemi et al. (2015). With respect to eROSITA survey observations, we are not able to reliably predict the residual point-source emission on scales below 32″ because most of the point sources are only detected on the 32″ wavelet scale. Even on scales of 64″, we detect point sources that are not detected on any smaller scales. The point-source contamination on scales starting from 128″ is minimal. In training for the point-source model, we ran the wavelet decomposition up to a scale of 32″. Because we are interested in a complete source subtraction, we adopted a low detection and filtering threshold of 3.3σ and 1σ, respectively. These small scales are smoothed with Gaussians of 64″ and 128″ widths and fitted to the 32″ wavelet-subtracted image. The two Gaussian-smoothed templates describe the residual image best, with normalization coefficients of 0.47 and 0.1, respectively. We did not include the 64″ scale in modeling the point-source flux on the EED scales because we wished to retain sensitivity for extended objects on this scale.
[2.(c/d)] The prediction of the point-source emission on PED scales was included in the background model, and we ran the wavelet decomposition on the EED scales in order to detect and catalog extended objects. A widely adopted way for cleaning catalogs is to set the detection threshold for extended sources higher, which reduces the chance of including misclassified point sources as extended. For eROSITA, we did not detect point sources on the 64″ scale when we set the detection threshold to 7σ. On the other hand, scales starting from 128″ are already very clean from the point-source contamination, and the lowest statistically motivated thresholds can be adopted there. This is very good news for the science of galaxy groups with eROSITA, as well as for studies of the unresolved background fluctuation. For this work, we illustrate the performance of the pipeline using two detection thresholds: one maximally sensitive, of 4σ, and another maximally clean, of 7σ. The filtering thresholds were set to 1.6σ and 3σ for high-sensitivity and low-contamination wavelet detection, respectively. In addition, we adopt a 5σ detection threshold with a 1.6σ filtering threshold in Sect. 6.1 for a better comparison with an existing study. We motivate these thresholds further in Sect. 6.2. Our current simulations do not consider spatial AGN clustering, and the quantification of this effect is the topic of a future study. We note that a potential AGN clustering might create false fluctuations on larger scales.
Considering the shallow depths of the eROSITA survey, the limitation of using EED scales is primarily for detecting sources at high redshift (z ∼ 1). There, using smaller scales for fitting the cluster shapes will be complicated by the enhanced AGN activity in clusters (Biffi et al. 2018) and might present a fundamental limitation of the survey to achieve clean high-z cluster flux estimates in any case, as opposed to merely detecting a cluster. When the in-orbit background is higher than we assume here, the detection threshold can be lowered, staying the same in terms of the source flux. Thus, our results in terms of source flux detection will be quite representative for a wide range of in-orbit conditions.

Selection criteria
The point-source-cleaned maps provide a way to detect extended sources and to measure their flux. From the point of view of the flux extraction, it is clear that the flux on the spatial scales used to estimate the point-source flux will be partially removed. On the other scales, the work on eROSITA sample construction has put forward a demand for defining the simplest possible observable on which the selection is made, preferably with a step-function-like selection (Grandis et al. 2018). In our method, this is the residual cluster flux in the 1′-4′ range. This represents a simple aperture extraction, which is linked to the total cluster flux. The radial range is not directly motivated by the wavelet analysis, except that we need to consider scales above 1′ due to point-source confusion (see Sect. 6.1). In addition, we also present a consideration of the source flux in the 1′-16′ range. This allows us to study the effects of using a larger aperture. Our experiments with cluster detection in the equatorial fields led us to adopt 40 and 80 counts for these two detection ranges, respectively. This assumes a lower detection threshold for the large area. Previous studies (e.g., Pillepich et al. 2012; Borm et al. 2014) neglected aperture effects in addition to a count threshold and assumed a fixed minimum number of total photons to classify a source as a cluster. This leads to an artificially high sensitivity toward a detection of X-ray emission from low-redshift galaxy groups. For cosmological studies, however, these systems are not the intended targets and can be neglected. In the following we derive the analytical description of the cluster selection based on these two thresholds.
It is clear that the redshift range for which our technique is the most attractive is also the range where the selected radial range samples the part of the cluster with the lowest scatter against the total mass. This corresponds to typical clusters of, for example, $10^{14}\,M_\odot$ at redshift 0.4. The sampled part of the cluster changes with redshift as well as mass, and we prefer to model this effect as opposed to changing the extraction region as a function of the redshift-dependent limiting mass. The actual reconstruction of the cluster properties does not have to follow this prescription, and several efforts are underway to provide the core-excised luminosity for the eROSITA clusters (e.g., Eckert et al. in prep.).
Using the integrated counts (or count rates) is just one of the possibilities for cluster selection based on our maps. Our source lists can be used with the ML fitting in its standard form and in the modified form, in which the core radii of the clusters are examined only at large radii. This avoids the influence of the cool cores on the estimate, as found by Käfer et al. (2019).
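The count-based selection described in this section amounts to a simple aperture sum compared against a threshold. The sketch below illustrates this under simplifying assumptions: the residual map is a small 2D list of counts that is already free of background and point-source emission, the source center is known, and the 1′ pixel size of the toy map is chosen for readability only. The function names and the toy map are hypothetical; the thresholds of 40 and 80 counts are the values quoted above.

```python
import math

# Count thresholds quoted in the text for the two apertures (radii in arcmin).
COUNT_THRESHOLDS = {(1.0, 4.0): 40.0, (1.0, 16.0): 80.0}

def annulus_sum(residual_map, center, r_in, r_out, pixel_arcmin):
    """Sum residual counts in pixels whose centers lie at r_in <= r < r_out."""
    center_x, center_y = center
    total = 0.0
    for iy, row in enumerate(residual_map):
        for ix, value in enumerate(row):
            radius = math.hypot(ix - center_x, iy - center_y) * pixel_arcmin
            if r_in <= radius < r_out:
                total += value
    return total

def is_selected(residual_map, center, pixel_arcmin, aperture=(1.0, 4.0)):
    """Step-function-like selection on the aperture counts."""
    counts = annulus_sum(residual_map, center, aperture[0], aperture[1], pixel_arcmin)
    return counts >= COUNT_THRESHOLDS[aperture]

# Toy map: 11 x 11 pixels of 1 arcmin with one residual count per pixel.
toy_map = [[1.0] * 11 for _ in range(11)]
print(annulus_sum(toy_map, (5, 5), 1.0, 4.0, 1.0))
print(is_selected(toy_map, (5, 5), 1.0))
```

Selecting on the aperture sum rather than on the total source counts is what removes the central region, and therefore the cool-core contribution, from the selection observable.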

Theoretical predictions
When we assume a minimum number of counts required to detect a galaxy cluster as an extended object ($C_{\rm det}$) in a field with a given exposure time ($T_{\rm exp}$), we can iteratively calculate the corresponding cluster flux limit ($f_{500,\rm lim}$), luminosity limit ($L_{500,\rm lim}$), and mass limit ($M_{500,\rm lim}$) as a function of redshift. Given an initial guess for the cluster mass and temperature limits, the corresponding overdensity radius is calculated assuming spherical symmetry through

$$r_{500,\mathrm{lim}} = \left( \frac{3 M_{500,\mathrm{lim}}}{4\pi \cdot 500\,\rho_{\mathrm{crit},z}} \right)^{1/3}. \qquad (1)$$

The core radii are assumed to scale with the overdensity radii ($r_{\rm c} = r_{500}/3$). This ensures that the apparent size scales with redshift, that is, clusters at higher redshift have a smaller angular extent. The relation between core and overdensity radii is calibrated on non-cool-core clusters at low redshift (Käfer et al. 2019) and holds at high redshift, where the relative contribution of the cool core to the outer parts of the cluster becomes minor (McDonald et al. 2013). The compactness of clusters at high redshift matters for the detection. In practice, we need to characterize the detected population of groups and clusters and correct the numbers for the differential sensitivity of the detection method.

With the core radius estimate, the count rate of a cluster is calculated by integrating a single β-model with fixed slope (β = 2/3) in a given radial range. Realistic deviations from the β = 2/3 assumption have little impact on the shown thresholds because the actual distribution of the counts is less important. We denote the β-model count rate on the 1′-4′ scale as $R(1′, 4′)$ and the count rate within $r_{500}$ as $R(0, r_{500})$. Both predicted β-model count rates are independent of PSF redistribution effects.

We used the X-ray spectral-fitting program XSPEC (Arnaud 1996) together with the temperature guess to calculate the conversion factor of count rate to flux ($λ_{\rm RF}$) by dividing the model flux of a partially absorbed APEC model (see Sect. 2.3) with unity normalization by the corresponding APEC-model count rate. The conversion factors of count rate to flux range between $(6.45$-$7.65) \times 10^{-13}$ erg count$^{-1}$ cm$^{-2}$. The cluster flux limit is derived according to Eq. 2.

Using the redshift, we calculated the conversion factor of count rate to luminosity ($λ_{\rm RL}$) by shifting the desired rest-frame energy band (0.5-2 keV) to the observed one. This corrects for the fact that the energies of detected photons in a given passband are (1 + z) times lower than in the cluster rest frame. The intrinsic APEC-model luminosity is calculated by multiplying the unabsorbed APEC-model flux in the observed band by 4π times the luminosity distance squared; dividing it by the APEC-model count rate yields $λ_{\rm RL}$. The luminosity conversion factor as a function of redshift is roughly a broken power law, in the form of monomials, with a break point around z = 0.1. It ranges between $2 \times 10^{39}$ and $2 \times 10^{43}$ erg count$^{-1}$ for redshifts between 0.001 and 0.1 and steepens to values of $2 \times 10^{46}$ erg count$^{-1}$ at z = 2. Replacing $λ_{\rm RF}$ in Eq. 2 with $λ_{\rm RL}$ yields the cluster luminosity limit.

Then, the cluster temperature and mass are updated according to scaling relations, starting from the Giles et al. (2016) temperature-luminosity relation, which is expressed relative to pivot values of 3 keV and $3 \cdot 10^{43}$ erg s$^{-1}$. With these updated temperature and mass estimates, the procedure starts over and iterates until the change in mass is lower than 0.1%.

As outlined above, we calculated the different selection thresholds for a step-function-like cluster detection (see Sect. 4) with 40 and 80 counts on the 1′-4′ and 1′-16′ scale, respectively. The flux and luminosity limits of the two angular scales in fields with different exposures are shown in Fig. 1 and Fig. 2, respectively. Figure 3 shows the analytical cluster mass and overdensity radius limit as a function of redshift. The 1′-16′ scale has a lower sensitivity at higher redshift because the area is larger, but it performs better at lower redshift than the 1′-4′ scale.
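The geometric ingredients of this calculation can be sketched in a few lines. The block below is a minimal illustration, not the pipeline used in the paper: it evaluates Eq. 1 for an assumed flat ΛCDM cosmology (h = 0.7, Ω_m = 0.3 are illustrative choices), uses the closed-form enclosed counts of a β = 2/3 model, and rescales an aperture count threshold to a flux within $r_{500} = 3 r_{\rm c}$. The function names and the way the flux limit is assembled in `flux_limit` are assumptions made for illustration.

```python
import math

RHO_CRIT0_H2 = 2.775e11  # critical density today in h^2 Msun / Mpc^3


def rho_crit(z, h=0.7, omega_m=0.3):
    """Critical density at redshift z in Msun / Mpc^3 (flat LCDM, assumed parameters)."""
    e2 = omega_m * (1.0 + z) ** 3 + (1.0 - omega_m)
    return RHO_CRIT0_H2 * h ** 2 * e2


def r500_from_mass(m500, z):
    """Eq. 1: radius in Mpc enclosing 500 times the critical density."""
    return (3.0 * m500 / (4.0 * math.pi * 500.0 * rho_crit(z))) ** (1.0 / 3.0)


def beta_fraction(r_in, r_out, r_c):
    """Fraction of the total projected counts of a beta = 2/3 model in an annulus.

    For beta = 2/3 the surface brightness is proportional to
    (1 + (r / r_c)^2)^(-3/2), so the enclosed counts are proportional to
    1 - 1 / sqrt(1 + (r / r_c)^2).
    """
    def enclosed(r):
        return 1.0 - 1.0 / math.sqrt(1.0 + (r / r_c) ** 2)
    return enclosed(r_out) - enclosed(r_in)


def flux_limit(c_det, t_exp, r_c, lam_rf, r_in=1.0, r_out=4.0):
    """Assumed form of the flux limit: aperture count rate scaled to r500 = 3 r_c.

    Radii are in arcmin, t_exp in seconds, lam_rf in erg count^-1 cm^-2.
    """
    rate_aperture = c_det / t_exp
    scale = beta_fraction(0.0, 3.0 * r_c, r_c) / beta_fraction(r_in, r_out, r_c)
    return lam_rf * rate_aperture * scale


# Illustrative inputs: 40 counts, a hypothetical 2 ks exposure, r_c = 1 arcmin
example_flux = flux_limit(40, 2000.0, 1.0, 7.0e-13)
example_r500 = r500_from_mass(1.0e14, 0.3)
```

In the iterative scheme, the resulting luminosity limit would be passed through the scaling relations to update the mass, and Eq. 1 would be re-evaluated until the mass changes by less than 0.1%.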
This is promising for galaxy group studies with eROSITA, assuming that the considered scaling relations hold at these low masses. The core radius limit as a function of flux is shown in Fig. 4. The optimal core radius to detect clusters is approximately 1′. For a smaller extent, the flux threshold increases because the surface brightness profiles decline faster, such that there are fewer counts in the outskirts. For a larger extent, the flat inner core of the β-model profile causes more photons to lie beyond 4′ and 16′. This also causes the crossing of the scales around 2′. As expected, the flux threshold decreases with increasing net exposure time.

Figure 5 shows the total count limit of clusters on the two considered angular scales as a function of redshift. Toward low redshift, increasingly larger statistics are required to detect a cluster because the angular extent increases. This emphasizes the challenge for eROSITA to securely detect very nearby extended sources.

We used the Python packages COLOSSUS (Diemer 2018) and Astropy (Astropy Collaboration et al. 2013) to calculate the differential number of galaxy clusters per square degree at a given redshift by integrating the cluster mass function ($dn/dM$, Tinker et al. 2008) in units of Mpc$^{-3}$ multiplied by the differential comoving volume ($dV/dz$) in units of Mpc$^{3}$/deg$^{2}$ over mass,

$$\frac{dN}{dz} = \frac{dV}{dz} \int_{M_{\rm lim}(z)}^{M_{\rm max}} \frac{dn}{dM}\,dM. \qquad (5)$$

The lower integration limit, $M_{\rm lim}(z)$, corresponds to the cluster mass limit at the corresponding redshift, and we set the upper limit, $M_{\rm max}$, to $10^{16}\,M_\odot$, above which the contribution of the mass function to the integral is negligible. Figure 6 shows the differential number of galaxy clusters per square degree as a function of redshift for the three final eROSITA survey fields.
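The number-count integral described above can be illustrated with a self-contained sketch. It does not call COLOSSUS or Astropy: the comoving volume element is computed directly for an assumed flat ΛCDM cosmology (H0 = 70 km/s/Mpc, Ω_m = 0.3), and the mass function is passed in as a callable so that a toy power law can stand in for the Tinker et al. (2008) form. All function names and the toy normalization are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def efunc(z, omega_m=0.3):
    """Dimensionless Hubble rate E(z) for flat LCDM (assumed parameters)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def comoving_distance(z, h0=70.0, omega_m=0.3, steps=2000):
    """Line-of-sight comoving distance in Mpc (trapezoidal rule)."""
    dz = z / steps
    total = 0.5 * (1.0 / efunc(0.0, omega_m) + 1.0 / efunc(z, omega_m))
    total += sum(1.0 / efunc(i * dz, omega_m) for i in range(1, steps))
    return C_KM_S / h0 * total * dz


def dv_dz_per_deg2(z, h0=70.0, omega_m=0.3):
    """Differential comoving volume in Mpc^3 per square degree per unit redshift."""
    d_c = comoving_distance(z, h0, omega_m)
    sr_per_deg2 = (math.pi / 180.0) ** 2
    return C_KM_S / h0 * d_c ** 2 / efunc(z, omega_m) * sr_per_deg2


def number_density_above(dn_dm, m_lim, m_max=1.0e16, steps=400):
    """Integrate dn/dM from m_lim to m_max on a logarithmic mass grid."""
    ln_lo, ln_hi = math.log(m_lim), math.log(m_max)
    dln = (ln_hi - ln_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        m = math.exp(ln_lo + i * dln)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * m * dn_dm(m) * dln  # dM = M dlnM
    return total


def dn_dz(z, dn_dm, m_lim):
    """Clusters per square degree per unit redshift above the mass limit m_lim(z)."""
    density = number_density_above(lambda m: dn_dm(m, z), m_lim(z))
    return dv_dz_per_deg2(z) * density


# Toy stand-in for the mass function (NOT Tinker et al. 2008), in Mpc^-3 Msun^-1
toy_mass_function = lambda m, z: 1.0e9 * m ** -2.0
example = dn_dz(0.3, toy_mass_function, lambda z: 1.0e14)
```

Integrating in ln M rather than M keeps the grid well sampled over the two or more decades in mass between the limit and $10^{16}\,M_\odot$.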
We computed the total number of clusters detected by eROSITA in a given survey area $A_{\rm s}$ by integrating the differential number counts over redshift up to an upper limit $z_{\rm max}$ and multiplying the result by $A_{\rm s}$. For the performance verification (PV) phase of eROSITA, a program is planned to reach the average equatorial depth of the final survey on a smaller patch of the sky, the eROSITA Final Equatorial-Depth Survey (eFEDS). This will demonstrate the survey capabilities of eROSITA and will allow us to calibrate the scaling relations of galaxy clusters. When we assume an upper redshift limit of $z_{\rm max} = 2$, 3 ks net exposure, and a survey area of 180 square degrees, the analytical expectation is to detect approximately 625 clusters using the proposed detection scheme. An in-depth cosmological forecast for galaxy cluster observations with eROSITA is left to a future study.
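As a rough illustration of this last step, the sketch below integrates a differential number-count curve over redshift and scales it by the survey area. The lower redshift limit of zero, the function name, and the toy curve are assumptions; the real curve would come from the mass-function integral. For orientation, 625 clusters over 180 square degrees correspond to roughly 3.5 clusters per square degree.

```python
import math


def total_clusters(dn_dz, area_deg2, z_max=2.0, z_min=0.0, steps=400):
    """Survey area times the redshift integral of dN/dz (trapezoidal rule)."""
    dz = (z_max - z_min) / steps
    total = 0.5 * (dn_dz(z_min) + dn_dz(z_max))
    total += sum(dn_dz(z_min + i * dz) for i in range(1, steps))
    return area_deg2 * total * dz


# Toy dN/dz in clusters per square degree per unit redshift (illustrative only)
toy_curve = lambda z: z * math.exp(-z / 0.3)
toy_total = total_clusters(toy_curve, area_deg2=180.0)

# Mean surface density implied by the quoted eFEDS expectation
mean_density = 625.0 / 180.0
```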

Simulated field
This section demonstrates the performance of the source detection based on wavelet decomposition and characterization on a simulated equatorial eROSITA survey field. It serves as an illustration of the method; the final adjustments and fine-tuning of the pipeline need to be made on real eROSITA data. We simulated the field as described in Sect. 2 and processed the output of the simulator using a preliminary version of the eSASS package (User release of 2018 April 20).

Selection function of extended sources
The determination of X-ray survey extended source catalogs and the corresponding selection functions is a trade-off between completeness and purity. The completeness describes the fraction of clusters that are detected as a function of mass and redshift. Determining it requires an accurate galaxy cluster model because the extended source detection probability depends on the cluster shape. The purity characterizes the contamination of the final sample and requires realistic synthetic simulations. Contamination occurs as a result of point sources that are misclassified as extended or detections that cannot be associated with any input source within a given search radius (spurious detections).

We simulated clusters on a predefined spatial grid with a source density such that the emission from neighboring sources did not overlap. This prevented source confusion. The source detection is primarily on the 1′-4′ scale, and we cross-matched extended sources within this typical detection scale of 4′ to the input catalogs. This radius is much smaller than the grid size and slightly larger than the maximum simulated core radius of 3.3′. We show the maximally clean (i.e., 7σ threshold) extended source detection efficiency in the final equatorial survey field as a function of core radius and input flux in Fig. 7. Similar to the wavelet decomposition techniques of Vikhlinin et al. (1998) and Burenin et al. (2007), our method requires larger photon statistics on compact sources to reduce point-source contamination. The deficiency of detecting compact objects is the topic of a future study, which relates the angular size to physical scales of the galaxy cluster. A study of the trade-off between cool-core bias and detection efficiency is also deferred to a future work.

The question of how to clean the PED scales is still open. One possibility is to perform a blind analysis by feeding the maximally sensitive source candidate list into the eSASS ML fitting routine.
For each candidate, a set of source parameters (position, count rate, and extent) was determined by fitting a PSF-convolved β-model to the spatial distribution of the source counts. The final extended source catalog was compiled by exploring the output parameter space (detection likelihood, extent parameter, and extent likelihood) and by determining appropriate classification thresholds, for instance, to distinguish point-like and extended sources or to reduce contamination. This resembles the approach used in Clerc et al. (2018) to characterize extended sources that are detected by a sliding-cell algorithm, which scans the X-ray image with a sliding square box of different sizes and weights the counts in the detection box with a β-model kernel. This method is a modified version of the sliding-cell and ML fitting approach adapted for the XMM-Newton Science Analysis Software. Valtchanov et al. (2001) compared the performances of several source detection algorithms and found serious drawbacks of this method for the analysis of extended sources because a relatively large number of spurious detections are made and extended sources are split.

The sliding-cell method has a high detection rate of sources with small angular extent (≲ 60″) at the cost of higher contamination in the ML characterization. When we assume that the detection comes from similar angular scales, the region with most of the misclassified AGNs is excluded when we apply our maximally clean threshold of 7σ (extension likelihood of approximately 50) and the extent cut of 60″ to Fig. 9 of Clerc et al. (2018). Our detection algorithm naturally excludes this highly contaminated region and does not require tuning of extended-source parameters as in the classical wavelet or sliding-cell approach. Thus, both detection methods can be used complementarily or individually to determine discrepancies in the recovered cosmological parameters.
Above 60″, the detection probability stays roughly constant for clusters with larger core radius and does not decrease for clusters up to 200″ because the cluster fluxes are spread over a larger area. Thus, our detection algorithm outperforms the sliding-cell plus ML characterization routine (see Fig. 7) for large extended sources above approximately 80″. In Fig. 7 we also show the 90% completeness level of the 5σ detection threshold. This threshold corresponds to a similar number of detected clusters per square degree between the sliding-cell plus ML characterization algorithm and our method (see below). At the expense of purity, the sliding-cell method is more sensitive for extended sources with core radii smaller than approximately 40″, which correspond to clusters with $r_{500}$ values below 2′. For the eROSITA survey, this gain in sensitivity is a minor effect because the flux of these objects is expected to be close to zero. Our proposed scheme shows an improvement in detection for flat sources, which are considered as background in other techniques. A more realistic treatment of cluster shapes requires a library of real cluster images, also to properly scale the cool-core emission. This is left for a future study.

The classical wavelet approach for eROSITA source detection is under development, and we can only compare to the existing study based on the sliding-cell algorithm. The main difference is a change in input list because it also requires an ML characterization. Similar to the description in Sect. 5, we used the input temperature to convert the input flux into a luminosity and also used the XXL scaling relations to calculate the galaxy cluster mass, $M_{500,\rm ML}$. The extended source detection efficiency as a function of mass and redshift is shown in Fig. 8. The increasing apparent size toward low redshift causes a drop in the detection efficiency.
We folded the 5σ and 7σ selection on the 1′-4′ scale (Figs. 7 and 8), as well as the sliding-cell selection (Clerc et al. 2018, Appendix A), into the calculation of the differential number of clusters per square degree by multiplying the mass function in Eq. 5 with the probability of detecting a cluster of the given mass, that is, the selection function θ(M).

In practice, we analytically parameterized the selection as a function of core radius and flux. The overall functional form of the detection efficiency is described by an error function, which was scaled to range between zero and one. The overall shape of the error function is defined by its argument. Compared to Clerc et al. (2018), we required a more complex functional form of the argument because it needs to describe a change in slope for different core radii in addition to an offset in flux for different core radius values. The goal is to find a functional form that is as simple as possible but still accounts for these observed features. The functional form of the argument is found by iteratively adding more complexity to it until the detection efficiency is described well. Then, the free parameters are optimized using a Markov chain Monte Carlo posterior sampling technique. The functional form therefore has no physical motivation. To improve the iterative search for the functional form, we reduced the dynamical range of the core radius and flux by taking the logarithm and subtracted the corresponding means to rescale the offsets. The parameters a, b, c, and d of the resulting argument depend on the detection threshold. We show the models and their parameters in Fig. 9 and Table 1 for the 5σ and 7σ thresholds. These simple models cannot capture the full complexity of the selection, but they provide a good estimate of the detection efficiency.

The impact of the different selection functions on the differential number counts is shown in Fig. 10. The expected number of galaxy clusters per square degree for the Clerc et al. (2018) and the 5σ selection is approximately 4.2. At the cost of reduced purity, high-redshift clusters are detected more efficiently by the sliding-cell algorithm plus ML fitting technique, while the method based on wavelet decomposition performs much better in detecting the local population, in particular galaxy groups. The 7σ selection reduces the contamination by more than two orders of magnitude (see Sect. 6.2), but the number of detected clusters per square degree, approximately 1.7, is more than halved.

We require better knowledge of how the background behaves in reality to securely forecast the detection of very extended low-redshift objects for which the core radius limits are larger than 200″. The uncertainty on small scales is dominated by the unknown shape of the survey PSF. The eROSITA PSF does not vary much over the eROSITA field of view compared to other X-ray instruments like Chandra and is, to first approximation, constant in survey mode. An interesting planned implementation for our proposed method is therefore subtracting point sources using a precise PSF model in the ML fitting routine.
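To make the parameterization more concrete, the sketch below shows one way an error-function selection model in centered logarithmic flux and core radius could look. The bilinear argument used here is purely hypothetical and is not the functional form fitted in the paper: the `c` term shifts the flux offset with core radius and the `d` term changes the slope with core radius, which are the two features the text asks the argument to describe. Parameter values and reference scales are illustrative.

```python
import math


def detection_probability(flux, r_c, params, flux_ref, rc_ref):
    """Error-function selection model scaled to the range [0, 1].

    flux and r_c are compared in log space to reference values (stand-ins
    for the subtracted means). The argument is a hypothetical bilinear form.
    """
    a, b, c, d = params
    x = math.log10(flux / flux_ref)  # centered log flux
    y = math.log10(r_c / rc_ref)     # centered log core radius
    argument = a + b * x + c * y + d * x * y
    return 0.5 * (1.0 + math.erf(argument))


# Illustrative parameters (not the fitted values from the paper)
params = (0.0, 2.0, 1.0, 0.5)
p_faint = detection_probability(1.0e-14, 60.0, params, 1.0e-13, 60.0)
p_bright = detection_probability(1.0e-12, 60.0, params, 1.0e-13, 60.0)
```

Folding such a model into the number counts amounts to weighting the mass function by the detection probability evaluated at the flux and core radius implied by each mass and redshift.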

Selection
We address the question of how well the detection through cluster outskirts resembles a favored step-function-like selection. Figure 11 shows the detection efficiency on two angular scales and different core radius bins as a function of predicted model counts, which are independent of PSF effects. For a given number of predicted counts, clusters with larger extent are detected more efficiently. In other words, even with a larger number of predicted counts, clusters with smaller extent are harder to detect. The interesting finding that gradually increasing counts are required toward smaller core radii is summarized in Fig. 12, which shows the predicted model counts for a given detection efficiency as a function of core radius. In addition, it shows values of the model count ratio on the 0′-1′ over the 1′-4′ and 1′-16′ radial range, respectively. This emphasizes that for a given detection efficiency, the required counts in the outskirts increase with increasing inner-to-outer count ratios. Considering an additional contribution of AGN in cluster centers, this is particularly challenging for clusters above a redshift of 0.6, where simulations indicate that the distribution of the ratio becomes broader and exhibits a significant fraction larger than two (Biffi et al. 2018).

These findings motivate the estimation of contamination due to bright sources and due to low photon statistics separately because the flux distribution of faint sources is different from that of bright sources. We studied these two effects by creating two extended source catalogs, setting the detection thresholds of cataloging to 4σ and 7σ for the maximally sensitive and maximally clean selection, respectively. The number of false detections as a function of detection threshold is shown in Fig. 13. We obtain close to 1.1 and 0.008 spurious or misclassified extended sources per square degree in equatorial fields for the 4σ and 7σ detection thresholds, respectively.
Detection thresholds greater than 7σ show zero contamination but also a lower detection efficiency in regimes of low photon statistics. The extended source detection efficiency as a function of detection threshold is exemplified in Fig. 14, which shows its dependence on mass and redshift.
In several cases, the efficiency for the 2σ threshold drops because the algorithm keeps so much structure that the extracted sources cannot be associated with the correct input within the given matching radius.
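The association of detections with input sources within a matching radius, and the resulting count of spurious detections, can be sketched as follows. This is a minimal nearest-neighbor illustration using the 4′ radius quoted in the text. The function names and the brute-force search are assumptions, not the matching code used for the paper.

```python
import math


def angular_separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcmin (haversine formula), inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(a))) * 60.0


def cross_match(detections, inputs, radius_arcmin=4.0):
    """Match each detection to its nearest input source within the radius.

    Returns a dict {detection index: input index} and a list of detection
    indices without any counterpart (spurious detections).
    """
    matches, spurious = {}, []
    for i, (ra, dec) in enumerate(detections):
        best, best_sep = None, radius_arcmin
        for j, (ra_in, dec_in) in enumerate(inputs):
            sep = angular_separation_arcmin(ra, dec, ra_in, dec_in)
            if sep <= best_sep:
                best, best_sep = j, sep
        if best is None:
            spurious.append(i)
        else:
            matches[i] = best
    return matches, spurious


# Two toy detections: one 3 arcmin from an input cluster, one far from any input
inputs = [(10.05, 0.0), (20.0, 5.0)]
detections = [(10.0, 0.0), (11.0, 0.0)]
matches, spurious = cross_match(detections, inputs)
```

Dividing the number of unmatched detections by the simulated area gives a spurious-source density per square degree that can be tracked as a function of detection threshold.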

Summary and conclusions
Large-area X-ray cluster surveys are powerful tools for deriving cosmological parameters when the selection effects are well understood. We proposed and characterized an algorithm based on a wavelet decomposition to detect extended sources for the upcoming eROSITA mission. This technique produces well-defined cluster catalogs with simple selection functions. We detect clusters by their large-scale emission, which minimizes the predominant impact of excess cool-core emission. Our main result is that progressively more counts are required with decreasing cluster extent to achieve a specific detection efficiency. In addition, our analytical calculation shows that toward low redshift, that is, toward larger angular extent, an increasing number of total counts is required to detect clusters as extended sources. These two findings disagree with the assumption that a fixed minimum number of total photons is necessary to identify clusters (e.g., Pillepich et al. 2012; Borm et al. 2014).

We predict redshift-dependent cluster observables and mass limits for an equatorial, intermediate, and deep final eROSITA survey field by assuming a minimum number of 40 and 80 counts to identify a cluster on a 1′-4′ and 1′-16′ angular scale, respectively. The counts in the cluster outskirts define an easy-to-measure observable, and applying a minimum photon threshold provides a selection that approximately resembles a step function. We tested the performance of our detection scheme through Monte Carlo simulations of a final equatorial eROSITA survey field of approximately 1 ks net exposure time. Our maximally clean detection method requires larger photon statistics on objects with core radii smaller than 60″ to minimize point-source contamination and has an approximately 90% detection efficiency at input fluxes of $10^{-13}$ erg s$^{-1}$ cm$^{-2}$ for clusters with larger extent.
This is complementary to the sliding-cell algorithm plus ML fitting technique that is currently implemented as default in eSASS, which shows a drop in detection efficiency at this flux for clusters with core radii larger than 60″ (Clerc et al. 2018). We note that this blind analysis approach increases the contamination of the final catalog by misclassified AGNs and spurious extended sources. At a similar level of completeness, our catalogs are approximately 2.5 times purer than the current eSASS default. Our performance results are limited because we worked with preflight assumptions of instrumental and astrophysical characteristics. The proposed pipeline has the advantage that the final tuning, that is, the point-source model training due to a different in-orbit PSF or the optimized selection of the detection thresholds, is easy to implement, robust, and can be achieved quickly during the PV phase. An in-flight calibration of the pipeline to better than 5% is expected to keep the loss of clusters through central AGN contributions below 1%.

Fig. 11. Detection efficiency as a function of predicted model counts on the 1′-4′ (upper panel) and 1′-16′ (lower panel) radial scale for four core radius bins ($r_{\rm c} ≤ 30″$, $30″ < r_{\rm c} ≤ 60″$, $60″ < r_{\rm c} ≤ 100″$, and $r_{\rm c} > 100″$). The dotted vertical lines correspond to 40 and 80 aperture counts, respectively.