Synthetic simulations of the extragalactic sky seen by eROSITA. I. Pre-launch selection functions from Monte-Carlo simulations

Studies of galaxy clusters provide stringent constraints on models of structure formation. Provided that selection effects are under control, large X-ray surveys are well suited to derive cosmological parameters, in particular those governing the dark energy equation of state. We forecast the capabilities of the all-sky eROSITA (the extended ROentgen Survey with an Imaging Telescope Array) survey to be achieved by the early 2020s. We bring special attention to modeling the entire chain from photon emission to source detection and cataloguing. The selection function of galaxy clusters for the upcoming eROSITA mission is investigated by means of extensive and dedicated Monte-Carlo simulations. Employing a combination of accurate instrument characterization and of state-of-the-art source detection technique, we determine a cluster detection efficiency based on the cluster fluxes and sizes. Using this eROSITA cluster selection function, we find that eROSITA will detect a total of $\sim 10^5$ clusters in the extra-galactic sky. This number of clusters will allow eROSITA to put stringent constraints on cosmological models. We show that incomplete assumptions on selection effects, such as neglecting the distribution of cluster sizes, induce a bias in the derived value of cosmological parameters. Synthetic simulations of the eROSITA sky capture the essential characteristics impacting the next-generation galaxy cluster surveys and they highlight parameters requiring tight monitoring in order to avoid biases in cosmological analyses.


Introduction
Clusters of galaxies are the most massive matter halos. They formed last in the history of the Universe by a hierarchical growth of structures in the Hubble expansion flow. Their presence, observed space density, and mass distributions confirm the standard cosmological model (e.g. Hasselfield et al. 2013; Mantz et al. 2014; Planck Collaboration XXIV 2016; de Haan et al. 2016), making galaxy clusters powerful probes of cosmological parameters, such as the dark energy content and its equation of state (e.g. Vikhlinin et al. 2009); see also Allen et al. (2011) for a review. The identification and study of the different components of galaxy clusters (dark matter halo, intracluster medium, galaxies, and relativistic particles) require the use of several different observational techniques. Among such techniques, X-ray observations stand out, since clusters of galaxies are the most luminous extended sources in the extra-galactic X-ray sky, and therefore are easily detectable in large surveys. The importance of galaxy clusters in a cosmological context has been realized since the pioneering surveys undertaken with the Einstein observatory (e.g. Forman & Jones 1982; Gioia et al. 1990), followed by studies with the ROSAT all-sky survey (e.g. Ebeling et al. 2000; Borgani et al. 2001; Böhringer et al. 2004, 2017; Henry et al. 2009; Klein et al. 2018). By simply counting the number of observed galaxy clusters one can confront cosmological model predictions and survey observations. However, it has been established that observational selection effects play a crucial role and must be controlled accurately when pursuing the goal of precision cosmology (e.g. Vikhlinin et al. 2009; Mantz et al. 2010b; Allen et al. 2011; Pacaud et al. 2016).
X-ray astronomy will enter a new era with the extended ROentgen Survey with an Imaging Telescope Array (eROSITA, Predehl 2017). This telescope is the primary instrument of the Russian/German Spektrum-Roentgen-Gamma (SRG) observatory, expected to be launched in 2019 (P. Predehl, priv. comm.). eROSITA will possess unprecedented sensitivity and imaging capabilities for extended source emission (Merloni et al. 2012), and allow the detection of approximately $10^5$ galaxy clusters (Pillepich et al. 2012). In order to detect this huge number of galaxy clusters, eROSITA will scan the entire sky for four years, making it the second imaging X-ray all-sky survey ever made after ROSAT in the soft band (0.5−2 keV), and the first ever imaging survey in the hard band (2−8 keV). The promising capabilities of eROSITA bring great expectations to constrain dark matter and dark energy models through galaxy cluster science.
The derivation of a selection function for extended X-ray sources involves first their detection and then their classification as extended objects. Because extended objects are defined in contrast to point-like sources, this paper also addresses the simulation and selection of point-like sources in the eROSITA All-Sky Survey (eRASS).
A reliable detection probability function of point-sources is crucial for assessing the completeness of samples, understanding the X-ray background, evaluating clustering studies, and so on. Given the simple morphology of point-sources, detection probabilities may rely on knowledge of the local exposure time and background levels in a given observation (e.g. Georgakakis et al. 2008). An alternative and common approach consists in simulating mock observations accounting for a range of instrumental and astrophysical effects. Although this method is more computationally demanding, it embraces the entire chain from light emission to source detection and cataloguing, and this is the approach adopted in this work.
As mentioned previously, a selection function for extended sources is a critical ingredient in almost all studies of the X-ray galaxy cluster population, including cosmological studies, scaling relation works (Stanek et al. 2006; Pacaud et al. 2007; Mantz et al. 2010a; Giodini et al. 2013; Lovisari et al. 2015; Andreon et al. 2016), and detailed studies of the evolution of the intra-cluster medium physics and chemistry (see Böhringer & Werner 2010, for a review). The morphological complexity and diversity of the X-ray cluster population makes it more difficult to accurately describe selection effects. Comparison between samples detected at different wavelengths (e.g. Wen et al. 2012; Rozo et al. 2014; Sadibekova et al. 2014; Nurgaliev et al. 2017) allows an understanding of potential selection biases, but does not a priori provide a truth table for source detection. Therefore, Monte-Carlo simulations play an essential role in understanding the entire process leading to a validated galaxy cluster catalog. Reducing the diversity of cluster shapes to a sensible and reduced set of parameters sets limits on the computational demand, and, importantly, allows for a link between theoretical (e.g. mass, redshift, etc.) and observational quantities. Cluster fluxes and apparent sizes are among the most relevant of these observables (Böhringer et al. 2000; Pacaud et al. 2006; Burenin et al. 2007).
Such synthetic simulations are not the only route to address the selection function of clusters and active galactic nuclei (AGNs). Numerical N-body and hydrodynamic simulations play an increasingly important role in this debate. Indeed, as they become more and more realistic in reproducing the observed sky at multiple wavelengths (e.g. Ragagnin et al. 2016), they offer invaluable support in the understanding of selection biases. However, their still large computational requirements limit their usage for statistical studies.
The aim of this work is to forecast and illustrate realistic selection functions for the eRASS cluster and point-source population. It relies on multiple realisations of selected areas of the eROSITA sky, with X-ray-emitting sources described by controlled parametric inputs. For instance, the galaxy cluster population is uniquely described by its apparent flux and size on the sky. We make a special effort to reproduce the main spectrophotometric features of the extragalactic point-source population (AGNs). For the first time, we process eRASS simulation fields with the eROSITA source-detection software (preliminary version). We derive realistic detection lists, similar to the real detection lists expected for scientific use. In particular, we explore thresholds needed to distinguish between spurious, point-like, and extended sources and provide, given a chosen set of cuts, a first series of selection functions for point-like and extended sources. We demonstrate their practical usability with a prediction of the distribution of galaxy clusters in the eROSITA sky by means of a forward-modelling approach.
This paper is constructed as follows. We first describe the built-in components of the simulations in Sect. 2, and then we describe the simulation engine at the core of the analysis in Sect. 3. We describe our selected simulation and instrumental setup, as well as our choice of fields in Sect. 4. In Sect. 5 we show the source detection results, in particular the selection functions. We discuss the impact of our important assumptions in Sect. 6 and bring perspectives in Sect. 7.

Simulated components
This section presents the main expected components in a typical blank field of our simulations of the extragalactic eROSITA sky.

AGN and cosmic X-ray background
We attempt to accurately reproduce the observed distribution of spectro-photometric properties of X-ray-emitting AGNs. A list of spectra and positions, each corresponding to an individual source, is produced down to extremely low fluxes. The integration of the low-flux tail of the distribution provides a model for the unresolved X-ray background component up to the limit at which we simulate sources individually.

Spectral models
We rely on a custom implementation of the formalism by Gilli et al. (2007) to generate spectral models on a log-spaced grid of energies in the range [0.1, 100] keV using XSPEC v12.7.0u (Arnaud 1996). Parameters governing the spectral shape of a source are a power-law photon index, $\Gamma$, the absorbing column density, $N_{\rm H}$, the source redshift, $z$, and the (unabsorbed) luminosity, $L_{\rm X}$, of the object in a given rest-frame 2−10 keV band. A critical parameter governing the choice of spectral model is the intrinsic absorption $N_{\rm H}$. We call unobscured those sources with $\log_{10} N_{\rm H} < 21$, Compton-thin those showing $21 < \log_{10} N_{\rm H} < 24$, Compton-thick mild those with $24 < \log_{10} N_{\rm H} < 25$, and Compton-thick heavy those that have $\log_{10} N_{\rm H} > 25$. For a given obscuration class, two regimes are considered, Seyfert or QSO, depending on whether the 0.5−2 keV rest-frame luminosity of the source is lower or greater than $L_{\rm X} = 10^{46}$ erg s$^{-1}$. We refer to Gilli et al. (2007) for details on the modelling of spectral energy distribution (SED) for each of these classes. The energy range and level of detail in the SED were chosen to match the expected detector performances of eROSITA. Depending on source class, they include a (cutoff) power-law with index $\Gamma$, and a 6.4 keV iron line with various equivalent widths (Gilli et al. 1999), possibly modulated by a reflection component. Compton-thick mild sources have their cut-off power-law replaced by a more complex plcabs model (Yaqoob et al. 1997). The source is redshifted before applying an additional absorption by the Galaxy ($N_{\rm H}^{\rm gal}$) depending on the location of the source on the sky. Finally, the flux of a source is obtained by integration of its SED, accounting for the luminosity distance computed in our reference cosmology.

Fig. 1. Two-dimensional histogram distribution of simulated sources in one realisation of our X-ray AGN luminosity function sampling for a 22.7 deg$^2$ area on the sky (253,297 sources in total). Each black contour encloses the fraction of sources indicated as a label. To each source belongs one X-ray spectral model uniquely defined by the source luminosity, redshift, power-law index $\Gamma$ and absorbing column density $N_{\rm H}$ (Sect. 2.1).
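As an illustration of the obscuration classes defined in this section, the short Python sketch below maps a value of $\log_{10} N_{\rm H}$ to its class name. The function name is hypothetical, and the treatment of values falling exactly on a class boundary is an assumption, since the text only quotes strict inequalities.

```python
def obscuration_class(log_nh: float) -> str:
    """Return the obscuration class for a given log10(N_H / cm^-2).

    Class boundaries (21, 24, 25) follow the text; assigning a boundary
    value to the more obscured class is an assumption of this sketch.
    """
    if log_nh < 21:
        return "unobscured"
    if log_nh < 24:
        return "Compton-thin"
    if log_nh < 25:
        return "Compton-thick mild"
    return "Compton-thick heavy"


if __name__ == "__main__":
    for value in (20.3, 22.5, 24.4, 25.7):
        print(value, obscuration_class(value))
```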

Sampling the luminosity functions
Similarly to Gilli et al. (2007) we describe the luminosity function of unobscured AGN sources with the luminosity-dependent density evolution (LDDE) model of Hasinger et al. (2005). Obscured sources are sampled from the LDDE modulated by a multiplicative factor, ranging from four to one as the source intrinsic luminosity increases. Obscuration values are distributed following the prescription by Gilli et al. (2007), while power-law index parameters are drawn from a normal distribution of mean $\Gamma = 1.9$ and spread 0.2 regardless of the source obscuration level. Source luminosities range from $10^{42}$ erg s$^{-1}$ and redshifts span the $0 < z < 5$ interval. After accounting for the cosmological volume, we compute the sky density $n(\Gamma, N_{\rm H}, z, L_{\rm X})$ (units deg$^{-2}$) and random-sample this distribution in order to obtain a discrete list of sources. Figure 1 represents the density of one such source list in the luminosity-redshift plane. Each source is then assigned an SED as described in the previous section. Sky positions are uniformly distributed in a field, as we do not aim to accurately model the spatial distribution of sources in this work (see Paper II, Ramos-Ceja et al. for a more detailed treatment).
We verified the validity of our sampling procedure by computing the flux distributions of the simulated sources in different bands. We compared our results to Gilli et al. (2007) and to published $\log N - \log S$ relations: the agreement in the soft band is excellent (see Fig. 2), while we predict twice as many heavily obscured sources ($\log_{10} N_{\rm H} > 24$) in the 2−10 keV band in comparison to Gilli et al. (2007). We attribute this discrepancy for the rarest sources to our choices made in the high-energy modeling of the SED. This has practically no impact on this work, which focuses on the soft-band characteristics of the eROSITA images.
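The random sampling of a sky density into a discrete source list can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the density grid holds toy values, the Poisson draw per grid cell is an assumption about how the sampling is performed, and all function names are hypothetical. Only the normal distribution of photon indices (mean 1.9, spread 0.2) is taken from the text.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate with Knuth's algorithm (fine for small means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_sources(density, area_deg2, rng):
    """Turn a gridded sky density into a discrete source list.

    density: dict mapping (z, log_lx) cell centres to sources per deg^2
             (toy values in this sketch).
    """
    sources = []
    for (z, log_lx), n_per_deg2 in density.items():
        n_src = poisson(rng, n_per_deg2 * area_deg2)  # assumed Poisson scatter
        for _ in range(n_src):
            gamma = rng.gauss(1.9, 0.2)  # photon index, as quoted in the text
            sources.append({"z": z, "log_lx": log_lx, "gamma": gamma})
    return sources


if __name__ == "__main__":
    toy_density = {(0.5, 43.0): 1.5, (1.0, 44.0): 2.0}
    catalogue = sample_sources(toy_density, 3.6 * 3.6, random.Random(1))
    print(len(catalogue), "sources drawn")
```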

Constructing the unresolved X-ray background
The above-described procedure does not assume a lower limit on the flux of simulated sources. Sources well below the eROSITA detection limit are actually not simulated in order to save computation resources. A flux threshold $f_{\rm lim}$ is set depending on the exposure time of a simulated field (Sect. 4.3) and only sources with $f > f_{\rm lim}$ are individually simulated. The spectra of the remaining faint sources are stacked together and uniformly redistributed over a simulated patch of sky, thereby constituting one single "uniformly extended source" instead of many point-sources. By doing so, we ensure self-consistent and realistic modelling of the spectral emission of the X-ray background (XRB) generated by unresolved AGNs. As an illustration, the spectrum of the AGN background component in the equatorial field ($f_{\rm lim} = 3 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$ in the 0.5−2 keV band) with galactic absorption $N_{\rm H}^{\rm gal} = 3 \times 10^{20}$ cm$^{-2}$ is shown with a dashed line in Fig. 3. This figure also demonstrates the good agreement between the measurements of Lumb et al. (2002; derived from XMM-Newton observations with sources excised down to $\sim 10^{-14}$ erg s$^{-1}$ cm$^{-2}$ in the soft band) and our unresolved XRB model with a similar $f_{\rm lim}$.
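The split between individually simulated sources and the stacked background component can be sketched as below. The data layout (a flux paired with a binned spectrum) and the function name are assumptions made for illustration; the only element taken from the text is the rule that sources with $f > f_{\rm lim}$ are kept individually while the spectra of fainter ones are summed.

```python
def split_sources(sources, f_lim):
    """Separate bright sources from the faint ones forming the background.

    sources: list of (flux, spectrum) pairs, where spectrum is a list of
             values on a common energy grid (hypothetical layout).
    Returns the list of sources kept individually and the summed spectrum
    of all sources at or below f_lim.
    """
    resolved = [s for s in sources if s[0] > f_lim]
    faint = [s for s in sources if s[0] <= f_lim]
    n_bins = len(sources[0][1]) if sources else 0
    stacked = [sum(spec[i] for _, spec in faint) for i in range(n_bins)]
    return resolved, stacked


if __name__ == "__main__":
    toy = [(1e-15, [1.0, 2.0]), (2e-15, [0.5, 0.5]), (1e-14, [3.0, 1.0])]
    bright, background = split_sources(toy, f_lim=3e-15)
    print(len(bright), "resolved source(s); stacked spectrum:", background)
```

The stacked spectrum would then be spread uniformly over the simulated field, that is, divided by the field area to obtain a surface brightness.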

Extended sources as β-models
Galaxy clusters are simulated in the simplest way using spherically symmetric β-models (Cavaliere & Fusco-Femiano 1978) with different fluxes and core radii values and $\beta = 2/3$. Our goal is indeed to derive selection functions that depend on a limited number of parameters. Sources representing galaxy clusters are randomly distributed across a simulated field, with a density of around 2 per deg$^2$. Their spectral emission is rendered by an isothermal APEC model with 0.3 $Z_\odot$ abundance, at temperature $T \in \{1, 5\}$ keV and redshift $z \in \{0.3, 0.8\}$. Clusters have 0.5−2 keV fluxes chosen among discrete values ranging between $2 \times 10^{-15}$ and $5 \times 10^{-13}$ erg s$^{-1}$ cm$^{-2}$; core radii are also picked among discrete values ranging between 10 and 80 arcsec. The redshift and temperature of the spectral models have practically no impact on the 0.5−2 keV energy conversion factors transforming fluxes into count-rates, and therefore have no impact on the 0.5−2 keV detection tests, which are the core of this study.
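For reference, the projected β-model surface brightness scales as $[1 + (r/r_c)^2]^{-3\beta + 1/2}$, which for $\beta = 2/3$ gives an exponent of $-3/2$ and a closed-form enclosed flux fraction $1 - [1 + (R/r_c)^2]^{-1/2}$. The sketch below, with hypothetical function names, evaluates these expressions and draws photon radii by inverting the enclosed fraction; it illustrates the profile only and is not the simulator's own sampling scheme.

```python
import math
import random


def beta_profile(r, r_c, beta=2.0 / 3.0):
    """Projected beta-model surface brightness, normalised to 1 at r = 0."""
    return (1.0 + (r / r_c) ** 2) ** (-3.0 * beta + 0.5)


def enclosed_fraction(r, r_c):
    """Fraction of the total flux within projected radius r (beta = 2/3 only)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + (r / r_c) ** 2)


def sample_radius(rng, r_c):
    """Draw one photon radius by inverting enclosed_fraction (beta = 2/3)."""
    u = rng.random()  # uniform in [0, 1)
    return r_c * math.sqrt(1.0 / (1.0 - u) ** 2 - 1.0)


if __name__ == "__main__":
    r_c = 20.0  # core radius in arcsec, within the 10-80 arcsec range of the text
    print("half-flux radius / r_c:", math.sqrt(3.0))
    print("enclosed fraction at 3 r_c:", enclosed_fraction(3 * r_c, r_c))
```

Half of the flux lies within $\sqrt{3}\, r_c$, which shows how slowly the $\beta = 2/3$ profile converges and why large core radii yield low surface brightness at fixed flux.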

Particle and galactic background components
In addition to the X-ray background originating from unresolved AGNs in the field, two other main background components were added to our set of simulations. The contribution of unresolved galaxy clusters and groups to the eROSITA soft-band background is neglected, since it is a small component in the energy and sensitivity regimes relevant to this study (e.g. Gilli et al. 1999, 2007; Kolodzig et al. 2017).
Following Lumb et al. (2002), the emission of the Galaxy is modelled with a double MEKAL model of temperatures 0.21 and 0.081 keV and solar abundance, representing the emission of the hot plasma located in the Galactic disk and halo. We assume a local photoabsorbing column density equal to that of the field under consideration. We neglect here any spatially dependent contribution to the Galactic background such as emission from the Hot Local Bubble.
Particle background is sampled from a list of events drawn from a GEANT4 simulation designed to reproduce the expected radiation environment at the Lagrange point L2 (Tenzer et al. 2010). We assume this background component is not focused by the telescope mirror systems, and therefore it is not vignetted and impacts the detectors uniformly. Soft proton flares can create rapid enhancement of the level of unvignetted background. However, we limit our present study to the case of nominal particle background level and defer the analysis of the flare-induced background to further work. Therefore, the exposure assumptions in this work are on the optimistic side.

The eROSITA simulation engine
The simulations presented in this paper result in realistic eROSITA-calibrated event lists, similar to those expected to be delivered by the eROSITA ground segment. Such event lists contain the arrival time and CCD coordinates of the incoming events (photons or particles), as well as a reconstruction of their sky location and absolute energy. We reconstruct these characteristics assuming perfect knowledge of the calibration and spacecraft attitude. We make use of the Monte-Carlo simulator SIXTE. This simulator implements a realistic transfer function converting sky photons into detector events, accurately accounting for CCD characteristics (including response functions and clocking) and telescope mirror behaviour. In order to save computation time, some parts of the telescope+instrument transfer function are modelled statistically, thus deviating from a pure ray-tracing simulator. These simplifications apply notably at the mirror (point-spread and vignetting functions) and the CCD (response function) stages. We refer to Schmid (2012) for a detailed description of SIXTE and its implementation in the context of eROSITA.
The detectors were simulated assuming an integration time of 50 ms and a finite readout time of the 384 CCD lines (pileup effects are not relevant in this work). Response matrices are taken from rescaled EPIC-pn response matrices; those are of sufficient accuracy here, as we are focusing on broad-band properties. The field-of-view of each of the seven detectors is circular with a diameter of 1.02 deg, corresponding to the extent of the 384 × 384 pixel cameras with pixel size 9.6 arcsec.
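The quoted field-of-view diameter follows directly from the camera geometry, taking the pixel size as 9.6 arcsec. The short check below uses hypothetical variable names and only the numbers given in the paragraph above.

```python
N_PIXELS = 384        # pixels along one side of a camera
PIXEL_ARCSEC = 9.6    # pixel size, taken as arcsec

# Side length of the camera projected on the sky, in degrees.
fov_deg = N_PIXELS * PIXEL_ARCSEC / 3600.0
# Corresponding field-of-view radius in arcmin.
fov_radius_arcmin = fov_deg * 60.0 / 2.0

if __name__ == "__main__":
    print(f"field-of-view diameter: {fov_deg:.2f} deg")
    print(f"field-of-view radius:   {fov_radius_arcmin:.1f} arcmin")
```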

Exposure maps and attitude files
A simple scanning strategy for the four-year survey is assumed in this work, with the spacecraft scanning axis always pointing towards the Sun. The actual spacecraft law will be subject to subtle changes in the scanning pattern in order to fulfill angular constraints linked to, for example, the solar panels or stray-light requirements. Those ultimately lead to less uniform all-sky exposure maps, as discussed in Merloni et al. (2012). Since the present paper focuses on small patches of sky sufficiently far away from the ecliptic poles, these differences are neglected. Extrapolation of our results to the all-sky survey needs, in principle, a proper treatment of these exposure variations. The corresponding attitude files describing the coordinates of the scanning axis in steps of 60 s serve as input to the simulator. We assumed no gaps or jumps over the full duration of the survey, as well as ideal reconstruction of the attitude from the on-board star trackers.

Point-spread function and vignetting
During the simulation procedure, photons originating from a source at infinite distance are redistributed using synthetic point-spread functions simulated with a ray-tracing procedure (P. Friedrich, priv. comm.). This accurately reproduces an eROSITA ideal mirror system made of 54 nested shells (Wolter-I configuration), including the spokes and the presence of an X-ray baffle. Such simulations were performed assuming a focal length of 1.6 m and a 0.4 mm intra-focal shift of the detector relative to the best on-axis focal point. This small shift was found to optimize the overall survey PSF size, at the cost of degrading the on-axis PSF. We note that the actual point-spread function will be measured on the sky when the instrument operates and will be compared to ray-tracing simulations and ground measurements (e.g. as done at the PANTER facility). The PSF we used is described as a tabulated series of images in steps of 1 arcmin for off-axis angles ranging from 0 to 30 arcmin and for energies $E = \{1, 2, 3, 4, 7\}$ keV (see Fig. 5). We assume constant PSF shape as a function of azimuthal angle, as we consider only axial rotation, as is usual with the Wolter-I telescope symmetry. Because it is counting photons individually, the ray-tracing simulation additionally provides an estimate of the vignetting factor on a grid of energies and off-axis angles. It is used to compute the ratio of flux between double-reflected photons and all photons emitted by a source located at a given off-axis angle, and usually expressed relative to the on-axis position. Figure 6 shows the combined effect of vignetting and PSF distortion on a bright point source passing about 50 times through the eROSITA field-of-view during the four-year scan duration.

Notes to Table 1. Each field is a square of 3.6 deg × 3.6 deg. The galactic absorption is assumed uniform with a value $N_{\rm H}^{\rm gal}$. AGNs are simulated individually down to a flux $f_{\rm lim}$ in the 0.5−2 keV band and sources below $f_{\rm lim}$ contribute to a diffuse background component. The maximal variation of exposure across a field is listed as $\Delta T_{\rm exp} = (T_{\rm max} - T_{\rm min})/T_{\rm mean}$.

Simulated fields
We selected three fields at specific locations in the eROSITA sky (see Fig. 4). A field corresponds to an elementary region of the eROSITA sky tiling pattern, and shows as a 3.6 deg × 3.6 deg square in tangential projection. In the following we name these fields: Equatorial (∼2 ks exposure time, uniform), Intermediate (∼4 ks, less uniform), and Deep (∼10 ks, larger exposure gradient). Table 1 provides key parameters relevant to these simulated fields.

[…] over the sky: any slight apparent gradient in source concentration is an effect of varying exposure times across the fields. The increase in sensitivity clearly makes more sources visible by eye; this figure also outlines the excellent angular resolution of eROSITA, well-adapted to beat confusion effects over most of the survey area, even in deep fields.

Source detection and characterisation
The source detection and characterisation procedure used in this work is a preliminary version of the source detection tool in the eROSITA Science Analysis Software System (eSASS) package. It builds upon the source detection algorithm used in the XMM-Newton Science Analysis System (XMM-SAS) with several revisions and upgrades. The detection procedure is based on the sliding-cell method. As a first step, this algorithm scans an X-ray image with a sliding square box, and if the signal-to-noise ratio (S/N) in the box is greater than a specified threshold value it is marked as a source candidate. The signal is calculated from the pixel values inside the cell, and the background is estimated from the neighbouring pixels. Subsequently, the candidate objects are removed from the image, creating a source-free image which is interpolated by a spline function to create a smooth background map. The algorithm convolves the input image with a 9 × 9 pixel (36 × 36 arcsec) kernel described by a $\beta = 2/3$ profile with $r_c = 15$ arcsec, which roughly matches the survey PSF. The convolved image and the corresponding background map are then used to calculate an S/N map, in which the significant peaks are the positions of the detected sources. In order to increase the sensitivity for large extended sources, this procedure is repeated for 2 × 2 and 4 × 4 rebinned images corresponding to kernels with $r_c = 30$ and $r_c = 60$ arcsec, respectively.
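The kernel-convolution step described above can be illustrated with the following pure-Python sketch. It is not the eSASS implementation: the function names are hypothetical, the S/N estimate (excess over background divided by the square root of the background) is an assumed Gaussian approximation, and the background map is supplied directly rather than derived from a spline fit. The kernel size, pixel scale (4 arcsec), and core radius (15 arcsec) are those quoted in the text.

```python
import math


def beta_kernel(size=9, pixel_arcsec=4.0, r_c_arcsec=15.0):
    """Build a size x size beta = 2/3 kernel, normalised to unit sum."""
    half = size // 2
    kernel = []
    for j in range(-half, half + 1):
        row = []
        for i in range(-half, half + 1):
            r2 = (i * i + j * j) * pixel_arcsec ** 2
            row.append((1.0 + r2 / r_c_arcsec ** 2) ** -1.5)
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def convolve(image, kernel):
    """Direct 2D convolution with zero padding (the kernel is symmetric)."""
    ny, nx = len(image), len(image[0])
    half = len(kernel) // 2
    out = [[0.0] * nx for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            acc = 0.0
            for j, row in enumerate(kernel):
                for i, w in enumerate(row):
                    yy, xx = y + j - half, x + i - half
                    if 0 <= yy < ny and 0 <= xx < nx:
                        acc += w * image[yy][xx]
            out[y][x] = acc
    return out


def snr_map(image, background, kernel):
    """Assumed S/N estimate: smoothed excess over sqrt(smoothed background)."""
    sig = convolve(image, kernel)
    bkg = convolve(background, kernel)
    return [[(s - b) / math.sqrt(b) if b > 0 else 0.0
             for s, b in zip(row_s, row_b)]
            for row_s, row_b in zip(sig, bkg)]


if __name__ == "__main__":
    flat = [[1.0] * 21 for _ in range(21)]
    img = [row[:] for row in flat]
    img[10][10] += 50.0  # one bright toy source on a flat background
    snr = snr_map(img, flat, beta_kernel())
    peak = max((v, y, x) for y, row in enumerate(snr) for x, v in enumerate(row))
    print("S/N peak at pixel (y, x) =", peak[1:])
```

Repeating the same operation on 2 × 2 and 4 × 4 rebinned images is equivalent to using kernels with larger core radii, as stated in the text.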
Each source candidate identified by the sliding-cell algorithm is further analysed by a maximum likelihood fitting method. This technique compares the spatial distribution of the input sources with a PSF convolved with a source extent model (β-profile). The final log-likelihood is calculated by varying the input source parameters, i.e. position, counts, and extent. A multi-PSF fit is also implemented which helps in deblending and reconstructing the parameters of close-by sources. In the output list, only sources with a log-likelihood above a given threshold are kept.
Among the output parameters of the maximum likelihood fit, those of interest are: i) detection log-likelihood, which gives the significance of the detection; ii) extent, which is the apparent extension of the best fitting β-model in pixel units; and iii) extension log-likelihood, which compares the significance of the extended model and the point-like model. This last parameter classifies the detected sources as point-like (value equal to zero) or as extended (value greater than zero).
Given that the PSF fitting of the maximum likelihood fitting method is more sensitive to the core of the PSF when on- and off-axis photons are separated, two images from the same simulation and covering the same sky region are produced with photons chosen according to their position on the FoV. The photons are split into inner photons (<16.5 arcmin) and outer photons (>16.5 arcmin). In this way, the source detection pipeline runs simultaneously over two images (see Fig. 6).
All simulated images were analysed with the method described above. The detected sources were cross-identified with the simulation inputs using a matching radius of 28 arcsec for point-like sources and 80 arcsec for extended ones.
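The cross-identification step can be sketched as a nearest-neighbour search within the matching radius. The flat tangent-plane coordinates, the nearest-neighbour rule, and the function name are assumptions of this illustration; the text only specifies the matching radii (28 arcsec for point-like sources and 80 arcsec for extended ones).

```python
import math


def crossmatch(detections, inputs, radius_arcsec):
    """Match each detection to the nearest input source within a radius.

    detections, inputs: lists of (x, y) positions in arcsec on a flat
    tangent plane (adequate for a 3.6 deg field; an assumption here).
    Returns, for each detection, the index of the matched input source or
    None when no input lies within the radius (a spurious detection).
    Uniqueness of the matches is not enforced in this sketch.
    """
    matches = []
    for dx, dy in detections:
        best, best_dist = None, radius_arcsec
        for k, (ix, iy) in enumerate(inputs):
            dist = math.hypot(dx - ix, dy - iy)
            if dist <= best_dist:
                best, best_dist = k, dist
        matches.append(best)
    return matches


if __name__ == "__main__":
    truth = [(0.0, 0.0), (100.0, 0.0)]
    found = [(10.0, 0.0), (100.0, 20.0), (300.0, 300.0)]
    print(crossmatch(found, truth, radius_arcsec=28.0))
```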

Source classification
A trade-off between sample completeness and contamination is inevitable when the source selection function in surveys is estimated. Following a methodology introduced in Pacaud et al. (2006), we explore the output parameter space of the maximum likelihood fitting method by means of our simulations in order to set point-like and extended source classification criteria and to estimate their contamination by spurious and misclassified sources. We define spurious detections as those that cannot be identified with any input source within the search radius, and misclassified sources as those point-sources classified by the pipeline as extended sources or vice versa. We define false detections as a single concept that includes spurious and misclassified detections.

Point-source selection functions
AGNs represent the dominant extra-galactic population at X-ray wavelengths. Although the goal of this work is to determine the galaxy cluster selection function, the estimation of the point-like detection efficiency and its contamination helps to control the systematics in the detection and characterisation of the extended source population.
First, we restrict ourselves to estimating the false detection rate based on the blank field simulations, that is, with point-like sources plus background only. We simulate each field 30 times. We find that a simple threshold in the source detection log-likelihood parameter removes most of the false point-like sources while maintaining a good detection efficiency. We choose a threshold value of 10, obtaining ∼0.1, ∼0.2, and ∼1.1 spurious sources per deg$^2$ for the equatorial, intermediate, and deep fields, respectively. Such false detection numbers correspond to ∼0.1%, ∼0.2%, and ∼0.3% of the average detected sources per deg$^2$ in their respective fields.
The resulting AGN detection efficiency as a function of input flux is shown in the top panel of Fig. 8. This efficiency is obtained by calculating the ratio of the cross-identified objects to the input sources. The displayed error is given by the standard deviation over the 30 simulations of each simulated field. For the equatorial field, the point-like sources have a 90% completeness at a flux limit of $\sim 1.7 \times 10^{-14}$ erg s$^{-1}$ cm$^{-2}$, while for the intermediate field this flux limit is $\sim 9.7 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$, and for the deep field it is $\sim 6.5 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$. The 50% completeness is reached at $\sim 1.0 \times 10^{-14}$ erg s$^{-1}$ cm$^{-2}$ for the equatorial field, $\sim 5.2 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$ for the intermediate field, and $\sim 3.1 \times 10^{-15}$ erg s$^{-1}$ cm$^{-2}$ for the deep field. The large error bars for bright sources mainly reflect their lower number density, which is given by the AGN $\log N - \log S$ distribution.
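The efficiency estimate and its scatter over realisations can be sketched as follows. The array layout, the function names, the use of the population standard deviation, and the log-linear interpolation used to locate a completeness level are all assumptions of this illustration; the numbers in the example are toy values, not results of this work.

```python
import math
import statistics


def efficiency(matched, injected):
    """Mean and scatter of the detection efficiency per flux bin.

    matched[r][b], injected[r][b]: numbers of cross-identified and input
    sources in realisation r and flux bin b. Each bin is assumed to hold
    input sources in at least one realisation.
    """
    means, spreads = [], []
    for b in range(len(injected[0])):
        ratios = [m[b] / n[b] for m, n in zip(matched, injected) if n[b] > 0]
        means.append(statistics.mean(ratios))
        spreads.append(statistics.pstdev(ratios))
    return means, spreads


def completeness_flux(fluxes, eff, level=0.9):
    """Flux at which the efficiency first crosses `level`.

    Uses linear interpolation in log-flux between bin centres; returns
    None if the level is never reached.
    """
    for k in range(1, len(fluxes)):
        if eff[k - 1] < level <= eff[k]:
            t = (level - eff[k - 1]) / (eff[k] - eff[k - 1])
            lo, hi = math.log10(fluxes[k - 1]), math.log10(fluxes[k])
            return 10.0 ** (lo + t * (hi - lo))
    return None


if __name__ == "__main__":
    injected = [[10, 10], [10, 10]]  # two toy realisations, two flux bins
    matched = [[1, 8], [3, 10]]
    mean_eff, err = efficiency(matched, injected)
    print(mean_eff, err)
    print(completeness_flux([1e-15, 1e-14], [0.0, 1.0], level=0.5))
```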

Cluster selection functions
The extended source classification is a complicated task since it must deal not only with spurious detections but also with misclassified point-like sources, that is, point-like sources characterised as extended. Moreover, extended sources usually have a low surface brightness, making their detection and characterisation a difficult process. Our goal is to find a location in the detection/characterisation parameter space where the majority of the simulated extended sources are recovered while keeping the contamination level at a reasonable rate. This is of special importance given that the goal of eROSITA is to use galaxy cluster counts to constrain the dark energy. We recall that, in contrast with the AGN population, which was simulated following a $\log N - \log S$ distribution, sources representing galaxy clusters are randomly distributed across the simulated fields with a density of around 2 per deg$^2$ (see Sect. 2).
Besides the source detection log-likelihood values stated in the previous section, we scanned the source extent-extension log-likelihood parameter space to look for criteria that allow us to obtain a large and uncontaminated extended source sample while maintaining a high detection rate. For this, we use cluster fields, that is, simulations that contain X-ray background, point-like sources, and extended sources. Figure 9 shows the final selection process in the extent-extension log-likelihood plane for the Equatorial (top), Intermediate (middle), and Deep (bottom) fields.
We specify that the maximum extent value that the algorithm should assign to a source is 30 pixels (120 arcsec), even if the algorithm drifts towards a larger value. The minimum requested extent value is 1.5 pixels (6 arcsec), and the threshold of the extension log-likelihood is 6. These thresholds ensure a low contamination by spurious sources, but the number of misclassified point-like sources varies in the different fields. For the equatorial field we obtain ∼0.5 false extended sources per deg$^2$. In the intermediate field we have ∼1.4 false extended sources per deg$^2$, and for the deep field we obtain ∼8.5 false extended sources per deg$^2$. Table 2 shows in detail the fraction of spurious and misclassified sources in each simulated field. It is worth mentioning that similar numbers of spurious and misclassified sources are found in both the blank and cluster fields when using the same thresholds. In Sect. 6.2 we forecast the number of expected clusters assuming a survey with a depth equal to the Equatorial field all over the sky. We expect to detect ∼5.2 clusters per deg$^2$ plus 10% contamination from our false sources. The middle and right panels of Fig. 9 show the extended sources colour-coded according to the input core radius and flux values, respectively. The middle panels display the distribution of the discrete values used for the core radius of the simulated clusters (see Sect. 2), while the right panels show that mainly high-flux sources end up within the region of the plane delimited by the selection criteria.
As seen in Fig. 9, one could apply more stringent criteria to obtain a non-contaminated cluster sample, for example by increasing the minimum value of the extension log-likelihood, but this would exclude a considerable number of extended sources, especially the faintest ones.
The normalized detection probabilities of extended sources for the three simulated fields are presented in Fig. 10, as a function of the input flux. In these plots, a detection efficiency equal to 1 means that 100% of the simulated sources have been detected and classified as extended. As expected, the deeper the observation, the fainter the recovered extended sources. Figure 11 shows the mean detection probability of extended sources as a function of input flux and input core radius. Similar to other works (e.g. Vikhlinin et al. 1998; Pacaud et al. 2006; Clerc et al. 2012a), we also found that the extended source detection efficiency is not a function of source flux only, especially for the shallower observations.

Discussion

Effect of source classification criteria
One could argue that the number of false extended source detections, that is, spurious and misclassified detections, found in the different simulated fields (see Table 2) is high considering that eROSITA will perform an all-sky survey. However, most of the false extended detections are misclassified point sources. Such sources might be close pairs of point sources which cannot be disentangled by the detection algorithm and are therefore classified as a single extended source. One way to reduce the number of misclassified sources is to carry out a complete follow-up of the detected extended sources. Another way is to apply stricter thresholds in the source classification criteria, for example by increasing the extent and extension log-likelihood thresholds (see Sect. 5.2 and Fig. 9). For instance, using a threshold value of 20 in extension log-likelihood reduces the number of misclassified point-like sources in the three fields by 95%. Although such an approach gives a cleaner sample, many real extended sources are missed.

Relevance on cosmological forecasts
Uncertainties in the selection function of a cluster sample can introduce biases in the cosmological constraints derived from it. In this section, we discuss the impact that incomplete knowledge of the selection has on the recovered cosmological constraints. For this test, we follow the methodology of Clerc et al. (2012a) and use the z-CR-HR method. We assume that the selection has eliminated all spurious clusters and misclassified AGNs.

The z-CR-HR method
The z-CR-HR method is based on the premise that the raw X-ray data of a galaxy cluster contain significant information about its redshift, luminosity, and temperature and that this information can be statistically extracted. The cosmological analysis is then simplified by basing it on only the cluster redshift and quantities that are directly observable in X-rays, namely the count rate in the 0.5−2 keV band (CR) and the hardness ratio (HR), which is the ratio of the count rates measured in the 1−2 keV and 0.5−1 keV bands. A particular advantage of this method is that it bypasses the need to derive individual cluster masses, X-ray luminosities, and temperatures and that the scaling relations between mass and its X-ray proxies can be constrained simultaneously with the cosmological parameters. The key steps in this procedure are as follows:
- Compute the halo mass function.
- Derive the 3D distributions of temperature, luminosity, and core radius using the $M$−$T$, $L$−$T$, and $M$−$r_c$ scaling relations, taking the relevant scatters into account.
- Apply an instrumental model for eROSITA to obtain a theoretical distribution of clusters in the CR-HR plane for each slice in redshift space.
- Apply the selection function to obtain a synthetic observed distribution of clusters that one would expect eROSITA to detect (here the equatorial selection for nominal thresholds, Fig. 10, top).
- Apply an error model to account for measurement errors of CR and HR.

Simulated eROSITA z-CR-HR catalogues
After following the procedure described in the previous section and with the unconvolved, error-free z-CR-HR distribution in hand, we randomly sampled the CR-HR plane for each redshift slice to obtain a catalogue of mock clusters, each with a redshift, a count rate, and a hardness ratio.
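The sampling step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each redshift slice is stored as a dictionary mapping (CR, HR) cell centres to the expected number of clusters in that cell; the names `sample_slice`, `build_mock`, and `toy_slices` are hypothetical and do not describe the actual data structures or code used for this work.

```python
import random

def sample_slice(grid, n_clusters, rng):
    """Draw n_clusters (CR, HR) pairs from one redshift slice.

    grid maps (cr, hr) cell centres to the expected number of clusters
    in that cell (hypothetical layout, assumed for this sketch).
    """
    cells = list(grid.keys())
    weights = [grid[cell] for cell in cells]
    # Cells are drawn with probability proportional to their expected counts.
    return rng.choices(cells, weights=weights, k=n_clusters)

def build_mock(slices, rng):
    """slices maps a redshift to its (CR, HR) grid; returns (z, CR, HR) rows."""
    catalogue = []
    for z, grid in slices.items():
        # Assumption: the number of clusters per slice is the rounded
        # expectation; a Poisson draw would add realistic shot noise.
        n_clusters = round(sum(grid.values()))
        for cr, hr in sample_slice(grid, n_clusters, rng):
            catalogue.append((z, cr, hr))
    return catalogue

rng = random.Random(42)
toy_slices = {
    0.1: {(0.01, 0.4): 3.0, (0.05, 0.5): 1.0, (0.20, 0.6): 0.0},
    0.3: {(0.01, 0.5): 2.0, (0.05, 0.6): 0.0},
}
mock = build_mock(toy_slices, rng)
print(len(mock))  # 6 mock clusters: 4 at z = 0.1 and 2 at z = 0.3
```

Cells with zero expected counts are never drawn, so the mock catalogue follows the shape of the selected distribution within each slice.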

Cosmological analysis of mocks
3 Although the characterisation of photometric measurements is beyond the scope of this paper, the same simulations as presented in this work can support derivation of such uncertainties.

Fig. 12. The black contours show the recovered constraints from the complete selection function while the green contours are the results obtained by fitting the cosmology assuming a single core radius in the selection function. The contours represent the 68% and 95% confidence intervals, respectively. The red lines indicate the position of the fiducial input values used in the creation of the mock catalogue and the values quoted above the plots indicate the median value recovered when using the incorrect selection function.

In order to recover the input cosmological parameters, we employ a maximum likelihood method and sample the cosmological parameters using a Markov chain Monte Carlo (MCMC) method. For the description of the likelihood we make use of the unbinned Cash C-statistic (Cash 1979), which provides a useful way of determining how well a given set of data fits the expected distribution. The log-likelihood computed for each set of cosmological parameters combines a sum running over all selected clusters with an integral (calculated over the cluster selection criteria) that gives the number of clusters expected to be within the CR-HR region.
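For reference, a common way of writing the unbinned Poisson log-likelihood behind the Cash statistic, consistent with the description in this section, is sketched here. The notation is an assumption made for illustration: $n(z, \mathrm{CR}, \mathrm{HR})$ denotes the model density of clusters after applying the selection function, and $i$ indexes the selected clusters. The exact form and normalisation used in the analysis may differ.

$$\ln \mathcal{L} = \sum_{i} \ln n(z_i, \mathrm{CR}_i, \mathrm{HR}_i) - \int n(z, \mathrm{CR}, \mathrm{HR})\, \mathrm{d}z\, \mathrm{d}\mathrm{CR}\, \mathrm{d}\mathrm{HR}$$

The first term rewards models that predict a high density at the observed cluster positions, while the second term, the expected number of clusters within the selection region, penalises models that predict more clusters than are observed.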
For this work, we chose to use the publicly available Python package emcee (Foreman-Mackey et al. 2013), an affine invariant ensemble sampler.
For this analysis, we assume a ΛCDM cosmological model relying on the parameters calculated by Hinshaw et al. (2013), in particular with $\Omega_m = 0.28$, $\Omega_\Lambda = 0.72$, $\sigma_8 = 0.82$ and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$. The scaling relations for M−T and L−T are those derived by the XXL collaboration (Giles et al. 2016; Lieu et al. 2016). We only fit for two cosmological parameters, $\Omega_m$ and $\sigma_8$, since we only wish to show that incomplete knowledge of the selection function results in a bias in the recovered parameters. As shown in Fig. 10, the eROSITA selection function is defined for a series of values of the core radius. Here we consider the effect of assuming a selection function which is defined only for a single core radius of 35 arcsec. This core radius is obtained as a weighted average of the core radii (in arcminutes) of the X-CLASS sample of clusters (Clerc et al. 2012b; Ridl et al. 2017).
A total of 104 574 clusters were generated over a hypothetical survey of 20 000 square degrees to a uniform depth of 1.6 ks. The selection criteria for clusters entering the mock were 0.002 < CR < 1.0 cts s$^{-1}$ and 0.02 < HR < 2.0. The results obtained from the MCMC likelihood analysis are shown in Fig. 12. We see that very tight and unbiased constraints on both $\Omega_m$ and $\sigma_8$ are obtained when the selection function is precisely known, as illustrated by the black contours in Fig. 12. On the other hand, a significant bias (shown by the green contours) is observed for both of these parameters when one assumes a core-radius-independent selection function when attempting to fit the cosmological parameters.

Conclusions
We have produced and analysed a set of realistic simulations for the eROSITA All-Sky Survey (eRASS) aiming towards precise selection functions for galaxy clusters. Our approach represents a trade-off between realism and tractability, capturing the essential (expected) instrumental and astrophysical features of the eRASS:
- Fields of typical sizes in typical locations of the sky were selected and the exposure maps derived according to the spacecraft scanning law.
- They are populated with AGNs following a realistic spectrophotometric distribution.
- Expected X-ray backgrounds (extragalactic and instrumental) are added.
- The instrument is accurately modelled using the SIXTE simulator, combined with accurate ray-tracing PSF and vignetting models as well as a detailed detector model.
- Galaxy clusters are simulated with various fluxes and sizes following an average β-model profile.
Our main result consists of a revisited selection function for extended sources defined in the (flux, extent) parameter space. We show that such a selection function can be coupled to cosmological codes and we provide an example by forward-modelling the entire galaxy cluster population with the CR-HR method (Clerc et al. 2012a). Adjusting cosmological parameters to a mock catalogue, we demonstrate that inaccurate knowledge of the selection function can lead to a significant bias in the derivation of cosmological parameters.
Such selection functions and results are valid to the extent of our current instrumental and astrophysical knowledge. Refined calibration and measurements (e.g. background, point-spread function, etc.), on-ground and in-orbit, will provide updated results, critically needed for statistical analyses based on the eROSITA all-sky survey. Different source-detection algorithms, possibly combining data from other wavelengths, may result in different quantitative selection functions; however the framework presented in this paper remains valid and can be used to quickly and efficiently assess their ability to provide constraints on cosmological models of structure formation.

extended source-selection curves
We provide analytic functions that represent the results obtained in Figs. 8 and 10. Due to the limited number of points sampling the curves in the steep transition region, we fitted functions that constitute a reasonable representation of the simulation.
For the extended source selection (galaxy clusters), we parametrize the completeness, dubbed $c$, as a function of 0.5−2 keV flux, exposure time ($T_{\rm exp}$) and core radius ($r_c$) as follows:

$$a(T, R) = 13.5 - (R - 1.2)^2 + (T - 3.204)/1.28$$

$$c(F, T, R) = 0.5 + 0.5\, \mathrm{erf}\left( (F + a(T, R))/0.2 \right),$$

where erf represents the error function. In Figs. 8 and A.1 we show the models and their relatively good agreement with the data points extracted from the simulations. Such simple models cannot fully account for the details of the selection function curves, but they should be useful to provide ready-to-use estimates of completeness for various forecasts.
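The parametrization above is straightforward to evaluate numerically. The sketch below assumes that $F$, $T$, and $R$ are the base-10 logarithms of the flux (in erg s$^{-1}$ cm$^{-2}$), the exposure time (in s), and the core radius (in arcsec), respectively. This reading is not stated explicitly in the text; it is suggested by the fact that 3.204 ≈ $\log_{10}(1600)$, matching the 1.6 ks depth, and should be checked before use. The function and argument names are illustrative.

```python
import math

def completeness(log_flux, log_texp, log_rc):
    """Extended-source completeness c(F, T, R) from the analytic fit.

    Assumed inputs (an interpretation, not stated in the text):
      log_flux: log10 of the 0.5-2 keV flux in erg/s/cm^2
      log_texp: log10 of the exposure time in seconds
      log_rc:   log10 of the core radius in arcsec
    """
    a = 13.5 - (log_rc - 1.2) ** 2 + (log_texp - 3.204) / 1.28
    return 0.5 + 0.5 * math.erf((log_flux + a) / 0.2)

# At the pivot values of the fit the completeness is exactly one half.
print(completeness(-13.5, 3.204, 1.2))  # 0.5

# Under the assumed units, a deeper exposure raises the completeness
# at fixed flux and core radius.
shallow = completeness(-13.6, 3.204, 1.2)
deep = completeness(-13.6, 3.5, 1.2)
print(shallow < deep)  # True
```

Under this interpretation, the completeness at fixed flux and exposure peaks at $R = 1.2$ and decreases for both smaller and larger core radii, in line with the statement that detection efficiency is not a function of flux alone.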