Free Access
Issue
A&A
Volume 564, April 2014
Article Number A125
Number of page(s) 25
Section Extragalactic astronomy
DOI https://doi.org/10.1051/0004-6361/201322971
Published online 17 April 2014

© ESO, 2014

1. Introduction

Active galactic nuclei (AGN) are thought to represent accretion events onto super-massive black holes (SMBHs) at the centres of galaxies. The demographics of this population therefore provide important insights into the formation history of the black holes we observe in nearly all massive bulges in the nearby Universe (Richstone et al. 1998; Shankar et al. 2009). Moreover an increasing body of observational evidence (e.g. Kormendy & Ho 2013,and references therein) as well as theoretical calculations (e.g. Silk & Rees 1998; Fabian 1999; King 2005, 2010) suggest that AGN are also important for understanding the formation and evolution of galaxies. The evidence above motivated efforts to constrain the cosmological evolution of AGN (e.g. Aird et al. 2010) and determine the statistical properties of their host galaxies (e.g. Alexander & Hickox 2012). Despite significant progress in these fields in the last decade important details are still missing. There are for example, open questions relating to the contribution of obscured accretion to the black hole mass budget (e.g. Shankar et al. 2004; Akylas et al. 2012) as well as the overall impact of the energy released by AGN on their immediate environment on kpc and Mpc scales (Kormendy & Ho 2013).

One approach for addressing the points above is via studies of the population properties of AGN averaged over cosmological volumes. The first step in this direction is the characterisation of the basic properties of individual AGN in a statistical sample, e.g. their accretion luminosities, level of line-of-sight obscuration and if possible the basic geometrical properties of the obscuring material in the vicinity of the SMBH. The importance of X-ray observations for constraining these properties are twofold. Firstly, selection at high energies yields AGN samples that are least biased in terms of either line-of-sight obscuration or dilution by stellar light from the host galaxy (Comastri et al. 2002; Severgnini et al. 2003; Mushotzky 2004; Georgantopoulos & Georgakakis 2005). Secondly, because the X-ray flux emitted by AGN is believed to originate at small scales close to the SMBH, spectroscopy at high energies is a unique diagnostic of the accretion properties and the geometry of the material in the vicinity of the central engine.

In this paper we develop a novel framework based on the Bayesian inference to analyse the X-ray spectra of AGN. This method is applied to the problem of the characterisation of the X-ray spectral properties of AGN detected in blank field surveys. Such datasets are extensively used to constrain the statistical properties of AGN and their hosts at moderate and high redshift (z ≈ 0.5−3), when the bulk of the SMBH we observe in local spheroids were put in place. This particular application is challenging because (i) typical deep field X-ray sources only have a small number of photon counts, well into the Poisson regime, where the familiar χ2-regression approach is often not applicable; (ii) a large fraction of the deep field X-ray sources have redshifts measured via photometric methods (e.g. Salvato et al. 2009, 2011) and therefore suffer uncertainties that affect the inferred X-ray spectral parameters; (iii) it is often difficult for traditional spectral fitting methods to select among different physically plausible spectral models the one that provides a better representation of the observations. A direct result of the latter point is that deep field X-ray spectra are typically fit with simple models (but see for example Brightman & Ueda 2012), because it is unclear whether more complex, but also more realistic, model families are justified by the observations.

The methodology presented in this paper overcomes the issues above by propagating into the analysis and the parameter estimation all the uncertainties related to both Poisson noise and redshift measurement errors. Bayesian model comparison and selection is also used to infer from deep survey observations structural information on the distribution of material close to the SMBH both for individual sources and for the overall population. The field of choice for the application of this methodology is the 4 Ms Chandra Deep Field South (CDFS). This is motivated by the X-ray depth and the availability of multi-wavelength data in that field.

We begin in Sect. 2 by reviewing our current understanding of the structure of AGN and the physical processes relevant to X-ray emission in the 0.5−10 keV band. In Sect. 3, the modelling of the spectral components is defined in detail. Section 4 describes the data used. Section 5 presents our approach for comparing models and estimating model parameters. Finally, the results are presented in Sect. 6 and discussed in Sect. 7.

We adopt a cosmology of H0 = 70.4 km s-1 Mpc-1, ΩM = 0.272, and ΩΛ = 0.728 (Komatsu et al. 2011). Solar abundances are assumed. The galactic photo-electric absorption along the line of sight direction to the CDFS is modelled with NH ≈ 8.8 × 1019 cm-2 (Stark et al. 1992).

2. Spectral components

This section describes the main spectral compoments identified by both observational data and theoretical work which constitute the X-ray spectrum of AGN, although not all of them may be present simultaneously in each source.

X-ray observations show that all AGN share an intrinsic power law spectrum of the form E− Γ with photon index Γ and a steep cut-off at higher energies (Zdziarski et al. 2000). Observed values for the photon index range between 1.4 and 2.8, and its distribution in local AGN can be approximated by a Gaussian of mean 1.95 and standard deviation 0.15. (Nandra & Pounds 1994). This spectrum is thought to be the result of thermal comptonisation of accretion disk UV radiation by a hot electron corona (Zdziarski et al. 2000; Sunyaev & Titarchuk 1980). A hotter plasma can up-scatter photons to higher energies, thereby producing a power law with a lower photon index. Up-scattering cools the corona which needs to be repopulated with high-velocity electrons to stay in equilibrium. The ultimate source of power is accretion onto the black hole, and thus Γ is thought to be related to physical AGN properties such as the Eddington rate (see Brightman et al. 2013, for a recent discussion).

The intrinsic power law is often obscured by cold material in the line of sight (see e.g. Turner et al. 1997; Risaliti et al. 1999; Brightman & Nandra 2011). The most important effects of this obscuring screen for X-rays are photo-electric absorption, Compton scattering and Fe K-shell fluorescence. Under the unification scheme of AGN (Antonucci 1982, 1993; Antonucci & Miller 1985), which postulates that the viewing angle is the main cause for the variety of AGN spectra, the simplest common obscuring geometry is a toroidal structure usually referred to as “the torus”. X-ray observations do not resolve AGN and spectral models can not distinguish the scales at which e.g. photo-electric absorption occurs. However, because the circum-nuclear obscuring material is cold and molecular, it must be distant or self-shielding (see e.g. Gaskell et al. 2008). This picture is further motivated by high-resolution optical space-based observations (Ferrarese et al. 1996; van der Marel & van den Bosch 1998), which show such a torus-like structures in ~1001000 pc scale, or more recently, mid-infrared interferometry (Burtscher et al. 2013).

thumbnail Fig. 1

Illustration of the typical shapes of the discussed spectral features. Emission lines and absorption edges have been omitted for simplicity.

Open with DEXTER

thumbnail Fig. 2

Cartoon illustrations of the geometries associated with each model. The wabs model b) represents an absorbing slab in the line of sight, and can also be interpreted as the case of a torus with extreme opening angle. While torus d) uses a intermediate opening angle, the sphere c) represents the other extreme, a vanishing opening angle. The +scattering extension e) of the named models correspond to Thomson scattering from outside the line of sight, which does not experience any energy-dependent effects. Finally, the reflection component (+pexmon) corresponds to either disk reflection f) or additional reflection if the torus is not viewed through the same column density as the reflection g), h), i). For the sphere it should be noted that scattering is not physically possible, as no unabsorbed radiation can escape.

Open with DEXTER

In addition to photo-electric absorption, the obscuring material in the vicinity of the SMBH can also Compton-scatter photons in or out of the line of sight (LOS). Compton scattering, unlike photo-electric absorption, is anisotropic. From a single, unresolved viewpoint, the geometry of the obscurer thus influences the obscured spectrum. For example, compared to a sphere, a torus geometry with a certain opening angle produces a smaller Compton reflection hump between ~10−30 keV (see Fig. 1), proportional to the solid angle illuminated by the source and the surface area exposed to the observer (Murphy & Yaqoob 2009; Brightman & Nandra 2011). At neutral hydrogen equivalent column densities NH ≳ 1023 cm-2 Compton scattering becomes important compared to photo-electric absorption (Compton-regime).

In obscured AGN, it is believed that a fraction of the intrinsic radiation leaks through encountering no Compton scattering or photo-electric absorption (Ueda et al. 2007). Explanations for this component include Thomson scattering off ionised material within the torus opening angle, which reflects the intrinsic spectrum into the line of sight (Krolik & Kallman 1987; Turner et al. 1997; Guainazzi & Bianchi 2007). This component is referred to as “scattering” in this work, as opposed to “Compton scattering” or “reflection” in the cold obscurer.

In unobscured AGN, extrapolating the 2−10 keV power law to softer energies shows an excess of soft X-rays in some sources (Turner & Pounds 1989; Gondhalekar et al. 1997). Processes that have been proposed to explain the soft component include thermal disk emission by the torus at kT ≈ 0.2 keV (Gierliński & Done 2004), or a relativistically blurred, photo-ionized reflection spectrum by the accretion disk (Ross & Fabian 1993). In medium-to-high redshift objects, which constitute the bulk of the objects of interest in this work, the soft excess component lies outside the observed energy range and thus we choose to neglect it.

3. Model definitions

The focus of this work is a study of the AGN population to constrain the geometry of the X-ray obscuring material in the vicinity of SMBHs. To identify which physical mechanisms and geometries produce the observed spectra we identify and compare 10 physically motivated models with different levels of complexity. For the model comparison, we develop a Bayesian methodology (discussed in Sect. 5) that propagates the uncertainties from X-ray observations and redshifts correctly. The data and extraction method is described in Sect. 4. The spectral components considered are defined below.

The simplest model considered is a power law, referred to as powerlaw. It represents the intrinsic spectrum of AGN without any obscurer, as shown in Fig. 2a. The high-energy cutoff observed in the energy range 80−300 keV (Perola et al. 2002) lies well above the energy range used in this work (<10 keV) even at moderate redshifts. We therefore neglect the cutoff. The parameters of powerlaw are the normalisation at 1 keV and the power law index, Γ.

The most commonly used model to describe obscuration is to apply photo-electric absorption to the intrinsic power law. Absorption has the largest cross-section among the interaction processes, and thus is a good first order approximation for the X-ray spectra of AGN detected in deep fields. However, at higher column densities (NH ≳ 1024 cm-2), matter becomes optically thick to Compton scattering, and re-emission in lines due toX-ray fluorescence is prevalent. In contrast to photo-electric absorption, which is opaque at low energies, Compton scattering introduces an energy loss at each interaction, thus low-energy photons can be received by the observer. Furthermore, Compton scattering induces a non-uniform scattering angle distribution making the spectrum dependent on the assumed geometry. Influenced by the solid angle illuminated by the source and the surface area exposed to the observer, the contribution by Compton scattering varies, as radiation is scattered into the line of sight. For instance, a torus geometry with a certain opening angle produces a smaller Compton reflection hump between ~1030 keV (see Fig. 1) than a completely closed, spherical geometry (Murphy & Yaqoob 2009; Brightman & Nandra 2011). The opening angle of the torus, and the viewing angle to a minor degree, thus influences the strength of the reflection hump. This allows in principle to determine the viewing/opening angle and NH independently, possibly probing the density gradient of the obscurer. However, because the effects on the spectrum are small, particularly in the case of low photon count spectra, this has not yet been successfully applied to discriminate these parameters in individual obscured sources. In this work, we also assume a limited range of geometries (e.g. 45° opening angle, edge-on view) as we do not attempt to constrain the viewing and opening angle. This geometry-dependence of spectra also makes modelling challenging. Multiple inter-dependent interactions of several elements can realistically only be done by X-ray radiative transfer simulations for a fixed geometry (see e.g. Nandra & George 1994; Murphy & Yaqoob 2009; Brightman & Nandra 2011).

In this work, three obscuration models are considered, wabs, sphere and torus, which correspond to different geometries:

The “wabs” model applies only photo-electric absorption on a power law with the cross-sections and polynomial approximations computed by Morrison & McCammon (1983). This model does not include any Compton scattering. It can be considered as an infinitely small blob (see Fig. 2b) in the line of sight (LOS): no radiation is scattered into the LOS, and all Compton scattering leaves the LOS. However, the loss of high-energy photons due to Compton scattering at high column densities is not modelled.

We also consider an absorber with spherical geometry (sphere) illustrated in Fig. 2c. This model, computed by Brightman & Nandra (2011), consistently models photoelectric absorption, Compton scattering and K-shell fluorescence of a power-law spectrum source at the centre of a cold, neutral medium. Similarly, a toroidal geometry (torus) is used, which was simulated in the same way as sphere, but with a bi-conical cut-out. We restrain the torus model, using a 45° opening angle viewed edge on. The physical scenario represented by this model is shown in Fig. 2d, which indicates that Compton scattering into the LOS is possible.

The three obscuration models used mark the extreme cases of torus geometries: wabs for a almost 90° half-opening angle, sphere for a vanishing opening angle and torus represents an intermediate case where 30% of the incident radiation encounters the obscurer. Figure 2b-d illustrates these differences. In comparison to the powerlaw model, wabs, sphere and torus have the additional parameter NH, the neutral hydrogen equivalent column density in the LOS.

Observations of some obscured AGN show that a fraction of the intrinsic radiation escapes without encountering any obscuration. This is understood to be Thomson scattering off ionised material inside the opening of the obscurer, as illustrated in Fig. 2e for the torus geometry. We model this component, referred to as scattering (+scattering) by adding a simple, unabsorbed power law component with the same Γ as the incident radiation. The normalisation of this component, fscat, is modelled relative to the intrinsic power-law component, and may vary between 10-10 and 10%. In this fashion, torus and wabs are extended to torus+scattering and wabs+scattering. For the sphere model, it should be noted that no unabsorbed radiation can escape, and thus adding scattering is unphysical. However, for completeness, we also consider sphere+scattering, which mimics the case of a torus with a very large covering fraction.

The observed NH distribution requires a stronger gradient than a constant-density torus geometry can provide (see e.g. Treister et al. 2004). We thus consider an additional density gradient contribution by adding a denser slab outside the LOS inside the obscurer. The NH – measured largely through photo-electric absorption – indicates only the “effective” column density along the LOS, and thus remains unaffected. However, a denser region outside the LOS can scatter Compton reflection into the LOS, as illustrated in Fig. 2g–i. Alternatively, reflection off the accretion disk may also be part of the LOS spectrum entering the obscurer (Fabian 1989; George & Fabian 1991). This is illustrated in Fig. 2f for wabs. We thus add a Compton reflection component +pexmon to the obscured spectrum. We assume that this component passes through the obscuring column, so it is introduced photon-absorbed with the same column density as the LOS obscuring material. The used model, PEXMON (Nandra et al. 2007) combines Compton reflection off a neutral dense semi-infinite slab geometry (PEXRAV, Magdziarz & Zdziarski 1995) consistently with Fe Kα, Fe Kβ and Ni Kα emission lines and the Fe Kα Compton shoulder (George & Fabian 1991; Matt 2002). We use the same incident spectrum (photon index Γ and no cutoff) for the reflection, and follow Nandra et al. (2007) in assuming a fixed inclination of 60°. The normalisation of this component is modelled relative to the wabs component (R parameter), and may vary between 0.1 and 100. Figure 2 illustrates various causes for an additional reflection spectrum (f-i). In the case of wabs model in particular, the +pexmon component can compensate for the lack of forward-scattering inside the obscurer.

thumbnail Fig. 3

Models considered. For model comparison, we compute the evidence (see Sect. 5.2) for each source and model. We then start by assuming a power law (powerlaw) and move along the arrows to more complex models if model comparison justifies the preference. The three obscurer models, wabs, sphere and torus, are compared against each other, as well as the introduction of additional features (+scattering, +pexmon). See text for details.

Open with DEXTER

To summarise, we consider 10 models identified by typewriter font style. The statistical method used for model comparison is described in Sect. 5. Figure 3 illustrates the comparisons we are interested in with arrows. Firstly, obscurer geometries are compared (powerlaw: no obscurer, Fig. 2a; wabs: bullet-like blob, 2b; sphere: complete covering of the source, 2c; torus: intermediate case, 2d). We test for the existence of a soft scattering component which is represented by wabs+scattering, sphere+scattering and torus+scattering (Fig. 2e). Finally, we explore the need for additional Compton reflection as shown in Fig. 2f,g for wabs+pexmon+scattering, 2i for sphere+pexmon+scattering and 2h for torus+pexmon+scattering. The following section describes the data to which we apply our methodology.

4. Observational data

We intend to study the obscuring material in the vicinity of active SMBHs by analysing X-ray spectra with a variety of physically motivated models. The typical SMBH grows through active accretion at redshift 0.5−3 (see e.g. Aird et al. 2010). To obtain a sample of such high-redshift AGN, long exposure times are needed. The deepest X-ray survey to date is the CDFS campaign. This survey region has been observed extensively providing excellent multi-wavelength coverage, which we utilise for redshift estimation.

The CDFS survey consists of 51 observations giving a total exposure time of 4 Ms on an area of 464.5 arcmin2. Data reduction and source detection followed Laird et al. (2009) and is described in detail in Rangel et al. (2013b,a). Briefly, hot pixels, cosmic ray afterglows and times of anomalously high backgrounds were removed to produce clean level-2 event files. These were then aligned using bright sources and subsequently merged. Images and exposure maps in four energy bands (0.5–7, 0.5–2, 2–7 and 4–7 keV) were computed. A candidate source list was created using wavdetect with a low significance threshold (10-4). Source and background counts were extracted on the found positions using the Chandra point spread function tables of Laird et al. (2009). The source region is constructed to contain 70% encircled energy fraction (EEF) of the point spread function (PSF). The background region is a surrounding annulus between 1.5 × 90% EEF PSF and an additional 100 pixel radius. Other candidate sources are masked from the background extraction region. For each candidate source position, the Poisson probability that the observed counts are a background fluctuation was computed, and the source accepted if the significance exceeds 4 × 10-6 (Nandra et al. 2005). This yields a final catalogue of 569 sources.

The ACIS EXTRACT (AE) software package (Broos et al. 2010) was used to extract spectra for each source following Brightman & Ueda (2012). Initially, each source and each pointing is dealt with separately. AE simulates the PSFs at each source position. Regions enclosing 90% PSF at 1.5 keV were used to extract source spectra. The background regions are constructed around the sources such that they contain at least 100 counts, with other sources masked out. AE also constructs local response matrix files (RMF) and auxiliary matrix files (ARF) using ray-tracing. As a final step, AE merges the extracted spectra so that each source has a single source spectrum, a single local background spectrum, ARF and RMF. It was possible to extract 567 spectra.

For consistent analysis using Poisson statistics, a model has to be compared to the observed counts. The background contribution can not be subtracted away because unlike in Gaussian distributions, the subtraction of two Poisson distributions does not yield a analytic distribution. It is common practise to use per-bin background estimates. However, this yields unstable results in bins with few counts. We thus choose to model the background in a parametric way using a Gaussian mixture model (see Appendix A for details). This may not be physical but provides a good approximation to the background which is very similar to the On-orbit background measurements in Baganoff (1999) (compare the left panel of Fig. 2 therein with Fig. A.1). In particular we model a number of instrumental emission lines by introducing Gaussians at mean energies of 1.486 (Al Kα), 1.739 (Si Kα), 2.142 (Au Mα,β), 7.478 (Ni Kα), 9.713 (Au Lα; all in keV, Thompson et al. 2001). We allow the centres of these lines to vary within 0.1 keV. A feature at ~8.3 keV (possibly Ni/Au lines) is described by three additional Gaussians, while the overall continuum shape is modelled of using a flat continuum level and two Gaussians at softer energies (see Appendix A for details). All background model parameters (means, widths, heights) are then fitted to the background spectrum of each source. The goodness of the fit is evaluated using Q–Q plots (see Appendix A for details). The found parameters then remained fixed for the subsequent X-ray spectral analysis of the source spectrum. In general, there may be cases where simultaneous analysis of background and source model parameters provides better results. However, because the larger background region captures many more background photons than the source region, the background model is well constrained by the background data alone.

X-ray spectra extracted from survey data have to be associated with multi-wavelength data to obtain a measure of the distance of each source, i.e. redshifts. For identification of counterparts, the J/K/H-band positions from the CANDELS (Koekemoer et al. 2011; Grogin et al. 2012) CDFS field (Guo et al. 2013; Hsieh et al. 2012) are used in combination with mid-infrared positions from IRAC (Damen et al. 2011; Ashby et al. 2013). A Bayesian method was developed that matches X-ray positions to mid- and near-infrared counterparts simultaneously (Budavári & Szalay 2008; Salvato et al. in prep). For 528 sources optical counterparts have been found (see Hsu et al., in prep). For 301 sources, reliable spectroscopic redshifts are available (Hathi et al. in prep.); for 185 of the remaining sources, photometric redshifts were successfully computed, giving a total of 476 sources.

Optical, IR and UV photometry was used from the TFIT 10-epoch multi-wavelength catalogue (Guo et al. 2013) within the CANDELS GOODS South Survey, 32-band data from the MUSYC survey (Cardamone et al. 2010) and Far/Near-UV from GALEX. Outside the CANDELS region, TENIS J/K band data (Hsieh et al. 2012) was used. The computation of photometric redshift by Hsu et al. (in prep.) follows the same SED fitting procedures described in Salvato et al. (2011), using SED templates from Ilbert et al. (2009) and Salvato et al. (2009) and software developed for Arnouts et al. (1999) and Ilbert et al. (2006).

While it is common practise to merely use the best fit redshifts, here we marginalise the probabilities to obtain photometric redshift probability distributions (“photo-z PDFs”). The benefit is that all solutions are considered in subsequent analysis. The benefits of propagating the uncertainty of the redshift determination is demonstrated in Appendix B, by comparing to using a point estimator. By an instructive example, it is demonstrated there that Maximum Likelihood fitting methods can significantly underestimate the uncertainty in the derived parameters. A future improvement could be to incorporate systematic errors into these PDFs.

Star-burst galaxies can contaminate the sample and skew the inferences we try to make about AGN, especially with regards to the soft energy ranges. We adopt the criteria for inclusion of AGN from Xue et al. (2011): Sources with LX, 2− 10 keV < 3 × 1042, effective power law index Γeff > 1 and log  LX/Lopt < −1 are not used. The criterion selects weak X-ray sources that are much brighter in the optical, and additionally emit mostly soft X-rays. This selection is designed to exclude sources with non-negligable host contribution. Thus, AGNs in star-burst galaxies and low-luminosity AGN are not studied. Furthermore, objects classified as stars or galaxies (i.e. host-dominated sources in the X-ray) by Xue et al. (2011) are removed. 346 AGN remain.

Model selection assumes that one model is the correct one (see Sect. 5.3 below). For determining whether the model could produce the data at hand, we adaptively bin the spectrum counts so each bin contains 10 counts. Then we compute the χ2-Statistic for the best fit model parameters. If χ2/n> 2, where n is the number of bins, the object is not used for model comparison. For low count spectrum, this criterion is relaxed further (2.3 if less than 500 counts, 3 if less than 100 counts, 5 if less than 50 counts) due to the stronger Poisson variance. These limits were obtained by simulating a flat Poisson spectrum across bins, so that they exclude the true value in fewer than 1% of the simulations. The 13 affected sources were visually inspected in the X-ray and optical and at least 5 of them can be clearly explained by outliers in the photometric redshift due to contaminated photometry or incorrect association. With an outlier fraction of η = 4−6%, the expected number of outliers is ~20. 334 AGN remain.

thumbnail Fig. 4

The redshift and number of counts in the 0.5−8 keV band of the AGN sample. For the redshift, the spectral redshift is shown where available and otherwise the median on the photometric redshift distribution is used.

Open with DEXTER

It is worth noting that in this work the selection criteria are of minor importance, as the unification argument is not applied. Instead of assuming a common torus that is viewed from a variety of angles, we only require that all sources considered are built using (a subset of) the same physical processes.

5. Statistical analysis methods

5.1. Parameter estimation

In X-ray spectral analysis it is common practice to vary the parameters of the spectral components until a certain statistic is optimized with regard to the observed X-ray spectrum.

Because X-ray sources in deep extragalactic surveys are typically faint, their spectral bins are filled by counts according to a Poisson process. A χ2 statistic is not applicable. Re-binning the data to use a χ2-statistic is a common practise, however this results in loss of information in the energy resolution. Additionally, the reliability of the χ2-estimator is unknown if the incorrect model (χ2n) is used, and thus its confidence intervals are problematic.

The maximum likelihood C-statistic C = −2 × lnℒPoisson + const (Cash 1979), based on the Poisson likelihood Poisson, does not suffer from these issues. However it is not useful as a measure of goodness-of-fit. Both the C-statistic and χ2 denote the −2 × ln of a likelihood, and the statements made for C are also applicable to χ2. While C is not χ2-distributed, is, according to Wilks’ theorem (Cash 1979; Wilks 1938).

For optimising the parameters, one can not simply step through the parameter space due to the curse of dimensionality. Typically, local optimisation algorithms like the Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963) are employed to iteratively explore the space from a starting point. The final point is taken as the most likely parameter vector. If there are multiple, separate, adequate solutions, i.e. local probability maxima, in the parameter space, these algorithms can not identify them or jump from one local maximum to the other.

For error estimation, there are three commonly used options: (1) assume independence of parameters and Gaussian errors, and estimate the error for each parameter in turn by finding the value where the statistic drops by a level corresponding to 1σ; (2) assume Gaussian errors, estimate the local derivatives and invert the Fisher matrix; (3) compute contours of some ΔC value motivated by the χ2 distribution. The last is more computationally costly than optimisation and can at most be done in 2 dimensions at a time. These three error estimates are approximations that are only sufficient in the case where the probability distributions either approach the shape of a single Gaussian, or the interdependency between parameters is negligible. A general maximum likelihood approach that does not have these shortcomings is to estimate the error using simulated data to find a confidence interval (ΔC) that contains the used parameter e.g. 95% of the time. This is the most costly option, as it fits to generated data many times.

Alternatively, a Bayesian approach can be adopted, e.g. van Dyk et al. (2001) introduces Bayesian X-ray spectral analysis in detail. As before, the Poisson likelihood is used to explore a parameter space. However, there are some important differences. The space is now weighted or deformed by priors. The goal in Bayesian parameter estimation is to identify sub-volumes which constitute the bulk of the probability integral over the parameter space. This approach does optimisation and error estimation simultaneously, but requires an integration and exploration technique that does not suffer from the curse of dimensionality.

Markov chain Monte Carlo (MCMC) is a commonly employed integration method for Bayesian parameter estimation. This algorithm tests a starting point against a new random point chosen from the prior distribution with a local bias. With a probability proportional to the ratio of the two points, the algorithm picks the new point as the new starting point. With each iteration producing a point, a chain is created with the interesting property that parameter vectors are represented proportional to their probability. This full representation of the parameter space can be marginalized to look separately at each parameters distribution. Furthermore, error propagation in derived quantities can be done correctly without assuming a underlying normal distribution.

MCMC has its own share of problems however. First, convergence, i.e. if a chain has become a proper representation of the parameter space, can not be tested for. Only non-convergence may be detected. For example, a chain can stagnate for a while but at a later point suddenly discover a parameter sub-space of high probability. When or if this would happen however, can not be determined. As a local algorithm, MCMC has great difficulty finding and jumping between well-separated maxima. MCMC is a powerful algorithm also applicable to high dimensionality, and thus many extensions and variants have been developed for MCMC to mitigate these issues.

Our approach for circumventing the problems of MCMC in exploration and integration of parameter spaces is to use a different Monte Carlo algorithm, Nested Sampling (Skilling 2004). Nested Sampling keeps a set of fixed size of parameter vectors (“live points”) sorted by their likelihood. These points are always randomly drawn from the prior distribution. The algorithm then removes the least likely point by drawing points until one is found with a higher likelihood. Effectively, this approach “scans” the parameter space vertically from the least probable zones to the most probable. For each removed point, the volume this point is representative for is computed, and the according probability mass added to the integration. When the remaining parameter volume is negligible, the algorithm terminates. A difficulty of this algorithm is to avoid the curse of dimensionality when drawing from the prior distribution to get higher valued points.

MultiNest (Feroz et al. 2009) solves this problem efficiently by clustering the live points into multi-dimensional ellipses, and drawing from these subspaces under the assumption that higher-valued points can only be found in proximity of already drawn points. Because of the clustering, MultiNest can follow distinct local maxima without difficulty. MultiNest is applicable to the low-dimensional problems of X-ray spectral modelling. It can compute points of equal weighting akin to a Markov chain, provide values and error estimates for each local maximum as well as marginal probability distributions for each parameter. The integral over the parameter space is computed globally and for each local maximum.

5.2. Model comparison

For classic model comparison, Wilks’ theorem can be used, which states that the difference in the best fit C-statistic is χ2-distributed with the degrees of freedom equal to the difference in number of parameters (Wilks 1938). This asymptotic result is only valid for nested models, i.e. M1 is a special case of M2.

Bayesian model comparison is done by comparing the integrals over parameter space, called the evidence , where is the parameter vector and represents the weighting or the deformation of the parameter space by the prior. Often, the logarithm log  Z is computed. The Bayes factor B12 = Z1/Z2 is then compared to the a priori expectation. This method does not require models to be nested nor does it make assumptions about the parameter space or the data. Without altering the Bayes factor, the used statistic can be modified by arbitrary constants as long as they are independent of the parameters.

Alternatively, information criteria can be used. These are approximations in the limit using certain assumptions, which compare the difference in the best fit C-statistic between the models to an over-fitting penalisation term. Typically, the more complex model is punished by the additional number of parameters. Computation is thus very similar to Wilks’ theorem, although the background is very different.

The Bayesian information criterion (BIC, Schwarz 1978) is an approximation to Bayesian model comparison. Unlike in the Bayesian evidence integration, only the maximum likelihood is needed, as the posterior is assumed to be strongly single-peaked, making the prior negligible. Because the BIC uses the Laplace integral approximation, its results are in principle unreliable at the boundaries of the parameter space (e.g. checking whether a non-negative parameter is zero). The BIC decides, just like Bayesian model selection, which model is more probable to have produced the data. The model with the smallest BIC = Cm × ln n should be preferred, where n is the number of observations (data points) and m donates the number of free parameters of the model.

The Akaike Information Criterion (AIC, Akaike 1974) is not rooted in Bayesian inference, but information theory. The AIC measures the information loss by using a specific model. Thus it can be said to consider the case of having multiple data sets and predicting the next. Technically, the Kullback-Leibler divergence, sometimes referred to as the “surprise” or “information gain” is minimised. The model with the smallest AIC = C − 2 × m should be preferred.

5.3. Model verification

Any model comparison, or more generally any inference problem, assumes that one of the stated hypotheses is the true one. If no model is correct, the least bad model will be preferred. However, this assumption also requires examination.

Traditionally, this is quantitatively addressed by Goodness of Fit (GoF) measures such as the reduced χ2 for normal distributions, with posterior-based approaches such as posterior predictive tests recently being developed (see e.g. Bayarri & Castellanos 2007). No consensus has been reached on these methods. Asymptotic results make assumptions not applicable in practice (e.g. the Likelihood-ratio and F-test are invalid at boundaries, making them unsuitable for feature-detection problems, see Protassov et al. 2002). Posterior-based approaches are diverse, and can be overly conservative (see e.g. Sinharay & Stern 2003). In the strict sense, there is no probabilistic framework available, frequentist or Bayesian, for testing whether the fitted model is correct.

We can, however, relax the question to ask whether the model “explains” the data by looking at the range covered in the observable. Monte Carlo simulations (parametric bootstrap) can be performed with the best-fit parameters to check whether the spectrum can actually be produced by the given model. This can be computationally expensive however. It is easier to use a measure of “distance” between model and data (e.g. the Kolmogorov-Smirnov statistic) to discover potentially problematic cases and to visually examine the data.

Q–Q plots (Wilk & Gnanadesikan 1968, see Appendix A) provide a generic tool to visualise the goodness of fit, and are independent of the underlying distribution. The quantiles of the integrated observed counts are plotted against the integrated expected counts from the model. A good fit shows a straight line (y = x). This method is shown in Appendix A: The Q–Q plot is used for model discovery for improved fits, while model comparison using the AIC tests the significance of the model alterations.

5.4. Experiment design and predictions

When a result turns out to be inconclusive, it is often interesting to find out whether a future observation can improve the discriminatory power, i.e. how well parameters can be constrained or whether two models can be distinguished. Classically, simulations are employed with a specific setup. A large number of data sets is generated and fitted. The distribution of best fit values then shows the uncertainty.

Alternatively, the Bayesian analysis can be employed. Here we only consider a simple approach, while Bayesian Experimental Design theory – a theory based on maximising the information gain – is not covered. Since the diversity of possible spectra is less interesting than determining the discriminatory power regarding parameters or models, it is sufficient to generate and analyse a single spectrum for each problem as the Poisson likelihood is already informative of the uncertainty. This was done in Georgakakis et al. (2013) for Athena+/WFI predictions to detect and measure outflows. For a grid of problems (L,z) the ability to distinguish a warm absorbers from a cold absorber were tested using the same Bayesian spectral analysis methodology laid out here, and the uncertainty of parameters recorded.

5.5. Priors

A model definition in the Bayesian framework would be incomplete without describing the priors to the models defined above. These encode an a priori weighting within the parameter space. Equivalently, a transformation of the unit space to the parameter space is sought. The simplest case is a uniform prior, giving equal weight in the parameter space. However, a uniform prior means something different whether the parameter or its logarithm is used. The parameters are very similar across the models used, so we describe them all together.

In the problem at hand, we assume the prior distributions to be independent. We use log-uniform priors for scale variables such as normalisations and NH. The normalisation of spectral models is allowed to vary between 10-30 and 1. This conservatively contains physically possible photon fluxes. However, since all considered models contain this parameter, the limits are irrelevant in practise. The normalisation of individual spectral components is then modelled relative to this model normalisation. NH is defined from 1020 to 1026 cm-2. The photon index Γ is modelled after the local sample analysed in Nandra & Pounds (1994), as a normal distribution with mean 1.95 and a standard deviation of 0.15. This encodes the assumption that distant AGNs behave like local AGNs with regard to their intrinsic spectrum. There is a degeneracy between NH and Γ in that a steeper power law can be flattened by absorption. Hence, placing a prior on Γ plays an important role in constraining NH. Nevertheless, we found that our results (e.g. the NH distribution of the analysed sample) are insensitive to other choices (for instance Γ = 1.68 ± 0.3 from de Rosa et al. 2012), as the data drive the result in the majority of cases. While redshift and Γ use informed priors, all remaining parameters, unless otherwise stated, are assumed to be location parameters and thus have a uniform prior.

Of course we are not interested in our prior beliefs, but in how the data strengthens or weakens hypotheses and re-weights parameter space. If the data has no discriminatory power, there is no information gain and the prior and posterior probability distributions will look the same. The difference between the prior and posterior can be measured using the Kullback-Leibler divergence (Kullback & Leibler 1951) as , essentially a integral difference across the parameter space. The KL divergence is measured in ban, a unit of information or entropy. A considerable information gain is e.g. KL > 0.13 bans which corresponds to halving the standard error of a Gaussian. We will restrict ourselves to computing the information gain only for the NH parameter.

For model comparison, we furthermore need to specify our a priori preference of models. We consider two approaches: a) pair-wise comparisons from low to high complexity (e.g. torus vs. torus+scattering, see Fig. 3), and between models of the same complexity. We adopt the scale of Jeffreys (1961): a Bayes factor above 100 is “decisive”, 30 “very strong evidence”, 1030 “strong evidence”, 310 “substantial evidence”. In log  Z, this corresponds to differences of 2, 1.5, 1 and 0.5 respectively. In case of a Bayes factor below 10 we remain with the simpler model. b) Comparing all models simultaneously to find the model with the highest evidence. In both cases we consider the models a priori equally probable.

5.5.1. Implementation

For the spectral analysis in the framework of Bayesian analysis, the MultiNest library was used and two software packages were created: PyMultiNest is a generic package for connecting Python probability functions with the MultiNest library, as well as model comparison and parameter estimation analysis of the MultiNest output. BXA is a package that connects the Sherpa X-ray analysis framework (Freeman et al. 2001) with our Bayesian methodology. (Py)MultiNest repeatedly suggests parameters on a unit hypercube which are transformed by BXA into model parameters using the prior definitions. BXA then computes a probability using Sherpas C-stat implementation, which is passed back to (Py)MultiNest. (Py)MultiNest can then be used to compute Bayes factors, create one or two-dimensional marginalised posterior probability distributions (PDFs) and output summarising Gaussian approximations.

BXA and PyMultiNest are publicly available1. In this work, MultiNest v2.17 is being used by BXA on Sherpa version 4.4v2 with 400 live points and a log-evidence accuracy of 0.1. For further analyses, we made extensive use of the NumPy/SciPy, Matplotlib and Cosmolopy packages (Jones et al. 2001; Hunter 20072).

6. Results

thumbnail Fig. 5

Observed (convolved) spectrum of object 179, binned for plotting to 10 counts per bin. Shown are analyses using various models and their individual components: powerlaw (upper left), wabs (upper right), torus+scattering (lower left) and wabs+pexmon+scattering (lower right). The posterior of the parameters are used to compute the median and 10%-quantiles of each model component.

Open with DEXTER

Table 1

Model selection results for object 179.

The Bayesian methodology of parameter estimation and model comparison presented in Sect. 5 is applied using the models introduced in Sect. 3 to all sample spectra (see Sect. 4). To demonstrate the methodology, we first discuss a single source. Then we apply model comparison across the full sample.

6.1. Source 179

Source 179 (spectroscopic z = 0.605, GOODS-MUSIC 15626) was detected with 2485 counts in the 0.5−10 keV band at RA/Dec = (3:32:13.23, –27:42:41.02). This source was chosen because it illustrates the model selection well, showing several features, namely the Fe-Kα line, scattering and absorption. In the next few paragraphs we present the source’s spectrum and how well different models reproduce them. Figure 5 overlays different models to the observed data. For brevity, only a subset of models are included in this presentation, namely powerlaw, wabs, torus+scattering and wabs+pexmon+scattering. We then show the results of model-selection for this object in Table 1, where the log Z-column (normalised to highest) shows the computed evidence. Finally, the derived posterior parameters are shown.

A power law model (powerlaw) does not provide a good fit. This can be seen in the deviations between model and data points in the upper left panel, and also in the fact that this model has the lowest evidence of all models Table 1 (Col. 5). Furthermore, the derived photon index, Γ = 0.8 ± 0.05 is unlikely and would constitute a 7σ outlier.

Obscuration is expected in some sources, and the wabs model indeed improves the fit. The model follows the spectrum much closer (upper right panel in Fig. 5) with a line-of-sight absorption of NH = 22.5 ± 0.1. The evidence for this model is significantly higher, and rules out the powerlaw model. Here we consider a difference of log  Z1 − log  Z2 > log  10 = 1 as a significant preference (see Sect. 5.5). The same is true for sphere and torus models not shown here.

Comparing the data with the model prediction in the wabs spectrum, a line is visible at ~4 keV as well as an excess of soft energy counts. The former coincides with the Fe-Kα line, while we expect to model the latter with the +scattering component. Considering torus+scattering and wabs+pexmon+scattering (lower panels in Fig. 5), they both model the observed counts well. However, the Fe-Kα line is clearly visible in the data. Compared to the simple absorption models, the addition of +scattering and +pexmon increases the evidence (see Table 1), hence these models are preferred, while the other models, e.g. the widely used wabs+scattering, are ruled out for this source.

When comparing models, an important aspect is the model-dependence of derived physical parameters. In Table 1, we show the intrinsic luminosity, photon index and line-of-sight column density. These are computed by the posterior values and summarised using 1σ-equivalent quantiles. The marginal posterior distributions are shown in Fig. 6. Simple absorption model like wabs try to compensate the soft X-rays by using a flatter spectrum with less absorption. Both the +scattering and +pexmon component increase the photon index. As these additional components only take up a fraction of the intrinsic power law component, the changes in intrinsic luminosity are small. Despite these changes, in between obscurer geometries (i.e. wabs, torus, sphere), the values are consistent.

Furthermore, it is important to check whether the results on the derived physical parameters are strongly influenced by the prior. For example, for a weak source with data of no discriminatory power, the posterior of NH would look like the prior (log-uniform). We use the KL-divergence (see Sect. 5.5), also known as the information gain or knowledge update, to measures the “distance” between prior and posterior of NH. For this particular source, the values are shown in Col. 4 of Table 1. For reference, a considerable information gain is e.g. KL > 0.13 bans which corresponds to halving the standard error of a Gaussian. Because NH is well-constrained in this source compared to the prior, the KL | NH values are high, and the posterior is not dominated by the prior.

The BIC and AIC values (Cols. 7 and 8 of Table 1, lower values are preferred) show the same preferences as the Bayesian evidence computations. They are an approximate method of model selection based on the likelihood ratio and parameter penalisation. There are however important differences to the Bayesian model selection (see Sect. 5.2 for details). The evidence and its approximation, the BIC, can be used to express how much more probable a model is compared to the others based on the data (Col. 6). The AIC, in contrast, measures the information loss by using a specific model. In this single source, the model selection prefers models with absorption, scattering and an additional reflection component (pexmon). However, no preference is found between the geometries of wabs, torus and sphere. To improve the discriminatory power, we combine the evidence of the full sample.

6.2. Model selection on the full sample

thumbnail Fig. 7

Model comparison for the three obscuration models. The arrow size and numbers indicate the number of sources, for which one model is strongly preferred over the other.

Open with DEXTER

We present the results for the full sample in two forms. Firstly, in each source, we perform model comparison between each pair of models (arrows in Fig. 3). We count for how many sources a preference was found. This is shown in Fig. 7, where the two numbers indicate the two directions. For instance, in powerlaw sources, the (simpler) powerlaw model is preferred over the torus, while in 170 sources, the torus is preferred. The size of the arrows visualises the same information. There is clear preference for absorption, scattering and reflection in ~170, ~20 and >5 cases respectively, showing that in a substantial number of sources these components are required. Between wabs+pexmon+scattering, torus+pexmon+scattering and sphere+pexmon+scattering, there is no clear trend, however the torus geometry is preferred most often. This indicates some variety in the obscurer geometry between sources.

Table 2

Sample model comparison.

Secondly, we show the model comparison across all sources in Table 2. For each model, the evidence is stacked in Col. 3 by summing the log  Z values. This form of model comparison assumes that one common model describes all sources. For comparison, Col. 2 shows the number of sources for which the model was ruled out by the other models. As shown with Source 179 above, individual sources can already have strong preferences in the model selection. To have a result descriptive of the sample that is not dominated by outliers, we apply bootstrapping. Sources are drawn with repetition, and the same quantities (number of rejections, summed log  Z values) are computed, and taking all draws together the mean and root mean square is estimated. Additionally, we compute for each draw whether this model is ruled out (∑ log  Z1 > ∑ log  Z2 + 1). If the model is ruled out in 100% of draws (Col. 6), the result is robust against bootstrapping and thus there is confidence that this result will also hold for the parent sample and is not dominated by outliers.

The power law model (powerlaw) has the lowest evidence and is ruled out by simple absorption models. These in turn are ruled out by absorption with additional scattering (in Fig. 7, strongly preferred in 1523 objects). Then, for 623 objects additional pexmon reflection is strongly preferred.

The remaining models are thus such that absorption, scattering and reflection are required. The number of objects for which a model is rejected remains comparable between wabs+pexmon+scattering, sphere+pexmon+scattering and torus+pexmon+scattering, with the latter having the highest evidence. The results for the full sample have significant large differences in ∑ log  Z. But when bootstrapping the values show large variation and overlap broadly, showing that the difference is not robust against bootstrapping, and sample-dependent. This indicates considerable variation in evidence, i.e. variation in the models preferred, between sources. We investigate this by splitting the samples by NH (lower segments of Table 2). For this, we assign a source to a NH bin if the 10% quantiles of the posterior values determined from torus+scattering fall inside. As indicated before, NH is comparable between absorption models. Figure 9 shows the differences in evidence between the two best models, wabs+pexmon+scattering and torus+pexmon+scattering in blue circles.

In the Compton-thick regime (NH ≳ 1024 cm-2), torus+pexmon+scattering is consistently strongly preferred. However, both the table and the figure show only mild indication that the torus+pexmon+scattering model is the best across the full sample. Thus some sources must favour wabs+pexmon+scattering, while others favour torus+pexmon+scattering (see Fig. 9).

A further concern might be that low-redshift sources with many counts dominate the result, ignoring the target population of our inference. The last two segments of Table 2 shows the result of selecting only sources with z> 1 and z > 2 respectively. The inference results in this regime are entirely consistent with the results for the full sample.

thumbnail Fig. 8

Luminosity-redshift plot of the sample. The median of the intrinsic luminosity (logarithmic, in erg / s) and redshift posterior probabilities have been used from the torus+pexmon+scattering model. Sources are classified as Compton-thick (NH> 1024 cm-2), obscured (1022 cm-2<NH< 1024 cm-2) or unobscured (NH< 1022 cm-2) when the majority of the probability posterior of NH lies in the respective range. Because of their heavy absorption, the detection of Compton-Thick AGN is biased towards higher luminosities, compared to Compton-thin AGN.

Open with DEXTER

Overall however, torus+pexmon+scattering can be considered the best model. We release a catalogue of the derived quantities for each source in the CDFS (Table 3 shows an excerpt, Table 4 lists all Compton-thick sources; the complete catalogue is available electronically). In particular, we list column densities, intrinsic power law index, intrinsic luminosity as well as the relative normalisations of the additional scattering and reflection components. Full probability distributions are available on request. The most important parameters for e.g. luminosity function studies are L2− 10keV, z and NH, which are visualised in Fig. 8.

Figure 11 shows a comparison to previously published works in the CDFS. Without going into detail here (see figure caption), the found Compton-thick AGN are in agreement with the sample found by Brightman & Ueda (2012), except that our selection criterion removes a number of sources whose soft photons are dominated by stellar processes. One source (ID 186 in their paper), is not found to be a Compton-thick AGN, as a different redshift from the improved catalogue (Hsu et al., in prep) was used.

7. Discussion

Before putting the results into context, we review our methodology.

7.1. X-ray spectral analysis methodology

We have presented a new framework and method for analysing X-ray spectra, relying on Bayesian inference using nested sampling. In particular, parameter estimation and model comparison are easily possible and overcome considerable limitations of current methods (see Sect. 5 for a detail discussion of various methods):

  • No binning of data. Low-count and high-count sources aretreated the same way using Poisson statistics, as with C-stat in thewell-established maximum likelihood estimation methods. Noinformation loss by binning needs to be introduced.

  • Background modelling. The background is modelled with a continuous non-physical model (Gaussian mixture). Unlike other options (background subtractions, bin-wise background estimation), this method remains consistent with the used Poisson statistics.

  • Bayesian parameter estimation. The presented Bayesian framework allows the estimation of parameters where full probability distributions for each parameter are a natural outcome. Constraints such as unphysical regions in parameter space, knowledge from local samples, and information from other studies can be incorporated. For instance, with the Γ-prior we include the assumption that high-redshift AGN behave like local AGN in some regards, and we propagate the uncertainty of redshifts estimates for each source.

  • Model comparison. The comparison of models used here overcomes the limitations of current methods. Likelihood-ratio based methods are approximate results in the limit, which can not compare non-nested models. Unlike approximations like information criteria, the approach is general so that it is unproblematic for model comparison at boundaries.

The implementation overcomes the weaknesses of standard MCMC, namely unknown convergence and multi-modal parameter spaces (see the discussion of methodology in Sect. 5, and also the Appendix B for a specific case). The computational cost is not higher than classical fitting with error estimation or MCMC.

Taking the small step from the MLE-based approach (“C-stat”) to a Bayesian methodology, one might be concerned that the priors influence the result too much. Similarly, one may ask why parametric models are used when no physical model is available? Non-parametric methods would remove the a-priori assumption of a specific model. Often however, physically motivated models are available, and the same is true for priors. Similarly to comparing multiple competing models, multiple priors can be tried to test the robustness of the results. To address the concern that prior distributions may dominate the posterior distribution, the Kullback-Leibler divergence (“information gain”) is a useful characterisation of how strongly the posterior was influenced by the prior (see Sect. 5).

The presented method performs better in model selection than likelihood-ratios, which are also problematic for the models considered here (see Appendix C, and the extensive discussion of methods in Sect. 5). For goodness-of-fit, we use the Kolmogorov-Smirnov measure to detect potential problems in background model fits, and visually inspect the deviations using Q–Q plots (see Appendix A).

7.2. Implications for the geometry of the obscurer of AGN

AGN are confirmed to have line-of-sight obscuration using our methodology by ruling out a simple power law in favour of any of the obscuration models (see Table 1 for decisive evidence on a single source, and Table 2 for the full sample). The improvement of the fit by adding photo-electric absorption (wabs) is shown in Fig. 5 for a single source (ID 179), where the simple power law clearly does not fit the data. If a significant portion of AGN consisted of powerlaw spectra with a low photon index, this simpler model would have been preferred in the bootstrapped results of Table 2. This hypothesis by Hopkins et al. (2009) can clearly be rejected.

Table 3

Catalogue (excerpt).

Table 4

Catalogue (Compton Thicks only).

Additionally, a fraction of energy which has not seen any obscuration is apparent in the soft energies (shown in Fig. 5, upper right panel). This component can be attributed to Thomson-scattering by ionised material within the opening angle of the torus, scattering the intrinsic spectrum into the line-of-sight. Models without this component are ruled out, as can be seen from the large differences in evidence values in Table 1 for Source 179, and in Table 2 for the full sample. This soft component may be confused with other processes such as thermal disk emission or stellar processes. To remedy this, we removed host-dominated sources, and further only considered the subsample of z > 1, where only the >1 keV photons enter the observed band. As the lower segments of Table 2 show, this component is still strongly preferred. The detection of the soft scattering component is in agreement with Brightman & Ueda (2012), who used an ad-hoc method for model selection.

We considered three different absorption models, which differ mainly in the amount of Compton scattering produced outside the line-of-sight due to volume filling. While wabs represents a bullet-like blob in the line-of-sight with no Compton scattering, sphere and torus model a fully and partially open toroidal absorber respectively. The latter two models, computed using Monte Carlo simulations on a constant-density geometry, differ from wabs as they consider Compton scattering and K-shell fluorescence. For the full sample, wabs+scattering is ruled out by torus+scattering (and also sphere+scattering), indicating that these differences are important, i.e. that forward-scattered, low-energy radiation and the additional reflection are observed. This is a relevant finding because it means that high-redshift data is significantly better modelled by a more complex model than commonly used.

Next to absorption and the scattering component, we find that additional Compton reflection is needed. In Fig. 5, where the spectrum of torus+scattering is shown in the lower left panel, this component is clearly visible in the data through its most prominent feature, the Fe−Kα line. As torus+scattering already models the Compton scattering and line emission within the well-constrained line-of-sight obscurer, this component must be of different origin. Radiation may be scattered into the line of sight from denser regions of the torus, if a density gradient is assumed. It is worth re-stating that we photo-electrically absorbed the +pexmon component, requiring the reflection to occur behind the LOS column density. Alternatively, the accretion disk may contribute a reflection spectrum as is known from unobscured objects, which is transmitted through the obscurer. In principle, a higher iron abundance is another hypothesis to increase the yield of the line. These options are discussed further in Buchner et al. (in prep).

Overall, comparing the absorption models for obscured sources, torus+pexmon+scattering is the best model for obscured AGN, especially when considering Compton-thick AGN. However, the combination of photo-electric absorption with a Compton scattering and reflection component (wabs+pexmon+scattering) can emulate the observed spectra almost equally well, especially in the Compton-thin regime where the diversity does not seem to suggest one common geometry (see Fig. 9). In this phenomenological model, the three interaction processes – photo-electric absorption, Compton scattering and Thomson scattering – are independent and not physically connected. Figures 2f,g illustrates possible geometries. But the wabs+pexmon+scattering model is ruled out for Compton-thick AGN (see lower segment of Table 2), in favour of torus+scattering.

The considered obscurer geometries differ in their covering factor, and thus in the strength of the Compton reflection hump. The sphere geometry has the largest reflective area, while wabs does not have any Compton scattering; torus constitutes an intermediate case. The sphere+pexmon+scattering model is ruled out in favour of torus+pexmon+scattering. As the geometry is the only difference between the models, we conclude that the opening angle of AGN must not be vanishing. The spherical geometry can thus be excluded not only by the amount of scattering needed, but independent of that due to the shape of the reflection hump (obscured sources in Table 2 rule out the sphere+scattering and wabs+scattering model). This result also holds when only the Compton-thick AGN are considered.

Unobscured objects, in contrast, are often well-described by a simple power law. However, they may show Fe lines originating from reflection off Compton-thick material outside the line of sight, either from the accretion disk or the torus. For this reason, e.g. wabs+pexmon+scattering provides a good fit here. The torus simulation used may be a good fit as well if torus had not been constrained to an edge-on view. In the face-on view, the Compton scattering off the torus is part of the model spectrum (see Brightman & Nandra 2011).

A number of more complicated variations of the best model, torus+pexmon+scattering, have been tried, namely (1) linking the opening angle to log  NH by decreasing it linearly from 60° to 40° from unobscured to Compton-thick sources; (2) making the opening angle a free parameter for each source; and (3) freeing both the opening and viewing angle. These models yield comparable evidence to torus+pexmon+scattering and thus are not justified by the data.

thumbnail Fig. 12

Cartoon illustrations of a-posteriori possible geometries (see text).

Open with DEXTER

8. Conclusions

We develop a Bayesian framework for analysing X-ray spectra. We apply this methodology to ~350 faint, low-count spectra of AGN in the CDFS to infer model parameters. The approach propagates all uncertainties e.g. the Poisson process of collecting counts, or errors in photometric redshifts determination. The novelty of this work however is to apply Bayesian model comparison.

We consider physically motivated models where various geometries – no obscurer, bullet-like blob in the LOS, toroidal and spherical obscurer – are considered. The best model has (1) an intrinsic power law obscured by (2) a constant-density toroid where photo-electric absorption, Compton scattering and Fe-K fluorescence are considered. We detect the presence of (3) an unabsorbed power law associated with Thomson scattering off ionised clouds. Additional (4) Compton reflection, most noticeable through a stronger Fe-Kα line, is also found. We find strong evidence against a completely closed, or entirely open, toroidal obscurer geometry.

The geometry of the obscurer in the deepest field to date is thus, from the point of view of X-ray spectra, compatible with two simple scenarios illustrated in Fig. 12: (a) Per-source density variations of a constant-density torus, with an accretion disk contributing extra reflection in some sources or (b); following the unification scheme, a torus with a column density gradient where the LOS obscuration depends on the viewing angle and the observed additional reflection originates in denser regions of the torus. In both scenarios, ionised clouds can scatter intrinsic radiation past the torus.

Appendix A: Goodness-of-fit of the background model

thumbnail Fig. A.1

Comparison of the background data from Source 318 with background models. The best model is shown in blue, while the red model has several features removed (see text in Appendix A). In the top two panels, the usual count spectrum is shown with residuals (left unbinned, right binned to at least 20 counts per bin). In the large, bottom panel we present the corresponding Q–Q (quantile-quantile) plot. For each energy E, the model counts predicted and the counts observed below E are recorded on the plot. The grey dashed line is where data and model would perfectly agree. Model 1 (blue solid top line) follows this line very closely, and thus can be considered a good model. Model 2 (red solid bottom line) deviates from the grey dashed line at 2 keV, indicating that a feature in the data may be missing in the model. The shape and size of the deviation also indicates the shape of the needed feature. The significance of the feature can be tested using model selection. Here, the AIC shows that the feature at 2−2.5 keV is necessary ΔAIC > 0, but adding the more complicated feature at 8−9 keV is not (see text in Appendix A for details).

Open with DEXTER

We intend to demonstrate that the used background description is a good model. To this end, we present a goodness of fit (GoF) methodology for X-ray spectra. We use Q–Q plots for model discovery and the AIC model comparison method to test for the significance of model improvements, although any of the model comparison methods introduced in Sect. 5.2 could be chosen. We demonstrate the method using a background source spectrum and our best fit, comparing it to a simplified model. In our analysis, every spectrum is fitted individually to accommodate the diversity of background spectra.

Appendix A.1: Background model definition

We present “Model 1”, which is our final Gaussian mixture model with a constant base continuum: To model the rise in the soft energies two Gaussians (called “softend” and “softsoftend”) are used of widths ~0.5 keV and ~2 keV at centred at ~0 keV. Another five Gaussian lines describe the spectral bumps due to e.g. transition lines of the detector material. These are centred around 1.486 (Al Kα), 1.739 (Si Kα), 2.142 (Au Mα,β), 7.478 (Ni Kα), 9.713 (Au Lα; all in keV). A feature at ~8.3 keV possibly composed of Ni/Au lines is modelled by three further Gaussians at 8.012, 8.265, 8.494 keV. We allow all centres to vary within 0.1 keV. The parameters start from reasonable guesses and are optimised as long as the fit statistic (C-Stat) improves.

Figure A.1 shows the comparison between data and model for a source with the sequential number 318 in the catalogue. This source has 3380 counts and was chosen because it constitutes an intermediate case between the most high-count spectra with many peculiarities and low-count spectra with almost no visible features. The final parameter values after fitting are shown in Table A.1 (middle column).

Table A.1

Background model parameters for two sources.

Appendix A.2: Goodness of fit and model discovery

For comparison, we present Model 2 which has several Gaussians disabled, namely the one centred at 2.1 keV (line 3) and the three between 8−9 keV (line 57). The intent is to compare methods evaluating whether Model 2 is a good model, where it deviates from the data, and whether the deviation is significant.

The classic method is to plot the data, model and the residuals. This is shown in the upper left panel of Fig. A.1 for the un-binned data and model, with the residuals below. The upper right panel shows the same, but with adaptive binning requiring 20 counts in each bin. The feature at 2.1 keV is visible immediately, while the feature at 8−9 keV is less striking.

We present an alternative method of analysing the quality of a model: the Q–Q plot, shown in the large lower panel. For each energy E, the model counts predicted and the counts observed below E are recorded on the plot. Here the counts are shown, while statisticians typically use quantiles, i.e. the fraction of observations that lie below a quantity. This does not influence the main point, namely the shape of the curve. The grey dashed line is where data and model would perfectly agree. A steeper curve means the model predicts more counts than observed, while a shallower curve indicates an excess of observed counts.

Model 1 (blue solid top line) follows this line very closely, and thus can be considered a good model. Model 2 (red solid bottom line) deviates from the grey dashed line at 2 keV, indicating that a feature in the data is not modelled. Above 2.5 keV, the line is parallel to Model 1, indicating no further difference. This means the feature is confined to this energy range. With a bit of practise one can also see that the difference required to bring the lines into agreement looks like the cumulative distribution of a Gaussian (a S-shape rather than e.g. a straight line for a flat distribution). Another, but more subtle, deviation is visible between 8−9 keV. This is highlighted using the green solid middle line which does model the 2.1 keV feature.

Having found a good model, and slightly worse, simpler models, we can now test whether the improvement is significant. For instance, it seems doubtful that the minute feature at 8−9 keV justifies modelling with 3 Gaussian components (9 parameters). For this, we employ the AIC, which punishes the difference in the likelihood (C-stat) by adding twice the number of parameters. As ΔAIC = 20.6 > 0 (Model 2 vs. Model 1, red text), the worsening is significant. But if we only remove the feature at 8−9 keV, the AIC decreases due to the simplification of the model (green text). Thus, in this source, the 8−9 keV feature can be ignored. However, in sources with more counts, it is required (see Fig. A.2 for one example).

The deviation between model and data can be summarised using GoF measures. The Kolmogorov-Smirnov measure, KS = supE | observed (<E) − predicted( <E) |, records the largest deviation between the empirical cumulative distribution of observed counts (0 at the lowest energy, 1 at the highest), and the model cumulative distribution. Other measures, such as the ones

used for the Cramér-von Mises test or the Anderson-Darling test take all deviations into account, not merely the largest. When many spectra are analysed in an automated fashion, the highest values can hint to problematic cases which require further, visual analysis.

thumbnail Fig. A.2

Same as Fig. A.1, but for source 179 (11 802 counts). Contrary to Fig. A.1, the feature between 8−9 keV is required (ΔAIC > 0). In the lower panel, there is a mild, continuous deviation from the grey dashed line indicating that a mild increase in the higher energy counts. This hints that the model could be improved further. Although perfection in background modelling is a noble quest, the source spectra have 1 order of magnitude fewer counts, allowing us to be satisfied with this model.

Open with DEXTER

Appendix B: Propagation of redshift uncertainty

It is common practise to use a redshift point estimator (best fit redshift) from photometric redshifts and analyse spectra with this value. In this work, in contrast, the redshift uncertainty from photometric redshift is propagated into the analysis of X-ray spectra in the form of a probability distribution on the redshift parameter. In this section, we discuss the differences between the approaches.

We consider the source 551 in our catalogue and analyse its X-ray spectrum using the methodology laid out in this paper with the torus model. We perform the analysis twice, a) with the probability distribution from photometric redshifts and b) with the best fit photo-metric redshift. Figure B.2 demonstrates how the different input redshifts (upper right panel) influence the results in the derived column density and intrinsic luminosity parameters (large panel). The Bayesian analysis shows that the parameter space is broad, and split to two distinct solutions (see lower left panel in Fig. B.2): a highly obscured solution and a less obscured solution. The maximum likelihood is in the less obscured solution for the fixed redshift value, but in the highly-obscured solution if the redshift distribution is used, because the likelihood improves when using a slightly lower value than the best fit redshift. The two solutions are strictly separated, i.e. an intermediate solution is ruled out. Common Maximum Likelihood methods, like fitting and error estimation by Fisher matrix or contour search, will fail to estimate the uncertainty correctly and hide the respectively other solution. Methods building on these results can therefore make false conclusions about e.g. the number of Compton-thick sources. The Bayesian inference method presented in this work can handle the separated solutions well and re-weighs them based on the redshift information given. The various methods are discussed and compared in Sect. 5.

thumbnail Fig. B.1

Origin of the two distinct solutions in Fig. B.2 is highlighted in these two cartoons. Given the data shown in the red thick line, one can either A) consider a bright, highly obscured source (top panel), or B) a low-luminosity, low-obscuration solution where the hard counts are due to the background. An intermediate solution is ruled out however. Depending on the background level and redshift, the two solutions will have different likelihoods.

Open with DEXTER

thumbnail Fig. B.2

Demonstration of parameter estimation results under different redshift inputs. Source 551 in the catalogue is analysed twice with the methodology laid out in this paper using the torus model. We consider once the case of using only the best fit photometric redshift and once the case of using the photometric redshift probability distribution (PDZ). Both inputs are shown in the upper right panel as a vertical red line at z = 3.5444 and with a dotted green line respectively. For brevity, only two resulting parameters are presented, namely the column density NH (logarithmic, in cm-2) and the derived intrinsic luminosity L in the 0.5−10 keV band (logarithmic, in erg / s). The large, lower left panel shows the derived intrinsic luminosity and column density parameters by equally probable points, similar to a Markov chain. As each point on the plane is equally likely to be the true value, denser regions represent more probable parameter values. Here, the red squares represent the fixed redshift analysis while the black circles show the analysis results using the PDZ. The marginal distributions are shown in the upper left and lower right panels as probability histograms (red crossed hatching and black striped hatching). Two separated solutions are clearly visible and highlighted using the labels “A” for the highly obscured solution and “B” for the less obscured solution. The associated spectra are illustrated in Fig. B.1. The fixed redshift analysis has the highest likelihood at the less obscured solution. When given the freedom to vary the redshift parameter, the X-ray data have the highest likelihood at the highly obscured solution. Maximum Likelihood analysis methods thus may fail to account for the uncertainty (see text). The difference in results come from the freedom to move to a lower redshift, as can be seen in the upper right panel, where the black line shows the redshift posterior probability distribution. These results thus also show that redshift information can be improved using X-ray data (see Buchner et al., in prep.).

Open with DEXTER

Appendix C: False discovery rate of model comparison methods

In this section, we compare the efficiency and error rate of model comparison methods.

On the one hand we consider methods based on the ratio of the highest likelihood, where, if the likelihood improves beyond a certain log-likelihood ratio threshold, the more complex model is accepted. The threshold may be dependent on the difference in the number of model parameters. In this general form, Wilks’ theorem, χ2-test, F-test, AIC and BIC methods are covered.

On the other hand, the Bayesian inference based on computing the evidence Z is considered, where, assuming a-priori equality, the Bayes factor B12 = Z1/Z2 is used to decide which model to use. In particular, when B12 is greater than the threshold, Model 1 is accepted, while if is greater than the threshold, Model 2 is accepted. An important difference is that Bayesian inference can conclude there is not enough information to make a decision.

thumbnail Fig. C.1

Model selection results between powerlaw and wabs. Likelihood-ratio based methods (left panels) and Bayesian model selection (right panels) show different fractions of false choices decisions (thick red line). The top left plot for example shows the fraction of generated spectra where the likelihood ratio method selected powerlaw instead of the model used for generating the spectrum, wabs. The plot below should be considered simultaneously, as it shows the fraction where powerlaw was selected in spectra generated with wabs. Results are shown in dependence of the threshold applied. The threshold may be optimised, but typical choices are ΔL = 1 from AIC and log  10 = 1 for the Bayes factor if the model priors are equal (gray vertical lines). Considering the sum of thick red lines in panel pairs, the Bayesian model selection has a false selection rate below 1% at the marked threshold, which can not be achieved using likelihood-ratios regardless of the threshold chosen. The Bayesian model selection may decline to decide due to insufficient discriminatory power of the data (thin black line). This is the case for the faint normalisations (bottom four panels), where the likelihood-ratio methods remain with the simpler model (yielding an error of ~50%). In the brighter normalisations (upper right panels), the Bayesian method beyond a threshold of 5 declines to distinguish when the powerlaw model was used to generate the data. This is a consequence of the degeneracy between the nested powerlaw and wabs models. Thus for nested models, a lower threshold, such as Δlog  Z = log  3 = 0.5 can be appropriate. When considering the efficiency only, the likelihood ratio based method yields more correct decisions, as the Bayesian method declines to choose very often. However, one may also remain with the simpler model, in which case the results would be comparable.

Open with DEXTER

Appendix C.1: Nested problems

thumbnail Fig. C.2

Same as Fig. C.1 for the non-nested model selection between wabs and torus (instead of powerlaw and wabs). As the differences between the models are subtle, the likelihood-ratio method often remains with the null model (wabs). This yields an overall error rate (red line) of ~3050%. The Bayesian model selection declares the data insufficient for a distinction in such cases. In the upper right panel, the Bayesian method is able to effectively distinguish between some of the simulated instances, and makes few mistakes (<1% false selection rate, considering the total of panel pairs).

Open with DEXTER

We generate 2000 spectra each based on the powerlaw and wabs input model. We assume a photon index of Γ = 1.9, redshift z = 1.5 and column density NH = 1023/ cm2 (for wabs). The normalisation of the source power-law at 1 keV in photons / keV / cm2/ s is set to either 10-6 or 10-7 (1000 simulated spectra each). A fixed, flat background with 10-8 photons / keV / cm2/ s is used. We re-use the exposure time, ARF and RMF of Source 179 to produce realistic, low-count spectra.

We analyse all simulated X-ray spectra in the 0.5−7 keV band using the methodology laid out in this paper, once with the powerlaw and once with the wabs model. We also store the maximum likelihood (best fit). We apply model selection (Bayesian and likelihood-ratio-based) between powerlaw and the more complicated model wabs, and record the number of false choices (e.g. wabs was preferred, although powerlaw was used for generating the spectrum). These results shown in Fig. C.1 using thick red lines. For the Bayesian model selection, we also record the number of cases where no significant preference was found (black lines).

The Bayesian model selection has a false selection rate below 1% at the marked threshold, which can not be achieved by likelihood-ratio based methods regardless of the threshold chosen. The Bayesian model selection may decline to decide due to insufficient discriminatory power of the data (thin black line). When considering the efficiency only, the likelihood ratio based method yields more correct decisions, as the Bayesian method declines to choose very often. However, one may also remain with the simpler model, in which case the results would be comparable.

Appendix C.2: Non-nested problems

We now consider a non-nested model selection problem, applying the same methodology. We select between the models wabs and torus – neither is a special case of the other. We apply likelihood-ratio based methods using wabs as the null hypothesis, and only select torus when the likelihood-ratio threshold is exceeded. These results are presented in a similar fashion as above in Fig. C.2. The panels show higher normalisations, as the models are more subtle in their differences.

Appendix C.3: Conclusion

To summarise the Figs. C.1 and C.2, error rates below 1% can be achieved using the in the Bayesian model selection, with a threshold of e.g. log  Z > 1 = log  10. It should be clear that a Bayes factor of 10 is actually a quite conservative choice, and does not necessarily refer to a false positive rate of 1 in 10, but a much lower value. This point has been appreciated in the literature before: Efron et al. 2001 show that the Bayes factor scale by Jeffreys (1961) is probably overly conservative.

In contrast, methods based on the likelihood-ratio always have higher error rates than 1% regardless of the chosen threshold. This is due to the procedure remaining with the simpler model if the data has insufficient discriminatory power. The Bayesian method has the benefit of recognising these cases and can decline to decide. It should be stressed again that several likelihood ratio based methods are not valid for non-nested problems (Wilks’ theorem, F-test, χ2-test, likelihood-ratio test), although e.g. the AIC is. Furthermore, all likelihood ratio based methods are not valid at testing against the border of the parameter space (e.g. powerlaw is a special case of wabs with minimal NH). This is discussed further in Sect. 5.2. These results demonstrate that even if a valid method based on the likelihood ratio was introduced, it would perform poorly in the shown cases.


Acknowledgments

JB thanks Farhan Feroz for proofreading the methodology section. Remaining errors are JB’s. JB acknowledges financial support through a Max-Planck-Gesellschaft stipend. We also thank the builders and operators of Chandra. This research has made use of software provided by the Chandra X-ray Center (CXC) in the application package CIAO. AGE acknowledges the thales project 383549, which is jointly funded by the European Union and the Greek Government in the framework of the programme “Education and Lifelong Learning”.

References

Online material

thumbnail Fig. 6

Marginalised parameters of the wabs (top), wabs+scattering, sphere+pexmon+scattering and wabs+pexmon+scattering model (bottom) for source 179. The posterior probability density distribution, normalised to the maximum, is shown by grey bars. The blue line indicates the cumulative posterior distribution. For summary of the error, the median and 10/90% quantiles can be used, or as the blue error bar indicates, the 1 standard-deviation equivalent probabilities.

Open with DEXTER

thumbnail Fig. 9

Evidence contribution from each source with secure spectroscopic redshift. The vertical axis shows the Bayes factor between torus+pexmon+scattering and wabs+pexmon+scattering (red circles), where strong preference for the torus is above log  10 = 1. The same is shown for sphere+pexmon+scattering and wabs+pexmon+scattering (black squares). In both model comparisons, there are obscured objects showing significant preference for either model.

Open with DEXTER

thumbnail Fig. 10

Histograms of the best parameter values derived using the torus+pexmon+scattering model. The median of the marginal posterior distribution for each object is histogrammed in black. The thick red line shows the same as a cumulative distribution. To illustrate the uncertainty in the parameters, the dotted red lines show the cumulative distribution of the 10% and 90% quantiles instead of the median. The dashed gray line shows the used prior.

Open with DEXTER

thumbnail Fig. 11

Comparison of the derived column density (left panels, NH, here in logarithmic) and intrinsic luminosity (right panel, logarithmic, in erg / s for the 2−10 keV rest frame band) with the analysis of Tozzi et al. (2006). We selected only objects from our sample which have the same redshift in Tozzi et al. (2006) and this work. We plot the median and 1-sigma equivalent quantiles of the posterior in our analysis against the best fit found in Tozzi et al. (2006). There are important differences between the works. The Tozzi et al. (2006) analysis is based on only the first 1Ms data, and thus has much fewer counts. Furthermore, only simple absorption models have been considered in their maximum likelihood fitting.

Open with DEXTER

All Tables

Table 1

Model selection results for object 179.

Table 2

Sample model comparison.

Table 3

Catalogue (excerpt).

Table 4

Catalogue (Compton Thicks only).

Table A.1

Background model parameters for two sources.

All Figures

thumbnail Fig. 1

Illustration of the typical shapes of the discussed spectral features. Emission lines and absorption edges have been omitted for simplicity.

Open with DEXTER
In the text
thumbnail Fig. 2

Cartoon illustrations of the geometries associated with each model. The wabs model b) represents an absorbing slab in the line of sight, and can also be interpreted as the case of a torus with extreme opening angle. While torus d) uses a intermediate opening angle, the sphere c) represents the other extreme, a vanishing opening angle. The +scattering extension e) of the named models correspond to Thomson scattering from outside the line of sight, which does not experience any energy-dependent effects. Finally, the reflection component (+pexmon) corresponds to either disk reflection f) or additional reflection if the torus is not viewed through the same column density as the reflection g), h), i). For the sphere it should be noted that scattering is not physically possible, as no unabsorbed radiation can escape.

Open with DEXTER
In the text
thumbnail Fig. 3

Models considered. For model comparison, we compute the evidence (see Sect. 5.2) for each source and model. We then start by assuming a power law (powerlaw) and move along the arrows to more complex models if model comparison justifies the preference. The three obscurer models, wabs, sphere and torus, are compared against each other, as well as the introduction of additional features (+scattering, +pexmon). See text for details.

Open with DEXTER
In the text
thumbnail Fig. 4

The redshift and number of counts in the 0.5−8 keV band of the AGN sample. For the redshift, the spectral redshift is shown where available and otherwise the median on the photometric redshift distribution is used.

Open with DEXTER
In the text
thumbnail Fig. 5

Observed (convolved) spectrum of object 179, binned for plotting to 10 counts per bin. Shown are analyses using various models and their individual components: powerlaw (upper left), wabs (upper right), torus+scattering (lower left) and wabs+pexmon+scattering (lower right). The posterior of the parameters are used to compute the median and 10%-quantiles of each model component.

Open with DEXTER
In the text
thumbnail Fig. 7

Model comparison for the three obscuration models. The arrow size and numbers indicate the number of sources, for which one model is strongly preferred over the other.

Open with DEXTER
In the text
thumbnail Fig. 8

Luminosity-redshift plot of the sample. The median of the intrinsic luminosity (logarithmic, in erg / s) and redshift posterior probabilities have been used from the torus+pexmon+scattering model. Sources are classified as Compton-thick (NH> 1024 cm-2), obscured (1022 cm-2<NH< 1024 cm-2) or unobscured (NH< 1022 cm-2) when the majority of the probability posterior of NH lies in the respective range. Because of their heavy absorption, the detection of Compton-Thick AGN is biased towards higher luminosities, compared to Compton-thin AGN.

Open with DEXTER
In the text
thumbnail Fig. 12

Cartoon illustrations of a-posteriori possible geometries (see text).

Open with DEXTER
In the text
thumbnail Fig. 6

Marginalised parameters of the wabs (top), wabs+scattering, sphere+pexmon+scattering and wabs+pexmon+scattering model (bottom) for source 179. The posterior probability density distribution, normalised to the maximum, is shown by grey bars. The blue line indicates the cumulative posterior distribution. For summary of the error, the median and 10/90% quantiles can be used, or as the blue error bar indicates, the 1 standard-deviation equivalent probabilities.

Open with DEXTER
In the text
thumbnail Fig. 9

Evidence contribution from each source with secure spectroscopic redshift. The vertical axis shows the Bayes factor between torus+pexmon+scattering and wabs+pexmon+scattering (red circles), where strong preference for the torus is above log  10 = 1. The same is shown for sphere+pexmon+scattering and wabs+pexmon+scattering (black squares). In both model comparisons, there are obscured objects showing significant preference for either model.

Open with DEXTER
In the text
thumbnail Fig. 10

Histograms of the best parameter values derived using the torus+pexmon+scattering model. The median of the marginal posterior distribution for each object is histogrammed in black. The thick red line shows the same as a cumulative distribution. To illustrate the uncertainty in the parameters, the dotted red lines show the cumulative distribution of the 10% and 90% quantiles instead of the median. The dashed gray line shows the used prior.

Open with DEXTER
In the text
thumbnail Fig. 11

Comparison of the derived column density (left panels, NH, here in logarithmic) and intrinsic luminosity (right panel, logarithmic, in erg / s for the 2−10 keV rest frame band) with the analysis of Tozzi et al. (2006). We selected only objects from our sample which have the same redshift in Tozzi et al. (2006) and this work. We plot the median and 1-sigma equivalent quantiles of the posterior in our analysis against the best fit found in Tozzi et al. (2006). There are important differences between the works. The Tozzi et al. (2006) analysis is based on only the first 1Ms data, and thus has much fewer counts. Furthermore, only simple absorption models have been considered in their maximum likelihood fitting.

Open with DEXTER
In the text
thumbnail Fig. A.1

Comparison of the background data from Source 318 with background models. The best model is shown in blue, while the red model has several features removed (see text in Appendix A). In the top two panels, the usual count spectrum is shown with residuals (left unbinned, right binned to at least 20 counts per bin). In the large, bottom panel we present the corresponding Q–Q (quantile-quantile) plot. For each energy E, the model counts predicted and the counts observed below E are recorded on the plot. The grey dashed line is where data and model would perfectly agree. Model 1 (blue solid top line) follows this line very closely, and thus can be considered a good model. Model 2 (red solid bottom line) deviates from the grey dashed line at 2 keV, indicating that a feature in the data may be missing in the model. The shape and size of the deviation also indicates the shape of the needed feature. The significance of the feature can be tested using model selection. Here, the AIC shows that the feature at 2−2.5 keV is necessary ΔAIC > 0, but adding the more complicated feature at 8−9 keV is not (see text in Appendix A for details).

Open with DEXTER
In the text
thumbnail Fig. A.2

Same as Fig. A.1, but for source 179 (11 802 counts). Contrary to Fig. A.1, the feature between 8−9 keV is required (ΔAIC > 0). In the lower panel, there is a mild, continuous deviation from the grey dashed line indicating that a mild increase in the higher energy counts. This hints that the model could be improved further. Although perfection in background modelling is a noble quest, the source spectra have 1 order of magnitude fewer counts, allowing us to be satisfied with this model.

Open with DEXTER
In the text
thumbnail Fig. B.1

Origin of the two distinct solutions in Fig. B.2 is highlighted in these two cartoons. Given the data shown in the red thick line, one can either A) consider a bright, highly obscured source (top panel), or B) a low-luminosity, low-obscuration solution where the hard counts are due to the background. An intermediate solution is ruled out however. Depending on the background level and redshift, the two solutions will have different likelihoods.

Open with DEXTER
In the text
thumbnail Fig. B.2

Demonstration of parameter estimation results under different redshift inputs. Source 551 in the catalogue is analysed twice with the methodology laid out in this paper using the torus model. We consider once the case of using only the best fit photometric redshift and once the case of using the photometric redshift probability distribution (PDZ). Both inputs are shown in the upper right panel as a vertical red line at z = 3.5444 and with a dotted green line respectively. For brevity, only two resulting parameters are presented, namely the column density NH (logarithmic, in cm-2) and the derived intrinsic luminosity L in the 0.5−10 keV band (logarithmic, in erg / s). The large, lower left panel shows the derived intrinsic luminosity and column density parameters by equally probable points, similar to a Markov chain. As each point on the plane is equally likely to be the true value, denser regions represent more probable parameter values. Here, the red squares represent the fixed redshift analysis while the black circles show the analysis results using the PDZ. The marginal distributions are shown in the upper left and lower right panels as probability histograms (red crossed hatching and black striped hatching). Two separated solutions are clearly visible and highlighted using the labels “A” for the highly obscured solution and “B” for the less obscured solution. The associated spectra are illustrated in Fig. B.1. The fixed redshift analysis has the highest likelihood at the less obscured solution. When given the freedom to vary the redshift parameter, the X-ray data have the highest likelihood at the highly obscured solution. Maximum Likelihood analysis methods thus may fail to account for the uncertainty (see text). The difference in results come from the freedom to move to a lower redshift, as can be seen in the upper right panel, where the black line shows the redshift posterior probability distribution. These results thus also show that redshift information can be improved using X-ray data (see Buchner et al., in prep.).

Open with DEXTER
In the text
thumbnail Fig. C.1

Model selection results between powerlaw and wabs. Likelihood-ratio based methods (left panels) and Bayesian model selection (right panels) show different fractions of false choices decisions (thick red line). The top left plot for example shows the fraction of generated spectra where the likelihood ratio method selected powerlaw instead of the model used for generating the spectrum, wabs. The plot below should be considered simultaneously, as it shows the fraction where powerlaw was selected in spectra generated with wabs. Results are shown in dependence of the threshold applied. The threshold may be optimised, but typical choices are ΔL = 1 from AIC and log  10 = 1 for the Bayes factor if the model priors are equal (gray vertical lines). Considering the sum of thick red lines in panel pairs, the Bayesian model selection has a false selection rate below 1% at the marked threshold, which can not be achieved using likelihood-ratios regardless of the threshold chosen. The Bayesian model selection may decline to decide due to insufficient discriminatory power of the data (thin black line). This is the case for the faint normalisations (bottom four panels), where the likelihood-ratio methods remain with the simpler model (yielding an error of ~50%). In the brighter normalisations (upper right panels), the Bayesian method beyond a threshold of 5 declines to distinguish when the powerlaw model was used to generate the data. This is a consequence of the degeneracy between the nested powerlaw and wabs models. Thus for nested models, a lower threshold, such as Δlog  Z = log  3 = 0.5 can be appropriate. When considering the efficiency only, the likelihood ratio based method yields more correct decisions, as the Bayesian method declines to choose very often. However, one may also remain with the simpler model, in which case the results would be comparable.

Open with DEXTER
In the text
thumbnail Fig. C.2

Same as Fig. C.1 for the non-nested model selection between wabs and torus (instead of powerlaw and wabs). As the differences between the models are subtle, the likelihood-ratio method often remains with the null model (wabs). This yields an overall error rate (red line) of ~3050%. The Bayesian model selection declares the data insufficient for a distinction in such cases. In the upper right panel, the Bayesian method is able to effectively distinguish between some of the simulated instances, and makes few mistakes (<1% false selection rate, considering the total of panel pairs).

Open with DEXTER
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.