A&A 370, 649-671 (2001)
DOI: 10.1051/0004-6361:20010092

Maximum likelihood estimation of single X-ray point-source parameters in ROSAT data

F. G. Boese - S. Doebereiner

Max-Planck-Institut für extraterrestrische Physik, 85741 Garching, Germany

Received 31 October 2000 / Accepted 9 January 2001

Abstract
The point source parameter estimation via Maximum Likelihood Estimation in ROSAT data is documented. The infall of X-ray photons is treated as a Poisson process. Based on the Poisson model, the log-likelihood function for unbinned data is derived. The adopted source and background models are explained. In the chosen parameterization, the Maximum Likelihood equations and their numerical solution are expounded. The performance of the method is illustrated by an example and tests on simulated observations.

Key words: X-ray: general, space vehicles: instrumentation: detectors, techniques: image processing


  
1 Introduction

In a class of contemporary X-ray observatories, the observing instruments consist of one or more sets of nested telescopes and one or more detectors in the focal plane. The detectors are often designed to register single X-ray photons. Several geometrical and physical quantities of individual X-ray photons are recorded, see Trümper (1983). In the case of the ROSAT observatory, these are (a) the time of photon arrival, (b) the impact location in the detector, (c) the amplitude of the detector signal, and (d) the infalling direction[*]. Depending on the ROSAT observation mode, the instrument's optical axis is, relative to the observed target, nominally fixed or moving in time. During an observation time interval, a number of photons is collected. After data preparation, the observed sample forms the data set to be analyzed.

The majority of all observed celestial sources are point sources. The capacity of analyzing point sources with respect to (a) X-ray flux, (b) celestial direction and (c) extent (if larger than but akin to point sources) thus makes it possible to treat a large percentage of all observed sources routinely.

In this work, the technical details of the statistical inference from data to point-like and circular disk shaped X-ray sources are investigated. To X-ray flux, celestial direction, and source extent correspond point source parameters. Estimation of these parameters using Maximum Likelihood Estimation (MLE) will be described in general and also some of the peculiarities of the implementation within the ROSAT data analysis system EXSAS. The latter software system assists in analyzing ROSAT observational data, see Zimmermann et al. (1998). It is distributed to more than 300 observers in 75 institutes worldwide.

Up to December 2000, about $4\,500$ scientific papers based on ROSAT data appeared in journals. Among the many ROSAT catalogues are the ROSAT All-Sky Survey Bright Source Catalogue with $18\,811$ sources and the Source Catalogue from Pointed PSPC Observations with $74\,301$ sources, see Voges et al. (1999). A considerable percentage of all related data analysis was carried out with the SASS or EXSAS analysis systems - an imperative reason to document the underlying method. Apart from a preliminary conference report by Cruddace et al. (1987), no technical description exists, internally or publically.

In Sect. 2, the observed ROSAT sample is presented. In Sect. 3, the source model is introduced and the logarithmic likelihood functional for unbinned data is derived from basic principles in considerable generality. Background model and point source model are set up in Sect. 4. The maximum likelihood equations are compiled in Sect. 5. Statistical errors for the estimated source parameters will be discussed in Sect. 6. The iterative solution of the maximum likelihood equations is treated in Sect. 7. A point source estimation example is contributed and performance tests on simulated observations are presented in Sect. 8. The finishing Sect. 9 contains the discussion followed by the conclusions in Sect. 10. An Appendix, Sect. 11, explains EXSAS commands related to the MLE of point sources.

  
2 The observed sample

The ROSAT satellite was launched on July 1, 1990 into a circular orbit with an inclination of $53^{\circ}$ and an altitude of 580 km. The last observation was made on December 18, 1998. The ROSAT detectors were (a) the Positional Sensitive Proportional Counter (PSPC), (b) the High Resolution Imager (HRI), and (c) the Wide Field Camera (WFC). Two telescopes were in operation: the larger one for the PSPC and HRI, the smaller for the WFC.

  
2.1 The photon representation in ROSAT data

During an observation interval[*] $\vec{T}:=[\underline T,\bar T]$ of length $T:=\bar T-\underline T$ starting at time $t=\underline T$ and ending at time $t=\bar T:=\underline T+T$, the instrument consisting of a telescope and a detector registers a number $M:=M(\underline T,\bar T)$ of photons[*]. The mth photon, $\wp_m,\,m=1,2,\,...,M$, comes from the celestial direction (Xm,Ym), arrives at time t=tm, and hits the instrument's focal plane at the detector location (xm,ym). The photon causes a detector signal amplitude Am.

Upon completion of the observation and data pre-processing, we are given a 6-variate time-ordered photon series[*]

 
    $\displaystyle {\cal P}:=\left\{\wp_k\right\}_{k=1}^M,$  
    $\displaystyle \wp_k:=(t_k,x_k,y_k,X_k,Y_k,A_k),\quad t_1<t_2<...<t_M.$ (1)

Let S be an estimate for the number of sources in the sample ${\cal P}$. Then, the full sample will be partitioned into S+1 subsamples in Sect. 4.1,

 \begin{displaymath}{\cal P}=:{\cal P}_0\cup{\cal P}_1\cup ... \cup{\cal P}_S.
\end{displaymath} (2)

To the first source belongs subsample ${\cal P}_1$, to the second source belongs subsample ${\cal P}_2$, and so forth. The remainder subsample, ${\cal P}_0$, comprises all photons which do not belong to any subsample ${\cal P}_1$ to ${\cal P}_S$. The latter background corrected subsamples together with a background distribution derived from ${\cal P}_0$ are the input data for MLE. While the size M of ${\cal P}$ may be in the order of millions, the sizes of ${\cal P}_1$ to ${\cal P}_S$ may go down to dozens. The processing time for the whole sample ${\cal P}$ is a function of the size M.

  
2.2 The amplitude space

Source spectra can be formed when the detector discriminates between photon energies. Often, detector amplitudes are the primary measured quantities and not the photon energies. Then, spectra are estimated by statistical inference in the detector's amplitude space $\vec{A}$ to be presented here.

Let the instrument be designed for the photon energy space $\vec{E}:=[\underline E,\bar E]$of width $\Delta E:=\bar E-\underline E$. A single infalling photon of energy $E\in\vec{E}$ causes an electrical impulse of amplitude A containing a superimposed (small) random amplitude in the instrument's amplitude space $\vec{A}:=[\underline A,\bar A]$ of width $\Delta A:=\bar A-\underline A$. The response distribution to a Dirac delta input single line photon spectrum $\delta(E-E'),\,E,E'\in\vec{E}$, of energy E has the probability density

 \begin{displaymath}p(A \vert E),\qquad \int_{\vec{A}}p(A \vert E){\rm d}A=1.
\end{displaymath} (3)

Due to absorption effects at both ends of $\vec{E}$, the normalization in (3) is there only approximative. Neglecting this, $p(A \vert E){\rm d}A$ is the fraction of all photons of given energy E causing detector amplitudes lying in an interval of infinitesimal length dA around the amplitude A. We partition the amplitude space $\vec{A}$ into C amplitude channels,
 
    $\displaystyle \vec{A}=:\vec{A}_1\cup ... \cup\vec{A}_C,\quad \vec{A}_c:=[A_c,A_{c+1}],$  
    $\displaystyle \Delta A_c:=A_{c+1}-A_c,\quad c=1,\,...,C.$ (4)

The index c is called channel number, a reminiscence of the involved electronics. The discrete version of $p(A\mid E)$ for the channel numbers c from (4),

 \begin{displaymath}p_{c\mid k}:={1\over{\Delta E_k}}\int_{\vec{A}_c\times\vec{E}_k}p(A\mid E){\rm d}A{\rm d}E,\quad 1\le k\le K,
\end{displaymath} (5)


  \begin{figure}
\par\includegraphics*[bb=200 215 400 740,angle=0,scale=0.40]{ml_fig_1.ps}\end{figure} Figure 1: The $256\times 729$ PSPC detector response matrix $p_{c\mid k}$ from (5). For c=1 to 7, the entries are set to zero $p(c\mid k)=0$ (to avoid noisy data). Dark = high values $p_{c\mid k}\in [0,0.0826]$
Open with DEXTER

is called (energetical) detector response matrix. Figure 1 shows the last update of this measured matrix for the ROSAT PSPC detector. It gives a good overview on the development of the mode (= peak) position and spread of $p(A\mid E)$ with increasing energy E. Here C=256 amplitude channels of equal width partition $\vec{A}:=[1,256]$. The energy band $\vec{E}:=[0,3]$keV is partitioned by K=729 energy bins of unequal widths $\Delta E_k$. To hold the impulse height distribution density $p(A\mid E)$ internally in table form $p_{c\mid k}$ allows us a faster access than function evaluation.

  
2.3 The photon space

According to (1), a photon $\wp:=(t,x,y,X,Y,A)$ in ROSAT data has six coordinates t to A. Four of them, t,X,Y,A, are for MLE of prime significance and span the photon space $\vec{P}$. This is the space in which all calculations are done, where the sought distributions are accommodated. To understand this space thoroughly, we follow the photon's optical path through the telescope to the detector plane. Its relation to the (X,Y) celestial directions space will be explained.

  
2.3.1 The astronomical directions space

The equatorial system with mean epoch 2000 is taken in ROSAT as the principal astronomical coordinate system.

Disregarding all distances from Earth's centre to celestial objects leads to the unit sphere ${\vec{S}'}^2$ in 3-space referred to equatorial coordinates $\{{\cal O}';\alpha,\delta\}$. The choice of the distance unit is unimportant. The null direction ${\cal O}'$ is the vernal equinox, the angle $\alpha\in[0,2\pi)$ is the right ascension and the angle $\delta\in[-\pi/2,\pi/2]$ the declination. Seen by a ROSAT detector, there is a photon count rate distribution $\l '(t,\alpha,\delta,A)$ representing the celestial directional distribution in X-rays at time t, in photon detector amplitude A, and coming from direction $(\alpha,\delta)$. ROSAT took samples from this celestial distribution $\l '(t,\alpha,\delta,A)$. Count rates will be formally introduced in (18).

ROSAT's pointing direction $p'(t):=(\alpha_p(t),\delta_p(t))\in{\vec{S}'}^2$ moves in time t due to (a) a possible survey motion, (b) a normally applied wobbling motion, and (c) an unavoidable stochastical motion caused by the attitude control loop dynamics and, at that, excited by external stochastical forces. The wobbling motion will be explained in Sect. 2.3.3.

  
2.3.2 The tangential space

The direct approach to calculate with data on the sphere ${\vec{S}'}^2$ was not taken. Instead, spherical data are mapped to a tangential plane referred to Cartesian coordinates. In this plane, all directional calculations are actually effectuated. At the end of the analysis, the calculated results are mapped back to the sphere, i.e. expressed in equatorial coordinates[*]. We shall encounter the mapping details by explaining the geometry and kinematics of the data acquisition.

Consider the pointing direction $p':=(\alpha_p,\delta_p)\in{\vec{S}'}^2$ as it evolves in time t. For any fixed time, a tangential Cartesian coordinate system ${\cal T}:=\{p';X,Y,Z\}$ with origin at[*] p' is erected. The Z axis goes through the origin of the unit sphere and the point p'. The X and Y axes lie in the tangential plane $\vec{T}(p')=:\vec{S}$ with tangency point p'. The plane $\vec{S}$ is termed sky space. It is thus adapted to the individual pointed observation. To explain the axes' directions, consider the two oriented, mutually orthogonal circles of constant right ascension ${\cal C}(\alpha_p)$ and constant declination ${\cal C}^\perp(\delta_p)$ through p',

 
$\displaystyle {\cal C}(\alpha_p):=\left \{(\alpha,\delta)\in{\vec{S}'}^2:\ \alp...
...elta_p):=\left \{(\alpha,\delta)\in{\vec{S}'}^2:\ \delta=\delta_p\right \}\cdot$     (6)

${\cal C}(\delta_p)$ is oriented with increasing right ascension $\alpha$ and ${\cal C}^\perp(\alpha_p)$ with increasing declination $\delta$. Denote by $t_{\alpha}(p')$ the tangential vector along ${\cal C}^\perp(\delta_p)$ at p' and by $t_\delta(p')$ the tangential vector along ${\cal C}(\alpha_p)$. The requirements
 
    $\displaystyle t_{\alpha}(p') \hbox{ is the direction of the $X$\space axis},$  
    $\displaystyle t_{\delta}(p') \hbox{ is the direction of the $Y$\space axis}$ (7)

define the Cartesian coordinate system ${\cal T}$ uniquely. Now, we carry out a rotation S(t) of the celestial sphere ${\vec{S}'}^2$ to obtain ${\vec{S}''}^2$ which will be referred to the relative spherical coordinate system $\{{\cal O}'';\alpha',\delta'\}$. The new null direction ${\cal O}''$ becomes the pointing direction p' and the new equatorial plane is the (X,Z) plane. The Y axis points to the new north pole $NP':=(\alpha_p(t),\delta_p(t)+\pi/2)$. In the reverse direction lies the new south pole $SP':=(\alpha_p(t),\delta_p(t)-\pi/2)$. In the new spherical coordinate system, the general celestial direction $p:=(\alpha,\delta)\in{\vec{S}'}^2$ has the relative longitude $\alpha'$ and relative latitude $\delta'$. These can be expressed by the tangential coordinates $X_{\rm T},Y_{\rm T},Z_{\rm T}$ in ${\cal T}$ by
 
    $\displaystyle \alpha'=\alpha'[\alpha,\delta,\alpha_p(t),\delta_p(t)]:=\arctan\left [{X_{\rm T}\over{1+Z_{\rm T}}}\right ],$  
    $\displaystyle \delta'=\delta'[\alpha,\delta,\alpha_p(t),\delta_p(t)]:=\arcsin\left [ Y_{\rm T}\right ],$  
    $\displaystyle X_{\rm T}:=\sin[\alpha-\alpha_p(t)]\cos(\delta),$  
    $\displaystyle Y_{\rm T}:=\sin(\delta)\cos(\delta_p)-\cos[\alpha-\alpha_p(t)]\cos(\delta)\sin(\delta_p),$  
    $\displaystyle 1+Z_{\rm T}:=\cos[\alpha-\alpha_p(t)]\cos(\delta)\cos(\delta_p)+\sin(\delta)\sin(\delta_p).$ (8)

For $\alpha_p=\delta_p=0$, no rotation is performed, consequently, (8) becomes the identity, $\alpha'=\alpha,\,\delta'=\delta$. The projection $\pi_{\vec{S}}$ from the rotated sphere ${\vec{S}''}^2$ to the tangential plane $\vec{S}$ is of cylindrical type with respect to both spherical coordinates,

 \begin{displaymath}X=\alpha',\quad Y=\delta',\quad -\pi\le X<\pi,\ -\pi/2\le Y\le\pi/2.
\end{displaymath} (9)

The points on $X=-\pi$ and $X=\pi$ are to be pointwise identified making the domain (9) a cylinder. The new poles NP',SP' are mapped to the line segments $Y=\pm\pi/2$. This means that large length distortions must emerge near these lines. Rolling the cylinder (9) first along the new equator of ${\vec{S}''}^2$ and then along the new meridional direction to reach the final point $(\alpha',\delta')$ realizes the mapping (9). We call the composed mapping $A(t):=S(t)\circ\pi_{\vec{S}} :\ {\vec{S}'}^2\to\vec{S}$ attitude-sky mapping. Since the sphere rotation S(t) is invertible and obviously (9) too, the attitude mapping A(t) is invertible $A(t)^{-1}:=\pi_{\vec{S}}^{-1}S(t)^{-1}$.

The two coordinates in the sky space $\vec{S}$ are termed sky longitude X and sky latitude Y. Both coordinates are measured in angular units, in units of $sky\_pix:=0.5$ arcsec. This unit is appropriate for the spatial resolutions of the three ROSAT detectors.

A fixed celestial direction $p:=(\alpha,\delta)$ is mapped by A(t) at time t to the sky point (X(t),Y(t)). All ROSAT detectors have fields of view in the order $\vert\alpha'\vert,\vert\delta'\vert\approx1^\circ$. Evaluating A(t) for small fields of view, i.e. for $\max\{\vert\alpha-\alpha_p\vert,\vert\delta-\delta_p\vert\}\to0$ yields[*]

 
    $\displaystyle X=\alpha-\alpha_p+{\cal O}\left [\max\{\vert\alpha-\alpha_p\vert^2,\vert\delta-\delta_p\vert^2\}\right ],$  
    $\displaystyle Y=\delta-\delta_p+{\cal O}[\max\{\vert\alpha-\alpha_p\vert^2,\vert\delta-\delta_p\vert^2\}].$ (10)

Figure 2 shows the image curves of the 12 circles[*] ${\cal C}(\alpha),\;\alpha=0^\circ(30^\circ)330^\circ$, and of the 11 circles ${\cal C}^\perp(\delta),\;\delta=-75^\circ(15^\circ)75^\circ$, for the pointing direction $p':=(30^{\circ },45^{\circ })$. As mentioned, ROSAT observed the source count rate distribution $\l '(\alpha,\delta):=\l '(t,\alpha,\delta,A)\ge0$ on the celestial sphere ${\vec{S}'}^2$. The attitude mapping A(t) is not area-preserving. We discuss the consequence for our photon counting experiment.
  \begin{figure}
\par\includegraphics*[angle=-90,scale=0.90]{ml_fig_2.eps} \end{figure} Figure 2: The images of the circles ${\cal C}(\alpha )$ and ${\cal C}(\delta )$ with $\alpha =0^\circ $ $(30^\circ )330^\circ $ and $\delta =-75^\circ (15^\circ )75^\circ $ under the attitude map A(t) (8),(9) for the pointing direction $p':=(30^{\circ },45^{\circ })$
Open with DEXTER

The one-to-one image $\lambda(X,Y)$ of $\l '(\alpha,\delta)$ under the attitude mapping A(t) is distorted in shape by the aereal distortion factor $d[\alpha,\delta,\alpha_p(t),\delta_p(t)]$ appearing in

 
    $\displaystyle \lambda(X,Y){\rm d}X{\rm d}Y:=\l '(\alpha,\delta)d[\alpha,\delta,\alpha_p(t),\delta_p(t)]{\rm d}\alpha {\rm d}\delta,$  
    $\displaystyle d\left [\alpha,\delta,\alpha_p(t),\delta_p(t)\right ]:={D[\alpha,...
...t),\delta_p(t)]\over{\sqrt{1-Y_{\rm T}^2}\cdot [(1+Z_{\rm T})^2+X_{\rm T}^2}]},$  
    $\displaystyle D[\alpha,\delta,\alpha_p(t),\delta_p(t)]:=(1+Z_{\rm T})\cdot (X_{{\rm T},\alpha}Y_{{\rm T},\delta}-X_{{\rm T},\delta}Y_{{\rm T},\alpha})$  
    $\displaystyle ~~~~~~~~~~~~~~~~~~~~~~~~~~~~-X_{\rm T}\cdot (Z_{{\rm T},\alpha}Y_{{\rm T},\delta}-Z_{{\rm T},\delta}Y_{{\rm T},\alpha}).$ (11)

In the count rate transformation rule (11), (X,Y) and $(\alpha,\delta)$ are mutual image points. The suffixes $\alpha,\delta$ on $X_{\rm T},Y_{\rm T},Z_{\rm T}$ denote partial derivatives with respect to suffix variables. The factor $\vert D\vert\le2$ is free of singularities and coprime with the denominator factors. Hence, the points on the line $Z_{\rm T}=-1,\,X_{\rm T}=0$ intersecting ${\vec{S}''}^2$ are pole singularities of d. The latter are located at the new pole pair NP',SP' having as image under A(t) the pair of horizontal line segments $Y=\pm\pi/2$ in Fig. 2. The weaker singularity of d in the plane pair $Y_{\rm T}=\pm1$, the tangential plain pair at the poles NP',SP' specifies thus again the new pole pair.

From (8), it follows near the origin (X,Y)=(0,0) in the sky space $\vec{S}$, i.e. for $\vert\alpha-\alpha_p(t)\vert,\vert\delta-\delta_p(t)\vert\to0$,

 
$\displaystyle d[\alpha,\delta;\alpha_p(t),\delta_p(t)] = 1+{\cal O} \left ( \max\{\vert\alpha-\alpha_p(t)\vert,\vert\delta-\delta_p(t)\vert \} \right ).$     (12)

The attitude mapping A(t) is thus almost area-preserving near the pointing direction. A distorted source count density distribution $\lambda(X,Y)$ may give the non-expert inspector a misleading visual impression and may cause the software to generate artificial sources. The use of A(t) only near the origin (X,Y)=(0,0) together with the local almost area-preserving property (12) excludes this risk greatly. No correction factor d for transforming count rates from the (X,Y) directional space to equatorial space ${\vec{S}'}^2$ is calculated in EXSAS. For a thorough survey article on celestial coordinates, see Calabretta & Greisen (2000).

  
2.3.3 The detector space

Photon impact locations in the detector plane $\vec{R}^2$ are measured in the plane Cartesian detector coordinate system $\{{\cal O};x,y\}$. The origin, ${\cal O}$, is the piercing point of the instrument's optical axis in the detector plane. The detector field $\vec{D}$contains the field of view $\vec{F}$ and is parallel to the axes in $\vec{R}^2$, i.e. $\vec{F}\subset\vec{D}\subset\vec{R}^2$. Figure 3 shows the field of view (black) of the ROSAT PSPC detector. The detector field $\vec{D}$ is a bit larger than the smallest square parallel to the axes containing $\vec{F}$.

  \begin{figure}
\par\includegraphics*[bb=105 315 500 725,angle=0,scale=0.40]{ml_fig_3.ps}\end{figure} Figure 3: The PSPC field of view $\vec{F}$ and detector field $\vec{D}$. Photons are censored by the support ribs, the hub (white), and outside the black disk
Open with DEXTER

The pointing direction p'(t) and the optical axis direction coincide[*]. Consequently, the detector plane $\vec{R}^2$ must be a rotated sky space plane $\vec{R}^2$. The time-dependent rotation angle $\rho(t)$ is called roll angle. It is defined to be the angle between the detector x-axis and sky X-axis measured counterclockwise in the sky coordinate system. The normal direction to the surface of the ROSAT solar paddles has to point to the Sun. This requirement defines the roll angle $\rho(t)$ at time t.

Taking into account the different units in both spaces leads to the invertible linear mapping between sky space and detector space

 
    $\displaystyle \left(
\begin{array}{l}
x(t)\\
y(t)\end{array}\right) =R(t)
\left(
\begin{array}{l}
X\\
Y
\end{array}\right),$  
    $\displaystyle R(t):=c\cdot \left(\begin{array}{rr}
\phantom{-}\cos[\rho(t)],&\sin[\rho(t)]\ \\
-\sin[\rho(t)],&\cos[\rho(t)],\end{array}\right),$  
    $\displaystyle c:={\rm det\_pix\over sky\_pix}\cdot$ (13)

The pointing direction p'(t) supplemented by the roll angle $\rho(t)$ is called satellite attitude $a(t):=(p'(t),\rho(t))$. We call the composed invertible mapping from astronomical directions to detector points $D(t):=R(t)\circ A(t) :\ {\vec{S}'}^2\to\vec{R}^2$ attitude-detector mapping. The knowledge of the instrument's attitude a(t) at any time $t\in\vec{T}$ together with the relation between detector positions and photon infall directions (ray-trace relation) permits us to map detector positions to celestial directions by the inverse D(t)-1.

Assume a time-constant nominal pointing direction p'0 and imagine a source which falls on a rib in Fig. 3. Such a source is totally censored. In order to lessen the detrimental censoring by the ribs, a periodic wobbling motion w(t) in saw-tooth form and with a time-constant tangential vector W is superimposed to the effect that the said photons are now distributed along a line segment crossing the rib (when the rib and W are not parallel),

 \begin{displaymath}p'(t):=p'_0+W\cdot w(t),\quad w(t+T_w)=w(t),\quad W\in\vec{S}.
\end{displaymath} (14)

The nominal wobbling period was $T_w:=403\,$s and the nominal wobbling amplitude $\vert W\vert=3\,$arcmin. As Fig. 8 witnesses, the ribs from Fig. 3 are almost made to disappear due to wobbling motion.

  
2.3.4 The ROSAT photon space

The sample ${\cal P}$ from (1) lies in the instrument's photon space $\vec{P}$. This is the Cartesian product of the observation interval $\vec{T}$, the detector field $\vec{D}$, the sky space $\vec{S}$, and the detector's amplitude space $\vec{A}$,

 \begin{displaymath}\vec{P}:=\vec{T}\times\vec{D}\times\vec{S}\times\vec{A}.
\end{displaymath} (15)

In our setting, the task is to infer the hypothesized parent source distribution $\lambda(t,X,Y,A)$ in $\vec{P}$ from which the sample ${\cal P}\subset\vec{P}$ stems.

  
3 The source model and the log-likelihood function

As mentioned in Sect. 1, some state-of-the art detectors register the X-ray photons one after the other in time, resolve them with respect to energy and infall direction.

A given instrument has its individual photon space. Its dimension may be lower, equal, or larger than that of (15). The constitutive factor spaces may be different from the factors $\vec{T},\vec{D},\vec{S},\vec{A}$ in (15). In this section, $\vec{P}$ is the photon space of any astronomical photon counting instrument. However, to be not overly abstract in this section, a paradigmatic choice is made: photons possess an arrival time t on a certain (X,Y) target space and transport an energy E. Let ${\cal P}\subset\vec{P}$ be a sample of size M in this section.

It is appropriate to view the infall of photons as a stochastic counting process in the instrument's photon space $\vec{P}$.

  
3.1 The Poisson source model

As X-ray photons are observed one by one (by introductory assumption), a stochastical point of view is adopted in the pertinent parts of this article. A Poisson emitter in photon space $\vec{P}$ is the stochastic X-ray source model adopted for ROSAT data and defined by (P1), (P2) in (16) below.

Let $\vec{Q}\subset\vec{P}$ be an arbitrary set in photon space and $N(\vec{Q})$ be the random number of photons falling into $\vec{Q}$. According to the definition, of a Poisson process, see Parzen (1962, p. 118), two properties of the process $\{N(\vec{Q}),\vec{Q}\subset\vec{P}\}$ are implied:

 
    $\displaystyle \hbox{(P1)}\ \hbox{ $N(\vec{Q})$\space has independent increments: whenever}$  
    $\displaystyle \hbox{ $\vec{Q}_1,\,...,\vec{Q}_n$\space are disjoint sets in $\vec{P}$ , the photon}$  
    $\displaystyle \hbox{ counts $N(\vec{Q}_1),\,...,N(\vec{Q}_n)$\space are independent}$  
    $\displaystyle \hbox{ random variables.}$  
    $\displaystyle \hbox{(P2)}\ \hbox{ For any $\vec{Q}\subset\vec{P}$ , $N(\vec{Q})$\space is Poisson distributed,}$  
    $\displaystyle {\rm Prob}[N(\vec{Q})=n]:={{\rm e}^{-\Lambda(\vec{Q})}\Lambda(\vec{Q})^n\over n!},
\quad n=0,1,\,...$ (16)

The relation $\vec{E}[N(\vec{Q})]=\Lambda(\vec{Q})$ provides the statistical interpretation of the intensity parameter $\Lambda(\vec{Q})$ as the expected counts from $\vec{Q}$. The functional $\vec{E}$takes the expectation of a random variate. In terms of the local mean count rate $\lambda$, we have the representation

 \begin{displaymath}\Lambda(\vec{Q}):=\int_{\vec{Q}}\lambda{\rm d}\vec{Q}.
\end{displaymath} (17)

Basically, in an actual application, it remains to be statistically tested whether (16) is compatible with the data ${\cal P}$ or not. However, particle counting experiments belong since the days of Lord Rutherford to the stock of canonical applications of Poisson processes.

The local mean count rate, $\lambda$, introduced in (17) depends on all four photon parameters spanning $\vec{P}$ and is assumed to be bounded,

 \begin{displaymath}\lambda:=\lambda(t,X,Y,E).
\end{displaymath} (18)

Consequently, the infinitesimal increment $\lambda {\rm d}\vec{Q}$ becomes

 \begin{displaymath}{\rm d}\Lambda(t,X,Y,E):=\lambda(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E,
\end{displaymath} (19)

the local mean number of photon counts expected in the infinitesimal box d$\vec{Q}$ (parallel to the axes in $\vec{P}$) around the central point $(t,X,Y,E)\in\vec{P}$ with the lengths elements dt, dX, dY, dE and the 4-volume element d $V:={\rm d}t$dXdYdE. Notice that to the local mean count rate $\lambda$ from (18) belongs the unit

 \begin{displaymath}{({\rm photons})\over{({\rm unit\ time})\cdot ({\rm unit\ area})\cdot ({\rm unit\ energy})} }\cdot
\end{displaymath} (20)

The implied unit of the 4-volume is given by the denominator in (20).

In possession of the sample ${\cal P}\subset\vec{P}$ and having specified the parent source model, we have to decide by which statistical estimation procedure we wish to infer $\lambda(t,X,Y,E)$ from ${\cal P}$.

Among several standard estimation procedures at disposal, estimation via MLE has proven to be appropriate in astronomical application, see Cash (1979), Strong (1985) and also in X-ray astronomy with ROSAT, see Cruddace et al. (1987).

In X-ray observations, one has often to cope with weak sources. Due to observation time restrictions, not more than a handful of photons are collected from weak sources. Any loss of data information due to binning is unacceptable. Technically spoken, many bins without counts are the rule. As one might expect from (P2) in (16) and as we shall see in (22), the statistics associated with MLE for photon counting is based on the Poisson distribution - a distribution for rare events. It is applicable for small occupation numbers. So, the choice of MLE is almost mandatory for ROSAT data under small source sample sizes and/or high precision requirements.

  
3.2 Derivation of the likelihood function

Within the MLE we have to establish the log-likelihood function, L, for our photon counting experiment. While the likelihood ${\rm e}^L$ is in our context the prime probability quantity, it is technically preferable to assign this role to the exponent $L\le0$. The L, seen as a functional of the hypothetical count rate, is proportional to the logarithm of the probability that the observation emerged from the assumed count rate, given the observation data.

To start with the derivation of L, imagine a partition $\Delta$ of the instrument's photon space $\vec{P}$ into finitely many bins $\vec{P}_{i,j,k,l}$. The independent increment property (P1) in (16) allows us to partition the photon space $\vec{P}$ arbitrarily into finitely many parts of positive 4-volume and to have identical independent distribution forms in each bin $\vec{P}_{i,j,k,l}$. We are free to design bins in form of Cartesian products so that the partition $\Delta$ has the bins

 
    $\displaystyle \vec{P}_{i,j,k,l}:=\vec{X}_i\times\vec{Y}_j\times\vec{E}_k\times\vec{T}_l,$  
    $\displaystyle \vec{X}_i:=[X_i,X_{i+1}),\qquad \ \ X_{i+1}-X_i=:\Delta X_i>0,$  
    $\displaystyle \vec{Y}_j:=[Y_j,Y_{j+1}),\qquad \ \ Y_{j+1}-Y_j=:\Delta Y_j>0,$  
    $\displaystyle \vec{E}_k:=[E_k,E_{k+1}),\qquad E_{k+1}-E_k=:\Delta E_k>0,$  
    $\displaystyle \vec{T}_l:=[T_l,T_{l+1}),\qquad \ \ T_{l+1}-T_l=:\Delta T_l>0,$  
    $\displaystyle 1\le i\le I,\ 1\le j\le J,\ 1\le k\le K,\ 1\le l\le L'.$ (21)

Let ni,j,k,l be the number of counts[*] observed in bin $\vec{P}_{i,j,k,l}$. By (P1), these counts are independent random variates. Hence, their joint probability is the product of the involved individual probabilities. Assuming a continuous count rate $\lambda$, the Poisson distribution property (P2) from (16) gives the probability $P_{\Delta}(n_{1,1,1,1},\,...,n_{I,J,K,L'})$ to observe the count numbers n1,1,1,1 to nI,J,K,L' as
 
    $\displaystyle P_{\Delta}(n_{1,1,1,1},\,...,n_{I,J,K,L'}):=$  
    $\displaystyle ~~~~~~~~~~~~~~~~~\prod_{i,j,k,l}{{\rm e}^{-\lambda_{i,j,k,l}\Delt...
... (\lambda_{i,j,k,l}\Delta V_{i,j,k,l}\right )^{n_{i,j,k,l}}\over n_{i,j,k,l}!},$  
    $\displaystyle \lambda_{i,j,k,l}:={1\over{\Delta V_{i,j,k,l}}}\int_{\vec{P}_{i,j,k,l}}\lambda(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E,$  
    $\displaystyle \Delta V_{i,j,k,l}:=\Delta X_i\Delta Y_j\Delta E_k\Delta T_l.$ (22)

The product in (22) extends over all bin multi-indices (i,j,k,l) from (21). The quantities $\lambda_{i,j,k,l}$ are called bin-averaged count rates.

We wish to get rid of the loss of information incurred by the finite binning (21). To this end, we consider an infinite sequence $\{\Delta^{(1)},\Delta^{(2)},\,...\}$ of binnings of the photon space $\vec{P}$ with the infinite resolution property

 \begin{displaymath}\Delta X_i\to0,\quad \Delta Y_j\to0,\quad \Delta E_k\to0,\quad \Delta T_l\to0
\end{displaymath} (23)

for all the $I\cdot J\cdot K\cdot L'$ bins $\vec{P}_{i,j,k,l}$ of $\Delta:=\Delta^{(n)}$ for $n\to\infty$. We will write simpler $\Delta\to0$ for the passages to the limits in (23) for $n\to\infty$.

The same observation appears then in infinitely many binnings. We wish to investigate the likelihood $L_\Delta:=\ln[P_\Delta]$ in the passage $\Delta\to0$.

We split the probability $P_{\Delta}=:P_{\Delta}[\lambda]$ in (22) into three factors. The first one, $P_{\Delta,{\rm e}}$, comprises all normalization factors and the remaining two are the contributions from the void and occupied bins,

 
    $\displaystyle P_{\Delta}:=P_{\Delta,{\rm e}}\cdot P_{\Delta,0}\cdot P_{\Delta,>0},$  
    $\displaystyle P_{\Delta,{\rm e}}:=\prod_{i,j,k,l}{\rm e}^{-\lambda_{i,j,k,l}\Delta V_{i,j,k,l} },$  
    $\displaystyle P_{\Delta,0}:=\prod_{n_{i,j,k,l}=0}{\left (\lambda_{i,j,k,l}\Delta V_{i,j,k,l}\right )^{n_{i,j,k,l}}\over n_{i,j,k,l}!},$  
    $\displaystyle P_{\Delta,>0}:=\prod_{n_{i,j,k,l}>0}{\left (\lambda_{i,j,k,l}\Del...
...} \right )^{n_{i,j,k,l}}\over n_{i,j,k,l}!}
=:P_{\Delta,\l }\cdot P_{\Delta,V},$  
    $\displaystyle P_{\Delta,\l }:=\prod_{n_{i,j,k,l}>0}\left (\lambda_{i,j,k,l}\right )^{n_{i,j,k,l}},$  
    $\displaystyle P_{\Delta,V} :=\prod_{n_{i,j,k,l}>0}{\left (\Delta V_{i,j,k,l}\right )^{n_{i,j,k,l}}\over{n_{i,j,k,l}!}}=:V_\Delta.$ (24)

We evaluate the above four products $P_{\Delta,{\rm e}}$ to $P_{\Delta,V}$.

1. The normalization factor. Taking the logarithm and summing over all bins leads with the total number, $\Lambda$, of expected counts from (17) immediately to

 \begin{displaymath}P_{\Delta,{\rm e}}={\rm e}^{-\Lambda},\qquad \Lambda:=\int_{\vec{P}}\lambda(t,X,Y,E){\rm d}t{\rm d}X{\rm d}Y{\rm d}E.
\end{displaymath} (25)

We notice that $\Lambda$ is independent of the binning $\Delta$;

2. The contribution of the void bins. By definition,

 \begin{displaymath}P_{\Delta,0}=1,
\end{displaymath} (26)

so that these bins do not contribute to $P_\Delta$, irrespective of the binning;

3. The contribution due to the count rate. By hypothesis, $\lambda$ is continuous. Applying the mean value theorem to the integral expressions $\lambda_{i,j,k,l}$ in (22) entails in terms of the general point (t,X,Y,E) in the photon space $\vec{P}$ and the observed photon positions

 
    $\displaystyle \lim_{(t,X,Y,E)\in\vec{P}_{i,j,k,l},\;\Delta\to0}\lambda_{i,j,k,l}=\lambda(t,X,Y,E),$  
    $\displaystyle \lim_{\Delta\to0}P_{\Delta,\lambda}=\prod_{m=1}^M\lambda(t_m,X_m,Y_m,E_m).$ (27)

Remind; the point $(t,X,Y,E)\in\vec{P}$ determines uniquely the bin $\vec{P}_{i,j,k,l}$ containing it;

4. The contribution due to the bin volumes. The product $V_\Delta$depends on the occupation numbers, on the binning $\Delta$, but not on the count rate $\lambda$. Obviously, for any positive number M of total observed counts holds

 \begin{displaymath}V_\Delta\to0\quad \hbox{for}\quad \Delta\to0.
\end{displaymath} (28)

In order to avoid the divergence to zero, we take $V_\Delta$, as renormalization quantity, called in this use occupation volume product, and redefine the log-likelihood by

 \begin{displaymath}L_{\Delta}:=\ln[P_\Delta/V_\Delta].
\end{displaymath} (29)

A probabilistic interpretation of $V_\Delta$ is provided by the relation $V_\Delta={\rm e}^{V(\vec{P})}P_\Delta[1]$ where $V(\vec{P})$ is the volume of $\vec{P}$. Thus, $P_\Delta/V_\Delta$ can be seen as proportional to a likelihood ratio (The hypothesis of a unit count rate $\l =1$ may seem far-fetched in the light of the observed occupation numbers).

The renormalization quantity $V_\Delta$ was purposely defined so that we have with the number of expected counts $\Lambda$ from (25)

 
    $\displaystyle P[\lambda]:= \lim_{\Delta\to0}{P_{\Delta}\over V_{\Delta}}={\rm e}^{-\Lambda}\cdot \prod_{m=1}^M\lambda(t_m,X_m,Y_m,E_m),$  
    $\displaystyle ={\rm e}^{-\int_{\vec{P}}\lambda(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E +\sum_{m=1}^M\ln\left [\lambda(t_m,X_m,Y_m,E_m)\right ]}.$ (30)

This is the probability for the data sample ${\cal P}\subset\vec{P}$ to occur assigned by the hypothetical count rate $\lambda$ under the applied renormalization. The log-likelihood functional, $L[\lambda]$, defined by $P[\lambda]=:{\rm e}^{L[\lambda]}$ is read off directly from (30),
 
$\displaystyle L[\lambda]$ := $\displaystyle -\int_{\vec{P}}\lambda(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E$$\displaystyle +\sum_{m=1}^M\ln\left [\l\left (t_m,X_m,Y_m,E_m\right )\right ].$ (31)

Interpreted as a likelihood, ${\rm e}^{L[\lambda]}$ is proportional[*] to the likelihood of the count rate $\lambda$ to have caused the sample ${\cal P}$.

The Maximum likelihood principle directs us to choose the $L[\lambda]$ maximizing count rate, $\bar\lambda$, as estimate for $\lambda$. Generally, the maximizer $\bar\lambda$need neither to be unique nor existent - an unhappy situation with unpleasant numerical consequences, when really given.

The $L[\lambda]$ from (31) is the general log-likelihood functional for a Poisson counting process in the paradigmatic instrument photon space $\vec{P}$ for the hypothetical count rate $\lambda(t,X,Y,E)$ from (18), (20), given the photon data sample ${\cal P}$ from (1).

As explained, the integral in (31) is the total (not necessarily integer) number, $\lambda$, of the expected photons falling into the entire photon space $\vec{P}$ and thus the model counter part of M, the number of observed photon counts,

 \begin{displaymath}\Lambda:=\Lambda(\vec{P}):=\int_{\vec{P}}\lambda(t,X,Y,E){\rm d}t{\rm d}X{\rm d}Y{\rm d}E.
\end{displaymath} (32)

Moreover, $\lambda(t_m,X_m,Y_m,E_m)$ is the hypothetical count rate evaluated at the location of the mth observed photon in the photon space $\vec{P}$.

In some cases, the model count rate depends in a special way on all parameters. Let the full model parameter vector $p=:p_1\oplus p_2$ be split into two groups p1,p2 spanning $\vec{P}_1,\vec{P}_2$ such that the count rate $\lambda$ factorizes,

 \begin{displaymath}\lambda(p):=\lambda_1(p_1)\cdot \lambda_2(p_2).
\end{displaymath} (33)

For example, an X-ray source may vary rapidly in time but may possess a time-constant spectrum at the scale of the observation period.

The likelihood function $L[\lambda]$ for $\lambda$ from (31) becomes in terms of the likelihood functions of the two count rate factors, $L[\lambda_1],\,L[\lambda_2]$ with the associated expected counts $\Lambda_j:=\Lambda(\vec{P}_j),\,j=1,2$ from (32)

 \begin{displaymath}L[\lambda_1\cdot \lambda_2]=\Lambda_1+\Lambda_2-\Lambda_1\cdot \Lambda_2+L[\lambda_1]+L[\lambda_2].
\end{displaymath} (34)

The likelihood from (34) is cheaper to evaluate and alleviates the solution of the maximum likelihood equations presented in Sect. 5.

The parameterization of $\lambda$ is still open. It is desirable to have the number of expected photons, $\Lambda$, among the parameters as the counter part for the number of observed photons M. This can be achieved upon the factorization $\lambda=:\Lambda\cdot \lambda'$. For notational convenience the prime on $\lambda'$ is dropped again. In this parameterization and notation, we arrive at the representation

 
    $\displaystyle L[\lambda]:=M\ln(\Lambda)-\Lambda+\sum_{m=1}^M\ln\left [\lambda\left (t_m,X_m,Y_m,E_m\right )\right ],$  
    $\displaystyle 1=:\int_{\vec{P}}\lambda(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E.$ (35)

Notice, $\lambda$ in (35) is a normalized count rate distribution: one count is distributed over the entire photon space $\vec{P}$. The maximum of $L[\lambda]$with respect to $\Lambda$ is reached at the ML estimate $\hat\Lambda=M$ for the number of expected photons $\Lambda$. In words: given that M photons are observed, the expected number of photons is equated to the number of observed photons.

Often, if not always, the count rate is conceived as an additive mixture of several sources. In the simplest case of a binary mixture, one discerns the source count rate of proper interest, $\lambda_{\rm S}$, and the remaining count rate, called then background count rate $\lambda_{\rm B}$. We split the count rate $\lambda$ accordingly with (unknown) mixture proportions $p_{\rm B},p_{\rm S}$

 \begin{displaymath}\lambda:=p_{\rm B}\lambda_{\rm B}+p_{\rm S}\lambda_{\rm S},\qquad p_{\rm B}+p_{\rm S}=1,\qquad p_{\rm B},p_{\rm S}\ge0.
\end{displaymath} (36)

With the splitting (36), the likelihood function for the source count rate $\lambda_{\rm S}$, contaminated by a background source count rate $\lambda_{\rm B}$ and the expected number of photons $\Lambda$ from both sources assumes the shape
 
    $\displaystyle L[\lambda]:=M\ln(\Lambda)-\Lambda+\sum_{m=1}^M\ln[ p_{\rm B}\lambda_{\rm B}\left (t_m,X_m,Y_m,E_m\right )$  
    $\displaystyle ~~~~~~~~~ + p_{\rm S}\lambda_{\rm S}\left (t_m,X_m,Y_m,E_m\right )],$  
    $\displaystyle 1=:\int_{\vec{P}}\lambda_{\rm B}(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E,$  
    $\displaystyle 1=:\int_{\vec{P}}\lambda_{\rm S}(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E.$ (37)

Since background is ubiquitous, the signal-plus-noise situation (37) is fairly general and placed in this section. In the next step of model refinement, the instrument's point spread function, will be taken into account in Sect. 4.

The derivation of L for unbinned photon data is basically known, see Cash (1979) for a less elaborate but excellent treatment. Some details are worked out here.

  
3.3 Finite binning of the photon space and simplified likelihood evaluation

In practice, the resolution limit is given by the detector binning $\Delta_{\rm D}$. The limiting likelihood $L[\lambda]$ from (31),(35) for unbinned data can only be reached up to a certain binning error $\delta L[\lambda]$ which we now wish to assess.

We do this by observing first the photons with the fictitious detector of unrestricted resolution. Then the ideal observation is compared with the real observation in the finite binning $\Delta$ used in the data analysis. In the latter, only the bin centre $(t_{\rm c},X_{\rm c},Y_{\rm c},E_{\rm c})$ of the bin into which the individual photon falls is retained leading to uncertain photon positions. Hence, a finite photon binning entails unavoidably the first error of the list

 
    $\displaystyle \hbox{(1) Photon localization error,\hfill}$  
    $\displaystyle \hbox{(2) Numerical integration error.}$ (38)

The second error source in (38) is tolerated in order to save program execution time. The most primitive (but cheapest) numerical integration rule is applied in order to evaluate the bin-averaged count rates $\lambda_{i,j,k,l}$ from (22). We investigate the likelihood error

 \begin{displaymath}\delta L[\lambda]:=L_\Delta[\lambda]-L[\lambda]
\end{displaymath} (39)

with $L_\Delta[\lambda]$ from (29) in presence of the two error sources from (38).

Let $\Delta$ be a binning of $\vec{P}$ not finer than $\Delta_{\rm D}$ with bin numbers I,J,K,L'. We approximate the bin-averaged count rates,

 \begin{displaymath}\lambda_{i,j,k,l}\approx\lambda(t_{\rm c},X_{\rm c},Y_{\rm c},E_{\rm c})\cdot \Delta V_{i,j,k,l},
\end{displaymath} (40)

by bin-central count rates $\lambda(t_{\rm c},X_{\rm c},Y_{\rm c},E_{\rm c})$ of the same bin. With an unknown mean position $\tilde p:=(\tilde t,\tilde X,\tilde Y,\tilde E)\in\vec{P}_{i,j,k,l}$ for p:=(t,X,Y,E) and an unknown fraction $\theta\in(0,1)$ we arrive at the exact error representation in terms of the gradient and the Hessian of the count rate $\lambda$:
 
    $\displaystyle \delta L[\lambda]:=\sum_{i,j,k,l}\epsilon_{i,j,k,l}\Delta V_{i,j,k,l}+\sum_{m=1}^M\epsilon_m,$  
    $\displaystyle \epsilon_{i,j,k,l}:={1\over{\Delta V_{i,j,k,l}}}\int_{\vec{P}_{i,j,k,l}}\lambda(t,X,Y,E)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}E$  
    $\displaystyle ~~~~~~~~~~~ -\lambda(t_{\rm c},X_{\rm c},Y_{\rm c},E_{\rm c}),$  
    $\displaystyle ~~~~~~~~~ ={1\over2}\cdot \sum_{\alpha,\b=1}^4
{{\partial ^2 \lam...
...}\over{\partial p_\alpha \partial p_\beta}}\cdot \tilde p_\alpha\tilde p_\beta,$  
    $\displaystyle \epsilon_m:=\ln\left [\lambda(t_{\rm c},X_{\rm c},Y_{\rm c},E_{\rm c})/\lambda(t_m,X_m,Y_m,E_m)\right ],$  
    $\displaystyle ~~~~ =\sum_{\alpha=1}^4{1\over\lambda(\theta\tilde p)} {{\partial \lambda(\theta\tilde p)}\over{\partial p_\alpha}}\cdot \tilde p_\alpha.$ (41)

In (41), the mean value positions $\tilde p$ and the fractions $\theta$ are supplied by the multivariate Taylor expansion with remainder and differ in both occurrences. Associated to the mth photon is the bin index (i,j,k,l) of the bin into which it falls.

For a given binning $\Delta$ and a chosen tolerance $\epsilon$, the third and fifth line in (41) to establish observation feasibility margins on the variability of the count rate $\lambda$ in time, space, and energy in terms of first and second partial derivatives of the count rate $\lambda$.

  
3.3.1 The photon space for directional estimation

The replacement of bin-averaged count rates $\lambda_{i,j,k,l}$ by the right-hand side of (40), as done in EXSAS, yields an enormous gain in calculation speed. A further gain in speed is brought about by the adaption of the photon space to the task. Joint light curve, spectral, and directional source estimations were possible with (31) but were not included in EXSAS. Spectral and light curve estimation is delegated to other EXSAS tasks. Therefore, the photon space is in time and detector signal amplitudes A (replacing the energy E from Sect. 3.1) trivially binned, L'=1 and K=1. In other words, photons of all arrival times and of all detector signal amplitudes falling into selectable intervals can be included,

 \begin{displaymath}\vec{T}_1:=[\underline T,\bar T],\qquad \vec{A}_1:=[\underline A,\bar A].
\end{displaymath} (42)

For instance, the user has the choice to make soft or hard band images of an observation by choosing the margins of $\vec{A}_1$ appropriately. This means (P1), (P2) in (16) is only required with respect to the spatial binning in sky space $\vec{S}$.

  
4 Background model and point source model

The point source detection sensitivity depends on the imaging properties of the instrument and the background against which a point source is to be detected. For the ROSAT detectors, four background count sources are relevant: (a) instrumental background, (b) charged particles induced background, (c) scattered solar radiation, and (d) diffuse celestial X-ray background. The contributions from these background sources for the three ROSAT detectors are compiled in the instrument description, see Trümper (1991a) and the detector specific papers. For instance, the ROSAT PSPC detector has no instrumental background. The particle induced background contributes $2.5 ~ 10^{-5}\,{\rm counts/(s ~ arcmin}^2)$ over $95\%$ of the observation time and is uniformly distributed over the detector field $\vec{D}$.

As in the general case (36), two separate types/classes of photons are distinguished, background photons and source photons. The first ones are those which are defined not to come from the observation target. The definition itself will be made in Sect. 4.1.3.

  
4.1 The background count rate distribution

Apart from the parameterization in (36) we assume in (18) an additive two-component-mixture of a local mean background count rate $\lambda_{\rm B}\ge0$ and a local mean source count rate component $\lambda_{\rm S}\ge0$ in the count rate vector $\lambda:=(\lambda_{\rm B},\lambda_{\rm S})$,

 \begin{displaymath}\lambda(t,X,Y,A)=:\lambda_{\rm B}(t,X,Y,A)+\lambda_{\rm S}(t,X,Y,A).
\end{displaymath} (43)

Within this model, the estimation of $\lambda$ becomes the joint estimation of both components $\lambda_{\rm B}$ and $\lambda_{\rm S}$. This joint estimation can be carried out algorithmically by a series of repeated iterated estimation steps for one background component until the change between the last $\lambda$ and the penultimate estimation $\lambda'$ is small enough,
 
    $\displaystyle (1) \hbox{ choose an increment threshold $\Delta\lambda>0$ },$  
    $\displaystyle (2) \hbox{ estimate $\lambda_{\rm B}$ },$  
    $\displaystyle (3) \hbox{ estimate $\lambda_{\rm S}$ , given $\lambda_{\rm B}$ },$  
    $\displaystyle (4) \hbox{ repeat steps (2), (3) until $\vert\lambda-\lambda'\vert\le\Delta\lambda$ }.$ (44)

The distance $\vert\cdot \vert$ is to be defined in the $(\lambda_{\rm B},\lambda_{\rm S})$ quadrant. The convergence to a limit vector $\lambda:=(\lambda_{\rm B},\lambda_{\rm S})$ in (44) for an increasing number of repetitions is presumed. In EXSAS, a multi-step background estimation procedure embodies (2) in (44). This is the subject of the next three subsections. Step (3) is the principal and last step of the MLE. No further corrections for the initial $\lambda_{\rm B}$ as indicated in (44) are calculated, i.e. the repetition (4) is not performed.

  
4.1.1 The background count distribution in EXSAS

According to the ROSAT photon space (15), we parameterize the three $\lambda$ in (43) by amplitude A instead of energy E. The transition from count rates to counts follows. With the trivial time and amplitude binning from (42), and the fine binning (21) of the sky space we have with $C\in\{B,S\}$ the hypothetical bin counts

 \begin{displaymath}\Lambda_{C,i,j}:=\int_{\vec{T}_1\times\vec{X}_i\times\vec{Y}_...
...s\vec{A}_1}\lambda_C(t,X,Y,A){\rm d}t{\rm d}X{\rm d}Y{\rm d}A.
\end{displaymath} (45)

The aim now is to obtain estimates $\hat\Lambda_{{\rm B},i,j}$ for $\Lambda_{{\rm B},i,j}$. The sample ${\cal P}$is subjected to the same binning (45) yielding the observed photon histogram ${\cal N}$ with spatial bin occupation numbers ni,j,

 \begin{displaymath}{\cal N}:=\left \{\,n_{i,j}:\ (i,j)\in \vec{Z}^2_{I,J}\right \},\quad \vec{Z}_{I,J}^2:=[1,I]\times[1,J].
\end{displaymath} (46)

Departing from ${\cal N}$ with equal sizes I=J=512, a smooth estimate $\hat\Lambda_{\rm B}(X,Y)$ over $\vec{D}$ with the approximate count preservation requirement

 \begin{displaymath}\Lambda_{B,i,j}\approx{1\over{\Delta X_i\Delta Y_j}}\int_{\vec{X}_i\times\vec{Y}_j}\hat\Lambda_{\rm B}(X,Y){\rm d}X{\rm d}Y
\end{displaymath} (47)

is calculated in three steps.

  
4.1.2 Crude source number and location estimation

To find an initial estimate on source number and location is the first challenging task within MLE.

Point sources appear in spatial count space as local count enhancements in a form given by the PSF in effect. In a brute-force approach, source regions of a form and size compatible with the PSF are hypothesized and all their translations and dilatations in a specified search region ${\cal S}$ are tested at a selectable significance level to contain count variations. Testing all the hypothetical source regions guarantees that no one can be overlooked. The form of the region, however, is kept fixed.

First, specify the region of interest ${\cal S}\subset\vec{Z}^2_{I,J}$. Small hypothetical source squares $\vec{S}_{i,j}(s)$ parallel to the axes, called windows in some publications, of linear half-width $s\in{\bf N}$ around a central pixel position (i,j) together with a background pixel neighborhood $\vec{B}_{i,j}(s)$ around the source square are chosen. Denote by $T_{\vec{S},i,j}(s),T_{\vec{B},i,j}(s)$ the total exposure times for the respective suffix regions. The number of counts[*], $N_{\vec{S},i,j}(s)$, falling into $\vec{S}_{i,j}(s)$ and the number of counts, $N_{\vec{B},i,j}(s)$, falling into $\vec{B}_{i,j}(s)$ are determined,

 
    $\displaystyle N_{\vec{S},i,j}(s):=\sum_{(k,l)\in\vec{S}_{i,j}(s)}n_{k,l},\quad (i,j)\in{\cal S},$  
    $\displaystyle N_{\vec{B},i,j}(s):=\sum_{(k,l)\in\vec{B}_{i,j}(s)}n_{k,l},$  
    $\displaystyle T_{\vec{S},i,j}(s):=\sum_{(k,l)\in\vec{S}_{i,j}(s)}T_{k,l},$  
    $\displaystyle T_{\vec{B},i,j}(s):=\sum_{(k,l)\in\vec{B}_{i,j}(s)}T_{k,l},$  
    $\displaystyle \vec{S}_{i,j}(s):=\left \{(k,l)\vec{Z}^2_{I,J}:\ \vert k-i\vert\le s,\,\vert l-j\vert\le s\right \},$  
    $\displaystyle \vec{B}_{i,j}(s):=\vec{S}_{i,j}\left ([(1+5s)/3]\right )\setminus \vec{S}_{i,j}(s).$ (48)

In (48), $[\cdot ]$ stands for rounding to the nearest integer. For $s\bmod3=1$no rounding takes place. Figure 4 shows a hypothetical source region $\vec{S}_{i,j}(1)$ surrounded by the related background region $\vec{B}_{i,j}(1)$ from (48).
  \begin{figure}
\includegraphics[width=8cm]{2525f4.eps}\par\end{figure} Figure 4: The $3\times 3$ pixel source region $\vec{S}_{i,j}(1)$ surrounded by the 16 pixel background region $\vec{B}_{i,j}(1)$. The central pixel is located at (i,j)
Open with DEXTER

The exposure time histogram ${\cal T}$ has the binning from (46). Figure 5 shows an example of this histogram for the PSPC detector.

  \begin{figure}
\par\includegraphics*[bb=105 315 500 725,angle=0,scale=0.40]{ml_fig_5.ps}\end{figure} Figure 5: The $512\times 512$ PSPC exposure time histogram ${\cal T}$
Open with DEXTER

To account for extended sources, the dilatations $s:=2^d,\,d=0(1)D$, in (48) are investigated in a multi-grid implementation from fine to coarse. The highest dilatation exponent D can be chosen appropriately. This means: only translations (ks,ls) with natural k,l are possible[*].

Does the central pixel (i,j) belong to a source region? In order to lend this question a statistically testable form, we formulate a null hypothesis in terms of the spatial count rates $\lambda_{\vec{S},i,j},\lambda_{\vec{B},i,j}$ from (45) but without the spatial integration in the suffix regions from (48),

 \begin{displaymath}H_0 :\ \hbox{The spatial count rates are equal, $\lambda_{\vec{S},i,j}=\lambda_{\vec{B},i,j}$ .}
\end{displaymath} (49)

Consider any region $\vec{Q}$ in sky space. Partition it into a hypothetical source region $\vec{S}$ and a hypothetical background region $\vec{B}$.

Let $\Lambda(\vec{S}),\Lambda(\vec{B})$ be the related mean counts from (17),(45). According to (P1),(P2) from (16), the probability $p(n_{\vec{S}}\mid n,H_0)$ to observe $n_{\vec{S}}$ counts in the region $\vec{S}$ conditional upon $n:=n_{\vec{S}}+n_{\vec{B}}$ being fixed is binomial. This probability and the cumulative probability, $P(n_{\vec{S}}\mid n,H_0)$, to observe $n_{\vec{S}}$ or more source counts are given by

 
    $\displaystyle p(n_{\vec{S}}\mid n,H_0):={n\choose n_{\vec{S}}}\,{p_{\vec{S}}}^{n_{\vec{S}}}\cdot {p_{\vec{B}}}^{n-n_{\vec{S}}},$  
    $\displaystyle P(n_{\vec{S}}\mid n,H_0):=\sum_{k=n_{\vec{S}}}^{n}p(k\mid n,H_0),$  
    $\displaystyle p_{\vec{S}}:={\Lambda(\vec{S})\over{\Lambda(\vec{S})+\Lambda(\vec{B})}},$  
    $\displaystyle p_{\vec{B}}:={\Lambda(\vec{B})\over{\Lambda(\vec{S})+\Lambda(\vec{B})}}\cdot$ (50)

With (17), the probabilities from (50) applied to the pair of regions $\vec{S}:=\vec{S}_{i,j}(s),\,\vec{B}:=\vec{B}_{i,j}(s)$ become when dropping three arguments for notational convenience,
 
    $\displaystyle P_{\vec{S},i,j}(s):=P(N_{\vec{S},i,j}(s)\mid N_{i,j}(s),H_0)_{{\Large \vert}S=S_{i,j},B=B_{i,j}},$  
    $\displaystyle p_{\vec{S},i,j}(s):={T_{\vec{S},i,j}(s)\over{T_{i,j}(s)}},\ N_{i,j}(s):=N_{\vec{S},i,j}(s)+N_{\vec{B},i,j}(s),$  
    $\displaystyle p_{\vec{B},i,j}(s):={T_{\vec{B},i,j}(s)\over{T_{i,j}(s)}},\ T_{i,j}(s):=T_{\vec{S},i,j}(s)+T_{\vec{B},i,j}(s).$ (51)

We select an appropriate error probability $P_0:=\exp[-L_0]$ parameterized by[*] L0>0. If the counts $N_{\vec{S},i,j}$ are large enough so that the probability, $P_{\vec{S},i,j}(s)$, to observe $N_{\vec{S},i,j}$ or more counts in the source square $\vec{S}_{i,j}(s)$ satisfies

 \begin{displaymath}P_{\vec{S},i,j}(s)\le P_0
\end{displaymath} (52)

then the central pixel (i,j) is loosely referred to as source pixel at detection likelihood L0. This is the technical term for the quantitative degree of source existence in the final source list used in many ROSAT publications, see Voges et al. (1999) for instance. The likelihood sequence increases strictly with the number $n_{\vec{S}}$ of observed source counts and decreases with increasing source probability  $p_{\vec{S}}$,
 
    $\displaystyle 0:=L_0<L_1<...<L_{n-1}<L_n:=-n\ln(p_{\vec{S}}),$  
    $\displaystyle L_k:=-\ln[P(k\mid n,H_0)],\qquad 0\le k\le n.$ (53)

Figure 6 shows for the source probabilities $p_{\vec{S}}=0.1(0.1)0.9$ the likelihood sequences for n=1000 total counts. For less source counts than expected, i.e. for $n_{\vec{S}}<np_{\vec{S}}$, the likelihoods Lk stay about in the interval $[0,\ln(2)]$. Of practical interest is the likelihood interval of say [5,20].
  \begin{figure}
\par\includegraphics*[angle=-90,width=8.2cm,clip]{ml_fig_6.ps}\end{figure} Figure 6: The likelihood sequences Lk for $p_{\vec{S}}=0.1(0.1)0.9$ (from left to right) for n=1000 total counts in the likelihood interval [0,1000]
Open with DEXTER

The evaluation of the binomial cumulative probability $P(k,\mid n,H_0)$ from (53) becomes impractical for large count numbers n and k. The univariate version of the Central Limit Theorem supplies us with the classical asymptotical approximation for $n\to\infty$ (with the notorious slow convergence speed $1/\sqrt{n}$) in terms of the univariate normal density. Under the change $t^2\to t$ of the independent variable t, the approximation can be rewritten in terms of the density $\chi_1(t)$, of the $\chi^2$ distribution with one degree of freedom. This approximation is used in the central region $0\le n_{\vec{S}}-np_{\vec{S}}\le 3\sqrt{np_{\vec{S}} p_{\vec{B}}}$. Additionally, a lower bound sequence $\underline L_k$ of excellent approximation quality is used. Figure 7 gives the error sequence $\Delta L_k:=L_k-l_k$ with lk being the said approximation.

For high counts in ${\cal C}_{\rm H}$ (see (57)), the likelihoods Li,j are approximated by the tail probability of the $\chi^2_1$ distribution with one degree of freedom,

 
Li,j := $\displaystyle -\ln\left [\int_{X_{i,j}^2}^\infty\chi_1(t){\rm d}t\right ] ,\quad \chi_1(t):={{\rm e}^{-t/2}\over{\sqrt{2\pi t}}},$  
Xi,j := $\displaystyle {{N_{\vec{S},i,j}T_{i,j}-N_{i,j}T_{\vec{S},i,j}}\over{\sqrt{N_{i,j}T_{\vec{S},i,j}T_{\vec{B},i,j}}}}\ge0.$ (54)

The function $L(X^2):=-\ln\left [\int_{X_{i,j}^2}^\infty\chi_1(t){\rm d}t\right ]$ is tabulated in EXSAS for X2=0(0.1)X20. Beyond the upper bound X20:=39.9, the lower bound L1(X2) in
 
    $\displaystyle L_1\left (X^2\right )\le L\left (X^2\right )\le L_2\left (X^2\right ),\quad X^2>0,$  
    $\displaystyle L_1(X^2):= {1\over2}\left (X^2+\ln\left ({\pi\over2}X^2\right )\right ),$  
    $\displaystyle L_2(X^2):= {1\over2}\left (X^2+\ln\left ({\pi\over2}\left (X+{1\over X}\right )^2\right )\right ).$ (55)

is used. The error is for $X^2\ge40$ less than 0.0007.
  \begin{figure}
\par\includegraphics*[angle=-90,width=8.2cm,clip]{ml_fig_7.ps}\end{figure} Figure 7: The likelihood error sequence $\Delta L_k$ for $p_{\vec{S}}=0.36$ and n=1000 counts
Open with DEXTER

Each pixel position (i,j) satisfying (52) becomes a point of the support of the (first) likelihood histogram

 
    $\displaystyle {\cal L}_1(L_0):=\left \{\lambda_{i,j} :\ (i,j)\in{\cal S}\right \},$  
    $\displaystyle \lambda_{i,j}:=\left\{\begin{array}{*2{l}} L_{i,j}, & L_{i,j}\ge L_0,\\
0, & L_{i,j}<L_0, \end{array}\right.$  
    $\displaystyle L_{i,j}:=-\ln\left [P_{\vec{B},i,j}(s)\right ].$ (56)

In order to adjust the approximation error properly later, select $N_0\in\vec{N}$ large enough. The observed histogram count space ${\cal C}$ is partitioned into a low count region ${\cal C}_{\rm L}$ and the complementary high count region ${\cal C}_{\rm H}$ by
 
    $\displaystyle {\cal C}_{\rm L}:=\{(n_{\vec{S}},n_{\vec{B}})\in{\cal C} :\ 0\le n_{\vec{S}}+n_{\vec{B}}<N_0\},$  
    $\displaystyle {\cal C}_{\rm H}:=\{(n_{\vec{S}},n_{\vec{B}})\in{\cal C} :\ n_{\vec{S}}+n_{\vec{B}}\ge N_0\},$  
    $\displaystyle {\cal C}:=\{(n_{\vec{S}},n_{\vec{B}})\vec{Z}^2 :\ n_{\vec{S}},\,n_{\vec{B}}\ge0\}.$ (57)

In EXSAS, N0=100 was chosen. In ${\cal C}_{\rm L}$, the likelihoods Li,j from (56) are evaluated. The histogram ${\cal L}_1(L_0)$ is algorithmically decomposed into a union of connected components ${\cal L}_{\rm s}(L_0)$. For each component, the maximizer pixel position $(i_{\rm s},j_{\rm s})$ is found. The center of gravity in a small neighborhood, ${\cal L}_{\rm s}'(L_0)$, of the maximizer is the estimate of the source position,
 
    $\displaystyle {\cal L}_1(L_0):=\cup_{s=1}^{S_1}{\cal L}_{1,{\rm s}}(L_0),$  
    $\displaystyle (i_{\rm s},j_{\rm s}):=\arg\left [\max_{(i,j)\in{\cal L}_{1,{\rm s}}(L_0)}\lambda_{i,j}\right ] ,$  
    $\displaystyle X_{\rm s}:={1\over\lambda_{\rm s}'} \sum_{(i,j)\in{\cal L}_{1,{\r...
...s}:={1\over\lambda_{\rm s}'}\sum_{(i,j)\in{\cal L}_{1,{\rm s}}'}j\lambda_{i,j},$  
    $\displaystyle \lambda_{\rm s}':=\sum_{(i,j)\in{\cal L}_{1,{\rm s}}'}\lambda_{i,j}.$ (58)

This way, a first source list, ${\cal S}_1(L_0)$, of S1 source candidate positions in sky space at detection likelihood L0 is found,

 \begin{displaymath}{\cal S}_1(L_0):=\left \{(X_{\rm s},Y_{\rm s})\right \}_{s=1}^{S_1}.
\end{displaymath} (59)

Besides positions and detection likelihoods Li,j, 13 additional source characteristics are incorporated in this table - the intermediate result after the first step. See Voges et al. (1999) for a specimen of this list.

  
4.1.3 Background domain definition

In the second step, the background estimate $\hat\Lambda_{\rm B}(X,Y)$ is found. S:=S1 circular disks around the source positions with radii $R_{\rm s}$ and a union $\vec{U}$ thereof serve to define the background domain $\vec{B}$,

 
    $\displaystyle \vec{D}(X_{\rm s},Y_{\rm s},R_{\rm s}):=\left \{(X-X_{\rm s})^2+(Y-Y_{\rm s})^2\le R_{\rm s}^2\right \},$  
    $\displaystyle R_s:=\rho \hbox{\it FWHM}(\epsilon,A),$  
    $\displaystyle \vec{U}:=\cup_{s=1}^S\vec{D}(X_{\rm s},Y_{\rm s},R_{\rm s}),$  
    $\displaystyle \vec{B}:=\vec{X}\times\vec{Y}\setminus \vec{U}.$ (60)

For the radii $\rho:=2.5$ is standardly chosen. The FWHM function, explained in Sect. 4.2, is evaluated at the disk centre off-axis angle $\epsilon$ and photon amplitude A. The photons falling within the background domain $\vec{B}$ constitute the subsample ${\cal P}_0$ and are defined to be background photons. Clearly, the error probabilities that a source photon stems from the background and vice versa are expected highest near the disk boundaries. A user-defined trade-off is possible by adjusting the radii $R_{\rm s}$.

Figure 8 shows an example for the background photons (black dots) corresponding to ${\cal P}_0$, introduced in (2) in binned form ${\cal N}_0$.

  \begin{figure}
\par\includegraphics*[bb=105 315 500 725,angle=0,scale=0.40]{ml_fig_8.ps} \end{figure} Figure 8: Censored $512\times 512$ background. Disks around source positions are cut out
Open with DEXTER

  
4.1.4 Background count distribution

A bivariate least square cubic spline surface $\hat\Lambda_{\rm B}(X,Y)$ over $\vec{D}$ solving the optimization problem,

 \begin{displaymath}\sum_{(i,j)\in\vec{B}}w_{i,j}\left [\Lambda_B(X_i,Y_j)-n_{i,j}\right ]^2\to\min,\quad w_{i,j}:=T_{i,j}
\end{displaymath} (61)

is fitted to the observed count histogram ${\cal N}$ restricted to the background domain $\vec{B}$. This is the proper estimation step. The pixel centres (Xi,Yj) used in (61) are rarefied, a coarser binning is applied. This is adequate for the background component postulated to exhibit no abrupt variations.

The spline surface $\hat\Lambda_{\rm B}(X,Y)$ is the adopted count distribution estimate over the sky field. So, we assume the $\Lambda_{{\rm B},i,j}$ in (45) known.

  
4.1.5 Refined source number and location estimation

Knowing the background estimate $\hat\Lambda_{\rm B}(X,Y)$, the source determination can be redone. This is the third step. From $\hat\Lambda_{\rm B}(X,Y)$ the background histogram ${\cal N}_{\rm B}$ of the same format as ${\cal N}$ from (46) is formed. The background regions and background counts are redefined,

 
    $\displaystyle {\cal N}_{\rm B}:=\left \{\,n_{\vec{B},i,j} :\ (i,j)\vec{Z}^2_{I,J}\right \},$  
    $\displaystyle N_{\vec{B},i,j}(s):=\sum_{(k,l)\in\vec{B}_{i,j}(s)}n_{\vec{B},k,l},$  
    $\displaystyle \vec{B}_{i,j}(s):=\vec{S}_{i,j}(s).$ (62)

With the remaining quantities from (48), the source position estimation as described in Sect. 4.1.2 is repeated. The outcome is a second list, ${\cal L}_2(L_0)$, of additional S2 source candidate positions in sky space. The second list is merged with the first list ${\cal L}_1(L_0)$ from (58) into a list ${\cal L}(L_0)$ of S:=S1+S2 sources in total
 
    $\displaystyle {\cal L}_2(L_0):=\left \{(X_{\rm s},Y_{\rm s})\right \}_{s=1}^{S_2},$  
    $\displaystyle {\cal L}(L_0):={\cal L}_1(L_0)\cup{\cal L}_2(L_0).$ (63)

With finally S source positions from (63), the corresponding disks $\vec{D}$ from (59) are determined. The photons in these disks constitute the subsamples ${\cal P}_{\rm s},\;s=1,\,...,S$, mentioned in (15); they are taken as source photons - the input data for MLE in the narrow sense.

  
4.2 The point source model

The image of a celestial point source in a fixed celestial direction at infinite distance, is, by definition, the point spread function (PSF). In the case of the ROSAT detectors, the PSFs are parameterized by the (a) off-axis angle $\epsilon$ and (b) the photon detector amplitude A. The off-axis angle $\epsilon$ is the angle which subtends the PSF centre with the instrument's optical axis.

As a compromise between practicability (in terms of computing time) of the resulting MLE procedure and model precision, a bivariate normal circular symmetric distribution density, p, defined over the whole sky space $\vec{S}$ is used to represent the measured point spread function in EXSAS for the MLE of single X-ray point source parameters. In statistical interpretation, the probability dP to find a photon in sky space in a box around the central point (X,Y) with infinitesimal side lengths dX and dY when a photon arrived at the point (x,y) in the detector plane causing a detector amplitude A is d $P:=p(X,Y\mid X_{\rm S},Y_{\rm S},\nu_{\rm S},A)\,{\rm d}X{\rm d}Y$. The spread parameter $\nu_{\rm S}$ appearing in (64) will be explained in the sequel.

More precise point spread function models of the ROSAT instruments are used in EXSAS for other analysis purposes, see Boese (2000). For instance, the distribution tail in the measured PSF is higher than that of (64). The celestial local mean source count rate $\lambda_{\rm S}^{\rm c}$ in (64) entails the count rate $\lambda_{\rm S}$ in sky space

 
    $\displaystyle \lambda_{\rm S}(t,X,Y,A):=\lambda_{\rm S}^{\rm c}(t,X_{\rm S},Y_{\rm S},A)
\cdot V[A,\epsilon(x,y)]$  
    $\displaystyle ~~~~~~~~~~~~~~~~~~~~~ \times p(X,Y\mid X_{\rm S},Y_{\rm S},
\nu_{\rm S},A),$  
    $\displaystyle p(X,Y\mid X_{\rm S},Y_{\rm S},\nu_{\rm S},A):={1\over2\pi\nu(x,y,A)}$  
    $\displaystyle ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ \times {\rm e}^{-[(X-X_{\rm S})^2
+(Y-Y_{\rm S})^2]/\nu(x,y,A)},$  
    $\displaystyle \nu(x,y,A):=2\sigma^2[A,\epsilon(x,y)]+\nu_{\rm S},\quad \nu_{\rm S}\ge0,$  
    $\displaystyle \epsilon(x,y):=\sqrt{x^2+y^2}.$ (64)

Notice the joint presence of sky coordinates (X,Y) and detector coordinates (x,y) in (64). The dependence of p on the detector coordinates is not denoted. The vignetting factor V will be explained in the next subsection.

To account for extended sources, the spread parameter $\nu(x,y,A)$, in fact a double variance parameter, containing additively the nonnegative source extent parameter $\nu_{\rm S}\ge0$, has been introduced. Though the source parameter $\nu_{\rm S}$ does not estimate the full morphology[*] of extended sources, it offers the azimuthally averaged part of the morphology as a parameter of source extendedness. In operational terms, a radial two-dimensional Gaussian photon distribution is fitted to an extended source photon distribution.

  
4.3 The vignetting

Apart from the form of the distribution density $p(X,Y\mid X_{\rm S},Y_{\rm S},\nu_{\rm S},A)$, two further functions characterize the instrument. One of them is the instrument's vignetting function, $V[A,\epsilon]$, fulfilling

 \begin{displaymath}0\le V[A,\epsilon]\le V[A,0]:=1.
\end{displaymath} (65)

The vignetting function describes the instrument's relative collection efficiency as a function of photon detector amplitude A and off-axis angle $\epsilon$. The reference value is the value of highest efficiency, found in on-axis position, see (65). In a stochastical interpretation, it describes the loss of photon counts with increasing angular off-axis distance $\epsilon(x,y)$ from (64) for a given photon detector amplitude A.

A prominent property of the Poisson process is that it is preserved under random sampling. We view the (non-)vignetting as a realization of random sampling, and the infalling local mean source count rate $\lambda(t,X,Y,E)$ diminishes under vignetting from the on-axis maximal count rate $\lambda^0(t,X,Y,A)$ to the lower vignetted count rate $\lambda^{\rm V}(t,X,Y,A)$,

 \begin{displaymath}\lambda_{\rm S}^{\rm V}(t,X,Y,A):=\lambda_{\rm S}^0(t,X,Y,A)\cdot V\left [A,\epsilon(x,y)\right ].
\end{displaymath} (66)

The second function, $\sigma^2(A,\epsilon)>0$, appearing in (64) is the variance of distribution density p. It characterizes the spread of the point spread function in dependence of the source photon amplitude A and the off-axis angle $\epsilon$. More precisely, $r_{1/2}:=\sigma(A,\epsilon)\cdot \sqrt{\ln(4)}=\sigma(A,\epsilon)\cdot 1.177...$ is the median radius, i.e. the radius of the circular disk around the maximizer $(X_{\rm S},Y_{\rm S})$ of p in which $50\%$ of all photons fall. Into the larger disk with radius $3\sigma$ and the same centre fall $98.9\%$ of all photons.

According to the source model (64), we have four single-source parameters $X_{\rm S},Y_{\rm S},\lambda_{\rm S}^V,\nu_{\rm S}$, representing in turn: the source position $(X_{\rm S},Y_{\rm S})$, vignetted source count rate, and source extent parameter. Depending on the context, we regard either $\lambda^{\rm V}_{\rm S}$ or $\lambda^0_{\rm S}$ from (66) as the basic count rate parameter. Only the first one is observable, and the one of eventuel physical interest is $\lambda^{\rm c}_{\rm S}$.

  
4.4 The time-dependent point spread function

The optical axis moves across the Sky in time. Correspondingly, a celestial X-ray source in a fixed celestial direction traces in the detector plane a source trajectory ${\cal D}$ during the observation interval $\vec{T}$. The motion ${\cal D}$ in time-parameterized form and parameterized by the off-axis-angle $\epsilon$, viewed as a random variate with distribution $T(\epsilon),\,0\le\epsilon\le\bar\epsilon$, for the individual trajectory ${\cal D}$ are

 
    $\displaystyle {\cal D}:\ \left (x_{\rm S}(t),y_{\rm S}(t)\right )\in\vec{F},\qquad t\in\vec{T},$  
    $\displaystyle {\cal D}:\ T(\epsilon):=\vert\left \{t\in\vec{T} :\ x_{\rm S}^2(t)+y_{\rm S}^2(t)\le\epsilon^2\right \}\vert/T.$ (67)

By definition (67), $T(\epsilon)$ is the fraction of the observation time T the trajectory ${\cal D}$ stays in or on the circle of radius $\epsilon$. The upper limit $\bar\epsilon$ of the range of $\epsilon$ is the radius of the field of view $\vec{F}$. Let $t(\epsilon):=T'(\epsilon)$ be the density of the distribution $T(\epsilon)$.

As mentioned in Sect. 2.3.3, in pointing mode, the motion (67) is nominally constant apart from a standardly applied deliberate small wobbling motion of the optical axis and superimposed by a still smaller stochastic motion due to stochastic forces. So, $t(\epsilon)$ is expected to be unimodal with a mode position near $(x_{\rm S},y_{\rm S})$, the source position $(X_{\rm S},Y_{\rm S})$ in detector coordinates. In survey mode, the source under consideration executes (under the ROSAT survey scanning scheme) a series of neighbouring parallel tracks in the field of view $\vec{F}$. Hence, a density $t(\epsilon)$ close to the uniform density $t_{\rm u}(\epsilon):=1/\vert\vec{F}\vert$ is expected. To save execution time within MLE in EXSAS, no off-axis histogram approximating $t(\epsilon)$ has been constructed and used. Equally, the individual density $t(\epsilon)$ was not explicitly replaced by more general and thus less close approximations to $t(\epsilon)$. The underlying rationale will be discussed.

Substituting the decomposition (43) into (31) gives the likelihood functional L from (31) in the parameterized form

 
    $\displaystyle L(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}):=-\Lambda_{\rm...
...rge [\lambda_{\rm B}(t_m,X_m,Y_m,A_m)+\lambda_{\rm S}(t_m,X_m,Y_m,A_m)\Large ],$  
    $\displaystyle \Lambda_{\rm B}:=\int_{\vec{P}}\lambda_{\rm B}(t,X,Y,A)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}A,$  
    $\displaystyle \Lambda_{\rm S}:=\int_{\vec{P}}\lambda_{\rm S}(t,X,Y,A)\,{\rm d}t{\rm d}X{\rm d}Y{\rm d}A.$ (68)

In (68), the detector coordinates $(x,y):=(x_{\rm S}(t),y_{\rm S}(t))$ are taken along the source trajectory ${\cal D}$ from (67). This leads to a time-dependent off-axis angle $\epsilon(t):=\sqrt{x_{\rm S}^2(t)+y_{\rm S}^2(t)}$ and thus to a time-dependent PSF density p in (64) explaining the first argument t in $\lambda_{\rm S}(t,X,Y,A)$. The integration with respect to time t in (68) can be replaced by the integration with respect to the off-axis angle, ${\rm d}t=t(\epsilon){\rm d}\epsilon$ over its range $[0,\bar\epsilon]$.

Conceptually, the number $\Lambda_{\rm B}$ of expected background photons in (68) is constant with respect to all four source parameters; to source position $(X_{\rm S},Y_{\rm S})$, mean local source count rate $\lambda_{\rm S}$, and to the source extent parameter $\nu_{\rm S}\ge0$. This is also in practice so, save for cases in which the background region $\vec{B}$ from (60) is small compared to the field of view $\vec{F}$. The (real) number $\Lambda _{\rm S}$ of expected source photons in (68) depends in source positions near the boundary $\partial \vec{F}$ of the field of view $\vec{F}$ remarkably on the source parameters - as the following arguments clarify.

Assuming a continuous local mean count rate $\lambda_{\rm S}$ with respect to all four arguments, the mean value theorem applied to the integral $\Lambda _{\rm S}$ gives with an average mean local source count rate $\tilde\lambda_{\rm S}$ (to be determined) and a mean off-axis angle $\tilde\epsilon$ (to be determined) the exact representation

 \begin{displaymath}\Lambda_{\rm S}=:\tilde\lambda_{\rm S}\cdot \vert\vec{F}\vert\cdot \Delta A\cdot T,\qquad \epsilon=\tilde\epsilon.
\end{displaymath} (69)

In (69), $\vert\vec{F}\vert$ is the area of the field of view and $\Delta A:=\bar A-\underline A$ the width of the amplitude space used. In the case of strong sources, both T and $\Delta A$, can be minimized by selecting photons with arrival times and amplitudes close together with the consequence $\tilde\lambda_{\rm S}\approx\lambda_{\rm S}$. For weak sources, an averaging in (68) is in effect.

Under the trival binning (42) with respect to time and amplitude, no resolution with respect to these variables is achieved and only the time- and amplitude-averaged parts $\lambda_{\rm S}(X,Y)$ of a source count distribution $\lambda_{\rm S}(t,X,Y,A)$ can be estimated - a limitation without detrimental consequences when one aims at celestial directions and fluxes only.

We reverse the argument when we state: for

 
  $\textstyle \hbox{(S1)}\ \,$ $\displaystyle \hbox{stationary sources, $\lambda_{\rm S}(t,X,Y,A):=\lambda_{\rm S}(X,Y,A)$ ,}$  
  $\textstyle \hbox{(S2)}\ \,$ $\displaystyle \hbox{sources with uniformly distributed amplitudes,}$  
    $\displaystyle \lambda_{\rm S}(t,X,Y,A):=\lambda_{\rm S}(t,X,Y),$  
  $\textstyle \hbox{(S3)}\ \,$ % latex2html id marker 10961
$\displaystyle \hbox{celestial point sources of~(\ref{e21}) with $\lambda_{\rm S}$\space from~(\ref{e15}),}$ (70)

Eq. (69) holds with the spatially averaged count rate

 \begin{displaymath}\tilde\lambda_{\rm S}:={1\over{\vert\vec{F}\vert}}\int_{\vec{F}}\lambda_{\rm S}(X,Y){\rm d}X{\rm d}Y.
\end{displaymath} (71)

The constancy of $\lambda_{\rm S}$ in (S1),(S2) with respect to time t and amplitude A leads immediately to $\tilde\lambda_{\rm S}$ in (71). The normalized form (35) is less favourable in cases when several components of the total count rate $\lambda$ in (43) are discerned.

For a source possessing all three properties (S1) to (S3), i.e. for stationary amplitude-uniform point sources, one arrives with an unknown mean off-axis angle $\tilde\epsilon$ and a constant on-axis source count rate $\lambda_{\rm S}^0$ at

 
$\displaystyle \Lambda_{\rm S}$ := $\displaystyle \lambda_{\rm S}^0\cdot T\int_{\vec{F}}V[A,\tilde\epsilon]\cdot p(X,Y\,\vert\,X_{\rm S},Y_{\rm S},\nu_{\rm S},A)\,{\rm d}X{\rm d}Y,$  
  = $\displaystyle \lambda_{\rm S}^0\cdot T\cdot V\left [A,\tilde\epsilon\right ]\cdot \left [1-\o_{\vec{F}}\left (X_{\rm S},Y_{\rm S},\nu_{\rm S},A\right )\right ].$ (72)

The approximation $\tilde\epsilon:=\epsilon(x_{\rm S},y_{\rm S})$ in (72) is deemed to be reasonably accurate in both observation modes, survey and pointing.

It is justified to ignore censoring by the field of view $\vec{F}$ when the source position $(X_{\rm S},Y_{\rm S})$ lies in the central part of the field of view $\vec{F}$. The fraction of photons not falling into the field of view,

 
    $\displaystyle \o_{\vec{F}}\left (X_{\rm S},Y_{\rm S},\nu_{\rm S},A\right ):=$  
    $\displaystyle ~~~~~~~~~~~~~~~~\int_{\vec{R}^2\setminus \vec{F}}p(X,Y\mid X_{\rm S},Y_{\rm S},\nu_{\rm S},A)\,{\rm d}X{\rm d}Y\approx0$ (73)

is disregarded. Or, seen as a requirement for the point spread function and source positions, source positions must be bounded away from the boundary $\partial \vec{F}$, say farther than $3\sigma$ from the boundary $\partial \vec{F}$ of the field of view.

Under the approximations (69),(73) for stationary amplitude-uniform point sources, it follows the log-likelihood representation

 
    $\displaystyle L(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}):=-\Lambda_{\rm B}-\lambda_{\rm S}\cdot T\cdot V[A,\epsilon(x_{\rm S},y_{\rm S})]$  
    $\displaystyle ~ +\sum_{m=1}^M \ln\Large [\lambda_B(t_m,X_m,Y_m,A_m)$  
    $\displaystyle ~ +\lambda_{\rm S}\cdot V[A_m,\epsilon(x_m,y_m)]\cdot p\left (X_m,Y_m\mid X_{\rm S},Y_{\rm S},\nu_{\rm S},A\right )\Large ].$ (74)

In (74), the on-axis source count rate $\lambda_{\rm S}^0$ from (66) is again denoted by $\lambda_{\rm S}$ for notational convenience[*]. In a more parsimonious notation and evaluating the vignetting factor at a mean amplitude $\tilde A$, (74) reads in terms of a vignetted observation time TV
 
    $\displaystyle L(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}):= -\Lambda_{\r...
...-\lambda_{\rm S}
T^V+\sum_{m=1}^M\ln\Large [\lambda_{{\rm B},m}+\lambda_{\rm S}$  
    $\displaystyle ~~~~~~~~~~~~~~~~~~~~~~~~~~\times V[A_m,\epsilon(x_m,y_m)]\cdot p_m\left(X_{\rm S},Y_{\rm S},\nu_{\rm S}\right)\Large ] ,$  
    $\displaystyle T^{\rm V}:= T\cdot V[\tilde A,\tilde\epsilon],$  
    $\displaystyle \lambda_{{\rm B},m}:= \lambda_{\rm B}(t_m,X_m,Y_m,A_m),$  
    $\displaystyle p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S}):= p^*(X_m,Y_m\,\vert\,X_{\rm S},Y_{\rm S},\nu_{\rm S},A_m).$ (75)

In (75), p* means that p is to be evaluated at the off-axis angle $\epsilon(x_m,y_m)$ and not at $\epsilon(x_{\rm S},y_{\rm S})$ as demanded by (64). Under this convention, $\nu$ is a function of $\nu_{\rm S}$ but not of $X_{\rm S},Y_{\rm S}$.

The mean off-axis angle $\tilde\epsilon$ is taken constant with respect to all four source parameters. All that follows will be based on this form of the log-likelihood function L. The form of L in (75) or (74) is the one used in EXSAS.

  
5 The maximum likelihood equations

R. A. Fisher coined in 1921 the term "likelihood'' and in 1922 the "maximum likelihood estimate''. Readers interested in the historical development are referred to Aldrich (1977), Edwards (1997), Hald (1999), and the literature cited therein.

Having specified the distributional form of the underlying source count distribution density in (75) up to the source parameter vector

 \begin{displaymath}\vec{s}:=\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )=:\left (s_1,s_2,s_3,s_4\right ),
\end{displaymath} (76)

it remains to achieve an estimate $\hat{\vec{s}}:=\left (\hat X_{\rm S},\hat Y_{\rm S},\hat\lambda_{\rm S},\hat\nu_{\rm S}\right )$ for the unknown parent counterpart $\vec{s}$ from (76) given the photon time series ${\cal P}$ from (1). As already said, for the purposes of this section, we assume a stationary monoenergetic point source.

Among several point estimation procedures, we chose estimation via the Maximum Likelihood Principle. The latter directs us to choose $\hat{\vec{s}}$ so that with L from (75) holds

 \begin{displaymath}L\left (\hat X_{\rm S},\hat Y_{\rm S},\hat\lambda_{\rm S},\ha...
...left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ).
\end{displaymath} (77)

The source parameter space $\vec{S}$ in (77) is defined by
 
$\displaystyle \vec{S}$ := $\displaystyle \left\{\left(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right)\vec{R}^4:\ (X_{\rm S},Y_{\rm S})\in\vec{F},\ \lambda_{\rm S}\ge0,\right.$  
    $\displaystyle \left.\nu_{\rm S}\ge0\right\}.$ (78)

Two cases are possible: either, the maximizer of L lies in the interior of $\vec{S}$ or on its boundary $\partial \vec{S}$. The first case is viewed as the generic one and assumed further on.

In view of the differentiability of L with respect to $\vec{s}$, the equations for $\hat{\vec{s}}$

 
$\displaystyle {\partial L(\hat{\vec{s}})\over{\partial X_{\rm S}}}$ = $\displaystyle 0,\quad{\partial L(\hat{\vec{s}})\over{\partial Y_{\rm S}}}=0,\qu...
...mbda_{\rm S}}}=0,\quad{\partial L(\hat{\vec{s}})\over{\partial \nu_{\rm S}}}=0,$  
$\displaystyle H(\hat{\vec{s}})$ < 0. (79)

are necessary. When the estimate $\hat{\vec{s}}$ satisfies (79), then $\vec{s}$ in (77) runs only through all zeros of the gradient vector ${\partial L/\partial \vec{s}}$ with components as given in the first line of (79) and obeying the negative definiteness in (79). The Hessian $4\times4$ matrix $H(\vec{s})$ of the log-likelihood function L in (79) has the elements

 \begin{displaymath}H_{j,k}(\vec{s}):={\partial ^2 L(\vec{s})\over{\partial s_j\partial s_k}},\qquad j,k=1,2,3,4.
\end{displaymath} (80)

As usual, $H(\hat{\vec{s}})<0$ means that $H(\vec{s})$ is negative definite at the point $\vec{s}:=\hat{\vec{s}}$ in source parameter space $\vec{S}$. The four equations along with the negative definiteness in (79) will be called ML equations.

By Sylvester's famous algebraic criterion, we know that a real $4\times4$ matrix is negative definite if and only if all its four principal minors d1 to d4 are negative,

 
    $\displaystyle d_j(\hat s)<0,$  
    $\displaystyle d_j(\hat s):={\rm det}\left [H^{(j)}(\hat s)\right ],\qquad j=1,2,3,4.$ (81)

In (81), $H^{(j)}(\hat{\vec{s}})$ is the principal matrix of order j of $H(\vec{s})$ evaluated at the maximizer $\vec{s}=\hat{\vec{s}}$,
 
    d1:=H1,1,  
    $\displaystyle d_2:={\rm det}\left [\begin{array}{cc} H_{1,1},& H_{1,2}\\  H_{2,1 },& H_{2,2}\end{array}\right ],$  
    $\displaystyle d_3:={\rm det}\left [\begin{array}{ccc} H_{1,1},& H_{1,2},& H_{1,...
...& H_{2,2},& H_{2,3}\\
H_{3,1},& H_{3,2},& H_{3,3}\end{array}\right ],\nonumber$  
    $\displaystyle d_4:={\rm det}[H].$ (82)

  
5.1 Maximizer at the boundary

We come briefly to the non-generic case. There may be special cases where the observation dictates a dedicated analysis effort. When $\hat{\vec{s}}$ is a boundary point of $\vec{S}$ at least one component of $\hat{\vec{s}}$ assumes a boundary value. The ML equations are to be altered accordingly,

 
$\displaystyle {\partial L\over{\partial \nu_{\rm S}}}$ < $\displaystyle 0,\qquad \hat\nu_{\rm S}=0,$  
$\displaystyle {\partial L\over{\partial \lambda_{\rm S}}}$ < $\displaystyle 0,\qquad \hat\lambda_{\rm S}=0,$  
$\displaystyle {\partial L\over{\partial r_{\rm S}}}$ < $\displaystyle 0,\qquad {\partial L\over{\partial t_{\rm S}}}=0,\qquad (\hat X_{\rm S},\hat Y_{\rm S})\in\partial \vec{F}.$ (83)

In (83), $\partial L/\partial r_{\rm S}$ means the inwardly directed radial direction derivative and $\partial L/\partial t_{\rm S}$ is the derivative in tangential direction. The latter direction is that of the boundary curve $\partial \vec{F}$ at the boundary point $(\hat X_{\rm S},\hat Y_{\rm S})$[*].

Sources of interest lie seldom near the boundary of the field of view $\vec{F}$. However, a vanishing source extent parameter $\hat\nu_{\rm S}$ is not uncommon. The same applies to a vanishing estimate for the source count estimate $\hat\lambda_{\rm S}$ in weak source count enhancements, cf. Sect. 6.1. The second condition in (83) is equivalent to $\partial L(0)/\partial \lambda_{\rm S}<0$ and is fulfilled for a high background or small number of source counts within the observation time.

In both cases, the corresponding inequalities in (83) are to be satisfied. Estimations for sources near the boundary $\partial \vec{F}$ of the field of view become difficult because of data censoring and tend to be unreliable and are not pursued any further here.

  
5.2 Gradient and Hessian of the log-likelihood

To solve the ML Eqs. (79) and to assign error bars, all partial derivatives of first and second order are needed. We consider $\tilde T:=T\cdot V[A,\tilde\epsilon]$ as a constant with respect to the point source parameters $X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}$, and arrive with the denotations of (75) and $\nu_m:=\nu(x_m,y_m,A_m)$ as well as $V_m:=V[A_m,\epsilon(x_m,y_m)]$ at the representation for the four first partial derivatives of L,

 
$\displaystyle {\partial L\over{\partial X_{\rm S}}}$ := $\displaystyle 2\lambda_{\rm S}\sum_{m=1}^M {V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\r...
..._{\rm S} V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S})}}{X_m-X_{\rm S}\over{\nu_m}},$  
$\displaystyle {\partial L\over{\partial Y_{\rm S}}}$ := $\displaystyle 2\lambda_{\rm S}\sum_{m=1}^M {V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\r...
..._{\rm S} V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S})}}{Y_m-Y_{\rm S}\over{\nu_m}},$  
$\displaystyle {\partial L\over{\partial \lambda_{\rm S}}}$ := $\displaystyle -\tilde T+\sum_{m=1}^M {V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S})\...
...\lambda_{{\rm B},m}+\lambda_{\rm S} V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S})}},$  
$\displaystyle {\partial L\over{\partial \nu_{\rm S}}}$ := $\displaystyle \lambda_{\rm S}\cdot \sum_{m=1}^M {V_m p_m(X_{\rm S},Y_{\rm S},\n...
...{\lambda_{{\rm B},m}+\lambda_{\rm S} V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S})}}$  
  $\textstyle \phantom{:=}$ $\displaystyle \times{(X_m-X_{\rm S})^2+(Y_m-Y_{\rm S})^2-\nu_m\over{\nu_m^2}}\cdot$ (84)

In order to represent the second order derivatives in the separated formula block (93) concisely, we use a yet shorter notation with $V_m,\nu_m$ as before and
 
    $\displaystyle p_m:=p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S}),$  
    $\displaystyle \lambda_m:=\lambda_{{\rm B},m}+\lambda_{\rm S} V_m p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S}),$  
    $\displaystyle r_m^2:=(X_m-X_{\rm S})^2+(Y_m-Y_{\rm S})^2.$ (85)

In this notation we have for the 10 different elements $H_{j,k},\ j=1,2,3,4,\ j\le k$, of the symmetric Hessian of L the expressions in (93). Later in Sect. 6, formula (108) we shall need also the higher partial derivatives $\partial ^k L/\partial \lambda_{\rm S}^k,\ k>2$. From the entry for $\partial ^2 L/\partial \lambda_{\rm S}^2$ in the eighth line of (93), it follows in the notation of (85),

 \begin{displaymath}{\partial ^k L\over{\partial \lambda_{\rm S}^k}}:=(-1)^{k-1}(...
...{m=1}^M \left ({V_m p_m\over{\lambda_m}}\right )^k,\qquad k>1.
\end{displaymath} (86)

  
6 Statistical errors for the source parameter estimates

Two questions are fundamental for the physical interpretation of the source parameter estimates:

 
  $\textstyle \hbox{(a)}\quad$ $\displaystyle \hbox{Does the count enhancement really stem from}$  
    $\displaystyle \hbox{a celestial X-ray source or could it stem from}$  
    $\displaystyle \hbox{a haphazard count fluctuation?}$  
  $\textstyle \hbox{(b)}\quad$ $\displaystyle \hbox{When (a) is answered in favour of a real source}$  
    $\displaystyle \hbox{and its source parameters are estimated: how}$  
    $\displaystyle \hbox{large are the error bars?}$ (87)

In a statistical setting, (a) from (87) can be formulated as a pair of exclusive hypotheses in our adopted source model:
 
  $\textstyle \hbox{$H_0$ :}\quad$ $\displaystyle \hbox{A celestial X-ray source caused the count}$  
    $\displaystyle \hbox{enhancement, $\lambda_{\rm S}>0$ ,}$  
  $\textstyle \hbox{$H_1$ :}\quad$ $\displaystyle \hbox{No celestial X-ray source caused the count}$  
    $\displaystyle \hbox{enhancement, $\lambda_{\rm S}=0$ .}$ (88)

In the form of (88), problem (a) of (87) turned into a problem of statistical hypothesis testing. Using the associated Likelihood Ratio (appearing in the form of the difference of two log-likelihoods), we wish to decide the alternative (88). Within this method we have to evaluate two log-likelihoods associated to $H_0,\,H_1$,
 
    $\displaystyle \hat L_0:=L\left (\hat X_{\rm S},\hat Y_{\rm S},\hat\lambda_{\rm S},\hat\nu_{\rm S}\right ),$  
    $\displaystyle \hat{\hat L}_1:=L\left (\hat{\hat X}_{\rm S},\hat{\hat Y}_{\rm S},0,\hat{\hat\nu}_{\rm S}\right ).$ (89)

In (89), $\hat{\vec{s}}:=\left (\hat X_{\rm S},\hat Y_{\rm S},\hat\lambda_{\rm S},\hat\nu_{\rm S}\right )$ is the unconditional maximizer as in (77) and $\hat{\hat{\vec{s}}}:=\left (\hat{\hat X}_{\rm S},\hat{\hat Y}_{\rm S},0,\hat{\hat\nu}_{\rm S}\right )$ is the conditional maximizer over the restricted source space $\vec{S}_1:=\left \{\vec{s}\in\vec{S}:\ \lambda_{\rm S}=0\right \}$of H1. Clearly, $\hat L_0\ge\hat{\hat L}_1$. Hence, twice the log-likelihood difference, $\hat l$, is nonnegative,

 \begin{displaymath}\hat l:=2\cdot \left (\hat L_0-\hat{\hat L}_1\right )\ge0.
\end{displaymath} (90)

With the factor 2 in (90), the likelihood statistics l follows asymptotically a standard distribution, see (94). In terms of the parent statistics l, the number $\hat l_{\rm S}:=\hat l/2$ is the likelihood of source existence of the sample returned by MLE and mentioned in many publications on ROSAT data.

Let $\alpha\in(0,1)$ be the requested significance level of the test and let gM(l) be the sample frequency function of the test statistics l from (90). The $\alpha$-quantile, $l_M(\alpha)$, is the solution of the equation for $l_M(\alpha)$

 \begin{displaymath}\int_0^{l_M(\alpha)} g_M\left (l\right ){\rm d}l=\alpha.
\end{displaymath} (91)

With $l_\alpha:=l_M(\alpha)$ the decision rule for the alternative (88) is in terms of l from (90) and (89)

 \begin{displaymath}\hat l\left\{\begin{array}{*2{l}} > l_\alpha, &\hbox{$H_0$\sp...
...l_\alpha, &\hbox{$H_0$\space is rejected}.
\end{array}\right.
\end{displaymath} (92)


 
    $\displaystyle {\partial ^2 L\over{\partial X_{\rm S}^2}} :=2\lambda_{\rm S} \su...
...ft [{2\lambda_{{\rm B},m}\over\nu_m\lambda_m}\cdot (X_m-X_{\rm S})^2-1\right ],$  
    $\displaystyle {\partial ^2 L\over{\partial X_{\rm S}\partial Y_{\rm S} }}:=4\la...
...rm B},m} V_m p_m\over{\nu_m^2\lambda_m^2}}\cdot (X_m-X_{\rm S})(Y_m-Y_{\rm S}),$  
    $\displaystyle {\partial ^2 L\over{\partial X_{\rm S}\partial \lambda_{\rm S} }}...
...1}^M {\lambda_{{\rm B},m} V_m p_m\over{\nu_m\lambda_m^2}}\cdot (X_m-X_{\rm S}),$  
    $\displaystyle {\partial ^2 L\over{\partial X_{\rm S}\partial \nu_{\rm S} }}:=2\lambda_{\rm S}\sum_{m=1}^M {V_m p_m\over{\nu_m^2\lambda_m}}\cdot (X_m-X_{\rm S})$  
    $\displaystyle ~~~~~~~~~~~~~~~\times \left [{\lambda_{{\rm B},m}(r_m^2-\nu_m)\over{\nu_m\lambda_m}}-1\right ],$  
    $\displaystyle {\partial ^2 L\over{\partial Y_{\rm S}^2}} :=2\lambda_{\rm S} \su...
...ft [{2\lambda_{{\rm B},m}\over\nu_m\lambda_m}\cdot (Y_m-Y_{\rm S})^2-1\right ],$  
    $\displaystyle {\partial ^2 L\over{\partial Y_{\rm S}\partial \lambda_{\rm S} }}...
...1}^M {\lambda_{{\rm B},m} V_m p_m\over{\nu_m\lambda_m^2}}\cdot (Y_m-Y_{\rm S}),$  
    $\displaystyle {\partial ^2 L\over{\partial Y_{\rm S}\partial \nu_{\rm S} }}:=2\lambda_{\rm S} \sum_{m=1}^M {V_m p_m\over{\nu_m^2\lambda_m}}\cdot (Y_m-Y_{\rm S})$  
    $\displaystyle ~~~~~~~~~~~~~~\times \left [{\lambda_{{\rm B},m}(r_m^2-\nu_m)\over{\nu_m\lambda_m}}-1\right ],$  
    $\displaystyle {\partial ^2 L\over{\partial \lambda_{\rm S}^2}} :=-\sum_{m=1}^M \left ({V_m p_m\over{\lambda_m}}\right )^2,$  
    $\displaystyle {\partial ^2 L\over{\partial \lambda_{\rm S}\partial \nu_{\rm S} ...
..._{{\rm B},m} V_m p_m\over{\nu_m^2\lambda_m^2}}\cdot \left (r_m^2-\nu_m\right ),$  
    $\displaystyle {\partial ^2 L\over{\partial \nu_{\rm S}^2}}:=\lambda_{\rm S}\sum_{m=1}^M {V_m p_m\over{\nu_m^4\lambda_m}}$  
    $\displaystyle ~~~~~~~~~~ \times \left [{\lambda_{{\rm B},m}\over\lambda_m}\left (r_m-\nu_m\right )^2-\left (2r_m^2-\nu_m\right )\nu_m\right ].$ (93)

After (92), the tasks are (a) the determination of the sample frequency distribution, gM, of l and (b) the solution of (91) for $l_\alpha$ when the sample size M is given. The latter poses no numerical difficulties since the integrand gM is positive.

To avoid the appearance of the sample frequeny function from (91), we resort again to large sample sizes.

According to Doob (1934), Wilks (1938), for sample size $M\to\infty$ the distribution gM(l) tends to the $\chi^2_\nu$ distribution with $\nu>0$ degrees of freedom with $t\in[0,\infty)$

 \begin{displaymath}g_M(t){\rm d}t\to{1\over \Gamma(\nu/2)}{\rm e}^{-t/2}\left ({...
...nu/2-1}{\rm d}\left ({t\over2}\right )=:\chi^2_\nu(t){\rm d}t.
\end{displaymath} (94)

The number of restrictions in H1 equals $\nu=1$ in (94). Given l by (90), for the large sample sizes, the significance level

 \begin{displaymath}\alpha(l):=\int_0^{l}\chi^2_1(t){\rm d}t
\end{displaymath} (95)

serves equally well to judge the likelihood of source existence. It should always be borne in mind that a large sample size is presupposed when replacing gM(t) by $\chi^2_1(t)$.

More delicate are the error bars from (b) in (87). The defining maximum relation (77) for the ML source estimate vector $\hat{\vec{s}}$ defines a random vector when the photon sample ${\cal P}$ is viewed as a set of random vectors. As such, $\vec{s}$ possesses a sampling distribution, say $\vec{P}_{\vec{S}}^{(M)}$ given by (91) with l as upper integration limit. Then, one can calculate the mean vector $\bar{\vec{s}}$, its sample counterpart $\hat{\bar{\vec{s}}}$, and the elements Cj,k of the covariance matrix ${\bf Cov}[\vec{s}]$,

 
    $\displaystyle C_{j,k}:=\vec{E}_{\vec{P}_{\vec{S}}^M}[(s_j-\bar s_j)(s_k-\bar s_k)],\quad j,k=1,2,3,4,$  
    $\displaystyle \bar s_j:=\vec{E}_{\vec{P}_{\vec{S}}^M}[s_j].$ (96)

The subscript $\vec{P}_{\vec{S}}^M$ in (96) means: expectation $\vec{E}$ is to be taken with respect to the distribution $\vec{P}_{\vec{S}}^{(M)}$. The $\bar s_j,C_{j,k}$ are assumed to exist for all sample sizes M>1. If the distribution $\vec{P}_{\vec{S}}^{(M)}$ is of the type of a normal distribution, i.e. at least unimodal, and the statistical interdependence among the components of $\vec{s}$ is low, it is justified to define the quantities

 \begin{displaymath}\sigma_k:=\sqrt{{\rm\bf Cov}[\vec{s}]_{k,k}},\qquad k=1,2,3,4,
\end{displaymath} (97)

as standard errors. Unfortunately, the sample distribution $\vec{P}_{\vec{S}}^{(M)}$ is analytically nearly inaccessible. Also, our log-likelihood function $L\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )$ does not allow for a set of four jointly sufficient statistics for the four source parameters. The distributions in the "sufficiency'' class are explicitly known, see Stuart & Ord (1991), and our L lies outside. In this situation, other possibilities to define statistical errors are to be drawn into consideration.

For large sample sizes, see Wald (1943), the distribution $\vec{P}_{\vec{S}}^{(M)}$ tends to a multivariate normal distribution with zero mean and covariance matrix with elements according to (96) at $\vec{s}=\hat{\vec{s}}$,

 
    $\displaystyle {\rm {\bf C}{\bf o}{\bf v}}_{j,k}[\hat{\vec{s}}]:=-\vec{E}\left [...
...ver{\partial s_j}}\cdot {\partial L(\hat{\vec{s}})\over{\partial s_k}} \right ]$  
    $\displaystyle ~~~~~~~ =\int_{\vec{R}^2}{\rm e}^{L\left (X,Y\,\vert\,\hat X_{\rm...
...\partial s_j}}{\partial L(\hat{\vec{s}})\over{\partial s_k}}\,{\rm d}X{\rm d}Y.$ (98)

To circumvent the integration in (98), we replace the parent product moments Cj,k by the sample counterparts and obtain
 
    $\displaystyle \hat C_{j,k}:={\rm\bf Cov}_{j,k}[\hat{\vec{s}}]:={1\over M}\sum_{m=1}^M {\rm e}^{L\left (X_k,Y_k\,\vert\,\hat{\vec{s}}\right )}$  
    $\displaystyle ~~~~~~~~~ \times{\partial L(X_k,Y_k\,\vert\,\hat{\vec{s}})\over{\partial s_j}}\cdot {\partial L(X_k,Y_k\,\vert\,\hat{\vec{s}})\over{\partial s_k}},$  
    $\displaystyle \hat\sigma_k:=\sqrt{\hat{\rm\bf Cov}_{k,k}[\hat{\vec{s}}]}.$ (99)

In EXSAS, a less costly approximation is accepted. For a large number of source photons, the frequency distribution ${\rm e}^L$ in (98) tends to a Dirac distribution supported at the ML estimate $\hat{\vec{s}}$. Thus, the approximation $\vec{E}[H]\approx H(\hat s)$ of the expected matrix by the information matrix $H(\hat s)$ lies near and is used.

Let $c\ne0$ be a nonvanishing (column) p-vector and $H=H^{\rm T}>0$ a real, symmetric, positive definite, simple $p\times p$ matrix. Then for any positive number h>0 the following equality holds:

 \begin{displaymath}m(c):=\max_{h=x^{\rm T}Hx}c^{\rm T}x=\sqrt{h\cdot c^{\rm T}H^{-1}c}.
\end{displaymath} (100)

To calculate the positive number $c^{\rm T}H^{-1}c$, somewhat less than the inverse H-1 needs to be known. Let H be given through its diagonalized form $\Lambda:={\rm diag}(\lambda_1,\,...,\lambda_p)$ by

 \begin{displaymath}H={E}^{\rm T}\Lambda E,\quad {E}^{\rm T} E=I,\quad E:=(e_1,\,...,e_p).
\end{displaymath} (101)

In terms of the eigenvectors e1 to ep and positive eigenvalues[*] $\lambda_1$ to $\lambda_p$ of H, the maximum m(c) receives the representation

 \begin{displaymath}m(c)=\sqrt{{h\cdot (Ec)^{\rm T}\Lambda^{-1}(Ec)}}.
\end{displaymath} (102)

We wish to find the smallest box with bounding hyperplanes perpendicular to the axes in x-space which contains the ellipsoid $h=x^{\rm T} Hx$. In other words, the distances of the said hyperplanes from the origin are the quantities sought for. To find them, we choose for $c=\pm\delta_k:=\pm(0,\,...,0,1,0,\,...,0)$the kth standard base vector, $k=1,\,...,p$, and obtain (102) upon the replacement of h by the $\alpha$-quantile, $\chi^2_{1,\alpha}$, of the $\chi^2$ distribution with one degree of freedom as

 \begin{displaymath}\hat\sigma_k(\alpha):=m(\pm\delta_k)=\sqrt{{\chi^2_{1,\alpha}...
...over\lambda_1}+...+{{\rm e}^2_{k,p}\over\lambda_p} \right )}},
\end{displaymath} (103)

where $\alpha\in(0,1)$ is the used significance level[*].

In EXSAS, the $1\sigma$-error bars are calculated. As usual, $1\sigma$ codifies the significance level $\alpha=0.682...$ This level corresponds exactly to the quantile value $\chi^2_{1,\alpha}=1$. The statistical interpretation of the EXSAS error bars, $\hat\sigma_k,\,k=1,2,3,4$, for the unknown true point source parameter sk is then: in $68.3\%$ of all MLEs, the containment relation $s_k\in[\hat s_k-\hat\sigma_k,\hat s_k+\hat\sigma_k]$ holds good - under the caveat of a large number of source photons.

  
6.1 Upper confidence limits for the source count rate

All X-ray sources are variable on source-specific time scales. A certain source found in one observation may not show up in another observation. The task is then to prove quantitatively the non-detectability of a source in a given direction. We wish to assert: based on the observation, with a selectable probability $\beta$, a hypothetical source cannot have a count rate $\lambda_{\rm S}$ exceeding an upper limit count rate $\hat\lambda_{{\rm S},\beta}^+$.

Assume a (hypothetical) source position $(X_{\rm S}^0,Y_{\rm S}^0)$ with source extent parameter $\nu_{\rm S}^0$ is given. For such a source with these parameters and an unknown source count rate $\lambda_{\rm S}$, EXSAS offers the (approximate) calculation of the upper limit of a confidence interval, $[\hat\lambda_{{\rm S},\beta}^-,\hat\lambda_{{\rm S},\beta}^+]$, containing the unknown true source count rate parameter $\lambda_{\rm S}$ with a selectable confidence level, $\beta\in(0,1)$, see the EXSAS command (125). The statistical interpretation is: the unknown true source count rate $\lambda_{\rm S}$ obeys in the fraction $\beta$ of all MLEs the inequality

 
    $\displaystyle \hat\lambda_{{\rm S},\beta}^-\le\lambda_{\rm S}\le\hat\lambda_{{\rm S},\beta}^+,$  
    $\displaystyle \hat\lambda_{{\rm S},\beta}^+:=\hat{\cal L}^{-1}_+\left [\hat{\cal L}\left (\hat\lambda_{\rm S}\right )-\chi_{1,\beta}^2/2\right ],$  
    $\displaystyle \hat\lambda_{{\rm S},\beta}^-:=\hat{\cal L}^{-1}_-\left [\max\lef...
...{\cal L}\left (\hat\lambda_{\rm S}\right )-\chi_{1,\beta}^2/2\right \}\right ],$  
    $\displaystyle \hat{\cal L}(\lambda_{\rm S}):=L(X_{\rm S}^0,Y_{\rm S}^0,\lambda_{\rm S},\nu_{\rm S}^0).$ (104)

Inspection of $\partial L/\partial \lambda_{\rm S}$ in (84) and $\partial ^2 L/\partial \lambda_{\rm S}^2$ in (108) shows that $\hat{\cal L}(\lambda_{\rm S})$ is a concave function with a unique maximizer $\bar\lambda_{\rm S}(=\hat\lambda_{\rm S})$, see Lemma 1 in Sect. 7 for details. Thus the inverse function $\hat{\cal L}^{-1}$ has for $\hat{\cal L}'(0)>0$ a lower (= weaker) branch $\hat{\cal L}^{-1}_-$ with domain of definition $[\hat{\cal L}(0),\hat{\cal L}(\bar\lambda_{\rm S})]$ and an upper (= stronger) count rate branch $\hat{\cal L}^{-1}_+$ with domain $(-\infty,\hat{\cal L}(\bar\lambda_{\rm S})]$. The upper confidence limit $\hat\lambda_{{\rm S},\beta}^+$ exists for all significance levels $\beta$ in contrast to the lower limit. There is a critical confidence level $\hat\beta_{\rm c}$ defined by the equation $\chi_{1,\beta}^2=\hat{\cal L}(\hat\lambda_{\rm S})-\hat{\cal L}(0)$ for $\beta$ such that the lower limit becomes trivial, $\hat\lambda_{{\rm S},\beta}^-=0$, for $\beta\ge\hat\beta_{\rm c}$. The achievable confidence level at the trivial lower bound, $\hat \beta_0$, stays then below the requested one, $\hat\beta_0<\beta$.

The subject of estimator variances in astronomical context is a recurrent theme. The works of Avni (1976), Cash (1978), Strong (1985) belong to a larger body of literature.

  
7 Numerical solution of the ML equations

Given an initial estimate, $\hat{\vec{s}}^{(0)}$, for the source parameter vector $\vec{s}$, we wish to set up a numerical iterative algorithm that generates a sequence of refined estimates

 \begin{displaymath}(\hat{\vec{s}}^{(k)})_{k\in\vec{N}_0}:=\left (\hat X_{\rm S}^...
...da_{\rm S}^{(k)},\hat\nu_{\rm S}^{(k)}\right )_{k\in\vec{N}_0}
\end{displaymath} (105)

such that the sequence (105) has a limit, $\hat{\vec{s}}^{(*)}$, for $k\to\infty$ and the necessary maximum conditions (79) are satisfied in the limit for $\vec{s}=\hat{\vec{s}}^{(*)}$. If the initial estimate $\hat{\vec{s}}^{(0)}$ is near the true maximizer of L, the convergence of (105) to the latter is likely.

The first step towards this goal consists in writing three of the four equations in (79) in fixed-point form. The third equation thereof, $\partial L/\partial \lambda_{\rm S}=0$, the local mean count rate equation, will be treated differently. As mentioned in Sect. 5.1, $\partial L(0)/\partial \lambda_{\rm S}\le0$ entails the vanishing maximizer $\hat\Lambda_{\rm S}=0$. In the remaining case $\partial L(0)/\partial \lambda_{\rm S}\>0$ a positive local mean count rate $\lambda_{\rm S}>0$ maximizes. In this case, we obtain from (79) the system of four equations, three of them in fixed-point form,

 
    $\displaystyle X_{\rm S}=X\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ),$  
    $\displaystyle Y_{\rm S}=Y\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ),$  
    $\displaystyle \tilde T=T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ),$  
    $\displaystyle \nu_{\rm S}=N\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ),$  
    $\displaystyle X\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ):=...
... (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )\cdot {1\over\nu_m}}},$  
    $\displaystyle Y\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ):=...
... (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )\cdot {1\over\nu_m}}},$  
    $\displaystyle T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ):=\sum_{m=1}^M W_m\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ),$  
    $\displaystyle N\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ):=...
...X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )\cdot {1\over\nu_m^2}}},$  
    $\displaystyle q_m:=q_m(X_{\rm S},Y_{\rm S}):={{(X_{\rm S}-X_m)^2+(Y_{\rm S}-Y_m)^2-\nu_m^0}\over{\nu_m^2}},$  
    $\displaystyle W_m\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )...
...ver{\lambda_{{\rm B},m}+\lambda_{\rm S} p_m(X_{\rm S},Y_{\rm S},\nu_{\rm S})}},$  
    $\displaystyle \nu_m^0:=2\sigma^2\left [A_m,\epsilon(x_m,y_m)\right ].$ (106)

The third equation in (106), the local mean count rate equation, is not in fixed-point form. Fortunately, the function $T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ is the simplest of all right-hand sides in (106). We shall benefit from the fact that T is a convex decreasing function in $\lambda_{\rm S}$ as inspection of  (106) and (86) for k=3 reveals.

The local mean count rate equation will be treated separately. Given $X_{\rm S},Y_{\rm S},\nu_{\rm S}$, we consider $\tilde T=T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ as an equation for $\lambda_{\rm S}$ alone and denote by

 \begin{displaymath}\lambda_{\rm S}:=\lambda_{\rm S}\left (X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T\right )
\end{displaymath} (107)

its solution. A result on solution existence and uniqueness of the count rate $\lambda_{\rm S}$ will be stated below in Lemma 1. We begin with some preparations.

The monotonicity property

 \begin{displaymath}(-1)^k\cdot {\partial ^k T\left (X_{\rm S},Y_{\rm S},\lambda_...
...\partial \l ^k}>0,\quad k\in{\bf N},\quad \lambda_{\rm S}\ge0.
\end{displaymath} (108)

derives from (86). The monotony relations (108) permit us to construct a lower bound , $\underline T$, and an upper bound, $\bar T$, to T. Let $\lambda_{\rm S}^0\ge0$ be any fixed local mean source count rate. Under this choice of $\lambda_{\rm S}^0$ holds
 
$\displaystyle \underline T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ < $\displaystyle T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}),$  
$\displaystyle T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ < $\displaystyle \bar T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}),$  
$\displaystyle \bar T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ := $\displaystyle \min\big\{\bar T_1\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right ),$  
  $\textstyle \phantom{:=}$ $\displaystyle \bar T_2\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )\big\},$  
$\displaystyle \bar T_1(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ := $\displaystyle \sum_{m=1}^M W_m(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})\Big\vert _{\lambda_{{\rm B},m}=0}$  
  = $\displaystyle :{W(X_{\rm S},Y_{\rm S},\nu_{\rm S})\over{\lambda_{\rm S}}},$  
$\displaystyle \bar T_2(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})$ := $\displaystyle T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S}^0,\nu_{\rm S}\right )$  
  $\textstyle \phantom{:=}$ $\displaystyle +{\partial T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S}^0,\nu_{\rm...
...over{\partial \lambda_{\rm S}}}\left (\lambda_{\rm S}-\lambda_{\rm S}^0\right )$  
  $\textstyle \phantom{:=}$ $\displaystyle +{1\over2}{\partial ^2 T(X_{\rm S},Y_{\rm S},\lambda_{\rm S}^0,\nu_{\rm S})\over{\partial \lambda_{\rm S}^2}}(\lambda_{\rm S}-\lambda_{\rm S}^0)^2,$  
$\displaystyle \underline T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S}\right )$ := $\displaystyle T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S}^0,\nu_{\rm S}\right )$  
  $\textstyle \phantom{:=}$ $\displaystyle +{\partial T\left (X_{\rm S},Y_{\rm S},\lambda_{\rm S}^0,
\nu_{\r...
...er{\partial \lambda_{\rm S}}} \left (\lambda_{\rm S}-\lambda_{\rm S}^0\right ).$ (109)

The functions $\underline T$ and $\bar T_2$ can be formed for all $\lambda_{\rm S}^0\ge0$. We choose $\lambda_{\rm S}^0:=0$. Inverting the inequalities $\underline T\le\tilde T\le\bar T$ for $\lambda_{\rm S}$ yields the following result concerning the local mean source count rate (MCR) equation.

Lemma 1 (MCR existence, uniqueness, enclosure)

The local mean count rate equation

 \begin{displaymath}\tilde T=T(X_{\rm S},Y_{\rm S},\lambda_{\rm S},\nu_{\rm S})
\end{displaymath} (110)

has a unique solution $\lambda_{\rm S}:=\lambda_{\rm S}(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T)$ iff

 \begin{displaymath}0<\tilde T\le T\left (X_{\rm S},Y_{\rm S},0,\nu_{\rm S}\right )=:\bar T\left (X_{\rm S},Y_{\rm S},\nu_{\rm S}\right ).
\end{displaymath} (111)

The solution $\lambda_{\rm S}$ is enclosed by

 \begin{displaymath}\underline \lambda_{\rm S}(X_{\rm S},Y_{\rm S},\nu_{\rm S})\l...
...{\rm S}\le\bar\lambda_{\rm S}(X_{\rm S},Y_{\rm S},\nu_{\rm S})
\end{displaymath} (112)

where with T from (106) and W from (109)
 
    $\displaystyle \underline \lambda_{\rm S}(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T):=(\tilde T-T_0)/T_1 ,$  
    $\displaystyle \bar\lambda_{\rm S}(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T):=\min\Big\{\bar\lambda_1(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T),$  
    $\displaystyle ~~~~~~~~~~~~~~~~~~~~~~~~~ \bar\lambda_2(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T)\Big\},$  
    $\displaystyle \bar\lambda_1(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T):={W(X_{\rm S},Y_{\rm S},\nu_{\rm S})\over{\tilde T}},$  
    $\displaystyle \bar\lambda_2(X_{\rm S},Y_{\rm S},\nu_{\rm S},\tilde T):=\left\{\...
...e T\le\tilde T\le\bar T,\\
+\infty, &\tilde T<\underline T,
\end{array}\right.$  
    $\displaystyle \tau:=-T_1-\sqrt{T_1^2-2T_2(\tilde T-T_0)},$  
    $\displaystyle T_0:=T(X_{\rm S},Y_{\rm S},0,\nu_{\rm S}),$  
    $\displaystyle T_1:={\partial T(X_{\rm S},Y_{\rm S},0,\nu_{\rm S})\over{\partial \lambda_{\rm S}}},$  
    $\displaystyle T_2:={\partial ^2 T(X_{\rm S},Y_{\rm S},0,\nu_{\rm S})\over{\partial \lambda_{\rm S}^2}},$  
    $\displaystyle \underline T:=T_0-{T_1^2\over{2T_2}}.$ (113)

The upper bound $\bar\lambda_{\rm S}$ stays always finite (and may have a jump).

Once an enclosing interval $[\underline \lambda_{\rm S},\bar\lambda_{\rm S}]$ for the source count rate $\lambda_{\rm S}$ is available, the length of the enclosing interval can be shrunk to any given length $\epsilon_{\lambda_{\rm S}}>0$ by bisection or otherwise. Using $\hat X_{\rm S}^{(0)},\hat Y_{\rm S}^{(0)},\hat\nu_{\rm S}^{(0)}$ in the third equation, $\tilde T=T$, of (106), its approximative solution to precision $\epsilon_{\lambda_0}$ is denoted by $\hat\lambda_{\rm S}^{(0)}$. Now, given an initial estimate $\hat{\vec{s}}^{(0)}$ formed with that $\hat\lambda_{\rm S}^{(0)}$, its successor estimate $\hat{\vec{s}}^{(k+1)}$ is calculated by performing the iteration step from the kth iterate to the (k+1)th,

 
    $\displaystyle X_{\rm S}^{(k+1)}:=X\left (X_{\rm S}^{(k)},Y_{\rm S}^{(k)},\lambda_{\rm S}^{(k)},\nu_{\rm S}^{(k)}\right ),$  
    $\displaystyle Y_{\rm S}^{(k+1)}:=Y\left (X_{\rm S}^{(k+1)},Y_{\rm S}^{(k)},\lambda_{\rm S}^{(k)},\nu_{\rm S}^{(k)}\right ),$  
    $\displaystyle \nu_{\rm S}^{(k+1)}:=N\left (X_{\rm S}^{(k+1)},Y_{\rm S}^{(k+1)},\lambda_{\rm S}^{(k)},\nu_{\rm S}^{(k)}\right ),$  
    $\displaystyle \tilde T :=T\left (X_{\rm S}^{(k+1)},Y_{\rm S}^{(k+1)},\lambda_{\rm S}^{(k+1)},\nu_{\rm S}^{(k+1)}\right ).$ (114)

The last equation in (114) defines $\lambda_{\rm S}^{(k+1)}$ implicitly and it is solved iteratively as described above. With appropriately chosen tolerances for position and extend components,

 \begin{displaymath}\epsilon_{X_{\rm S}}>0,\quad\epsilon_{Y_{\rm S}}>0,\quad\epsilon_{\nu_{\rm S}}>0,
\end{displaymath} (115)

the iteration step (114) is carried out as long as the termination criterion,
 
    $\displaystyle \left \vert X_{\rm S}^{(k+1)}-X_{\rm S}^{(k)}\right \vert<\epsilon_{X_{\rm S}},$  
    $\displaystyle \left \vert Y_{\rm S}^{(k+1)}-Y_{\rm S}^{(k)}\right \vert<\epsilon_{Y_{\rm S}},$  
    $\displaystyle \left \vert\nu_{\rm S}^{(k+1)}-\nu_{\rm S}^{(k)}\right \vert<\epsilon_{\nu_{\rm S}},$ (116)

is not met. During the iteration the likelihood, L(k), belonging to the kth iteration step and the relative change in likelihood, $\delta L^{(k)}$,
 
    $\displaystyle L^{(k)}:=L\left (X_{\rm S}^{(k)},Y_{\rm S}^{(k)},\lambda_{\rm S}^{(k)},\nu_{\rm S}^{(k)}\right ),$  
    $\displaystyle \delta L^{(k)}:={{L^{(k)}-L^{(k-1)}}\over L^{(k)}}$ (117)

are superintended. With a threshold $\epsilon_L$ in the order of 10-5 the violation of

\begin{displaymath}\delta L^{(k)}>\epsilon_L
\end{displaymath} (118)

serves, in addition to (116), as a termination criterion for the iteration (114) in EXSAS.

Again, the success of the iteration hinges on the closeness of the initial estimate $\hat{\vec{s}}^{(0)}$ to the unknown true maximizer. Therefore, EXSAS uses an elaborate process to find positional components in the initial estimate.

  \begin{figure}
\par\includegraphics*[bb=105 315 500 725,angle=0,scale=0.60]{ml_fig_9.ps}\end{figure} Figure 9: $6.1^\circ \times 6.1^\circ $ region around the ecliptic north pole observed in the ROSAT all sky survey in the energy band [0.1,2.35] keV. The central exposure time is 15.5 ks
Open with DEXTER

  
8 Example of and tests on the MLE of point sources

In practice, MLE was tested innumerable times. The ROSAT All-Sky Survey (RASS) described by Voges (1992,1999) is a monumental instance. Also, several thousands of pointed ROSAT observations were processed by MLE as described, see for example the group of contributions prefaced by Trümper (1991b).

We shall illustrate the performance of the MLE twofold. Firstly, we apply MLE to a field with many point sources. Secondly, MLE was tested on simulated data. Simulated observations give the possibility to compare estimated parameter values with true values under the conditions of the implemented MLE (and test) software.

  
8.1 An example of the ML point source parameter estimation

Figure 9 shows a $6.1^\circ \times 6.1^\circ $ field around the ecliptic north pole observed during the ROSAT All Sky Survey in the energy band [0.1,2.35] keV. The exposure time of the image centre was 15.5 ks. Only the photons which fell into the inner 19 arcmin of the ROSAT PSPC field of view were selected resulting in the image Fig. 9. The darker strips emanating from the central yet darker region are caused by a varying exposure time due to the ROSAT survey attitude. Figure 10 shows the same region augmented by point source positions. The little dark circles indicate the 545 point source positions found by MLE.

  \begin{figure}
\par\includegraphics*[bb=105 315 500 725,angle=0,scale=0.60]{ml_fig_10.ps}\end{figure} Figure 10: The ecliptic north polar region as in Fig. 9. Circles indicate point source positions as found by MLE
Open with DEXTER

  
8.2 Testing the MLE of point source parameters on simulated data

Many tests with simulated observations were performed in order to study the performance behaviour of the implemented MLE software in the high-dimensional parameter space spanned by the source photon parameters and additional hidden internal parameters. Among the varied input quantities in the test series were: source strength, source offset, background strength, and cut-off radius. Several dozens of output quantities were collected and plotted as function of the input quantity. As an typical example, we report on the source position error as a function of the source strength.

Figure 11 shows the stochastic process of the horizontal positional errors $e_X:=\hat X-X_{\rm S}$ between the ML estimate $\hat X$ and the known true source position component $X_{\rm S}$ as a function of the source photon count number $\Lambda_{\rm S}\in[0,5000]$ as parameter. The plot is scaled such that all errors eX fall within the vertical plot range. Evidently, the sample standard deviation $\sigma_{e_X}(\Lambda_{\rm S})$ becomes smaller from left to right in Fig. 11, i.e. for increasing source photon numbers $\Lambda _{\rm S}$. This checks with the expected ${\cal O}(1/\sqrt{\Lambda_{\rm S}})$ behaviour for large number of photons $\Lambda_{\rm S}\to\infty$. Each of the 500 black dots means (a) the creation of a photon series ${\cal P}$ from (1) simulating a source with a random source position $(X_{\rm S},Y_{\rm S})$ which is (b) afterwards subjected to a MLE analysis carried out with the help of the EXSAS commands described in the Appendix. Judged by eye, no bias around the horizontal axis eX=0 seems to be present for source photons above $\Lambda_{\rm S}>100$. An analogous picture is expected for the vertical positional error $e_Y:=\hat Y-Y_{\rm S}$ in Fig. 12. Again, all errors eY fall within the vertical plot range which is dilated by a factor 2 compared to Fig. 11. This stretching is due to the randomly large errors eY for small number of source photons, cf. eY=0.45 image pixels for the upper left dot. A closer look reveals $\sigma_{e_X}(\Lambda_{\rm S})\approx\sigma_{e_Y}(\Lambda_{\rm S})$ as expected.

  \begin{figure}
\par\includegraphics*[angle=-90,width=7.8cm,clip]{ml_fig_11.ps}\end{figure} Figure 11: Horizontal positional errors $e_X:=\hat X-X_{\rm S}$ (in units of image pixels) as a function of the source photons $\Lambda _{\rm S}$
Open with DEXTER


  \begin{figure}
\par\includegraphics*[angle=-90,width=7.8cm,clip]{ml_fig_12.ps}\end{figure} Figure 12: Vertical positional errors $e_Y:=\hat Y-Y_{\rm S}$ (in units of image pixels) in dependence of the source photons $\Lambda _{\rm S}$
Open with DEXTER

  
9 Discussion

A similar sliding window technique as the one described in Sect. 4.1.2 has been used in the analysis of EINSTEIN data, see Harris & Irwin (1984).

Already in the title, the application of the MLE is restricted to single sources. Thus, pairs of poorly resolved sources, less frequent triples of neighbouring sources or, generally, clusters of several sources are not admissible. In this context, distances are measured in units of the PSF width. The source parameters of source clusters must be estimated jointly. In a different EXSAS application, MLE for source groups is possible under the command name FIT/MULTI_SOURCE on image data.

At the end of Sect. 3.1, the motivations for selecting MLE were named. Besides ML estimation, source parameter estimation based on maximum entropy or wavelets are known and in use - to name only some recent methods. For the application of wavelet decomposition to ROSAT data, see Lazzati et al. (1999). A comparison of different estimation procedures is desirable, a task on its own, and far outside the scope of this article. An estimation procedure that is superior in all important respects seems not to exist.

The comparison of the hypothetical count rate distribution is made in the space of the observational data. This is the conventional strategy to circumvent the (ill-posed) inverse problem commonly referred to as deconvolution. After having estimated source parameters $\hat X_{\rm S},\hat Y_{\rm S},\hat\lambda_{\rm S}^{\rm V},\hat\nu_{\rm S}$, the conversion to the source count rate $\lambda_{\rm S}$ and source flux is to be done. As this task is outside of MLE, details are not described in this work.

In Sect. 1, the Standard Analysis Software System (SASS) , cf. Voges et al. (1999, p. 223), was mentioned in passing. All ROSAT catalogues were produced with SASS. Meanwhile also a ROSAT all-sky survey of faint sources has been made public. For a list of all ROSAT catalogues see Voges et al. (2001). In relation to SASS, the EXSAS system is an extension of the former.

One implementation feature in EXSAS and SASS is not satisfactory. As mentioned, MLE on unbinned photon data is appropriate for weak sources and works down to, say, thrice as many photons per source as there are source parameters (i.e. 12 photons). On the other hand, the source parameter error bars need a large number of photons per source.

In practice no unrealistic error bar sizes were observed. Nevertheless, a better approximation to the small sample frequency distribution is a desideratum - a convergence proof for the estimate series (114) is another.

Error assignement in small sample situations is not a privilege of observational astronomy. Attempts to attack the problem are known. This is, however, not the place to review the current situation.

As remarked in its second paragraph, Sect. 8 provides only an illustration of performance and precision issues. A full account, however desirable, does not fit into the frame of this work.

Finally, we state two perturbation problems. The instrument's PSF in effect in observation is known to be different from the one used in data analysis, here via MLE. Also, the background component obtained from the estimation cannot be regarded the best possible. So, we have to raise the problem of stability of MLE against PSF variation and against background variation.

Let $\hat{\vec{s}}_j,\,j=1,2$, be the ML estimates of the source parameter vector with PSF1,PSF2 in effect, respectively. Does the convergence in probability $\hat{\vec{s}}_1\to\hat{\vec{s}}_2$ take place for $PSF_1\to PSF_2$? If the answer is affirmative, how fast is the convergence? Practicable measures of sensitivity are to be constructed. Replacing in the problem statement "PSF'' by "background'' yields the problem statement on background. Both affirmative answers lie at the heart of observational astronomy, known to be utterly succesful. But a formal problem solution tells you how precise PSF and background have to be modelled.

  
10 Conclusions

The MLE in SASS and EXSAS grew over the years towards a stable software system. Large quantities of ROSAT data were once or several times processed in the last decade with that software. The results can be found in many catalogues and in the literature. A detailed technical description has been given here. The likelihood functional for unbinned data was derived (almost) rigorously. The underlying assumptions were laid open (see below). Equally, the assumptions and approximations made for the EXSAS implementation were represented. An example acquaints the reader with a typical MLE application. The performance limits of the implemented MLE were shown by simulations.

For clarity, we repeat the conditions for the application of the likelihood function L from (31) or (35) for a sample of single photons as follows.

The sample must be drawn from a population which follows a Poisson distribution with respect to the parameters to be estimated. No assumption on the geometrical form of the sample in the associated photon space is made. Thus no restriction of the instrument's PSF is imposed. Equally, no (non-trivial) lower or upper bounds on the sample size and photons per source exist for MLE as method. However, the statistical quality of the parameter estimates increase with increasing source photon numbers. The observed photon positions are imprecise due to binning. In the strict sense, the resulting MLE estimates are thus at least imprecise by half a bin size (plus an implementation-dependent additive).

The MLE is qualified by method properties for high energy astronomy observations with their notorious small numbers of photons per source.

Acknowledgements
The authors would like to thank H.-U. Zimmermann for many constructive comments and for painstakingly reading an earlier version of the manuscript. The authors are also indebted to J. Englhauser and R. Supper for support. Special thanks go to J. Trümper. The constructive criticism of the Reviewer and the Deputy Editor led to many ameliorations.

  
11 Appendix: EXSAS aids for the ML source parameter estimation in ROSAT data

Some commands related to MLE of the source parameters shall be explained here. The command


 
DETECT/LOCAL (119)

comprises the operations described in Sect. 4.1.2. The input data consist of up to six photon count histograms ${\cal N}$ from (46) and corresponding exposure histograms ${\cal T}$. The output is the source lists ${\cal L}_1(l_0)$ from (59). File names and remaining parameters, among them the significance level, L0, and the dilatation exponent D are collected in a steering parameter file. Parameter files also control the other commands. The command


 \begin{displaymath}CREATE/BG\_IMAGE
\end{displaymath} (120)

produces the background count estimate $\hat\Lambda(X,Y)$ as described in Sect. 4.1.4 in histogram form ${\cal N}_{\rm B}$. The input consists of up to six histograms ${\cal N},{\cal T}$ and the allied source lists ${\cal L}_1(l_0)$. One of the parameters is the factor $\rho$ from (60). Having the background available, a modification of the command (119) called


 
DETECT/MAP (121)

can be executed. The input files are those of the command (119) supplemented by the background histograms ${\cal N}_{\rm B}$. The result files are the source lists ${\cal L}_2(L_0)$ from (63). The command


 \begin{displaymath}MERGE/SOURCE\_TABLES
\end{displaymath} (122)

carries out the union in (63). After the execution of the foregoing sequence of commands, all provisions are made to perform the MLE in narrow sense embodied in the command


 
DETECT/MAXLIK. (123)

The command


 
DETECT/SOURCES (124)

performs the sequence of the commands (119) to (123) in one command. This command is simpler in application but allows the user less control over parameter settings.

For sources with existence likelihoods $\hat l_{\rm S}$ below the user-selected likelihood threshold $l_{\rm S}^0$, upper confidence limits, $\hat\lambda_{{\rm S},\beta}^+$, on the source count rate $\lambda_{\rm S}$ for a selectable confidence level $\beta\in(0,1)$ are found with the command


 \begin{displaymath}COMPUTE/UPPER\_LIMITS.
\end{displaymath} (125)

The input consists of the photon sample ${\cal P}$ from (1), the background histogram ${\cal N}_{\rm B}$ from (62) and a list of positions $(X_{\rm S}^0,Y_{\rm S}^0)$, specified in equatorial coordinates, with or without the associated source extent parameters $\nu_{\rm S}^0$ according to the chosen options. The user may keep fixed separately: source position $(X_{\rm S}^0,Y_{\rm S}^0)$, source extent parameter $\nu_{\rm S}^0$ (and source count rate $\lambda_{\rm S}^0$ too). Two tables with source parameters form the output. One of them contains all sources, i.e. those with source existence likelihoods $\hat l_{\rm S}>0$. The strong sources defined by $\hat l_{\rm S}\ge l_{\rm S}^0$ are collected in a second table.

The command


 
SIMULATE/SOURCE (126)

simulates point sources. The underlying PSF density is a bivariate normal density with circular symmetry, see (64). With the output sample ${\cal P}$, the histogram ${\cal N}$ from (46) and an attitude table, the input to run the above commands is provided.

References

 


Copyright ESO 2001