A&A 420, 789-797 (2004)
DOI: 10.1051/0004-6361:20034220

Searching for periods in X-ray observations using Kuiper's test

Application to the ROSAT PSPC archive

S. Paltani

Laboratoire d'Astrophysique de Marseille, BP 8, 13376 Marseille cedex 12, France

Received 21 August 2003 / Accepted 28 February 2004

Abstract
We use Kuiper's test to detect periodicities in X-ray and gamma-ray observations. Like Rayleigh's test, it uses the individual photon arrival times, and is therefore well suited to the analysis of faint sources. Our method makes it possible to take into account the discontinuities in the observation, and to remove completely the contamination that results from them. This makes it particularly well suited to the search for periods that are long compared to the total observation duration. We propose a semi-analytical approach to determine the effective number of trial frequencies when searching for unknown periods over a frequency range. This approach can be easily adapted to other tests. We show that, using Kuiper's test, we can recover periods in frequency domains where other tests are completely confused by contamination. We finally search the entire ROSAT Position-Sensitive Proportional Counter (PSPC) archive for long periods, and find 28 new periodic-source candidates.

Key words: methods: data analysis - methods: statistical - stars: pulsars: general - X-ray: stars

   
1 Introduction

Important efforts have been devoted to the search for periodic signals throughout the electromagnetic spectrum. Because of the idiosyncrasies of astrophysical observations, different methods must be used depending on the type of object and the wavelength range. Four test families seem to dominate the period-detection "market''. The calculation of the Fourier power spectrum density (e.g., Press et al. 1993) using a fast Fourier transform (FFT) is suited to evenly spaced (or evenly binned) observations. The Lomb-Scargle periodogram (Lomb 1976; Scargle 1982; Horne & Baliunas 1986), a discrete Fourier transform method, can be used for unevenly spaced flux measurements. Epoch folding (EF) (e.g., Leahy et al. 1983a) can be used in the same conditions or for individual photons, but requires a binning according to the phase. Rayleigh's test (e.g., Fisher 1993; Gibson et al. 1982) is particularly well suited to the analysis of individual photons.

Observations in the X- and gamma-rays usually have two important characteristics. First, independent, time-tagged photons are collected. A method requiring binning is therefore far from ideal, as it results in a loss of information. Furthermore, binning is prohibited for sources detected with very few photons; for EF for instance, the required assumption of Gaussian distribution in each bin is not satisfied in this case. Moreover, the necessary assumptions on the number and sizes of the bins lower the performance of the test (Schwarzenberg-Czerny 1999). Secondly, space observations are often interrupted by "bad time'' periods, where no data are received. Fourier-based methods and Rayleigh tests are seriously affected by this problem. In practice, it means that only periods short compared to the durations of uninterrupted observation can be investigated.

In this paper we present Kuiper's test (Kuiper 1960) in detail. This test has been applied to the distribution of solar flares (Jetsu et al. 1997), and to the search for periodicities in Earth impacts (Jetsu 1997; Jetsu & Pelt 2000), but its unique suitability to X-ray and gamma-ray observations has been overlooked. Similarly to Rayleigh's test, it uses discrete events, and can be applied to very faint sources without any a priori assumption. Similarly to EF, it takes into account non-uniform coverage of the phase domain, and can therefore be used when searching for periods long compared to the total observation duration. We study in detail the properties of Kuiper's test for period detection, and particularly its significance level. We concentrate on two important issues: the treatment of discontinuous observations, and the determination of the effective number of trial frequencies when searching for unknown periods. We finally apply the algorithm to the entire archive of the ROSAT Position-Sensitive Proportional Counter (PSPC).

2 Kuiper's test

Kuiper's test (Kuiper 1960) is a variant of Kolmogorov-Smirnov's (KS) test (see Press et al. 1993, and Jetsu & Pelt 1996 for short introductions). Given a sample $\{x_i\}$, i=1,...,N, and a probability distribution $\varphi(x)$, $a\le x\le b$, the Kuiper statistic is defined by:

 \begin{displaymath}V^\Phi(\{x_i\})=\max_{a\le x\le b} \left( S^{\{x_i\}}(x)-\Phi(x) \right) +\max_{a\le x\le b} \left( \Phi(x)-S^{\{x_i\}}(x) \right),
\end{displaymath} (1)

where $\Phi(x)=\int_a^x~\varphi(y)~{{\rm d}}y$, and $S^{\{x_i\}}(x)=\#(x_i\le x)/N$ is the empirical cumulative distribution of the $\{x_i\}$, i=1,...,N sample ($\#(...)$ meaning "number of ...''). Similarly to KS, the Kuiper statistic does not depend on the underlying distribution. The null hypothesis is that the $\{x_i\}$, i=1,...,N sample is an outcome of N draws from the $\varphi(x)$ distribution.

Kuiper's test can be readily transformed into a test of periodicity in a series of photons by phase-folding their arrival times $\{t_i\}$, i=1,...,N for a given test period $P_0=1/f_0$:

 \begin{displaymath}\psi_i(f_0)={\rm Frac}\left(\frac{t_i-t_0}{P_0}\right),~ i=1,...,N
\end{displaymath} (2)

where Frac(y) is the fractional part of y, and $t_0$ an arbitrary time. In the absence of periodicity at frequency $f_0$, the $\psi_i(f_0)$ phases are expected to be distributed uniformly. This can be tested using the Kuiper statistic $V^U(\{\psi_i(f_0)\})$, where $U(x)=x,~0\le x\le 1$ is the cumulative of a uniform distribution between 0 and 1. A very low probability is evidence that the phases are not uniformly distributed for this frequency, and indicates a periodicity (but see Sect. 5.2).

Contrary to KS or EF, the Kuiper statistic is invariant under a shift of the origin for periodic distributions. As a result, $V^\Phi(\{\psi_i(f_0)\})$ is invariant under a shift in phase $\{\psi_i(f_0)\}\rightarrow \{\psi_i(f_0)+\psi_0~{\rm mod}~1\}$ that would result from a different choice of $t_0$.
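
As an illustration of Eqs. (1) and (2), the following minimal Python sketch folds arrival times and evaluates the Kuiper statistic against the uniform distribution. The function names are hypothetical, and the evaluation at the sorted sample points is the usual way of computing the two maxima of Eq. (1) for a continuous $\Phi$; it is not necessarily the implementation used in this paper.

```python
import numpy as np

def fold_phases(times, freq, t0=0.0):
    """Eq. (2): fractional part of (t_i - t0) * f0."""
    return np.mod((np.asarray(times, dtype=float) - t0) * freq, 1.0)

def kuiper_statistic_uniform(phases):
    """Eq. (1) with Phi(x) = U(x) = x on [0, 1]."""
    x = np.sort(np.asarray(phases, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - x)         # largest excess of S(x) over x (at a sample point)
    d_minus = np.max(x - (i - 1) / n)  # largest excess of x over S(x) (just before a sample point)
    return d_plus + d_minus
```

Evenly spread phases give the minimum value 1/N, and adding a constant phase offset (modulo 1) leaves the result unchanged, as stated above.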

   
2.1 Significance of the Kuiper statistic

Kuiper (1960) gave the following asymptotic expression for large N to calculate the probability of the Kuiper statistic V to be larger than a given value z under the null hypothesis:
 
\begin{displaymath}{\bf Prob}(V\ge z/\sqrt{N})=\sum_{m=1}^{\infty} 2(4m^2z^2-1){{\rm e}}^{-2m^2z^2}-\frac{8z}{3\sqrt{N}}\sum_{m=1}^{\infty}m^2(4m^2z^2-3){{\rm e}}^{-2m^2z^2}+O\left(\frac{1}{N}\right)\cdot
\end{displaymath} (3)

This is the false positive probability (${\it FPP}$) of falsely rejecting the null hypothesis. This formula is systematically used, even though its validity for small N has not been tested (see, e.g., Jetsu & Pelt 1996). In Appendix A we show that the FPP is overestimated by a factor 3 for N=20 at the $10^{-7}$ level. For N<15, the probability is underestimated, which wrongly increases the rate of false positives by a factor 30 at the $10^{-7}$ level. Eq. (3) is therefore seriously wrong for small N.
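
A minimal Python sketch of the asymptotic series may help to fix ideas. It assumes the standard two-term form of Kuiper's expansion, with a second sum weighted by $8z/(3\sqrt{N})$; the function name, the truncation at 100 terms, and the clipping to [0, 1] are choices made for this illustration only.

```python
import numpy as np

def kuiper_fpp_asymptotic(v, n, n_terms=100):
    """Asymptotic Prob(V >= v) for a sample of size n (valid for large n only)."""
    z = v * np.sqrt(n)
    m = np.arange(1, n_terms + 1, dtype=float)
    e = np.exp(-2.0 * m**2 * z**2)
    lead = np.sum(2.0 * (4.0 * m**2 * z**2 - 1.0) * e)
    corr = (8.0 * z / (3.0 * np.sqrt(n))) * np.sum(m**2 * (4.0 * m**2 * z**2 - 3.0) * e)
    # The truncated series can leave [0, 1] for very small z; clip for safety.
    return float(np.clip(lead - corr, 0.0, 1.0))
```

For instance, `kuiper_fpp_asymptotic(0.2, 100)` corresponds to z=2, where only the m=1 terms contribute appreciably.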

Stephens (1965) gives two analytical formulae valid for the lower tail ( ${=}1-{\it FPP}$) of the Kuiper statistic distribution:

 \begin{displaymath}
{\bf Prob}(V\le z)= N! \left(z-\frac{1}{N}\right)^{N-1}, {\rm ~if~}\frac{1}{N}\le z\le \frac{2}{N},
\end{displaymath} (4)

( $\frac{1}{N}$ is the minimum of the Kuiper statistic), and, if $\frac{2}{N}\le z\le \frac{3}{N}$:

 \begin{displaymath}{\bf Prob}(V\le z)=\frac{(N-1)!\left(\beta^{N-1}(1-\alpha)-\alpha^{N-1}(1-\beta)\right)}
{N^{N-2} (\beta-\alpha)}
\end{displaymath} (5)

with $\alpha$ and $\beta$ being the two solutions of the quadratic equation: $t^2-(Nz-1)t+\frac{1}{2}(Nz-2)^2=0$.
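
The two lower-tail formulae can be transcribed directly. The sketch below (hypothetical function names) implements Eqs. (4) and (5) as written above, without any check of the validity range of z; a useful sanity check is that both expressions agree at the common boundary z=2/N.

```python
import math

def lower_tail_eq4(z, n):
    """Eq. (4): Prob(V <= z), intended for 1/n <= z <= 2/n."""
    return math.factorial(n) * (z - 1.0 / n) ** (n - 1)

def lower_tail_eq5(z, n):
    """Eq. (5): Prob(V <= z), intended for 2/n <= z <= 3/n."""
    b = n * z - 1.0
    c = 0.5 * (n * z - 2.0) ** 2
    disc = math.sqrt(b * b - 4.0 * c)  # roots of t^2 - b t + c = 0
    alpha, beta = (b - disc) / 2.0, (b + disc) / 2.0
    num = math.factorial(n - 1) * (
        beta ** (n - 1) * (1.0 - alpha) - alpha ** (n - 1) * (1.0 - beta)
    )
    return num / (n ** (n - 2) * (beta - alpha))
```

The ${\it FPP}$ in these ranges is then one minus the returned value.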

Stephens (1965) also gives an analytic formula for the ${\it FPP}$:

 \begin{displaymath}
{\bf Prob}(V\ge z)= \sum_{t=0}^M~\left(\begin{array}{c}N\\ t\end{array}\right)\left(1-z-\frac{t}{N}\right)^{N-t-1} T_t
\end{displaymath} (6)

with:

\begin{displaymath}T_t=y^{t-3}\left(y^3N-y^2t\frac{3-2/N}{N}-\frac{t(t-1)(t-2)}{N^2}\right),
\end{displaymath} (7)

where $y=z+\frac{t}{N}$, which is valid if $z\ge1/2$, if N is even, and if $z\ge (N-1)/(2N)$, if N is odd.

The domains of validity of the three exact equations are shown in Fig. 1, the asymptotic formula being used outside them. The validity condition of Eq. (6) is difficult to satisfy for large N. For N=100, the probability that $z\ge1/2$ is of the order of $10^{-21}$. For N=50, this probability is of the order of $10^{-10}$, making Eq. (6) useful even for intermediate-size samples. Equations (4) and (5) represent 40% of the cases for N=10, and only 1% for N=20.

Figure 1: Domains in ${\it FPP}$ vs. sample size covered by the four formulae. The labels indicate the number of the equation used to calculate the probability. The structures in the dark grey area are due to the different validity criteria for even and odd numbers in Eq. (6).

Using the four equations, the ${\it FPP}$ is never underestimated. The only remaining discrepancy with the true distribution is in the region $N\sim 40{-}50$, where the probability is overestimated by a factor 1.5 at the $10^{-7}$ level.

2.2 Performance of Kuiper's test

Using extensive sets of simulations, we compare the performances of Kuiper's test with those of the more common Rayleigh test. We create simulated "observations'' of periodic sources for different count rates, different signal-to-noise ratios (S/N), and different signal shapes. The phase-folded light curve (hereafter simply "light curve'') is defined as the superimposition of a constant function (the "continuum'') and of the first half-period of a sine function (the "pulse''), covering a fraction w of the period. The S/N is defined as the ratio between the surfaces of the pulse and of the continuum. We draw events at random from the "pulse+continuum'' light curves until a given number of events has been collected in the pulse. For each set of parameters, 10 000 light curves are simulated. We then compare the average null-hypothesis probabilities of the two tests.
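
The simulated light curves described above can be sketched as follows in Python. This is only an illustration under stated assumptions: the pulse is placed at phases [0, w] (the choice is irrelevant for Kuiper's test, which is invariant under a phase shift), pulse phases are drawn by inverting the cumulative distribution of a half sine, and the number of continuum events collected before the required number of pulse events is reached is drawn from a negative binomial distribution. The function name and arguments are hypothetical.

```python
import numpy as np

def simulate_phases(n_pulse, snr, w, rng):
    """Draw phases from a constant continuum plus a half-sine pulse of width w.

    snr is the ratio of the pulse area to the continuum area, so each event
    belongs to the pulse with probability snr / (1 + snr).
    """
    p_pulse = snr / (1.0 + snr)
    # Continuum events drawn before the n_pulse-th pulse event is collected.
    n_cont = rng.negative_binomial(n_pulse, p_pulse)
    # Inverse-CDF sampling of a density proportional to sin(pi * x / w) on [0, w].
    u = rng.random(n_pulse)
    pulse = (w / np.pi) * np.arccos(1.0 - 2.0 * u)
    continuum = rng.random(n_cont)
    return pulse, continuum
```

Concatenating the two returned arrays gives one simulated set of phases, to which both tests can then be applied.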

Figure 2 shows the results for three signal intensity cases: 20, 100, and 500 events in the pulse. In the three cases, Rayleigh's test is more efficient for $w\ge 3/4$, while Kuiper's performs better for $w\le 1/2$. In the situation most favorable to Rayleigh's test (i.e. w=1, 100 events in the pulse), the significance threshold (set arbitrarily to $10^{-4}$) is crossed with an S/N two times smaller with Rayleigh's test; this advantage decreases to 15% with w=3/4, and Rayleigh's test is about 30% less sensitive than Kuiper's with w=1/4. Kuiper's test has more difficulty with periodic signals presenting only weak modulations, but the decrease in performance is moderate. It is actually well known that Rayleigh's test is particularly sensitive in the case of broad peaks (Leahy et al. 1983b). On the other hand, some pulsars, in particular in the gamma-rays, have peaks much narrower than those simulated here (Kanbach 1998), in which case Kuiper's test can significantly outperform Rayleigh's.

   
3 Searching for periodicities with Kuiper's test

3.1 Frequency step

To search for periodicities, we can calculate the Kuiper statistic over a set of test frequencies. The Kuiper periodogram (or, more appropriately, "frequencygram'') is defined as:

\begin{displaymath}S(f)=\log_{10} {\bf Prob}(V>V^U(\{\psi_i(f)\})),~ f_1\le f\le f_2
\end{displaymath} (8)

where $V^U(\{\psi_i(f)\})$ is the Kuiper statistic calculated for a frequency f. The logarithm is applied to highlight the candidate periods. Given a periodic signal with a frequency $f_0$, Kuiper's test may present harmonic and subharmonic peaks at frequencies $\ell\cdot f_0$ and $f_0/\ell$ (plus their harmonics), $\ell$ being any small integer.


Figure 2: Sensitivities of Kuiper's and Rayleigh's tests as a function of the S/N between the pulse and the continuum. a) 20 counts in the pulse; b) 100 counts; c) 500 counts. In each graph, the curves are (from top to bottom) w=1, w=3/4, w=1/2, w=1/4. The w=3/4 and w=1/4 curves are highlighted in light grey for visual identification. The solid line is Kuiper's test; the dashed line Rayleigh's.

To avoid missing significant peaks, S(f) must be calculated for frequencies sufficiently close to each other. Assuming a source emitting a photon every $P_0=1/f_0$ seconds, the phases of the first and last photons evaluated at a frequency $f_1=f_0+\Delta f$ close to $f_0$ differ by $\Delta \varphi\simeq T\cdot\Delta f$, where T is the total duration of the observation. The coherence is preserved if:

 \begin{displaymath}
\Delta \varphi\ll 1 \Rightarrow \Delta f\ll 1/T.
\end{displaymath} (9)

Therefore S(f) must be calculated at equidistant frequencies, depending only on T. We define the oversampling parameter k:

 \begin{displaymath}
\Delta f = \frac{1}{k\cdot T}\cdot
\end{displaymath} (10)

Equation (9) therefore becomes: $k \gg 1$. If this inequality is not satisfied, significant peaks can be missed, or underestimated by sampling them too far from their central frequencies. On the other hand, the CPU time is proportional to k. Reasonable values of k are in the range 20-50 (but see Sect. 3.2).
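
The frequency sampling of Eq. (10) and the resulting scan can be sketched as follows. This is a minimal illustration with hypothetical function names; it returns the Kuiper statistic itself at each trial frequency, whose conversion into S(f) requires the probabilities of Sect. 2.1, and it assumes a continuous observation (uniform expected phases).

```python
import numpy as np

def frequency_grid(f_min, f_max, duration, k=20):
    """Equidistant trial frequencies with step 1 / (k * T), Eq. (10)."""
    df = 1.0 / (k * duration)
    n = int(np.floor((f_max - f_min) / df)) + 1
    return f_min + df * np.arange(n)

def kuiper_scan(times, freqs):
    """Kuiper statistic against uniform phases at each trial frequency."""
    t = np.asarray(times, dtype=float)
    n = t.size
    i = np.arange(1, n + 1)
    stats = np.empty(len(freqs))
    for j, f in enumerate(freqs):
        x = np.sort(np.mod(t * f, 1.0))  # folded phases, Eq. (2) with t0 = 0
        stats[j] = np.max(i / n - x) + np.max(x - (i - 1) / n)
    return stats
```

With one photon emitted every 100 s, as in the idealized source considered above, the statistic peaks at the grid point closest to 0.01 Hz and drops quickly as soon as $T\cdot\Delta f$ becomes a sizeable fraction of a cycle.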

   
3.2 Number of trials

${\bf Prob}(V>V^U(\{\psi_i(f)\}))$ is the probability that P=1/f is not a period of the source for a single draw of a Kuiper statistic. If S(f) is calculated for a set of frequencies $f_j$, j=1,...,n, and assuming all the frequencies are independent, we have:

 \begin{displaymath}{\bf Prob}(\exists ~j ~\vert V_j>z, j=1,...,n)= 1-{\bf Prob}(V\le z)^n,
\end{displaymath} (11)

The above equation can be approximated by:

 \begin{displaymath}{\bf Prob}(\exists ~j ~\vert V_j>z, j=1,...,n)
\simeq n\cdot {\bf Prob}(V\ge z),
\end{displaymath} (12)

under the condition $n\cdot {\bf Prob}(V\ge z)\ll 1$. We can therefore correct our S(f) estimator for the number of trials:

 \begin{displaymath}\hat{S}(f) = S(f)+\log_{10} n.
\end{displaymath} (13)
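
The quality of the approximation in Eq. (12) is easy to check numerically. The Python sketch below (hypothetical function names) evaluates Eq. (11) in a numerically stable way for very small probabilities and applies the correction of Eq. (13).

```python
import numpy as np

def fpp_any_trial(p_single, n_trials):
    """Eq. (11): probability that at least one of n independent trials exceeds z."""
    # 1 - (1 - p)^n, written with log1p/expm1 to stay accurate for tiny p.
    return -np.expm1(n_trials * np.log1p(-p_single))

def corrected_log_fpp(s, n_trials):
    """Eq. (13): add log10(n) to the single-trial log10 probability S(f)."""
    return s + np.log10(n_trials)
```

For a single-trial probability of $10^{-7}$ and n=1000, the exact value and the approximation $n\cdot{\bf Prob}$ differ by less than 0.1%, so that $\hat{S}(f)\simeq -4$.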

As n is proportional to k, Eq. (13) may destroy the significance of some peaks if the large k's required to find the peaks are used. However, S(f) is strongly correlated on scales $\sim\Delta f$ and below, and we have ${\bf Prob}(V\ge z) \le {\bf
Prob}(\exists ~j ~\vert V_j>z, j=1,...,n)\le n\cdot {\bf
Prob}(V\ge z)$, the exact value being very difficult to calculate. This problem affects all period search algorithms, and has been addressed using extensive simulations for very specific cases (e.g., de Jager et al. 1988; Horne & Baliunas 1986; de Jager et al. 1989). We propose here a simple and workable semi-analytical method to completely correct for the choice of k.

We choose an arbitrary threshold $V_*$, corresponding to a probability small enough that $n\cdot {\bf Prob}(V\ge V_*)\ll 1$. We then simulate m sets of random photons, and calculate $\max_{j=1,...,n} V^U(\{\psi_i(f_j)\})$ over all $f_j$ for all m simulations. The probability that, for a given simulation, $\max_{j=1,...,n} V^U(\{\psi_i(f_j)\})>V_*$ can now be estimated as $\#(\max_{j=1,...,n} V^U(\{\psi_i(f_j)\})>V_*)/m$. This is the left-hand side of Eq. (12), with $z=V_*$. We can therefore estimate the effective number of frequencies, $n_{{\rm eff}}$:

 \begin{displaymath}n_{{\rm eff}}=\frac{\#(\max_{j=1,...,n} V^U(\{\psi_i(f_j)\})>V_*)}{m\cdot {\bf Prob}(V\ge V_*)}
\end{displaymath} (14)

$n_{{\rm eff}}$ can be understood as the number of independent frequencies among the $f_j$'s. Approximating $\#(\max_{j=1,...,n} V^U(\{\psi_i(f_j)\})>V_*)$ with a Poisson distribution, the uncertainty on $n_{{\rm eff}}$ is:

 \begin{displaymath}\Delta n_{{\rm eff}}=\frac{\sqrt{\#(\max_{j=1,...,n} V^U(\{\psi_i(f_j)\})>V_*)}}{m\cdot {\bf Prob}(V\ge V_*)}\cdot
\end{displaymath} (15)
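
Once the maxima of the Kuiper statistic have been collected from the m simulations, Eqs. (14) and (15) reduce to a simple count. The Python sketch below assumes that these maxima are already available in an array (for instance from a scan of each simulated photon list over the trial frequencies); the function and argument names are hypothetical.

```python
import numpy as np

def effective_trials(max_stats, v_star, p_single):
    """Eqs. (14)-(15): n_eff and its Poisson uncertainty.

    max_stats : maximum Kuiper statistic over all trial frequencies,
                one value per simulation
    v_star    : threshold V_*
    p_single  : single-trial probability Prob(V >= V_*)
    """
    max_stats = np.asarray(max_stats, dtype=float)
    m = max_stats.size
    count = np.count_nonzero(max_stats > v_star)
    n_eff = count / (m * p_single)
    err = np.sqrt(count) / (m * p_single)
    return n_eff, err
```

As an example, 10 exceedances out of m=100 simulations with a single-trial probability of 0.01 give $n_{{\rm eff}}=10\pm 3.2$.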

The corrected periodogram is then deduced from Eq. (13):

 \begin{displaymath}\hat{S}(f)=S(f)+\log_{10}(n_{{\rm eff}})
\end{displaymath} (16)

Provided $\hat{S}(f)\ll 0$, $10^{\hat{S}(f)}$ is the probability that the source has no 1/f period, if n tests are performed. This method is quite general, and can be easily adapted to other statistical tests.

In principle, the correction factor $R=n_{{\rm eff}}/n$ can depend on k, the number of photons, the frequency range, the observation duration, and so on, which means that R should be estimated separately for all observations. As this is computationally expensive for large numbers of observations (see Sect. 5), we approximate R as a function of k only. Details are presented in Appendix B. In the limit $k \rightarrow 0$, Kuiper's tests are independent from each other, while in the limit $k \rightarrow \infty$, $n_{{\rm eff}}$ reaches a plateau. We can therefore write the approximation:

 \begin{displaymath}R(k)=\frac{1}{1+r_0\cdot k}\cdot
\end{displaymath} (17)

Figure 3 shows the correction factor R(k) for five sets of simulated observations with different numbers of photons and different GTIs (see Sect. 3.3). In all cases, 10 000 simulations have been made for each k, and we set $V_*$ so that $n\cdot {\bf Prob}(V\ge V_*)=0.1$, which seems sufficiently small. The behavior of R(k) follows Eq. (17) quite well, but the curves do differ significantly from each other, albeit moderately. In the following we adopt a unique value $r_0=0.0815$, which gives $n_{{\rm eff}}/n=0.38$ for k=20, or $n_{{\rm eff}}/n=0.197$ for k=50. This value corresponds to the upper envelope of the curves of Fig. 3.
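
The adopted correction is straightforward to apply. The following Python sketch (hypothetical function names) evaluates Eq. (17) with the value of $r_0$ quoted above and inserts $n_{{\rm eff}}=R(k)\cdot n$ into Eq. (16).

```python
import math

R0 = 0.0815  # value of r_0 adopted in the text

def correction_factor(k, r0=R0):
    """Eq. (17): R(k) = n_eff / n."""
    return 1.0 / (1.0 + r0 * k)

def corrected_periodogram(s, n_trials, k, r0=R0):
    """Eq. (16) with n_eff = R(k) * n."""
    return s + math.log10(correction_factor(k, r0) * n_trials)
```

With these definitions, `correction_factor(20)` and `correction_factor(50)` reproduce the ratios 0.38 and 0.197 given above.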

Figure 3: Ratio $R(k)=n_{{\rm eff}}/n$ as a function of k for five different "observations'': two continuous (20 and 200 photons) ones, and three corresponding to obs. RP300093N00, RP300262N00, and RP700232N00 with 100, 1596, and 377 photons. The grey area shows the uncertainties for one of the five curves. The dashed line has a slope -1.

   
3.3 Discontinuous observations

In high-energy observations, the photons are collected during limited periods of time called "good time intervals'' (GTIs). Their main effect is to make the cumulative distribution of the phases of the photons coming from a constant source depart from U(x)=x, because the phase intervals are not uniformly covered. This creates strong aliases in FFTs and Rayleigh's test; EF can take into account the actual exposure time of each phase bin, but with some limitations due to the binning.

Kuiper's test is similar to EF in spirit, and even allows a perfect correction for expected non-uniformity. Like KS's test, Kuiper's test is independent of the shape of the putative parent distribution. Thus we calculate exactly, for each frequency, the expected distribution $\xi(x)$ of the phases for a constant source. This can be done by folding the GTIs according to the period boundaries. $\xi(x)$ being piecewise constant, its cumulative $\Xi(x)=\int_0^x \xi(y)~{{\rm d}}y$ can be calculated exactly. $\Phi(x)$ in Eq. (1) is then replaced by $\Xi(x)$ to calculate the Kuiper statistic.
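
One possible way of carrying out this calculation is sketched below in Python. It is an illustration under stated assumptions rather than the implementation used for this paper: the names are hypothetical, the GTIs are given as arrays of start and stop times, and the exposure accumulated at fractional phases below x is obtained in closed form from the unwrapped phases of the GTI boundaries. Because $\Xi(x)$ is continuous and non-decreasing, the Kuiper statistic against $\Xi$ equals the statistic of the transformed values $\Xi(\psi_i)$ against the uniform distribution.

```python
import numpy as np

def exposure_cdf(x, gti_start, gti_stop, freq, t0=0.0):
    """Cumulative phase exposure Xi(x) of a constant source seen during the GTIs."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    pa = (np.asarray(gti_start, dtype=float) - t0) * freq  # unwrapped phase, GTI start
    pb = (np.asarray(gti_stop, dtype=float) - t0) * freq   # unwrapped phase, GTI stop

    def covered(p):
        # Length of the unwrapped interval [0, p] whose fractional phase is <= x.
        whole = np.floor(p)[None, :]
        frac = (p - np.floor(p))[None, :]
        return whole * x[:, None] + np.minimum(frac, x[:, None])

    exposure = np.sum(covered(pb) - covered(pa), axis=1)
    return exposure / np.sum(pb - pa)

def kuiper_statistic_gti(times, gti_start, gti_stop, freq, t0=0.0):
    """Kuiper statistic of the folded phases against Xi instead of U."""
    phases = np.mod((np.asarray(times, dtype=float) - t0) * freq, 1.0)
    u = np.sort(exposure_cdf(phases, gti_start, gti_stop, freq, t0))
    n = u.size
    i = np.arange(1, n + 1)
    return np.max(i / n - u) + np.max(u - (i - 1) / n)
```

When a single GTI covers an integer number of cycles, $\Xi(x)=x$ and the uniform case is recovered; a GTI covering only the first half of one cycle gives $\Xi(0.25)=0.5$ and $\Xi(x)=1$ for $x\ge 0.5$.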

Figure 4 compares S(f) to $Z(f)=\log_{10} {\bf Prob}(R>R_0(f))$, $R_0(f)$ being the Rayleigh statistic, in three different cases.

Figure 4: Effect of the GTIs on Rayleigh's (top) and Kuiper's (bottom) tests. a-b) Simulated 1000 s 1000-photon observation. c-d) 433-photon source in ROSAT obs. RF500043A01 (8 GTIs). e-f) Simulated 300-photon periodic source with the GTIs of obs. RP600121N00. The dashed line indicates the location of the true period $P=10^4$ s.

The first case is a simulated 1000 s 1000-photon observation of a constant source. No significant peak is observed in Kuiper's test down to the absolute minimum frequency, $0.001~{\rm Hz}=1/1000$ s, while Rayleigh's produces several very significant spurious peaks. The second case is a real anonymous 433-photon source in ROSAT obs. RF500043A01, an observation consisting of 8 GTIs. Again, absolutely no significant peak is observed in Kuiper's test down to the absolute minimum frequency, while Rayleigh's produces many, very deep spurious peaks. The third case is a simulated 300-photon source with a period $P=10^4$ s with the GTIs of ROSAT obs. RP600121N00. The photons have been drawn from a "w=1/4'' light curve with 150 photons in the pulse. This observation totaled 44 733 s spread over 1 month in 40 separate GTIs. The longest GTI lasted 3118 s, 38 of the 40 GTIs lasting half an hour or less. The peak at $f=10^{-4}$ Hz has comparable depth in both tests. However, because many contaminating peaks have an amplitude comparable to that of the true period, some even overwhelming it, it is impossible to retrieve the $10^4$ s period using Rayleigh's test. In the Kuiper periodogram, the peak at $f=10^{-4}$ Hz dominates all other peaks with a probability ratio larger than 20 000. Furthermore, the second and third peaks are located respectively at f/4 and 2 f, and are very probably aliases of the true frequency.

   
4 Application to known periodic sources

In a search for new periodic-source candidates in the ROSAT PSPC archive (see Sect. 5), we found two known periodic sources, which particularly illustrate the power of Kuiper's test.

4.1 EX Hya

EX Hya is a cataclysmic variable of type DQ Her in which a 4020 s (67 min) period has been claimed by Kruszewski et al. (1981) using an Einstein observation. This period was later confirmed by Córdova et al. (1985) using a very long EXOSAT observation. Another period of 5880 s (98 min) is claimed to be present in both optical (Mumford 1967) and X-ray (Córdova et al. 1985) light curves. Figure 5a shows the periodograms for EX Hya in ROSAT obs. RP300093N00, a 28 340 s observation (i.e. only about seven 4020 s periods), with a 15 542 s effective exposure time split in 12 GTIs. Rayleigh's test produces a forest of spurious peaks. On the other hand, a very significant peak ( ${\rm Prob}< 10^{-20}$) is easily recovered with Kuiper's test at P=3953 s, very close to the "official'' period. The 98-min period is not found here, but there is a second peak at about three times the 67 min period, extremely close (within 1%) to $2\cdot 98$ min. This peak could be an alias of both periods. The existence of the optical 98 min period in the X-ray domain is therefore unclear, and deserves further study.

Figure 5: Kuiper periodograms of EX Hya (a) and UW Pic (b). The grey lines show the 3953 s and 8047 s periods respectively. The dotted lines indicate the $10^{-4}$ significance threshold. In a) the short-dash and long-dash lines show respectively the $3 \cdot 3953$ s and the possible 98-min period. In both panels the insets show the results of Rayleigh's test.

4.2 UW Pic

UW Pic (RX J0531.5-4624) is a cataclysmic variable of type AM Her with an optical period of 8010 s (Reinsch et al. 1994). A phase folding of the ROSAT All-Sky Survey light curve at the known period suggests the existence of the period in the X-rays. Figure 5b shows the Kuiper periodogram for UW Pic in ROSAT obs. RP300334N00, which exhibits a very significant peak ( ${\rm Prob}<10^{-6}$) at P=8047 s, even though the observation consists of 29 GTIs over 2.3 days, totalling 34 501 s. Again, Rayleigh's test is completely unable to recover the period.

   
5 Period search in the ROSAT PSPC archive

We apply Kuiper's test to the entire set of 4638 ROSAT PSPC observations, treating them completely separately. For simplicity, we did not attempt to combine distinct observations of a single object. We search for periods in a range from 100 s up to a third of the total duration of the observation, using k=20.

5.1 Source extraction

Source detection has been performed following the standard EXSAS spatial analysis procedure (Zimmermann et al. 1998) on a per-observation basis using standard parameters. Overlapping sources were extracted twice: once ignoring the second source, and once excluding it. We ended up with a total of 186 572 sources, distinct or not. To obtain optimum sensitivity, we extracted the photons up to a larger radius in high signal-to-noise ratio (S/N) sources than in low S/N sources (1.5 times the source full width at half maximum compared to 0.65 times). We extracted at most 2000 photons per source to limit computation time. We did not apply a barycentric correction, because the effect is negligible for the low frequencies and relatively short observations considered here.

   
5.2 Contaminations in the PSPC observations

The ideal situation of a perfectly constant source is often not realized in the X-ray domain. Two types of contamination affect S(f): extrinsic periodicities, and aperiodic variability.

Spurious peaks can be produced in the Kuiper periodogram by extrinsic phenomena. Four different kinds of contamination affect ROSAT PSPC data. One is the wobble of the ROSAT satellite: its attitude oscillates around the target, masking and unmasking some of the sources behind the PSPC window support structure with a period ${\sim} 400~{\rm s}\equiv f_{\rm wob}=0.0025$ Hz. The spacecraft's orbit also produces contamination. While the gaps due to observing constraints are completely taken care of with our method (see Sect. 3.3), part of the background depends on the position along the orbit (e.g., the scattered Solar X-ray background, Snowden et al. 1994), and induces a periodic variability at the period of the spacecraft's revolution, i.e. $5760~{\rm s}\equiv f_{\rm orb}=1.7361\times 10^{-4}$ Hz. We also found in about 50 cases a period of 86 400 s, obviously of extrinsic origin. In a handful of observations, many objects presented very significant peaks at 0.003 Hz. The fact that distinct objects present the same period clearly indicates a non-astrophysical origin, which we could not identify. The contaminations combine with each other, and peaks at $f_{\rm wob}\pm i~f_{\rm orb}$, i being any small integer, are frequent.

Knowing the contaminating frequencies, we could check whether harmonic and subharmonic (see Sect. 3) peaks can dominate the peak at the fundamental frequency. No subharmonic peak has been found to dominate the fundamental, but harmonic peaks occasionally do. Thus there is a risk of misidentifying a harmonic peak for the fundamental.

Aperiodic variability is also a serious difficulty when dealing with long periods. When trial periods are comparable to the source's shortest variability time scale, or longer, the effect of aperiodic variability cannot cancel itself out over the successive phases, and strongly affects S(f), preventing period detection over large ranges of frequencies. This is analogous to the red-noise contamination in Fourier power spectra.

5.3 Candidate selection

Several thousand sources exhibit significant frequencies at the $10^{-4}$ level (corrected for the number of trials), the vast majority of them being due to contamination. We applied several filters to reduce the number of candidates. We first rejected all frequencies in broad ranges around the contaminating frequencies discussed above, and their harmonics. The ranges have been determined using a histogram of all significant frequencies. Aperiodic variability has been dealt with in two steps: first, we discarded objects for which $S(1/T_0)<-10$, $T_0$ being the total observation duration. We also rejected all objects for which more than 10 significant frequencies were found. Finally, we eliminated many of them after visual inspection, because several close peaks in S(f) had similar depths just below the threshold, ending up with 30 objects. This last step is however somewhat subjective.

5.4 Candidate periodic sources

Table 1 lists the properties of the 30 remaining sources. Figure 6 shows S(f) for the 28 new candidates. A search over 180 000 objects produces about 18 spurious sources at the $10^{-4}$ level, assuming that all contaminations are perfectly identified. The periodicities must therefore be confirmed using distinct data sets. Six candidates have been observed several times using ROSAT PSPC with adequate observation durations, and are discussed below. The 22 other sources require additional observations before their status can be settled, and remain candidates.


Figure 6: Kuiper periodograms of the 28 candidates listed in Table 1. The candidate frequency is highlighted with a grey line. The horizontal dotted line is the $10^{-4}$ significance limit corrected for the number of trials. Contamination related to the wobble frequency is indicated with a black circle. Contamination related to the revolution frequency is indicated with a black triangle. The numbers are the "ID'' column in Table 1.

V603 Aql (Source #17) is a classical nova for which a period of 63 min was found using Einstein IPC data (Udalski & Schwarzenberg-Czerny 1989). Using the same data, Eracleous et al. (1991) possibly find only its first harmonic, remaining cautious about its reality. We do not find the candidate period in either of the two long ROSAT PSPC observations. Similarly, Borczyk et al. (2003), combining 27 short observations, did not find any evidence of X-ray periodicity. We found however a very significant peak at $f \sim 0.00199~{\rm Hz}\equiv 503.2$ s, a region not explored by Borczyk et al. (2003), in obs. RP300262N00. This observation lasted 1736 s, i.e. a little more than 3 cycles. Such a small number of cycles could result from a chance occurrence of three similar successive flares. However, a peak near this frequency is found in at least two other observations, but with a lower significance. The repeated occurrence of the peak nevertheless makes the 503 s period intriguing. Its absence in most observations could mean that it is only a characteristic variability time scale, without long-term coherence, or that the periodic modulation is not persistent.

MRK 841 (Source #20) is a Seyfert 1 galaxy, with a candidate period of 240.68 s. A similar peak is found in two out of nine other observations, which were rejected because of red noise. If Source #11 ( P = 1741.89 s) is really 1 RXS J172136.9+431045, it is also an active galactic nucleus (AGN). AGN do not present periodic variability in general, but, because of their similarity to X-ray binaries, (quasi-) periodicities are not excluded. There have been several claims of existence of periodicity in AGN (e.g., Iwasawa et al. (1998) in the Seyfert 1 Galaxy IRAS 18325-5926, but see Benlloch et al. (2001)).

Source #22 is the symbiotic star AG Dra, and shows a periodicity at P=234 s. The peak is quite narrow, and there is no evidence of contamination in the region surrounding the frequency. AG Dra is a known X-ray source (Anderson et al. 1981) with two probable periods of about 350 and 550 days in the optical (Friedjung et al. 2003). No periodicity has ever been reported in the X-rays. The period was completely absent in the few other ROSAT PSPC observations. If real, the periodic component must be non-persistent.

Source #23, with a candidate period of 161.47 s, has been observed three times in total. A peak at the same frequency, albeit below our significance threshold, is found in the two other observations, making it a very good candidate. This object is, or is close to, the white dwarf WD 1620-391, which appears slightly extended in the ROSAT image. No periodicity has ever been reported for this object.

Source #25 (P=142.47 s) has been observed several times, without any confirmation of the candidate period.

Source #28 presents low-significance peaks around the candidate 116.9 s period in other observations, but the weakness of the source makes it impossible to settle the case.

Table 1: Properties of the confirmed (first two rows) and candidate periodic sources. ROSAT coordinates are J2000. The frequency is expressed in mHz. N is the number of photons. $\hat{S}(f)$ is the decimal logarithm of the ${\it FPP}$ corrected for the number of trials. $N_{\rm F}$ is the number of frequencies searched. "ID'' refers to the numbers in Fig. 6. Identifications in italics are tentative.

   
6 Conclusions

Kuiper's test shows very interesting properties for the search of long-period periodic objects. Its ability to cope very naturally, without any hidden assumption, with complex GTIs is unique. Compared to Rayleigh's test, Kuiper's test performs better for narrow-peaked light curves. Kuiper's test is quite sensitive to both subharmonics and harmonics of the fundamental frequency, but usually identifies the fundamental correctly. It is particularly well suited to X-ray missions like XMM-Newton and Chandra, to high-energy gamma-ray satellites like GLAST, and to Cherenkov telescopes.

The semi-analytical method we propose here to correct the false-positive probability in case of a search over a range of frequencies should be quite useful in practice, not only for Kuiper's test, but also for other tests, as its principle can be easily adapted. It has the advantage of simplicity, and of being based on sound probability principles.

Of the 28 candidate periodic sources, 6 could be cross-checked using other ROSAT PSPC observations. Good or partial confirmation of the existence of periodicities is found in 3 of these objects, and there is total absence of confirmation in the other 3. This does not necessarily imply a "confirmation of absence". It must be remembered that X-ray sources are quite often strongly variable, and that a periodic signal may remain undetected in some observations, even though the observing conditions seem adequate. For instance, Israel et al. (2000) report the detection of a periodic signal in the X-ray pulsar 2E 0053.2-7242 in only one out of nine ROSAT PSPC observations, the source having dimmed by a factor >6 between the different observations.

The possibility that extrinsic contamination or statistical flukes explain some, or even most, of the candidate periods must be considered seriously. Firm identification of the candidates as periodic sources will be contingent upon the detection of the periods in independent data sets. The growth of large X-ray archives from XMM-Newton and Chandra makes it quite probable that new observations will be available for a fair number of these sources in the near future.

A C library implementing the algorithms discussed in this paper is available from the author.

Acknowledgements
This research has made use of data obtained from the High Energy Astrophysics Science Archive Research Center (HEASARC), provided by NASA's Goddard Space Flight Center. The author acknowledges a grant from the Swiss National Science Foundation.

References

  
Online Material

   
Appendix A: Distribution of the Kuiper statistic

We test the formulae presented in Sect. 2.1 using Monte Carlo simulations of the null hypothesis. Figure A.1 shows the fraction of test results with a probability smaller than $10^{-1},\ldots,10^{-7}$, respectively, as a function of the sample size. These fractions would asymptotically reach $10^{-1},\ldots,10^{-7}$, respectively, if we had exact equations. $10^{9}$ simulations have been performed for each sample size. The only discrepancies are found for sample sizes in the range 30-100. The asymptotic formula overestimates the probability of the null hypothesis by about 40% for a 40-member sample at the $10^{-7}$ level. The overestimation becomes unimportant for sample sizes larger than 100. An overestimation of the FPP is however not serious, since we are chiefly concerned with avoiding false positives. There is no evidence of underestimation of the FPP, which would be a more serious issue, as it would lead to false positives.
Figure A.1: Fractions of false positives compared to the expectation for 7 different probability thresholds, from $10^{-1}$ to $10^{-7}$, and different sample sizes. The empty circles have been calculated using Eq. (3). The black symbols use different RNGs (see text).
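
The kind of null-hypothesis check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the C library mentioned in the paper: the function names are hypothetical, the observation is assumed to have no gaps (so the null distribution of phases is uniform), and the tail probability uses the widely quoted asymptotic series with the $\sqrt{N}+0.155+0.24/\sqrt{N}$ scaling, which is not necessarily identical to the formulae of Sect. 2.1 or to Eq. (3). The number of simulations is kept far smaller than the $10^{9}$ used for Fig. A.1, so only the $10^{-1}$ level is probed.

```python
import math
import random


def kuiper_statistic(phases):
    """Kuiper's V = D+ + D- for phases in [0, 1) against a uniform distribution."""
    x = sorted(phases)
    n = len(x)
    d_plus = max((i + 1) / n - xi for i, xi in enumerate(x))
    d_minus = max(xi - i / n for i, xi in enumerate(x))
    return d_plus + d_minus


def kuiper_fpp(v, n, terms=50):
    """Approximate tail probability of V under the null (assumed asymptotic form)."""
    lam = (math.sqrt(n) + 0.155 + 0.24 / math.sqrt(n)) * v
    if lam < 0.4:
        return 1.0  # far from the tail; the probability is essentially 1
    total = 0.0
    for j in range(1, terms + 1):
        a = j * j * lam * lam
        total += 2.0 * (4.0 * a - 1.0) * math.exp(-2.0 * a)
    return min(1.0, max(0.0, total))


def false_positive_ratio(n_photons, n_sim, threshold, seed=0):
    """Fraction of null simulations with FPP < threshold, divided by threshold."""
    rng = random.Random(seed)  # Python's generator is a Mersenne Twister
    hits = 0
    for _ in range(n_sim):
        phases = [rng.random() for _ in range(n_photons)]
        if kuiper_fpp(kuiper_statistic(phases), n_photons) < threshold:
            hits += 1
    return hits / (n_sim * threshold)


# A ratio close to 1 means the formula reproduces the simulated rate.
print(false_positive_ratio(n_photons=100, n_sim=5000, threshold=0.1))
```

A ratio above 1 would indicate that the formula underestimates the FPP at that threshold, and a ratio below 1 that it overestimates it.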

Empty circles in Fig. A.1 have been calculated using Eq. (3) only. The overestimation reaches a factor of 3 for N=20. More importantly, the FPPs are underestimated for N < 15. The factor reaches 30 for N=10 at the $10^{-7}$ level, and is close to 300 at the $10^{-8}$ level.

In principle, one should be cautious about simulations exploring tails of probability distributions, since the random number generators (RNGs) may present defects in these regimes. This does not seem to be a problem here. Indeed, the simulated FPPs match the expected ones perfectly when either an exact formula is used, or when N is large enough if Eq. (3) is used. Moreover, the simulations that end up in the very tail of the Kuiper-statistic distribution are not at all in the tails of the uniform distributions used to generate the list of photons.

We further checked the validity of the simulations by comparing different RNGs. The curves used the MT19937 generator (Matsumoto & Nishimura 1998). The black triangles at N=40 used the RANLUX generator at luxury level 2 (Lüscher 1994), and the black squares at N=50 used the (very poor) standard UNIX RNG (C function rand()). The different RNGs produce perfectly compatible results within the statistical fluctuations due to the limited number of simulations, which reach 10% at the $10^{-7}$ level, and 1% at the $10^{-5}$ level.
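
The quoted fluctuation levels follow from simple counting statistics. The sketch below assumes $10^{9}$ simulations per sample size and Poisson fluctuations on the number of simulations falling below each threshold; the function name is hypothetical.

```python
import math


def relative_fluctuation(n_sim, threshold):
    """Relative Poisson fluctuation on the count of simulations below threshold."""
    expected_count = n_sim * threshold
    return 1.0 / math.sqrt(expected_count)


for level in (1e-5, 1e-7):
    # 1e9 simulations give about 1e4 events at 1e-5 and about 100 at 1e-7
    print(level, relative_fluctuation(1e9, level))
```

With about 100 expected events at the $10^{-7}$ level the relative fluctuation is about 10%, and with about $10^{4}$ events at the $10^{-5}$ level it is about 1%.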

   
Appendix B: Details on the R correction factor

We explore in more detail the properties of the correction factor R with respect to other parameters. Figure B.1 shows the effect of the frequency range. We cut the set of trial frequencies into chunks of 1000 frequencies for two 100-photon simulated observations covering the GTIs of obs. RP201045N00 and RP800035A01. We used k=20, and performed 10 000 simulations in each case. While not constant, R changes moderately, without any visible trend.
Figure B.1: Correction factor R for successive chunks of 1000 frequencies for two simulated 100-photon observations using the GTIs of obs. RP201045N00 (a) and RP800035A01 (b).

We test the dependence of R on the number of photons N, the number of trial frequencies $N_{\rm F}$ (which is roughly proportional to the observation duration), and a measure of the importance of gaps in the observation, given by the ratio between the "on-time" (i.e. the sum of the individual GTI durations) and the total duration. Figure B.2 shows R for the 30 sources from Table 1 as a function of N, $N_{\rm F}$, and the on-time vs. total duration ratio for k=20, with 1000 simulations per observation. A significant correlation is found only between R and N, with R increasing with N; the probability of chance occurrence of the Spearman correlation coefficient is 1%.

Figure B.2: Correction factor R for the 30 sources from Table 1 as a function of N (a), $N_{\rm F}$ (b), and the on-time vs. total duration ratio (c). The dashed line is the best linear fit.
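
As an illustration of the gap measure used here, the on-time vs. total duration ratio can be computed directly from a list of GTIs. This is a minimal sketch: the function name and the GTI values are hypothetical, and the total duration is assumed to run from the start of the first GTI to the end of the last one.

```python
def on_time_ratio(gtis):
    """Ratio of summed GTI durations to the total observation span.

    gtis: list of (start, stop) times in seconds, assumed non-overlapping.
    """
    on_time = sum(stop - start for start, stop in gtis)
    total = max(stop for _, stop in gtis) - min(start for start, _ in gtis)
    return on_time / total


# Hypothetical observation: three short intervals separated by long gaps
example_gtis = [(0.0, 1500.0), (5700.0, 7200.0), (11400.0, 13000.0)]
print(on_time_ratio(example_gtis))
```

A ratio close to 1 corresponds to a nearly continuous observation, while small values indicate that gaps dominate the total duration.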

The average R value for k=20 is 0.335, with an rms, corrected for the contribution of the number of simulations, of 0.066. After removal of the best linear fit to Fig. B.2a, the rms becomes 0.053. Hence most of the scatter remains unexplained, and probably results from the distribution of the GTIs.
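
The statement that most of the scatter remains unexplained can be checked with a short calculation. The sketch below assumes that the two rms values are directly comparable (i.e. both corrected in the same way) and expresses the effect of the linear fit as a fraction of the variance; the function name is hypothetical.

```python
def explained_variance_fraction(rms_before, rms_after):
    """Fraction of the variance removed by subtracting the fit."""
    return 1.0 - (rms_after / rms_before) ** 2


# rms of R before (0.066) and after (0.053) removing the linear trend with N
fraction = explained_variance_fraction(0.066, 0.053)
print(round(fraction, 3))
```

Under these assumptions the linear dependence on N accounts for roughly a third of the variance of R, leaving about two thirds unexplained.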

All parameters except k can be neglected as a first approximation, and using R(k) is justified. A more detailed approximation would make use of both k and N. Unless one is searching for periodic sources in a large number of observations, which is the case in this work, the correct approach is nevertheless to calculate R specifically for the observation at hand.



Copyright ESO 2004