A&A 420, 789-797 (2004)
DOI: 10.1051/0004-6361:20034220
Searching for periods in X-ray observations using Kuiper's test
Application to the ROSAT PSPC archive
S. Paltani
Laboratoire d'Astrophysique de Marseille, BP 8, 13376 Marseille cedex 12, France
Received 21 August 2003 / Accepted 28 February 2004
Abstract
We use Kuiper's test to detect periodicities in X-ray and
gamma-ray observations. Like Rayleigh's test, it uses the individual
photon arrival times, and is therefore well suited to the analysis
of faint sources. Our method makes it possible to take into account
the discontinuities in the observation, and to completely get rid of
the contamination that results from them. This makes it particularly
adapted to the search of periods long compared to the total
observation duration. We propose a semi-analytical approach to
determine the effective number of trial frequencies when searching
for unknown periods over a frequency range. This approach can be
easily adapted to other tests. We show that, using Kuiper's test, we
can recover periods in frequency domains where other tests are
completely confused by contamination. We finally search the entire
ROSAT Position-Sensitive Proportional Counter (PSPC) archive for
long periods, and find 28 new periodic-source candidates.
Key words: methods: data analysis - methods: statistical - stars: pulsars:
general - X-ray: stars
1 Introduction
Important efforts have been devoted to the search of periodic signals
throughout the electromagnetic spectrum. Because of the
idiosyncrasies of astrophysical observations, different methods must
be used depending on the type of object and the wavelength range.
Four test families seem to dominate the period-detection "market".
The calculation of the Fourier power spectrum density
(e.g., Press et al. 1993) using a fast Fourier transform
(FFT) is adapted to evenly spaced (or evenly binned) observations.
The Lomb-Scargle periodogram
(Lomb 1976; Scargle 1982; Horne & Baliunas 1986), a
discrete Fourier transform method, can be used for unevenly-spaced
flux measurements. Epoch folding (EF) (e.g.,
Leahy et al. 1983a) can be used in the same conditions or for
individual photons, but requires a binning according to the phase.
Rayleigh's test (e.g., Fisher 1993; Gibson et al. 1982)
is particularly adapted for the analysis of individual photons.
Observations in the X- and gamma-rays usually have two important
characteristics. First, independent, time-tagged photons are
collected. A method requiring binning is therefore far from ideal,
as it results in a loss of information. Furthermore, binning is
prohibited for sources detected with very few photons; for EF for
instance, the required assumption of Gaussian distribution in each bin
is not satisfied in this case. Moreover, the necessary assumptions on
the number and sizes of the bins lower the performance of the test
(Schwarzenberg-Czerny 1999). Secondly, space observations are often
interrupted by "bad time" periods, where no data are received.
Fourier-based methods and Rayleigh tests are seriously affected by
this problem. In practice, it means that only periods short compared
to the durations of uninterrupted observation can be investigated.
In this paper we present in detail Kuiper's test
(Kuiper 1960). This test has been applied to the
distribution of solar flares (Jetsu et al. 1997), and to the
search for periodicities in Earth impacts
(Jetsu 1997; Jetsu & Pelt 2000), but its unique
suitability to X-ray and gamma-ray observations has been overlooked.
Similarly to Rayleigh's test, it uses discrete events, and can be
applied to very faint sources without any a priori assumption.
Similarly to EF, it takes into account non-uniform coverage of the
phase domain, and can therefore be used when searching for periods
long compared to the total observation duration. We
study in detail the properties of Kuiper's test for period detection,
and particularly its significance level. We concentrate on two
important issues: the treatment of discontinuous observations, and the
determination of the effective number of trial frequencies when
searching for unknown periods. Finally, we apply the algorithm to the
entire ROSAT Position-Sensitive Proportional Counter
(PSPC) archive.
Kuiper's test (Kuiper 1960) is a variant of
Kolmogorov-Smirnov's (KS) test (see Press et al. 1993, and Jetsu & Pelt 1996 for short introductions). Given a sample
$\{x_i\}$, $i=1,\ldots,N$, and a probability distribution with cumulative distribution $F(x)$,
the Kuiper statistic is defined by:
$$V = D^{+} + D^{-} \tag{1}$$
where
$D^{+} = \max_x\left[S_N(x) - F(x)\right]$, $D^{-} = \max_x\left[F(x) - S_N(x)\right]$,
and $S_N(x) = \#(x_i \le x)/N$ is the empirical cumulative
distribution of the $\{x_i\}$, $i=1,\ldots,N$ sample ($\#$
meaning "number of ..."). Similarly to KS, the Kuiper statistic does
not depend on the underlying distribution. The null hypothesis is that
the $\{x_i\}$, $i=1,\ldots,N$ sample is an outcome of N draws
from the $F(x)$ distribution.
Kuiper's test can be readily transformed into a test of periodicity in
a series of photons by phase-folding their arrival times
$t_i$, $i=1,\ldots,N$ for a given test period $P_0=1/f_0$:
$$\phi_i = \mathrm{Frac}\left(f_0\,(t_i - t_0)\right) \tag{2}$$
where Frac(y) is the fractional part of y, and $t_0$ an arbitrary
time. In the absence of periodicity at frequency $f_0$, the
phases $\phi_i$ are expected to be distributed uniformly. This
can be tested using the Kuiper statistic computed against $U(x)=x$,
where $U(x)$ is the cumulative of a uniform distribution
between 0 and 1. A very low probability is evidence that the phases
are not uniformly distributed for this frequency, and indicates a
periodicity (but see Sect. 5.2).
Contrary to KS or EF, the Kuiper statistic is invariant under a
shift of the origin for periodic distributions. As a result,
the statistic is invariant under a shift in phase
that would result from a different choice of $t_0$.
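As an illustration, the phase folding and the Kuiper statistic against a uniform distribution can be sketched in a few lines of Python. This is a minimal sketch and not the author's C library: the function names `fold_phases` and `kuiper_statistic` are hypothetical, and the statistic is evaluated with the usual order-statistics form of the two one-sided deviations.

```python
import numpy as np

def fold_phases(times, f0, t0=0.0):
    """Phase of each event at trial frequency f0: Frac(f0 * (t - t0))."""
    return np.mod(f0 * (np.asarray(times, dtype=float) - t0), 1.0)

def kuiper_statistic(phases):
    """Kuiper statistic of phases in [0, 1) against the uniform cumulative U(x) = x."""
    x = np.sort(np.asarray(phases, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - x)          # largest excess of the empirical cumulative over U
    d_minus = np.max(x - (i - 1) / n)   # largest excess of U over the empirical cumulative
    return d_plus + d_minus
```

Events that all share the same phase give the maximum value of 1, and shifting every phase by a constant (modulo 1) leaves the value unchanged, which is the invariance property mentioned above.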
2.1 Significance of the Kuiper statistic
Kuiper (1960) gave the following asymptotic expression for
large N to calculate the probability of the Kuiper statistic V to
be larger than a given value z under the null hypothesis:
$$P(V>z) = 2\sum_{m=1}^{\infty}\left(4m^2\lambda^2-1\right)e^{-2m^2\lambda^2} - \frac{8\lambda}{3\sqrt{N}}\sum_{m=1}^{\infty}m^2\left(4m^2\lambda^2-3\right)e^{-2m^2\lambda^2}, \qquad \lambda=\sqrt{N}\,z \tag{3}$$
This is the false positive probability (FPP) of falsely rejecting
the null hypothesis. This formula is systematically used, even though
its validity for small N has not been tested (see,
e.g., Jetsu & Pelt 1996). In Appendix A we
show that the FPP is overestimated by a factor 3 for N=20 at
the $10^{-7}$ level. For N<15, the probability is underestimated,
which wrongly increases the rate of false positives by a factor 30 at
the $10^{-7}$ level. Eq. (3) is therefore seriously
wrong for small N.
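The asymptotic expression can be evaluated numerically as sketched below. The sketch assumes the standard form of Kuiper's large-N series, written in terms of $\lambda=\sqrt{N}\,z$, and truncates both sums after a fixed number of terms; `kuiper_fpp_asymptotic` and `n_terms` are illustrative names, not part of any published library. The truncation is only adequate when $\lambda$ is not too small, which is the regime of interest for significant detections.

```python
import numpy as np

def kuiper_fpp_asymptotic(z, n_events, n_terms=100):
    """Large-N approximation of P(V > z) for a sample of n_events events."""
    lam = np.sqrt(n_events) * z
    m = np.arange(1, n_terms + 1, dtype=float)
    damping = np.exp(-2.0 * m**2 * lam**2)
    # Leading term of the series
    s1 = np.sum(2.0 * (4.0 * m**2 * lam**2 - 1.0) * damping)
    # First-order correction in 1/sqrt(N)
    s2 = np.sum(m**2 * (4.0 * m**2 * lam**2 - 3.0) * damping)
    return s1 - 8.0 * lam / (3.0 * np.sqrt(n_events)) * s2
```

For N=100 and z=0.2 this gives a probability slightly below $10^{-2}$; as stated in the text, the result should not be trusted for small N.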
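Stephens (1965) gives two analytical formulae valid for the
lower tail of the Kuiper statistic distribution. If $1/N \le z \le 2/N$:
$$P(V>z) = 1 - N!\left(z-\frac{1}{N}\right)^{N-1} \tag{4}$$
($V_{\min}=1/N$ is the minimum of the Kuiper statistic), and, if $2/N \le z \le 3/N$:
$$P(V>z) = 1 - \frac{(N-1)!\left[\beta^{N-1}(1-\alpha) - \alpha^{N-1}(1-\beta)\right]}{N^{N-2}\,(\beta-\alpha)} \tag{5}$$
with $\alpha$ and $\beta$ being the two solutions of the quadratic
equation: $x^2 - (Nz-1)\,x + \frac{1}{2}(Nz-2)^2 = 0$.
Stephens (1965) also gives an analytic formula for the upper tail:
$$P(V>z) = \sum_{t=0}^{\lfloor N(1-z)\rfloor}\binom{N}{t}\left(1-z-\frac{t}{N}\right)^{N-t-1}T_t \tag{6}$$
with:
$$T_t = y^{t-3}\left[y^3 N - y^2\,t\,\frac{3N-2}{N} + y\,t(t-1)\,\frac{3N-2}{N^2} - \frac{t(t-1)(t-2)}{N^2}\right] \tag{7}$$
where $y = z + t/N$, which is valid if $z \ge 1/2$, if N is even,
and if $z \ge (N-1)/(2N)$, if N is odd.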
The domains of validity of the three exact equations are shown in
Fig. 1, the asymptotic formula being used outside
them. The validity condition of Eq. (6) is
difficult to satisfy for large N. For N=100, the
probability that $V \ge 1/2$ is of the order of $10^{-21}$. For
N=50, this probability is of the order of $10^{-10}$, making
Eq. (6) useful even for intermediate-size samples.
Equations (4) and (5) represent 40%
of the cases for N=10, and only 1% for N=20.
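The three exact formulae attributed to Stephens (1965) can be sketched as follows. This is a hedged sketch: the expressions are written in what is assumed here to be their standard form (lower tail for $z \le 2/N$ and $2/N \le z \le 3/N$, upper tail for $z \ge 1/2$ or $z \ge (N-1)/(2N)$), and the function names are hypothetical. The lower-tail formulae use exact factorials and are only meant for small N.

```python
from math import comb, factorial, floor, sqrt

def fpp_lower_1(z, n):
    """P(V > z) for 1/n <= z <= 2/n."""
    return 1.0 - factorial(n) * (z - 1.0 / n) ** (n - 1)

def fpp_lower_2(z, n):
    """P(V > z) for 2/n <= z <= 3/n."""
    s = n * z - 1.0                  # sum of the two roots
    p = 0.5 * (n * z - 2.0) ** 2     # product of the two roots
    r = sqrt(s * s / 4.0 - p)
    a, b = s / 2.0 - r, s / 2.0 + r
    num = b ** (n - 1) * (1.0 - a) - a ** (n - 1) * (1.0 - b)
    return 1.0 - factorial(n - 1) * num / (n ** (n - 2) * (b - a))

def fpp_upper(z, n):
    """P(V > z) for z >= 1/2 (n even) or z >= (n-1)/(2n) (n odd)."""
    total = 0.0
    for t in range(floor(n * (1.0 - z)) + 1):
        base = 1.0 - z - t / n
        if base <= 0.0:              # boundary term, contributes nothing
            continue
        y = z + t / n
        bracket = (y**3 * n
                   - y**2 * t * (3.0 - 2.0 / n)
                   + y * t * (t - 1) * (3.0 - 2.0 / n) / n
                   - t * (t - 1) * (t - 2) / n**2)
        total += comb(n, t) * base ** (n - t - 1) * y ** (t - 3) * bracket
    return total

def kuiper_fpp_exact(z, n):
    """Exact P(V > z) where one of the three formulae applies, else None."""
    nz = n * z
    if nz < 1.0:
        return 1.0                   # V cannot be smaller than 1/n
    if nz <= 2.0:
        return fpp_lower_1(z, n)
    if nz <= 3.0:
        return fpp_lower_2(z, n)
    z_min = 0.5 if n % 2 == 0 else (n - 1) / (2.0 * n)
    if z >= z_min:
        return fpp_upper(z, n)
    return None                      # fall back to the asymptotic formula
```

A useful sanity check is that neighbouring formulae agree where their domains meet or overlap; for N=3, for instance, the first lower-tail formula and the upper-tail formula both apply at z=1/2 and give the same value.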
Figure 1:
Domains covered by the four formulae as a function of sample size. The
labels indicate the number of the equation used to calculate the
probability. The structures in the dark grey area are due to the
different validity criteria for even and odd numbers in
Eq. (6).
Using the four equations, the FPP is never underestimated. The only
remaining discrepancy with the true distribution is in the region
,
where the probability is overestimated by a factor 1.5
at the $10^{-7}$ level.
Using extensive sets of simulations, we compare the performances of
Kuiper's test with those of the more common Rayleigh test. We create
simulated "observations" of periodic sources for different count
rates, different signal-to-noise ratios (S/N), and different signal
shapes. The phase-folded light curve (hereafter simply "light
curve") is defined as the superimposition of a constant function (the
"continuum") and of the first half-period of a sine function (the
"pulse"), covering a fraction w of the period. The S/N is defined
as the ratio between the surfaces of the pulse and of the continuum.
We draw events at random from the "pulse+continuum" light curves
until a given number of events has been collected in the pulse. For
each set of parameters, 10 000 light curves are simulated. We then
compare the average null-hypothesis probabilities of the two tests.
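The simulation set-up described above can be sketched as follows. This is a minimal sketch under stated assumptions: the pulse is placed at phases [0, w), each event belongs to the pulse with probability S/N / (1 + S/N), and the Rayleigh probability uses the usual large-N approximation $\exp(-N\bar{R}^2)$. The names `draw_light_curve` and `rayleigh_fpp` are hypothetical.

```python
import numpy as np

def draw_light_curve(n_pulse, snr, w, rng):
    """Draw phases from a "pulse+continuum" light curve.

    The pulse is the first half-period of a sine over phases [0, w);
    the continuum is flat over [0, 1). Events are drawn until n_pulse
    of them belong to the pulse.
    """
    p_pulse = snr / (1.0 + snr)      # pulse area / total area
    phases, in_pulse = [], 0
    while in_pulse < n_pulse:
        if rng.random() < p_pulse:
            # Inverse-CDF draw from a density proportional to sin(pi x / w)
            phases.append(w * np.arccos(1.0 - 2.0 * rng.random()) / np.pi)
            in_pulse += 1
        else:
            phases.append(rng.random())
    return np.array(phases)

def rayleigh_fpp(phases):
    """Large-N approximation of the Rayleigh test probability."""
    angles = 2.0 * np.pi * np.asarray(phases, dtype=float)
    r2 = np.mean(np.cos(angles)) ** 2 + np.mean(np.sin(angles)) ** 2
    return np.exp(-angles.size * r2)
```

Repeating the draw many times for each (S/N, w, number of pulse events) and averaging the probabilities returned by each test gives curves of the kind compared in the text.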
Figure 2 shows the results for three signal
intensity cases: 20, 100, and 500 events in the pulse. In the three
cases, Rayleigh's test is more efficient for
,
while
Kuiper's performs better for
.
In the situation most
favorable to Rayleigh's test (i.e. w=1, 100 events in the
pulse), the significance threshold (set arbitrarily to $10^{-4}$) is
crossed with a S/N 2 times smaller with Rayleigh's test; this
advantage decreases to 15% with w=3/4, and Rayleigh's test
is about 30% less sensitive than Kuiper's with w=1/4.
Kuiper's test has more difficulty with periodic signals presenting
only weak modulations, but the decrease in performance is moderate. It
is actually well known that Rayleigh's test is particularly sensitive
in the case of broad peaks (Leahy et al. 1983b). On the other
hand, some pulsars, in particular in the gamma-rays, have peaks much
narrower than those simulated here (Kanbach 1998), in which
case Kuiper's test can significantly outperform Rayleigh's.
3 Searching for periodicities with Kuiper's test
To search for periodicities, we can calculate the Kuiper statistic over a
set of test frequencies. The Kuiper periodogram (or, more
appropriately, "frequencygram'') is defined as:
$$S(f) = \log_{10} P\left(V > V(f)\right) \tag{8}$$
where
$V(f)$ is the Kuiper statistic calculated for a
frequency f. The logarithm is applied to highlight the candidate
periods. Given a periodic signal with a frequency $f_0$, Kuiper's test
may present harmonic and subharmonic peaks at frequencies
$i f_0$ and $f_0/i$ (plus their harmonics),
$i$ being any small
integer.
Figure 2:
Sensitivities of Kuiper's and Rayleigh's tests as a function of
the S/N between the pulse and the continuum. a) 20 counts
in the pulse; b) 100 counts; c) 500 counts. In
each graph, the curves are (from top to bottom) w=1,
w=3/4, w=1/2, w=1/4. The
w=3/4 and w=1/4 curves are highlighted in
light grey for visual identification. The solid line is Kuiper's
test; the dashed line Rayleigh's.
To avoid missing significant peaks, S(f) must be calculated for
frequencies sufficiently close to each other. Assuming a source
emitting a photon every P0=1/f0 seconds, the phases of the first
and last photons evaluated at a frequency $f_0+\delta f$
close to
$f_0$ differ by $\delta f\,T$,
where T is
the total duration of the observation. The coherence is preserved if:
$$\delta f\,T \ll 1 \tag{9}$$
Therefore S(f) must be calculated at equidistant frequencies,
depending only on T. We define the oversampling parameter k:
$$\delta f = \frac{1}{kT} \tag{10}$$
Equation (9) becomes therefore: $k \gg 1$.
If this inequality
is not satisfied, significant peaks can be missed, or underestimated
by sampling them too far from their central frequencies. On the other
hand, the CPU time is proportional to k. Reasonable values of k are in the range 20-50 (but see Sect. 3.2).
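A frequency scan with an oversampling parameter k can be sketched as below. The sketch assumes that the frequency step is 1/(kT) and scans the Kuiper statistic itself (large values of the statistic correspond to deep minima of S(f)); `frequency_grid` and `kuiper_scan` are hypothetical names, and the test against a uniform phase distribution ignores any gaps in the observation.

```python
import numpy as np

def frequency_grid(t_total, f_min, f_max, k=20):
    """Equidistant trial frequencies with step 1 / (k * t_total)."""
    step = 1.0 / (k * t_total)
    n_steps = int(np.floor((f_max - f_min) / step))
    return f_min + step * np.arange(n_steps + 1)

def kuiper_statistic(phases):
    """Kuiper statistic of phases in [0, 1) against a uniform distribution."""
    x = np.sort(phases)
    n = x.size
    i = np.arange(1, n + 1)
    return np.max(i / n - x) + np.max(x - (i - 1) / n)

def kuiper_scan(times, freqs):
    """Kuiper statistic of the folded arrival times at each trial frequency."""
    times = np.asarray(times, dtype=float)
    t0 = times.min()                 # arbitrary origin; the statistic does not depend on it
    return np.array([kuiper_statistic(np.mod(f * (times - t0), 1.0))
                     for f in freqs])
```

The cost of the scan grows linearly with k through the number of trial frequencies, which is the trade-off mentioned in the text.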
3.2 Number of trials
$10^{S(f)}$ is the probability that P=1/f is
not a period of the source for a single draw of a Kuiper
statistic. If S(f) is calculated for a set of frequencies $f_j$,
$j=1,\ldots,n$, and assuming all the frequencies are independent,
we have:
$$P\left(\max_j V(f_j) > z\right) = 1 - \left[1 - P(V>z)\right]^n \tag{11}$$
The above equation can be approximated by:
$$P\left(\max_j V(f_j) > z\right) \simeq n\,P(V>z) \tag{12}$$
under the condition
$n\,P(V>z) \ll 1$.
We can
therefore correct our S(f) estimator for the number of trials:
$$S_n(f) = S(f) + \log_{10} n \tag{13}$$
As n is proportional to k, Eq. (13) may destroy the
significance of some peaks if the large k's required to find the
peaks are used. However, S(f) is strongly correlated on scales
$\delta f \sim 1/T$ and below, so that the number of independent frequencies is smaller than $n$,
the exact value being very difficult to calculate.
This problem affects all period search algorithms, and has been
addressed using extensive simulations for very specific cases
(e.g., de Jager et al. 1988; Horne & Baliunas 1986; de Jager et al. 1989).
We propose here a simple and workable semi-analytical method to
completely correct for the choice of k.
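We choose an arbitrary threshold $V^*$, such that $P(V>V^*)$ is small enough that
$n\,P(V>V^*) \ll 1$.
We then simulate m sets of random
photons, and calculate
$\max_j V(f_j)$
over all $f_j$ for all m simulations. The probability that, for a
given simulation,
$\max_j V(f_j) > V^*$
can now be estimated as
$m^*/m$, where $m^*$ is the number of simulations in which the threshold is exceeded.
This is the left-hand side of
Eq. (12), with $z=V^*$. We can therefore
estimate the effective number of frequencies,
$n_\mathrm{eff}$:
$$n_\mathrm{eff} = \frac{m^*/m}{P(V>V^*)} \tag{14}$$
$n_\mathrm{eff}$ can be understood as the number of independent
frequencies among the $f_j$'s. Approximating
$m^*$ with a Poisson distribution, the uncertainty
on
$n_\mathrm{eff}$ is:
$$\sigma_{n_\mathrm{eff}} = \frac{\sqrt{m^*}}{m\,P(V>V^*)} \tag{15}$$
The corrected periodogram is then deduced from Eq. (13):
$$S_\mathrm{eff}(f) = S(f) + \log_{10} n_\mathrm{eff} \tag{16}$$
Provided
$n_\mathrm{eff}\,10^{S(f)} \ll 1$,
$10^{S_\mathrm{eff}(f)}$ is the probability that
the source has no 1/f period, if n tests are performed. This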
method is quite general, and can be easily adapted to other
statistical tests.
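The semi-analytical estimate of the effective number of frequencies can be sketched as follows. This is a sketch under stated assumptions: the simulated photons are drawn uniformly over a continuous observation, the single-trial probability of exceeding the threshold (`fpp_star`) is supplied by the caller, and the names `count_exceedances` and `effective_trials` are hypothetical.

```python
import numpy as np

def kuiper_statistic(phases):
    """Kuiper statistic of phases in [0, 1) against a uniform distribution."""
    x = np.sort(phases)
    n = x.size
    i = np.arange(1, n + 1)
    return np.max(i / n - x) + np.max(x - (i - 1) / n)

def count_exceedances(n_photons, t_total, freqs, v_star, m, rng):
    """Number of simulations (out of m) whose largest statistic exceeds v_star."""
    m_star = 0
    for _ in range(m):
        times = rng.uniform(0.0, t_total, n_photons)   # constant source, no gaps
        v_max = max(kuiper_statistic(np.mod(f * times, 1.0)) for f in freqs)
        m_star += int(v_max > v_star)
    return m_star

def effective_trials(m_star, m, fpp_star):
    """Effective number of frequencies and its Poisson uncertainty."""
    n_eff = (m_star / m) / fpp_star
    sigma = np.sqrt(m_star) / (m * fpp_star)
    return n_eff, sigma
```

As a worked example, 50 exceedances out of 10 000 simulations with a single-trial probability of $10^{-4}$ give an effective number of 50 frequencies, with an uncertainty of about 7. The corrected periodogram is then obtained by adding the decimal logarithm of that number to S(f).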
In principle, the correction factor R, i.e. the ratio of the effective number of frequencies to n, can
depend on k, the number of photons, the frequency range, the
observation duration, and so on, which means that R should be
estimated separately for all observations. As this is computationally
expensive for large numbers of observations (see
Sect. 5), we approximate R as a function of k only.
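Details are presented in Appendix B. In the limit
$k \rightarrow 0$,
Kuiper's tests are independent from each
other, while in the limit
$k \rightarrow \infty$,
the effective number of frequencies reaches a plateau. We can therefore write the
approximation:
$$R(k) = \frac{1}{1 + r_0 k} \tag{17}$$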
Figure 3 shows the correction factor R(k) for five
sets of simulated observations with different number of photons and
different GTIs (see Sect. 3.3). In all cases, 10 000
simulations have been made for each k, and we set V* so that
,
which seems sufficiently small.
The behavior of R(k) follows quite well Eq. (17), but
the curves do significantly differ from each other, albeit moderately.
We adopt in the following a unique value
r0=0.0815, which
gives
for k=20, or
for k=50. This value corresponds
to the upper envelope of the curves of Fig. 3.
Figure 3:
Ratio of the effective number of frequencies to n
as a function of k for
five different "observations": two continuous (20 and 200
photons) ones, and three corresponding to obs. RP300093N00,
RP300262N00, and RP700232N00 with 100, 1596, and 377
photons. The grey area shows the uncertainties for one of the five
curves. The dashed line has a slope -1.
3.3 Discontinuous observations
In high-energy observations, the photons are collected during limited
periods of time called "good time intervals" (GTIs). Their main
effect is to make the cumulative distribution of the phases of the
photons coming from a constant source depart from U(x)=x, because
the phase intervals are not uniformly covered. This creates strong
aliases in FFTs and Rayleigh's test; EF can take into account the
actual exposure time of each phase bin, but with some limitations due
to the binning.
Kuiper's test is similar to EF in spirit, and even allows a perfect
correction for expected non-uniformity. Like KS's test, Kuiper's test
is independent of the shape of the putative parent distribution. Thus
we calculate exactly, for each frequency, the expected distribution
of the phases for a constant source. This can be done by
folding the GTIs according to the period boundaries. This expected
distribution being piecewise constant, its cumulative can
be calculated exactly. The reference cumulative distribution in Eq. (1) is then
replaced by this expected cumulative to calculate the Kuiper statistic.
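The GTI folding can be sketched as follows. This is a minimal sketch, not the author's implementation: it assumes that the GTIs are given as (start, stop) pairs, computes the exposure accumulated at phases below a given value exactly (full cycles contribute uniformly, partial cycles contribute only over the phases they cover), and then applies the Kuiper statistic to the photon phases mapped through this expected cumulative. The names `expected_phase_cdf` and `kuiper_statistic_gti` are hypothetical.

```python
import numpy as np

def expected_phase_cdf(phi, gtis, f, t0=0.0):
    """Fraction of the exposure spent at phases <= phi for a constant source."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    gtis = np.asarray(gtis, dtype=float)
    a = f * (gtis[:, 0] - t0)        # GTI starts, in cycles
    b = f * (gtis[:, 1] - t0)        # GTI stops, in cycles

    def covered(x):
        # Length (in cycles) of the set of u <= x with Frac(u) <= phi,
        # up to a constant that cancels in covered(b) - covered(a)
        whole = np.floor(x)
        frac = x - whole
        return whole[:, None] * phi[None, :] + np.minimum(frac[:, None], phi[None, :])

    return (covered(b) - covered(a)).sum(axis=0) / (b - a).sum()

def kuiper_statistic_gti(times, gtis, f, t0=0.0):
    """Kuiper statistic of the folded times against the GTI-folded expectation."""
    phases = np.mod(f * (np.asarray(times, dtype=float) - t0), 1.0)
    u = np.sort(expected_phase_cdf(phases, gtis, f, t0))
    n = u.size
    i = np.arange(1, n + 1)
    return np.max(i / n - u) + np.max(u - (i - 1) / n)
```

For a single GTI covering only the first half of the trial period, evenly spread photons give the minimum value 1/N with this correction, whereas the uncorrected test against a uniform phase distribution would return a value of about 0.5.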
Figure 4 compares S(f) to
,
R0(f) being the Rayleigh statistic, in three
different cases.
Figure 4:
Effect of the GTIs on Rayleigh's (top) and Kuiper's (bottom)
tests. a-b) Simulated 1000 s 1000-photon observation.
c-d) 433-photon source in ROSAT obs. RF500043A01 (8 GTIs).
e-f) Simulated 300-photon periodic source with the GTIs of
obs. RP600121N00. The dashed line indicates the location of the
true period $P=10^4$ s.
The first case is a simulated 1000 s 1000-photon observation of a
constant source. No significant peak is observed in Kuiper's test down
to the absolute minimum frequency,
s, while
Rayleigh's produces several very significant spurious peaks. The
second case is a real anonymous 433-photon source in ROSAT
obs. RF500043A01, an observation consisting of 8 GTIs. Again,
absolutely no significant peak is observed in Kuiper's test down to
the absolute minimum frequency, while Rayleigh's produces many, very
deep spurious peaks. The third case is a simulated 300-photon source
with a period $P=10^4$ s with the GTIs of ROSAT
obs. RP600121N00. The photons have been drawn from a
"w=1/4" light curve with 150 photons in the pulse. This
observation totaled 44 733 s spread over 1 month in 40 separate
GTIs. The longest GTI lasted 3118 s, 38 of the 40 GTIs lasting half
an hour or less. The peak at $f=10^{-4}$ Hz has comparable
depth in both tests. However, because many contaminating peaks have an
amplitude comparable to that of the true period, some even
overwhelming it, it is impossible to retrieve the $10^4$ s period
using Rayleigh's test. In the Kuiper periodogram, the peak at
$f=10^{-4}$ Hz dominates all other peaks with a probability
ratio larger than 20 000. Furthermore, the second and third peaks
are located respectively at f/4 and 2 f, and are very probably
aliases of the true frequency.
4 Application to known periodic sources
In a search for new periodic-source candidates in the ROSAT PSPC
archive (see Sect. 5), we found two known periodic
sources, which particularly illustrate the power of Kuiper's test.
EX Hya is a cataclysmic variable of type DQ Her in which a
4020 s (67 min) period has been claimed by
Kruszewski et al. (1981) using an Einstein observation. This
period was later confirmed by Córdova et al. (1985) using a
very long EXOSAT observation. Another period of 5880 s (98 min) is
claimed to be present in both optical (Mumford 1967) and
X-ray (Córdova et al. 1985) light curves.
Figure 5a shows the periodograms for EX Hya in ROSAT
obs. RP300093N00, a 28 340 s observation (i.e. only about seven
4020 s periods), with a 15 542 s effective exposure time split in
12 GTIs. Rayleigh's test produces a forest of spurious peaks. On the
other hand, a very significant peak (
)
is easily
recovered with Kuiper's test at P=3953 s, very close to the
"official" period. The 98-min period is not found here, but there is
a second peak at about three times the 67 min period, extremely close
(within 1%) to
min. This peak could be an alias of both
periods. The existence of the optical 98 min period in the X-ray
domain is therefore unclear, and deserves further study.
Figure 5:
Kuiper periodograms of EX Hya (a) and UW Pic (b).
The grey lines show the 3953 s and 8047 s periods
respectively. The dotted lines indicate the $10^{-4}$ significance
threshold. In a) the short-dash and long-dash lines show
respectively the
s and the possible 98-min period.
In both panels the insets show the results of Rayleigh's test.
UW Pic (RX J0531.5-4624) is a cataclysmic variable of type AM
Her with an optical period of 8010 s (Reinsch et al. 1994).
A phase folding of the ROSAT All-Sky Survey light curve at the known
period suggests the existence of the period in the X-rays.
Figure 5b shows the Kuiper periodogram for UW Pic in ROSAT
obs. RP300334N00, which exhibits a very significant peak
(
)
at P=8047 s, even though the observation
consists of 29 GTIs over 2.3 days, totalling 34 501 s. Again,
Rayleigh's test is completely unable to recover the period.
5 Period search in the ROSAT PSPC archive
We apply Kuiper's test to the entire set of 4638 ROSAT PSPC
observations, treating them completely separately. For simplicity, we
did not attempt to combine distinct observations of a single object.
We search for periods in a range from 100 s up to a third of the
total duration of the observation, using k=20.
Source detection has been performed following the standard EXSAS
spatial analysis procedure (Zimmermann et al. 1998) on a
per-observation basis using standard parameters. Overlapping sources
were extracted twice: once ignoring the second source, and once
excluding it. We ended up with a total of 186 572 sources, distinct
or not. To obtain optimum sensitivity, we extracted the photons up to a
larger radius in high signal-to-noise ratio (S/N) sources than in low
S/N sources (1.5 times the source full width at half maximum compared
to 0.65 times). We extracted at most 2000 photons per source to limit
computation time. We did not apply barycentric correction here,
because the effect is negligible for the low frequencies and
relatively short observations considered here.
5.2 Contaminations in the PSPC observations
The ideal situation of a perfectly constant source is often not
realized in the X-ray domain. Two types of contamination affect
S(f): extrinsic periodicities, and aperiodic variability.
Spurious peaks can be produced in the Kuiper periodogram by extrinsic
phenomena. Four different kinds of contaminations affect ROSAT PSPC
data. One is the wobble of the ROSAT satellite: Its attitude
oscillates around the target, masking and unmasking some of the
sources behind the PSPC window support structure with a period
Hz.
The spacecraft's orbit also produces contamination. While the gaps
due to observing constraints are completely taken care of with our
method (see Sect. 3.3), part of the background depends on
the position along the orbit (e.g., the scattered Solar X-ray
background, Snowden et al. 1994), and induces a periodic
variability at the period of the spacecraft's revolution, i.e.
Hz. We
also found in about 50 cases a period of 86 400 s, obviously of
extrinsic origin. In a handful of observations, many objects presented
very significant peaks at 0.003 Hz. The fact that distinct objects
present the same period clearly indicates a non-astrophysical origin,
which we could not identify. The contaminations combine with each
other, and peaks at
,
i being
any small integer, are frequent.
Knowing the contaminating frequencies, we could check whether harmonic
and subharmonic (see Sect. 3) peaks can dominate the
peak at the fundamental frequency. No subharmonic peak has been found
to dominate the fundamental, but harmonic peaks occasionally do. Thus
there is a risk of misidentifying a harmonic peak for the fundamental.
Aperiodic variability is also a serious difficulty when dealing with
long periods. When trial periods are comparable to the source's
shortest variability time scale, or longer, the effect of aperiodic
variability cannot cancel itself out over the successive phases, and
strongly affects S(f), preventing period detection over large ranges
of frequencies. This is analogous to the red-noise contamination in
Fourier power spectra.
Several thousand sources exhibit significant frequencies at the
$10^{-4}$ level (corrected for the number of trials), the vast
majority of them being due to contamination. We applied several
filters to reduce the number of candidates. We first rejected all
frequencies in broad ranges around the contaminating frequencies
discussed above, and their harmonics. The ranges have been determined
using a histogram of all significant frequencies. Aperiodic
variability has been dealt with in two steps: first, we discarded
objects for which
$S(1/T_0)<-10$, $T_0$ being the total observation
duration. We also rejected all objects for which more than 10
significant frequencies were found. Finally, we eliminated many of
the remaining objects after visual inspection, because
several close peaks in S(f) had similar depths just below the
threshold, ending up with 30 objects. This last step is however somewhat subjective.
Table 1 lists the properties of the 30 remaining
sources. Figure 6 shows S(f) for the 28 new candidates.
A search over 180 000 objects produces about 18 spurious sources at
the $10^{-4}$ level, assuming that all contaminations are perfectly
identified. The periodicities must therefore be confirmed using
distinct data sets. Six candidates have been observed several
times using ROSAT PSPC with adequate observation durations, and are
discussed below. The 22 other sources require additional
observations before their status can be settled, and remain
candidates.
Figure 6:
Kuiper periodograms of the 28 candidates listed in
Table 1. The candidate frequency is highlighted
with a grey line. The horizontal dotted line is the $10^{-4}$ significance limit corrected for the number of trials.
Contamination related to the wobble frequency is indicated with a
black circle. Contamination related to the revolution frequency is
indicated with a black triangle. The numbers are the "ID" column
in Table 1.
V603 Aql (Source #17) is a classical nova for which a period
of 63 min was found using Einstein IPC data
(Udalski & Schwarzenberg-Czerny 1989). Using the same data,
Eracleous et al. (1991) possibly find only its first harmonic,
remaining cautious about its reality. We do not find the candidate
period in either of the two long ROSAT PSPC observations. Similarly,
Borczyk et al. (2003), combining 27 short observations, did not
find any evidence of X-ray periodicity. We found however a very
significant peak at about 503 s,
a region not explored by Borczyk et al. (2003), in
obs. RP300262N00. This observation lasted 1736 s, i.e. a little
more than 3 cycles. Such a small number of cycles could result from
a chance occurrence of three similar successive flares. However, a peak
near this frequency is found in at least two other observations, but
with a lower significance. The repeated occurrence of the peak
nevertheless makes the 503 s period intriguing. Its absence in most
observations could mean that it is only a characteristic variability
time scale, without long-term coherence, or that the periodic
modulation is not persistent.
MRK 841 (Source #20) is a Seyfert 1 galaxy, with a candidate
period of 240.68 s. A similar peak is found in two out of nine
other observations, which were rejected because of red noise. If
Source #11 (P = 1741.89 s) is really 1 RXS J172136.9+431045,
it is also an active galactic nucleus (AGN). AGN
do not present periodic variability in general, but, because of their
similarity to X-ray binaries, (quasi-) periodicities are not excluded.
There have been several claims of existence of periodicity in AGN
(e.g., Iwasawa et al. (1998) in the Seyfert 1 Galaxy
IRAS 18325-5926, but see Benlloch et al. (2001)).
Source #22 is the symbiotic star AG Dra, and shows a
periodicity at P=234 s. The peak is quite narrow, and there
is no evidence of contamination in the region surrounding the
frequency. AG Dra is a known X-ray source (Anderson et al. 1981)
with two probable periods of about 350 and 550 days in the optical
(Friedjung et al. 2003). No periodicity has ever been reported in
the X-rays. The period was completely absent in the few other
ROSAT PSPC observations. If real, the periodic component must be
non-persistent.
Source #23, with a candidate period of 161.47 s, has been observed
three times in total. A peak at the same frequency, albeit below our
significance threshold, is found in the two other observations, making
it a very good candidate. This object is, or is close to, the white
dwarf WD 1620-391, which appears slightly extended in the
ROSAT image. No periodicity has ever been reported for this object.
Source #25 (P=142.47 s) has been observed several times,
without any confirmation of the candidate period.
Source #28 presents low-significance peaks around the candidate
116.9 s period in other observations, but the weakness of the
source makes it impossible to settle the case.
Table 1:
Properties of the confirmed (first two rows) and candidate periodic
sources. ROSAT coordinates are J2000. The frequency is expressed in mHz.
N is the number of photons.
is the decimal logarithm of
the
corrected for the number of trials.
is
the number of frequencies searched. "ID" refers to the numbers in
Fig. 6. Identifications in italics are tentative.
6 Conclusions
Kuiper's test shows very interesting properties for the search of
long-period periodic objects. Its ability to cope very naturally,
without any hidden assumption, with complex GTIs is unique. Compared
to Rayleigh's, Kuiper's test performs better for narrow-peaked light
curves. Kuiper's test is quite sensitive to both subharmonics and
harmonics of the fundamental frequency, but usually identifies the
fundamental correctly. Kuiper's test is particularly adapted to X-ray
missions, like XMM-Newton and Chandra, high-energy gamma-ray
satellites like GLAST, and for Cherenkov telescopes.
The semi-analytical method we propose here to correct the
false-positive probability in case of a search over a range of
frequencies should be quite useful in practice, not only for Kuiper's
test, but also for other tests, as its principle can be easily
adapted. It has the advantage of simplicity, and of being based on
sound probability principles.
Of the 28 candidate periodic sources, 6 could be cross-checked using
other ROSAT PSPC observations. Good or partial confirmation of the
existence of periodicities is found in 3 of these objects, and there
is total absence of confirmation in 3 objects. This does not
necessarily imply a "confirmation of absence". It should be remembered
that X-ray sources are quite often strongly variable, and that a
periodic signal may remain undetected in some observations, even
though the observing conditions seem adequate. For instance,
Israel et al. (2000) report the detection of a periodic signal
in the X-ray pulsar 2E 0053.2-7242 in only one out of nine
ROSAT PSPC observations, the source having dimmed by a factor >6 between the different observations.
The possibility that extrinsic contamination, or statistical flukes
explain some, or even most, of the candidate periods must be
considered seriously. Firm identification of the candidates as
periodic sources will be contingent upon the detection of the periods
in independent data sets. The building up of important X-ray archives
from XMM-Newton and Chandra makes it quite probable that new observations
will be available for a fair number of these sources in the near
future.
A C library implementing the algorithms discussed in this paper is
available from the author.
Acknowledgements
This research has made use of data obtained from the
High Energy Astrophysics Science Archive Research Center (HEASARC),
provided by NASA's Goddard Space Flight Center. The author
acknowledges a grant from the Swiss National Science
Foundation.
- Anderson, C. M., Cassinelli, J. P., & Sanders, W. T. 1981, ApJ, 247, L127
- Benlloch, S., Wilms, J., Edelson, R., Yaqoob, T., & Staubert, R. 2001, ApJ, 562, L121
- Borczyk, W., Schwarzenberg-Czerny, A., & Szkody, P. 2003, A&A, 405, 663
- Córdova, F. A., Mason, K. O., & Kahn, S. M. 1985, MNRAS, 212, 447
- de Jager, O. C., Raubenheimer, B. C., North, A. R., Nel, H. I., & van Urk, G. 1988, ApJ, 329, 831
- de Jager, O. C., Raubenheimer, B. C., & Swanepoel, J. W. H. 1989, A&A, 221, 180
- Eracleous, M., Patterson, J., & Halpern, J. 1991, ApJ, 370, 330
- Fisher, N. I. 1993, Statistical analysis of circular data (Cambridge: Cambridge University Press)
- Friedjung, M., Gális, R., Hric, L., & Petrík, K. 2003, A&A, 400, 595
- Gibson, A. I., Harrison, A. B., Kirkman, W., et al. 1982, Nature, 296, 833
- Horne, J. H., & Baliunas, S. L. 1986, ApJ, 302, 757
- Israel, G. L., Campana, S., Covino, S., et al. 2000, ApJ, 531, L131
- Iwasawa, K., Fabian, A. C., Brandt, W. N., et al. 1998, MNRAS, 295, L20
- Jetsu, L. 1997, A&A, 321, L33
- Jetsu, L., & Pelt, J. 1996, A&AS, 118, 587
- Jetsu, L., & Pelt, J. 2000, A&A, 353, 409
- Jetsu, L., Pohjolainen, S., Pelt, J., & Tuominen, I. 1997, A&A, 318, 293
- Kanbach, G. 1998, Adv. Space Res., 21, 227
- Kruszewski, A., Mewe, R., Heise, J., et al. 1981, Space Sci. Rev., 30, 221
- Kuiper, N. H. 1960, Proc. of the Koninkl. Nederl. Akad. van Wetenschappen, Ser. A., 63, 38
- Leahy, D. A., Darbro, W., Elsner, R. F., et al. 1983a, ApJ, 266, 160
- Leahy, D. A., Elsner, R. F., & Weisskopf, M. C. 1983b, ApJ, 272, 256
- Lomb, N. R. 1976, Ap&SS, 39, 447
- Lüscher, M. 1994, Comput. Phys. Comm., 79, 100
- Matsumoto, M., & Nishimura, T. 1998, ACM Trans. Model. Comput. Simul., 8, 3
- Mumford, G. S. 1967, ApJS, 15, 1
- Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1993, Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. (Cambridge: Cambridge University Press)
- Reinsch, K., Burwitz, V., Beuermann, K., Schwope, A. D., & Thomas, H.-C. 1994, A&A, 291, L27
- Scargle, J. D. 1982, ApJ, 263, 835
- Schwarzenberg-Czerny, A. 1999, ApJ, 516, 315
- Snowden, S. L., McCammon, D., Burrows, D. N., & Mendenhall, J. A. 1994, ApJ, 424, 714
- Stephens, M. A. 1965, Biometrika, 52, 309
- Udalski, A., & Schwarzenberg-Czerny, A. 1989, Acta Astron., 39, 125
- Zimmermann, U., Boese, G., Becker, W., et al. 1998, EXSAS User's Guide, Tech. rep., MPE Report, 85740 Garching bei München, Germany
Online Material
Appendix A: Distribution of the Kuiper statistic
We test the formulae presented in Sect. 2.1 using Monte
Carlo simulations of the null hypothesis. Figure A.1
shows the fraction of test results with a probability smaller than
$10^{-1}, \ldots, 10^{-7}$ respectively as a function of the sample
size. These fractions would asymptotically reach $10^{-1}, \ldots,
10^{-7}$ respectively if we had exact equations. $10^{9}$ simulations
have been performed for each sample size. The only discrepancies are
found for sample sizes in the range 30-100. The asymptotic formula
overestimates the probability of the null hypothesis by about 40% for
a 40-member sample at the $10^{-7}$ level. The overestimation becomes
unimportant for sample sizes larger than 100. An overestimation of
the FPP is however not serious, since we are chiefly concerned with
avoiding false positives, and an overestimated FPP only makes the test
more conservative. There is no evidence of underestimation of
the FPP, which would be a more serious issue, as it would lead to
false positives.
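The kind of null-hypothesis simulation described above can be sketched as follows. This is a minimal illustration and not the code used for this paper: it assumes the asymptotic Kuiper probability in the form given by Press et al. (1993), which is not necessarily identical to the formulae of Sect. 2.1, and it uses far fewer simulations than the $10^{9}$ quoted here. The function names `kuiper_statistic`, `kuiper_fpp_asymptotic` and `fraction_below` are hypothetical.

```python
import numpy as np


def kuiper_statistic(phases):
    """Kuiper statistic V = D+ + D- of phases in [0, 1) against uniformity."""
    x = np.sort(np.asarray(phases, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    d_plus = np.max(i / n - x)         # largest excess of the empirical CDF
    d_minus = np.max(x - (i - 1) / n)  # largest deficit of the empirical CDF
    return d_plus + d_minus


def kuiper_fpp_asymptotic(v, n, terms=100):
    """Asymptotic false-positive probability of V for n events.

    Assumption: Numerical Recipes form,
    Q(lam) = 2 * sum_j (4 j^2 lam^2 - 1) exp(-2 j^2 lam^2),
    with lam = (sqrt(n) + 0.155 + 0.24 / sqrt(n)) * V.
    """
    lam = (np.sqrt(n) + 0.155 + 0.24 / np.sqrt(n)) * v
    if lam < 0.4:
        return 1.0  # the series is numerically 1 in this regime
    j = np.arange(1, terms + 1)
    a = 2.0 * (j * lam) ** 2
    q = 2.0 * np.sum((2.0 * a - 1.0) * np.exp(-a))
    return float(min(max(q, 0.0), 1.0))


def fraction_below(threshold, n, n_sims, rng):
    """Fraction of null simulations whose estimated FPP is below threshold."""
    count = 0
    for _ in range(n_sims):
        v = kuiper_statistic(rng.random(n))
        if kuiper_fpp_asymptotic(v, n) < threshold:
            count += 1
    return count / n_sims


if __name__ == "__main__":
    rng = np.random.default_rng(12345)
    for n in (20, 40, 200):
        print(n, fraction_below(0.1, n, 5000, rng))
```

If the probability formula were exact, the printed fractions would scatter around the chosen threshold; a fraction systematically below the threshold corresponds to an overestimated FPP.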
Figure A.1:
Fractions of false positives compared to the expectation for 7 different
probability thresholds, from $10^{-1}$ to $10^{-7}$,
and different sample sizes. The empty circles have been
calculated using Eq. (3). The black symbols use
different RNGs (see text).
Empty circles in Fig. A.1 have been calculated using
Eq. (3) only. The overestimation reaches a factor of 3 for
N=20. More importantly, the FPPs are underestimated for
N < 15. The factor reaches 30 for N=10 at the
$10^{-7}$ level, and is close to 300 at the $10^{-8}$ level.
In principle, one should be cautious about simulations exploring tails
of probability distributions, since the random number generators
(RNGs) may present defects in these regimes. This does not seem to
be a problem here. Indeed, the simulated FPPs match perfectly the
expected ones when either an exact formula is used, or when N is
large enough if Eq. (3) is used. Moreover, the simulations
that end up in the very tail of the Kuiper-statistic distribution are
not at all in the tails of the uniform distributions used to generate
the list of photons.
We further checked the validity of the simulations by comparing
different RNGs. The curves used the MT19937 generator
(Matsumoto & Nishimura 1998). The black triangles at N=40
used the RANLUX generator at luxury level 2 (Lüscher 1994),
and the black squares at N=50 used the (very poor) standard
UNIX RNG (C function rand()). The different RNGs produce
perfectly compatible results within the statistical fluctuations due
to the limited number of simulations, which reach
at the
$10^{-7}$ level, and
at the $10^{-5}$ level.
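A comparison between generators of the kind described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the MT19937 and PCG64 bit generators provided by NumPy (PCG64 stands in for the other generators, since RANLUX and the C function rand() are not used here), a fixed cut on the Kuiper statistic rather than a probability threshold, and a small number of simulations. The function names and the cut value 0.25 are hypothetical choices.

```python
import numpy as np


def kuiper_statistic(phases):
    """Kuiper statistic V = D+ + D- of phases in [0, 1) against uniformity."""
    x = np.sort(phases)
    n = x.size
    i = np.arange(1, n + 1)
    return np.max(i / n - x) + np.max(x - (i - 1) / n)


def tail_fraction(bit_generator, n_photons, n_sims, v_cut):
    """Fraction of null simulations with V > v_cut, and its binomial error."""
    rng = np.random.Generator(bit_generator)
    hits = 0
    for _ in range(n_sims):
        if kuiper_statistic(rng.random(n_photons)) > v_cut:
            hits += 1
    frac = hits / n_sims
    err = np.sqrt(frac * (1.0 - frac) / n_sims)  # statistical fluctuation
    return frac, err


if __name__ == "__main__":
    f_mt, e_mt = tail_fraction(np.random.MT19937(1), 40, 20000, 0.25)
    f_pcg, e_pcg = tail_fraction(np.random.PCG64(1), 40, 20000, 0.25)
    sigma = np.hypot(e_mt, e_pcg)
    print(f"MT19937: {f_mt:.4f} +/- {e_mt:.4f}")
    print(f"PCG64:   {f_pcg:.4f} +/- {e_pcg:.4f}")
    print(f"difference in units of sigma: {abs(f_mt - f_pcg) / sigma:.2f}")
```

Two generators are compatible in this sense when the difference between their tail fractions is of the order of the combined binomial error, which shrinks as the number of simulations grows.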
Appendix B: Details on the R correction factor
We explore in more detail the properties of the correction factor R
with respect to other parameters. Figure B.1 shows the
effect of the frequency range. We cut the set of trial frequencies
into chunks of 1000 frequencies for two 100-photon simulated
observations covering the GTIs of obs. RP201045N00 and RP800035A01.
We used k=20, and performed 10 000 simulations in
each case. While not constant, R changes moderately, without any
visible trend.
Figure B.1:
Correction factor R for successive chunks of 1000 frequencies
for two simulated 100-photon observations using the GTIs of
obs. RP201045N00 (a) and RP800035A01 (b).
We test the dependence of R on the number of photons N, the number
of trial frequencies (which is roughly proportional to
the observation duration), and a measure of the importance of gaps in
the observation, given by the ratio between the "on-time'' (i.e. the
sum of the individual GTI durations) and the total duration.
Figure B.2 shows R for the 30 sources from Table 1 as a
function of N, the number of trial frequencies,
and the on-time vs. total duration ratio
for k=20, with 1000 simulations per observation. A significant
correlation is found only between R and N, with R increasing
with N; the probability that a Spearman correlation coefficient
this large occurs by chance is 1%.
Figure B.2:
Correction factor R for the 30 sources from
Table 1 as a function of N (a), the number of trial frequencies
(b), and the on-time vs. total duration ratio (c).
The dashed line is the best linear fit.
The average R value for k=20 is 0.335, with an rms of 0.066 after
correction for the contribution of the finite number of simulations. After
removal of the best linear fit to Fig. B.2a, the rms
becomes 0.053. Hence most of the scatter remains unexplained, and
probably results from the distribution of the GTIs.
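The two diagnostics used above, a Spearman rank correlation with its chance probability and the rms of the residuals after removal of a linear fit, can be sketched as follows. This is a minimal illustration on synthetic data, not the analysis of the 30 sources of Table 1: the values of N and R below are invented for the example, the chance probability is estimated by permutation (an assumption; the paper does not state how it was computed), and the fit is taken to be linear in N itself. All function names are hypothetical.

```python
import numpy as np


def ranks(values):
    """Ranks 1..n of the values (assumes no ties, for simplicity)."""
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


def spearman_chance_probability(x, y, n_perm, rng):
    """Two-sided permutation estimate of the chance probability of |rho|."""
    rx, ry = ranks(x), ranks(y)
    rho_obs = abs(np.corrcoef(rx, ry)[0, 1])
    count = 0
    for _ in range(n_perm):
        if abs(np.corrcoef(rx, rng.permutation(ry))[0, 1]) >= rho_obs:
            count += 1
    return (count + 1) / (n_perm + 1)


def rms_after_linear_fit(x, y):
    """rms of the residuals of y after subtracting the best linear fit in x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    # Synthetic stand-in for the 30 observations (NOT the paper's data).
    n_photons = rng.uniform(50.0, 2000.0, size=30)
    r_factor = 0.30 + 4e-5 * n_photons + rng.normal(0.0, 0.05, size=30)
    print("Spearman rho:", spearman_rho(n_photons, r_factor))
    print("chance probability:",
          spearman_chance_probability(n_photons, r_factor, 2000, rng))
    print("rms before fit:", float(np.std(r_factor)))
    print("rms after fit: ", rms_after_linear_fit(n_photons, r_factor))
```

A small drop in rms after the fit, as found here for R, indicates that the fitted variable explains only a minor part of the scatter.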
All parameters except k can be neglected as a first approximation,
and using R(k) is justified. A more detailed approximation would
make use of both k and N. Unless one is searching for periodic
sources in a large number of observations, which is the case in this
work, the correct approach is nevertheless to calculate R
specifically for the observation at hand.
Copyright ESO 2004