A&A, Volume 519, September 2010
Article Number: A85
Number of page(s): 12
Section: Astronomical instrumentation
DOI: https://doi.org/10.1051/0004-6361/201014079
Published online: 16 September 2010
Unevenly-sampled signals: a general formalism for the Lomb-Scargle periodogram
R. Vio^{1}, P. Andreani^{2,3} and A. Biggs^{4}
1 Chip Computers Consulting srl, Viale Don L. Sturzo 82, S. Liberale di Marcon, 30020 Venice, Italy
2 ESO, Karl-Schwarzschild-Strasse 2, 85748 Garching, Germany
3 INAF - Osservatorio Astronomico di Trieste, via Tiepolo 11, 34143 Trieste, Italy
4 ESO, Karl-Schwarzschild-Strasse 2, 85748 Garching, Germany
Received 15 January 2010 / Accepted 12 May 2010
Abstract
The periodogram is a popular tool that tests whether a signal
consists only of noise or if it also includes other components. The
main issue of this method is to define a critical detection threshold
that allows identification of a component other than noise, when a peak
in the periodogram exceeds it. In the case of signals sampled
on a regular time grid, determination of such a threshold is relatively
simple. When the sampling is uneven, however, things are more
complicated. The most popular solution in this case is to use the Lomb-Scargle
periodogram, but this method can be used only when the noise is the
realization of a zero-mean, white (i.e. flat-spectrum) random
process. In this paper, we present a general formalism based on
matrix algebra, which permits analysis of the statistical properties of
a periodogram independently of the characteristics of the noise
(e.g. colored and/or non-stationary), as well as of the
characteristics of the sampling.
Key words: methods: data analysis / methods: statistical
1 Introduction
Spectral analysis is a popular tool for testing whether a given experimental time series
contains only noise, i.e.
x(t_{j}) = n(t_{j}), or whether some other
component s(t) is present, i.e.
x(t) = s(t) + n(t). The classic approach is to fit the time series with the sinusoidal model function of Eq. (1).
If, for example, a periodic component s(t) is present with a frequency f_{l} close to one of the inspected frequencies, then the periodogram will show a prominent peak close to k=l. If s(t) is semi-periodic or even non-periodic, the situation is more complicated since more peaks are expected. The main problem with the use of this technique is the definition of a detection threshold that fixes the contribution of the noise in such a way that, when a peak exceeds it, the presence of a component s(t) can be claimed. In the case of signals sampled on a regular time grid, the determination of such a threshold is a relatively simple procedure, but this is not the case when the condition of regularity does not apply. In this respect, several solutions have been proposed (see Zechmeister & Kürster 2009; Reegen 2007; Ferraz-Mello 1981; Lomb 1976; Scargle 1982; Gilliland & Baliunas 1987, and references therein) that, however, work only under rather restrictive conditions (e.g. white and/or stationary noise) and are difficult to extend to more general situations.
In this paper, a general formalism is presented that allows analysis of the statistical properties of periodograms independently of the specific characteristics of the noise and of the sampling of the signal. In Sect. 2 the formalism is presented for the case of even sampling, and its extension to arbitrary sampling is given in Sect. 3. The usefulness of the proposed formalism is illustrated in Sect. 4, where the cases of white noise with a non-zero mean and of colored noise are worked out. In Sect. 5 the relationship between the periodogram and the least-squares method is considered. Finally, on the basis of simulated signals and a real time series, we discuss in Sect. 6 whether the use of algorithms specifically developed for computing the periodogram of unevenly-sampled time series is really advantageous.
2 Periodogram analysis in the case of even sampling
If a continuous signal x(t) is sampled on a set of N equispaced time instants
,
a time series x_{j},
is obtained. Its discrete Fourier transform (DFT) is given by
with . The sequence can be recovered from by means of
The set provides the so-called Fourier frequencies. Implicit in the use of the DFT is the assumption that is a periodic sequence with period where .
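The forward/inverse DFT pair and the unitarity of the Fourier matrix can be checked numerically. The sketch below is our own construction (the symmetric 1/sqrt(N) normalization is implied by the unitarity property stated below):

```python
import numpy as np

# Sketch (our notation): Fourier matrix F with the symmetric 1/sqrt(N)
# normalization, forward transform x_hat = F x, and inverse via F*.
N = 8
rng = np.random.default_rng(0)
x = rng.normal(size=N)

idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)

x_hat = F @ x
assert np.allclose(F.conj().T @ F, np.eye(N))            # F is unitary
assert np.allclose(F.conj().T @ x_hat, x)                # x is recovered exactly
assert np.allclose(x_hat, np.fft.fft(x, norm="ortho"))   # matches NumPy's unitary FFT
```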
In matrix notation, Eqs. (3) and (4) can be written in the form
and
Here, and are column arrays that contain, respectively, the sequences and , and is the so-called ``Fourier matrix'', an N × N square, symmetric matrix whose (k,j)-entry^{} is given by
The superscript ``^{*}'' denotes the complex conjugate transpose. Matrix is unitary, i.e.,
with the identity matrix. Another useful property is that
with
By means of Eq. (9) it can be shown that
where and and where and are the real and the imaginary parts of a complex quantity. Indeed,
and is a real matrix, hence Eq. (11) has to hold. From Eq. (3) it is also possible to see that is a real quantity. In the case that N is an even number, the same holds for where and represents the smallest integer greater than z. Finally, dealing with , only half of this array can be considered, since , , where if i=j and zero otherwise. With this notation, the periodogram of is defined as
where is the kth entry of the array ^{} and  .  denotes the Euclidean norm. By means of
(14) 
a column array obtained by the column concatenation of and , Eq. (13) can be rewritten in the form
In Sect. 5 it is shown that this periodogram is equivalent to the one obtainable by means of the least-squares fit of model (1) with M=N, t_{j} = j and f_{k} = k/N.
An important point to stress is that, if is the realization of a (not necessarily Gaussian) random process, then each is given by the sum of N random variables, because of the linearity of the Fourier operator . Thanks to the central limit theorem, therefore, the entries of can be expected to be Gaussian random quantities. As a consequence, the entries of can also be expected to be Gaussian random quantities with covariance matrix given by
Here, denotes the expectation operator, is the covariance matrix of , and is the matrix formed by the first rows of the Fourier matrix . From Eqs. (16) and (11), it is easy to deduce that, if is the realization of a standard white-noise process, i.e. , then is a diagonal matrix with , and . In other words, the entries of are mutually uncorrelated. In turn, this means that is uncorrelated with . If is a colored (not necessarily stationary) noise process, i.e. is not a diagonal matrix, then the same holds for . However, from it is possible to obtain an array containing mutually uncorrelated entries by means of the transformation
The matrix can be computed via
with the orthogonal matrix whose columns contain the eigenvectors of and a diagonal matrix containing the corresponding eigenvalues . This decomposition is particularly simple and computationally efficient if is a circulant matrix since it can be diagonalized according to
where `` '' denotes a diagonal matrix whose diagonal entries are given by the array and is the first column of . Because of this, the decomposition (18) can be directly computed with and ; hence, . This means that the whitening operation can be performed in the harmonic domain through the following procedure: a) computation of , i.e. the DFT of ; b) computation of ; and c) the inverse DFT of . A potential difficulty in using transformation (17) is that is required to be of full rank. This condition can be expected to be satisfied in most practical applications. Some problems can arise if the time step of the sampling is much shorter than the decorrelation time of ^{}. Indeed, some columns (and therefore some rows) of could be almost identical, i.e. this matrix could become numerically ill-conditioned. In this case, the most natural solution consists of averaging the data that are close in time.
When the periodogram is used to test whether vs. , a threshold has to be defined such that, with a prefixed probability, a peak in that exceeds can be expected not to arise because of the noise. This requires knowledge of the statistical properties of under the hypothesis that . The simplest situation is when is the realization of a standard white-noise process. In fact, since the entries of are uncorrelated random Gaussian quantities, from Eq. (15) it can be derived that the entries of are (asymptotically) independent quantities distributed according to a distribution^{}. As a consequence, independently of the frequency k, a threshold can be determined that corresponds to the level that a peak due to the noise would exceed with a prefixed probability when a number N_{f} of (statistically independent) frequencies are inspected. More specifically, is the highest value for which (Scargle 1982), in formula
If the entire periodogram is inspected, then . Commonly, is called the level of false alarm.
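For the white-noise case, this threshold can be sketched as follows. This is our implementation of the standard Scargle (1982) argument: each normalized periodogram value is exponentially distributed, so with N_f independent frequencies the probability that pure noise exceeds a level z is 1 - (1 - e^{-z})^{N_f}; the function name is our own:

```python
import math

# Sketch (ours) of the white-noise false-alarm threshold: solve
# eps = 1 - (1 - exp(-z))^{N_f} for the level z.
def false_alarm_threshold(eps, n_f):
    """Level z that pure noise exceeds with probability eps over n_f frequencies."""
    return -math.log(1.0 - (1.0 - eps) ** (1.0 / n_f))

z = false_alarm_threshold(0.01, 60)   # the 0.01 level with N_f = 60, as in Fig. 1

# Round-trip check: the implied false-alarm probability is recovered.
p_fa = 1.0 - (1.0 - math.exp(-z)) ** 60
assert abs(p_fa - 0.01) < 1e-12
```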
Threshold (20) is not applicable when the noise is colored. However, the requirement to fix a different level for each frequency can be avoided if the original signal is transformed into
Indeed, under the hypothesis that , the entries of are uncorrelated and unit-variance random quantities. As seen above, this operation should not be difficult. Some problems emerge if the decomposition (18) is carried out by means of the efficient DFT approach described above (a necessary approach in the case of very long sequences of data). This is because, in forming , it is necessary to take into account the periodicity of the sequence forced by the DFT. In other words, it is necessary to impose a spurious correlation among the first and the last entries of . For instance, for a stationary noise process, is a Toeplitz matrix, but it has to be approximated with a circulant one. When dealing with sampled signals, this is an unavoidable problem that no technique can completely solve. A classical way of relieving this situation is the windowing method, i.e. the substitution of x_{j} in Eq. (3) with , where is some prefixed discrete function (window) that makes the signal gently reduce to zero at the extremes of the sampling interval (e.g. see Oppenheim & Schafer 1989). This method can be expected to work satisfactorily only when the decorrelation time of the noise is shorter than the length of the signal.
3 Periodogram analysis in the case of uneven sampling
If a signal x(t) is sampled on an uneven set of time instants, some problems emerge: it is no longer possible to define a set of ``natural'' frequencies such as those provided by the Fourier transform. In turn, this implies some ambiguities in the definition of the Nyquist frequency that, loosely speaking, corresponds to the highest frequency that contains information on the signal of interest (e.g. see Vio et al. 2000; Koen 2006). As a consequence, Eq. (3) has to be modified. In the following, with no loss of generality, it is assumed that
is sampled at M arbitrary time instants
with t_{0} = 0,
t_{M-1} = M-1 and the remaining t_{j} arbitrarily distributed within this interval. Moreover, a set of N frequencies
is considered with . Such a set corresponds to the frequencies that are typically inspected when looking for a periodicity. However, others can be chosen. With these conditions, a transformation corresponding to the one given by Eq. (5) is
where
, and is a real positive number. Because of the normalization by , time is expressed in units of the shortest sampling time interval. Apart from the substitution of the Fourier matrix with , the uneven sampling of signals does not modify the formalism introduced in the previous section. In particular, the covariance matrix defined in Eq. (16) becomes
The Fourier matrix does not have the properties (8)-(12). As a consequence, even in the case that is the realization of a standard white-noise process, i.e. , the matrix
is not diagonal. In general, is not even of full rank. For example, this happens when the periodogram of a time series containing M data is computed on N frequencies with N > M (a typical situation in practical applications). This implies that the entries of cannot be made mutually uncorrelated. Obviously, the same holds for the entries of as given by Eq. (15). As a consequence, although a number N of frequencies are considered in , at most only M/2 of them are statistically independent^{}. Particularly troublesome is that, for a given frequency k, and (i.e. and ) are also correlated. This makes it difficult to fix the statistical characteristics of . In this respect, two choices are possible. The first consists in determining, for each frequency, the PDF of . Actually, this is a rather involved approach, because and have variance and , respectively, and covariance . Therefore, once and are normalized to unit variance, each is given by the sum of two correlated random quantities. Although available in analytical form (Simon 2006), the resulting PDF is rather complex and hence difficult to handle (for an alternative approach, see Reegen 2007). Moreover, there is the additional problem that changes with k. A simpler alternative is the use of two uncorrelated and unit-variance random quantities, and , obtained through the transformation
Here, , , is a diagonal matrix whose entries are given by the reciprocal of the square root of the nonzero eigenvalues of the covariance matrix
zero otherwise, and is an orthogonal matrix that contains the corresponding eigenvectors^{}. Indeed, if the periodogram is defined as
then each is given by the sum of two independent, unit-variance, Gaussian random quantities. As a consequence, the corresponding PDF is, independently of k, a whose cumulative distribution function (CDF) is the exponential function. This permits determining the statistical significance of for a specified frequency k. Things become more complex if N_{f} frequencies are inspected when looking for a peak. Indeed, even after the operation (26), it happens that for , i.e. the frequencies of the periodogram remain mutually correlated. This is an unavoidable problem. Because of it, N_{f} does not correspond to the number of independent frequencies, so the level of false alarm (20) cannot be computed. However, since is the same for all the frequencies, an upper limit can be fixed for by setting . The periodogram obtained by means of Eq. (28) corresponds to the original Lomb-Scargle periodogram.
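In practice the Lomb-Scargle periodogram need not be coded from scratch. The hedged sketch below uses SciPy's implementation (scipy.signal.lombscargle, which expects angular frequencies) on a synthetic unevenly sampled sinusoid; the sampling, frequency grid, and noise level are our own choices:

```python
import numpy as np
from scipy.signal import lombscargle

# Sketch (ours): Lomb-Scargle periodogram of an unevenly sampled sinusoid.
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 120.0, size=40))       # uneven sampling instants
f0 = 0.127                                          # true frequency (cycles/unit)
x = np.sin(2 * np.pi * f0 * t) + 0.3 * rng.normal(size=t.size)

freqs = np.linspace(0.005, 0.5, 500)                # inspected frequencies
pgram = lombscargle(t, x, 2 * np.pi * freqs)        # SciPy expects angular freqs

f_peak = freqs[np.argmax(pgram)]
assert abs(f_peak - f0) < 0.02                      # peak near the true frequency
```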
Since the transformation (17) does not depend on the characteristics of the signal sampling, the strategy to follow in the case that is the realization of (not necessarily stationary) colored noise is simply the one of Sect. 2, i.e. transformation of to an array with uncorrelated entries. After that, the Lomb-Scargle periodogram can be computed. It is worth noting that this simple result has been possible thanks to a formulation of the problem in the time domain and the use of matrix notation. The same results could have been obtained by following the popular approach of working in the harmonic domain, but at the price of a much more difficult derivation.
4 Two examples
To illustrate the usefulness and the simplicity of the proposed formalism in handling situations different from the classical ones, we show two examples in this section.
The first consists of the periodogram of a mean-subtracted time series. The evaluation of the reliability of a peak in the periodogram of a signal requires that (under the null hypothesis ) be the realization of a zero-mean noise process. In most experimental situations, this condition is not fulfilled and one works with a centered (i.e. mean-subtracted) version of . This, however, introduces some (often neglected) problems. The case where is the realization of a discrete white-noise process with variance has been considered several times in the literature. An example is the paper by Zechmeister & Kürster (2009), where a rather elaborate solution is presented. With the approach proposed here, a simpler solution can be obtained if one takes into consideration that the subtraction of the mean from forces a spurious correlation among the entries of in such a way that the covariance matrix is given by
where M is the number of entries of and an M × M matrix with every entry equal to unity. Since this matrix is singular, it cannot be inverted and therefore cannot be whitened. In any case, if in Eq. (24) matrix is substituted for and one sets
(30) 
then it is a trivial matter to decorrelate and by means of Eqs. (26), (27) and to compute the periodogram through Eq. (28). This result can easily be extended to the case where, because of measurement errors, each entry of has its own variance and a weighted mean is subtracted from the data sequence, i.e. , with . Indeed, it is sufficient to substitute as given by Eq. (29) with
(31) 
where . The rest of the procedure remains unmodified.
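A small numerical check (ours, not the authors' code) of the mean-subtraction covariance discussed above: for white noise with variance sigma^2, subtracting the sample mean yields the covariance sigma^2 (I - J/M), which is singular, with a single zero eigenvalue along the constant mode:

```python
import numpy as np

# Sketch (ours): covariance of mean-subtracted white noise,
# C = sigma^2 (I - J/M), where J is the all-ones M x M matrix.
M = 16
sigma2 = 1.0
C = sigma2 * (np.eye(M) - np.ones((M, M)) / M)

w = np.linalg.eigvalsh(C)             # eigenvalues in ascending order
# One zero eigenvalue (constant mode), M-1 eigenvalues equal to sigma^2,
# confirming that C is singular and cannot be whitened.
assert abs(w[0]) < 1e-12
assert np.allclose(w[1:], sigma2)
```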
The second example consists of zero-mean colored noise. The improvement in the quality of the results obtainable with the approach presented in the previous section is visible in Fig. 1. The top left panel shows a discrete signal , f=0.127, simulated on a regular grid of 120 time instants but with missing data in the ranges [31 70] and [76 115]. Here, is the realization of a discrete, zero-mean, colored noise process whose autocovariance function is given in the top right panel. From the bottom left panel, it is evident that the Lomb-Scargle periodogram of the original sequence provides rather ambiguous results concerning the presence of a sinusoidal component. On the other hand, such a component is clearly visible in the bottom right panel, which shows the Lomb-Scargle periodogram of the sequence .
Figure 1: Results concerning the numerical experiment, presented in Sect. 4, on the detection of a sinusoidal component in colored noise. The top left panel shows an observed time series (blue crosses) obtained through the simulation of signal with , f=0.127, on a regular grid of 120 time instants but with missing data in the ranges [31 70] and [76 115]. Here, (red line) is the realization of a discrete, zero-mean, colored noise process whose autocovariance function is given in the top right panel. For comparison, the sinusoidal component is also plotted (green line). The bottom left panel shows the Lomb-Scargle periodogram of the original sequence computed on 120 frequencies , whereas the bottom right panel shows the Lomb-Scargle periodogram corresponding to its whitened version. In both cases, only the first 60 frequencies are shown and the vertical red line corresponds to the frequency of the sinusoidal component. The horizontal green line in the bottom panels provides the threshold corresponding to a 0.01 level of false alarm (number of independent frequencies N_{f}=60), i.e. the probability that the periodogram of a pure noise signal exceeds this threshold by chance is .

Figure 2: Covariance matrix of the real and the imaginary parts, say and , of the Fourier transform of an unevenly sampled white-noise signal (i.e. matrix as given by Eq. (25)). In particular, is assumed to be the realization of a zero-mean, unit-variance, white-noise process sampled at 120 regularly spaced time instants but with 80 missing data. Three different cases have been considered, with missing data in the following ranges: a) [31 110] (top left panel); b) [31 70] and [76 115] (top right panel); c) [6 25], [36 75] and [96 115] (bottom left panel). In a fourth case, the missing data are randomly distributed (bottom right panel). Because of the small size of the figure, it is necessary to stress that, for each panel, none of the prominent diagonal structures visible in the top right, as well as in the bottom left quadrant, corresponds to the main diagonal of the quadrant itself (i.e. none of them provides the covariance between and that is seen in Fig. 3).

Figure 3: Variance of and covariance of with as a function of the frequency k for the experiments in Fig. 2. The blue line corresponds to the main diagonal of the top left quadrant in each panel of Fig. 2, whereas the red one corresponds to the main diagonal in the top right, as well as in the bottom left quadrant.

Figure 4: Real (blue line) and imaginary (red line) parts of the spectral windows that have been used to simulate the irregular sampling of the signal in the experiments corresponding to Fig. 2 (i.e., is the Fourier transform of the sampling pattern of , see Eqs. (37), (38)). 

5 Periodogram and leastsquares fit of sinusoids
The formalism proposed here is also useful in the context of more theoretical questions (but with important practical implications). For example, a point often overlooked in the astronomical literature is the relationship between the periodogram and the least-squares fit of sine functions. These two methods are often believed to be equivalent. Actually, this is true only when the sampling is regular and the frequencies of the sinusoids are given by the Fourier ones. Indeed, if t_{j} = j and f_{k} = k / N, , then Eq. (1) can be written in the form
with
(33) 
and . The least-squares solution of system (32) is given by
where ``^{+}'' denotes the Moore-Penrose pseudoinverse (Björck 1996). In the case of even sampling, i.e. when and , it happens that
(35) 
In other words, the coefficients and , as given by the least-squares approach, can be obtained through the DFT of because, as shown by means of Eq. (11), . In the case of uneven sampling, this identity is not fulfilled: any periodogram computed through Eq. (22) and the least-squares fit of sine functions have to be expected to give different results. Moreover, since only under the two above-mentioned conditions do the sine functions constitute an orthonormal basis, the least-squares fit of a single sine function at a time does not in general provide the same result as the simultaneous fit of all the sinusoids as in Eq. (34) (e.g. see Hamming 1973, p. 450). In particular, if an unevenly-sampled signal is given by the contribution of two or more sinusoids, the one-at-a-time fit of a single sine function provides biased results. This also holds for the Lomb-Scargle periodogram, which is equivalent to the least-squares fit of a single sinusoid with a specified frequency, with the constraint that the corresponding coefficients ``a'' and ``b'' are uncorrelated (Zechmeister & Kürster 2009; Scargle 1982).
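The simultaneous least-squares fit can be sketched as follows. This is our own minimal example (the time grid, frequencies, and noise level are arbitrary choices): a design matrix of cosines and sines at two trial frequencies, solved with the Moore-Penrose pseudoinverse:

```python
import numpy as np

# Sketch (ours): simultaneous least-squares fit of sinusoids on an
# unevenly sampled signal, solved via the Moore-Penrose pseudoinverse.
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 100.0, size=50))       # uneven time instants
f = np.array([0.05, 0.13])                          # trial frequencies
x = 1.5 * np.cos(2 * np.pi * 0.13 * t) + 0.2 * rng.normal(size=t.size)

# Design matrix: [cos(2 pi f_k t) | sin(2 pi f_k t)]
G = np.hstack([np.cos(2 * np.pi * t[:, None] * f),
               np.sin(2 * np.pi * t[:, None] * f)])
coef = np.linalg.pinv(G) @ x                        # [a_1, a_2, b_1, b_2]

a2 = coef[1]                                        # cosine amplitude at f = 0.13
assert abs(a2 - 1.5) < 0.2                          # amplitude recovered
```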
6 Discussion
As demonstrated in Sect. 3, when the noise has arbitrary statistical characteristics, the computation of the periodogram of an unevenly-sampled signal requires two steps:
- whitening and standardization of the noise component (in this way a signal is obtained);
- computation of the Lomb-Scargle periodogram of .
Figure 5: Cross-correlation between the real and the imaginary parts of the spectral windows that are shown in Fig. 4. The red cross in each panel corresponds to the point (0,0).

Figure 6: Covariance matrix of the real and the imaginary parts, say and , of the Fourier transform of an unevenly sampled white-noise signal with periodic gaps (i.e. matrix as given by Eq. (25)). In particular, is the realization of a discrete zero-mean, unit-variance, white-noise process sampled on a grid of 600 time instants randomly distributed (i.e. not rebinned) according to the cyclic sampling pattern shown in the bottom panel of Fig. 7. In total, 1200 frequencies have been considered.

Figure 7: Top panel: variance of and covariance of with (only the first 100 frequencies are shown) for the experiment concerning the realization of a discrete zero-mean, unit-variance, white-noise process on 600 time instants randomly distributed (i.e. not rebinned) according to a cyclic sampling. In total, 1200 frequencies have been considered (see text in Sect. 6). Bottom panel: cyclic sampling used in the experiment.

The last issue that has to be considered is to what extent the use of the Lomb-Scargle periodogram is really advantageous. Indeed, the action of the algorithms dealing with uneven sampling is essentially directed, for each frequency k, at forcing to be the sum of two independent Gaussian random quantities. However, although not clearly emphasized, it has already been pointed out that this operation is not critical (e.g. see Scargle 1982). It can be expected that a periodogram computed simply through
with , is often very close to the one given by Eq. (28). A rigorous demonstration of this fact is difficult because of its strict dependence on the specific sampling. However, with the help of some numerical experiments and of the formalism introduced here, some insights are possible. In particular, we consider the covariance matrix of a white-noise signal sampled on different uneven time grids.
6.1 Numerical simulations
In our simulations, we take the realization of a zero-mean, unit-variance, Gaussian white-noise process sampled on 120 regularly spaced time instants, but with 80 missing data (i.e. M=40). The available signal can be written in the form
where and is an array whose entries are equal to one in correspondence to a value of that is available and zero otherwise. In this case, Eq. (3) can be written as
with . Three different cases have been considered where the missing data have time indices in the ranges a) [31 110]; b) [31 70] and [76 115]; c) [6 25], [36 75] and [96 115]; whereas they are randomly distributed in a fourth case. The related covariance matrices , computed through Eq. (25) with , are shown in Fig. 2, whereas Fig. 3 displays the corresponding main diagonals of the blocks and . Especially from Fig. 3 it is evident that, for an arbitrary frequency k, the covariance between and is quite close to zero. This means that each entry of , as given by Eq. (36), can be assumed to be distributed according to a . From Fig. 2 it is also evident that, when gaps are present in the sampling pattern, there are nearby frequencies k and l for which not only and are mutually correlated, but also with and with . These correlations, especially those between the real and the imaginary components, disappear in the case of random sampling.
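The structure of these covariances can be reproduced analytically. The sketch below is our own code: for unit-variance white noise observed through a 0/1 sampling mask, the variance and Re/Im covariance of the unnormalized DFT follow directly from the mask; the exact placement of the gap indices (mimicking case b) of Fig. 2) is our assumption:

```python
import numpy as np

# Sketch (ours) of the Sect. 6.1 computation: for unit-variance white noise
# through a 0/1 mask w,
#   Var[Re x_hat_k]         = sum_j w_j cos(omega_k j)^2
#   Cov[Re x_hat_k, Im x_hat_k] = -sum_j w_j cos(omega_k j) sin(omega_k j).
N = 120
mask = np.ones(N)
mask[31:71] = 0.0            # gaps roughly as in case b) of Fig. 2
mask[76:116] = 0.0           # (exact gap indices are our assumption)

j = np.arange(N)
k = np.arange(N // 2)
omega = 2 * np.pi * k[:, None] / N                  # one row per frequency

cos_t = np.cos(omega * j) * mask
sin_t = np.sin(omega * j) * mask
var_re = (cos_t ** 2).sum(axis=1)                   # Var[Re x_hat_k]
cov_re_im = -(cos_t * sin_t).sum(axis=1)            # Cov[Re x_hat_k, Im x_hat_k]

assert np.isclose(var_re[0], mask.sum())            # k = 0: all available data
assert abs(cov_re_im[0]) < 1e-9                     # exactly zero at k = 0
# The Re/Im covariance stays well below its trivial bound M/2 (cf. Fig. 3).
assert np.max(np.abs(cov_re_im)) < mask.sum() / 2
```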
6.2 Interpretation of the results of the simulations
To understand these results, it is necessary to take into account that Eq. (39) can be rewritten in the form
(41)
(42)
As is a (singular) diagonal matrix, is a (singular) circulant matrix. This implies that is given by the circular convolution of (the DFT of the original signal, inclusive of the missing data) with the spectral window (the DFT of the sampling pattern). Since, for two arbitrary frequencies, say k and l, , , , and are mutually independent, any correlation existing between , , , and is induced by the correlation between the entries of and . Now, as shown in Fig. 4, in the presence of gaps the real and the imaginary parts of are both significantly different from zero only in a narrow interval of frequencies surrounding the origin, i.e. corresponding to the lowest frequencies ( can be interpreted as a low-pass filter). This produces the correlations observed among , , and for nearby frequencies k and l. On the other hand, in the case of random sampling, mimics the behaviour of white noise, making the correlations less important. Moreover, Fig. 5 shows that, in the case of sampling with gaps, the cross-correlation between and is significant but not centered at zero lag. This explains why the quantities and can be strongly correlated in spite of the small correlation between and . Similar arguments also hold for the case of sampling with periodic gaps (rather common in astronomical experiments). Indeed, in practical applications the period of these gaps is somewhat smaller than the mean sampling time step of an uninterrupted sequence of data. As a consequence, both and present a sharp and narrow peak at a frequency close to the origin. Actually, the spectral window of a sampling with periodic gaps is also characterized by the presence of aliases. However, these aliases too are sharp and narrow, and their importance decreases with increasing frequency. The combination of these facts leads again to the quantities , , , and being almost uncorrelated if the frequencies k and l are not sufficiently close.
Moreover, since with periodic gaps the cross-correlation between and can also be significant, but not centered on zero lag, the correlation between and is negligible for each frequency k. These considerations are confirmed by Figs. 6 and 7. Figure 6 displays the covariance matrix , computed through Eq. (25), of a discrete zero-mean, unit-variance, white-noise process sampled on 600 time instants randomly distributed (i.e. not rebinned) along six cycles (i.e. 100 points per cycle) of the sampling pattern shown in the bottom panel of Fig. 7; 1200 frequencies have been considered. The top panel of Fig. 7 displays the main diagonals of the blocks and .
The numerical simulations indicate that the consequences of unevenly-sampled data seem to concern the number of independent frequencies in rather than the correlation between and its imaginary counterpart . At present, no general method has been developed to deal with this problem. However, it has been pointed out in the literature that the number of independent frequencies is not a critical parameter for testing the significance level of a peak in . In particular, empirical arguments indicate that this number can be safely set to M/2 (e.g. see Press et al. 1992). The conclusion is that forcing each entry of to be the sum of two independent Gaussian quantities has only minor effects. This is shown in Fig. 8, where the top panel shows that the Lomb-Scargle periodogram of the time series with periodic gaps considered above is quite similar to the one provided simply by with . This is evident in the bottom panel of the same figure, where the similarity of the two periodograms is demonstrated by their absolute difference.
A final point to underline, which has important practical implications, is that for long time series the small differences visible in Fig. 8 should decrease. Indeed, as seen above, will be significantly correlated with only if the two frequencies k and l are close enough. The only exception is represented by the frequencies at the extremes of the frequency domain, where the assumption of a periodic signal intrinsic to the DFT forces a spurious correlation. For longer and longer time series, this spurious correlation will affect a smaller and smaller fraction of frequencies and, as a consequence, a larger and larger fraction of them will be mutually independent. This is shown in Fig. 9 where, in the context of the previous experiment, the mean absolute difference between the two periodograms is plotted as a function of , the number of cyclic sampling patterns ( in Fig. 8). This argument also explains why in many practical situations the number of independent frequencies can be safely fixed to M/2. In conclusion, only in the case of signals containing a small number of data can the Lomb-Scargle periodogram be expected to exhibit noticeable differences from the periodogram given by Eq. (36). Often, using it does not change anything, and comparable results can be expected with less sophisticated approaches.
Figure 8: The top panel shows the Lomb-Scargle periodogram and the classic periodogram given by for the experiment in Fig. 7. The bottom panel shows the smallness of their absolute difference.

6.3 Application to an astronomical time series
As an example of an unsophisticated method able to produce results similar to those obtainable with the Lomb-Scargle periodogram, we consider the rebinning of the original time series onto an arbitrarily dense regular time grid (in this way a signal with a regular sampling is obtained, but with some data missing), followed by the application of any of the fast Fourier transform (FFT) algorithms available nowadays. Figure 10 shows an experimental (mean-subtracted) time series versus its rebinned version. This time series, which is characterized by a rather irregular sampling, was obtained with the VLA array (Biggs et al. 1999) and consists of polarisation position angle measurements at an observing frequency of 15 GHz for one of the images of the double gravitational lens system B0218+357. The original sequence contains only M=45 data and it is rebinned on a regular grid of M_{r} = 92 time instants. In spite of this, as visible in Fig. 11, the corresponding periodograms, computed on N=M_{r} equispaced frequencies by means of the Lomb-Scargle and the FFT approaches, are remarkably similar. Here, the highest frequency approximately corresponds to the Nyquist frequency related to the shortest sampling time step. The main conclusion of this example is that, although in the previous section we stated that the use of the Lomb-Scargle periodogram can be expected to be effective only for time series containing a small number of data, this is not a sufficient condition to guarantee that the method is truly useful.
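The rebin-then-FFT procedure can be sketched as follows. This is our own synthetic example, not the B0218+357 data; the bin size, signal frequency, and noise level are arbitrary choices:

```python
import numpy as np

# Sketch (ours): rebin an unevenly sampled series onto a regular grid
# (empty bins stay zero) and take a classic FFT periodogram.
rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0.0, 92.0, size=45))        # 45 uneven instants
x = np.sin(2 * np.pi * 0.2 * t) + 0.3 * rng.normal(size=t.size)

M_r = 92                                            # regular grid size
bins = np.clip(np.floor(t).astype(int), 0, M_r - 1)
x_reb = np.zeros(M_r)
counts = np.zeros(M_r)
np.add.at(x_reb, bins, x)                           # sum data falling in each bin
np.add.at(counts, bins, 1.0)
filled = counts > 0
x_reb[filled] /= counts[filled]                     # average multiple data per bin
x_reb[filled] -= x_reb[filled].mean()               # mean-subtract available data

pgram = np.abs(np.fft.rfft(x_reb)) ** 2 / M_r       # classic FFT periodogram
f = np.fft.rfftfreq(M_r, d=1.0)
f_peak = f[np.argmax(pgram[1:]) + 1]                # skip the k = 0 term
assert abs(f_peak - 0.2) < 0.03                     # peak near the true frequency
```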
Figure 9: Mean absolute difference (N_{k} = number of frequencies) between the Lomb-Scargle periodogram and the classic periodogram given by as a function of the number of cyclic sampling patterns (Fig. 8 shows the case with ). 

Figure 10: Experimental (mean-subtracted) time series containing 45 unevenly spaced data together with a rebinned version computed on a regular grid of 92 time instants (data taken from Biggs et al. 1999). 

Figure 11: Periodograms of the time series shown in Fig. 10. Frequency is given in units of the Nyquist frequency corresponding to the shortest sampling time step. The periodogram of the original time series has been obtained by means of the Lomb-Scargle method and that of the rebinned version by means of a classic FFT algorithm. 

7 Conclusions
In this paper we worked out a general formalism, based on matrix algebra, that is tailored to the analysis of the statistical properties of the Lomb-Scargle periodogram independently of the characteristics of the noise and of the sampling. With this formalism it has become possible to develop a test for the presence of components of interest in a signal in more general situations than those considered in the current literature (e.g. when the noise is colored and/or nonstationary). Moreover, we were able to clarify the relationship between the Lomb-Scargle periodogram and other techniques (e.g. the least-squares fit of sinusoids) and to fix the conditions under which the use of such a method can be expected to be effective.
References
 Biggs, A. D., Browne, I. W. A., Helbig, P., et al. 1999, MNRAS, 304, 349
 Björck, A. 1996, Numerical Methods for Least Squares Problems (Philadelphia: SIAM)
 Ferraz-Mello, S. 1981, AJ, 86, 619
 Gilliland, R. L., & Baliunas, S. L. 1987, ApJ, 314, 766
 Hamming, R. W. 1973, Numerical Methods for Scientists and Engineers (New York: Dover)
 Koen, C. 2006, MNRAS, 371, 1390
 Lomb, N. R. 1976, Ap&SS, 39, 447
 Oppenheim, A. V., & Schafer, R. W. 1989, Discrete-Time Signal Processing (London: Prentice Hall)
 Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes (Cambridge: Cambridge University Press)
 Reegen, P. 2007, A&A, 467, 1353
 Scargle, J. D. 1982, ApJ, 263, 835
 Simon, M. K. 2006, Probability Distributions Involving Gaussian Random Variables (Heidelberg: Springer)
 Vio, R., Strohmer, T., & Wamsteker, W. 2000, PASP, 112, 74
 Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577
Footnotes
 In the following, the element in the nth row and mth column of an N × M matrix will be indicated with A_{nm}.
 From now on, if is an N × 1 column array, then is a column array that contains the first entries of . Similarly, if is an N × M matrix, then is a matrix that contains the first rows of .
 The decorrelation time of a random signal n(t) is the time interval beyond which two values n(t_{1}) and n(t_{2}) can be considered as uncorrelated.
 χ^{2}_{2} denotes the chi-square distribution with two degrees of freedom.
 This is because, if N > M, the rank of the N × N matrix is smaller than or equal to M. This implies that the array has at most M degrees of freedom. Since each entry of is given by the sum of two entries of , a periodogram has at most M/2 degrees of freedom.
 The diagonal elements and of can be trivially computed through the solution of the quadratic equation λ_{l}^{2} - tr λ_{l} + det = 0, l=1,2, with tr and det denoting the trace and the determinant of the matrix. The arrays and , which constitute the columns of , can be obtained by solving the equations , l=1,2.
All Figures
Figure 1: Results concerning the numerical experiment, presented in Sect. 4, on the detection of a sinusoidal component in colored noise. The top left panel shows an observed time series (blue crosses) obtained through the simulation of signal with , f=0.127, on a regular grid of 120 time instants but with the data in the ranges [31 70] and [76 115] missing. Here, (red line) is the realization of a discrete, zero-mean, colored-noise process whose autocovariance function is given in the top right panel. For comparison, the sinusoidal component is also plotted (green line). The bottom left panel shows the Lomb-Scargle periodogram of the original sequence computed on 120 frequencies , whereas the bottom right panel shows the Lomb-Scargle periodogram corresponding to its whitened version. In both cases, only the first 60 frequencies are shown and the vertical red line corresponds to the frequency of the sinusoidal component. The horizontal green line in the bottom panels provides the threshold corresponding to a 0.01 false-alarm level (number of independent frequencies N_{f}=60), i.e. the probability that the periodogram of a pure noise signal exceeds this threshold by chance is 0.01. 

Figure 2: Covariance matrix of the real and the imaginary parts, say and , of the Fourier transform of an unevenly sampled white-noise signal (i.e. matrix as given by Eq. (25)). In particular, is assumed to be the realization of a zero-mean, unit-variance, white-noise process sampled at 120 regularly spaced time instants but with 80 missing data. Three different cases have been considered, with missing data in the following ranges: a) [31 110] (top left panel); b) [31 70] and [76 115] (top right panel); c) [6 25], [36 75] and [96 115] (bottom left panel). In a fourth case, the missing data are randomly distributed (bottom right panel). Because of the small size of the figure, it is necessary to stress that, for each panel, none of the prominent diagonal structures visible in the top right as well as in the bottom left quadrant corresponds to the main diagonal of the quadrant itself (i.e. none of them provides the covariance between and that is seen in Fig. 3). 

Figure 3: Variance of and covariance of with as a function of the frequency k for the experiments in Fig. 2. The blue line corresponds to the main diagonal of the top left quadrant in each panel of Fig. 2, whereas the red one corresponds to the main diagonal in the top right as well as in the bottom left quadrant. 

Figure 4: Real (blue line) and imaginary (red line) parts of the spectral windows that have been used to simulate the irregular sampling of the signal in the experiments corresponding to Fig. 2 (i.e., is the Fourier transform of the sampling pattern of , see Eqs. (37), (38)). 

Figure 5: Cross-correlation between the real and the imaginary parts of the spectral windows shown in Fig. 4. The red cross in each panel corresponds to the point (0,0). 

Figure 6: Covariance matrix of the real and the imaginary parts, say and , of the Fourier transform of an unevenly sampled white-noise signal with periodic gaps (i.e. matrix as given by Eq. (25)). In particular, is the realization of a discrete, zero-mean, unit-variance, white-noise process sampled on a grid of 600 time instants randomly distributed (i.e. not rebinned) according to the cyclic sampling pattern shown in the bottom panel of Fig. 7. In total, 1200 frequencies have been considered. 

Figure 7: Top panel: variance of and covariance of with (only the first 100 frequencies are shown) for the experiment concerning the realization of a discrete, zero-mean, unit-variance, white-noise process on 600 time instants randomly distributed (i.e. not rebinned) according to a cyclic sampling. In total, 1200 frequencies have been considered (see text in Sect. 6). Bottom panel: cyclic sampling used in the experiment. 

Copyright ESO 2010