Issue |
A&A
Volume 635, March 2020
|
|
---|---|---|
Article Number | A83 | |
Number of page(s) | 8 | |
Section | Numerical methods and codes | |
DOI | https://doi.org/10.1051/0004-6361/201936905 | |
Published online | 11 March 2020 |
Efficient modeling of correlated noise
I. Statistical significance of periodogram peaks
Département d’Astronomie, Université de Genève, 51 Chemin des Maillettes, 1290 Versoix, Switzerland
e-mail: jean-baptiste.delisle@unige.ch
Received:
11
October
2019
Accepted:
18
January
2020
Periodograms are common tools used to search for periodic signals in unevenly spaced time series. The significance of periodogram peaks is often assessed using false alarm probability (FAP), which in most studies assumes uncorrelated noise and is computed using numerical methods such as bootstrapping or Monte Carlo. These methods have a high computational cost, especially for low FAP levels, which are of most interest. We present an analytical estimate of the FAP of the periodogram in the presence of correlated noise, which is fundamental to analyze astronomical time series correctly. The analytical estimate that we derive provides a very good approximation of the FAP at a much lower cost than numerical methods. We validate our analytical approach by comparing it with Monte Carlo simulations. Finally, we discuss the sensitivity of the method to different assumptions in the modeling of the noise.
Key words: methods: data analysis / methods: statistical / methods: analytical / planets and satellites: general
© ESO 2020
1. Introduction
Detecting periodic signals in unevenly spaced time series is a common problem in astronomy, which is encountered, for instance, when searching for binaries or exoplanet companions in radial velocity, astrometric, or photometric time series. The Lomb-Scargle (LS) periodogram (Lomb 1976; Scargle 1982) is a classical and efficient tool to search for sinusoidal signals. The principle of the LS periodogram is to scan a wide range of frequencies and to compare a linear sinusoidal model at a given frequency with a constant model, called the base model. A widespread variant of the LS periodogram, called the generalized LS periodogram (GLS), was proposed by Zechmeister & Kürster (2009) where the constant is adjusted for at each frequency (see also Ferraz-Mello 1981). Once the periodogram is computed, the false alarm probability (FAP) criterion is often used to determine whether or not it supports the detection of a periodic signal. The FAP is often estimated using numerical methods such as bootstrapping or Monte Carlo. These methods can be computationally intensive, especially at low FAP levels. Indeed, estimating a FAP level of P numerically requires at least the computation of 10/P periodograms.
Several analytical formula have been proposed for the FAP (Scargle 1982; Horne & Baliunas 1986), but have been subsequently contested (Koen 1990). The periodogram framework was generalized in a series of works to handle more complex models (Baluev 2008, 2009, 2013a,b, 2015), where rigorous and sharp analytical approximations of the FAP were provided based on the so-called Rice formula and previous works by Davies (1977, 1987, 2002). These works allow a fast and rigorous estimation of the periodogram FAP in the presence of white noise. However, the white noise assumption is often incorrect for astronomical time series. Indeed, several sources of correlated noise, such as the astronomical source itself, Earth’s atmosphere, or instrumental systematics, could contaminate the measurements.
For instance, stellar variability has a huge impact on the detection of low-mass exoplanets using high-precision radial velocity time series. Indeed, at a precision better than 1 m s−1, stellar variability affects the measurements on timescales ranging from a few minutes (for stellar oscillation p-modes), to hours and days (for stellar granulation and super-granulation), and even up to the star rotation period (for the effect of spots and plages). In this context, low-mass exoplanet detection becomes a challenge (e.g., Queloz et al. 2001) and proper tools have to be developed to treat correlated noise properly and to compute reliable periodograms and FAPs. Sulis et al. (2016) provide an analytical FAP estimate for periodograms normalized by the power density spectrum of the noise, in the limit of low aliasing and in the case of evenly sampled data. This is, however, not the case of most radial velocity datasets. Baluev (2013c) provides a “suggestive generalization” to the correlated noise case of the FAP formula obtained in Baluev (2008), but advocates against the use of this formula since it has not been proved rigorously.
In this article, we extend the work of Baluev (2008) to account for correlated noise. In Sect. 2 we define a general periodogram for an arbitrary covariance of the noise and provide an analytical approximation of the corresponding FAP, which we validate against Monte Carlo simulations. In Sect. 3 we provide a method to explore the sensitivity of the periodogram to the noise model. We discuss our results in Sect. 4.
2. Significance of periodogram peaks with correlated noise
In this section, we present a method to assess the significance of periodogram peaks (FAP) in the correlated noise case. We first give a general definition of the periodogram in Sect. 2.1. We then provide formulas to compute the corresponding FAP in Sect. 2.2. Finally, we compare this analytical FAP with the results of Monte Carlo simulations in Sect. 2.3.
2.1. General linear periodogram
We extend the general definition of least squares periodograms by Baluev (2008) to the correlated noise case. Following Baluev (2008), we compare the χ2 of the residuals of a linear base model ℋ of p parameters with enlarged linear models 𝒦 of p + d parameters, parameterized by the frequency ν. The base model ℋ is written as
where θℋ is the vector of size p of the model parameters, φℋ is a n × p matrix, and n the number of points in the time series. The columns of φH are thus explanatory time series that are scaled by the linear parameters θℋ. For instance, in the case of a radial velocity time series with two different instruments and a linear drift, the linear base model could be chosen as
where γ1 and γ2 are the velocity offsets of both instruments and α is the slope of the linear drift. The function δ1(t) (respectively, δ2(t)) is equal to one for the points taken by instrument 1 (respectively, instrument 2) and zero otherwise. The matrix φℋ would thus be a n × 3 matrix whose columns would be the three explanatory time series φℋ = (δ1(t),δ2(t),(t − epoch)) and the vector of parameters would be θℋ = (γ1, γ2, α).
The enlarged model 𝒦(ν) is written as
where θ𝒦 = (θℋ, θ) is the vector of size p + d of the parameters and φ𝒦(ν) = (φH, φ(ν)) is a n × (p + d) matrix whose p first columns are those of φℋ, and whose d last columns are functions of the frequency ν. Typically, d = 2, and the two additional columns are cos(νt) and sin(νt), but the theory developed by Davies (1977, 1987, 2002) and Baluev (2008) is more general.
We denote by and
the χ2 of the residuals after a linear least squares fit with a covariance matrix C of the models ℋ and 𝒦(ν), respectively. Baluev (2008) assumed the noise to be independent (diagonal covariance matrix) and Gaussian and that the uncertainties of the measurements are known precisely (at least within a common factor). In this generalization, we do not assume the noise to be independent anymore, but we still assume the noise to be Gaussian with a known covariance matrix C (at least within a common factor). The covariance matrix C accounts for all sources of correlated and uncorrelated noise, such as intrinsic noise from the source or subsequent contamination by the Earth’s atmosphere or by the instrument.
In the general case, the periodogram is a function z(ν) = f(,
). A general linear periodogram z is thus defined by the models ℋ and 𝒦(ν) and the function f. In the following, we consider the four definitions of the periodogram proposed by Baluev (2008):
where nℋ = n − p and n𝒦 = n − (p + d).
The widespread GLS periodogram (see Ferraz-Mello 1981; Zechmeister & Kürster 2009) is very close to the definition z1 of the periodogram. Indeed, we have
and all the results obtained for z1 are also valid for the GLS.
Once the periodogram is computed, it is useful to compute the p-value of the highest peak, or FAP, defined as Pr{maxνz(ν) ≥ Z | ℋ}, where Z is the value of the maximum peak of the periodogram computed on the data.
2.2. False alarm probability for periodograms with correlated noise
In this section, we provide analytical approximations of the FAP for the definitions of the periodogram of Eq. (4). Their precise derivation is provided in Appendix A.
The model ℋ is defined as in Eq. (1), where the n × p matrix φℋ is user defined; for instance, it might include offsets and drifts. The model 𝒦 (Eq. (3)) is the horizontal concatenation of φℋ and the two column vectors cos νt and sinνt (φ𝒦(ν) = (φℋ, cos νt, sinνt)).
The periodogram is computed in the range of frequencies ]0, νmax]. The FAP is approximated by (see Baluev 2008)
where analytical expressions for FAPsingle and τ(Z, νmax) are given in Table 1. These expressions depend on the rescaled frequency bandwidth W defined as
where Teff is the effective time series length, which we approximate by (see Appendix A.2)
The n × n matrices Σ, Δ, and Π are defined as
and for two n × n matrices X and Y, X*Y is the Hadamard (or element-wise) product
and ⟨X⟩ is defined as
False alarm probability for different definitions (see Eq. (4)) of the periodogram power by Baluev (2008) in the case d = 2.
The expression of the effective time series length found by Baluev (2008) in the white noise case can be derived from Eq. (8). Indeed in this case (diagonal covariance matrix C), Eq. (8) is simplified as
where , and
are weighted means with weights
.
In Appendix A.2, we additionally provide approximations of Teff in the low and high frequency limit, that is, νmaxΔ ≪ 1 (Eq. (A.24)) and νmaxΔ ≫ 1 (Eq. (A.25)).
2.3. Comparison of analytical FAP with Monte Carlo simulations
We validated our analytical estimate of the FAP (Eqs. (6)–(8), Table 1) by comparing it with Monte Carlo simulations. The Monte Carlo simulations are performed by generating a large set of random time series following the same distribution (same covariance matrix). We used the times of observation and error bars of the HARPS radial velocities of HD 136352 (Udry et al. 2019) to obtain a realistic temporal sampling and realistic covariance matrices. The HARPS radial velocities of HD 136352 consist of 648 points taken over almost 11 years (2004–2015) and spread over 238 distinct nights (about 2.7 points per night).
Our method is general and does not require a particular shape for the covariance matrix. However, for illustration purposes, we assume the covariance matrix to follow
and vary the values of the parameters (σjit., σexp., τexp.) to define four different noise models:
-
obs. (white noise): a diagonal matrix with observational error bars;
-
jit. (white noise): a diagonal matrix with observational error bars plus a jitter of 1 m s−1;
-
daily exp. (correlated noise): observational error bars on the diagonal, plus an exponential decay of 1 m s−1 with a timescale of 1 d;
-
monthly exp. (correlated noise): the same as daily exp. but with a timescale of 30 d.
The values of the parameters (σjit., σexp., τexp.) used for each noise model are summarized in Table 2. In the context of radial velocity time series, jitter terms might model both intrinsic noise from the star and instrumental noise, while exponential decay terms are often used to account for stellar noise (e.g., granulation and oscillation).
For a given covariance matrix C, we generated a synthetic radial velocity time series by randomly sampling from a normal distribution with covariance matrix C. We generate 106 such random time series and compute a periodogram (with the correct covariance matrix) for each time series. The periodograms are computed in the range where Pmin = 0.9 d, and with an instrumental offset γ adjusted for each frequency (p = 1, nℋ = n − 1, n𝒦 = n − 3). Then, the distribution of the maximum of these periodograms allows us to estimate numerically the FAP.
The comparison between the analytical and numerical FAP is shown in Fig. 1. As explained by Baluev (2008), the analytical formula of the FAP is an upper bound that asymptotically (for low FAP levels) converges to the exact FAP. This is indeed what we observe in Fig. 1. For all the covariance matrices, the analytical and numerical estimates agree very well for FAP ≲ 0.1, and the analytical formula overestimates the FAP for FAP ≳ 0.1
![]() |
Fig. 1. Comparison between analytical and numerical estimations of the FAP for four types of covariance matrix (see Sect. 2.3) and using the HARPS time series of HD 136352. The periodogram power is computed following the definition z1 of Eq. (4). The expectation of z1 is one (see Eq. (B.10)). |
Since we used 106 samples for the numerical estimation of the FAP, we could not reliably explore FAP levels below 10−4 owing to small numbers statistics. However, the analytical and numerical estimates agree very well down to 10−4, and the analytical approximation is expected to be even more accurate for lower FAP levels.
To sum up, the analytical estimate provides a very good approximation of the FAP in the range of most interest (FAP ≲ 0.1), and is conservative for higher FAP levels. Moreover, this analytical estimate is much faster to compute than Monte Carlo simulations (or other numerical methods), especially for low FAP levels. These properties make it very convenient to use in practical applications.
3. Sensitivity of periodogram to noise model
The FAP formula obtained in Sect. 2.2 provides an efficient and robust way to assess the significance of periodogram peaks when the covariance matrix is known. However, this is not the case in general, and we can often only make educated guesses about the shape of the covariance matrix. It is therefore necessary to explore the sensitivity of the periodogram to the noise model. To do so, we computed the expectation of the periodogram computed with an incorrect noise model. The analytical derivation of the periodogram expectation is described in Appendix B. In this section, we illustrate its use by exploring the effect of an incorrect noise model on the periodogram and its associated FAP.
We assume that the actual covariance matrix of the noise is C, while the periodogram is computed with an incorrect covariance matrix V. We used the same dataset (HARPS radial velocities of HD 136352) and the same noise models as in Sect. 2.3. We first chose a noise model among the four models of Sect. 2.3 (obs., jit., daily exp., and monthly exp.), which we considered the correct noise model (covariance matrix C). We then chose the incorrect noise model (covariance matrix V) among the three other models. The study is done with the definition z1 of the periodogram (see Eq. (4)). We first computed the periodogram expectation following the analytical expression of Eq. (B.9) and we then estimated the FAP corresponding to the highest peak of this mean periodogram using the analytical formula of Eq. (6) with the incorrect covariance matrix (V).
The results are shown in Fig. 2 for all pairs of noise models. We observe in Fig. 2 that adding a jitter term (jit.) does not affect the results much compared to the model with the observational error bars alone (obs.), and vice versa. This is not surprising since all the data points were taken with the same instruments (HARPS), and thus have very similar error bars. Therefore, adding a common jitter term to all error bars does not much affect the relative weight of each measurement. Moreover, the definition z1 of the periodogram (Eq. (4)) is not sensitive to the multiplication of all error bars by a common factor.
![]() |
Fig. 2. Periodogram expectation for HD 136352 in the case of a wrong noise model (see Eq. (B.9)). Rows correspond to the true noise model, while columns correspond to the assumed wrong model. The definition z1 of the periodogram power is used (see Eq. (4)). If the noise model was correct, the expectation of the periodogram power would be uniformly 1 (see Eq. (B.10)). The red vertical lines highlight 0.5 d, 1 d, and 1 yr. For each periodogram, we provide the analytical estimate of the FAP (using the wrong noise model) corresponding to the highest peak of the periodogram expectation. |
Using a correlated noise model (daily or monthly exp.) while the true noise is uncorrelated (obs. or jit.) remains conservative on the whole range of frequencies (𝔼(z1(ν)) ≲ 1 for all ν). However, the periodogram level is very low at long periods, which means that the detection capability at long periods is strongly reduced. On the other hand, using an uncorrelated noise model (obs. or jit.) while the true noise model is correlated (daily or monthly exp.) can lead to spurious detections with very low FAP levels (down to 3.22 × 10−6, see Fig. 2). Underestimating the correlation timescale (using daily instead of monthly exp.) has a similar (but weaker) effect as using an uncorrelated model instead of a correlated model. Finally overestimating the correlation timescale (using monthly instead of daily exp.) reduces the capability to detect long periods, and might lead to spurious detections of short periods. In the two latter cases (underestimation and overestimation of the correlation timescale), the FAP remains very high (close to 1, i.e., non-significant detection).
Overall these results are not surprising but illustrate the possibility to investigate the sensitivity of the periodogram and its FAP with respect to the noise model using the formula for the periodogram expectation (Eq. (B.9)).
4. Conclusions
We present a generalization of the analytical estimate of Baluev (2008) (which was restricted to the white noise case) to the correlated noise case (see Sect. 2.2). We show that the “suggestive generalization” of Baluev (2013c) is valid in the low frequency limit (see Eq. (A.24)), but we find a more general expression (Eq. (8)) that is valid for all frequencies. We validate our analytical estimate against Monte Carlo simulations (see Sect. 2.3) and show that this analytical criterion is very efficient and accurate, provided that the covariance matrix of the noise is known (at least within a common factor).
In most cases, however, astronomical time series are contaminated by sources of correlated noise that are difficult to characterize fully, which results in an approximate modeling of the covariance matrix. We illustrate the sensitivity of the periodogram to the noise model, by deriving the expectation of a periodogram computed with an incorrect covariance matrix (see Sect. 3). This method allows us to visualize which parts of the periodogram are the most affected by a change in the noise model. For instance, we observe, as expected, that overestimating the correlation timescale of the noise tends to reduce the detection capability at long periods strongly, while underestimating this timescale can lead to spurious detections. Another way to visualize the sensitivity of the periodogram to the noise model would be to compute several periodograms on the same data with various noise models. Both approaches are complementary to better understand the features observed in the periodograms.
Several methods can be used to obtain a more realistic covariance matrix. First, a likelihood maximization can be performed to adjust some noise model parameters. This maximization can be performed once, before computing the least squares periodogram (with a fixed noise model). It can also be performed for each frequency by computing a likelihood periodogram instead of a least squares periodogram. In the white noise case, Baluev (2009) proposed a FAP estimate for a likelihood periodogram with a free error term (jitter) added in quadrature to the nominal error bars and adjusted for at each frequency. A generalization to the correlated noise case could probably also be achieved for this likelihood periodogram, but is beyond the scope of this article. Finally, a Bayesian approach could be used to compare different models (signal + noise), but with a much higher computational cost.
Acknowledgments
We thank the anonymous referee for his/her useful comments. We acknowledge financial support from the Swiss National Science Foundation (SNSF). This work has, in part, been carried out within the framework of the National Centre for Competence in Research PlanetS supported by SNSF.
References
- Baluev, R. V. 2008, MNRAS, 385, 1279 [NASA ADS] [CrossRef] [Google Scholar]
- Baluev, R. V. 2009, MNRAS, 393, 969 [NASA ADS] [CrossRef] [Google Scholar]
- Baluev, R. V. 2013a, MNRAS, 436, 807 [NASA ADS] [CrossRef] [Google Scholar]
- Baluev, R. V. 2013b, MNRAS, 431, 1167 [NASA ADS] [CrossRef] [Google Scholar]
- Baluev, R. V. 2013c, Astron. Comput., 2, 18 [NASA ADS] [CrossRef] [Google Scholar]
- Baluev, R. V. 2015, MNRAS, 446, 1478 [NASA ADS] [CrossRef] [Google Scholar]
- Davies, R. B. 1977, Biometrika, 64, 247 [CrossRef] [MathSciNet] [Google Scholar]
- Davies, R. B. 1987, Biometrika, 74, 33 [Google Scholar]
- Davies, R. B. 2002, Biometrika, 89, 484 [CrossRef] [Google Scholar]
- Ferraz-Mello, S. 1981, AJ, 86, 619 [NASA ADS] [CrossRef] [Google Scholar]
- Horne, J. H., & Baliunas, S. L. 1986, ApJ, 302, 757 [NASA ADS] [CrossRef] [Google Scholar]
- Koen, C. 1990, ApJ, 348, 700 [NASA ADS] [CrossRef] [Google Scholar]
- Lomb, N. R. 1976, Ap&SS, 39, 447 [NASA ADS] [CrossRef] [Google Scholar]
- Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Scargle, J. D. 1982, ApJ, 263, 835 [NASA ADS] [CrossRef] [Google Scholar]
- Sulis, S., Mary, D., & Bigot, L. 2016, ArXiv e-prints [arXiv:1601.07375] [Google Scholar]
- Udry, S., Dumusque, X., Lovis, C., et al. 2019, A&A, 622, A37 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
- Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Appendix A: Computation of the FAP in the correlated noise case
In this appendix, we extend the method of Baluev (2008) to obtain analytical FAP estimates in the correlated noise case. The main idea allowing the analytical approximation of the FAP with correlated noise is to perform a change of coordinates that yields independent Gaussian random variables. Then, the method described by Baluev (2008) can be applied on these new variables. The change of variables is described in Appendix A.1 and the derivation of the FAP estimate in Appendix A.2.
A.1. Change of random variables
Let us assume that the covariance matrix C of the noise is known (at least within a common factor). Then, under the null hypothesis (i.e., assuming the base model is correct), the time series is written as
where θℋ, 0 is the true value of the parameters and the noise ϵ is Gaussian with zero mean and covariance C.
For a linear model φm and parameters θm (m = ℋ or 𝒦), the χ2 is defined as
The least squares estimate of the parameters is written as
and the minimum χ2 is thus
Let us now perform the following change of coordinates:
where C = LLT is the Cholesky decomposition of the covariance matrix. Since we assumed ϵ to follow a Gaussian law with zero mean and covariance C, η follows a Gaussian law with zero mean and covariance 𝟙. The random variables η are thus independent Gaussian variables. In these new variables, the χ2 is simply rewritten as
the least squares estimate of the parameters is rewritten as
and the minimum χ2 as
which follows a χ2 law with nm degrees of freedom (nℋ = n − p, n𝒦 = n − (p + d)). Therefore, the initial problem of analyzing a time series y with covariance matrix C, base model φℋ, and enlarged models φ𝒦 = (φH, φ(ν)) is equivalent to analyzing the time series z, with covariance matrix 𝟙, base model ψℋ, and enlarged model ψ𝒦 = (ψH, ψ(ν)). However, if φ(ν) was the sine and cosine at frequency ν, this is no longer the case for ψ(ν). Nevertheless, the theory of Baluev (2008) is very general, and does not assume a particular shape for this matrix, except for the final application to the least squares periodogram. We thus follow the method proposed by Baluev (2008), and only change the hypothesis on the shape of the enlarged model matrix.
A.2. Analytical FAP estimate
The FAP can be bounded by (see Baluev 2008, Eq. (5))
and approximated by (see Baluev 2008, Eq. (6))
where Z is the maximum periodogram power, FAPsingle(Z) is the FAP in the case in which the frequency ν of the putative additional signal is fixed, τ(Z, νmax) is the expectation of the number of up-crossings of the level Z by the periodogram (see Baluev 2008).
Computing FAPsingle(Z) and τ(Z, νmax) requires us to specify the definition of the periodogram z(ν). Baluev (2008) proposed several definitions and derived the corresponding formulas for FAPsingle(Z) and τ(Z, νmax). These results are summarized in Table 1 for d = 2. The general case (d not necessarily equal to 2 and other definitions of z(ν)) is provided in Baluev (2008), Appendix B.
For the definitions of the periodogram of Eq. (4) and assuming d = 2, the only quantity left to compute is the factor W, which is the rescaled frequency bandwidth, defined as (see Baluev 2008)
where
The 2 × 2 matrix M(ν) is defined as follows (with x′ = ∂x/∂ν):
Baluev (2008) also defined the effective time series length as
such that
From Eqs. (A.12) and (A.14), we obtain
where is the mean of x(ν) over the frequency range ]0, νmax]. As noted by Baluev (2008), the inequality in Eqs. (A.12) and (A.16) is very sharp in practical applications. In particular, it saturates (i.e., becomes an equality) when the eigenvalues of M(ν) are equal. This expression can be evaluated numerically by sampling the frequency over the interval ]0, νmax], and computing M(ν) according to Eq. (A.13) for each frequency ν. The cost of evaluating Teff is of the same order of magnitude as computing the periodogram itself. It is therefore much more efficient than performing Monte Carlo simulations. However, this cost is not negligible compared to the periodogram cost and analytical approximations might be useful.
We now specify the expression of Teff for φ = (cos(νt), sin(νt)). Replacing φ in the definitions of Q, S, and R, we find
where * denotes the Hadamard (or elementwise) product,
We follow Baluev (2008) and neglect aliasing effects. In this approximation all the terms containing sine or cosine of νΣ average out. The terms containing sinνΔ can also be neglected. Indeed, in the low frequency limit (νΔ ≪ 1), the terms in sinνΔ vanish, while in the high frequency limit (νΔ ≫ 1), the terms in sinνΔ average out. We thus obtain
where J is the antisymmetric matrix
As in Baluev (2008), we additionally assume that ψ(ν) is orthogonal to ψℋ for all ν. As a consequence, Q̃ = Q, S̃ = S, R̃ = R, and
Replacing Eq. (A.19) in Eq. (A.21), we find
The two eigenvalues of M are thus equal in this approximation, and we can approximate Teff with (see Eq. (A.16))
In the low frequency limit νmaxΔ ≪ 1, the cosines can all be replaced by 1, and we obtain
where u is the vector of size n filled with ones. This expression was proposed by Baluev (2013c) as a “suggestive generalization” of the results of Baluev (2008) to correlated noise. However, Baluev (2013c) advocates against its use since it was not proved rigorously. Moreover, this expression is not valid in the general case but only in the low frequency limit.
In the high frequency limit νmaxΔ ≫ 1, the cosines average out in Eq. (A.23), except on the diagonal (where they are equal to 1), and we find
where w = diag(C−1).
Finally, for any frequency νmax, we can approximate (at first order) the integral of Eq. (A.23) by replacing each sum ⟨X(ν)⟩ by its average over the range ]0, νmax]. We have
and thus
Equations (A.24) and (A.25) can also be derived directly from Eq. (A.27) in the low frequency and high frequency approximations. Moreover, as explained in Sect. 2.2, the expression found by Baluev (2008) in the white noise case can also be derived from Eq. (A.27) by using the fact that the covariance matrix is diagonal.
In practical applications, all estimations of Teff (numerical evaluation of Eqs. (A.16) or (A.23), or directly using Eqs. (A.24), (A.25), and (A.27)) yield similar results. Moreover, as noted by Baluev (2008) in the case of independent Gaussian noise, Teff is often of the same order of magnitude as the total time span of the time series (Tspan = max(t)−min(t)).
Appendix B: Periodogram expectation
In this appendix, we show how to obtain an analytical estimate of the expectation of a periodogram computed with an incorrect noise model. We assume that the actual covariance matrix C of the noise is not known and that the χ2 and periodograms are computed using an incorrect covariance matrix V. Under the null hypothesis (model ℋ), the time series still follows Eq. (A.1) but the χ2 becomes
The least squares estimate of the parameters becomes
and the minimum χ2 is thus
We introduce a change of coordinates that is slightly different from Appendix A.1 (Eq. (A.5)), i.e.,
where C = LLT and V = MMT are the Cholesky decompositions of the true and assumed covariance matrices, and η is a vector of independent, centered, and reduced Gaussian random variables. In these coordinates, the minimum χ2 (Eq. (B.3)) is rewritten as
where
is the projection matrix on the subspace of ℝn defined by the vectors of ζm. The expectation of the minimum χ2 with the wrong covariance matrix V is thus
where CV = NNT = M−1CM-T. In the case V = C, we have CV = 𝟙, and we deduce
B.1. First order formula
At first order, the expectation of the periodogram can be obtained by replacing (respectively,
) by its expectation μℋ (respectively, μ𝒦) in the definition of the periodogram (Eq. (4)). We find
In the case V = C (the actual covariance matrix is known), we deduce
for i = 0, …, 3, and for all frequencies ν. However, in the case V ≠ C (wrong noise model), the periodogram expectation can significantly depart from d/2 and depends on the frequency ν.
B.2. Higher order formulas
Higher order estimates can also be obtained by developing Eq. (4) in power series of and computing higher order momenta of
,
. We provide more details in the following, however, the first order formula already yields very accurate results, and we thus adopt it in our study.
We introduce the random variables (m = ℋ, 𝒦), which we assume to be small with respect to μm. We then develop the periodogram power of Eq. (4) in power series of Xm. For instance, at second order we obtain
The expectation of the periodogram is thus (at second order)
where μℋ, μ𝒦(ν) are computed according to Eq. (B.7), and
All Tables
False alarm probability for different definitions (see Eq. (4)) of the periodogram power by Baluev (2008) in the case d = 2.
All Figures
![]() |
Fig. 1. Comparison between analytical and numerical estimations of the FAP for four types of covariance matrix (see Sect. 2.3) and using the HARPS time series of HD 136352. The periodogram power is computed following the definition z1 of Eq. (4). The expectation of z1 is one (see Eq. (B.10)). |
In the text |
![]() |
Fig. 2. Periodogram expectation for HD 136352 in the case of a wrong noise model (see Eq. (B.9)). Rows correspond to the true noise model, while columns correspond to the assumed wrong model. The definition z1 of the periodogram power is used (see Eq. (4)). If the noise model was correct, the expectation of the periodogram power would be uniformly 1 (see Eq. (B.10)). The red vertical lines highlight 0.5 d, 1 d, and 1 yr. For each periodogram, we provide the analytical estimate of the FAP (using the wrong noise model) corresponding to the highest peak of the periodogram expectation. |
In the text |
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.