Efficient modeling of correlated noise

J.-B. Delisle; N. Unger; N. C. Hara; D. Ségransan

doi:10.1051/0004-6361/202141949


Volume 659 (March 2022)

A&A, 659 (2022) A182


Issue		A&A Volume 659, March 2022


Article Number		A182
Number of page(s)		15
Section		Numerical methods and codes
DOI		https://doi.org/10.1051/0004-6361/202141949
Published online		25 March 2022


III. Scalable methods for jointly modeling several observables’ time series with Gaussian processes

J.-B. Delisle, N. Unger, N. C. Hara⋆ and D. Ségransan

Département d’Astronomie, Université de Genève, Chemin Pegasi 51, 1290 Versoix, Switzerland
e-mail: jean-baptiste.delisle@unige.ch

Received: 3 August 2021
Accepted: 15 December 2021

Abstract

The radial velocity method is a very productive technique used to detect and confirm extrasolar planets. The most recent spectrographs, such as ESPRESSO or EXPRES, have the potential to detect Earth-like planets around Sun-like stars. However, stellar activity can induce radial velocity variations that dilute or even mimic the signature of a planet. A widely recognized method for disentangling these signals is to model the radial velocity time series, jointly with stellar activity indicators, using Gaussian processes and their derivatives. However, such modeling is prohibitive in terms of computational resources for large data sets, as the cost typically scales as the cube of the total number of measurements. Here, we present S+LEAF 2, a Gaussian process framework that can be used to jointly model several time series, with a computational cost that scales linearly with the data set size. This framework thus provides a state-of-the-art Gaussian process model, with tractable computations even for large data sets. We illustrate the power of this framework by reanalyzing the 246 HARPS radial velocity measurements of the nearby K2 dwarf HD 13808, together with two activity indicators. We reproduce the results of a previous analysis of these data, but at a greatly reduced computational cost (by more than two orders of magnitude). The gain would be even greater for larger data sets.

Key words: methods: data analysis / methods: statistical / methods: analytical / planets and satellites: general

⋆ NCCR CHEOPS fellow.

© ESO 2022

1. Introduction

It is common in astronomy to indirectly detect a physical event or the presence of a body by searching for its signature in a data set and, more specifically, in a time series. Astronomical time series are typically corrupted by photon noise, which is uncorrelated: the noise values at two distinct times are statistically independent. In that case, as more data are acquired, the searched-for signal should emerge more clearly. However, in many cases, the data are also corrupted by correlated noise emerging from other physical events, contamination from the Earth’s atmosphere, instrumental noise, etc. In some cases, the structure of this correlated noise can mimic the signal of interest, leading to spurious detections or to a poor estimation of the model parameters.

This situation is encountered in particular when searching for exoplanets in radial velocity (RV) data. The RV of a star is its velocity projected onto the line of sight, measured via the Doppler effect. A planetary companion induces a reflex motion of the star and thus a periodic pattern in the RV time series. The latest generation of spectrographs, such as ESPRESSO (Pepe et al. 2021) or EXPRES (Blackman et al. 2020), can reach an RV precision of the order of 10 cm s⁻¹ and has the potential to discover Earth-like planets around Sun-like stars. However, correlated noise of stellar origin complicates this task. The p-modes and granulation processes introduce correlated noise at different timescales (Dumusque et al. 2011, 2012). Furthermore, the random appearance of spots and faculae at the surface of the star, combined with the star’s rotation, introduces complex structure in the data, which might be difficult to disentangle from the signals of low-mass planets. At longer timescales (from hundreds to thousands of days), the stellar magnetic cycle also induces RV variations, as well as variations in activity indicators such as the flux in the calcium II H & K emission lines ($\log R'_{\rm HK}$; Noyes et al. 1984); see, for instance, Queloz et al. (2001). The RV signals induced by these multiple physical processes of stellar origin are collectively referred to as stellar activity. A common way to account for stellar activity is to model it with a Gaussian process (GP; Rasmussen & Williams 2006). GP regression makes it possible to model complex processes by parametrizing the covariance between the measurements instead of defining a deterministic model of the underlying physical processes. For a GP G(t) measured at times t_i and t_j, the values G(t_i) and G(t_j) are assumed to be jointly normally distributed, with covariance C_{i,j} = k(t_i, t_j), where k is the chosen parametrized kernel.
The GP is often assumed to be stationary, such that the kernel k only depends on the lag Δt = |t_i − t_j| between two measurements. A commonly used kernel to model stellar activity is (Aigrain et al. 2012; Haywood et al. 2014; Rajpaul et al. 2015):

$$k(\Delta t) = \sigma^2 \exp\left(-\frac{\Delta t^2}{2\rho^2} - \frac{\sin^2\left(\frac{\pi \Delta t}{P}\right)}{2\eta^2}\right),$$ (1)

where σ, P, ρ, and η are the hyperparameters of the GP which need to be adjusted. In the following, we refer to this kernel as the squared-exponential periodic (SEP) kernel.
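As a rough illustration, the SEP kernel of Eq. (1) can be evaluated on all pairwise lags and turned into a dense covariance matrix with NumPy. The function names and the hyperparameter values below are hypothetical choices for this sketch, not values from the paper's analysis.

```python
import numpy as np


def sep_kernel(dt, sigma, P, rho, eta):
    """Squared-exponential periodic (SEP) kernel of Eq. (1), evaluated at lag(s) dt."""
    dt = np.asarray(dt, dtype=float)
    return sigma**2 * np.exp(-dt**2 / (2.0 * rho**2)
                             - np.sin(np.pi * dt / P)**2 / (2.0 * eta**2))


def sep_covariance(t, sigma, P, rho, eta):
    """Dense n x n covariance matrix K_ij = k(t_i - t_j) for sampling times t."""
    t = np.asarray(t, dtype=float)
    return sep_kernel(t[:, None] - t[None, :], sigma, P, rho, eta)


# Hypothetical example (times in days, amplitude in m/s), for illustration only.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 200.0, 50))
K = sep_covariance(t, sigma=3.0, P=25.0, rho=40.0, eta=0.5)
```

Storing this matrix already costs 𝒪(n²) memory, and factorizing it costs 𝒪(n³) operations, which is the bottleneck discussed in this paper.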

Gaussian processes are known to be able to represent a wide range of signals. As such, when their hyperparameters are left free, they are prone to absorbing the signal of interest (planetary signal) along with the correlated noise (stellar activity). To avoid this drawback, Rajpaul et al. (2015) proposed a framework in which the RV time series is modeled jointly with activity indicators. Building on Aigrain et al. (2012), the authors assume that the activity-induced variations of the measurements depend linearly on an underlying Gaussian process G(t) and its derivative G′(t). The evolution of the RV and indicators is modeled as:

$$\begin{aligned} &\Delta \mathrm{RV} = V_c G(t) + V_r G'(t),\\ &\Delta \mathrm{BIS} = B_c G(t) + B_r G'(t),\\ &\Delta \log R'_{\rm HK} = L_c G(t), \end{aligned}$$ (2)

for some constants V_c, V_r, B_c, B_r, L_c. The GP hyperparameters are thus constrained by all three time series instead of the RV alone. This reduces the risk of the GP overfitting, that is, of absorbing planetary signals, since those signals are only present in the RV time series. This framework can be straightforwardly generalized to account for additional indicators, for the combination of several GPs with different amplitudes, or even for second-order derivatives of the GP (e.g., Jones et al. 2017).
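The structure of Eq. (2) can be sketched by stacking G and G′ into one Gaussian vector and applying the linear map defined by V_c, V_r, B_c, B_r, and L_c. For brevity, this sketch swaps in a plain squared-exponential kernel for G, whose derivative covariances have short closed forms; the SEP kernel of Eq. (1) would follow the same pattern with longer expressions. All names and numerical values are illustrative assumptions.

```python
import numpy as np


def se_blocks(t, sigma, rho):
    """Covariance blocks of G and G' for k(tau) = sigma^2 exp(-tau^2 / (2 rho^2)),
    with tau = t_i - t_j (squared-exponential kernel, used here for brevity)."""
    tau = t[:, None] - t[None, :]
    k = sigma**2 * np.exp(-tau**2 / (2.0 * rho**2))
    k_dG_G = -tau / rho**2 * k                       # cov(G'(t_i), G(t_j)) = dk/dt_i
    k_dG_dG = (1.0 / rho**2 - tau**2 / rho**4) * k   # cov(G'(t_i), G'(t_j))
    return k, k_dG_G, k_dG_dG


def joint_covariance(t, sigma, rho, Vc, Vr, Bc, Br, Lc):
    """3n x 3n covariance of (dRV, dBIS, dlogRHK) under the linear model of Eq. (2)."""
    n = len(t)
    k, k_dG_G, k_dG_dG = se_blocks(t, sigma, rho)
    # Covariance of the stacked vector (G(t), G'(t)); cov(G, G') = cov(G', G)^T.
    K_full = np.block([[k, k_dG_G.T], [k_dG_G, k_dG_dG]])
    I, Z = np.eye(n), np.zeros((n, n))
    # Each observable is a fixed linear combination of G and G' (Eq. (2)).
    M = np.block([[Vc * I, Vr * I],
                  [Bc * I, Br * I],
                  [Lc * I, Z]])
    return M @ K_full @ M.T


t = np.linspace(0.0, 30.0, 40)
C = joint_covariance(t, sigma=1.0, rho=5.0, Vc=2.0, Vr=1.5, Bc=0.5, Br=-1.0, Lc=0.8)
```

The resulting matrix has size 3n × 3n, so any dense treatment of it scales as 𝒪((3n)³).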

While this framework is very powerful for modeling stellar activity, it is challenging in terms of computational cost. Indeed, computing the likelihood (or χ²) of the model for a given set of hyperparameters requires solving a linear system involving the full covariance matrix of the measurements. For a time series of size n, the full covariance matrix, including the RV, BIS, and $\log R'_{\rm HK}$ measurements, has a size of 3n × 3n, and the cost of solving the system typically scales as 𝒪((3n)³). This becomes prohibitive in terms of computing time for large data sets, especially in the context of Bayesian methods (MCMC, nested sampling, etc.), which might require billions of likelihood evaluations.

In the context of a GP applied to a single time series, Ambikasaran (2015) and Foreman-Mackey (2018) (see also Rybicki & Press 1995) have shown that the so-called celerite kernel,

$$k(\Delta t) = \sum_{s < n_{\rm c}} \left(a_s \cos(\nu_s \Delta t) + b_s \sin(\nu_s \Delta t)\right) e^{-\lambda_s \Delta t},$$ (3)

where n_c is the number of components, and a_s, b_s, λ_s, and ν_s are the kernel hyperparameters, yields a covariance matrix that can be represented as a semiseparable matrix. As a consequence, the computational cost of evaluating the likelihood scales linearly with the number of points (𝒪(n)), which makes these methods applicable to large data sets. Delisle et al. (2020b) defined a more general class of covariance matrices with a similar linear scaling of the cost: the S+LEAF matrix, which is the sum of a semiseparable matrix and a LEAF matrix. The LEAF component, whose nonzero elements lie close to the diagonal, is particularly well suited to representing calibration noise (see Delisle et al. 2020b). Gordon et al. (2020) extended the celerite model to two-dimensional data sets. This applies, in particular, to several parallel time series (e.g., RV, BIS, and $\log R'_{\rm HK}$) with measurements taken at the same times. However, Gordon et al. (2020) do not discuss the treatment of the derivatives of the GP, and thus of models similar to the Rajpaul et al. (2015) model (Eq. (2)).
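A minimal sketch of the celerite kernel of Eq. (3), evaluated at |Δt| so that it is symmetric in its two time arguments. The check a_s λ_s ≥ |b_s ν_s| is a sufficient condition for positive semi-definiteness (it makes the power spectrum of each term non-negative); the parameter values are hypothetical.

```python
import numpy as np


def celerite_kernel(dt, a, b, lam, nu):
    """Celerite kernel of Eq. (3): sum over components s of
    (a_s cos(nu_s |dt|) + b_s sin(nu_s |dt|)) exp(-lam_s |dt|)."""
    dt = np.abs(np.asarray(dt, dtype=float))[..., None]   # extra axis for components
    terms = (a * np.cos(nu * dt) + b * np.sin(nu * dt)) * np.exp(-lam * dt)
    return terms.sum(axis=-1)


# Two hypothetical components.
a = np.array([1.0, 0.5])
b = np.array([0.1, -0.1])
lam = np.array([0.1, 1.0])
nu = np.array([0.5, 2.0])
assert np.all(a * lam >= np.abs(b * nu))   # sufficient condition for a valid kernel

t = np.sort(np.random.default_rng(1).uniform(0.0, 50.0, 60))
K = celerite_kernel(t[:, None] - t[None, :], a, b, lam, nu)
```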

In this study, we extend the celerite and S+LEAF models to several time series, each with its own calendar (sampling times), modeled as a linear combination of several GPs and their derivatives. This allows us to apply models similar to that of Rajpaul et al. (2015), but with a linear scaling of the cost of evaluating the likelihood. We call this new model S+LEAF 2, as it is a generalization of the S+LEAF model (Delisle et al. 2020b).

In Sect. 2, we recall the main properties of the celerite and S+LEAF models. We then show in Sect. 3 how to model the derivative of a celerite GP. In Sect. 4, we extend the model to the case of multiple time series. We illustrate the power of this framework by reanalyzing the HARPS RVs of the nearby K2 dwarf HD 13808 in Sect. 5. Finally, we discuss our methods and results in Sect. 6. An open-source reference implementation of our algorithms, as a C library with Python wrappers, is publicly available¹.

2. The celerite and S+LEAF models for homogeneous time series

We consider a time series of measurements (t_i, y_i) (i = 1, …, n), which can be modeled by a deterministic component, a GP component, and measurement noise. In the case of radial velocities, the deterministic component encompasses the reflex motion due to companions, the systemic velocity of the system, instrument offsets, and so on. The GP might be used to model physical mechanisms that are too poorly understood or constrained to be included in the deterministic part. This is typically the case for stellar activity at different timescales (oscillations, granulation, rotation, magnetic cycles, etc.). Finally, the noise component encompasses photon noise, calibration noise, and so on. The time series can thus be expressed as:

$$y_i = m(t_i) + G(t_i) + \epsilon_i,$$ (4)

where m is the deterministic part of the model, G is the GP, and ϵ is the noise. These three components of the model might depend on a set of parameters θ. Assuming the noise is also Gaussian (but not necessarily white), the log-likelihood of a given set of parameters θ is

$$\begin{aligned} \ln \mathcal{L}(\theta) &= \ln p(y \mid \theta)\\ &= -\frac{1}{2}\left(y - m_\theta\right)^{\mathrm{T}} C_\theta^{-1}\left(y - m_\theta\right) - \frac{1}{2}\ln\det\left(2\pi C_\theta\right). \end{aligned}$$ (5)

The matrix C (C_θ in Eq. (5)) is the total covariance matrix of the time series, which can be split as

$$C = K + \Sigma,$$ (6)

where K is the covariance of the GP G and Σ is the covariance of the noise

$$\begin{aligned} K_{i,j} &= \mathrm{cov}\left(G(t_i), G(t_j)\right) = k(t_i, t_j),\\ \Sigma_{i,j} &= \mathrm{cov}\left(\epsilon_i, \epsilon_j\right), \end{aligned}$$ (7)

with k the GP kernel function.
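For reference, Eq. (5) can be evaluated with a dense Cholesky factorization C = L Lᵀ, which is exactly the 𝒪(n³) approach that the methods discussed here avoid. This sketch uses only NumPy; the function name and the example covariance are hypothetical.

```python
import numpy as np


def dense_log_likelihood(y, m, C):
    """Gaussian log-likelihood of Eq. (5) with a dense covariance matrix C.
    The Cholesky factorization C = L L^T costs O(n^3) operations."""
    r = np.asarray(y, dtype=float) - np.asarray(m, dtype=float)   # residuals y - m_theta
    L = np.linalg.cholesky(C)
    z = np.linalg.solve(L, r)                  # z = L^{-1} r, so r^T C^{-1} r = z^T z
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # ln det C
    return -0.5 * (z @ z) - 0.5 * (log_det + r.size * np.log(2.0 * np.pi))


# Example with a hypothetical squared-exponential covariance plus white noise.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 100))
C = np.exp(-0.5 * (t[:, None] - t[None, :])**2) + 0.1 * np.eye(t.size)
y = rng.multivariate_normal(np.zeros(t.size), C)
print(dense_log_likelihood(y, np.zeros(t.size), C))
```

Here ln det(2πC) is expanded as n ln 2π + ln det C, and ln det C is read off the diagonal of the Cholesky factor.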

2.1. The celerite model

The celerite model proposed by Foreman-Mackey et al. (2017) allows for very efficient computation of this model, in particular of the likelihood and its gradient with respect to θ, in the case of white noise (diagonal covariance matrix Σ = diag(σ²)) and assuming that the kernel function k follows Eq. (3). In this case, the covariance matrix C is semiseparable (see Foreman-Mackey et al. 2017):

$$C = \mathrm{diag}\left(A + \sigma^2\right) + \mathrm{tril}\left(U V^{\mathrm{T}}\right) + \mathrm{triu}\left(V U^{\mathrm{T}}\right),$$ (8)

where diag(A) is the diagonal matrix built from the vector A of size n, tril and triu denote the strictly lower and strictly upper triangular parts of a matrix, and U and V are n × r matrices (with r = 2n_c the rank of U and V) defined as:

$$\begin{aligned} A_i &= \sum_{s < n_{\rm c}} a_s,\\ U_{i,s} &= e^{-\lambda_s t_i}\left(a_s \cos(\nu_s t_i) + b_s \sin(\nu_s t_i)\right),\\ U_{i,n_{\rm c}+s} &= e^{-\lambda_s t_i}\left(a_s \sin(\nu_s t_i) - b_s \cos(\nu_s t_i)\right),\\ V_{i,s} &= e^{\lambda_s t_i}\cos(\nu_s t_i),\\ V_{i,n_{\rm c}+s} &= e^{\lambda_s t_i}\sin(\nu_s t_i). \end{aligned}$$ (9)
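The generators of Eq. (9) are straightforward to vectorize. The sketch below builds A, U, and V for hypothetical hyperparameters and rebuilds the dense matrix of Eq. (8) only to check it; tril and triu are taken as strict triangular parts (NumPy offsets k=−1 and k=1), since the diagonal is carried by diag(A + σ²). Because V grows as e^{λ_s t_i}, times are best measured from the start of the time series.

```python
import numpy as np


def celerite_uv(t, a, b, lam, nu):
    """Semiseparable generators of Eq. (9) for a celerite kernel.
    Returns A (length n) and U, V (n x 2 n_c)."""
    t = np.asarray(t, dtype=float)[:, None]          # shape (n, 1)
    c, s = np.cos(nu * t), np.sin(nu * t)            # shape (n, n_c)
    em, ep = np.exp(-lam * t), np.exp(lam * t)
    U = np.hstack([em * (a * c + b * s), em * (a * s - b * c)])
    V = np.hstack([ep * c, ep * s])
    A = np.full(t.shape[0], a.sum())
    return A, U, V


def semiseparable_to_dense(A, U, V, sigma2=0.0):
    """Dense C = diag(A + sigma^2) + tril(U V^T) + triu(V U^T) of Eq. (8),
    with tril/triu keeping the strictly lower/upper triangular parts."""
    return np.diag(A + sigma2) + np.tril(U @ V.T, k=-1) + np.triu(V @ U.T, k=1)


# Hypothetical two-component kernel and white-noise variance.
a = np.array([1.0, 0.5]); b = np.array([0.1, -0.1])
lam = np.array([0.1, 1.0]); nu = np.array([0.5, 2.0])
t = np.sort(np.random.default_rng(2).uniform(0.0, 20.0, 30))
A, U, V = celerite_uv(t, a, b, lam, nu)
C = semiseparable_to_dense(A, U, V, sigma2=0.25)
```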

This low-rank representation of the covariance matrix allows us to use very efficient dedicated algorithms (Cholesky decomposition, solving, dot product, determinant; see Foreman-Mackey et al. 2017; Foreman-Mackey 2018), which are particularly useful for computing the likelihood. The memory footprint of the celerite model scales as 𝒪(nr), and the computational cost scales as 𝒪(nr²). For comparison, the memory footprint of a naive representation of the same covariance matrix scales as 𝒪(n²), and the computational cost typically scales as 𝒪(n³).
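To make the 𝒪(nr²) claim concrete, here is a minimal sketch of the generic semiseparable LDLᵀ recursion, C = L diag(D) Lᵀ with L = I + tril(U Wᵀ), used to evaluate Eq. (5) in a single pass over the data. It is our own transcription for illustration, not the celerite or S+LEAF code: it works directly with the unscaled U and V of Eq. (9), which can overflow for long baselines, and production implementations use a rescaled variant of this recursion for that reason. The hyperparameters and residuals are hypothetical.

```python
import numpy as np


def semiseparable_log_likelihood(r, A, U, V):
    """ln L of Eq. (5) for C = diag(A) + tril(U V^T) + triu(V U^T), in O(n r^2).

    Factorizes C = L diag(D) L^T with L = I + tril(U W^T) (strict lower part),
    while solving L z = r on the fly. Unscaled generators: U and V contain
    exp(-/+ lambda t), so long time baselines can overflow or lose precision.
    """
    n, rank = U.shape
    S = np.zeros((rank, rank))   # S_k = sum_{m<k} D_m W_m W_m^T
    f = np.zeros(rank)           # f_k = sum_{m<k} W_m z_m
    W_prev, D_prev, z_prev = np.zeros(rank), 0.0, 0.0
    quad, log_det = 0.0, 0.0
    for k in range(n):
        S += D_prev * np.outer(W_prev, W_prev)
        f += W_prev * z_prev
        D = A[k] - U[k] @ S @ U[k]      # pivot D_k
        W = (V[k] - S @ U[k]) / D       # row k of the factor generator W
        z = r[k] - U[k] @ f             # forward substitution for L z = r
        quad += z * z / D               # r^T C^{-1} r = sum_k z_k^2 / D_k
        log_det += np.log(D)            # ln det C = sum_k ln D_k
        W_prev, D_prev, z_prev = W, D, z
    return -0.5 * quad - 0.5 * (log_det + n * np.log(2.0 * np.pi))


# Example: one hypothetical celerite component (a*lam >= |b*nu|) plus white noise.
rng = np.random.default_rng(3)
t = np.sort(rng.uniform(0.0, 10.0, 200))
a, b, lam, nu = 1.0, 0.2, 0.5, 1.5
tt = t[:, None]
em, ep = np.exp(-lam * tt), np.exp(lam * tt)
U = np.hstack([em * (a * np.cos(nu * tt) + b * np.sin(nu * tt)),
               em * (a * np.sin(nu * tt) - b * np.cos(nu * tt))])
V = np.hstack([ep * np.cos(nu * tt), ep * np.sin(nu * tt)])
A = np.full(t.size, a + 0.1**2)        # diag(A + sigma^2) with sigma = 0.1
r = rng.normal(size=t.size)            # hypothetical residuals y - m
ll_fast = semiseparable_log_likelihood(r, A, U, V)
```

Each step only touches r × r quantities, so the loop costs 𝒪(nr²) and never forms an n × n matrix.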

The celerite algorithm is actually not restricted to kernels of the form of Eq. (3); it can be applied as long as the kernel admits a semiseparable representation following Eq. (8). In particular, the Matérn 3/2 and Matérn 5/2 kernels admit semiseparable representations of rank 2 and 3, respectively (see Appendix A.1).
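As an illustration, the rank-2 and rank-3 claims can be checked with one possible choice of generators, obtained by expanding the polynomial prefactor of each Matérn kernel in powers of t_i and t_j (a polynomial of degree p in Δt times e^{−cΔt} gives rank p + 1). These generators are our own derivation and may differ from the equivalent ones given in Appendix A.1; σ and ρ denote a hypothetical amplitude and length scale.

```python
import numpy as np


def matern32_uv(t, sigma, rho):
    """A rank-2 semiseparable representation of the Matern 3/2 kernel
    k(tau) = sigma^2 (1 + c tau) exp(-c tau), c = sqrt(3)/rho, tau = t_i - t_j >= 0."""
    c = np.sqrt(3.0) / rho
    t = np.asarray(t, dtype=float)
    U = sigma**2 * np.exp(-c * t)[:, None] * np.column_stack([1.0 + c * t, np.ones_like(t)])
    V = np.exp(c * t)[:, None] * np.column_stack([np.ones_like(t), -c * t])
    return np.full(t.size, sigma**2), U, V


def matern52_uv(t, sigma, rho):
    """A rank-3 representation of the Matern 5/2 kernel
    k(tau) = sigma^2 (1 + c tau + c^2 tau^2 / 3) exp(-c tau), c = sqrt(5)/rho."""
    c = np.sqrt(5.0) / rho
    t = np.asarray(t, dtype=float)
    U = sigma**2 * np.exp(-c * t)[:, None] * np.column_stack(
        [1.0 + c * t + c**2 * t**2 / 3.0,
         -c - 2.0 * c**2 * t / 3.0,
         np.full_like(t, c**2 / 3.0)])
    V = np.exp(c * t)[:, None] * np.column_stack([np.ones_like(t), t, t**2])
    return np.full(t.size, sigma**2), U, V
```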

2.2. The S+LEAF model

The S+LEAF model (Delisle et al. 2020b) extends the celerite model to the case of correlated measurement noise. While the celerite model assumes the covariance matrix Σ to be diagonal, the S+LEAF model allows us to account for a more general class of noise models, where Σ can be represented by a LEAF matrix. Such LEAF matrices are sparse matrices whose nonzero values are close to the diagonal (see Delisle et al. 2020b). This encompasses banded matrices, block-diagonal matrices, staircase matrices, and so on. In the context of radial velocities, LEAF matrices are especially useful to model calibration noise (see Delisle et al. 2020b). Indeed, to obtain precise RV measurements, the instrument must be calibrated periodically, typically once per night. Thus, several measurements taken with the same instrument during the same night share the same calibration noise, which introduces a block-diagonal contribution to the covariance matrix.

As in the case of the celerite model, the S+LEAF model allows for a sparse representation of the covariance matrix and for efficient dedicated algorithms to compute the likelihood and its gradient with respect to the parameters (Delisle et al. 2020b). The memory footprint of the S+LEAF model scales as $\mathcal{O}\left((r + \bar{b})n\right)$ and its computational cost as $\mathcal{O}\left((r^2 + r\bar{b} + \bar{b}^2)n\right)$, where $\bar{b}$ is the average band width of the LEAF component.
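A toy version of the nightly calibration noise described above: measurements sharing a night get a common error term, which yields a block-diagonal Σ. Grouping nights by ⌊t⌋ (times in days) is a simplifying assumption, as is the band-width definition used here (number of nonzero entries left of the diagonal in each row); the function names and noise levels are hypothetical.

```python
import numpy as np


def calibration_noise_cov(t, sig_phot, sig_cal, night_length=1.0):
    """Sigma = diag(sig_phot^2) + sig_cal^2 * [same night], for time-sorted t.
    Hypothetical model: points in the same night, floor(t / night_length),
    share one calibration error, giving a block-diagonal (LEAF-like) matrix."""
    night = np.floor(np.asarray(t, dtype=float) / night_length)
    same_night = night[:, None] == night[None, :]
    return np.diag(np.asarray(sig_phot, dtype=float)**2) + sig_cal**2 * same_night


def band_widths(M):
    """Number of nonzero entries strictly left of the diagonal, row by row."""
    return np.array([i - np.flatnonzero(M[i, :i + 1])[0] for i in range(M.shape[0])])


t = np.array([0.10, 0.15, 0.20, 1.12, 3.05, 3.08])    # hypothetical times (days)
Sigma = calibration_noise_cov(t, sig_phot=np.full(t.size, 0.5), sig_cal=0.3)
b = band_widths(Sigma)       # one block of 3, one single point, one block of 2
b_mean = b.mean()            # average band width entering the S+LEAF cost
```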

3. Derivative of a celerite/S+LEAF Gaussian process

We consider a GP G(t) whose kernel function k is stationary and admits a semiseparable decomposition (Eq. (8)). The covariance matrix of G is thus given by:

$$\mathrm{cov}\left(G(t_i), G(t_j)\right) = k(t_i - t_j) = U(t_i) V(t_j) = U_i V_j,$$ (10)

for t_i ≥ t_j, namely, for the lower triangular part. By symmetry, we have

$$\mathrm{cov}\left(G(t_i), G(t_j)\right) = k(t_j - t_i) = V_i U_j,$$ (11)

for t_i < t_j, namely, for the upper triangular part. Assuming G to be differentiable, the covariance between G and G′ is given by the partial derivatives of the kernel function:

$$\begin{aligned} \mathrm{cov}\left(G'(t_i), G(t_j)\right) &= \frac{\partial k(t_i, t_j)}{\partial t_i} = k'(t_i - t_j) = U'(t_i) V(t_j) = U'_i V_j,\\ \mathrm{cov}\left(G(t_i), G'(t_j)\right) &= \frac{\partial k(t_i, t_j)}{\partial t_j} = -k'(t_i - t_j) = U_i V'_j, \end{aligned}$$ (12)

where we assume t_i ≥ t_j, and primes denote differentiation with respect to time. For these two equations to be valid, the semiseparable representation (U, V) must satisfy:

$$U' V^{\mathrm{T}} = -U V'^{\mathrm{T}}.$$ (13)

This relation is a consequence of the stationarity of the kernel. Indeed, since

$$k(t, t + \Delta t) = k(\Delta t) = U(t) V(t + \Delta t)$$ (14)

does not depend on t, we have

$$\frac{\partial k(t, t + \Delta t)}{\partial t} = 0 = U'(t) V(t + \Delta t) + U(t) V'(t + \Delta t).$$ (15)

In addition to this stationarity condition, for the GP G to be differentiable, the kernel must satisfy k′(0) = 0 (e.g., Rasmussen & Williams 2006); thus:

$$U'_i V_i = U_i V'_i = 0.$$ (16)
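Independently of the closed forms given later in Eq. (23), Eqs. (15) and (16) can be checked numerically with finite differences of the generators U(t) and V(t) of Eq. (9). The two-component kernel below is hypothetical; its b_s are chosen as λ_s a_s / ν_s so that the differentiability condition of Eq. (26) holds term by term.

```python
import numpy as np

# Hypothetical two-component celerite kernel; b_s = lam_s a_s / nu_s makes every
# term of Eq. (26) vanish, so the GP is differentiable.
a = np.array([1.0, 0.4])
lam = np.array([0.3, 1.2])
nu = np.array([0.8, 2.0])
b = lam * a / nu


def U_of(t):
    """Row U(t) of Eq. (9): first the n_c 'cos' columns, then the n_c 'sin' columns."""
    e, c, s = np.exp(-lam * t), np.cos(nu * t), np.sin(nu * t)
    return np.concatenate([e * (a * c + b * s), e * (a * s - b * c)])


def V_of(t):
    """Row V(t) of Eq. (9)."""
    e, c, s = np.exp(lam * t), np.cos(nu * t), np.sin(nu * t)
    return np.concatenate([e * c, e * s])


def ddt(f, t, h=1e-6):
    """Central finite-difference derivative of a vector-valued function of t."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


t0, lag = 1.3, 0.7
# Eq. (15): the t-derivative of U(t) V(t + lag) vanishes for a stationary kernel.
stationarity = ddt(U_of, t0) @ V_of(t0 + lag) + U_of(t0) @ ddt(V_of, t0 + lag)
# Eq. (16): U'_i V_i = k'(0), which is zero here because Eq. (26) holds.
slope_at_zero = ddt(U_of, t0) @ V_of(t0)
```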

From Eq. (12), we deduce that the covariance matrix between G and G′ admits several equivalent antisymmetric semiseparable representations

$$\begin{aligned} \mathrm{cov}\left(G', G\right) &= \mathrm{tril}\left(U' V^{\mathrm{T}}\right) - \mathrm{triu}\left(V U'^{\mathrm{T}}\right)\\ &= -\mathrm{tril}\left(U V'^{\mathrm{T}}\right) + \mathrm{triu}\left(V' U^{\mathrm{T}}\right)\\ &= \mathrm{tril}\left(U' V^{\mathrm{T}}\right) + \mathrm{triu}\left(V' U^{\mathrm{T}}\right)\\ &= \ldots \end{aligned}$$ (17)

Similar representations can be deduced for cov(G,G′) by using the relation

$$\mathrm{cov}\left(G, G'\right) = \mathrm{cov}\left(G', G\right)^{\mathrm{T}} = -\mathrm{cov}\left(G', G\right).$$ (18)

For the covariance matrix of G′ itself, we have (for t_i ≥ t_j)

$$\begin{aligned} \mathrm{cov}\left(G'(t_i), G'(t_j)\right) &= \frac{\partial^2 k(t_i, t_j)}{\partial t_i \partial t_j} = -k''(t_i - t_j)\\ &= U'_i V'_j = -U''_i V_j = -U_i V''_j. \end{aligned}$$ (19)

Therefore, the covariance matrix of G′ also admits several symmetric semiseparable representations:

$$\begin{aligned} \mathrm{cov}\left(G', G'\right) &= \mathrm{diag}(B) + \mathrm{tril}\left(U' V'^{\mathrm{T}}\right) + \mathrm{triu}\left(V' U'^{\mathrm{T}}\right)\\ &= \mathrm{diag}(B) - \mathrm{tril}\left(U'' V^{\mathrm{T}}\right) - \mathrm{triu}\left(V U''^{\mathrm{T}}\right)\\ &= \mathrm{diag}(B) - \mathrm{tril}\left(U V''^{\mathrm{T}}\right) - \mathrm{triu}\left(V'' U^{\mathrm{T}}\right)\\ &= \ldots \end{aligned}$$ (20)

with

$$B_i = U'_i V'_i = -U''_i V_i = -U_i V''_i.$$ (21)

Hereinafter, we use the following representations:

$$\begin{aligned} \mathrm{cov}\left(G', G\right) &= \mathrm{tril}\left(U' V^{\mathrm{T}}\right) + \mathrm{triu}\left(V' U^{\mathrm{T}}\right),\\ \mathrm{cov}\left(G, G'\right) &= \mathrm{tril}\left(U V'^{\mathrm{T}}\right) + \mathrm{triu}\left(V U'^{\mathrm{T}}\right),\\ \mathrm{cov}\left(G', G'\right) &= \mathrm{diag}(B) + \mathrm{tril}\left(U' V'^{\mathrm{T}}\right) + \mathrm{triu}\left(V' U'^{\mathrm{T}}\right). \end{aligned}$$ (22)

Applying this reasoning to the celerite kernel of Eq. (3), we find (as per Eq. (9)):

$$\begin{aligned} k'(\Delta t) &= \sum_{s < n_{\rm c}} \left(a'_s \cos(\nu_s \Delta t) + b'_s \sin(\nu_s \Delta t)\right) e^{-\lambda_s \Delta t},\\ -k''(\Delta t) &= \sum_{s < n_{\rm c}} \left(a''_s \cos(\nu_s \Delta t) + b''_s \sin(\nu_s \Delta t)\right) e^{-\lambda_s \Delta t},\\ U'_{i,s} &= e^{-\lambda_s t_i}\left(a'_s \cos(\nu_s t_i) + b'_s \sin(\nu_s t_i)\right),\\ U'_{i,n_{\rm c}+s} &= e^{-\lambda_s t_i}\left(a'_s \sin(\nu_s t_i) - b'_s \cos(\nu_s t_i)\right),\\ V'_{i,s} &= e^{\lambda_s t_i}\left(\lambda_s \cos(\nu_s t_i) - \nu_s \sin(\nu_s t_i)\right),\\ V'_{i,n_{\rm c}+s} &= e^{\lambda_s t_i}\left(\lambda_s \sin(\nu_s t_i) + \nu_s \cos(\nu_s t_i)\right), \end{aligned}$$ (23)

with

$$\begin{aligned} a'_s &= \nu_s b_s - \lambda_s a_s,\\ b'_s &= -\nu_s a_s - \lambda_s b_s,\\ a''_s &= (\nu_s^2 - \lambda_s^2) a_s + 2\lambda_s \nu_s b_s,\\ b''_s &= (\nu_s^2 - \lambda_s^2) b_s - 2\lambda_s \nu_s a_s. \end{aligned}$$ (24)
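The coefficients of Eq. (24) can be cross-checked against finite differences of the kernel: k′ and −k″ should again be celerite kernels with the same λ_s and ν_s. A minimal sketch with hypothetical parameters, evaluated at positive lags to stay away from the origin:

```python
import numpy as np


def celerite_k(dt, a, b, lam, nu):
    """Celerite kernel of Eq. (3) for lags dt >= 0 (one value per lag)."""
    dt = np.asarray(dt, dtype=float)[..., None]
    return ((a * np.cos(nu * dt) + b * np.sin(nu * dt)) * np.exp(-lam * dt)).sum(-1)


def derivative_coeffs(a, b, lam, nu):
    """Coefficients of Eq. (24): k' uses (a', b'), and -k'' uses (a'', b'')."""
    a1 = nu * b - lam * a
    b1 = -nu * a - lam * b
    a2 = (nu**2 - lam**2) * a + 2.0 * lam * nu * b
    b2 = (nu**2 - lam**2) * b - 2.0 * lam * nu * a
    return a1, b1, a2, b2


a = np.array([1.0, 0.5]); b = np.array([0.1, -0.1])
lam = np.array([0.1, 1.0]); nu = np.array([0.5, 2.0])
a1, b1, a2, b2 = derivative_coeffs(a, b, lam, nu)

dt = np.linspace(0.1, 5.0, 20)
h = 1e-4
# Eq. (23): same (lam, nu), new amplitudes.
k1 = celerite_k(dt, a1, b1, lam, nu)
k2 = celerite_k(dt, a2, b2, lam, nu)
# Central finite differences of k for comparison.
k1_fd = (celerite_k(dt + h, a, b, lam, nu) - celerite_k(dt - h, a, b, lam, nu)) / (2 * h)
k2_fd = -(celerite_k(dt + h, a, b, lam, nu) - 2 * celerite_k(dt, a, b, lam, nu)
          + celerite_k(dt - h, a, b, lam, nu)) / h**2
```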

In this case, the differentiability condition of Eq. (16) can be rewritten as

$$k'(0) = 0 = \sum_{s < n_{\rm c}} a'_s.$$ (25)

Thus, the original kernel parameters must satisfy

$$\sum_{s < n_{\rm c}} \left(\nu_s b_s - \lambda_s a_s\right) = 0,$$ (26)

for the GP to be differentiable. In particular, in the case of a kernel including a single celerite component (n_c = 1), one must have $a'_s = \nu_s b_s - \lambda_s a_s = 0$, and Eq. (23) simplifies to

$$\begin{aligned} k'(\Delta t) &= b'_s \sin(\nu_s \Delta t)\, e^{-\lambda_s \Delta t},\\ U'_{i,s} &= b'_s e^{-\lambda_s t_i} \sin(\nu_s t_i),\\ U'_{i,n_{\rm c}+s} &= -b'_s e^{-\lambda_s t_i} \cos(\nu_s t_i). \end{aligned}$$ (27)

The SHO kernel, a particular type of celerite kernel that depends on only three free parameters, always satisfies this differentiability condition. It thus provides a smoother GP than the general celerite kernel, which is why Foreman-Mackey et al. (2017) recommended its use.
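The claim about the SHO kernel can be made explicit. Assuming the underdamped (Q > 1/2) parametrization of Foreman-Mackey et al. (2017) as we recall it, with η = (1 − 1/(4Q²))^{1/2}, a = S₀ω₀Q, b = a/(2ηQ), λ = ω₀/(2Q), and ν = ηω₀, one finds νb = λa = S₀ω₀²/2, so a′ = 0 and Eq. (26) holds for any (S₀, ω₀, Q). The sketch below checks this numerically; the parametrization should be verified against that paper before reuse, and the numerical values are hypothetical.

```python
import numpy as np


def sho_celerite_coeffs(S0, w0, Q):
    """Celerite coefficients (a, b, lam, nu) of the SHO kernel for Q > 1/2
    (assumed parametrization, see lead-in)."""
    if Q <= 0.5:
        raise ValueError("this sketch only covers the underdamped case Q > 1/2")
    eta = np.sqrt(1.0 - 1.0 / (4.0 * Q**2))
    a = S0 * w0 * Q
    b = a / (2.0 * eta * Q)
    lam = w0 / (2.0 * Q)
    nu = eta * w0
    return a, b, lam, nu


a, b, lam, nu = sho_celerite_coeffs(S0=2.0, w0=0.7, Q=3.0)
slope_at_zero = nu * b - lam * a   # a'_s of Eq. (24); Eq. (26) requires it to vanish
```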

In the case of the Matérn 3/2 and 5/2 kernels, similar semiseparable decompositions can be achieved for the derivatives (see Appendix A.2). More generally, Eq. (22) provides the semiseparable decomposition of the derivatives for any differentiable semiseparable kernel.

4. S+LEAF 2: Extending celerite/S+LEAF to heterogeneous time series

Following Rajpaul et al. (2015), we assume that the time series of the radial velocities and the different indicators follow

$$Y_{i,j} = f_i(T_{i,j}) + \sum_k \left(\alpha_{k,i} G_k(T_{i,j}) + \beta_{k,i} G'_k(T_{i,j})\right) + \epsilon_{i,j},$$ (28)

where (T_{1,·}, Y_{1,·}) is the RV time series and (T_{i,·}, Y_{i,·}) (i > 1) are the indicator time series, f_i is the deterministic part of the model for time series i, the G_k are independent GPs, and ϵ is the measurement noise (including photon noise, calibration noise, etc.).

The times and number of measurements need not be the same for all time series, which implies that T and Y are not necessarily matrices but collections of vectors of variable lengths. In the case of the model presented in Eq. (2), the activity indicators (BIS and $\log R'_{\rm HK}$) are typically extracted from the same spectra as the RVs and, thus, share the same sampling. However, activity indicators can be extracted from other instruments or techniques. For instance, Haywood et al. (2014) trained a GP on the CoRoT light curve of CoRoT-7 and then used it to model the impact of stellar activity on the HARPS radial velocity time series of the same star. While both data sets were roughly contemporaneous, they did not share the same sampling. In such a case, we could define (T_{1,·}, Y_{1,·}) as the RV time series and (T_{2,·}, Y_{2,·}) as the photometric time series (with T_{1,·} ≠ T_{2,·}), and model both time series jointly according to Eq. (28).

For the sake of readability, we consider in the following a single GP G, such that

$$Y_{i,j} = f_i(T_{i,j}) + \alpha_i G(T_{i,j}) + \beta_i G'(T_{i,j}) + \epsilon_{i,j},$$ (29)

but the reasoning holds in the more general case of Eq. (28). The covariance matrix corresponding to Eq. (29) is

$$\begin{aligned} \mathrm{cov}(Y_{i,j}, Y_{l,m}) = {} & \alpha_i \alpha_l\, \mathrm{cov}\left(G(T_{i,j}), G(T_{l,m})\right)\\ & + \beta_i \alpha_l\, \mathrm{cov}\left(G'(T_{i,j}), G(T_{l,m})\right)\\ & + \alpha_i \beta_l\, \mathrm{cov}\left(G(T_{i,j}), G'(T_{l,m})\right)\\ & + \beta_i \beta_l\, \mathrm{cov}\left(G'(T_{i,j}), G'(T_{l,m})\right)\\ & + \mathrm{cov}\left(\epsilon_{i,j}, \epsilon_{l,m}\right). \end{aligned}$$ (30)

The full covariance matrix is an n × n matrix, where n is the total number of measurements (radial velocities and indicators). The cost of evaluating the corresponding likelihood, which requires solving a linear system involving the covariance matrix and computing its determinant, typically scales as 𝒪(n³). Therefore, working directly with the dense covariance matrix quickly becomes prohibitive in terms of both memory footprint (𝒪(n²)) and computing time.

4.1. Semiseparable representation of the model

In order to construct a semiseparable representation of the covariance matrix of Eq. (30), we merge all time vectors T_i into a single vector t, and all data vectors Y_i into a single vector y. We additionally sort the measurements by increasing time; thus, the measurements of the different time series are interleaved in the merged time series (t, y). For a measurement (t_k, y_k), we denote by ℐ_k the index of the original time series the measurement belongs to, and by 𝒥_k the index of the measurement within this original time series, so that (t_k, y_k) = (T_{ℐ_k, 𝒥_k}, Y_{ℐ_k, 𝒥_k}). The model of Eq. (29) can be rewritten as

$$y_k = f_{\mathcal{I}_k}(t_k) + \boldsymbol{\alpha}_k G(t_k) + \boldsymbol{\beta}_k G'(t_k) + \epsilon_{\mathcal{I}_k, \mathcal{J}_k},$$ (31)

with the vectors α and β defined by α_k = α_{ℐ_k} and β_k = β_{ℐ_k}. The corresponding covariance matrix is (as per Eq. (30)):

$$\begin{aligned} C = \mathrm{cov}(y, y) = {} & \left(\boldsymbol{\alpha}\boldsymbol{\alpha}^{\mathrm{T}}\right) * \mathrm{cov}\left(G(t), G(t)\right)\\ & + \left(\boldsymbol{\beta}\boldsymbol{\alpha}^{\mathrm{T}}\right) * \mathrm{cov}\left(G'(t), G(t)\right)\\ & + \left(\boldsymbol{\alpha}\boldsymbol{\beta}^{\mathrm{T}}\right) * \mathrm{cov}\left(G(t), G'(t)\right)\\ & + \left(\boldsymbol{\beta}\boldsymbol{\beta}^{\mathrm{T}}\right) * \mathrm{cov}\left(G'(t), G'(t)\right)\\ & + \Sigma, \end{aligned}$$ (32)

where Σ is the covariance matrix of the measurement noise

$\begin{aligned} \Sigma_{k,l} = \text{cov}(\epsilon_{\mathcal{I}_k,\mathcal{J}_k}, \epsilon_{\mathcal{I}_l,\mathcal{J}_l}), \end{aligned}$ (33)

and X * Y is the Hadamard (or element-wise) product

$\begin{aligned} (X * Y)_{i,j} = X_{i,j} Y_{i,j}. \end{aligned}$ (34)
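The merging step and the per-measurement coefficients α_k = α_{ℐ_k}, β_k = β_{ℐ_k} can be sketched in a few lines of Python. This is a minimal illustration, not the S+LEAF 2 API: the function names `merge_series` and `expand_coefficients` are hypothetical, each series is assumed to be given as a pair (times, values), and ties in time are broken by series index, a choice the text does not specify.

```python
def merge_series(series):
    """Merge several (times, values) series into one time-sorted series.

    Returns t, y and the index lists idx_series (I_k: which series) and
    idx_within (J_k: position within that series), so that
    y[k] == series[idx_series[k]][1][idx_within[k]].
    """
    records = []
    for i, (times, values) in enumerate(series):
        for j, (tk, yk) in enumerate(zip(times, values)):
            records.append((tk, i, j, yk))
    # Sort by time; ties are broken by series index, then by position (an assumption).
    records.sort(key=lambda r: (r[0], r[1], r[2]))
    t = [r[0] for r in records]
    idx_series = [r[1] for r in records]
    idx_within = [r[2] for r in records]
    y = [r[3] for r in records]
    return t, y, idx_series, idx_within


def expand_coefficients(coef_per_series, idx_series):
    """Per-measurement vector with entries coef[I_k], e.g. alpha_k = alpha_{I_k}."""
    return [coef_per_series[i] for i in idx_series]


# Toy example: an RV series and an indicator observed at partly different epochs.
rv = ([0.0, 1.0, 3.0], [1.2, -0.4, 0.7])
bis = ([0.5, 1.0], [0.1, 0.3])
t, y, idx_series, idx_within = merge_series([rv, bis])
alpha = expand_coefficients([2.0, -1.0], idx_series)  # alpha_RV = 2, alpha_BIS = -1
print(t, idx_series, alpha)
```

Keeping ℐ_k and 𝒥_k alongside the merged vectors is what allows per-series quantities (offsets, jitters, coefficients α and β) to be mapped onto the merged series and back.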

We now assume that the GP G can be modeled by a semiseparable kernel (see Eq. (8)). We can thus use the merged time vector t to compute the semiseparable representation of the covariance matrix of G(t), G′(t) according to Eqs. (8), (9), (22), and (23) (see also Appendix A), and we obtain

$\begin{aligned} C =&\left(\boldsymbol{\alpha}\boldsymbol{\alpha}^\mathrm{T}\right) * \left(\text{diag}(A) + \text{tril}\left(U V^\mathrm{T}\right) + \text{triu}\left(V U^\mathrm{T}\right)\right)\\ &+ \left(\boldsymbol{\beta}\boldsymbol{\alpha}^\mathrm{T}\right) * \left(\text{tril}\left(U' V^\mathrm{T}\right) + \text{triu}\left(V' U^\mathrm{T}\right)\right)\\ &+ \left(\boldsymbol{\alpha}\boldsymbol{\beta}^\mathrm{T}\right) * \left(\text{tril}\left(U {V'}^\mathrm{T}\right) + \text{triu}\left(V {U'}^\mathrm{T}\right)\right)\\ &+ \left(\boldsymbol{\beta}\boldsymbol{\beta}^\mathrm{T}\right) * \left(\text{diag}(B) + \text{tril}\left(U' {V'}^\mathrm{T}\right) + \text{triu}\left(V' {U'}^\mathrm{T}\right)\right)\\ &+ \Sigma. \end{aligned}$ (35)

The Hadamard product (αβ^T) * M can also be rewritten as

$\begin{aligned} \left(\boldsymbol{\alpha}\boldsymbol{\beta}^\mathrm{T}\right) * M = \text{diag}(\boldsymbol{\alpha})\, M\, \text{diag}(\boldsymbol{\beta}), \end{aligned}$ (36)

thus

$\begin{aligned} \left(\boldsymbol{\alpha}\boldsymbol{\alpha}^\mathrm{T}\right) * \text{diag}(A)&= \text{diag}\left(\boldsymbol{\alpha}^2 * A\right),\\ \left(\boldsymbol{\alpha}\boldsymbol{\beta}^\mathrm{T}\right) * \text{tril}\left(UV^\mathrm{T}\right)&= \text{tril}\left((\boldsymbol{\alpha} * U)(\boldsymbol{\beta} * V)^\mathrm{T}\right),\\ \left(\boldsymbol{\alpha}\boldsymbol{\beta}^\mathrm{T}\right) * \text{triu}\left(UV^\mathrm{T}\right)&= \text{triu}\left((\boldsymbol{\alpha} * U)(\boldsymbol{\beta} * V)^\mathrm{T}\right), \end{aligned}$ (37)

where α² is the element-wise square of α (α² = α * α) and α * U is the element-wise product of each column of U by α

$\begin{aligned} (\boldsymbol{\alpha} * U)_{k,s} = \alpha_k U_{k,s}. \end{aligned}$ (38)
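Identities (36) and (37) are easy to check numerically. The following sketch uses small random matrices in pure Python; tril and triu are taken here as strictly triangular parts, which is our assumption since the diagonal is carried separately by diag(A) in Eq. (35). The helper names are illustrative only.

```python
import random


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]


def hadamard(X, Y):
    return [[x * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]


def tril_strict(X):
    # Strictly lower-triangular part of a square matrix.
    return [[X[i][j] if i > j else 0.0 for j in range(len(X))] for i in range(len(X))]


def scale_rows(a, U):
    # (a * U)_{k,s} = a_k U_{k,s}, as in Eq. (38)
    return [[a[k] * u for u in row] for k, row in enumerate(U)]


def close(X, Y, tol=1e-12):
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


random.seed(0)
n, r = 5, 2
alpha = [random.gauss(0, 1) for _ in range(n)]
beta = [random.gauss(0, 1) for _ in range(n)]
M = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
U = [[random.gauss(0, 1) for _ in range(r)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(r)] for _ in range(n)]

# Eq. (36): (alpha beta^T) * M = diag(alpha) M diag(beta)
lhs36 = hadamard(outer(alpha, beta), M)
rhs36 = matmul(matmul(diag(alpha), M), diag(beta))

# Eq. (37), second line: weights move inside the low-rank factors.
lhs37 = hadamard(outer(alpha, beta), tril_strict(matmul(U, transpose(V))))
rhs37 = tril_strict(matmul(scale_rows(alpha, U), transpose(scale_rows(beta, V))))
print(close(lhs36, rhs36), close(lhs37, rhs37))
```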

Using these relations, Eq. (35) can be rewritten as

$\begin{aligned} C =&\ \text{diag}\left(\boldsymbol{\alpha}^2 * A\right)\\ &+ \text{tril}\left((\boldsymbol{\alpha} * U)(\boldsymbol{\alpha} * V)^\mathrm{T}\right) + \text{triu}\left((\boldsymbol{\alpha} * V)(\boldsymbol{\alpha} * U)^\mathrm{T}\right)\\ &+ \text{tril}\left((\boldsymbol{\beta} * U')(\boldsymbol{\alpha} * V)^\mathrm{T}\right) + \text{triu}\left((\boldsymbol{\beta} * V')(\boldsymbol{\alpha} * U)^\mathrm{T}\right)\\ &+ \text{tril}\left((\boldsymbol{\alpha} * U)(\boldsymbol{\beta} * V')^\mathrm{T}\right) + \text{triu}\left((\boldsymbol{\alpha} * V)(\boldsymbol{\beta} * U')^\mathrm{T}\right)\\ &+ \text{diag}\left(\boldsymbol{\beta}^2 * B\right)\\ &+ \text{tril}\left((\boldsymbol{\beta} * U')(\boldsymbol{\beta} * V')^\mathrm{T}\right) + \text{triu}\left((\boldsymbol{\beta} * V')(\boldsymbol{\beta} * U')^\mathrm{T}\right)\\ &+ \Sigma. \end{aligned}$ (39)

Finally, the latter expression can be factorized to obtain the semiseparable representation

$\begin{aligned} C = \text{diag}(\mathcal{A}) + \text{tril}\left(\mathcal{U}\mathcal{V}^\mathrm{T}\right) + \text{triu}\left(\mathcal{V}\mathcal{U}^\mathrm{T}\right) + \Sigma, \end{aligned}$ (40)

with

$\begin{aligned}&\mathcal{A} = \boldsymbol{\alpha}^2 * A + \boldsymbol{\beta}^2 * B,\\ &\mathcal{U} = \boldsymbol{\alpha} * U + \boldsymbol{\beta} * U',\\ &\mathcal{V} = \boldsymbol{\alpha} * V + \boldsymbol{\beta} * V'. \end{aligned}$ (41)
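As a sanity check of the factorization, the sketch below builds the dense matrix of Eq. (35) and the semiseparable form of Eqs. (40) and (41), both with Σ omitted, from arbitrary random U, V, U′, V′, A, and B, and compares them entry by entry. The identity is purely algebraic, so random matrices suffice; tril and triu are again assumed strictly triangular, and all helper names are hypothetical.

```python
import random

random.seed(1)
n, r = 6, 2  # number of merged measurements, rank of the underlying GP


def rand_mat(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def mm_t(X, Y):
    """Return X Y^T for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(x, y)) for y in Y] for x in X]


def tri(P, Q, R, S):
    """Strict tril(P Q^T) plus strict triu(R S^T)."""
    low, up = mm_t(P, Q), mm_t(R, S)
    return [[low[i][j] if i > j else (up[i][j] if i < j else 0.0)
             for j in range(n)] for i in range(n)]


def diag(d):
    return [[d[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(n)] for i in range(n)]


def had(a, b, X):
    """Hadamard product (a b^T) * X."""
    return [[a[i] * b[j] * X[i][j] for j in range(n)] for i in range(n)]


def scale(a, X):
    """Row scaling (a * X)_{k,s} = a_k X_{k,s}, as in Eq. (38)."""
    return [[a[k] * v for v in row] for k, row in enumerate(X)]


def lin(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


U, V, Up, Vp = rand_mat(n, r), rand_mat(n, r), rand_mat(n, r), rand_mat(n, r)
A = [random.random() for _ in range(n)]
B = [random.random() for _ in range(n)]
alpha = [random.gauss(0.0, 1.0) for _ in range(n)]
beta = [random.gauss(0.0, 1.0) for _ in range(n)]

# Eq. (35) with Sigma omitted: four Hadamard-weighted blocks.
C35 = add(
    had(alpha, alpha, add(diag(A), tri(U, V, V, U))),
    had(beta, alpha, tri(Up, V, Vp, U)),
    had(alpha, beta, tri(U, Vp, V, Up)),
    had(beta, beta, add(diag(B), tri(Up, Vp, Vp, Up))),
)

# Eq. (41): generators of the merged covariance, still with r columns.
cal_A = [a * a * x + b * b * y for a, b, x, y in zip(alpha, beta, A, B)]
cal_U = lin(scale(alpha, U), scale(beta, Up))
cal_V = lin(scale(alpha, V), scale(beta, Vp))

# Eq. (40) with Sigma omitted.
C40 = add(diag(cal_A), tri(cal_U, cal_V, cal_V, cal_U))

max_diff = max(abs(C35[i][j] - C40[i][j]) for i in range(n) for j in range(n))
print(len(cal_U[0]), max_diff)  # rank stays r; the difference is round-off only
```

The four cross terms of Eq. (35) are absorbed into a single product of two n × r matrices, so `cal_U` keeps the rank r of the underlying GP.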

The rank of this semiseparable representation of C (number of columns in 𝒰, 𝒱) is thus the same as the rank of the underlying GP (number of columns in U, V). In the case of the more general model of Eq. (28), with several independent GPs, the semiseparable representation of the covariance matrix is straightforwardly obtained by concatenating the columns of the matrices 𝒰 (respectively 𝒱) corresponding to each independent GP, and the total rank is the sum of all the GP ranks.

Keeping the rank of the covariance matrix as low as possible significantly improves the performance of the method. Indeed, the cost of likelihood evaluations with a semiseparable covariance matrix of rank r scales as 𝒪(nr²). It is thus remarkable that introducing the derivative of the GP in the model, as well as different coefficients α and β for the different time series, does not increase the rank of the semiseparable representation of the covariance matrix. The factorization performed between Eqs. (39) and (40) is the key step that allows us to keep the same rank as the underlying GP. This factorization relies on the specific choice of semiseparable representation introduced in Eq. (22). Instead of using (U, V, U′, V′) to represent the covariance matrix, one could use (U, V, U′, U″) or (U, V, V′, V″) (see Eqs. (17) and (20)). However, with these choices of representation, the covariance matrix would not factorize in the same way: the rank of its semiseparable representation would be twice the rank of the underlying GP, and since the cost scales as r², it would quadruple.

4.2. Measurement noise and LEAF component

Using the covariance matrix representation of Eq. (40), the results described in Sect. 2 for a homogeneous time series can be extended to heterogeneous time series depending on several independent GPs and their derivatives. The celerite algorithms can be applied in the case of purely white noise (Σ diagonal), and the S+LEAF algorithms can be applied to the more general case of close-to-diagonal correlated noise. The LEAF component of the S+LEAF model simply needs to be defined on the merged time series (t, y).

4.3. Computational cost and memory footprint

The computational cost of our model and its memory footprint are the same as those of the underlying celerite/S+LEAF representation. The memory footprint scales as $\mathcal{O}((r+\bar{b})n)$ and the computational cost as $\mathcal{O}((r^2+r\bar{b}+\bar{b}^2)n)$, where n is the total number of measurements (including radial velocities and indicators), r is the rank of 𝒰 and 𝒱, and $\bar{b}$ is the average bandwidth of the LEAF component (considering the merged time series (t, y)). This is to be compared with a naive implementation of the same model, which has a memory footprint in 𝒪(n²) and a computational cost in 𝒪(n³).
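To make these scalings concrete, the following sketch compares leading-order operation and memory counts with all constant prefactors dropped. The HD 13808 numbers (n = 738 and r = 6 for the MEP kernel of Sect. 5.1) come from this paper; setting b̄ = 0 assumes purely white noise (no LEAF component), an assumption made here for illustration. Because constants are ignored, the dense-to-semiseparable ratio it prints is not a prediction of the measured speed-up reported in Sect. 5.2.

```python
def semiseparable_cost(n, r, b=0.0):
    """Leading-order operation count (r^2 + r b + b^2) n, constants dropped."""
    return (r * r + r * b + b * b) * n


def semiseparable_memory(n, r, b=0.0):
    """Leading-order memory (r + b) n, constants dropped."""
    return (r + b) * n


n = 3 * 246  # HD 13808: RV, BIS and log R'HK at 246 epochs
r = 6        # rank of the MEP kernel (Sect. 5.1)

# Doubling the rank, e.g. with a (U, V, U', U'') representation, quadruples the cost.
ratio_rank = semiseparable_cost(n, 2 * r) / semiseparable_cost(n, r)

# Ratios of leading-order terms only (dense: n^3 operations, n^2 memory).
ratio_dense = n ** 3 / semiseparable_cost(n, r)
mem_ratio = n ** 2 / semiseparable_memory(n, r)
print(ratio_rank, ratio_dense, mem_ratio)
```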

4.4. Overflows and preconditioning

As explained in Ambikasaran (2015), Foreman-Mackey et al. (2017), and Delisle et al. (2020b), a naive computer implementation of the semiseparable decomposition of Eq. (9) can lead to overflows and underflows due to the exponential terms in the definition of U and V. However, Foreman-Mackey et al. (2017) proposed a simple preconditioning method to circumvent this issue in the case of the celerite model, which is also valid for the S+LEAF model (Delisle et al. 2020b). Instead of directly using the matrices U and V, one can use the matrices $\tilde{U}$, $\tilde{V}$, and ϕ defined as

$\begin{aligned} \tilde{U}_{i,s}&= a_s \cos(\nu_s t_i) + b_s \sin(\nu_s t_i),\\ \tilde{U}_{i,n_{\rm c}+s}&= a_s \sin(\nu_s t_i) - b_s \cos(\nu_s t_i),\\ \tilde{V}_{i,s}&= \cos(\nu_s t_i),\\ \tilde{V}_{i,n_{\rm c}+s}&= \sin(\nu_s t_i),\\ \phi_{i,s}&= \phi_{i,n_{\rm c}+s} = e^{-\lambda_s(t_{i+1}-t_i)}, \end{aligned}$ (42)

such that

$\begin{aligned} U_{i,s} V_{j,s} = \tilde{U}_{i,s} \tilde{V}_{j,s} \prod_{k=j}^{i-1} \phi_{k,s}, \end{aligned}$ (43)

and all the celerite/S+LEAF algorithms can be adapted to use this representation (see Foreman-Mackey et al. 2017; Foreman-Mackey 2018; Delisle et al. 2020b). Similarly, we define the matrices $\tilde{U}'$, $\tilde{V}'$, $\tilde{\mathcal{U}}$, and $\tilde{\mathcal{V}}$ as

$\begin{aligned} \tilde{U}'_{i,s}&= a'_s \cos(\nu_s t_i) + b'_s \sin(\nu_s t_i),\\ \tilde{U}'_{i,n_{\rm c}+s}&= a'_s \sin(\nu_s t_i) - b'_s \cos(\nu_s t_i),\\ \tilde{V}'_{i,s}&= \lambda_s \cos(\nu_s t_i) - \nu_s \sin(\nu_s t_i),\\ \tilde{V}'_{i,n_{\rm c}+s}&= \lambda_s \sin(\nu_s t_i) + \nu_s \cos(\nu_s t_i),\\ \tilde{\mathcal{U}}&= \boldsymbol{\alpha} * \tilde{U} + \boldsymbol{\beta} * \tilde{U}',\\ \tilde{\mathcal{V}}&= \boldsymbol{\alpha} * \tilde{V} + \boldsymbol{\beta} * \tilde{V}', \end{aligned}$ (44)

which allow us to apply the overflow-proof version of the celerite/S+LEAF algorithms. The same preconditioning method can be applied to the Matérn 3/2 and 5/2 kernels (see Appendix A.3).
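The preconditioned representation can be checked numerically for one underdamped celerite term, k(τ) = e^{−λτ}(a cos(ντ) + b sin(ντ)). This sketch assumes U_{i,s} = e^{−λ t_i} Ũ_{i,s} and V_{j,s} = e^{λ t_j} Ṽ_{j,s}, which is our reading of Eqs. (42) and (43), and it assumes a′ = νb − λa and b′ = −λb − νa, obtained by differentiating e^{−λt}(a cos(νt) + b sin(νt)) with respect to t; both should be checked against Eqs. (9), (22), and (23). Using only bounded quantities, it verifies that Eq. (43) reproduces k(t_i − t_j), and that the primed matrices of Eq. (44) reproduce cov(G(t_i), G′(t_j)) = −k′(τ) and cov(G′(t_i), G(t_j)) = k′(τ), compared against finite differences.

```python
import math

# One underdamped celerite term: k(tau) = exp(-lam tau) (a cos(nu tau) + b sin(nu tau)).
a, b, lam, nu = 1.0, 0.3, 2.0, 1.7
# Assumed derivative coefficients, from d/dt [exp(-lam t) (a cos(nu t) + b sin(nu t))].
a_p, b_p = nu * b - lam * a, -lam * b - nu * a


def kernel(tau):
    return math.exp(-lam * tau) * (a * math.cos(nu * tau) + b * math.sin(nu * tau))


def rows(ti):
    """Row i of U~, U~', V~ and V~' (two columns for one term), Eqs. (42) and (44)."""
    c, s = math.cos(nu * ti), math.sin(nu * ti)
    return ((a * c + b * s, a * s - b * c),
            (a_p * c + b_p * s, a_p * s - b_p * c),
            (c, s),
            (lam * c - nu * s, lam * s + nu * c))


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


# Large absolute times: exp(lam * t) on its own would overflow a double here.
t = [1.0e4 + x for x in (0.0, 0.7, 1.5, 3.2, 4.0)]
R = [rows(ti) for ti in t]
phi = [math.exp(-lam * (t[k + 1] - t[k])) for k in range(len(t) - 1)]


def decay(i, j):
    """prod_{k=j}^{i-1} phi_k, which equals exp(-lam (t_i - t_j)) for i >= j."""
    out = 1.0
    for k in range(j, i):
        out *= phi[k]
    return out


h = 1e-6  # finite-difference step
err_k = err_d1 = err_d2 = 0.0
for i in range(len(t)):
    for j in range(i):  # strictly lower-triangular entries, t_i > t_j
        tau = t[i] - t[j]
        dk = (kernel(tau + h) - kernel(tau - h)) / (2 * h)  # k'(tau)
        Ut_i, Utp_i, _, _ = R[i]
        _, _, Vt_j, Vtp_j = R[j]
        g = decay(i, j)
        err_k = max(err_k, abs(g * dot(Ut_i, Vt_j) - kernel(tau)))  # Eq. (43)
        err_d1 = max(err_d1, abs(g * dot(Ut_i, Vtp_j) + dk))  # cov(G_i, G'_j) = -k'
        err_d2 = max(err_d2, abs(g * dot(Utp_i, Vt_j) - dk))  # cov(G'_i, G_j) = +k'
print(err_k, err_d1, err_d2)
```

With absolute times around 10⁴ and λ = 2, the factor e^{λ t_j} alone would exceed the double-precision range, whereas every quantity formed above stays bounded.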

4.5. Efficient computation of the gradient

In most applications, one needs to explore the parameter space, either to maximize the likelihood using optimization algorithms or to obtain samples from the posterior distribution of the parameters using Bayesian methods (MCMC, nested sampling, etc.). In both cases, many algorithms make use of the gradient of the likelihood with respect to the parameters to improve their convergence efficiency. Following Foreman-Mackey (2018) and Delisle et al. (2020b), we derive gradient backpropagation algorithms for all the operations used to compute the likelihood. Since a detailed presentation of these backpropagation algorithms would be cumbersome, we refer the reader to Appendix B of Delisle et al. (2020b) for the general idea of the method, and to the reference S+LEAF 2 implementation² for further details.

5. Application: Reanalysis of HD 13808

In this section, we apply our algorithms to reanalyze the RV time series of HD 13808. This K2V dwarf is known to harbor two planet candidates (see Mayor et al. 2011), recently published as confirmed planets by Ahrer et al. (2021). In the latter study, the authors defined several alternative models of stellar activity and performed a Bayesian model comparison, letting the number of planets vary. In addition to confirming the two candidates, Ahrer et al. (2021) concluded that the best stellar activity model was a GP trained simultaneously on the RV, BIS, and $\log R_{\rm HK}^{\prime}$ time series, following Eq. (2), as proposed by Rajpaul et al. (2015). Modeling this GP required the authors to solve a 738 × 738 linear system billions of times, which was very demanding in terms of computational resources. Here, we reanalyze the 246 HARPS RV measurements of HD 13808, together with the indicator time series, but model the GP using S+LEAF 2.

5.1. Choice of kernel

The SEP kernel (see Eq. (1)), which was used by Ahrer et al. (2021) for their analysis of HD 13808, is not semiseparable and thus cannot be modeled with S+LEAF 2. However, other quasiperiodic kernels, such as the SHO kernel proposed by Foreman-Mackey et al. (2017), admit a semiseparable representation. Here, we aim to reproduce the main characteristics of the SEP kernel using a semiseparable kernel. The SEP kernel is the product of a squared-exponential kernel and the exponential of a squared sine (see Eq. (1)):

$\begin{aligned} k(\Delta t) = \sigma^2 \exp\left(-\frac{\Delta t^2}{2\rho^2}\right) \exp\left(-\frac{\sin^2\left(\frac{\pi\Delta t}{P}\right)}{2\eta^2}\right). \end{aligned}$ (45)

The second part can be expanded as a power series, assuming 2η ≳ 1

$\begin{aligned} k(\Delta t)&= \sigma^2 \exp\left(-\frac{\Delta t^2}{2\rho^2}\right)\left(1 - \frac{\sin^2\left(\frac{\pi\Delta t}{P}\right)}{2\eta^2} + \frac{\sin^4\left(\frac{\pi\Delta t}{P}\right)}{8\eta^4} + \mathcal{O}\left(\eta^{-6}\right)\right)\\ &= \sigma^2 \exp\left(-\frac{\Delta t^2}{2\rho^2}\right)\frac{1 + f\cos(\nu\Delta t) + \frac{f^2}{4}\cos(2\nu\Delta t)}{1+f+\frac{f^2}{4}} + \mathcal{O}\left(f^3\right), \end{aligned}$ (46)

with f = (2η)⁻² and ν = 2π/P. This kernel thus introduces some correlation at the rotation period P, but also at the harmonics (P/2, P/3, etc.), with amplitudes decaying rapidly (scaling as η⁻²ⁿ for the harmonics P/n). The squared-exponential part implies that the correlations vanish over long timescales (Δt ≫ ρ). The squared-exponential kernel is not semiseparable, but the Matérn 1/2 (simple exponential decay), 3/2, and 5/2 kernels admit a semiseparable decomposition (see Appendix A), and offer a similar decay of the correlation over long timescales. The SEP kernel could thus be roughly approximated by:

$\begin{aligned} k(\Delta t) = \sigma^2 \exp\left(-\frac{\Delta t}{\rho}\right)\frac{1 + f\cos(\nu\Delta t) + \frac{f^2}{4}\cos(2\nu\Delta t)}{1+f+\frac{f^2}{4}}. \end{aligned}$ (47)
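The accuracy of the truncated expansion of Eq. (46) can be checked directly by comparing the exact periodic factor of Eq. (45) with its second-order form. In this sketch, the period and the time grid are arbitrary illustrative choices; the maximum error should shrink roughly as f³ = (2η)⁻⁶ as η grows.

```python
import math


def periodic_exact(dt, P, eta):
    """Periodic factor of the SEP kernel, Eq. (45)."""
    return math.exp(-math.sin(math.pi * dt / P) ** 2 / (2 * eta ** 2))


def periodic_truncated(dt, P, eta):
    """Truncated expansion of Eq. (46), with f = (2 eta)^-2 and nu = 2 pi / P."""
    f = (2 * eta) ** -2
    nu = 2 * math.pi / P
    num = 1 + f * math.cos(nu * dt) + f ** 2 / 4 * math.cos(2 * nu * dt)
    return num / (1 + f + f ** 2 / 4)  # normalized so that the factor is 1 at dt = 0


P = 38.0                              # illustrative period in days
grid = [0.1 * k for k in range(800)]  # 0 to 80 d, about two periods
errors = {eta: max(abs(periodic_exact(x, P, eta) - periodic_truncated(x, P, eta))
                   for x in grid)
          for eta in (1.0, 2.0, 4.0)}
print(errors)
```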

However, a GP following the kernel of Eq. (47) would not be differentiable, since mean square differentiability requires k′(0) = 0, while here k′(0) = −σ²/ρ ≠ 0. In order to ensure differentiability, we introduce a modified kernel, which is the combination of a Matérn 3/2 kernel and two underdamped SHO terms:

$\begin{aligned} k(\Delta t) = \sigma^2\frac{k_{3/2}(\Delta t) + f\, k_{\mathrm{SHO,\,fund.}}(\Delta t) + \frac{f^2}{4} k_{\mathrm{SHO,\,harm.}}(\Delta t)}{1+f+\frac{f^2}{4}}, \end{aligned}$ (48)

where

$\begin{aligned} k_{3/2}(\Delta t)&= \exp\left(-\frac{\sqrt{3}\Delta t}{\rho}\right)\left(1+\frac{\sqrt{3}\Delta t}{\rho}\right),\\ k_{\mathrm{SHO,\,fund.}}(\Delta t)&= \exp\left(-\frac{\Delta t}{\rho}\right)\left(\cos(\nu\Delta t)+\frac{1}{\nu\rho}\sin(\nu\Delta t)\right),\\ k_{\mathrm{SHO,\,harm.}}(\Delta t)&= \exp\left(-\frac{\Delta t}{\rho}\right)\left(\cos(2\nu\Delta t)+\frac{1}{2\nu\rho}\sin(2\nu\Delta t)\right). \end{aligned}$ (49)

The kernel of Eq. (48), which we refer to as the Matérn 3/2 exponential periodic (MEP) kernel in the following, is differentiable and presents the main characteristics of the SEP kernel, while being semiseparable. The semiseparable representation of the MEP kernel is of rank r = 6. We note that similar kernels have already been used in the literature to model stellar activity in photometric time series (e.g., David et al. 2019; Gillen et al. 2020).
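A direct (dense, not semiseparable) evaluation of the MEP kernel of Eqs. (48) and (49) is handy for plotting or testing. The sketch below, with hypothetical function names and illustrative hyperparameter values, checks that k(0) = σ² and that the one-sided slope at Δt = 0⁺ vanishes for the MEP kernel, whereas the kernel of Eq. (47) has slope −σ²/ρ. Both kernels are evaluated at |Δt|, as implied by stationarity.

```python
import math


def mep_kernel(dt, sigma, P, rho, eta):
    """MEP kernel of Eqs. (48) and (49), evaluated at |dt|."""
    x = abs(dt)
    f = (2 * eta) ** -2
    nu = 2 * math.pi / P
    s3 = math.sqrt(3) * x / rho
    k32 = math.exp(-s3) * (1 + s3)
    k_fund = math.exp(-x / rho) * (math.cos(nu * x) + math.sin(nu * x) / (nu * rho))
    k_harm = math.exp(-x / rho) * (math.cos(2 * nu * x)
                                   + math.sin(2 * nu * x) / (2 * nu * rho))
    return sigma ** 2 * (k32 + f * k_fund + f ** 2 / 4 * k_harm) / (1 + f + f ** 2 / 4)


def exp_periodic_kernel(dt, sigma, P, rho, eta):
    """Non-differentiable approximation of Eq. (47), for comparison."""
    x = abs(dt)
    f = (2 * eta) ** -2
    nu = 2 * math.pi / P
    num = 1 + f * math.cos(nu * x) + f ** 2 / 4 * math.cos(2 * nu * x)
    return sigma ** 2 * math.exp(-x / rho) * num / (1 + f + f ** 2 / 4)


sigma, P, rho, eta = 1.5, 38.0, 30.0, 0.8  # illustrative hyperparameters
h = 1e-6  # one-sided finite-difference step at dt = 0+
slope_mep = (mep_kernel(h, sigma, P, rho, eta) - mep_kernel(0.0, sigma, P, rho, eta)) / h
slope_exp = (exp_periodic_kernel(h, sigma, P, rho, eta)
             - exp_periodic_kernel(0.0, sigma, P, rho, eta)) / h
print(mep_kernel(0.0, sigma, P, rho, eta), slope_mep, slope_exp, -sigma ** 2 / rho)
```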

It should be noted that the MEP kernel is once mean square differentiable but not twice, which means that G′ is well defined but is not itself differentiable. Since the time series (RV and activity indicators) are modeled as combinations of G and G′, we could require G′ to be differentiable to obtain a smoother model. Such a twice mean square differentiable kernel should satisfy k⁽³⁾(0) = 0, in addition to the mandatory differentiability condition (k′(0) = 0). For the MEP kernel, $k_{\mathrm{MEP}}^{(3)}(0)$ is nonzero; however, in practice this kernel seems to produce smooth time series (see Sect. 5.3). Nevertheless, it is possible to design semiseparable kernels that are rigorously twice differentiable. For instance, the Matérn 5/2 kernel is twice differentiable and semiseparable with a rank of 3 (see Appendix A). In Appendix B we present the exponential-sine (ES) and the exponential-sine periodic (ESP) kernels. Both kernels are twice differentiable and semiseparable. The ES kernel is of rank 3 and closely resembles the SE kernel, while the ESP kernel is of rank 15 and approximates the SEP kernel very well. We find very similar results when using the ESP kernel instead of the MEP kernel, while the cost of likelihood evaluations is roughly doubled because of the higher rank of the ESP kernel (see Appendix B). In the following, we thus adopt the MEP kernel as a replacement for the SEP kernel, to reduce the computational cost and since it produces smooth time series, at least in our case. In general, the ESP kernel would typically generate smoother time series than the MEP kernel, but at twice the cost.

Following Rajpaul et al. (2015) and Ahrer et al. (2021), we use the GP to model simultaneously the RV, BIS, and $\log R_{\rm HK}^{\prime}$ time series of HD 13808 according to (see Eq. (2)):

$\begin{aligned}&\Delta\mathrm{RV} = \alpha_{\rm RV} G(t) + \beta_{\rm RV} G'(t),\\ &\Delta\mathrm{BIS} = \alpha_{\rm BIS} G(t) + \beta_{\rm BIS} G'(t),\\ &\Delta\log R_{\rm HK}^{\prime} = \alpha_{\log R_{\rm HK}^{\prime}} G(t), \end{aligned}$ (50)

where the coefficients α, β and the kernel’s hyperparameters (P, ρ, η) need to be determined.
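For small data sets, the model of Eq. (50) can also be evaluated naively with the dense covariance of Eq. (32), which is the 𝒪(n³) approach that S+LEAF 2 avoids. The sketch below is such a baseline, not the S+LEAF 2 algorithm: it uses a squared-exponential kernel as a stand-in because its derivatives are short to write (k, k′, and −k″ give the blocks involving G and G′), a diagonal noise term, and toy data; all names and values are hypothetical. Setting β = 0 for log R′_HK follows Eq. (50).

```python
import math


def se_blocks(tau, sigma, rho):
    """k(tau), k'(tau) and -k''(tau) for a squared-exponential stand-in kernel."""
    k = sigma ** 2 * math.exp(-tau ** 2 / (2 * rho ** 2))
    dk = -tau / rho ** 2 * k
    minus_d2k = (1 / rho ** 2 - tau ** 2 / rho ** 4) * k
    return k, dk, minus_d2k


def joint_covariance(t, series, alpha, beta, noise_var, sigma, rho):
    """Dense C of Eq. (32); alpha and beta are indexed by series, as in Eq. (50)."""
    n = len(t)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        ai, bi = alpha[series[i]], beta[series[i]]
        for j in range(n):
            aj, bj = alpha[series[j]], beta[series[j]]
            k, dk, m2 = se_blocks(t[i] - t[j], sigma, rho)
            # cov(G_i, G_j) = k, cov(G'_i, G_j) = k', cov(G_i, G'_j) = -k', cov(G'_i, G'_j) = -k''
            C[i][j] = ai * aj * k + bi * aj * dk - ai * bj * dk + bi * bj * m2
        C[i][i] += noise_var[i]
    return C


def cholesky(C):
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_likelihood(y, C):
    """Gaussian log-likelihood via a dense Cholesky factorization (O(n^3))."""
    L = cholesky(C)
    n = len(y)
    z = []
    for i in range(n):  # forward substitution: L z = y, so y^T C^-1 y = z^T z
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    logdet = 2 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + logdet + n * math.log(2 * math.pi))


# Toy merged series: 0 = RV, 1 = BIS, 2 = log R'HK, observed at three epochs.
t = [0.0, 0.0, 0.0, 1.3, 1.3, 1.3, 2.1, 2.1, 2.1]
series = [0, 1, 2] * 3
alpha = [1.0, 0.5, 0.8]
beta = [2.0, -1.0, 0.0]  # no G' term for log R'HK, as in Eq. (50)
C = joint_covariance(t, series, alpha, beta, [0.1] * len(t), sigma=1.0, rho=1.5)
y = [0.3, -0.1, 0.2, 0.5, 0.0, 0.4, -0.2, 0.1, 0.1]
print(log_likelihood(y, C))
```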

5.2. Performances

To evaluate the performance of S+LEAF 2 for the model of Eqs. (48) and (50), we generated random RV, BIS, and $\log R_{\rm HK}^{\prime}$ time series and recorded the cost of the likelihood evaluation as a function of the number of generated measurements. We compared the performance of S+LEAF 2 using the MEP kernel with the computational cost of evaluating the likelihood using the full covariance matrix with the SEP kernel. The two implementations were run on the same computer, using a single core. The results are shown in Fig. 1 and confirm the 𝒪(n) scaling of S+LEAF 2, while the implementation with the full covariance matrix indeed scales as 𝒪(n³) for large values of n. In the case of HD 13808, the total number of measurements is 738 (3 × 246), and based on Fig. 1, we obtain a gain of a factor ∼130 in computing time by using S+LEAF 2 instead of the naive implementation. For larger data sets, the gain would be even greater.

Fig. 1.

Cost of a likelihood evaluation as a function of the total number of measurements using S+LEAF 2 or the full covariance matrix (see Sect. 5.2).

In addition to these performance tests, we also ran numerical precision tests. We used the same GP model as for the performance tests and assessed the stability of S+LEAF 2 by computing CC⁻¹x, that is, by applying the solving and dot product algorithms to a random merged time series x. In theory, we should find CC⁻¹x = x; however, due to the limited machine precision, numerical errors accumulate and the results stray slightly from x. We computed the root mean square (rms) of these errors for the S+LEAF 2 methods and for the full covariance matrix (see Fig. 2). In both cases, numerical errors grow with the total number of measurements. However, both the level and the growth rate are lower when using S+LEAF 2. For a given number of measurements, the number of arithmetic operations is lower when using S+LEAF 2, which could explain the improvement in precision.
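The precision test described above can be reproduced for the dense (full covariance matrix) case with a few lines of pure Python. This sketch assumes an exponential covariance plus white noise, with arbitrary sizes and noise level; it solves C z = x with a Cholesky factorization and reports the rms of C z − x. An S+LEAF 2 version would replace the dense factorization and products with the package's linear-cost routines, which are not reproduced here.

```python
import math
import random


def cholesky(C):
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, x):
    """Solve (L L^T) z = x by forward then backward substitution."""
    n = len(x)
    w = [0.0] * n
    for i in range(n):
        w[i] = (x[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (w[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def roundtrip_rms(C, x):
    """rms of C (C^-1 x) - x, the precision diagnostic of Fig. 2."""
    z = cho_solve(cholesky(C), x)
    Cz = [sum(cij * zj for cij, zj in zip(row, z)) for row in C]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(Cz, x)) / len(x))


random.seed(2)
n = 60
t = sorted(random.uniform(0.0, 100.0) for _ in range(n))
# Exponential (Matern 1/2) covariance plus white noise: symmetric positive definite.
C = [[math.exp(-abs(ti - tj) / 10.0) + (0.05 if i == j else 0.0)
      for j, tj in enumerate(t)] for i, ti in enumerate(t)]
x = [random.gauss(0.0, 1.0) for _ in range(n)]
rms = roundtrip_rms(C, x)
print(rms)
```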

Fig. 2.

Precision of the linear solving operation as a function of the total number of measurements using S+LEAF 2 or the full covariance matrix (see Sect. 5.2).

5.3. Periodogram and false alarm probability

We analyzed the HD 13808 data using the periodogram and false alarm probability (FAP) approach presented in Delisle et al. (2020a). For each frequency, all the linear parameters (here the RV and indicator offsets) are readjusted together with the amplitudes of the cosine and sine at the considered frequency (the sinusoid being applied only to the RV time series). The framework of Delisle et al. (2020a) only requires slight adaptations to account for the joint fit of RV and activity indicators, which are detailed in Appendix C.
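To illustrate the idea of refitting the linear parameters at each trial frequency, here is a deliberately simplified sketch: a single RV series with white noise (diagonal covariance), one offset, and the χ² reduction used as the periodogram power. Delisle et al. (2020a) work with the full correlated covariance matrix, jointly refit the offsets of all series, and use their own normalization and FAP estimate, none of which is reproduced here; the data are synthetic and all names are hypothetical.

```python
import math
import random


def solve_small(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting (tiny systems)."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            m = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= m * A[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


def min_chi2(columns, y, w):
    """Weighted linear least squares on the given design columns; returns min chi^2."""
    M = [[sum(wi * a * b for wi, a, b in zip(w, ca, cb)) for cb in columns] for ca in columns]
    v = [sum(wi * a * yi for wi, a, yi in zip(w, ca, y)) for ca in columns]
    coef = solve_small(M, v)
    model = [sum(c * col[k] for c, col in zip(coef, columns)) for k in range(len(y))]
    return sum(wi * (yi - mi) ** 2 for wi, yi, mi in zip(w, y, model))


def periodogram(t, y, err, freqs):
    """chi^2 reduction when adding a sinusoid, refitting the offset at each frequency."""
    w = [1.0 / e ** 2 for e in err]
    ones = [1.0] * len(t)
    chi2_base = min_chi2([ones], y, w)
    power = []
    for f in freqs:
        c = [math.cos(2 * math.pi * f * ti) for ti in t]
        s = [math.sin(2 * math.pi * f * ti) for ti in t]
        power.append(chi2_base - min_chi2([ones, c, s], y, w))
    return power


# Synthetic RV-like data with a 14.19 d sinusoid and white noise (illustrative only).
random.seed(3)
t = sorted(random.uniform(0.0, 1000.0) for _ in range(120))
y = [5.0 + 2.0 * math.sin(2 * math.pi * ti / 14.19) + random.gauss(0.0, 0.5) for ti in t]
err = [0.5] * len(t)
freqs = [0.005 + 1e-4 * k for k in range(1000)]  # cycles per day
power = periodogram(t, y, err, freqs)
best_period = 1.0 / freqs[max(range(len(freqs)), key=lambda k: power[k])]
print(best_period)
```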

We first fitted a base model by maximizing the likelihood. This model includes the offsets γ_RV, γ_BIS, and $\gamma_{\log R_{\rm HK}^{\prime}}$; jitter terms added in quadrature to the measurement error bars, σ_RV, σ_BIS, and $\sigma_{\log R_{\rm HK}^{\prime}}$; the kernel hyperparameters P_GP, ρ_GP, and η_GP; and the amplitudes α and β. Then, using the fitted noise parameters, we computed a first periodogram, readjusting the offsets for each considered frequency. The resulting periodogram is plotted in Fig. 3 (top). We observe a very significant peak at 14.19 d (FAP = 6.7 × 10⁻²²). We then fitted a Keplerian orbit to this signal and readjusted all parameters. A second periodogram (Fig. 3, middle) was then computed on the residuals, still readjusting the offsets but keeping the planetary and noise parameters fixed. A second significant peak is visible at 53.7 d (FAP = 7.7 × 10⁻¹²). As for the first planet, we performed a global fit of all the parameters after including this planet in the model. These fitted parameters are presented in Table 1, the kernel function corresponding to the fitted hyperparameters is illustrated in Fig. 5, and the residuals of each time series superimposed with the GP prediction are shown in Figs. 6 and 7. Finally, the periodogram of the residuals after fitting both planets (Fig. 3, bottom) does not show any significant peak (FAP above 1%). Our conclusions, based on this periodogram and FAP approach, thus agree with the findings of Ahrer et al. (2021), with two confirmed planets at 14.19 d and 53.7 d.

Fig. 3.

Periodograms of the raw RV time series of HD 13808 (top) as well as of the residuals after subtracting the 14.19 d (center) and the 53.7 d (bottom) planets.

Fig. 4.

Same as Fig. 3 but neglecting the harmonics component of the MEP kernel (see Eq. (48)).

Fig. 5.

Kernel function used to model HD 13808's activity (MEP, see Eq. (48)). The GP hyperparameters are taken from the best fit of the two-planet solution (Table 1). For comparison, the SEP kernel, which the MEP kernel is designed to roughly mimic, is also plotted using the same set of hyperparameters.

Fig. 6.

GP prediction (conditional distribution) from the best fit of the two-planet solution (Table 1). The prediction is plotted for the GP and its derivative (top), and the full GP prediction for the RV, BIS, and $\log R_{\rm HK}^{\prime}$ time series is superimposed on the corresponding residuals (bottom three plots).

Fig. 7.

Zoom of Fig. 6 around epoch 2 453 720 BJD.

Table 1.

Maximum likelihood solution and POLYCHORD posterior for the model with a GP and two planets (at 14.19 and 53.7 d).

We note that the first-harmonic term (k_{SHO, harm.}) of the MEP kernel (Eq. (48)) plays a key role in the GP modeling, even though its amplitude is significantly smaller than that of the fundamental term (k_{SHO, fund.}). Indeed, when neglecting the harmonic term (see Fig. 4), we find a third highly significant signal around 19 d (FAP = 4.3 × 10⁻⁵), which corresponds to half the period of the GP (P_GP ≈ 38 d, see Table 1). In contrast, when the harmonic term is included, the peak around 19 d is no longer significant (see Fig. 3). This is not surprising, since stellar activity is expected to introduce signals in the RV and indicator time series at the rotation period, but also at its harmonics, in particular the first harmonic.

By design, a kernel that has power at the rotation harmonics tends to absorb signals at these periods (here 19 d), even if they are not due to stellar activity. To further test whether the 19 d signal could be due to a planet, Hara et al. (2022a) checked the consistency of this signal over the time span of the data set. The authors showed that the 19 d signal, while not statistically significant, exhibits a stable presence across the data set, which might motivate further observations.

5.4. Bayesian framework and false inclusion probability

In order to compare our results more directly with the study of Ahrer et al. (2021), we performed a full Bayesian evidence analysis using the nested sampling algorithm POLYCHORD (Handley et al. 2015). POLYCHORD was recently used for radial velocity exoplanet detection by Ahrer et al. (2021), Rajpaul et al. (2021), and Unger et al. (2021). We aimed to reproduce an analysis similar to that of Ahrer et al. (2021), but using S+LEAF 2 and the GP model detailed in Sect. 5.1.

The prior distributions we used for all parameters are detailed in Table 2. We mostly kept the priors used by Ahrer et al. (2021), with only a few changes. We constrained the amplitude of the RV term of the GP to be strictly positive to avoid degeneracies in the sign of the amplitudes. We also shifted the prior for the mean longitudes from [0, 2π] to [−π, π]. Indeed, the mean longitude of planet c is very close to 0, and this shift avoids splitting the peak in the posterior, thus improving the convergence efficiency. For all runs, we used a number of live points equal to 50 n_dim, that is, 50 times the number of free parameters. For the precision criterion (stopping criterion), we used 10⁻⁹.

Table 2.

Prior distributions used for each parameter in the nested sampling runs with POLYCHORD.

We ran POLYCHORD for models with zero up to three planets, and each model was run five times to estimate the value and uncertainty of the evidence. The evidence of each model is presented in Table 3. We see a steady and significant increase in evidence up to the model with two planets. The two- and three-planet models present similar evidence (Δln Z = 3.5 ± 2.5, compatible at 1.4σ). Moreover, even if the three-planet model is marginally favored, no clear period for a third planet emerges from the posteriors. To illustrate this, we computed the false inclusion probability (FIP) periodogram (Hara et al. 2022b) from the posteriors of all our runs with zero to three planets (see Fig. 8). The true inclusion probability (TIP) is the probability that the system hosts at least one planet in a given (small) range of periods. Conversely, the FIP (= 1 − TIP) is the probability that the system does not host any planet in this range. The FIP periodogram of Fig. 8 is computed with a window size of 1/(t_max − t_min) in frequency. We observe in Fig. 8 two very significant peaks (FIP below 10⁻⁸) around 14.1 d and 53.7 d, which confirms that these planets should be included in the model. Two smaller peaks around 12 d and 19 d are also visible, but neither is significant (FIP above 70%). The posterior distribution of the parameters of the two-planet runs is given in Table 1. Our results are in agreement with the periodogram and FAP approach detailed in Sect. 5.3, and with the findings of Ahrer et al. (2021).
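The TIP and FIP defined above can be sketched as a model-averaged fraction of posterior samples with at least one planet in the frequency window. This is our reading of the definition, not the reference implementation of Hara et al. (2022b): it assumes equal prior probabilities for the models, converts log-evidences into model probabilities, and treats posterior samples as equally weighted (nested-sampling output would require the sample weights). The samples, evidences, and baseline below are toy values.

```python
import math


def model_probabilities(log_evidences):
    """Posterior probabilities of the models from ln Z, assuming equal prior odds."""
    m = max(log_evidences)
    w = [math.exp(lz - m) for lz in log_evidences]
    total = sum(w)
    return [x / total for x in w]


def fip(freq, width, samples_per_model, log_evidences):
    """FIP = 1 - TIP in the window [freq - width/2, freq + width/2] (frequency units).

    samples_per_model[k]: equally weighted posterior samples of the k-planet model,
    each sample being the list of the periods of its planets.
    """
    lo, hi = freq - width / 2, freq + width / 2
    tip = 0.0
    for p_k, samples in zip(model_probabilities(log_evidences), samples_per_model):
        if samples:
            hits = sum(any(lo <= 1.0 / P <= hi for P in periods) for periods in samples)
            tip += p_k * hits / len(samples)
    return 1.0 - tip


# Toy posteriors for 0, 1 and 2 planet models (not the HD 13808 results).
samples = [
    [[] for _ in range(100)],
    [[14.19] for _ in range(100)],
    [[14.19, 53.7] for _ in range(50)] + [[14.19, 19.0] for _ in range(50)],
]
log_z = [0.0, 10.0, 12.0]
width = 1.0 / 2000.0  # 1 / (t_max - t_min) for a hypothetical 2000 d baseline
fip_14 = fip(1.0 / 14.19, width, samples, log_z)
fip_54 = fip(1.0 / 53.7, width, samples, log_z)
print(fip_14, fip_54)
```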

Fig. 8.

FIP periodogram for HD 13808. In blue we represent the FIP (false inclusion probability) and in yellow the TIP (true inclusion probability).

Table 3.

Evidence of each considered model in our POLYCHORD runs.

6. Conclusion

In this article, we have presented S+LEAF 2, a GP framework that is able to efficiently model multiple time series simultaneously. Classical GP models have a computational cost that scales as the cube of the number of measurements, which makes them prohibitive in terms of computational resources for large data sets. The computational cost of our GP framework scales linearly with the data set size, which allows for tractable computations even for large data sets.

This work builds on previous studies that provided efficient GP models for single time series (in particular the celerite and S+LEAF models, see Rybicki & Press 1995; Ambikasaran 2015; Foreman-Mackey et al. 2017; Delisle et al. 2020b). It is also inspired by a recent generalization of celerite to the case of two-dimensional data sets (Gordon et al. 2020), but extends it by accounting for the GP derivatives. These derivatives are especially important when modeling the effect of stellar activity on RV time series (see Aigrain et al. 2012; Rajpaul et al. 2015). Our framework additionally accounts for time series that do not share the same calendar, which is useful to train a GP simultaneously on RV and photometric measurements taken with two different instruments (e.g., Haywood et al. 2014).

We applied our methods to reanalyze the RV time series of the nearby K2 dwarf HD 13808. Our results are very similar to those of a recent state-of-the-art study of the same system (Ahrer et al. 2021), and we confirm the two planets announced in that study. However, we have shown that our framework decreases the computational cost of the GP modeling dramatically (by more than two orders of magnitude). While the data set analyzed here consists of 738 measurements (RV, BIS, and $\log R'_{\rm HK}$ at 246 epochs), the gain from using S+LEAF 2 would be even greater for larger data sets. Some data sets (e.g., HARPS or HARPS-N Sun-as-a-star RV time series) that could not previously be analyzed with such GP modeling can now be processed with our framework.

Finally, it is worth noting that the results from the periodogram and FAP approach are consistent with the much more computationally intensive Bayesian evidence calculations using nested sampling. This illustrates the power of periodogram and FAP computations that include a correlated noise model, as proposed by Delisle et al. (2020a).

¹ https://gitlab.unige.ch/jean-baptiste.delisle/spleaf

² https://gitlab.unige.ch/jean-baptiste.delisle/spleaf

Acknowledgments

We thank the referee, D. Foreman-Mackey, for his very constructive feedback that helped to improve this manuscript. We acknowledge financial support from the Swiss National Science Foundation (SNSF). This work has, in part, been carried out within the framework of the National Centre for Competence in Research PlanetS supported by SNSF.

References

Ahrer, E., Queloz, D., Rajpaul, V. M., et al. 2021, MNRAS, 503, 1248
Aigrain, S., Pont, F., & Zucker, S. 2012, MNRAS, 419, 3147
Ambikasaran, S. 2015, Numer. Linear Algebra Appl., 22, 1102
Baluev, R. V. 2008, MNRAS, 385, 1279
Blackman, R. T., Fischer, D. A., Jurgenson, C. A., et al. 2020, AJ, 159, 238
David, T. J., Petigura, E. A., Luger, R., et al. 2019, ApJ, 885, L12
Delisle, J. B., Hara, N., & Ségransan, D. 2020a, A&A, 635, A83
Delisle, J. B., Hara, N., & Ségransan, D. 2020b, A&A, 638, A95
Dumusque, X., Udry, S., Lovis, C., Santos, N. C., & Monteiro, M. J. P. F. G. 2011, A&A, 525, A140
Dumusque, X., Pepe, F., Lovis, C., et al. 2012, Nature, 491, 207
Foreman-Mackey, D. 2018, Res. Notes Am. Astron. Soc., 2, 31
Foreman-Mackey, D., Agol, E., Ambikasaran, S., & Angus, R. 2017, AJ, 154, 220
Gillen, E., Briegal, J. T., Hodgkin, S. T., et al. 2020, MNRAS, 492, 1008
Gordon, T. A., Agol, E., & Foreman-Mackey, D. 2020, AJ, 160, 240
Handley, W. J., Hobson, M. P., & Lasenby, A. N. 2015, MNRAS, 453, 4384
Hara, N. C., Delisle, J.-B., Unger, N., & Dumusque, X. 2022a, A&A, 658, A177
Hara, N. C., Unger, N., Delisle, J.-B., Díaz, R., & Ségransan, D. 2022b, A&A, accepted, https://doi.org/10.1051/0004-6361/202140543
Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517
Jones, D. E., Stenning, D. C., Ford, E. B., et al. 2017, ArXiv e-prints [arXiv:1711.01318]
Jordán, A., Eyheramendy, S., & Buchner, J. 2021, Res. Notes Am. Astron. Soc., 5, 107
Mayor, M., Marmier, M., Lovis, C., et al. 2011, ArXiv e-prints [arXiv:1109.2497]
Noyes, R. W., Hartmann, L. W., Baliunas, S. L., Duncan, D. K., & Vaughan, A. H. 1984, ApJ, 279, 763
Pepe, F., Cristiani, S., Rebolo, R., et al. 2021, A&A, 645, A96
Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279
Rajpaul, V., Aigrain, S., Osborne, M. A., Reece, S., & Roberts, S. 2015, MNRAS, 452, 2269
Rajpaul, V. M., Buchhave, L. A., Lacedelli, G., et al. 2021, MNRAS, 507, 1847
Rasmussen, C. E., & Williams, C. K. I. 2006, Gaussian Processes for Machine Learning (MIT Press)
Rybicki, G. B., & Press, W. H. 1995, Phys. Rev. Lett., 74, 1060
Unger, N., Ségransan, D., Queloz, D., et al. 2021, A&A, 654, A104

Appendix A: Semiseparable representation of Matérn 3/2 and Matérn 5/2 kernels

The Matérn 3/2 and 5/2 covariances are widely used in various fields of statistics. Their kernel functions are written as

$$\begin{aligned} k_{3/2}(\Delta t) &= \sigma^2 \left(1 + \Delta x\right) \mathrm{e}^{-\Delta x}, \\ k_{5/2}(\Delta t) &= \sigma^2 \left(1 + \Delta x + \frac{1}{3}\Delta x^2\right) \mathrm{e}^{-\Delta x}, \end{aligned}$$ (A.1)

where x is the rescaled time:

$$\begin{aligned} x &= \sqrt{3}\,\frac{t}{\rho} \qquad \text{for the Matérn 3/2 kernel}, \\ x &= \sqrt{5}\,\frac{t}{\rho} \qquad \text{for the Matérn 5/2 kernel}, \end{aligned}$$ (A.2)

and σ and ρ are the GP hyperparameters.
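For reference, a minimal NumPy sketch of Eqs. (A.1) and (A.2) follows. The function names are hypothetical, the kernels are evaluated at |Δt| (consistent with their stationarity), and the dense matrix is built only to check symmetry and positive semidefiniteness on a toy grid; the point of the semiseparable representation below is precisely to avoid forming it.

```python
import numpy as np

def matern32(dt, sigma, rho):
    """Matern 3/2 kernel of Eq. (A.1), with Delta x = sqrt(3) |dt| / rho (Eq. (A.2))."""
    dx = np.sqrt(3.0) * np.abs(dt) / rho
    return sigma**2 * (1.0 + dx) * np.exp(-dx)

def matern52(dt, sigma, rho):
    """Matern 5/2 kernel of Eq. (A.1), with Delta x = sqrt(5) |dt| / rho (Eq. (A.2))."""
    dx = np.sqrt(5.0) * np.abs(dt) / rho
    return sigma**2 * (1.0 + dx + dx**2 / 3.0) * np.exp(-dx)

# Dense covariance on a toy grid, for checking only; S+LEAF 2 never forms it.
t = np.sort(np.random.default_rng(1).uniform(0.0, 100.0, 50))
K = matern52(t[:, None] - t[None, :], sigma=2.0, rho=10.0)
```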

A.1. Semiseparable representation

For two times t_i > t_j, we have

$$k_{3/2}(t_i - t_j) = \sigma^2 \left( \left(x_i \mathrm{e}^{-x_i}\right) \mathrm{e}^{x_j} + \mathrm{e}^{-x_i} \left((1 - x_j)\, \mathrm{e}^{x_j}\right) \right),$$ (A.3)

which provides the semiseparable representation of rank 2

$$\begin{aligned} &A_i = \sigma^2, \\ &U_{i,1} = \sigma^2 x_i \mathrm{e}^{-x_i}, \qquad V_{i,1} = \mathrm{e}^{x_i}, \\ &U_{i,2} = \sigma^2 \mathrm{e}^{-x_i}, \qquad V_{i,2} = (1 - x_i)\, \mathrm{e}^{x_i} \end{aligned}$$ (A.4)

for the Matérn 3/2 kernel. Similarly, the Matérn 5/2 kernel admits the semiseparable representation of rank 3

$$\begin{aligned} &A_i = \sigma^2, \\ &U_{i,1} = \sigma^2 \left(x_i + \frac{x_i^2}{3}\right) \mathrm{e}^{-x_i}, \qquad V_{i,1} = \mathrm{e}^{x_i}, \\ &U_{i,2} = \sigma^2 \mathrm{e}^{-x_i}, \qquad V_{i,2} = \left(1 - x_i + \frac{x_i^2}{3}\right) \mathrm{e}^{x_i}, \\ &U_{i,3} = \sigma^2 x_i \mathrm{e}^{-x_i}, \qquad V_{i,3} = -\frac{2}{3} x_i \mathrm{e}^{x_i}. \end{aligned}$$ (A.5)

These representations are not unique and the choice of splitting into the 2 (Matérn 3/2) or 3 (Matérn 5/2) semiseparable terms is arbitrary.
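The representations (A.4) and (A.5) can be checked numerically: for sorted times, the strictly lower triangle of UVᵀ should reproduce the kernel of Eq. (A.1). The sketch below assumes a short time span, so that the unpreconditioned factors exp(±x_i) do not overflow (Sect. A.3 deals with long spans); the helper names and parameter values are hypothetical.

```python
import numpy as np

def matern32_uv(t, sigma, rho):
    """Rank-2 factors (U, V) of the Matern 3/2 kernel, Eq. (A.4)."""
    x = np.sqrt(3.0) * t / rho
    U = sigma**2 * np.column_stack([x * np.exp(-x), np.exp(-x)])
    V = np.column_stack([np.exp(x), (1.0 - x) * np.exp(x)])
    return U, V

def matern52_uv(t, sigma, rho):
    """Rank-3 factors (U, V) of the Matern 5/2 kernel, Eq. (A.5)."""
    x = np.sqrt(5.0) * t / rho
    em, ep = np.exp(-x), np.exp(x)
    U = sigma**2 * np.column_stack([(x + x**2 / 3.0) * em, em, x * em])
    V = np.column_stack([ep, (1.0 - x + x**2 / 3.0) * ep, -2.0 / 3.0 * x * ep])
    return U, V

t = np.linspace(0.0, 5.0, 20)   # sorted, so i > j means t_i > t_j
sigma, rho = 1.5, 2.0
dt = t[:, None] - t[None, :]

# Matern 3/2: compare the strictly lower triangle of U V^T with Eq. (A.1)
U, V = matern32_uv(t, sigma, rho)
dx = np.sqrt(3.0) * dt / rho
k32 = sigma**2 * (1.0 + dx) * np.exp(-dx)
err32 = np.max(np.abs(np.tril(U @ V.T - k32, k=-1)))

# Matern 5/2: same check with the rank-3 factors
U, V = matern52_uv(t, sigma, rho)
dx = np.sqrt(5.0) * dt / rho
k52 = sigma**2 * (1.0 + dx + dx**2 / 3.0) * np.exp(-dx)
err52 = np.max(np.abs(np.tril(U @ V.T - k52, k=-1)))
```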

A.2. Derivative of a Matérn Gaussian process

A GP following the Matérn 3/2 or 5/2 kernel is always differentiable, independently of the hyperparameters. Following the same reasoning as in Sect. 3, we compute the derivatives of U and V, as well as the matrix B, which appear in the covariance matrix between the GP G(t) and its derivative G′(t) and in the covariance matrix of G′(t) itself (see Eq. (22)). We find

$$\begin{aligned} U'_{i,1} &= \frac{\sqrt{3}\,\sigma^2}{\rho} (1 - x_i)\, \mathrm{e}^{-x_i}, \\ U'_{i,2} &= -\frac{\sqrt{3}\,\sigma^2}{\rho} \mathrm{e}^{-x_i}, \\ V'_{i,1} &= \frac{\sqrt{3}}{\rho} \mathrm{e}^{x_i}, \\ V'_{i,2} &= -\frac{\sqrt{3}}{\rho} x_i \mathrm{e}^{x_i}, \\ B_i &= \frac{3\sigma^2}{\rho^2}, \end{aligned}$$ (A.6)

for the Matérn 3/2 kernel, and

$$\begin{aligned} U'_{i,1} &= \frac{\sqrt{5}\,\sigma^2}{3\rho} \left(3 - x_i - x_i^2\right) \mathrm{e}^{-x_i}, \\ U'_{i,2} &= -\frac{\sqrt{5}\,\sigma^2}{\rho} \mathrm{e}^{-x_i}, \\ U'_{i,3} &= \frac{\sqrt{5}\,\sigma^2}{\rho} (1 - x_i)\, \mathrm{e}^{-x_i}, \\ V'_{i,1} &= \frac{\sqrt{5}}{\rho} \mathrm{e}^{x_i}, \\ V'_{i,2} &= \frac{\sqrt{5}}{3\rho} \left(x_i^2 - x_i\right) \mathrm{e}^{x_i}, \\ V'_{i,3} &= -\frac{2\sqrt{5}}{3\rho} (1 + x_i)\, \mathrm{e}^{x_i}, \\ B_i &= \frac{5\sigma^2}{3\rho^2}, \end{aligned}$$ (A.7)

for the Matérn 5/2 kernel.
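A quick finite-difference check of the Matérn 3/2 entries of Eq. (A.6) is sketched below. It assumes that B is the variance of G′(t), that is, B = −k″(0), which is consistent with the values given above; the step sizes, parameter values, and helper names are arbitrary choices for illustration. Since k is only finitely smooth at Δt = 0, the second-difference estimate of k″(0) converges slowly, hence the looser tolerance on B.

```python
import numpy as np

SQ3 = np.sqrt(3.0)

def uv32(t, sigma, rho):
    """Matern 3/2 factors U, V of Eq. (A.4)."""
    x = SQ3 * t / rho
    U = sigma**2 * np.column_stack([x * np.exp(-x), np.exp(-x)])
    V = np.column_stack([np.exp(x), (1.0 - x) * np.exp(x)])
    return U, V

def duv32(t, sigma, rho):
    """Time derivatives U', V' and the constant B of Eq. (A.6)."""
    x = SQ3 * t / rho
    dU = SQ3 * sigma**2 / rho * np.column_stack([(1.0 - x) * np.exp(-x), -np.exp(-x)])
    dV = SQ3 / rho * np.column_stack([np.exp(x), -x * np.exp(x)])
    B = 3.0 * sigma**2 / rho**2
    return dU, dV, B

def k32(dt, sigma, rho):
    """Matern 3/2 kernel, Eq. (A.1)."""
    dx = SQ3 * np.abs(dt) / rho
    return sigma**2 * (1.0 + dx) * np.exp(-dx)

sigma, rho = 1.3, 2.0
t = np.linspace(0.0, 4.0, 9)
dU, dV, B = duv32(t, sigma, rho)

# Central differences of U and V with respect to t
h = 1e-6
Up, Vp = uv32(t + h, sigma, rho)
Um, Vm = uv32(t - h, sigma, rho)
err_U = np.max(np.abs((Up - Um) / (2 * h) - dU))
err_V = np.max(np.abs((Vp - Vm) / (2 * h) - dV))

# B = Var[G'(t)] = -k''(0), estimated with a second difference at lag 0
hk = 1e-4
k2 = (k32(hk, sigma, rho) - 2 * k32(0.0, sigma, rho) + k32(-hk, sigma, rho)) / hk**2
err_B = abs(-k2 - B)
```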

A.3. Overflows and preconditioning

To avoid overflows (see Sect. 4.4), these representations can be adapted to use the same preconditioning method as in the case of the classical celerite quasiperiodic terms. The preconditioning matrix ϕ is defined as

$$\phi_{i,s} = \mathrm{e}^{-(x_{i+1} - x_i)},$$ (A.8)

and the preconditioned matrices $\tilde{U}$, $\tilde{V}$, $\tilde{U}'$, and $\tilde{V}'$ are obtained by dropping the exponential terms from the definitions of U, V, U′, and V′. For instance, we have

$$\begin{aligned} &\tilde{U}_{i,1} = \sigma^2 x_i, \qquad \tilde{V}_{i,1} = 1, \\ &\tilde{U}_{i,2} = \sigma^2, \qquad \tilde{V}_{i,2} = 1 - x_i, \\ &\tilde{U}'_{i,1} = \frac{\sqrt{3}\,\sigma^2}{\rho} (1 - x_i), \qquad \tilde{V}'_{i,1} = \frac{\sqrt{3}}{\rho}, \\ &\tilde{U}'_{i,2} = -\frac{\sqrt{3}\,\sigma^2}{\rho}, \qquad \tilde{V}'_{i,2} = -\frac{\sqrt{3}}{\rho} x_i, \\ &\phi_{i,1} = \phi_{i,2} = \mathrm{e}^{-(x_{i+1} - x_i)}, \end{aligned}$$ (A.9)

for the Matérn 3/2 covariance matrix.

While this preconditioning prevents overflows and underflows due to the exponential terms, weaker numerical instabilities could arise from the presence of $x_i$ and $x_i^2$ in the definitions of the preconditioned matrices. The absolute values of the rescaled times $x_i$ should thus be kept as small as possible to improve numerical stability. Since the Matérn kernels are stationary (i.e., k only depends on Δt), a reference time t₀ can be chosen arbitrarily, and the definition of x (Eq. (A.2)) can be adapted accordingly:

$$\begin{aligned} x &= \sqrt{3}\,\frac{t - t_0}{\rho} \qquad \text{for the Matérn 3/2 kernel}, \\ x &= \sqrt{5}\,\frac{t - t_0}{\rho} \qquad \text{for the Matérn 5/2 kernel}. \end{aligned}$$ (A.10)

For instance, we could use

$$t_0 = \frac{\min(t) + \max(t)}{2}$$ (A.11)

to avoid excessively large values of $x_i$. This might not be sufficient when the time span is very large compared to the decay timescale ρ, particularly for the Matérn 5/2 kernel, which contains quadratic terms ($x_i^2$). Recently, Jordán et al. (2021) proposed a state-space representation of the Matérn 3/2 and 5/2 kernels that allows a similar linear scaling of the likelihood evaluation with improved numerical stability.
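The following sketch illustrates how the preconditioned factors of Eq. (A.9), combined with the centered reference time of Eq. (A.11), give a covariance entry without ever forming exp(+x_i). It assumes that the decay between t_j and t_i (i > j) is the product of the ϕ factors from j to i − 1, which telescopes to exp(−(x_i − x_j)); it evaluates single entries for illustration and is not the recursive S+LEAF 2 algorithm. The time span and timescale are hypothetical values chosen so that the non-preconditioned factors overflow.

```python
import numpy as np

def precond_factors32(t, sigma, rho, t0):
    """Preconditioned Matern 3/2 factors of Eq. (A.9), with x from Eq. (A.10)."""
    x = np.sqrt(3.0) * (t - t0) / rho
    Ut = sigma**2 * np.column_stack([x, np.ones_like(x)])
    Vt = np.column_stack([np.ones_like(x), 1.0 - x])
    phi = np.exp(-np.diff(x))   # phi_i = exp(-(x_{i+1} - x_i)), shared by both columns
    return Ut, Vt, phi

def cov_entry(Ut, Vt, phi, i, j):
    """k(t_i - t_j) for i > j from preconditioned factors (illustration only).

    prod(phi[j:i]) telescopes to exp(-(x_i - x_j)) <= 1, so no exp(+x) is formed.
    """
    return np.prod(phi[j:i]) * (Ut[i] @ Vt[j])

# Hypothetical setup: long baseline (days) compared with the timescale rho
t = np.linspace(0.0, 5000.0, 2001)
sigma, rho = 1.0, 3.0
t0 = 0.5 * (t.min() + t.max())            # Eq. (A.11)
Ut, Vt, phi = precond_factors32(t, sigma, rho, t0)

i, j = 1500, 1499
dx = np.sqrt(3.0) * (t[i] - t[j]) / rho
direct = sigma**2 * (1.0 + dx) * np.exp(-dx)
precond = cov_entry(Ut, Vt, phi, i, j)

# Without preconditioning, V = exp(+x) overflows for this time span
x_raw = np.sqrt(3.0) * t / rho
with np.errstate(over="ignore"):
    naive_overflows = not np.all(np.isfinite(np.exp(x_raw)))
```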

Appendix B: Twice mean square differentiable semiseparable kernels

In this appendix, we construct several twice mean square differentiable semiseparable kernels. When modeling time series as combinations of a GP and its derivative, the chosen GP kernel must be at least once mean square differentiable for the model to be valid. However, using a twice mean square differentiable kernel ensures that the GP's derivative is itself differentiable, which typically yields smoother models.

B.1. Alternatives to the SE kernel

The Matérn 5/2 kernel is widely used and has the advantage of being both twice differentiable and semiseparable with rank r = 3 (see Appendix A). When twice differentiability is required, it is thus a natural alternative to the squared-exponential (SE) kernel,

$$k_{\rm SE}(\Delta t) = \sigma^2 \exp\left(-\frac{\Delta t^2}{2\rho^2}\right)$$ (B.1)

since the latter cannot be modeled with celerite/S+LEAF. Higher-order Matérn kernels, such as the Matérn 7/2 kernel, could also be used, as they are more than twice differentiable and admit semiseparable representations. However, their semiseparable representations would have a higher rank, which would increase the cost of likelihood evaluations.

We additionally propose here the exponential-sine (ES) kernel

$$\begin{aligned} k_{\rm ES}(\Delta t) = \sigma^2 \mathrm{e}^{-\lambda \Delta t} &\left(1 + \frac{1 - 2\mu^{-2}}{3} \left(\cos(\mu\lambda\Delta t) - 1\right)\right. \\ &\left. + \mu^{-1} \sin(\mu\lambda\Delta t)\right), \end{aligned}$$ (B.2)

which is also twice differentiable and semiseparable with rank 3. Its cost is thus similar to the Matérn 5/2 kernel. The corresponding power spectral density (PSD)

$$S_{\rm ES}(\omega) = \frac{2\sqrt{2}}{3\sqrt{\pi}}\, \frac{\sigma^2 \left(1 + \mu^2\right) \left(4 + \mu^2\right)}{\lambda \left(1 + (\omega/\lambda)^2\right) \left(\left(1 + (\omega/\lambda)^2 - \mu^2\right)^2 + 4\mu^2\right)}$$ (B.3)

is always positive (for λ > 0), which ensures the positive definiteness of the kernel (e.g., Foreman-Mackey et al. 2017). The parameters λ and μ can be chosen arbitrarily, but using

$$\begin{aligned} \lambda &\approx \frac{1.091}{\rho}, \\ \mu &\approx 1.327, \end{aligned}$$ (B.4)

keeps the deviation between the SE and the ES kernels below 0.009σ² for all lags Δt. Figure B.1 illustrates this by comparing the kernel functions and PSDs of the SE, ES, and Matérn 5/2 kernels.
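As a numerical check of Eqs. (B.1) to (B.4), the sketch below evaluates the SE and ES kernels on a fine grid of lags with the rounded parameters of Eq. (B.4) and measures their maximum absolute difference; it also evaluates the PSD of Eq. (B.3) on a frequency grid. The grid limits and helper names are arbitrary choices, and σ = ρ = 1 is assumed without loss of generality.

```python
import numpy as np

def k_se(dt, sigma, rho):
    """Squared-exponential kernel, Eq. (B.1)."""
    return sigma**2 * np.exp(-dt**2 / (2.0 * rho**2))

def k_es(dt, sigma, lam, mu):
    """Exponential-sine kernel, Eq. (B.2), evaluated at |dt|."""
    a = np.abs(dt)
    arg = mu * lam * a
    return sigma**2 * np.exp(-lam * a) * (
        1.0 + (1.0 - 2.0 / mu**2) / 3.0 * (np.cos(arg) - 1.0) + np.sin(arg) / mu
    )

def psd_es(omega, sigma, lam, mu):
    """Power spectral density of the ES kernel, Eq. (B.3)."""
    w2 = (omega / lam) ** 2
    num = 2.0 * np.sqrt(2.0) / (3.0 * np.sqrt(np.pi)) * sigma**2 * (1 + mu**2) * (4 + mu**2)
    den = lam * (1 + w2) * ((1 + w2 - mu**2) ** 2 + 4 * mu**2)
    return num / den

sigma, rho = 1.0, 1.0
lam, mu = 1.091 / rho, 1.327        # rounded values of Eq. (B.4)
lags = np.linspace(0.0, 10.0 * rho, 20001)
max_dev = np.max(np.abs(k_se(lags, sigma, rho) - k_es(lags, sigma, lam, mu)))
omega = np.linspace(0.0, 50.0, 5001)
psd_positive = bool(np.all(psd_es(omega, sigma, lam, mu) > 0))
```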

Fig. B.1.

Comparison of the kernel functions (top) and power spectral densities (bottom) of the SE, ES, and Matérn 5/2 kernels. The timescale of the Matérn 5/2 kernel is adjusted so as to minimize the maximum deviation from the SE kernel.

B.2. Alternatives to the SEP kernel

As seen in Eq. (46), the SEP kernel can be approximated by

$$k_{\rm SEP}(\Delta t) \approx k_{\rm SE}(\Delta t)\, \frac{1 + f\cos(\nu\Delta t) + \frac{f^2}{4}\cos(2\nu\Delta t)}{1 + f + \frac{f^2}{4}}.$$ (B.5)

In this expression, the periodic part

$$k_{\rm P}(\Delta t) = \frac{1 + f\cos(\nu\Delta t) + \frac{f^2}{4}\cos(2\nu\Delta t)}{1 + f + \frac{f^2}{4}}$$ (B.6)

is semiseparable and satisfies $k'_{\rm P}(0) = k^{(3)}_{\rm P}(0) = 0$. Thus, to obtain a twice differentiable semiseparable kernel similar to the SEP kernel, we simply need to replace the SE part in Eq. (B.5) with a Matérn 5/2 or ES kernel, and we define

$$k_{5/2\,\mathrm{P}}(\Delta t) = k_{5/2}(\Delta t)\, k_{\rm P}(\Delta t),$$ (B.7)

$$k_{\rm ESP}(\Delta t) = k_{\rm ES}(\Delta t)\, k_{\rm P}(\Delta t).$$ (B.8)

Indeed, the product of two semiseparable terms is semiseparable (see Foreman-Mackey et al. 2017) and since

$$\begin{aligned} (fg)' &= f'g + fg', \\ (fg)^{(3)} &= f^{(3)} g + 3 f'' g' + 3 f' g'' + f g^{(3)}, \end{aligned}$$ (B.9)

the first and third derivatives of k_5/2P and k_ESP also vanish at Δt = 0. The PSDs of these two kernels are given by

$$\begin{aligned} S_{k\mathrm{P}}(\omega) = \frac{1}{1 + f + \frac{f^2}{4}} &\left( S_k(\omega) + f\,\frac{S_k(\omega + \nu) + S_k(\omega - \nu)}{2} \right. \\ &\left. + \frac{f^2}{4}\,\frac{S_k(\omega + 2\nu) + S_k(\omega - 2\nu)}{2} \right), \end{aligned}$$ (B.10)

where k = 5/2 or ES. Since the PSDs S_k of the Matérn 5/2 and ES kernels are strictly positive at all frequencies and the coefficient f = (2η)⁻² is strictly positive, S_5/2P and S_ESP are also strictly positive at all frequencies. The two corresponding kernels are thus positive definite.
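The construction of Eq. (B.10) can be sketched as a weighted sum of shifted copies of S_k. Below, S_k is the ES PSD of Eq. (B.3); the values of η, ρ, and the rotation period P (with ν = 2π/P taken as an angular frequency) are hypothetical and only serve to evaluate S_ESP on a grid and confirm that it stays positive and even in ω.

```python
import numpy as np

def psd_es(omega, sigma, lam, mu):
    """ES power spectral density, Eq. (B.3)."""
    w2 = (omega / lam) ** 2
    num = 2.0 * np.sqrt(2.0) / (3.0 * np.sqrt(np.pi)) * sigma**2 * (1 + mu**2) * (4 + mu**2)
    return num / (lam * (1 + w2) * ((1 + w2 - mu**2) ** 2 + 4 * mu**2))

def psd_times_periodic(psd_k, omega, nu, f):
    """PSD of k(dt) * k_P(dt), Eq. (B.10): copies of S_k shifted by 0, +-nu, +-2nu."""
    norm = 1.0 + f + f**2 / 4.0
    s = psd_k(omega)
    s = s + f * (psd_k(omega + nu) + psd_k(omega - nu)) / 2.0
    s = s + f**2 / 4.0 * (psd_k(omega + 2 * nu) + psd_k(omega - 2 * nu)) / 2.0
    return s / norm

# Hypothetical hyperparameters (days): decay timescale, rotation period, eta
rho, P, eta = 20.0, 25.0, 0.5
f = (2.0 * eta) ** -2                 # f = (2 eta)^-2, as in the text
nu = 2.0 * np.pi / P                  # angular frequency of the periodic part
lam, mu = 1.091 / rho, 1.327          # ES parameters from Eq. (B.4)

omega = np.linspace(-2.0, 2.0, 4001)
S_esp = psd_times_periodic(lambda w: psd_es(w, 1.0, lam, mu), omega, nu, f)
```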

The semiseparable representations of k_5/2P and k_ESP have rank r = 15, since these kernels are the product of a rank-3 kernel (Matérn 5/2 or ES) and a rank-5 kernel (the periodic part). By comparison, the MEP kernel (see Eq. (48)), which is not twice differentiable, has rank 6.
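The rank count can be made concrete. The periodic part k_P has rank 5 because, through cos(a − b) = cos a cos b + sin a sin b, the constant, the cos(νΔt), and the cos(2νΔt) terms split into 1 + 2 + 2 separable terms. Since the product of two semiseparable terms is semiseparable, one way to build its factors is to take row-wise Kronecker products of the factors of each term, giving rank 3 × 5 = 15. The sketch below does this for k_5/2P and checks the result against the direct product on the strictly lower triangle; all names and parameter values are hypothetical.

```python
import numpy as np

def matern52_uv(t, sigma, rho):
    """Rank-3 factors of the Matern 5/2 kernel, Eq. (A.5)."""
    x = np.sqrt(5.0) * t / rho
    em, ep = np.exp(-x), np.exp(x)
    U = sigma**2 * np.column_stack([(x + x**2 / 3.0) * em, em, x * em])
    V = np.column_stack([ep, (1.0 - x + x**2 / 3.0) * ep, -2.0 / 3.0 * x * ep])
    return U, V

def periodic_uv(t, nu, f):
    """Rank-5 factors of k_P, Eq. (B.6), via cos(a - b) = cos a cos b + sin a sin b."""
    norm = 1.0 + f + f**2 / 4.0
    c1, s1 = np.cos(nu * t), np.sin(nu * t)
    c2, s2 = np.cos(2 * nu * t), np.sin(2 * nu * t)
    one = np.ones_like(t)
    U = np.column_stack([one, f * c1, f * s1, f**2 / 4 * c2, f**2 / 4 * s2]) / norm
    V = np.column_stack([one, c1, s1, c2, s2])
    return U, V

def product_uv(U1, V1, U2, V2):
    """Factors of a product kernel: row-wise Kronecker products (rank r1 * r2)."""
    n = U1.shape[0]
    U = (U1[:, :, None] * U2[:, None, :]).reshape(n, -1)
    V = (V1[:, :, None] * V2[:, None, :]).reshape(n, -1)
    return U, V

# Hypothetical hyperparameters and sorted observation times
t = np.linspace(0.0, 6.0, 25)
sigma, rho, nu, f = 1.0, 2.0, 2 * np.pi / 3.0, 1.0
U, V = product_uv(*matern52_uv(t, sigma, rho), *periodic_uv(t, nu, f))
rank = U.shape[1]

dt = t[:, None] - t[None, :]
dx = np.sqrt(5.0) * dt / rho
k52 = sigma**2 * (1.0 + dx + dx**2 / 3.0) * np.exp(-dx)
kP = (1 + f * np.cos(nu * dt) + f**2 / 4 * np.cos(2 * nu * dt)) / (1 + f + f**2 / 4)
err = np.max(np.abs(np.tril(U @ V.T - k52 * kP, k=-1)))   # t_i > t_j only
```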

We reproduced the analyses of Sects. 5.2 and 5.3 using the ESP kernel instead of the MEP kernel. The results are presented in Figs. B.2–B.6. The cost of likelihood evaluations with the ESP kernel is about twice that with the MEP kernel (see Fig. B.2), which is still much more efficient than modeling the full covariance matrix. The periodograms (Fig. B.3) and the GP predictions (Figs. B.4 and B.5) are very similar to those obtained with the MEP kernel (Figs. 3, 6, and 7). Finally, we see in Fig. B.6 that the ESP kernel reproduces the SEP kernel very closely, while the MEP kernel mimics it more roughly (see Fig. 5). However, these differences between the MEP and the ESP (or SEP) kernels seem to have a very weak impact on our analysis, since the periodograms and GP predictions are similar in both cases.

Fig. B.2.

Same as Fig. 1 but including the ESP kernel in the performance comparison.

Fig. B.3.

Same as Fig. 3 but using the ESP kernel instead of the MEP kernel.

Fig. B.4.

Same as Fig. 6 but using the ESP kernel instead of the MEP kernel.

Fig. B.5.

Same as Fig. 7 but using the ESP kernel instead of the MEP kernel.

Fig. B.6.

Same as Fig. 5 but using the ESP kernel instead of the MEP kernel.

Appendix C: Periodogram and FAP for heterogeneous time series

We consider here the case of a heterogeneous time series following Eq. (28), and we aim to detect a periodic signal affecting only the first time series, Y₁. The frameworks of Baluev (2008) and Delisle et al. (2020a), which define a general class of linear periodograms with associated analytical FAP approximations, can be applied to the merged time series y of Eq. (31) with a slight modification. We thus refer to Delisle et al. (2020a) for the details of the framework and focus here on the required adaptations. Following Delisle et al. (2020a), we compare the χ² of a linear base model ℋ with p parameters to that of enlarged models 𝒦 with p + d parameters, parameterized by the frequency ν. The base model is defined as

$$\mathcal{H}: \quad m_{\mathcal{H}}(\theta_{\mathcal{H}}) = \varphi_{\mathcal{H}}\, \theta_{\mathcal{H}},$$ (C.1)

where θ_ℋ is the vector of size p of the model parameters, φ_ℋ is an n × p matrix, and n is the total number of points in the merged time series y. The columns of φ_ℋ are explanatory time series that are scaled by the linear parameters θ_ℋ. For instance, if we consider the merged time series of RV, BIS, and $\log R'_{\rm HK}$, and we include in the model an offset for each of these time series, we would define:

$$m_{\mathcal{H}} = \gamma_{\rm RV}\, \delta_{\rm RV} + \gamma_{\rm BIS}\, \delta_{\rm BIS} + \gamma_{\log R'_{\rm HK}}\, \delta_{\log R'_{\rm HK}},$$ (C.2)

where γ_i is the offset of time series i, and δ_i is equal to one for measurements belonging to time series i and zero otherwise. The matrix φ_ℋ would thus be an n × 3 matrix whose first column is equal to 1 for RV measurements and 0 otherwise, whose second column is equal to 1 for BIS measurements (0 otherwise), and whose last column is equal to 1 for $\log R'_{\rm HK}$ measurements (0 otherwise). The vector of parameters would then be $\theta_{\mathcal{H}} = (\gamma_{\rm RV}, \gamma_{\rm BIS}, \gamma_{\log R'_{\rm HK}})$.
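Building φ_ℋ for Eq. (C.2) amounts to one-hot encoding the series label of each entry of the merged vector y. A minimal sketch, in which the labels, their ordering, and the offset values are all hypothetical:

```python
import numpy as np

def offset_design_matrix(series, n_series):
    """phi_H of Eq. (C.2): column s is delta_s (1 if a point belongs to series s)."""
    series = np.asarray(series)
    return (series[:, None] == np.arange(n_series)[None, :]).astype(float)

# Hypothetical labels for a merged vector y (0 = RV, 1 = BIS, 2 = log R'HK).
# Series need not share the same epochs: here the last RV has no BIS/log R'HK.
series = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])
phi_H = offset_design_matrix(series, 3)        # n x 3
theta_H = np.array([10.0, -2.0, -4.9])         # hypothetical (gamma_RV, gamma_BIS, gamma_logRHK)
m_H = phi_H @ theta_H                          # Eq. (C.1): each point gets its series offset
```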

The enlarged model 𝒦(ν) is written as

$$\mathcal{K}(\nu): \quad m_{\mathcal{K}}(\nu, \theta_{\mathcal{K}}) = \varphi_{\mathcal{K}}(\nu)\, \theta_{\mathcal{K}},$$ (C.3)

where θ_𝒦 = (θ_ℋ, θ) is the vector of size p + d of the parameters, and φ_𝒦(ν) = (φ_ℋ, φ(ν)) is an n × (p + d) matrix whose first p columns are those of φ_ℋ and whose last d columns are functions of the frequency ν. In the case of a homogeneous time series, as in Delisle et al. (2020a), one typically uses φ(ν) = (cos(νt), sin(νt)) (with d = 2). The main difference in the case of heterogeneous time series is that we only search for a periodicity in the first time series (Y₁, typically the RV time series). Thus, we define φ(ν) = (cos(νt)δ₁, sin(νt)δ₁). All the results presented in Delisle et al. (2020a) remain valid when applied to the merged time series. The only difference is that, because φ is zero for all measurements not belonging to the first time series, the averaging used in the definition of the effective time series length (Delisle et al. 2020a, Eq. (8)) is restricted to the measurements of the first time series:

$$\langle X \rangle = \sum_{\substack{i,j \\ \delta_1(i) = \delta_1(j) = 1}} C^{-1}_{i,j} X_{i,j},$$ (C.4)

where C⁻¹ is the inverse of the full covariance matrix of the merged time series.
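The adaptation can be sketched with dense linear algebra. The snippet below computes the unnormalized quantity χ²_ℋ − χ²_𝒦(ν), with φ(ν) = (cos(νt)δ₁, sin(νt)δ₁), on a toy merged series with white noise only, and evaluates the restricted sum of Eq. (C.4). It is an illustration under stated assumptions, not the normalized periodogram or FAP of Delisle et al. (2020a): the dense O(n³) solves stand in for the linear-cost S+LEAF 2 solves, and the X_ij used here is a hypothetical placeholder, since the actual quantities entering Eq. (8) of Delisle et al. (2020a) are defined there.

```python
import numpy as np

def gls_chi2(y, phi, C):
    """Minimum chi2 of y ~ phi @ theta under noise covariance C.

    Dense O(n^3) solves, standing in for the linear-cost S+LEAF 2 solves.
    """
    Cinv_phi = np.linalg.solve(C, phi)
    theta = np.linalg.solve(phi.T @ Cinv_phi, Cinv_phi.T @ y)
    r = y - phi @ theta
    return r @ np.linalg.solve(C, r)

def delta_chi2_periodogram(t, y, C, phi_H, delta1, nus):
    """chi2_H - chi2_K(nu), with phi(nu) = (cos(nu t) delta_1, sin(nu t) delta_1)."""
    chi2_H = gls_chi2(y, phi_H, C)
    power = np.empty(len(nus))
    for k, nu in enumerate(nus):
        phi_nu = np.column_stack([np.cos(nu * t) * delta1, np.sin(nu * t) * delta1])
        power[k] = chi2_H - gls_chi2(y, np.hstack([phi_H, phi_nu]), C)
    return power

def restricted_average(Cinv, X, delta1):
    """<X> of Eq. (C.4): sum over pairs (i, j) that both belong to series 1."""
    m = delta1.astype(bool)
    return np.sum(Cinv[np.ix_(m, m)] * X[np.ix_(m, m)])

# Toy merged series: 80 "RV" points (delta1 = 1) and 60 "BIS" points
# (delta1 = 0) on different calendars; the sinusoid affects series 1 only.
rng = np.random.default_rng(2)
n1, n2 = 80, 60
t = np.concatenate([np.sort(rng.uniform(0, 200, n1)), np.sort(rng.uniform(0, 200, n2))])
delta1 = np.concatenate([np.ones(n1), np.zeros(n2)])
phi_H = np.column_stack([delta1, 1.0 - delta1])      # one offset per series
y = (phi_H @ np.array([3.0, -1.0])
     + delta1 * 2.0 * np.sin(2 * np.pi * t / 14.2)
     + rng.normal(0.0, 0.5, t.size))
C = 0.25 * np.eye(t.size)          # white noise only; a GP would add off-diagonal terms
nus = 2 * np.pi / np.linspace(5.0, 50.0, 2000)
power = delta_chi2_periodogram(t, y, C, phi_H, delta1, nus)
best_period = 2 * np.pi / nus[np.argmax(power)]

# C^-1 is the inverse of the FULL covariance, restricted only afterwards
Cinv = np.linalg.inv(C)
X = np.outer(t, t)                 # hypothetical placeholder for X_ij
avg = restricted_average(Cinv, X, delta1)
```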

All Tables

Table 1.

Maximum likelihood solution and POLYCHORD posterior for the model with a GP and two planets (at 14.19 and 53.7 d).


Table 2.

Prior distributions used for each parameter in the nested sampling runs with POLYCHORD.



All Figures

	Fig. 1. Cost of a likelihood evaluation as a function of the total number of measurements using S+LEAF 2 or the full covariance matrix (see Sect. 5.2).

	Fig. 2. Precision of the linear solving operation as a function of the total number of measurements using S+LEAF 2 or the full covariance matrix (see Sect. 5.2).

	Fig. 3. Periodograms of the raw RV time series of HD 13808 (top) as well as of the residuals after subtracting the 14.19 d (center) and the 53.7 d (bottom) planets.

	Fig. 4. Same as Fig. 3 but neglecting the harmonics component of the MEP kernel (see Eq. (48)).

	Fig. 5. Kernel function used to model HD 13808's activity (MEP, see Eq. (48)). The GP hyperparameters are taken from the best fit of the two-planet solution (Table 1). For comparison, the SEP kernel, which the MEP kernel is designed to roughly mimic, is also plotted using the same set of hyperparameters.

	Fig. 6. GP prediction (conditional distribution) from the best fit of the two-planet solution (Table 1). The prediction is plotted for the GP and its derivative (top), and the full GP prediction for the RV, BIS, and $\log R'_{\rm HK}$ time series is superimposed with the corresponding residuals (bottom three plots).

	Fig. 7. Zoom of Fig. 6 around epoch 2 453 720 BJD.

