Gaussian processes for radial velocity modeling

Stephan Stock; Jonas Kemmer; Diana Kossakowski; Silvia Sabotta; Sabine Reffert; Andreas Quirrenbach

doi:10.1051/0004-6361/202244629

Home

All issues

Volume 674 (June 2023)

A&A, 674 (2023) A108

Full HTML

Open Access

Issue		A&A Volume 674, June 2023


Article Number		A108
Number of page(s)		26
Section		Planets and planetary systems
DOI		https://doi.org/10.1051/0004-6361/202244629
Published online		09 June 2023

A&A 674, A108 (2023)

Better rotation periods and planetary parameters with the quasi-periodic kernel and constrained priors

Stephan Stock¹^,★, Jonas Kemmer¹^,★, Diana Kossakowski², Silvia Sabotta¹, Sabine Reffert¹ and Andreas Quirrenbach¹

¹ Landessternwarte, Zentrum für Astronomie der Universität Heidelberg, Königstuhl 12, 69117 Heidelberg, Germany
e-mail: j.kemmer@lsw.uni-heidelberg.de, sstock@lsw.uni-heidelberg.de, sreffert@lsw.uni-heidelberg.de
² Max-Planck-Institut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany

Received: 29 July 2022
Accepted: 24 April 2023

Abstract

Context. Instrumental radial velocity (RV) precision has reached a level where the detection of planetary signals is limited by the ability to understand and simultaneously model stellar astrophysical “noise.” A common method for mitigating the effects of stellar activity is Gaussian process (GP) regression.

Aims. In this study we present an analysis of the performance and properties of the quasi-periodic (QP) GP kernel, which is the multiplication of the squared-exponential kernel by the exponential-sine-squared kernel, based on an extensive set of synthetic RVs, into which the signature of activity was injected.

Methods. The stellar activity within our synthetic data sets was simulated using astrophysically motivated models with different spot distributions and spot lifetimes rotating on the surface of a modeled late-type star. We used dynamic nested sampling to fit different model sets, including QP–GPs, Keplerian models, white noise models, and combinations of these, to synthetic RV time series data that in some cases included additional injected planetary signals.

Results. We find that while the QP–GP rotation parameter matches the simulated rotation period of the star, the length scale cannot be directly connected to the spot lifetimes on the stellar surface. Regarding the setup of the priors for the QP–GP, we find that it can be advantageous to constrain the QP–GP hyperparameters in different ways depending on the application and the goal of the analysis. We find that a constraint on the length scale of the QP–GP can lead to a significant improvement in identifying the correct rotation period of the star, while a constraint on the rotation hyperparameter tends to lead to improved planet detection efficiency and more accurately derived planet parameters. Even though for most of the simulations the Bayesian evidence performed as expected, we identified not far-fetched cases where a blind adoption of this metric would lead to wrong conclusions.

Conclusions. We conclude that modeling stellar astrophysical noise by using a QP–GP considerably improves detection efficiencies and leads to precise planet parameters. Nevertheless, there are also cases in which the QP–GP does not perform optimally, for example RV variations dynamically evolving on short timescales or a mixture of a very stable activity component and random variations. Knowledge of these limitations is essential for drawing correct conclusions from observational data.

Key words: planetary systems / techniques: radial velocities / methods: data analysis / methods: statistical / stars: activity

^★

Both authors contributed equally to the manuscript.

© The Authors 2023

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

The search for and characterization of low-mass exoplanets, in particular in the habitable zone of their host stars, is in full swing. Indispensable to the mass determination of these exoplanets is the radial velocity (RV) method. Current and next-generation high-resolution spectrographs searching for exoplanet signals in the spectra of stars via the RV method are reaching precisions of around or below 1 m s⁻¹, such as HARPS (Mayor et al. 2003), CARMENES (Quirrenbach et al. 2014), ESPRESSO (Pepe et al. 2014), MAROON-X (Seifahrt et al. 2018), NEID (Gupta et al. 2021), EXPRES (Petersburg et al. 2020; Zhao et al. 2022), and KPF (Gibson et al. 2020). However, the stellar activity induced by host stars can be a significant obstacle to the detection and precise characterization of these low-mass, low-amplitude (~1 m s⁻¹) planetary signals, as even the quietest stars show intrinsic RV variations on this order of magnitude (Robertson et al. 2014; Anglada-Escudé & Tuomi 2015). For the detection and characterization of planets with the RV method, it is therefore crucial to understand the consequences of stellar activity in order to find optimal ways to mitigate its impact.

The stellar “noise” affecting RV measurements of stars is mainly dominated by four different effects that occur on different timescales: oscillations, granulation, active regions, and magnetic cycles (Dumusque 2016). The most problematic effects, especially for later-type stars, are caused by active regions and magnetic cycles, since the timescales and RV amplitudes can be similar to those of potential planetary signals. Active regions on the stellar surface have limited lifetimes and are constantly evolving dynamically as they move across the observed stellar hemisphere due to stellar rotation. This leads to correlated noise that can manifest itself as quasi-periodic (QP) variations in the measured RVs. This noise can mask, mimic, or influence planetary signals in the RV data (e.g., Saar et al. 1998; Queloz et al. 2001; Boisse et al. 2011; Baluev 2013; Haywood et al. 2014; Hatzes 2016; Stock et al. 2020a). In this work we focus on stellar spots and neglect faculae. In particular, for the astrophysical modeling of the correlated noise, we use an M dwarf of spectral class M2.0V. M dwarfs can show a high degree of magnetic activity and are in general spot dominated (Jeffers et al. 2022). Furthermore, no stellar oscillations, for example p-modes, have been observed for M dwarfs thus far, and the magnitude of any stellar oscillations is expected to be in the cm s⁻¹ range (Rodríguez-López 2019) and can therefore be neglected in our simulated RV data set. Although we use a specific spectral type for our simulations, the general findings of this work can also be quite beneficial for modeling the RVs of other spectral types.

One way to mitigate correlated time variations is to apply noise models in the framework of Gaussian process (GP) regression models, and one particular kernel that is often used in such cases is the QP kernel (e.g., Haywood et al. 2014; Rajpaul et al. 2015; Ribas et al. 2018; Stock et al. 2020b). These stochastic models fit the stellar and instrumental noise contribution in RVs or light curves. However, the use of these models has raised some questions and concerns regarding the flexibility of these GPs (e.g., whether GPs are prone to overfitting or to absorbing planetary signals into the noise model), their application (e.g., selecting the best priors), and the correct interpretation of the results (e.g., the interpretation of hyperparameters and the implications for derived planetary parameters).

The main objective of this work is to perform a systematic analysis based on numerical simulations with known conditions in order to investigate these important questions. We created synthetic RV time series data that include the stellar astrophysical noise due to active regions. For this purpose, we used StaгSim (Herrero et al. 2016) to create RV data sets based on different spot patterns and distributions. We then investigated the properties of the QP–GP applied to these data sets in order to derive possible ways of optimally constraining the QP–GP priors. In addition, we injected Keplerian signals into the synthetic RV time series data to investigate the effect of the stellar activity and the GP modeling on the derived parameters of the planets. Investigations based on numerical simulations, such as those presented in this work, are critical for understanding how boundary conditions, properties of the data, or properties of the model may affect the results of an analysis involving a GP fit.

In Sect. 2, we describe the methods as well as the setup of the simulations and models that are used in this work. Section 3 presents and visualizes our results for GP fits to simulated activity data without additional Keplerian components, while Sect. 4 includes simulations with injected Keplerian signals. In Sect. 5, we provide a discussion of our results as well as useful guidelines for using GP models for photometric and spectroscopic time series data. Finally, we give our conclusions and a comprehensive summary of this work in Sect. 6.

2 Setup and methods

2.1 Simulation of activity-induced radial velocity signals

In order to produce synthetic activity-contaminated RV data, we made use of the tool StarSim¹ (Herrero et al. 2016). StarSim models RVs by dividing the stellar surface into a grid of different effective temperatures depending on whether there is an active surface feature (spot) at the grid element position or not (quiet photosphere). A synthetic Phoenix spectrum (Husser et al. 2013) based on the effective temperature, the surface gravity and other relevant stellar parameters, is assigned to each of the two different surface regions. In addition, each grid element is Doppler-shifted according to its position on the rotating surface. StarSim then calculates the cross-correlation function (CCF) for each of the two different spectra, and integrates the entire grid of surface elements in terms of the CCFs instead of spectra to speed up the calculations. The effects of limb darkening, convection and limb brightening are included within StarSim. The RVs are computed from the CCFs for a user-defined grid of epochs.

2.2 Setup of the investigated stellar activity configurations

Within our investigations, we focus on the effect of different spot sizes, spot distribution, spot lifetimes, spot number, and the stellar rotation period on the stellar astrophysical noise, and hence on the derived properties of the QP–GP. We neglect instrumental jitter for most of our simulations. Nevertheless, we allow for a small instrumental uncertainty of 30 cm s⁻¹, which corresponds to the performance of current ultra-stable spectrographs (e.g., shown for ESPRESSO by Suarez 2020). To introduce the RV uncertainty into the modeled stellar activity signal, we perturbed the simulated StarSim RVs with a Gaussian distribution with mean zero and standard deviation of 30 cm s⁻¹, if not stated otherwise.

The fixed parameters of the simulated star by starsim to model the RV variations due to stellar activity are given in Table 1. The stellar parameters, for example the stellar mass, radius, surface gravity, and the effective temperature, are based on our M2.0V example star. The spectral range of the simulated RV instrument is 380–690 nm, typical for modern optical spectrographs. We assumed a temperature contrast of 400 K for an effective temperature of 3500K based on the investigations of Andersen & Korhonen (2015). This temperature contrast reflects roughly the result of both the linear and polynomial fit in Fig. 2 provided in Andersen & Korhonen (2015). Simulations with different temperature contrasts have been carried out and showed that the influence of this parameter, as expected, mainly affects the RV amplitude (Reiners et al. 2013; Bauer et al. 2018).

We fixed the rotation period of our example star to P_rot=22 days, which is within the range of common values found for M2 stars (Newton et al. 2018; Jeffers et al. 2018; Popinchalk et al. 2021). To keep the investigations computationally feasible, the total time baseline of the RV observations, if not stated otherwise, is set to 10 P_rot=220 days, in which 11 RV measurements per rotation period were assumed (one measurement every 2 days). This results in 111 RV measurements over 220 days, close to a best-case scenario for the RV observations and our GP analysis.

We systematically investigated different ratios of the spot lifetimes with respect to the stellar rotation period for spots either distributed randomly or centered at two active longitudes, resulting in a grid of 10 different cases whose details are provided in Table 2. Regarding the spot distributions, we partly followed the approach in Perger et al. (2021). We assumed a case where spots appear completely randomly on the stellar surface and a case where spots preferentially appear at longitudes that have a distance of 180 deg (so-called active longitudes). The random case can be characteristic of active, fast rotating stars (Morin et al. 2010; Perger et al. 2021). It may result in a more difficult and less favorable case for the QP–GP to pick up any correlated stellar activity signal. The case of two active longitudes has been observed for our Sun (Zhang et al. 2011). Here we assumed that the spots can only appear at longitudes drawn from the sum of two normal distributions with means of x deg and x+180deg, where x is randomly chosen between 0 and 180 deg. The standard deviation of these normal distributions has been set to σ_act.long.= 45 deg. Since we used different discrete spot lifetimes and distributions and had fixed the rotation period, we tweaked the spot sizes and evolution rates to get comparable activity signals of about σ_RV = 1–2 m s⁻¹. For each of the activity configurations under investigation, we generated 100 random realizations, taking into account random birth times and positions of spots.

The spot birth times are drawn from a uniform distribution ranging from [t₀ − t_spot,t₀ + t_obs + t_spot], where t_spot is the spot lifetime and t_obs the time span of the observations. This accommodates for the fact that spots appearing before the first RV observation at t₀ takes place can still be visible. In addition, the random birth dates imply that there will always be periods of time when more or fewer star spots are present on the simulated stellar surface. The average number of star spots on the stellar disk visible to the observer per time interval is one of the parameters affecting the RV amplitude. We used the definition of Basri & Shah (2020), which is based on the average number of star spots present on the whole stellar surface at a random time. We show four example data sets of different configurations in Fig. 1.

StarSim assumes that spots have a circular shape and a linear growth and decay law of the spot radius (in degrees) that is based on the spot evolution rate (in deg day⁻¹). For the grid analysis we assumed that spots grow until they reach their maximum spot size defined as R_spot,max, then stay about 10% of their lifetime at that radius, and decay thereafter, all within the assumed spot lifetime. We also performed simulations with different spot evolution rates. However, we are aware that the assumption of a linear and symmetric growth and decay of the spot radii does not represent the properties of observed Sun spots (Petrovay & van Driel-Gesztelyi 1997; van Driel-Gesztelyi & Green 2015; Forgács-Dajka et al. 2021). This prescription is a fixed property of StarSim that has not been changed for the investigations in this work.

The maximum star spot radius, R_spot,max, is set in such a way that the desired median RV scatter is obtained, which is between 1–2 m s⁻¹ for the grid analysis. However, due to the randomness regarding spot distribution and birth times of spots within our simulations, it is not possible to perfectly control the RV scatter caused by the stellar activity simulations. We also note that due to the growth and decay of the spots, the fixed maximum star spot radius only represents an upper boundary. In general, a variety of different spot sizes are apparent within our simulations at any time.

Table 1

Relevant fixed parameters used to simulate the stellar activity within StarSim.

Table 2

Grid of investigated stellar activity configurations.

Fig. 1

Four simulated example data sets. The black points represent the modeled StarSim RVs, and the red points represent the simulated observations with their error bars. Top left: random spot distribution, t_spot = P_rot = 22 days. Top right: two active longitudes, t_spot = P_rot = 22 days. Bottom left: random spot distribution, t_spot = 5P_rot = 110. Bottom right: two active longitudes, t_spot = 5P_rott = 110 days.

2.3 The quasi-periodic Gaussian process (QP–GP)

A full text search for “exoplanet” and “Gaussian process” in the peer-reviewed astronomical literature results in more than 300 publications². A significant fraction of them mention the “quasi-periodic” GP or kernel. Generally, a QP kernel is a family of GP covariance functions that are able to model both a periodic signal and a correlation length of that signal simultaneously. The term does not define a particular unique kernel function; however, in the astronomical literature it is common to denote the kernel resulting from the multiplication of an exponential-sine-squared kernel with a squared-exponential kernel as a “quasi-periodic kernel” (Haywood et al. 2014; Rajpaul et al. 2015). This kernel has the form $k (τ) = σ_{GP}^{2} \exp (- \frac{τ^{2}}{2 l^{2}} - Γ \sin^{2} (\frac{π τ}{P_{rot}})),$ $k\,\left( \tau \right) = \sigma _{{\rm{GP}}}^2\,\exp \,\left( { - {{{\tau ^2}} \over {2{l^2}}} - {\rm{\Gamma }}\,\,{{\sin }^2}\,\left( {{{\pi \tau } \over {{P_{{\rm{rot}}}}}}} \right)} \right),$ (1)

where σ_GP is the amplitude of the GP component given in units of the data, Γ defines the relative weight between the GP sine-squared component and squared-exponential component and is dimensionless, l is the correlation length scale of the GP squared-exponential component, P_rot is the period of the GP sine-squared component, and τ is the time lag. l, P_rot, and τ are commonly given in units of time. A very common extension of the kernel function in Eq. (1) is to add a jitter term to the diagonal terms of the covariance function, allowing for a white noise contribution; this is the form we use and refer to in this work as the QP kernel. From Eq. (1), it is evident that the epithet QP for this kernel comes from its sinusoidal component. However, there are many other forms of kernels that will have a QP behavior, for example the quasi-periodic cosine-kernel (QPC; Perger et al. 2021).

The particular choice of a GP kernel represents a significant part of the prior knowledge on which the modeling is based. Here, we focus on the modeling of the effects from star spots and stellar activity on RV data, where the QP–GP kernel in Eq. (1) has several advantages, as it was specifically designed for this purpose and is motivated by characteristics of stellar activity in time series data (Aigrain et al. 2012; Rajpaul et al. 2015; Angus et al. 2018). Furthermore, it is infinitely differentiable and thus leads to well-behaved representations of the data.

2.4 The QP-GP hyperparameters

The physical motivation of the QP–GP hyperparameters can be difficult: whether the parameters can be connected to real physical processes on the stellar surface and whether it may be possible to calibrate them is currently debated in the literature (Rajpaul et al. 2015; Angus et al. 2018; Perger et al. 2021). With this work we also intend to improve the general understanding of the QP–GP hyperparameters.

The amplitude hyperparameter $σ_{GP}^{2}$ $\sigma _{{\rm{GP}}}^2$ defines the absolute strength of the covariance. This hyperparameter is therefore related to the amplitude of the activity signal. In this regard, it can be considered as an outcome of a number of physical processes on the star, such as the spot size, number of visible spots on the stellar surface, or temperature contrast. Many different combinations of these processes can lead to the same value of $σ_{GP}^{2}$ $\sigma _{{\rm{GP}}}^2$ ; thus, it is unfeasible to connect this hyperparameter to any single physical parameter on the star.

The length scale l defines the correlation length of the signal. Large values of the length scale suggest a strong correlation for data points separated in time, while small values lead to weak or no correlation. This parameter is generally related to the lifetime of activity features on the star, for example stellar spots, and some studies have tried to calibrate the length scale parameter of kernels similar to the QP–GP kernel in attempts to better physically fit this lifetime (see for example Perger et al. 2021; Nicholson & Aigrain 2022).

The hyperparameter P correlates data points that are roughly one period apart from each other. The QP–GP has been designed in such a way that this hyperparameter should be directly connectable to the stellar rotation period, which is responsible for active magnetic regions moving across the observed hemisphere of the stellar disk.

The Γ hyperparameter describes the harmonic complexity. For small values of Γ, points with a lag other than a multiple of P, are more highly correlated than for large values of Γ, where points are less correlated if their lag is not close to a multiple of P. This hyperparameter can be related to the number of variations (or local maxima and minima) during one full rotation of the star, and by that include information about the average distribution of active regions on the stellar surface.

2.5 QP–GP priors

Regarding the choice of priors for the GP model, there are two distinct main approaches that are often used in the literature. One is to use wide and unconstrained priors for the QP–GP to allow maximum flexibility of the GP model (see for example Espinoza et al. 2020; Pinamonti et al. 2022). The other one is to curtail the flexibility of the GP model as much as possible to reduce “overfitting-like” behavior. This is achieved by applying more constrained, in most cases physically motivated, priors (see for example Nava et al. 2020; Stock et al. 2020b). The idea behind using tighter GP priors is to incorporate all the information that is available in the fit, which as a result can lead to better precision of the planetary signals. An additional benefit is a shorter run time of the sampler.

Typically, the QP–GP is constrained as follows. The amplitude parameter $σ_{GP}^{2}$ $\sigma _{{\rm{GP}}}^2$ is challenging to constrain due to the large degeneracy of parameters affecting it. Nevertheless, there is a physically meaningful maximum, namely the scatter of the RV data of the star under consideration, which combines all the parameters of activity influencing the RV amplitude. The distribution of this hyperparameter is often chosen to be positive and uniform, or log-uniform. For our analysis, we adopted a universal uniform range between 0 m s⁻¹ and 40 m s⁻¹, which generously considers the range of scatter that we expected for our simulated activity.

The length scale parameter, l, is essential when it comes to modeling a meaningful periodic signal. It is a common approach to use a uniform or log-uniform prior spanning a large range of possible values. However, there are some caveats: for example, a small length scale parameter leads to a dominating squared-exponential kernel, and the signal decays before one rotation has finished. One aim of our investigation was to see how the choice of this prior affects the results. We therefore tested different priors in different situations, as will be described later.

For the GP rotation period, P, it is common to use a uniform or log-uniform prior that spans the range of physically possible stellar rotation periods. These can range from less than a day for very young stars to hundreds of days for giant stars in their post main sequence evolutionary phase. In cases where an estimate of the rotation period is available, for example due to auxiliary data from photometry and/or activity indicators, it is possible to use this estimate as a physical prior to further constrain the GP rotation hyperparameter. In our analysis, we use both uninformed uniform priors and constrained priors to investigate how the results differ between the two.

Large values of the harmonic complexity parameter, Γ, allow the GP to become very flexible. The reason is that the fitted curve drawn from the GP may include a higher number of turning points, which in turn can allow for more frequencies to be fitted that may not necessarily be related to the RV variations by the active regions. Therefore, it can be useful to impose an informative prior or an upper limit on Γ. Jeffers & Keller (2009) have shown that, to first approximation, any light curve can be explained by two unequally large spots that are 180 deg apart, independent of the complexity of the spot distribution on the stellar surface. This roughly corresponds to two active longitudes on the stellar surface, and its RV signal can be modeled by one to three local maxima of the GP within one full rotation period. Therefore, to reduce the flexibility of the GP at least an upper limit of Γ should be set to allow for a maximum of two to three inflection points within one rotation period, which concretely means that within this work we use a log-uniform prior on Γ ranging from 0.01 to 10. Within the literature, there are cases where even more informative priors were applied to Γ, for example informative normal priors (see, for example, Nava et al. 2020, and references therein).

2.6 Modeling approach

Within this work, we used juliet (Espinoza et al. 2019) to fit the synthetic RV data that include the simulated stellar activity signals and injected planetary signals. For the modeling of Keplerian signals, juliet uses the publicly available package radvel (Fulton et al. 2018). The QP–GP kernel investigated within this work is computed by the publicly available package george (Ambikasaran et al. 2015). The evaluation of the GP likelihood with george is computationally expensive and scales as N ln N , where N is the number of data points (Ambikasaran et al. 2015). Nevertheless, george is one of the most widely used packages.

Within juliet, we used nested sampling, in particular, dynamic nested sampling based on the dynesty package (Speagle 2020) to fit the models to the data and derive the posterior distributions of the model parameters. Dynamic nested sampling is better suited than standard Markov chain Monte Carlo procedures for dealing with multi-modal posterior distributions, which are common for the GP hyperparameters, since it is not as susceptible to problems caused by local minima (Feroz et al. 2009; Feroz & Skilling 2013; Speagle 2020).

While Markov chain Monte Carlo samplers generally require a good initial parameter vector, nested sampling efficiently samples the complete user-defined prior volume. An additional advantage of using nested-sampling algorithms is the fact that the marginal log-likelihood (hereafter referred to as evidence) is a natural outcome of the sampling process (Feroz et al. 2009). Bayesian model comparison, which uses the Bayesian evidence, is a powerful method for comparing distinct or nested models with different numbers of parameters and complexity against each other. Following Trotta (2008), we consider a model as a significant improvement compared to another model if the log-evidence difference is larger than ∆ ln 𝒵 > 5, while ∆ ln 𝒵 > 2.5 is moderate evidence in favor of one model.

Table 3

Different model components that were investigated in this work.

3 Results of QP–GP fits to synthetic RV activity-only data

In this section, we present our investigation of the relationship between the GP hyperparameters and the physical properties of the simulated stellar activity. Our large grid of different activity configurations not only allowed us to investigate the correlation between the parameters of the stellar activity and the GP hyperparameters, but also how the determination of the hyperparameters is affected by different states of activity. Our starting point for this was to use wide, uninformative priors for the GP hyperparameters (see also Table 3; referred to as GP Prior I hereafter). The focus was on the question whether the QP–GP can robustly derive the simulated stellar rotation period and spot lifetimes by connecting these values to the physically motivated QP–GP hyperparameters as described in Sect. 2.4. An analysis based on the Bayesian evidence of how the different conditions affect the probability of finding the (correct) rotation period in practice is presented in Appendix B.1.

3.1 QP–GP properties and the l_GP versus P_GP diagram

As can be seen in the middle panel of Fig. 2, the GP rotation period and the length scale parameter can be highly correlated with each other, which is not surprising, given Eq. (1). Both hyperparameters are as an exponential product interconnected with each other and should therefore not be considered in isolation. Shorter QP–GP length scales mean that the signal of the rotation period is less coherent over the time of observations.

Within the QP–GP fits to our simulated RV data sets, we find three predominant classes of possible distributions in the l_GP versus P_GP plane: the “o,” “triangle,” and “bar” shapes presented in Fig. 2 (see also Stock et al. 2020a,b; Bluhm et al. 2021; Kossakowski et al. 2021). We note that real-world scenarios of active stars have however also shown mixtures of the posterior distributions introduced here. For instance, TOI-1201 (Kossakowski et al. 2021, their Fig. A.1) was observed to exhibit a mixture of a triangle shape together with a bar shape.

The left plot shows an o-shaped posterior distribution, resulting in a well-defined peak within the l_GP versus P_GP parameter plane. Only in such a case can both the length scale and the rotational period of the QP–GP be well determined from the RV data. However, we only observe o-shaped posterior distributions for spot lifetimes that are at least five times the rotation period. A real-world target for which such a 2D posterior distribution of the GP length scale and rotation period has been observed is the nearby star Lalande 21185 (see Fig. 22 in Stock et al. 2020b).

The middle plot shows a triangle shape that was also commonly identified in our simulations. This 2D posterior distribution is characterized by a broad plateau of solutions at small length scales that are consistent with all allowed rotation periods given the prior volume, and a triangle of posterior solutions (often with higher likelihood) sitting on top of the observed plateau indicating solutions with larger GP length scales that are also consistent with the data. Based on the fits to our simulations, the position of the triangle is often found to be related to the simulated stellar rotation period. The plateau of posterior solutions with small l_GP and no preference for any rotation period within the l_GP versus P_GP parameter space is a consequence of the fact that the rotation hyperparameter within the exponential-sine-squared kernel is undefined if the signal significantly decays on timescales smaller than one rotation, resulting in a dominant exponential-squared term. This is for example the case if the spot lifetime of the star is close to or smaller than the simulated stellar rotation period. Real-world examples of such triangle-shaped posterior distributions have been identified for the RV time series data of YZ Ceti, GJ 251, and TOI-1201 (Stock et al. 2020a,b; Kossakowski et al. 2021).

The right diagram in Fig. 2 shows the third class of frequently observed relations between the GP length scale and the GP period. For some of our simulations, the posterior solutions form a bar shape that extends toward the upper boundary of the prior volume of l_GP at a period related to the stellar rotation. Such a shape is caused by the fact that the decay of the signal cannot be constrained given the data or time of observations. As a result, the length scale is estimated to “infinity” resulting in valid solutions for any large l_GP·In the l_GP versus P_GP diagram, this distribution can be interpreted as a signal that is to some extent coherent over the entire time of observations. A real-world example where such a posterior distribution has been observed is HD 238090 (Stock et al. 2020b).

Fig. 2

l versus P diagrams showing the posterior samples of a single GP fit with uninformed priors to one example simulation. The horizontal dashed line indicates the simulated spot lifetime, and the vertical dashed line indicates the simulated rotation period. Left: posterior samples fitted to a star with random spot distribution, P_rot = 22 days, and P_life = 110 days. A well-behaved peak is found where both the rotation period and the GP length scale are well constrained. Middle: posterior samples fitted to a star with random spot distribution, P_rot = 22 days, and P_life = 22 days. A posterior plateau with a short length scale spans the entire posterior volume. The posterior of the rotation period is not well constrained due to this plateau; however, some posterior samples form a “triangle” shape at the rotation period. Right: posterior samples fitted to a star with two active longitudes, P_rot = 22 days, and P_life = 11 days. The spots that appear on a similar longitude mimic a coherent signal whose length scale cannot be constrained by the GP, resulting in an “I-shaped” posterior distribution that reaches the upper boundary of the posterior volume.

3.2 Detection of stellar rotation periods by the QP–GP

In Fig. 3 we show the comparison between the derived medians of the QP-rotation hyperparameters and the simulated period for five different spot lifetimes and two different spot distributions. In doing so, we consider the unconstrained GP Prior I, as well as a length scale constrained prior (GP Prior II in Table 3 and hereafter), which was motivated by the triangle shape distribution described in the previous section. The idea behind this is to exclude the plateau with small length scales and unconstrained rotational modulation in order to force the GP to model a QP signal.

With both GP priors, the rotation period was derived very accurately and precisely by the QP–GP for the majority of cases in which the spot lifetime was at least a few times larger than the rotation period. For such cases, there was also no significant difference between GP Prior I or II regarding the derived median value of the stellar rotation period. In cases where the spot lifetime is close to the stellar rotation timescale (P_life ≤ 2P_rot), the estimates of the rotation period by the QP–GP were less precise. This is expected, as the signal’s coherence decreases if the positions, size, and number of stellar spots on the surface change significantly within one rotation. The determination of the rotation period is particularly hampered if the spot distribution is, in addition to the short spot lifetime, positionally uncorrected, which means for random spot distributions. In this case, we see that the unconstrained GP is not able to recover the simulated rotation period. A feature that can be seen directly from Fig. 3 is that fits using GP Prior I for the RV data sets with short spot lifetimes favor solutions with rotation periods that lie exactly in the middle of the predefined prior volume. The reason is the plateau of posterior solutions that dominates the posterior distribution over the entire period prior volume, as shown in the example plot in Fig. 2. For those instances, the median rotation periods derived by the length scale constrained GP Prior II, which excludes this plateau, are much more consistent and closer to the simulated rotation period. However, even when applying a QP–GP with Prior II we find that the derived rotation period is sensitive to the spot distribution and spot lifetimes. In the case of two active longitudes and with decreasing spot lifetimes, there are more and more cases where the median of the determined rotation period is half of the simulated one and represents therefore the second harmonic of the true rotation period. It is straight-forward to understand how two active longitudes can lead to such an estimate, as it is very difficult for a correlated noise model, such as the QP–GP, to distinguish between two almost equally spotted sides of the stellar surface. In the case of shorter spot lifetimes and with two active longitudes the much faster dynamic evolution of the spot pattern observed at shorter spot lifetimes complicates the distinction of the active longitudes by the GP noise model, explaining the higher number of solutions that are then found at the second harmonic.

Fig. 3

Violin plots showing the distribution of the derived median rotation periods and length scale hyperparameters of the QP–GP based on 100 simulations per spot lifetime. The blue distribution shows the results obtained from using GP Prior I (wide prior), and the orange distribution shows the results obtained by using GP Prior II (length scale constrained). For the period, the dashed line shows the simulated rotation period and the dash-dotted line the first harmonic at P_rot/2. For the length scale, we show the simulated spot lifetime divided by the determined GP length scale so that the dashed line marks a 1:1 correspondence and the dash-dotted line a factor of 4 between the two. Top: for a random spot distribution on the stellar surface. Bottom: with two active longitudes on the stellar surface.

3.3 Limitations in correctly identifying the star spot lifetimes

In Sect. 3.1, we showed already that only one of the posterior distributions occurring in the l_GP versus P_GP diagram results in an unambiguous measurement for the length scale hyperparameter of the GP. For a more in-depth analysis of the connection between the determined length scale parameter and the simulated spot lifetime, we investigated the outcome from the previous section with focus on the length scale.

We observed a clear relation in most of the cases: longer simulated spot lifetimes result in longer correlation length scales of the QP–GP. However, as can be seen in Fig. 3, we found no 1:1 correspondence but a dependence of the derived length scale of the QP–GP that varies with the simulated spot lifetime and spot pattern. As already seen for the rotation period, the constrained GP Prior II had a clear advantage for short spot lifetimes.

Figure 3 shows that the largest deviation occurred for the unconstrained GP Prior I and short spot lifetimes (0.5 P_rot. to 2P_rot) and the resulting plateau with indefinite rotation period discussed in Sect. 3.1. The length scale hyperparameters determined from this were generally much smaller than the simulated spot lifetime – on the order of a few days at most. Strikingly, for the spot lifetimes of 22 days and 44 days the gross of the determined length scales in those cases is centered around ratios of ~10 and ~20, respectively, which corresponds to a length scale of 2 days and matches the sampling of our data. For a less regular sampling, as in the real observations in Stock et al. (2020a,b), this relation is however not apparent and the plateau occurs for length scales between 10–100 days, which highlights the importance of the visual inspection of the posterior using the l_GP versus P_GP diagram as presented in Sect. 3.1. Fitting a QP–GP with GP Prior II, which excludes the observed posterior plateau in l_GP, reduced the offset between l and the simulated spot lifetime, but still shows a dependence on the spot lifetime. For the cases with spot lifetimes of 5 P_rot or 10 P_rot we observed no difference in the results based on the prior choice as these result in well-behaved o-shaped 2D posterior distributions, with an approximate factor of 4 to the simulated spot lifetime.

Especially for the shortest spot lifetimes and the configuration of two active longitudes, we recognized a lot of samples that resulted in length scale parameters much larger than the simulated spot lifetime (i.e., ratios close to 0). These are the cases with a bar shape in the 2D posterior distribution. A possible explanation is that due to our definition of an average number of spots over a specific time interval, such cases are much more dynamic than our simulated cases with longer spot lifetimes. A large number of spots forms and decays over the time of the simulations, and the fast reoccurrence of spots with short lifetimes that are smaller than the rotation period tend to lead to a coherent signal that the QP–GP cannot distinguish from that of a single long-lived signal. Because the spots occurring on two active longitudes are positionally correlated on opposite sides of the stellar disk, the resulting determined rotation period is often found at the second harmonic (0.5 Prot.) of the simulated rotation period. For a random spot distribution and spot lifetimes shorter than the rotation period, we find samples with determined length scale parameters longer than the simulated spot lifetimes and measured rotation periods completely unrelated to the rotation period.

3.4 Dependence of QP–GP length scale and period on jitter and RV uncertainty

So far, we only considered scenarios in which we adopted a fixed instrumental uncertainty of 30 cm s⁻¹ and no additional white-noise contribution by the star or instrument. In the following we discuss the sensitivity of the derived GP hyperparameters with respect to an added jitter, which in reality may result from unresolved stellar oscillations of the star itself or from an unstable RV instrument. For these investigations we used the stellar activity models with random spot distribution and a simulated spot lifetime of 110 days, as this is one of the cases where the QP–GP kernel is particularly effective, as we show in the previous section.

We applied different RV uncertainties, different white noise contributions, or a combination of both to the simulated data and then fitted a QP–GP with Prior I to the data. We note the difference between RV uncertainties and jitter: for the RV uncertainties, the shift of the measured value is taken into account by the larger error bars of the data, while this is not the case for the white noise contribution caused by jitter.

Figure 4 shows the derived GP hyperparameters l_GP and P_GP as a function of the ratio between the simulated stellar activity amplitude, σmodel, and the applied jitter, σ_jitter, as well as the RV uncertainty σ_RV. As can be seen, the derived length scale by the QP–GP is sensitive to this ratio. If the ratio between the activity amplitude and the RV uncertainty or jitter, respectively, is close to or below one, the GP length scale is generally determined to be larger. However, when this ratio becomes greater than one, l_GP approaches a lower boundary whose absolute value is only about 25% of the simulated lifetime. In comparison, the derived GP rotation period for the same simulations show a rather stable behavior, independent of the ratio of the simulated stellar activity amplitude σ_model to the applied jitter σ_jitter or to the RV uncertainty σ_RV. This showcases that the derived QP–GP length scale is rather sensitive to the data quality.

Fig. 4

Relations of the derived GP parameters to the RV uncertainty. Top: derived GP length scale as a function of the ratio of the standard deviation of the stellar activity signal over the RV uncertainty or jitter. Note that only GP-only fits that had at least moderate significance (with ∆ ln 𝒵 > 2.5) compared to a white noise model that includes only jitter have been used. The curves were computed by applying a median filter with a window size incorporating 50 simulations. The red curve assumes a fixed jitter of σ_jitter = 0 m s⁻¹ and changes the ratio σ_model/σ_RV, the black curve assumes a fixed jitter of σ_jitter = 1.2 m s⁻¹ and changes the ratio σ_model/σ_RV, and the blue curve assumes a fixed uncertainty of σ_RV = 0.3 m s⁻¹ and changes the ratio σ_model/σ_jitter. The colored shaded area shows the appropriate inter-quartile range. The vertical dotted line represents a ratio of 1. The horizontal dashed line represents the simulated spot lifetime. Bottom: analogous to the top plot, but for the GP rotation period.

4 Results of QP–GP fits to combined simulated activity and planetary signals

It is common practice to use GP components in fits to improve planetary parameters in the presence of stellar activity. However, there are different approaches to the treatment of the GP kernel, such as a free prior on the GP rotation period versus a prior constrained to the possibly known rotation period of the star. In the following, we investigate how the choice of GP prior affects the precision and accuracy of the planetary parameters.

In order to do so, the grid of investigations is extended by injecting Keplerian signals with periods P of 5.12 days, 30.35 days and 44days, hereafter referred to as KI5, KI30, and KI44. We note that KI44 has a period twice of the rotation period to investigate whether these signals can be recovered, and to which extent. The RV semi-amplitudes, K, of the injected Keplerian signals have been set to specific ratios to the standard deviation of the modeled stellar activity to investigate the effect of planetary signals that have higher or smaller amplitudes than the stellar activity signal. The investigated ratios $(K : \sqrt{2} * σ_{activity})$ $\left( {K:\sqrt 2 *{\sigma _{{\rm{activity}}}}} \right)$ include 2:1, 1:1, 1:2, and 1:5. The eccentricity e, mean anomaly M and argument of periastron ω of the injected Keplerian were set to zero. We used five different model types for the fits: (i) a Keplerian model that contains only white noise (K + W); (ii) a Keplerian model plus an unconstrained GP (K + GP I); (iii) Keplerian model plus a length-scale-constrained GP (K + GP II), motivated by the results from Sect. 3; (iv) a Keplerian model plus a GP with a constrained rotation period (K + GP III), assuming that the stellar rotation period is known through auxiliary measurements; and (v) a Keplerian model plus a GP that is constrained in terms of the length scale parameter and the rotation period (K + GP IV), which is the combination of the previous two assumptions.

The priors used for the different model components are given in Table 3. Using four different stellar activity configurations, each having 12 different planet configurations and with 100 simulated data sets per final configuration, we end up with 4800 different data sets that need to be investigated by fitting five different combinations of models to them, resulting in a total of 24000 fits. As for the activity-only analysis, we also used our results for an extensive model comparison, which is presented in Appendix B.2

4.1 Accuracy

To investigate the accuracy of the derived planetary parameters we used the median symmetric accuracy (MSA; see for example chapter 4 in Morley et al. 2018). It is defined as $ξ = 100 (\exp (ℳ (| \ln (Q_{i}) |)) - 1),$ $\xi = 100\,\left( {\exp \left( {{\cal M}\left( {\left| {\ln \left( {{Q_i}} \right)} \right|} \right)} \right) - 1} \right),$ (2)

where ℳ(…) is the median function and ln(Q_i) is the natural logarithm of the accuracy ratio (Kitchenham et al. 2001; Tofallis 2015) $Q_{i} = \frac{x_{i}}{y_{i}},$ ${Q_i} = {{{x_i}} \over {{y_i}}},$ (3)

with x_i being the estimated (predicted) value, which in our case is the median of the posterior distribution for a fit to simulation i, and y_i, which is the actual value of simulation i.

The MSA can intuitively be interpreted as a percentage error. This is because if the relative error is defined as always having the same direction, the value of the MSA is the same as the uncertainty of the median percentage (Morley et al. 2018, see in particular Eqs. (12) and (13)).

Table 4 shows the MSA that we derived for the orbital periods and RV semi-amplitudes based on different activity configurations, as well as aggregated over all 24 000 investigated fits. Given the configurations we investigated and all our fit results, we find that using a Keplerian simultaneously with a QP–GP does provide the best results in terms of the MSA metric. A Keplerian model without additionally taking into account the stellar activity does generally perform much worse and results in less accurate planet parameters. While the largest positive effect is observed by adding a QP-GP to model the activity signal, constraining the QP–GP to the rotation period (K + GP III) benefits the accuracy of the derived planet parameters in most of our investigated cases. On the other hand, constraining only the length scale of the QP–GP (K + GP II) does actually lead to less accurate planet parameters based on the MSA metric, even when compared to a fully unconstrained QP–GP (K + GP I). However, constraining both the length scale and the rotation period (K + GP IV) is almost indistinguishable from a prior that only constrains the rotation period.

While Table 4 provides a general summary of the MSA metric for different combined sets of simulations not taking into account the different amplitude ratios and period ratios between activity and planet signal, we show the MSA of the orbital period and RV semi-amplitude distinguishing between all simulated configurations graphically in Fig. C.1 and Fig. C.2 in the appendix. Examining the MSA of the fitting results as a function of the period ratio between the stellar rotation period and the injected orbital period, we find that the MSA is generally smaller when the planet’s orbital period is shorter than the rotation period. An additional but obvious result is that the larger the amplitude ratio between planet and activity, the more accurately the planet parameters are determined.

Using the MSA (or any other metric based on relative accuracy) to investigate the eccentricity for our circular (e = 0) injected Keplerian signals is problematic because it would lead to a division by zero in Eq. (3). Furthermore, in contrast to the orbital period and RV semi-amplitude, the eccentricity is a parameter that is constrained between 0 and 1, and therefore the expected variance rather small. We therefore decided to investigate the median of the median eccentricities instead of the MSA for the eccentricity, since RV-semi amplitude and eccentricity are correlated anyway (i.e., a higher fit eccentricity results in higher RV semi-amplitudes). Furthermore, the derived eccentricity should be as close to zero as possible for any of our simulated cases, which makes it easy to compare the different simulations in a similar fashion as for the MSA metric. We visualize the median eccentricity for the 100 simulations of each configuration in Fig. C.3 while Table 4 shows the median eccentricities derived for each model class. The main results from the investigation of the eccentricity parameter basically reflect what has been observed for the MSA on the period and RV semi-amplitude, which is that K + GP III performs better in terms of planet parameter accuracy than the QP–GP models using the other prior distributions.

Besides, we find that the median eccentricity increases for an increasing amplitude of the stellar activity with respect to the planet RV semi-amplitude. This highlights that the eccentricity is rather difficult to constrain and very sensitive to the correlated noise in the data. Since we find significantly smaller eccentricities when simultaneously fitting for the activity by using the QP–GP, it is therefore especially helpful to use an activity model to account for the stellar noise. Nevertheless, even with the QP–GP we retrieve too high eccentricities for the simulations that consist of very prominent activity signals.

An additional effect that can be observed fromFig. C.3 is that the derived eccentricity is generally the highest for the orbital period of 44 days, which is twice that of the stellar rotation period (2:1). This particular period ratio was chosen to investigate how the activity affects the eccentricity of a planet signal when the period of the stellar rotation overlaps with the second harmonic of the injected planet. For example, Stock et al. (2020a) stated that the derived eccentricity of YZ Cet b is largely influenced by the stellar activity due to this effect. Anglada-Escudé et al. (2010) showed that eccentric solutions for a planet can hide a second planet in a 2:1 mean motion resonance. Analogously, our results show that this is also possible for another periodic signal, in our case the stellar rotation period, overlapping with the second harmonic of the planet.

Table 4

MSA of the derived orbital periods and RV semi-amplitudes and median of the derived eccentricities for different stellar activity configurations or all simulations.

4.2 Precision

The MSA does not provide any information about the individual model uncertainties and confidence intervals. For example, a prediction that is close to the actual value and by that is accurate can have underestimated uncertainties (an overestimated precision) and therefore be many standard deviations away, while a less accurate estimate with larger error bars does include the true value within less standard deviations and by that may provide a better representation of the precision of the planetary parameters. Therefore, we investigated the distance between injected and retrieved median posterior parameters in terms of standard deviations. We show an overview of these numbers for the different models and activity configurations in Table 5, while Figs. C.4, C.5, and C.6 graphically show our results as a function of amplitude and period ratio between the activity and injected planet signal. Examining this metric, we find that while GP Prior III and IV do again perform the best in the majority of cases, there are configurations where a completely unconstrained, or in one case even a Keplerian without a simultaneous GP, do result in the smallest standard deviation distance between injected and retrieved planet parameters.

In particular, the configuration with a random spot distribution and a spot lifetime of five rotation periods stands out. Here, the QP–GP leads to significant deviations between injected and obtained RV semi-amplitude, as the median of the standard deviation distance for the 100 ensembles is as large as 10 or even higher. However, considering the derived orbital periods or the eccentricities under the influence of the very same activity configuration, the fits including the GP models are again clearly superior.

Nevertheless, the examination of all three investigated planetary parameter results (the distinction between period and amplitude ratio, the scatter of the standard deviation distance and the consistency of the derived standard deviation distance by the number of outliers for a given prior), leads us to conclude that a QP–GP that is constrained in its rotation parameter to the stellar rotation period, but is not constrained in its length scale parameter, has the best performance when it comes to realistic uncertainties of the parameter estimates.

5 Discussion

5.1 QP–GP rotation and length scale hyperparameter

In the first part, we investigated the relation of the QP–GP rotation period and length scale hyperparameters to the stellar rotation period and spot lifetime using simulated activity data sets. We identified three distinct shapes in the l_GP versus P_GP diagram of the posterior distributions for the rotation period and length scale, which are directly linked to the simulated rotation period and spot lifetime. Further, we found that the spot distribution also influences the determination of the hyperparameters.

For spot lifetimes of similar magnitude to the rotation period, the posterior distribution contains a large number of solutions that are consistent with any rotation period that is permitted by the prior, but these solutions tend to have a very small length scale. In such cases, we often find that only a small fraction of the posterior solutions are consistent with a higher length scale. Only those solutions that are not part of the plateau as described in Sect. 3.1 are then also consistent with a unique rotation period, which is, as we later showed, related to the simulated stellar rotation period.

These results suggest that by considering l_GP and P_GP together, a rotation period can be determined by the QP–GP in most cases, even if the star spot lifetime is on the order of the rotation period or less. Consequentially, for the detection of a QP signal in time series data, it can be advantageous to define a lower limit for the QP–GP length scale. The presented empirical approach, to fit first an unconstrained QP–GP and then repeat the fit with a more constrained QP–GP length scale, is versatile and can be used on any time series data with correlated noise. The reason for redoing the fit and excluding the plateau is that the nested sampling algorithm is then more efficient at identifying posterior solutions with higher length scale at the possible rotation period, which otherwise would not have been detected.

However, while based on our simulated data and configurations the simulated stellar rotation period and the rotation period hyperparameter of the QP–GP generally match, there are still some exceptions. For example, for certain spot distributions, especially in the case of two active longitudes, it is possible that most or all posterior solutions of a QP–GP fit are related to a higher-order harmonic of the rotation period. For the simulations with two active longitudes, the QP–GP has had difficulties in detecting and modeling the fundamental period of the stellar rotation, especially if the spot lifetimes were rather small in comparison to the stellar rotation period, which is a result of the much more dynamic nature of spot formation and decay for such cases.

While the rotation hyperparameter provides a good description of the simulated rotation period, the picture is somewhat different when comparing the QP–GP length scale with the simulated spot lifetime. Here we find a clear dependence as longer spot lifetimes result in larger QP–GP length scales, but the absolute value of the length scale does not reflect the true simulated spot lifetime. Most importantly we find that the QP–GP length scale is sensitive to the spot distribution and to the quality of the data, which means, jitter and RV error bars have an influence on the estimated median of the spot lifetime. For our simulations, poorer data quality led to higher values for the length scale of the QP–GP.

Such a varying relation between the determined length scale parameter and the simulated spot lifetime is contradictory to the results from other studies such as Perger et al. (2021) or Nicholson & Aigrain (2022), who find strong correlations between the simulated spot lifetime and the determined length scale parameter. There is a clear difference between a parameter that gives the well-defined rotation period (differential rotation left aside) and a parameter that captures something like the “typical” lifetime of star spots. The GP kernel that we use models the lifetime in the sense of the “decay time” of the signal (in the sense of 1/e-time), which is different from the setup of our simulations where the lifetime is the time span between the emergence and disappearance of the spot. This leads to about a factor of 2 between the input parameter and the measured decay time of the GP and agrees with the results of Perger et al. (2021), who use the same configuration to simulate the activity as we do. The factor of 4, which we obtain for the well-behaved cases with spot lifetimes five and ten times the rotation period, results from the additional factor of 2 in the exponent of our kernel. However, with the wide grid of our simulations, we show that this is only the optimal case and that the ratio between the simulated and determined lifetime is not a constant. The positional correlation of star spots, for example, due to active longitudes, can significantly affect the properties of the activity signal and by this the derived QP–GP length scale, as can the uncertainty or jitter of the data set.

The co-variance function of the QP–GP is not based on an exponential function, but has a Gaussian shape. A number of studies (e.g., Rajpaul et al. 2015; Foreman-Mackey et al. 2017; Espinoza et al. 2019) argue that this does not have a severe influence on the derived length scale of the QP–GP. This is due to the fact that the two secondary diagonals that are exactly one rotation period away from the main diagonal are especially important regarding the derivation of the length scale parameter; how fast they fall off toward the corners is mostly irrelevant. But, this also suggests that not too much physical importance should be given to the length scale parameter.

In contrast to the period hyperparameter, it seems reasonable to regard the length scale hyperparameter as an “effective model parameter,” which may be related to stellar physics only very indirectly. One could of course do simulations assuming a certain lifetime for the star spots and then calibrate this parameter (see for example Perger et al. 2021), but it is not clear whether this relationship between QP–GP length scale and lifetime will still be valid if basic assumptions of the model or the data change, which is why we refrain from attempting to perform such a calibration.

A good example are the simulated cases with two active longitudes provided in this work. Here, the RV variations seem to be stable over all observed rotations of the star, although the simulated spot lifetime is only half of the simulated rotation period. In such cases, an empirically determined lifetime would measure the stability of the magnetic field configuration, or perhaps of active regions controlled by the magnetic field, but not of individual spots. Such cases are not far-fetched; real-world examples for such stars include HD 166435 (Queloz et al. 2001) or AD Leo (Kossakowski et al. 2022). In addition, we found that RV uncertainty and white noise jitter do also affect the derived length scale. It is therefore not easily possible to conclude from the data what importance one should attach to the QP–GP length scale parameter. In most cases, this is not a problem for the field of exoplanets, since we are primarily interested in planet detection and characterization. Therefore, based on our results, we advise against giving too much importance to the absolute value of the QP–GP length scale as it is not necessarily a reliable measure of the spot lifetime, even for our simplified simulations consisting of spots that all have exactly the same lifetime.

Table 5

Median value of the standard deviation distance between the derived and simulated orbital periods, RV semi-amplitudes, and eccentricities for different stellar activity configurations or all simulations.

5.2 Best practices: accurate planet parameters from activity-contaminated RV data using QP–GP models

Based on the MSA and the median eccentricity, we identify several informative key findings. Not surprisingly, but nevertheless most importantly, adding a GP to the modeling does improve the overall accuracy of the derived planet parameters, independently of the used priors. This is true even in cases where the activity signal is weaker than the planetary signal. In the majority of our simulated cases, one of the four combinations of a Keplerian model together with a red-noise model in the form of a QP–GP results in smaller MSA compared to a Keplerian model that only included a jitter term assuming white noise.

The choice of the GP Prior, however, has only little influence on accuracy and precision of the determined planetary parameters. Based on our analysis, we find that the best performance of the QP–GP is achieved when the GP is constrained to the stellar rotation period (GP Prior III), sometimes even in combination with the length scale (GP Prior IV). It can therefore be essential to take photometry or other data into consideration to estimate the stellar rotation period in order to use this as physical prior knowledge in the modeling of the RV data with the QP–GP. In contrast to the findings on the stellar activity-only analysis, we find that a constraint solely on the length scale is not beneficial, but actually reduces the accuracy of the derived planet parameters in comparison to an unconstrained prior. A reason for this could be the reduced flexibility of the QP–GP model. The length scale constrained prior (GP Prior II) should therefore only be used to detect rotation periods from time series data, but not for the simultaneous modeling of activity and planetary signals.

6 Conclusions

In this work, we have investigated the properties and advantages of QP–GP models for the detection and characterization of exoplanet signals from RV data. For this purpose, we first generated synthetic RV data that contain stellar rotational signals. In order to do this, we modeled the surface of a rotating star based on different spot distributions and spot lifetimes, and generated the RV data from these models, taking sampling and instrumental precision into account. Based on these simulations and our results, we come to the following conclusions and recommendations:

The rotation period hyperparameter of the QP–GP is in agreement with the stellar rotation period;
The QP-GP length scale hyperparameter correlates with the star’s spot lifetime, but there is no 1:1 connection between the absolute values. In addition, spot distribution and data quality can significantly affect the derived QP–GP length scale. Therefore, we advise against using the QP–GP length scale parameter as an estimate of the typical spot lifetime on the stellar surface;
To determine an unknown stellar rotation period with the QP–GP kernel, the 2D posterior distribution of the rotation period and the length scale hyperparameter provides crucial information. Constraining the length scale of the QP–GP in the case of a plateau of samples that do not favor any specific rotation period significantly increases the efficiency of detecting the correct value of the rotation period;
To derive precise and accurate planet parameters, we strongly recommend fitting for the stellar activity and not ignoring it, even if the stellar activity amplitude is small compared to the planetary amplitude;
For our simulations, we derived the best results in terms of accuracy and precision regarding the planetary parameters either when the QP–GP was constrained to the rotation period or when it was constrained to the rotation period and had a minimum length scale. However, constraining only the length scale resulted in less accurate planet parameters. This prior should therefore not be used for simultaneous fitting of planet and activity signals, but only for the detection of a rotation period from time series data.

Simulations like those presented in this work are essential for assessing the performance and properties of the models that are used for real data. Especially when it comes to non-parametric noise models such as GP regression, such analyses provide a better understanding of possible pitfalls and considerations that need to be taken into account. While the recommendations based on our astrophysically motivated simulations can certainly help us better understand GPs and the QP–GP in particular, every system can show individual characteristics, not all of which could be considered in the analysis presented in this work. Nevertheless, in a large majority of our simulations, we were able to correctly identify stellar rotation periods with the QP–GP, and we obtained better planet parameters by using it to model stellar activity.

Acknowledgements

All authors acknowledge support from the Deutsche Forschungsgemeinschaft (DFG) through the Research Unit FOR2544 “Blue Planets around Red Stars”, in particular project numbers RE 2694/4-1, RE 2694/8-1 and KU 3625/1-1. The authors thank Stefan Dreizler and Manuel Perger for fruitful discussions on Gaussian process regression for radial velocity data. Software: astropy (Astropy Collaboration 2018), dynesty (Speagle 2020), scipy (Virtanen et al. 2020), numpy (Oliphant 2006), matplotlib (Hunter 2007), pandas (The pandas development team 2020), seaborn (Waskom et al. 2020), radvel (Fulton et al. 2018), george (Ambikasaran et al. 2015), juliet (Espinoza et al. 2019).

Appendix A Conversion between l and α

The QP–GP kernel that we use in this study is parametrized in Juliet according to Espinoza et al. (2019) and is given as $k (τ) = σ_{GP}^{2} \exp (- α_{GP} τ^{2} - Γ {sin}^{2} (π τ / P_{rot})),$ $k\left( \tau \right) = \sigma _{{\rm{GP}}}^2\exp \,\left( { - {\alpha _{{\rm{GP}}}}{\tau ^2} - {\rm{\Gamma }}\,{\rm{si}}{{\rm{n}}^2}\left( {{{\pi \tau } \mathord{\left/ {\vphantom {{\pi \tau } {{P_{{\rm{rot}}}}}}} \right. \kern-\nulldelimiterspace} {{P_{{\rm{rot}}}}}}} \right)} \right),$ (A.1)

where σ_GP is the amplitude of the GP component given in parts per million (ppm) for photometric data or m s⁻¹ for RV data, Γ is the amplitude of the GP sine-squared component and is dimensionless, α is the inverse length scale of the GP exponential component given in d⁻², P_rot the period of the GP’s component given in d, and τ is the time lag.

For the fitting of the above kernel, Juliet makes use of the python package george, in particular a multiplication of an exponential-squared kernel and an exponential-sine-squared kernel³. The exponential-squared kernel is defined in george as exp(−r²/2), where r corresponds to τ in the Juliet paper. The length scale in george is proportional to exp(−r²/(2Θ))⁴. Within Juliet, the inverse of this length scale Θ = 1/α, is defined, so the kernel being fitted by Juliet is exp(−α · r^2/2) and not exp(−α · r²) as written in Eq. 8 of Espinoza et al. (2019). The conversion from the inverse length scale α to l is therefore given as $l = {\sqrt{α}}^{- 1} .$ $l = {\sqrt \alpha ^{ - 1}}.$ (A.2)

For the convenience of the reader, we have transformed the parametrization of the GP inverse length scale α given in d⁻² as provided within Juliet to the more intuitive GP length scale l given in days within this work.

Appendix B Model comparisons using the Bayesian evidence

In the main body of this work, we examined the connection of the hyperparameters of the QP–GP to the properties of the stellar activity and how the prior choice affects the modeling by the GP. Beyond that, the nested sampling algorithm we used for the fits allows us to investigate in more detail how the prior choice also affects the detection efficiency – in terms of both stellar activity signals and planetary signals.

Appendix B.1 When only stellar activity is present

First, we tested the GP models against a pure white noise model and a Kepler model, which corresponds to a non-detection and a false detection, respectively. We investigated whether the QP–GP model is the favored model for the stellar activity signal in all cases, as one would expect. Further, we considered the combination of a QP– GP component and a Keplerian component as a method for assessing the presence of additional signals in unexplored data sets, and whether the choice of the GP priors can facilitate false positives in doing so.

Fig. B.1

Single-model comparison of activity-only data. We show the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all ten investigated stellar activity configurations, with each configuration consisting of 100 synthetic RV data sets (a total of 1000 data sets).

Appendix B.1.1 Single-model comparison

We compared the single models GP Prior I, GP Prior II, GP Prior III, GP Prior IV, K11, K22, and W, which are given in Table 3. The priors for the Keplerian models assume that there is some information from significant peaks in the periodogram, whose origin, however, whether planetary or due to activity, is not yet determined. With this comparison, we intend to investigate whether the Bayesian evidence is an additional reliable tool for distinguishing between stellar activity and planetary signals in cases of limited information regarding the signals in the data, for example when there is no known photometric rotation period or no compelling evidence from the stellar activity indicators.

In doing so, we investigated whether one model family (i.e., GP model, Keplerian model, or white noise model) performed better than another for a synthetic RV data set, independent of the chosen prior distributions. This means that, as soon as a GP model was better than all Κ models, the GP model was classified as the winning model for the simulation. This is important because the choice of prior affects the derived evidence, which must be taken into account when using Bayesian evidence to compare models. If two models of different classes performed similarly well (Δln𝒵 < 2.5) they were classified as indistinguishable. The results of this analysis are visualized in the pie chart in Fig. B.1.

For the majority of the data sets, about 78%, a QP– GP-only model has been the significantly favored model (Δln𝒵 > 5) while for about 5% it was at least moderately favored (Δln𝒵 > 2.5). In about 13% of the cases, the two best models of different class were indistinguishable. It is interesting that we find that for about 4% of the simulated stellar activity time series data sets, a Keplerian or white noise model has been favored for the activity signal by the log-evidence metric. Most of these cases consist of moderately favored Keplerian models.

Almost all cases where a non-GP model was indistinguishable or performed better than the GP model for the activity signal belong to simulations where the spot lifetime was small compared to the simulated stellar rotation period. Figure B.2 shows the percentages of the winning models between the different investigated activity configurations. Short spot lifetimes with respect to the rotation period lead to correlated noise on shorter timescales, which can also be well described by a simple white noise model. This explains why many indistinguishable cases and all cases where the white noise model is favored occur for a spot lifetime of 11 d. Especially in the case of a random spot distribution with spot lifetimes smaller than the rotation period, a W model has often been indistinguishable compared to the QP–GP model.

Fig. B.2

Single-model comparison of activity-only data. Shown is a breakdown of the values from Fig. B.1 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.

In the case of two active longitudes, it has been mainly the Keplerian model that competed with the GP regarding the modeling of the stellar activity, resulting in a higher single-digit percentage of indistinguishable solutions and a few percent of a moderately favored, or in a few cases even significantly favored, Keplerian model with respect to the GP models. The fact that a Keplerian model has been indistinguishable, or in a few cases even been moderately favored, compared to a GP model for these activity cases shows that caution is advised when using model comparison: Although in this first experiment the evidence has shown to be reliable in the majority of investigated cases, there are still a few cases where the evidence prefers the wrong model. This highlights the importance of additional investigations, auxiliary data, and simulations such as those presented in this work to verify the nature of a signal, in addition to the Bayesian evidence. It is interesting to note that most of the cases where the Keplerian model has been favored were simulations where the l_GP versus P_GP diagram showed a bar shape. Since we used non-eccentric Keplerians, this means that in cases of rather coherent stellar activity a sinusoidal fit can do a decent job at modeling stellar activity. In such cases the flexibility of the QP–GP may not be required.

Fig. B.3

Single-model comparison of activity-only data. Shown is the breakdown to the individual GP priors for the significant or moderately favored GP models from Fig. B.1. Note that this is only a qualitative comparison based on the highest evidence compared to the best non-GP model. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.

For the cases where the GP models were significantly or moderately favored over a Keplerian or white noise model, we investigated which GP prior actually performed best. In Fig. B.3 we present a quantitative breakdown of the winning GP models with respect to the activity properties. First, it can be seen that Priors III and IV make up the bulk of the models that performed better than the Keplerian or white noise models. Only for shorter spot lifetimes, the unconstrained GP Prior I performs similarly well. This is plausible because the correlated noise on shorter timescales associated with the shorter spot lifetimes can be better modeled by the more flexible GP model. This is also reflected in the generally better performance of the GP Prior III compared to the less flexible GP IV. Solely constraining the length scale hyperparame-ter as in the GP Prior II, on the other hand, seems to not have any advantage over the other priors and accounts for only a negligible proportion of the winning models.

Appendix B.1.2 Mixed-model comparison

The Bayesian evidence is commonly used to justify whether adding an extra Keplerian is better than just fitting for white or red noise – in other words, whether a bona fide planet has been detected. Here we compare the QP–GP-only models against mixed models comprising a Keplerian and a QP–GP component (results summarized in Fig. B.4). The goal was to investigate whether certain prior choices favor the addition of an additional sinusoidal or Keplerian component by the Bayesian evidence, even though only activity signals are included in the data. In particular, we used the four different QP–GP priors motivated before and fitted them together with constrained Keple-rian models whose priors are given in Table 3. In the first scenario, we assumed that auxiliary data did not provide any constraints on the rotation period; therefore, a wide uninformative prior for the GP was chosen and the planet hypothesis of a significant peak in the periodogram was checked by the constrained Keplerian (GP I + K) to the rotational signal. Another scenario we tested is whether it would be possible within the GP framework to distinguish a planet in a 1:1 or 1:2 spin-orbit resonance, which motivates using the other three more constrained GP priors. It is not entirely clear how reliable the log-evidence metric and the QP–GP kernel are in such complex real-life cases with discrete sampling and various instruments covering a wide wavelength range.

Fig. B.4

Mixed-model comparison of activity-only data. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.

For all four QP–GP priors studied, we find a number of simulations where an additional Keplerian, respectively sinusoidal, component has been at least moderately favored. This is particularly the case when the simulated configuration has two active longitudes. For this configuration, the K11 model is highly significant when added to the QP–GP model, even if the QP–GP is not constrained to the fundamental period of 22 d. This shows that while the QP–GP is capable to also model harmonics of the rotation period, it is not very efficient in doing so, which is why an extra component is required by the Bayesian evidence. A similar conclusion was reached by Perger et al. (2021) based on the auto-correlation function, which motivated the authors to use a second periodic component that is added to the standard QP–GP resulting in their QPC. From our investigations, we find that an additional sinusoidal or Keplerian component was not as necessary if the QP–GP length scale was restricted to exclude small values, such that the GP does not include the posterior solutions from the plateau observed in the middle plot of Fig. 2. Nevertheless, our results regarding the harmonic of the rotation period at 11 d, and the need for an additional model component to model it, strongly favors the use of a GP in the form of the QPC (Perger et al. 2021) or the sum of multiple harmonic oscillator kernels as for example in Kossakowski et al. (2021) It shows that the standard QP–GP in the form that has been applied in this work lacks flexibility to model stellar activity in all its aspects.

If we leave the K11 model for the second harmonic aside, and concentrate on the K22 model, we still find a number of simulations where an additional Keplerian component at the rotation period is favored by the Bayesian evidence in addition to the QP–GP model. This might indicate that there exists a stable periodic component for certain stellar activity configurations, which can lead to better Bayesian evidence when an extra sinusoidal model fits this stable component while the QP–GP accounts simultaneously for any deviations from it. Our results do definitively show that the improvement of the evidence by the addition of a Keplerian model to a base QP–GP model is not a strong proof that a real planetary signal is present in the data. This has particular relevance for targets such as AD Leo (Kossakowski et al. 2022).

Appendix B.2 In the presence of stellar activity and Keplerian signals

Two of the obvious questions are (i) under which conditions Keplerian signals can successfully be recovered from the simulated activity data, and (ii) which assumptions for the QP–GP model provide the highest detection probability. To answer them, we explored three kinds of model fits to our grid of activity-plus-Keplerian data. In the first case, we assume GP-only models as the reference for pure activity and thus a non-detection of the Keplerian signal. In addition, we consider Kepler-only and combined Keplerian + GP models based on the GP priors discussed already for the activity only analysis. For the Keplerian components, we assumed a free eccentricity and uniform amplitude between 0 and 40 ms⁻¹, while allowing the period to vary in the range of plus and minus ten percent of the input period. An overview of the results is provided by the pie chart in Fig. B.5 and the detailed breakdowns into the simulated conditions and used priors in Fig. B.6 and Fig. B.7.

Fig. B.5

Model comparison on activity data with added Keplerian signals. Shown is the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all configurations.

Fig. B.6

Model comparison on activity data with added Keplerian signals. Shown is the breakdown of the values from Fig. B.5 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.

Fig. B.7

Model comparison on activity data with added Keplerian signals. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.

First, it is apparent that a simple Keplerian model was never significant compared to the GP-only, or combined Keplerian+GP models, even in the case of an amplitude ratio in favor of the injected planetary signal. We therefore discuss here only the direct comparison of GP-only versus combined Keplerian + GP models. In only 68% of the cases, combined Keplerian + GP models performed significantly better than the pure activity models and thus provided a clear detection of the Keplerian signal. For almost 15% of the cases the combined models are indistinguishable from the best GP-only models, and in 11% of the cases a GP-only model even wins.

Comparing the results for the different priors in Fig. B.7, most of the significant detections of the Keplerian signal are found in combination with the GPs that had the rotation period constrained (GP Prior III), or both, the period and length scale, constrained (GP Prior IV). A sole constraint on the length scale (GP Prior II), however, performed apparently worse, also compared to the unconstrained GP Prior I. These results do not differ between the two simulated spot patterns and are in good agreement with our findings for the activity only analysis in Sect. B.1.

With respect to the different configurations of simulated activity and added Kepler signals, the amplitude of the simulated Kepler signal has, as expected, the largest influence on the proportion of false positives, since a large proportion of the models in which the Kepler signal could not be clearly detected are associated with an amplitude ratio between Kepler and activity of 0.2 or 0.5. The second-biggest factor seems to be the spot lifetime, since a larger number of non-detections is obtained when the lifetime is 22 days. This is plausible since a shorter lifetime of active regions means a less coherent activity signal and therefore more random dispersion of the RVs. In contrast, the simulated spot pattern does not make a strong difference, even though the random distribution would be expected to also lead to a less coherent activity signal. The simulated period of the Keplerian signal seemed to have no impact on the detection probability.

Appendix C Detailed results of Sect. 4

Fig. C.1

MSA (ξ) of the planetary orbital periods based on 100 ensembles, distinguishing between three differently injected orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d.

Fig. C.2

Same as Fig. C.1 but for the planetary RV semi-amplitudes.

Fig. C.3

Same as Fig. C.1 but instead of the MSA, the median of the derived eccentrics for the 100 ensembles is shown.

Fig. C.4

Median of the absolute standard deviation distance between the injected and retrieved planetary orbital periods based on the median posteriors of 100 ensembles each. Each plot distinguishes between three different orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The error bars show the 0.16 and 0.84 quartiles. The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d.

Fig. C.5

Same as Fig. C.4 but for the planetary RV semi-amplitudes.

Fig. C.6

Same as Fig. C.4 but for the planetary RV semi-amplitudes.

References

Aigrain, S., Pont, F., & Zucker, S. 2012, MNRAS, 419, 3147 [NASA ADS] [CrossRef] [Google Scholar]
Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D. W., & O’Neil, M. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 252 [Google Scholar]
Andersen, J. M., & Korhonen, H. 2015, MNRAS, 448, 3053 [NASA ADS] [CrossRef] [Google Scholar]
Anglada-Escudé, G., & Tuomi, M. 2015, Science, 347, 1080 [CrossRef] [Google Scholar]
Anglada-Escudé, G., López-Morales, M., & Chambers, J. E. 2010, ApJ, 709, 168 [CrossRef] [Google Scholar]
Angus, R., Morton, T., Aigrain, S., Foreman-Mackey, D., & Rajpaul, V. 2018, MNRAS, 474, 2094 [Google Scholar]
Astropy Collaboration (Price-Whelan, A. M., et al.) 2018, AJ, 156, 123 [Google Scholar]
Baluev, R. V. 2013, MNRAS, 429, 2052 [Google Scholar]
Basri, G., & Shah, R. 2020, ApJ, 901, 14 [Google Scholar]
Bauer, F. F., Reiners, A., Beeck, B., & Jeffers, S. V. 2018, A&A, 610, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Bluhm, P., Pallé, E., Molaverdikhani, K., et al. 2021, A&A, 650, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Boisse, I., Bouchy, F., Hébrard, G., et al. 2011, A&A, 528, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Dumusque, X. 2016, A&A, 593, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Espinoza, N., Kossakowski, D., & Brahm, R. 2019, MNRAS, 490, 2262 [Google Scholar]
Espinoza, N., Brahm, R., Henning, T., et al. 2020, MNRAS, 491, 2982 [NASA ADS] [Google Scholar]
Feroz, F., & Skilling, J. 2013, AIP Conf. Ser., 1553, 106 [NASA ADS] [Google Scholar]
Feroz, F., Hobson, M. P., & Bridges, M. 2009, MNRAS, 398, 1601 [NASA ADS] [CrossRef] [Google Scholar]
Foreman-Mackey, D., Agol, E., Ambikasaran, S., & Angus, R. 2017, AJ, 154, 220 [Google Scholar]
Forgács-Dajka, E., Dobos, L., & Ballai, I. 2021, A&A, 653, A50 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Fulton, B. J., Petigura, E. A., Blunt, S., & Sinukoff, E. 2018, PASP, 130, 044504 [Google Scholar]
Gibson, S. R., Howard, A. W., Rider, K., et al. 2020, SPIE Conf. Ser., 11447, 1144742 [NASA ADS] [Google Scholar]
Gupta, A. F., Wright, J. T., Robertson, P., et al. 2021, AJ, 161, 130 [NASA ADS] [CrossRef] [Google Scholar]
Hatzes, A. P. 2016, A&A, 585, A144 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517 [Google Scholar]
Herrero, E., Ribas, I., Jordi, C., et al. 2016, A&A, 586, A131 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]
Husser, T.-O., Wende-von Berg, S., Dreizler, S., et al. 2013, A&A, 553, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Jeffers, S. V., & Keller, C. U. 2009, AIP Conf. Ser., 1094, 664 [NASA ADS] [CrossRef] [Google Scholar]
Jeffers, S. V., Schöfer, P., Lamert, A., et al. 2018, A&A, 614, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Jeffers, S. V., Barnes, J. R., Scheofer, P., et al. 2022, A&A, 663, A27 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Jeffreys, H. 1946, Proc. Roy. Soc. Lond. Ser. A, 186, 453 [Google Scholar]
Kitchenham, B. A., Pickard, L. M., MacDonell, S. G., & Shepperd, M. J. 2001, IEE Proc. Softw., 148, 81 [CrossRef] [Google Scholar]
Kossakowski, D., Kemmer, J., Bluhm, P., et al. 2021, A&A, 656, A124 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Kossakowski, D., Kürster, M., Henning, T., et al. 2022, A&A 666, A143 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20 [NASA ADS] [Google Scholar]
Morin, J., Donati, J. F., Petit, P., et al. 2010, MNRAS, 407, 2269 [Google Scholar]
Morley, S. K., Brito, T. V., & Welling, D. T. 2018, Space Weather, 16, 69 [NASA ADS] [CrossRef] [Google Scholar]
Nava, C., López-Morales, M., Haywood, R. D., & Giles, H. A. C. 2020, AJ, 159, 23 [Google Scholar]
Newton, E. R., Mondrik, N., Irwin, J., Winters, J. G., & Charbonneau, D. 2018, AJ, 156, 217 [Google Scholar]
Nicholson, B. A., & Aigrain, S. 2022, MNRAS, 515, 5251 [NASA ADS] [CrossRef] [Google Scholar]
Oliphant, T. E. 2006, A Guide to NumPy, 1 (Trelgol Publishing USA) [Google Scholar]
Pepe, F., Molaro, P., Cristiani, S., et al. 2014, Astron. Nachr., 335, 8 [Google Scholar]
Perger, M., Anglada-Escudé, G., Ribas, I., et al. 2021, A&A, 645, A58 [EDP Sciences] [Google Scholar]
Petersburg, R. R., Ong, J. M. J., Zhao, L. L., et al. 2020, AJ, 159, 187 [NASA ADS] [CrossRef] [Google Scholar]
Petrovay, K., & van Driel-Gesztelyi, L. 1997, Sol. Phys., 176, 249 [Google Scholar]
Pinamonti, M., Sozzetti, A., Maldonado, J., et al. 2022, A&A 664, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Popinchalk, M., Faherty, J. K., Kiman, R., et al. 2021, ApJ, 916, 77 [NASA ADS] [CrossRef] [Google Scholar]
Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Quirrenbach, A., Amado, P. J., Caballero, J. A., et al. 2014, in Proc. SPIE, 9147, 91471F [Google Scholar]
Rajpaul, V., Aigrain, S., Osborne, M. A., Reece, S., & Roberts, S. 2015, MNRAS, 452, 2269 [Google Scholar]
Reiners, A., Shulyak, D., Anglada-Escudé, G., et al. 2013, A&A, 552, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Ribas, I., Tuomi, M., Reiners, A., et al. 2018, Nature, 563, 365 [Google Scholar]
Robertson, P., Mahadevan, S., Endl, M., & Roy, A. 2014, Science, 345, 440 [Google Scholar]
Rodríguez-López, C. 2019, Front. Astron. Space Sci., 6, 76 [Google Scholar]
Saar, S. H., Butler, R. P., & Marcy, G. W. 1998, ApJ, 498, L153 [NASA ADS] [CrossRef] [Google Scholar]
Seifahrt, A., Stürmer, J., Bean, J. L., & Schwab, C. 2018, SPIE Conf. Ser., 10702, 107026D [NASA ADS] [Google Scholar]
Speagle, J. S. 2020, MNRAS, 493, 3132 [Google Scholar]
Stock, S., Kemmer, J., Reffert, S., et al. 2020a, A&A, 636, A119 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Stock, S., Nagel, E., Kemmer, J., et al. 2020b, A&A, 643, A112 [EDP Sciences] [Google Scholar]
The pandas development team 2020, https://zenodo.org/record/7979740 [Google Scholar]
Tofallis, C. 2015, J. Oper. Res. Soc., 66, 1352 [CrossRef] [Google Scholar]
Trotta, R. 2008, Contemp. Phys., 49, 71 [Google Scholar]
van Driel-Gesztelyi, L., & Green, L. M. 2015, Living Rev. Solar Phys., 12, 1 [NASA ADS] [CrossRef] [Google Scholar]
Virtanen, P., Gommers, R., Oliphant Travis E., et al. 2020, Nat. Methods, 17, 261 [NASA ADS] [CrossRef] [Google Scholar]
Waskom, M., Botvinnik, O., Ostblom, J., et al. 2020, Mwaskom/seaborn: v0.10.1 [Google Scholar]
Zhang, L., Mursula, K., Usoskin, I., & Wang, H. 2011, A&A, 529, A23 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Zhao, L. L., Fischer, D. A., Ford, E. B., et al. 2022, AJ, 163, 171 [NASA ADS] [CrossRef] [Google Scholar]

¹

https://github.com/rosich/starsim-2

²

As of July 2022.

³

See kernel definitions here: https://george.readthedocs.io/en/latest/user/kernels/

⁴

See how k₁ is defined in the george tutorial here: https://george.readthedocs.io/en/latest/tutorials/hyper/

All Tables

Table 1

Relevant fixed parameters used to simulate the stellar activity within StarSim.

In the text

Table 2

Grid of investigated stellar activity configurations.

In the text

Table 3

Different model components that were investigated in this work.

In the text

Table 4

MSA of the derived orbital periods and RV semi-amplitudes and median of the derived eccentricities for different stellar activity configurations or all simulations.

In the text

Table 5

Median value of the standard deviation distance between the derived and simulated orbital periods, RV semi-amplitudes, and eccentricities for different stellar activity configurations or all simulations.

In the text

All Figures

Fig. 1

Four simulated example data sets. The black points represent the modeled StarSim RVs, and the red points represent the simulated observations with their error bars. Top left: random spot distribution, t_spot = P_rot = 22 days. Top right: two active longitudes, t_spot = P_rot = 22 days. Bottom left: random spot distribution, t_spot = 5P_rot = 110. Bottom right: two active longitudes, t_spot = 5P_rott = 110 days.

In the text

Fig. 2

l versus P diagrams showing the posterior samples of a single GP fit with uninformed priors to one example simulation. The horizontal dashed line indicates the simulated spot lifetime, and the vertical dashed line indicates the simulated rotation period. Left: posterior samples fitted to a star with random spot distribution, P_rot = 22 days, and P_life = 110 days. A well-behaved peak is found where both the rotation period and the GP length scale are well constrained. Middle: posterior samples fitted to a star with random spot distribution, P_rot = 22 days, and P_life = 22 days. A posterior plateau with a short length scale spans the entire posterior volume. The posterior of the rotation period is not well constrained due to this plateau; however, some posterior samples form a “triangle” shape at the rotation period. Right: posterior samples fitted to a star with two active longitudes, P_rot = 22 days, and P_life = 11 days. The spots that appear on a similar longitude mimic a coherent signal whose length scale cannot be constrained by the GP, resulting in an “I-shaped” posterior distribution that reaches the upper boundary of the posterior volume.

In the text

Fig. 3

Violin plots showing the distribution of the derived median rotation periods and length scale hyperparameters of the QP–GP based on 100 simulations per spot lifetime. The blue distribution shows the results obtained from using GP Prior I (wide prior), and the orange distribution shows the results obtained by using GP Prior II (length scale constrained). For the period, the dashed line shows the simulated rotation period and the dash-dotted line the first harmonic at P_rot/2. For the length scale, we show the simulated spot lifetime divided by the determined GP length scale so that the dashed line marks a 1:1 correspondence and the dash-dotted line a factor of 4 between the two. Top: for a random spot distribution on the stellar surface. Bottom: with two active longitudes on the stellar surface.

In the text

Fig. 4

Relations of the derived GP parameters to the RV uncertainty. Top: derived GP length scale as a function of the ratio of the standard deviation of the stellar activity signal over the RV uncertainty or jitter. Note that only GP-only fits that had at least moderate significance (with ∆ ln 𝒵 > 2.5) compared to a white noise model that includes only jitter have been used. The curves were computed by applying a median filter with a window size incorporating 50 simulations. The red curve assumes a fixed jitter of σ_jitter = 0 m s⁻¹ and changes the ratio σ_model/σ_RV, the black curve assumes a fixed jitter of σ_jitter = 1.2 m s⁻¹ and changes the ratio σ_model/σ_RV, and the blue curve assumes a fixed uncertainty of σ_RV = 0.3 m s⁻¹ and changes the ratio σ_model/σ_jitter. The colored shaded area shows the appropriate inter-quartile range. The vertical dotted line represents a ratio of 1. The horizontal dashed line represents the simulated spot lifetime. Bottom: analogous to the top plot, but for the GP rotation period.

In the text

	Fig. B.1 Single-model comparison of activity-only data. We show the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all ten investigated stellar activity configurations, with each configuration consisting of 100 synthetic RV data sets (a total of 1000 data sets).
In the text

	Fig. B.2 Single-model comparison of activity-only data. Shown is a breakdown of the values from Fig. B.1 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.
In the text

Fig. B.3

Single-model comparison of activity-only data. Shown is the breakdown to the individual GP priors for the significant or moderately favored GP models from Fig. B.1. Note that this is only a qualitative comparison based on the highest evidence compared to the best non-GP model. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.

In the text

	Fig. B.4 Mixed-model comparison of activity-only data. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.
In the text

	Fig. B.5 Model comparison on activity data with added Keplerian signals. Shown is the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all configurations.
In the text

	Fig. B.6 Model comparison on activity data with added Keplerian signals. Shown is the breakdown of the values from Fig. B.5 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.
In the text

	Fig. B.7 Model comparison on activity data with added Keplerian signals. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the gray-shaded areas.
In the text

Fig. C.1

MSA (ξ) of the planetary orbital periods based on 100 ensembles, distinguishing between three differently injected orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d.

In the text

	Fig. C.2 Same as Fig. C.1 but for the planetary RV semi-amplitudes.
In the text

	Fig. C.3 Same as Fig. C.1 but instead of the MSA, the median of the derived eccentrics for the 100 ensembles is shown.
In the text

Fig. C.4

Median of the absolute standard deviation distance between the injected and retrieved planetary orbital periods based on the median posteriors of 100 ensembles each. Each plot distinguishes between three different orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The error bars show the 0.16 and 0.84 quartiles. The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d.

In the text

	Fig. C.5 Same as Fig. C.4 but for the planetary RV semi-amplitudes.
In the text

	Fig. C.6 Same as Fig. C.4 but for the planetary RV semi-amplitudes.
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Aigrain, S., Pont, F., & Zucker, S. 2012, MNRAS, 419, 3147 [NASA ADS] [CrossRef] [Google Scholar]

[2] Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D. W., & O’Neil, M. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 252 [Google Scholar]

[3] Andersen, J. M., & Korhonen, H. 2015, MNRAS, 448, 3053 [NASA ADS] [CrossRef] [Google Scholar]

[4] Anglada-Escudé, G., & Tuomi, M. 2015, Science, 347, 1080 [CrossRef] [Google Scholar]

[5] Anglada-Escudé, G., López-Morales, M., & Chambers, J. E. 2010, ApJ, 709, 168 [CrossRef] [Google Scholar]

[6] Angus, R., Morton, T., Aigrain, S., Foreman-Mackey, D., & Rajpaul, V. 2018, MNRAS, 474, 2094 [Google Scholar]

[7] Astropy Collaboration (Price-Whelan, A. M., et al.) 2018, AJ, 156, 123 [Google Scholar]

[8] Baluev, R. V. 2013, MNRAS, 429, 2052 [Google Scholar]

[9] Basri, G., & Shah, R. 2020, ApJ, 901, 14 [Google Scholar]

[10] Bauer, F. F., Reiners, A., Beeck, B., & Jeffers, S. V. 2018, A&A, 610, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[11] Bluhm, P., Pallé, E., Molaverdikhani, K., et al. 2021, A&A, 650, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[12] Boisse, I., Bouchy, F., Hébrard, G., et al. 2011, A&A, 528, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[13] Dumusque, X. 2016, A&A, 593, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[14] Espinoza, N., Kossakowski, D., & Brahm, R. 2019, MNRAS, 490, 2262 [Google Scholar]

[15] Espinoza, N., Brahm, R., Henning, T., et al. 2020, MNRAS, 491, 2982 [NASA ADS] [Google Scholar]

[16] Feroz, F., & Skilling, J. 2013, AIP Conf. Ser., 1553, 106 [NASA ADS] [Google Scholar]

[17] Feroz, F., Hobson, M. P., & Bridges, M. 2009, MNRAS, 398, 1601 [NASA ADS] [CrossRef] [Google Scholar]

[18] Foreman-Mackey, D., Agol, E., Ambikasaran, S., & Angus, R. 2017, AJ, 154, 220 [Google Scholar]

[19] Forgács-Dajka, E., Dobos, L., & Ballai, I. 2021, A&A, 653, A50 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[20] Fulton, B. J., Petigura, E. A., Blunt, S., & Sinukoff, E. 2018, PASP, 130, 044504 [Google Scholar]

[21] Gibson, S. R., Howard, A. W., Rider, K., et al. 2020, SPIE Conf. Ser., 11447, 1144742 [NASA ADS] [Google Scholar]

[22] Gupta, A. F., Wright, J. T., Robertson, P., et al. 2021, AJ, 161, 130 [NASA ADS] [CrossRef] [Google Scholar]

[23] Hatzes, A. P. 2016, A&A, 585, A144 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[24] Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517 [Google Scholar]

[25] Herrero, E., Ribas, I., Jordi, C., et al. 2016, A&A, 586, A131 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[26] Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [NASA ADS] [CrossRef] [Google Scholar]

[27] Husser, T.-O., Wende-von Berg, S., Dreizler, S., et al. 2013, A&A, 553, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[28] Jeffers, S. V., & Keller, C. U. 2009, AIP Conf. Ser., 1094, 664 [NASA ADS] [CrossRef] [Google Scholar]

[29] Jeffers, S. V., Schöfer, P., Lamert, A., et al. 2018, A&A, 614, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[30] Jeffers, S. V., Barnes, J. R., Scheofer, P., et al. 2022, A&A, 663, A27 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[31] Jeffreys, H. 1946, Proc. Roy. Soc. Lond. Ser. A, 186, 453 [Google Scholar]

[32] Kitchenham, B. A., Pickard, L. M., MacDonell, S. G., & Shepperd, M. J. 2001, IEE Proc. Softw., 148, 81 [CrossRef] [Google Scholar]

[33] Kossakowski, D., Kemmer, J., Bluhm, P., et al. 2021, A&A, 656, A124 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[34] Kossakowski, D., Kürster, M., Henning, T., et al. 2022, A&A 666, A143 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[35] Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20 [NASA ADS] [Google Scholar]

[36] Morin, J., Donati, J. F., Petit, P., et al. 2010, MNRAS, 407, 2269 [Google Scholar]

[37] Morley, S. K., Brito, T. V., & Welling, D. T. 2018, Space Weather, 16, 69 [NASA ADS] [CrossRef] [Google Scholar]

[38] Nava, C., López-Morales, M., Haywood, R. D., & Giles, H. A. C. 2020, AJ, 159, 23 [Google Scholar]

[39] Newton, E. R., Mondrik, N., Irwin, J., Winters, J. G., & Charbonneau, D. 2018, AJ, 156, 217 [Google Scholar]

[40] Nicholson, B. A., & Aigrain, S. 2022, MNRAS, 515, 5251 [NASA ADS] [CrossRef] [Google Scholar]

[41] Oliphant, T. E. 2006, A Guide to NumPy, 1 (Trelgol Publishing USA) [Google Scholar]

[42] Pepe, F., Molaro, P., Cristiani, S., et al. 2014, Astron. Nachr., 335, 8 [Google Scholar]

[43] Perger, M., Anglada-Escudé, G., Ribas, I., et al. 2021, A&A, 645, A58 [EDP Sciences] [Google Scholar]

[44] Petersburg, R. R., Ong, J. M. J., Zhao, L. L., et al. 2020, AJ, 159, 187 [NASA ADS] [CrossRef] [Google Scholar]

[45] Petrovay, K., & van Driel-Gesztelyi, L. 1997, Sol. Phys., 176, 249 [Google Scholar]

[46] Pinamonti, M., Sozzetti, A., Maldonado, J., et al. 2022, A&A 664, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[47] Popinchalk, M., Faherty, J. K., Kiman, R., et al. 2021, ApJ, 916, 77 [NASA ADS] [CrossRef] [Google Scholar]

[48] Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[49] Quirrenbach, A., Amado, P. J., Caballero, J. A., et al. 2014, in Proc. SPIE, 9147, 91471F [Google Scholar]

[50] Rajpaul, V., Aigrain, S., Osborne, M. A., Reece, S., & Roberts, S. 2015, MNRAS, 452, 2269 [Google Scholar]

[51] Reiners, A., Shulyak, D., Anglada-Escudé, G., et al. 2013, A&A, 552, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[52] Ribas, I., Tuomi, M., Reiners, A., et al. 2018, Nature, 563, 365 [Google Scholar]

[53] Robertson, P., Mahadevan, S., Endl, M., & Roy, A. 2014, Science, 345, 440 [Google Scholar]

[54] Rodríguez-López, C. 2019, Front. Astron. Space Sci., 6, 76 [Google Scholar]

[55] Saar, S. H., Butler, R. P., & Marcy, G. W. 1998, ApJ, 498, L153 [NASA ADS] [CrossRef] [Google Scholar]

[56] Seifahrt, A., Stürmer, J., Bean, J. L., & Schwab, C. 2018, SPIE Conf. Ser., 10702, 107026D [NASA ADS] [Google Scholar]

[57] Speagle, J. S. 2020, MNRAS, 493, 3132 [Google Scholar]

[58] Stock, S., Kemmer, J., Reffert, S., et al. 2020a, A&A, 636, A119 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[59] Stock, S., Nagel, E., Kemmer, J., et al. 2020b, A&A, 643, A112 [EDP Sciences] [Google Scholar]

[60] The pandas development team 2020, https://zenodo.org/record/7979740 [Google Scholar]

[61] Tofallis, C. 2015, J. Oper. Res. Soc., 66, 1352 [CrossRef] [Google Scholar]

[62] Trotta, R. 2008, Contemp. Phys., 49, 71 [Google Scholar]

[63] van Driel-Gesztelyi, L., & Green, L. M. 2015, Living Rev. Solar Phys., 12, 1 [NASA ADS] [CrossRef] [Google Scholar]

[64] Virtanen, P., Gommers, R., Oliphant Travis E., et al. 2020, Nat. Methods, 17, 261 [NASA ADS] [CrossRef] [Google Scholar]