Issue 
A&A
Volume 674, June 2023



Article Number  A108  
Number of page(s)  26  
Section  Planets and planetary systems  
DOI  https://doi.org/10.1051/00046361/202244629  
Published online  09 June 2023 
Gaussian processes for radial velocity modeling
Better rotation periods and planetary parameters with the quasiperiodic kernel and constrained priors
^{1}
Landessternwarte, Zentrum für Astronomie der Universität Heidelberg,
Königstuhl 12,
69117
Heidelberg, Germany
email: j.kemmer@lsw.uniheidelberg.de, sstock@lsw.uniheidelberg.de, sreffert@lsw.uniheidelberg.de
^{2}
MaxPlanckInstitut für Astronomie,
Königstuhl 17,
69117
Heidelberg, Germany
Received:
29
July
2022
Accepted:
24
April
2023
Context. Instrumental radial velocity (RV) precision has reached a level where the detection of planetary signals is limited by the ability to understand and simultaneously model stellar astrophysical “noise.” A common method for mitigating the effects of stellar activity is Gaussian process (GP) regression.
Aims. In this study we present an analysis of the performance and properties of the quasiperiodic (QP) GP kernel, which is the multiplication of the squaredexponential kernel by the exponentialsinesquared kernel, based on an extensive set of synthetic RVs, into which the signature of activity was injected.
Methods. The stellar activity within our synthetic data sets was simulated using astrophysically motivated models with different spot distributions and spot lifetimes rotating on the surface of a modeled latetype star. We used dynamic nested sampling to fit different model sets, including QP–GPs, Keplerian models, white noise models, and combinations of these, to synthetic RV time series data that in some cases included additional injected planetary signals.
Results. We find that while the QP–GP rotation parameter matches the simulated rotation period of the star, the length scale cannot be directly connected to the spot lifetimes on the stellar surface. Regarding the setup of the priors for the QP–GP, we find that it can be advantageous to constrain the QP–GP hyperparameters in different ways depending on the application and the goal of the analysis. We find that a constraint on the length scale of the QP–GP can lead to a significant improvement in identifying the correct rotation period of the star, while a constraint on the rotation hyperparameter tends to lead to improved planet detection efficiency and more accurately derived planet parameters. Even though for most of the simulations the Bayesian evidence performed as expected, we identified not farfetched cases where a blind adoption of this metric would lead to wrong conclusions.
Conclusions. We conclude that modeling stellar astrophysical noise by using a QP–GP considerably improves detection efficiencies and leads to precise planet parameters. Nevertheless, there are also cases in which the QP–GP does not perform optimally, for example RV variations dynamically evolving on short timescales or a mixture of a very stable activity component and random variations. Knowledge of these limitations is essential for drawing correct conclusions from observational data.
Key words: planetary systems / techniques: radial velocities / methods: data analysis / methods: statistical / stars: activity
© The Authors 2023
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.
1 Introduction
The search for and characterization of lowmass exoplanets, in particular in the habitable zone of their host stars, is in full swing. Indispensable to the mass determination of these exoplanets is the radial velocity (RV) method. Current and nextgeneration highresolution spectrographs searching for exoplanet signals in the spectra of stars via the RV method are reaching precisions of around or below 1 m s^{−1}, such as HARPS (Mayor et al. 2003), CARMENES (Quirrenbach et al. 2014), ESPRESSO (Pepe et al. 2014), MAROONX (Seifahrt et al. 2018), NEID (Gupta et al. 2021), EXPRES (Petersburg et al. 2020; Zhao et al. 2022), and KPF (Gibson et al. 2020). However, the stellar activity induced by host stars can be a significant obstacle to the detection and precise characterization of these lowmass, lowamplitude (~1 m s^{−1}) planetary signals, as even the quietest stars show intrinsic RV variations on this order of magnitude (Robertson et al. 2014; AngladaEscudé & Tuomi 2015). For the detection and characterization of planets with the RV method, it is therefore crucial to understand the consequences of stellar activity in order to find optimal ways to mitigate its impact.
The stellar “noise” affecting RV measurements of stars is mainly dominated by four different effects that occur on different timescales: oscillations, granulation, active regions, and magnetic cycles (Dumusque 2016). The most problematic effects, especially for latertype stars, are caused by active regions and magnetic cycles, since the timescales and RV amplitudes can be similar to those of potential planetary signals. Active regions on the stellar surface have limited lifetimes and are constantly evolving dynamically as they move across the observed stellar hemisphere due to stellar rotation. This leads to correlated noise that can manifest itself as quasiperiodic (QP) variations in the measured RVs. This noise can mask, mimic, or influence planetary signals in the RV data (e.g., Saar et al. 1998; Queloz et al. 2001; Boisse et al. 2011; Baluev 2013; Haywood et al. 2014; Hatzes 2016; Stock et al. 2020a). In this work we focus on stellar spots and neglect faculae. In particular, for the astrophysical modeling of the correlated noise, we use an M dwarf of spectral class M2.0V. M dwarfs can show a high degree of magnetic activity and are in general spot dominated (Jeffers et al. 2022). Furthermore, no stellar oscillations, for example pmodes, have been observed for M dwarfs thus far, and the magnitude of any stellar oscillations is expected to be in the cm s^{−1} range (RodríguezLópez 2019) and can therefore be neglected in our simulated RV data set. Although we use a specific spectral type for our simulations, the general findings of this work can also be quite beneficial for modeling the RVs of other spectral types.
One way to mitigate correlated time variations is to apply noise models in the framework of Gaussian process (GP) regression models, and one particular kernel that is often used in such cases is the QP kernel (e.g., Haywood et al. 2014; Rajpaul et al. 2015; Ribas et al. 2018; Stock et al. 2020b). These stochastic models fit the stellar and instrumental noise contribution in RVs or light curves. However, the use of these models has raised some questions and concerns regarding the flexibility of these GPs (e.g., whether GPs are prone to overfitting or to absorbing planetary signals into the noise model), their application (e.g., selecting the best priors), and the correct interpretation of the results (e.g., the interpretation of hyperparameters and the implications for derived planetary parameters).
The main objective of this work is to perform a systematic analysis based on numerical simulations with known conditions in order to investigate these important questions. We created synthetic RV time series data that include the stellar astrophysical noise due to active regions. For this purpose, we used StaгSim (Herrero et al. 2016) to create RV data sets based on different spot patterns and distributions. We then investigated the properties of the QP–GP applied to these data sets in order to derive possible ways of optimally constraining the QP–GP priors. In addition, we injected Keplerian signals into the synthetic RV time series data to investigate the effect of the stellar activity and the GP modeling on the derived parameters of the planets. Investigations based on numerical simulations, such as those presented in this work, are critical for understanding how boundary conditions, properties of the data, or properties of the model may affect the results of an analysis involving a GP fit.
In Sect. 2, we describe the methods as well as the setup of the simulations and models that are used in this work. Section 3 presents and visualizes our results for GP fits to simulated activity data without additional Keplerian components, while Sect. 4 includes simulations with injected Keplerian signals. In Sect. 5, we provide a discussion of our results as well as useful guidelines for using GP models for photometric and spectroscopic time series data. Finally, we give our conclusions and a comprehensive summary of this work in Sect. 6.
2 Setup and methods
2.1 Simulation of activityinduced radial velocity signals
In order to produce synthetic activitycontaminated RV data, we made use of the tool StarSim^{1} (Herrero et al. 2016). StarSim models RVs by dividing the stellar surface into a grid of different effective temperatures depending on whether there is an active surface feature (spot) at the grid element position or not (quiet photosphere). A synthetic Phoenix spectrum (Husser et al. 2013) based on the effective temperature, the surface gravity and other relevant stellar parameters, is assigned to each of the two different surface regions. In addition, each grid element is Dopplershifted according to its position on the rotating surface. StarSim then calculates the crosscorrelation function (CCF) for each of the two different spectra, and integrates the entire grid of surface elements in terms of the CCFs instead of spectra to speed up the calculations. The effects of limb darkening, convection and limb brightening are included within StarSim. The RVs are computed from the CCFs for a userdefined grid of epochs.
2.2 Setup of the investigated stellar activity configurations
Within our investigations, we focus on the effect of different spot sizes, spot distribution, spot lifetimes, spot number, and the stellar rotation period on the stellar astrophysical noise, and hence on the derived properties of the QP–GP. We neglect instrumental jitter for most of our simulations. Nevertheless, we allow for a small instrumental uncertainty of 30 cm s^{−1}, which corresponds to the performance of current ultrastable spectrographs (e.g., shown for ESPRESSO by Suarez 2020). To introduce the RV uncertainty into the modeled stellar activity signal, we perturbed the simulated StarSim RVs with a Gaussian distribution with mean zero and standard deviation of 30 cm s^{−1}, if not stated otherwise.
The fixed parameters of the simulated star by starsim to model the RV variations due to stellar activity are given in Table 1. The stellar parameters, for example the stellar mass, radius, surface gravity, and the effective temperature, are based on our M2.0V example star. The spectral range of the simulated RV instrument is 380–690 nm, typical for modern optical spectrographs. We assumed a temperature contrast of 400 K for an effective temperature of 3500K based on the investigations of Andersen & Korhonen (2015). This temperature contrast reflects roughly the result of both the linear and polynomial fit in Fig. 2 provided in Andersen & Korhonen (2015). Simulations with different temperature contrasts have been carried out and showed that the influence of this parameter, as expected, mainly affects the RV amplitude (Reiners et al. 2013; Bauer et al. 2018).
We fixed the rotation period of our example star to P_{rot}=22 days, which is within the range of common values found for M2 stars (Newton et al. 2018; Jeffers et al. 2018; Popinchalk et al. 2021). To keep the investigations computationally feasible, the total time baseline of the RV observations, if not stated otherwise, is set to 10 P_{rot}=220 days, in which 11 RV measurements per rotation period were assumed (one measurement every 2 days). This results in 111 RV measurements over 220 days, close to a bestcase scenario for the RV observations and our GP analysis.
We systematically investigated different ratios of the spot lifetimes with respect to the stellar rotation period for spots either distributed randomly or centered at two active longitudes, resulting in a grid of 10 different cases whose details are provided in Table 2. Regarding the spot distributions, we partly followed the approach in Perger et al. (2021). We assumed a case where spots appear completely randomly on the stellar surface and a case where spots preferentially appear at longitudes that have a distance of 180 deg (socalled active longitudes). The random case can be characteristic of active, fast rotating stars (Morin et al. 2010; Perger et al. 2021). It may result in a more difficult and less favorable case for the QP–GP to pick up any correlated stellar activity signal. The case of two active longitudes has been observed for our Sun (Zhang et al. 2011). Here we assumed that the spots can only appear at longitudes drawn from the sum of two normal distributions with means of x deg and x+180deg, where x is randomly chosen between 0 and 180 deg. The standard deviation of these normal distributions has been set to σ_{act.long.}= 45 deg. Since we used different discrete spot lifetimes and distributions and had fixed the rotation period, we tweaked the spot sizes and evolution rates to get comparable activity signals of about σ_{RV} = 1–2 m s^{−1}. For each of the activity configurations under investigation, we generated 100 random realizations, taking into account random birth times and positions of spots.
The spot birth times are drawn from a uniform distribution ranging from [t_{0} − t_{spot},t_{0} + t_{obs} + t_{spot}], where t_{spot} is the spot lifetime and t_{obs} the time span of the observations. This accommodates for the fact that spots appearing before the first RV observation at t_{0} takes place can still be visible. In addition, the random birth dates imply that there will always be periods of time when more or fewer star spots are present on the simulated stellar surface. The average number of star spots on the stellar disk visible to the observer per time interval is one of the parameters affecting the RV amplitude. We used the definition of Basri & Shah (2020), which is based on the average number of star spots present on the whole stellar surface at a random time. We show four example data sets of different configurations in Fig. 1.
StarSim assumes that spots have a circular shape and a linear growth and decay law of the spot radius (in degrees) that is based on the spot evolution rate (in deg day^{−1}). For the grid analysis we assumed that spots grow until they reach their maximum spot size defined as R_{spot,max}, then stay about 10% of their lifetime at that radius, and decay thereafter, all within the assumed spot lifetime. We also performed simulations with different spot evolution rates. However, we are aware that the assumption of a linear and symmetric growth and decay of the spot radii does not represent the properties of observed Sun spots (Petrovay & van DrielGesztelyi 1997; van DrielGesztelyi & Green 2015; ForgácsDajka et al. 2021). This prescription is a fixed property of StarSim that has not been changed for the investigations in this work.
The maximum star spot radius, R_{spot,max}, is set in such a way that the desired median RV scatter is obtained, which is between 1–2 m s^{−1} for the grid analysis. However, due to the randomness regarding spot distribution and birth times of spots within our simulations, it is not possible to perfectly control the RV scatter caused by the stellar activity simulations. We also note that due to the growth and decay of the spots, the fixed maximum star spot radius only represents an upper boundary. In general, a variety of different spot sizes are apparent within our simulations at any time.
Relevant fixed parameters used to simulate the stellar activity within StarSim.
Grid of investigated stellar activity configurations.
Fig. 1 Four simulated example data sets. The black points represent the modeled StarSim RVs, and the red points represent the simulated observations with their error bars. Top left: random spot distribution, t_{spot} = P_{rot} = 22 days. Top right: two active longitudes, t_{spot} = P_{rot} = 22 days. Bottom left: random spot distribution, t_{spot} = 5P_{rot} = 110. Bottom right: two active longitudes, t_{spot} = 5P_{rott} = 110 days. 
2.3 The quasiperiodic Gaussian process (QP–GP)
A full text search for “exoplanet” and “Gaussian process” in the peerreviewed astronomical literature results in more than 300 publications^{2}. A significant fraction of them mention the “quasiperiodic” GP or kernel. Generally, a QP kernel is a family of GP covariance functions that are able to model both a periodic signal and a correlation length of that signal simultaneously. The term does not define a particular unique kernel function; however, in the astronomical literature it is common to denote the kernel resulting from the multiplication of an exponentialsinesquared kernel with a squaredexponential kernel as a “quasiperiodic kernel” (Haywood et al. 2014; Rajpaul et al. 2015). This kernel has the form (1)
where σ_{GP} is the amplitude of the GP component given in units of the data, Γ defines the relative weight between the GP sinesquared component and squaredexponential component and is dimensionless, l is the correlation length scale of the GP squaredexponential component, P_{rot} is the period of the GP sinesquared component, and τ is the time lag. l, P_{rot}, and τ are commonly given in units of time. A very common extension of the kernel function in Eq. (1) is to add a jitter term to the diagonal terms of the covariance function, allowing for a white noise contribution; this is the form we use and refer to in this work as the QP kernel. From Eq. (1), it is evident that the epithet QP for this kernel comes from its sinusoidal component. However, there are many other forms of kernels that will have a QP behavior, for example the quasiperiodic cosinekernel (QPC; Perger et al. 2021).
The particular choice of a GP kernel represents a significant part of the prior knowledge on which the modeling is based. Here, we focus on the modeling of the effects from star spots and stellar activity on RV data, where the QP–GP kernel in Eq. (1) has several advantages, as it was specifically designed for this purpose and is motivated by characteristics of stellar activity in time series data (Aigrain et al. 2012; Rajpaul et al. 2015; Angus et al. 2018). Furthermore, it is infinitely differentiable and thus leads to wellbehaved representations of the data.
2.4 The QPGP hyperparameters
The physical motivation of the QP–GP hyperparameters can be difficult: whether the parameters can be connected to real physical processes on the stellar surface and whether it may be possible to calibrate them is currently debated in the literature (Rajpaul et al. 2015; Angus et al. 2018; Perger et al. 2021). With this work we also intend to improve the general understanding of the QP–GP hyperparameters.
The amplitude hyperparameter defines the absolute strength of the covariance. This hyperparameter is therefore related to the amplitude of the activity signal. In this regard, it can be considered as an outcome of a number of physical processes on the star, such as the spot size, number of visible spots on the stellar surface, or temperature contrast. Many different combinations of these processes can lead to the same value of ; thus, it is unfeasible to connect this hyperparameter to any single physical parameter on the star.
The length scale l defines the correlation length of the signal. Large values of the length scale suggest a strong correlation for data points separated in time, while small values lead to weak or no correlation. This parameter is generally related to the lifetime of activity features on the star, for example stellar spots, and some studies have tried to calibrate the length scale parameter of kernels similar to the QP–GP kernel in attempts to better physically fit this lifetime (see for example Perger et al. 2021; Nicholson & Aigrain 2022).
The hyperparameter P correlates data points that are roughly one period apart from each other. The QP–GP has been designed in such a way that this hyperparameter should be directly connectable to the stellar rotation period, which is responsible for active magnetic regions moving across the observed hemisphere of the stellar disk.
The Γ hyperparameter describes the harmonic complexity. For small values of Γ, points with a lag other than a multiple of P, are more highly correlated than for large values of Γ, where points are less correlated if their lag is not close to a multiple of P. This hyperparameter can be related to the number of variations (or local maxima and minima) during one full rotation of the star, and by that include information about the average distribution of active regions on the stellar surface.
2.5 QP–GP priors
Regarding the choice of priors for the GP model, there are two distinct main approaches that are often used in the literature. One is to use wide and unconstrained priors for the QP–GP to allow maximum flexibility of the GP model (see for example Espinoza et al. 2020; Pinamonti et al. 2022). The other one is to curtail the flexibility of the GP model as much as possible to reduce “overfittinglike” behavior. This is achieved by applying more constrained, in most cases physically motivated, priors (see for example Nava et al. 2020; Stock et al. 2020b). The idea behind using tighter GP priors is to incorporate all the information that is available in the fit, which as a result can lead to better precision of the planetary signals. An additional benefit is a shorter run time of the sampler.
Typically, the QP–GP is constrained as follows. The amplitude parameter is challenging to constrain due to the large degeneracy of parameters affecting it. Nevertheless, there is a physically meaningful maximum, namely the scatter of the RV data of the star under consideration, which combines all the parameters of activity influencing the RV amplitude. The distribution of this hyperparameter is often chosen to be positive and uniform, or loguniform. For our analysis, we adopted a universal uniform range between 0 m s^{−1} and 40 m s^{−1}, which generously considers the range of scatter that we expected for our simulated activity.
The length scale parameter, l, is essential when it comes to modeling a meaningful periodic signal. It is a common approach to use a uniform or loguniform prior spanning a large range of possible values. However, there are some caveats: for example, a small length scale parameter leads to a dominating squaredexponential kernel, and the signal decays before one rotation has finished. One aim of our investigation was to see how the choice of this prior affects the results. We therefore tested different priors in different situations, as will be described later.
For the GP rotation period, P, it is common to use a uniform or loguniform prior that spans the range of physically possible stellar rotation periods. These can range from less than a day for very young stars to hundreds of days for giant stars in their post main sequence evolutionary phase. In cases where an estimate of the rotation period is available, for example due to auxiliary data from photometry and/or activity indicators, it is possible to use this estimate as a physical prior to further constrain the GP rotation hyperparameter. In our analysis, we use both uninformed uniform priors and constrained priors to investigate how the results differ between the two.
Large values of the harmonic complexity parameter, Γ, allow the GP to become very flexible. The reason is that the fitted curve drawn from the GP may include a higher number of turning points, which in turn can allow for more frequencies to be fitted that may not necessarily be related to the RV variations by the active regions. Therefore, it can be useful to impose an informative prior or an upper limit on Γ. Jeffers & Keller (2009) have shown that, to first approximation, any light curve can be explained by two unequally large spots that are 180 deg apart, independent of the complexity of the spot distribution on the stellar surface. This roughly corresponds to two active longitudes on the stellar surface, and its RV signal can be modeled by one to three local maxima of the GP within one full rotation period. Therefore, to reduce the flexibility of the GP at least an upper limit of Γ should be set to allow for a maximum of two to three inflection points within one rotation period, which concretely means that within this work we use a loguniform prior on Γ ranging from 0.01 to 10. Within the literature, there are cases where even more informative priors were applied to Γ, for example informative normal priors (see, for example, Nava et al. 2020, and references therein).
2.6 Modeling approach
Within this work, we used juliet (Espinoza et al. 2019) to fit the synthetic RV data that include the simulated stellar activity signals and injected planetary signals. For the modeling of Keplerian signals, juliet uses the publicly available package radvel (Fulton et al. 2018). The QP–GP kernel investigated within this work is computed by the publicly available package george (Ambikasaran et al. 2015). The evaluation of the GP likelihood with george is computationally expensive and scales as N ln N , where N is the number of data points (Ambikasaran et al. 2015). Nevertheless, george is one of the most widely used packages.
Within juliet, we used nested sampling, in particular, dynamic nested sampling based on the dynesty package (Speagle 2020) to fit the models to the data and derive the posterior distributions of the model parameters. Dynamic nested sampling is better suited than standard Markov chain Monte Carlo procedures for dealing with multimodal posterior distributions, which are common for the GP hyperparameters, since it is not as susceptible to problems caused by local minima (Feroz et al. 2009; Feroz & Skilling 2013; Speagle 2020).
While Markov chain Monte Carlo samplers generally require a good initial parameter vector, nested sampling efficiently samples the complete userdefined prior volume. An additional advantage of using nestedsampling algorithms is the fact that the marginal loglikelihood (hereafter referred to as evidence) is a natural outcome of the sampling process (Feroz et al. 2009). Bayesian model comparison, which uses the Bayesian evidence, is a powerful method for comparing distinct or nested models with different numbers of parameters and complexity against each other. Following Trotta (2008), we consider a model as a significant improvement compared to another model if the logevidence difference is larger than ∆ ln 𝒵 > 5, while ∆ ln 𝒵 > 2.5 is moderate evidence in favor of one model.
Different model components that were investigated in this work.
3 Results of QP–GP fits to synthetic RV activityonly data
In this section, we present our investigation of the relationship between the GP hyperparameters and the physical properties of the simulated stellar activity. Our large grid of different activity configurations not only allowed us to investigate the correlation between the parameters of the stellar activity and the GP hyperparameters, but also how the determination of the hyperparameters is affected by different states of activity. Our starting point for this was to use wide, uninformative priors for the GP hyperparameters (see also Table 3; referred to as GP Prior I hereafter). The focus was on the question whether the QP–GP can robustly derive the simulated stellar rotation period and spot lifetimes by connecting these values to the physically motivated QP–GP hyperparameters as described in Sect. 2.4. An analysis based on the Bayesian evidence of how the different conditions affect the probability of finding the (correct) rotation period in practice is presented in Appendix B.1.
3.1 QP–GP properties and the l_{GP} versus P_{GP} diagram
As can be seen in the middle panel of Fig. 2, the GP rotation period and the length scale parameter can be highly correlated with each other, which is not surprising, given Eq. (1). Both hyperparameters are as an exponential product interconnected with each other and should therefore not be considered in isolation. Shorter QP–GP length scales mean that the signal of the rotation period is less coherent over the time of observations.
Within the QP–GP fits to our simulated RV data sets, we find three predominant classes of possible distributions in the l_{GP} versus P_{GP} plane: the “o,” “triangle,” and “bar” shapes presented in Fig. 2 (see also Stock et al. 2020a,b; Bluhm et al. 2021; Kossakowski et al. 2021). We note that realworld scenarios of active stars have however also shown mixtures of the posterior distributions introduced here. For instance, TOI1201 (Kossakowski et al. 2021, their Fig. A.1) was observed to exhibit a mixture of a triangle shape together with a bar shape.
The left plot shows an oshaped posterior distribution, resulting in a welldefined peak within the l_{GP} versus P_{GP} parameter plane. Only in such a case can both the length scale and the rotational period of the QP–GP be well determined from the RV data. However, we only observe oshaped posterior distributions for spot lifetimes that are at least five times the rotation period. A realworld target for which such a 2D posterior distribution of the GP length scale and rotation period has been observed is the nearby star Lalande 21185 (see Fig. 22 in Stock et al. 2020b).
The middle plot shows a triangle shape that was also commonly identified in our simulations. This 2D posterior distribution is characterized by a broad plateau of solutions at small length scales that are consistent with all allowed rotation periods given the prior volume, and a triangle of posterior solutions (often with higher likelihood) sitting on top of the observed plateau indicating solutions with larger GP length scales that are also consistent with the data. Based on the fits to our simulations, the position of the triangle is often found to be related to the simulated stellar rotation period. The plateau of posterior solutions with small l_{GP} and no preference for any rotation period within the l_{GP} versus P_{GP} parameter space is a consequence of the fact that the rotation hyperparameter within the exponentialsinesquared kernel is undefined if the signal significantly decays on timescales smaller than one rotation, resulting in a dominant exponentialsquared term. This is for example the case if the spot lifetime of the star is close to or smaller than the simulated stellar rotation period. Realworld examples of such triangleshaped posterior distributions have been identified for the RV time series data of YZ Ceti, GJ 251, and TOI1201 (Stock et al. 2020a,b; Kossakowski et al. 2021).
The right diagram in Fig. 2 shows the third class of frequently observed relations between the GP length scale and the GP period. For some of our simulations, the posterior solutions form a bar shape that extends toward the upper boundary of the prior volume of l_{GP} at a period related to the stellar rotation. Such a shape is caused by the fact that the decay of the signal cannot be constrained given the data or time of observations. As a result, the length scale is estimated to “infinity” resulting in valid solutions for any large l_{GP}·In the l_{GP} versus P_{GP} diagram, this distribution can be interpreted as a signal that is to some extent coherent over the entire time of observations. A realworld example where such a posterior distribution has been observed is HD 238090 (Stock et al. 2020b).
Fig. 2 l versus P diagrams showing the posterior samples of a single GP fit with uninformed priors to one example simulation. The horizontal dashed line indicates the simulated spot lifetime, and the vertical dashed line indicates the simulated rotation period. Left: posterior samples fitted to a star with random spot distribution, P_{rot} = 22 days, and P_{life} = 110 days. A wellbehaved peak is found where both the rotation period and the GP length scale are well constrained. Middle: posterior samples fitted to a star with random spot distribution, P_{rot} = 22 days, and P_{life} = 22 days. A posterior plateau with a short length scale spans the entire posterior volume. The posterior of the rotation period is not well constrained due to this plateau; however, some posterior samples form a “triangle” shape at the rotation period. Right: posterior samples fitted to a star with two active longitudes, P_{rot} = 22 days, and P_{life} = 11 days. The spots that appear on a similar longitude mimic a coherent signal whose length scale cannot be constrained by the GP, resulting in an “Ishaped” posterior distribution that reaches the upper boundary of the posterior volume. 
3.2 Detection of stellar rotation periods by the QP–GP
In Fig. 3 we show the comparison between the derived medians of the QProtation hyperparameters and the simulated period for five different spot lifetimes and two different spot distributions. In doing so, we consider the unconstrained GP Prior I, as well as a length scale constrained prior (GP Prior II in Table 3 and hereafter), which was motivated by the triangle shape distribution described in the previous section. The idea behind this is to exclude the plateau with small length scales and unconstrained rotational modulation in order to force the GP to model a QP signal.
With both GP priors, the rotation period was derived very accurately and precisely by the QP–GP for the majority of cases in which the spot lifetime was at least a few times larger than the rotation period. For such cases, there was also no significant difference between GP Prior I or II regarding the derived median value of the stellar rotation period. In cases where the spot lifetime is close to the stellar rotation timescale (P_{life} ≤ 2P_{rot}), the estimates of the rotation period by the QP–GP were less precise. This is expected, as the signal’s coherence decreases if the positions, size, and number of stellar spots on the surface change significantly within one rotation. The determination of the rotation period is particularly hampered if the spot distribution is, in addition to the short spot lifetime, positionally uncorrected, which means for random spot distributions. In this case, we see that the unconstrained GP is not able to recover the simulated rotation period. A feature that can be seen directly from Fig. 3 is that fits using GP Prior I for the RV data sets with short spot lifetimes favor solutions with rotation periods that lie exactly in the middle of the predefined prior volume. The reason is the plateau of posterior solutions that dominates the posterior distribution over the entire period prior volume, as shown in the example plot in Fig. 2. For those instances, the median rotation periods derived by the length scale constrained GP Prior II, which excludes this plateau, are much more consistent and closer to the simulated rotation period. However, even when applying a QP–GP with Prior II we find that the derived rotation period is sensitive to the spot distribution and spot lifetimes. In the case of two active longitudes and with decreasing spot lifetimes, there are more and more cases where the median of the determined rotation period is half of the simulated one and represents therefore the second harmonic of the true rotation period. It is straightforward to understand how two active longitudes can lead to such an estimate, as it is very difficult for a correlated noise model, such as the QP–GP, to distinguish between two almost equally spotted sides of the stellar surface. In the case of shorter spot lifetimes and with two active longitudes the much faster dynamic evolution of the spot pattern observed at shorter spot lifetimes complicates the distinction of the active longitudes by the GP noise model, explaining the higher number of solutions that are then found at the second harmonic.
Fig. 3 Violin plots showing the distribution of the derived median rotation periods and length scale hyperparameters of the QP–GP based on 100 simulations per spot lifetime. The blue distribution shows the results obtained from using GP Prior I (wide prior), and the orange distribution shows the results obtained by using GP Prior II (length scale constrained). For the period, the dashed line shows the simulated rotation period and the dashdotted line the first harmonic at P_{rot}/2. For the length scale, we show the simulated spot lifetime divided by the determined GP length scale so that the dashed line marks a 1:1 correspondence and the dashdotted line a factor of 4 between the two. Top: for a random spot distribution on the stellar surface. Bottom: with two active longitudes on the stellar surface. 
3.3 Limitations in correctly identifying the star spot lifetimes
In Sect. 3.1, we showed already that only one of the posterior distributions occurring in the l_{GP} versus P_{GP} diagram results in an unambiguous measurement for the length scale hyperparameter of the GP. For a more indepth analysis of the connection between the determined length scale parameter and the simulated spot lifetime, we investigated the outcome from the previous section with focus on the length scale.
We observed a clear relation in most of the cases: longer simulated spot lifetimes result in longer correlation length scales of the QP–GP. However, as can be seen in Fig. 3, we found no 1:1 correspondence but a dependence of the derived length scale of the QP–GP that varies with the simulated spot lifetime and spot pattern. As already seen for the rotation period, the constrained GP Prior II had a clear advantage for short spot lifetimes.
Figure 3 shows that the largest deviation occurred for the unconstrained GP Prior I and short spot lifetimes (0.5 P_{rot}. to 2P_{rot}) and the resulting plateau with indefinite rotation period discussed in Sect. 3.1. The length scale hyperparameters determined from this were generally much smaller than the simulated spot lifetime – on the order of a few days at most. Strikingly, for the spot lifetimes of 22 days and 44 days the gross of the determined length scales in those cases is centered around ratios of ~10 and ~20, respectively, which corresponds to a length scale of 2 days and matches the sampling of our data. For a less regular sampling, as in the real observations in Stock et al. (2020a,b), this relation is however not apparent and the plateau occurs for length scales between 10–100 days, which highlights the importance of the visual inspection of the posterior using the l_{GP} versus P_{GP} diagram as presented in Sect. 3.1. Fitting a QP–GP with GP Prior II, which excludes the observed posterior plateau in l_{GP}, reduced the offset between l and the simulated spot lifetime, but still shows a dependence on the spot lifetime. For the cases with spot lifetimes of 5 P_{rot} or 10 P_{rot} we observed no difference in the results based on the prior choice as these result in wellbehaved oshaped 2D posterior distributions, with an approximate factor of 4 to the simulated spot lifetime.
Especially for the shortest spot lifetimes and the configuration of two active longitudes, we recognized a lot of samples that resulted in length scale parameters much larger than the simulated spot lifetime (i.e., ratios close to 0). These are the cases with a bar shape in the 2D posterior distribution. A possible explanation is that due to our definition of an average number of spots over a specific time interval, such cases are much more dynamic than our simulated cases with longer spot lifetimes. A large number of spots forms and decays over the time of the simulations, and the fast reoccurrence of spots with short lifetimes that are smaller than the rotation period tend to lead to a coherent signal that the QP–GP cannot distinguish from that of a single longlived signal. Because the spots occurring on two active longitudes are positionally correlated on opposite sides of the stellar disk, the resulting determined rotation period is often found at the second harmonic (0.5 Prot.) of the simulated rotation period. For a random spot distribution and spot lifetimes shorter than the rotation period, we find samples with determined length scale parameters longer than the simulated spot lifetimes and measured rotation periods completely unrelated to the rotation period.
3.4 Dependence of QP–GP length scale and period on jitter and RV uncertainty
So far, we only considered scenarios in which we adopted a fixed instrumental uncertainty of 30 cm s^{−1} and no additional whitenoise contribution by the star or instrument. In the following we discuss the sensitivity of the derived GP hyperparameters with respect to an added jitter, which in reality may result from unresolved stellar oscillations of the star itself or from an unstable RV instrument. For these investigations we used the stellar activity models with random spot distribution and a simulated spot lifetime of 110 days, as this is one of the cases where the QP–GP kernel is particularly effective, as we show in the previous section.
We applied different RV uncertainties, different white noise contributions, or a combination of both to the simulated data and then fitted a QP–GP with Prior I to the data. We note the difference between RV uncertainties and jitter: for the RV uncertainties, the shift of the measured value is taken into account by the larger error bars of the data, while this is not the case for the white noise contribution caused by jitter.
Figure 4 shows the derived GP hyperparameters l_{GP} and P_{GP} as a function of the ratio between the simulated stellar activity amplitude, σmodel, and the applied jitter, σ_{jitter}, as well as the RV uncertainty σ_{RV}. As can be seen, the derived length scale by the QP–GP is sensitive to this ratio. If the ratio between the activity amplitude and the RV uncertainty or jitter, respectively, is close to or below one, the GP length scale is generally determined to be larger. However, when this ratio becomes greater than one, l_{GP} approaches a lower boundary whose absolute value is only about 25% of the simulated lifetime. In comparison, the derived GP rotation period for the same simulations show a rather stable behavior, independent of the ratio of the simulated stellar activity amplitude σ_{model} to the applied jitter σ_{jitter} or to the RV uncertainty σ_{RV}. This showcases that the derived QP–GP length scale is rather sensitive to the data quality.
Fig. 4 Relations of the derived GP parameters to the RV uncertainty. Top: derived GP length scale as a function of the ratio of the standard deviation of the stellar activity signal over the RV uncertainty or jitter. Note that only GPonly fits that had at least moderate significance (with ∆ ln 𝒵 > 2.5) compared to a white noise model that includes only jitter have been used. The curves were computed by applying a median filter with a window size incorporating 50 simulations. The red curve assumes a fixed jitter of σ_{jitter} = 0 m s^{−1} and changes the ratio σ_{model}/σ_{RV}, the black curve assumes a fixed jitter of σ_{jitter} = 1.2 m s^{−1} and changes the ratio σ_{model}/σ_{RV}, and the blue curve assumes a fixed uncertainty of σ_{RV} = 0.3 m s^{−1} and changes the ratio σ_{model}/σ_{jitter}. The colored shaded area shows the appropriate interquartile range. The vertical dotted line represents a ratio of 1. The horizontal dashed line represents the simulated spot lifetime. Bottom: analogous to the top plot, but for the GP rotation period. 
4 Results of QP–GP fits to combined simulated activity and planetary signals
It is common practice to use GP components in fits to improve planetary parameters in the presence of stellar activity. However, there are different approaches to the treatment of the GP kernel, such as a free prior on the GP rotation period versus a prior constrained to the possibly known rotation period of the star. In the following, we investigate how the choice of GP prior affects the precision and accuracy of the planetary parameters.
In order to do so, the grid of investigations is extended by injecting Keplerian signals with periods P of 5.12 days, 30.35 days and 44days, hereafter referred to as KI5, KI30, and KI44. We note that KI44 has a period twice of the rotation period to investigate whether these signals can be recovered, and to which extent. The RV semiamplitudes, K, of the injected Keplerian signals have been set to specific ratios to the standard deviation of the modeled stellar activity to investigate the effect of planetary signals that have higher or smaller amplitudes than the stellar activity signal. The investigated ratios include 2:1, 1:1, 1:2, and 1:5. The eccentricity e, mean anomaly M and argument of periastron ω of the injected Keplerian were set to zero. We used five different model types for the fits: (i) a Keplerian model that contains only white noise (K + W); (ii) a Keplerian model plus an unconstrained GP (K + GP I); (iii) Keplerian model plus a lengthscaleconstrained GP (K + GP II), motivated by the results from Sect. 3; (iv) a Keplerian model plus a GP with a constrained rotation period (K + GP III), assuming that the stellar rotation period is known through auxiliary measurements; and (v) a Keplerian model plus a GP that is constrained in terms of the length scale parameter and the rotation period (K + GP IV), which is the combination of the previous two assumptions.
The priors used for the different model components are given in Table 3. Using four different stellar activity configurations, each having 12 different planet configurations and with 100 simulated data sets per final configuration, we end up with 4800 different data sets that need to be investigated by fitting five different combinations of models to them, resulting in a total of 24000 fits. As for the activityonly analysis, we also used our results for an extensive model comparison, which is presented in Appendix B.2
4.1 Accuracy
To investigate the accuracy of the derived planetary parameters we used the median symmetric accuracy (MSA; see for example chapter 4 in Morley et al. 2018). It is defined as (2)
where ℳ(…) is the median function and ln(Q_{i}) is the natural logarithm of the accuracy ratio (Kitchenham et al. 2001; Tofallis 2015) (3)
with x_{i} being the estimated (predicted) value, which in our case is the median of the posterior distribution for a fit to simulation i, and y_{i}, which is the actual value of simulation i.
The MSA can intuitively be interpreted as a percentage error. This is because if the relative error is defined as always having the same direction, the value of the MSA is the same as the uncertainty of the median percentage (Morley et al. 2018, see in particular Eqs. (12) and (13)).
Table 4 shows the MSA that we derived for the orbital periods and RV semiamplitudes based on different activity configurations, as well as aggregated over all 24 000 investigated fits. Given the configurations we investigated and all our fit results, we find that using a Keplerian simultaneously with a QP–GP does provide the best results in terms of the MSA metric. A Keplerian model without additionally taking into account the stellar activity does generally perform much worse and results in less accurate planet parameters. While the largest positive effect is observed by adding a QPGP to model the activity signal, constraining the QP–GP to the rotation period (K + GP III) benefits the accuracy of the derived planet parameters in most of our investigated cases. On the other hand, constraining only the length scale of the QP–GP (K + GP II) does actually lead to less accurate planet parameters based on the MSA metric, even when compared to a fully unconstrained QP–GP (K + GP I). However, constraining both the length scale and the rotation period (K + GP IV) is almost indistinguishable from a prior that only constrains the rotation period.
While Table 4 provides a general summary of the MSA metric for different combined sets of simulations not taking into account the different amplitude ratios and period ratios between activity and planet signal, we show the MSA of the orbital period and RV semiamplitude distinguishing between all simulated configurations graphically in Fig. C.1 and Fig. C.2 in the appendix. Examining the MSA of the fitting results as a function of the period ratio between the stellar rotation period and the injected orbital period, we find that the MSA is generally smaller when the planet’s orbital period is shorter than the rotation period. An additional but obvious result is that the larger the amplitude ratio between planet and activity, the more accurately the planet parameters are determined.
Using the MSA (or any other metric based on relative accuracy) to investigate the eccentricity for our circular (e = 0) injected Keplerian signals is problematic because it would lead to a division by zero in Eq. (3). Furthermore, in contrast to the orbital period and RV semiamplitude, the eccentricity is a parameter that is constrained between 0 and 1, and therefore the expected variance rather small. We therefore decided to investigate the median of the median eccentricities instead of the MSA for the eccentricity, since RVsemi amplitude and eccentricity are correlated anyway (i.e., a higher fit eccentricity results in higher RV semiamplitudes). Furthermore, the derived eccentricity should be as close to zero as possible for any of our simulated cases, which makes it easy to compare the different simulations in a similar fashion as for the MSA metric. We visualize the median eccentricity for the 100 simulations of each configuration in Fig. C.3 while Table 4 shows the median eccentricities derived for each model class. The main results from the investigation of the eccentricity parameter basically reflect what has been observed for the MSA on the period and RV semiamplitude, which is that K + GP III performs better in terms of planet parameter accuracy than the QP–GP models using the other prior distributions.
Besides, we find that the median eccentricity increases for an increasing amplitude of the stellar activity with respect to the planet RV semiamplitude. This highlights that the eccentricity is rather difficult to constrain and very sensitive to the correlated noise in the data. Since we find significantly smaller eccentricities when simultaneously fitting for the activity by using the QP–GP, it is therefore especially helpful to use an activity model to account for the stellar noise. Nevertheless, even with the QP–GP we retrieve too high eccentricities for the simulations that consist of very prominent activity signals.
An additional effect that can be observed fromFig. C.3 is that the derived eccentricity is generally the highest for the orbital period of 44 days, which is twice that of the stellar rotation period (2:1). This particular period ratio was chosen to investigate how the activity affects the eccentricity of a planet signal when the period of the stellar rotation overlaps with the second harmonic of the injected planet. For example, Stock et al. (2020a) stated that the derived eccentricity of YZ Cet b is largely influenced by the stellar activity due to this effect. AngladaEscudé et al. (2010) showed that eccentric solutions for a planet can hide a second planet in a 2:1 mean motion resonance. Analogously, our results show that this is also possible for another periodic signal, in our case the stellar rotation period, overlapping with the second harmonic of the planet.
MSA of the derived orbital periods and RV semiamplitudes and median of the derived eccentricities for different stellar activity configurations or all simulations.
4.2 Precision
The MSA does not provide any information about the individual model uncertainties and confidence intervals. For example, a prediction that is close to the actual value and by that is accurate can have underestimated uncertainties (an overestimated precision) and therefore be many standard deviations away, while a less accurate estimate with larger error bars does include the true value within less standard deviations and by that may provide a better representation of the precision of the planetary parameters. Therefore, we investigated the distance between injected and retrieved median posterior parameters in terms of standard deviations. We show an overview of these numbers for the different models and activity configurations in Table 5, while Figs. C.4, C.5, and C.6 graphically show our results as a function of amplitude and period ratio between the activity and injected planet signal. Examining this metric, we find that while GP Prior III and IV do again perform the best in the majority of cases, there are configurations where a completely unconstrained, or in one case even a Keplerian without a simultaneous GP, do result in the smallest standard deviation distance between injected and retrieved planet parameters.
In particular, the configuration with a random spot distribution and a spot lifetime of five rotation periods stands out. Here, the QP–GP leads to significant deviations between injected and obtained RV semiamplitude, as the median of the standard deviation distance for the 100 ensembles is as large as 10 or even higher. However, considering the derived orbital periods or the eccentricities under the influence of the very same activity configuration, the fits including the GP models are again clearly superior.
Nevertheless, the examination of all three investigated planetary parameter results (the distinction between period and amplitude ratio, the scatter of the standard deviation distance and the consistency of the derived standard deviation distance by the number of outliers for a given prior), leads us to conclude that a QP–GP that is constrained in its rotation parameter to the stellar rotation period, but is not constrained in its length scale parameter, has the best performance when it comes to realistic uncertainties of the parameter estimates.
5 Discussion
5.1 QP–GP rotation and length scale hyperparameter
In the first part, we investigated the relation of the QP–GP rotation period and length scale hyperparameters to the stellar rotation period and spot lifetime using simulated activity data sets. We identified three distinct shapes in the l_{GP} versus P_{GP} diagram of the posterior distributions for the rotation period and length scale, which are directly linked to the simulated rotation period and spot lifetime. Further, we found that the spot distribution also influences the determination of the hyperparameters.
For spot lifetimes of similar magnitude to the rotation period, the posterior distribution contains a large number of solutions that are consistent with any rotation period that is permitted by the prior, but these solutions tend to have a very small length scale. In such cases, we often find that only a small fraction of the posterior solutions are consistent with a higher length scale. Only those solutions that are not part of the plateau as described in Sect. 3.1 are then also consistent with a unique rotation period, which is, as we later showed, related to the simulated stellar rotation period.
These results suggest that by considering l_{GP} and P_{GP} together, a rotation period can be determined by the QP–GP in most cases, even if the star spot lifetime is on the order of the rotation period or less. Consequentially, for the detection of a QP signal in time series data, it can be advantageous to define a lower limit for the QP–GP length scale. The presented empirical approach, to fit first an unconstrained QP–GP and then repeat the fit with a more constrained QP–GP length scale, is versatile and can be used on any time series data with correlated noise. The reason for redoing the fit and excluding the plateau is that the nested sampling algorithm is then more efficient at identifying posterior solutions with higher length scale at the possible rotation period, which otherwise would not have been detected.
However, while based on our simulated data and configurations the simulated stellar rotation period and the rotation period hyperparameter of the QP–GP generally match, there are still some exceptions. For example, for certain spot distributions, especially in the case of two active longitudes, it is possible that most or all posterior solutions of a QP–GP fit are related to a higherorder harmonic of the rotation period. For the simulations with two active longitudes, the QP–GP has had difficulties in detecting and modeling the fundamental period of the stellar rotation, especially if the spot lifetimes were rather small in comparison to the stellar rotation period, which is a result of the much more dynamic nature of spot formation and decay for such cases.
While the rotation hyperparameter provides a good description of the simulated rotation period, the picture is somewhat different when comparing the QP–GP length scale with the simulated spot lifetime. Here we find a clear dependence as longer spot lifetimes result in larger QP–GP length scales, but the absolute value of the length scale does not reflect the true simulated spot lifetime. Most importantly we find that the QP–GP length scale is sensitive to the spot distribution and to the quality of the data, which means, jitter and RV error bars have an influence on the estimated median of the spot lifetime. For our simulations, poorer data quality led to higher values for the length scale of the QP–GP.
Such a varying relation between the determined length scale parameter and the simulated spot lifetime is contradictory to the results from other studies such as Perger et al. (2021) or Nicholson & Aigrain (2022), who find strong correlations between the simulated spot lifetime and the determined length scale parameter. There is a clear difference between a parameter that gives the welldefined rotation period (differential rotation left aside) and a parameter that captures something like the “typical” lifetime of star spots. The GP kernel that we use models the lifetime in the sense of the “decay time” of the signal (in the sense of 1/etime), which is different from the setup of our simulations where the lifetime is the time span between the emergence and disappearance of the spot. This leads to about a factor of 2 between the input parameter and the measured decay time of the GP and agrees with the results of Perger et al. (2021), who use the same configuration to simulate the activity as we do. The factor of 4, which we obtain for the wellbehaved cases with spot lifetimes five and ten times the rotation period, results from the additional factor of 2 in the exponent of our kernel. However, with the wide grid of our simulations, we show that this is only the optimal case and that the ratio between the simulated and determined lifetime is not a constant. The positional correlation of star spots, for example, due to active longitudes, can significantly affect the properties of the activity signal and by this the derived QP–GP length scale, as can the uncertainty or jitter of the data set.
The covariance function of the QP–GP is not based on an exponential function, but has a Gaussian shape. A number of studies (e.g., Rajpaul et al. 2015; ForemanMackey et al. 2017; Espinoza et al. 2019) argue that this does not have a severe influence on the derived length scale of the QP–GP. This is due to the fact that the two secondary diagonals that are exactly one rotation period away from the main diagonal are especially important regarding the derivation of the length scale parameter; how fast they fall off toward the corners is mostly irrelevant. But, this also suggests that not too much physical importance should be given to the length scale parameter.
In contrast to the period hyperparameter, it seems reasonable to regard the length scale hyperparameter as an “effective model parameter,” which may be related to stellar physics only very indirectly. One could of course do simulations assuming a certain lifetime for the star spots and then calibrate this parameter (see for example Perger et al. 2021), but it is not clear whether this relationship between QP–GP length scale and lifetime will still be valid if basic assumptions of the model or the data change, which is why we refrain from attempting to perform such a calibration.
A good example are the simulated cases with two active longitudes provided in this work. Here, the RV variations seem to be stable over all observed rotations of the star, although the simulated spot lifetime is only half of the simulated rotation period. In such cases, an empirically determined lifetime would measure the stability of the magnetic field configuration, or perhaps of active regions controlled by the magnetic field, but not of individual spots. Such cases are not farfetched; realworld examples for such stars include HD 166435 (Queloz et al. 2001) or AD Leo (Kossakowski et al. 2022). In addition, we found that RV uncertainty and white noise jitter do also affect the derived length scale. It is therefore not easily possible to conclude from the data what importance one should attach to the QP–GP length scale parameter. In most cases, this is not a problem for the field of exoplanets, since we are primarily interested in planet detection and characterization. Therefore, based on our results, we advise against giving too much importance to the absolute value of the QP–GP length scale as it is not necessarily a reliable measure of the spot lifetime, even for our simplified simulations consisting of spots that all have exactly the same lifetime.
Median value of the standard deviation distance between the derived and simulated orbital periods, RV semiamplitudes, and eccentricities for different stellar activity configurations or all simulations.
5.2 Best practices: accurate planet parameters from activitycontaminated RV data using QP–GP models
Based on the MSA and the median eccentricity, we identify several informative key findings. Not surprisingly, but nevertheless most importantly, adding a GP to the modeling does improve the overall accuracy of the derived planet parameters, independently of the used priors. This is true even in cases where the activity signal is weaker than the planetary signal. In the majority of our simulated cases, one of the four combinations of a Keplerian model together with a rednoise model in the form of a QP–GP results in smaller MSA compared to a Keplerian model that only included a jitter term assuming white noise.
The choice of the GP Prior, however, has only little influence on accuracy and precision of the determined planetary parameters. Based on our analysis, we find that the best performance of the QP–GP is achieved when the GP is constrained to the stellar rotation period (GP Prior III), sometimes even in combination with the length scale (GP Prior IV). It can therefore be essential to take photometry or other data into consideration to estimate the stellar rotation period in order to use this as physical prior knowledge in the modeling of the RV data with the QP–GP. In contrast to the findings on the stellar activityonly analysis, we find that a constraint solely on the length scale is not beneficial, but actually reduces the accuracy of the derived planet parameters in comparison to an unconstrained prior. A reason for this could be the reduced flexibility of the QP–GP model. The length scale constrained prior (GP Prior II) should therefore only be used to detect rotation periods from time series data, but not for the simultaneous modeling of activity and planetary signals.
6 Conclusions
In this work, we have investigated the properties and advantages of QP–GP models for the detection and characterization of exoplanet signals from RV data. For this purpose, we first generated synthetic RV data that contain stellar rotational signals. In order to do this, we modeled the surface of a rotating star based on different spot distributions and spot lifetimes, and generated the RV data from these models, taking sampling and instrumental precision into account. Based on these simulations and our results, we come to the following conclusions and recommendations:
The rotation period hyperparameter of the QP–GP is in agreement with the stellar rotation period;
The QPGP length scale hyperparameter correlates with the star’s spot lifetime, but there is no 1:1 connection between the absolute values. In addition, spot distribution and data quality can significantly affect the derived QP–GP length scale. Therefore, we advise against using the QP–GP length scale parameter as an estimate of the typical spot lifetime on the stellar surface;
To determine an unknown stellar rotation period with the QP–GP kernel, the 2D posterior distribution of the rotation period and the length scale hyperparameter provides crucial information. Constraining the length scale of the QP–GP in the case of a plateau of samples that do not favor any specific rotation period significantly increases the efficiency of detecting the correct value of the rotation period;
To derive precise and accurate planet parameters, we strongly recommend fitting for the stellar activity and not ignoring it, even if the stellar activity amplitude is small compared to the planetary amplitude;
For our simulations, we derived the best results in terms of accuracy and precision regarding the planetary parameters either when the QP–GP was constrained to the rotation period or when it was constrained to the rotation period and had a minimum length scale. However, constraining only the length scale resulted in less accurate planet parameters. This prior should therefore not be used for simultaneous fitting of planet and activity signals, but only for the detection of a rotation period from time series data.
Simulations like those presented in this work are essential for assessing the performance and properties of the models that are used for real data. Especially when it comes to nonparametric noise models such as GP regression, such analyses provide a better understanding of possible pitfalls and considerations that need to be taken into account. While the recommendations based on our astrophysically motivated simulations can certainly help us better understand GPs and the QP–GP in particular, every system can show individual characteristics, not all of which could be considered in the analysis presented in this work. Nevertheless, in a large majority of our simulations, we were able to correctly identify stellar rotation periods with the QP–GP, and we obtained better planet parameters by using it to model stellar activity.
Acknowledgements
All authors acknowledge support from the Deutsche Forschungsgemeinschaft (DFG) through the Research Unit FOR2544 “Blue Planets around Red Stars”, in particular project numbers RE 2694/41, RE 2694/81 and KU 3625/11. The authors thank Stefan Dreizler and Manuel Perger for fruitful discussions on Gaussian process regression for radial velocity data. Software: astropy (Astropy Collaboration 2018), dynesty (Speagle 2020), scipy (Virtanen et al. 2020), numpy (Oliphant 2006), matplotlib (Hunter 2007), pandas (The pandas development team 2020), seaborn (Waskom et al. 2020), radvel (Fulton et al. 2018), george (Ambikasaran et al. 2015), juliet (Espinoza et al. 2019).
Appendix A Conversion between l and α
The QP–GP kernel that we use in this study is parametrized in Juliet according to Espinoza et al. (2019) and is given as (A.1)
where σ_{GP} is the amplitude of the GP component given in parts per million (ppm) for photometric data or m s^{−1} for RV data, Γ is the amplitude of the GP sinesquared component and is dimensionless, α is the inverse length scale of the GP exponential component given in d^{−2}, P_{rot} the period of the GP’s component given in d, and τ is the time lag.
For the fitting of the above kernel, Juliet makes use of the python package george, in particular a multiplication of an exponentialsquared kernel and an exponentialsinesquared kernel^{3}. The exponentialsquared kernel is defined in george as exp(−r^{2}/2), where r corresponds to τ in the Juliet paper. The length scale in george is proportional to exp(−r^{2}/(2Θ))^{4}. Within Juliet, the inverse of this length scale Θ = 1/α, is defined, so the kernel being fitted by Juliet is exp(−α · r^{2/}2) and not exp(−α · r^{2}) as written in Eq. 8 of Espinoza et al. (2019). The conversion from the inverse length scale α to l is therefore given as (A.2)
For the convenience of the reader, we have transformed the parametrization of the GP inverse length scale α given in d^{−2} as provided within Juliet to the more intuitive GP length scale l given in days within this work.
Appendix B Model comparisons using the Bayesian evidence
In the main body of this work, we examined the connection of the hyperparameters of the QP–GP to the properties of the stellar activity and how the prior choice affects the modeling by the GP. Beyond that, the nested sampling algorithm we used for the fits allows us to investigate in more detail how the prior choice also affects the detection efficiency – in terms of both stellar activity signals and planetary signals.
Appendix B.1 When only stellar activity is present
First, we tested the GP models against a pure white noise model and a Kepler model, which corresponds to a nondetection and a false detection, respectively. We investigated whether the QP–GP model is the favored model for the stellar activity signal in all cases, as one would expect. Further, we considered the combination of a QP– GP component and a Keplerian component as a method for assessing the presence of additional signals in unexplored data sets, and whether the choice of the GP priors can facilitate false positives in doing so.
Fig. B.1 Singlemodel comparison of activityonly data. We show the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all ten investigated stellar activity configurations, with each configuration consisting of 100 synthetic RV data sets (a total of 1000 data sets). 
Appendix B.1.1 Singlemodel comparison
We compared the single models GP Prior I, GP Prior II, GP Prior III, GP Prior IV, K11, K22, and W, which are given in Table 3. The priors for the Keplerian models assume that there is some information from significant peaks in the periodogram, whose origin, however, whether planetary or due to activity, is not yet determined. With this comparison, we intend to investigate whether the Bayesian evidence is an additional reliable tool for distinguishing between stellar activity and planetary signals in cases of limited information regarding the signals in the data, for example when there is no known photometric rotation period or no compelling evidence from the stellar activity indicators.
In doing so, we investigated whether one model family (i.e., GP model, Keplerian model, or white noise model) performed better than another for a synthetic RV data set, independent of the chosen prior distributions. This means that, as soon as a GP model was better than all Κ models, the GP model was classified as the winning model for the simulation. This is important because the choice of prior affects the derived evidence, which must be taken into account when using Bayesian evidence to compare models. If two models of different classes performed similarly well (Δln𝒵 < 2.5) they were classified as indistinguishable. The results of this analysis are visualized in the pie chart in Fig. B.1.
For the majority of the data sets, about 78%, a QP– GPonly model has been the significantly favored model (Δln𝒵 > 5) while for about 5% it was at least moderately favored (Δln𝒵 > 2.5). In about 13% of the cases, the two best models of different class were indistinguishable. It is interesting that we find that for about 4% of the simulated stellar activity time series data sets, a Keplerian or white noise model has been favored for the activity signal by the logevidence metric. Most of these cases consist of moderately favored Keplerian models.
Almost all cases where a nonGP model was indistinguishable or performed better than the GP model for the activity signal belong to simulations where the spot lifetime was small compared to the simulated stellar rotation period. Figure B.2 shows the percentages of the winning models between the different investigated activity configurations. Short spot lifetimes with respect to the rotation period lead to correlated noise on shorter timescales, which can also be well described by a simple white noise model. This explains why many indistinguishable cases and all cases where the white noise model is favored occur for a spot lifetime of 11 d. Especially in the case of a random spot distribution with spot lifetimes smaller than the rotation period, a W model has often been indistinguishable compared to the QP–GP model.
Fig. B.2 Singlemodel comparison of activityonly data. Shown is a breakdown of the values from Fig. B.1 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 
In the case of two active longitudes, it has been mainly the Keplerian model that competed with the GP regarding the modeling of the stellar activity, resulting in a higher singledigit percentage of indistinguishable solutions and a few percent of a moderately favored, or in a few cases even significantly favored, Keplerian model with respect to the GP models. The fact that a Keplerian model has been indistinguishable, or in a few cases even been moderately favored, compared to a GP model for these activity cases shows that caution is advised when using model comparison: Although in this first experiment the evidence has shown to be reliable in the majority of investigated cases, there are still a few cases where the evidence prefers the wrong model. This highlights the importance of additional investigations, auxiliary data, and simulations such as those presented in this work to verify the nature of a signal, in addition to the Bayesian evidence. It is interesting to note that most of the cases where the Keplerian model has been favored were simulations where the l_{GP} versus P_{GP} diagram showed a bar shape. Since we used noneccentric Keplerians, this means that in cases of rather coherent stellar activity a sinusoidal fit can do a decent job at modeling stellar activity. In such cases the flexibility of the QP–GP may not be required.
Fig. B.3 Singlemodel comparison of activityonly data. Shown is the breakdown to the individual GP priors for the significant or moderately favored GP models from Fig. B.1. Note that this is only a qualitative comparison based on the highest evidence compared to the best nonGP model. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 
For the cases where the GP models were significantly or moderately favored over a Keplerian or white noise model, we investigated which GP prior actually performed best. In Fig. B.3 we present a quantitative breakdown of the winning GP models with respect to the activity properties. First, it can be seen that Priors III and IV make up the bulk of the models that performed better than the Keplerian or white noise models. Only for shorter spot lifetimes, the unconstrained GP Prior I performs similarly well. This is plausible because the correlated noise on shorter timescales associated with the shorter spot lifetimes can be better modeled by the more flexible GP model. This is also reflected in the generally better performance of the GP Prior III compared to the less flexible GP IV. Solely constraining the length scale hyperparameter as in the GP Prior II, on the other hand, seems to not have any advantage over the other priors and accounts for only a negligible proportion of the winning models.
Appendix B.1.2 Mixedmodel comparison
The Bayesian evidence is commonly used to justify whether adding an extra Keplerian is better than just fitting for white or red noise – in other words, whether a bona fide planet has been detected. Here we compare the QP–GPonly models against mixed models comprising a Keplerian and a QP–GP component (results summarized in Fig. B.4). The goal was to investigate whether certain prior choices favor the addition of an additional sinusoidal or Keplerian component by the Bayesian evidence, even though only activity signals are included in the data. In particular, we used the four different QP–GP priors motivated before and fitted them together with constrained Keplerian models whose priors are given in Table 3. In the first scenario, we assumed that auxiliary data did not provide any constraints on the rotation period; therefore, a wide uninformative prior for the GP was chosen and the planet hypothesis of a significant peak in the periodogram was checked by the constrained Keplerian (GP I + K) to the rotational signal. Another scenario we tested is whether it would be possible within the GP framework to distinguish a planet in a 1:1 or 1:2 spinorbit resonance, which motivates using the other three more constrained GP priors. It is not entirely clear how reliable the logevidence metric and the QP–GP kernel are in such complex reallife cases with discrete sampling and various instruments covering a wide wavelength range.
Fig. B.4 Mixedmodel comparison of activityonly data. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 
For all four QP–GP priors studied, we find a number of simulations where an additional Keplerian, respectively sinusoidal, component has been at least moderately favored. This is particularly the case when the simulated configuration has two active longitudes. For this configuration, the K11 model is highly significant when added to the QP–GP model, even if the QP–GP is not constrained to the fundamental period of 22 d. This shows that while the QP–GP is capable to also model harmonics of the rotation period, it is not very efficient in doing so, which is why an extra component is required by the Bayesian evidence. A similar conclusion was reached by Perger et al. (2021) based on the autocorrelation function, which motivated the authors to use a second periodic component that is added to the standard QP–GP resulting in their QPC. From our investigations, we find that an additional sinusoidal or Keplerian component was not as necessary if the QP–GP length scale was restricted to exclude small values, such that the GP does not include the posterior solutions from the plateau observed in the middle plot of Fig. 2. Nevertheless, our results regarding the harmonic of the rotation period at 11 d, and the need for an additional model component to model it, strongly favors the use of a GP in the form of the QPC (Perger et al. 2021) or the sum of multiple harmonic oscillator kernels as for example in Kossakowski et al. (2021) It shows that the standard QP–GP in the form that has been applied in this work lacks flexibility to model stellar activity in all its aspects.
If we leave the K11 model for the second harmonic aside, and concentrate on the K22 model, we still find a number of simulations where an additional Keplerian component at the rotation period is favored by the Bayesian evidence in addition to the QP–GP model. This might indicate that there exists a stable periodic component for certain stellar activity configurations, which can lead to better Bayesian evidence when an extra sinusoidal model fits this stable component while the QP–GP accounts simultaneously for any deviations from it. Our results do definitively show that the improvement of the evidence by the addition of a Keplerian model to a base QP–GP model is not a strong proof that a real planetary signal is present in the data. This has particular relevance for targets such as AD Leo (Kossakowski et al. 2022).
Appendix B.2 In the presence of stellar activity and Keplerian signals
Two of the obvious questions are (i) under which conditions Keplerian signals can successfully be recovered from the simulated activity data, and (ii) which assumptions for the QP–GP model provide the highest detection probability. To answer them, we explored three kinds of model fits to our grid of activityplusKeplerian data. In the first case, we assume GPonly models as the reference for pure activity and thus a nondetection of the Keplerian signal. In addition, we consider Kepleronly and combined Keplerian + GP models based on the GP priors discussed already for the activity only analysis. For the Keplerian components, we assumed a free eccentricity and uniform amplitude between 0 and 40 ms^{−1}, while allowing the period to vary in the range of plus and minus ten percent of the input period. An overview of the results is provided by the pie chart in Fig. B.5 and the detailed breakdowns into the simulated conditions and used priors in Fig. B.6 and Fig. B.7.
Fig. B.5 Model comparison on activity data with added Keplerian signals. Shown is the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all configurations. 
Fig. B.6 Model comparison on activity data with added Keplerian signals. Shown is the breakdown of the values from Fig. B.5 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 
Fig. B.7 Model comparison on activity data with added Keplerian signals. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 
First, it is apparent that a simple Keplerian model was never significant compared to the GPonly, or combined Keplerian+GP models, even in the case of an amplitude ratio in favor of the injected planetary signal. We therefore discuss here only the direct comparison of GPonly versus combined Keplerian + GP models. In only 68% of the cases, combined Keplerian + GP models performed significantly better than the pure activity models and thus provided a clear detection of the Keplerian signal. For almost 15% of the cases the combined models are indistinguishable from the best GPonly models, and in 11% of the cases a GPonly model even wins.
Comparing the results for the different priors in Fig. B.7, most of the significant detections of the Keplerian signal are found in combination with the GPs that had the rotation period constrained (GP Prior III), or both, the period and length scale, constrained (GP Prior IV). A sole constraint on the length scale (GP Prior II), however, performed apparently worse, also compared to the unconstrained GP Prior I. These results do not differ between the two simulated spot patterns and are in good agreement with our findings for the activity only analysis in Sect. B.1.
With respect to the different configurations of simulated activity and added Kepler signals, the amplitude of the simulated Kepler signal has, as expected, the largest influence on the proportion of false positives, since a large proportion of the models in which the Kepler signal could not be clearly detected are associated with an amplitude ratio between Kepler and activity of 0.2 or 0.5. The secondbiggest factor seems to be the spot lifetime, since a larger number of nondetections is obtained when the lifetime is 22 days. This is plausible since a shorter lifetime of active regions means a less coherent activity signal and therefore more random dispersion of the RVs. In contrast, the simulated spot pattern does not make a strong difference, even though the random distribution would be expected to also lead to a less coherent activity signal. The simulated period of the Keplerian signal seemed to have no impact on the detection probability.
Appendix C Detailed results of Sect. 4
Fig. C.1 MSA (ξ) of the planetary orbital periods based on 100 ensembles, distinguishing between three differently injected orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d. 
Fig. C.3 Same as Fig. C.1 but instead of the MSA, the median of the derived eccentrics for the 100 ensembles is shown. 
Fig. C.4 Median of the absolute standard deviation distance between the injected and retrieved planetary orbital periods based on the median posteriors of 100 ensembles each. Each plot distinguishes between three different orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The error bars show the 0.16 and 0.84 quartiles. The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d. 
References
 Aigrain, S., Pont, F., & Zucker, S. 2012, MNRAS, 419, 3147 [NASA ADS] [CrossRef] [Google Scholar]
 Ambikasaran, S., ForemanMackey, D., Greengard, L., Hogg, D. W., & O’Neil, M. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 252 [Google Scholar]
 Andersen, J. M., & Korhonen, H. 2015, MNRAS, 448, 3053 [NASA ADS] [CrossRef] [Google Scholar]
 AngladaEscudé, G., & Tuomi, M. 2015, Science, 347, 1080 [CrossRef] [Google Scholar]
 AngladaEscudé, G., LópezMorales, M., & Chambers, J. E. 2010, ApJ, 709, 168 [CrossRef] [Google Scholar]
 Angus, R., Morton, T., Aigrain, S., ForemanMackey, D., & Rajpaul, V. 2018, MNRAS, 474, 2094 [Google Scholar]
 Astropy Collaboration (PriceWhelan, A. M., et al.) 2018, AJ, 156, 123 [Google Scholar]
 Baluev, R. V. 2013, MNRAS, 429, 2052 [Google Scholar]
 Basri, G., & Shah, R. 2020, ApJ, 901, 14 [Google Scholar]
 Bauer, F. F., Reiners, A., Beeck, B., & Jeffers, S. V. 2018, A&A, 610, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Bluhm, P., Pallé, E., Molaverdikhani, K., et al. 2021, A&A, 650, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Boisse, I., Bouchy, F., Hébrard, G., et al. 2011, A&A, 528, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Dumusque, X. 2016, A&A, 593, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Espinoza, N., Kossakowski, D., & Brahm, R. 2019, MNRAS, 490, 2262 [Google Scholar]
 Espinoza, N., Brahm, R., Henning, T., et al. 2020, MNRAS, 491, 2982 [NASA ADS] [Google Scholar]
 Feroz, F., & Skilling, J. 2013, AIP Conf. Ser., 1553, 106 [NASA ADS] [Google Scholar]
 Feroz, F., Hobson, M. P., & Bridges, M. 2009, MNRAS, 398, 1601 [NASA ADS] [CrossRef] [Google Scholar]
 ForemanMackey, D., Agol, E., Ambikasaran, S., & Angus, R. 2017, AJ, 154, 220 [Google Scholar]
 ForgácsDajka, E., Dobos, L., & Ballai, I. 2021, A&A, 653, A50 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Fulton, B. J., Petigura, E. A., Blunt, S., & Sinukoff, E. 2018, PASP, 130, 044504 [Google Scholar]
 Gibson, S. R., Howard, A. W., Rider, K., et al. 2020, SPIE Conf. Ser., 11447, 1144742 [NASA ADS] [Google Scholar]
 Gupta, A. F., Wright, J. T., Robertson, P., et al. 2021, AJ, 161, 130 [NASA ADS] [CrossRef] [Google Scholar]
 Hatzes, A. P. 2016, A&A, 585, A144 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517 [Google Scholar]
 Herrero, E., Ribas, I., Jordi, C., et al. 2016, A&A, 586, A131 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Hunter, J. D. 2007, Comput. Sci. Eng., 9, 90 [Google Scholar]
 Husser, T.O., Wendevon Berg, S., Dreizler, S., et al. 2013, A&A, 553, A6 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Jeffers, S. V., & Keller, C. U. 2009, AIP Conf. Ser., 1094, 664 [NASA ADS] [CrossRef] [Google Scholar]
 Jeffers, S. V., Schöfer, P., Lamert, A., et al. 2018, A&A, 614, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Jeffers, S. V., Barnes, J. R., Scheofer, P., et al. 2022, A&A, 663, A27 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Jeffreys, H. 1946, Proc. Roy. Soc. Lond. Ser. A, 186, 453 [Google Scholar]
 Kitchenham, B. A., Pickard, L. M., MacDonell, S. G., & Shepperd, M. J. 2001, IEE Proc. Softw., 148, 81 [CrossRef] [Google Scholar]
 Kossakowski, D., Kemmer, J., Bluhm, P., et al. 2021, A&A, 656, A124 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Kossakowski, D., Kürster, M., Henning, T., et al. 2022, A&A 666, A143 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20 [NASA ADS] [Google Scholar]
 Morin, J., Donati, J. F., Petit, P., et al. 2010, MNRAS, 407, 2269 [Google Scholar]
 Morley, S. K., Brito, T. V., & Welling, D. T. 2018, Space Weather, 16, 69 [NASA ADS] [CrossRef] [Google Scholar]
 Nava, C., LópezMorales, M., Haywood, R. D., & Giles, H. A. C. 2020, AJ, 159, 23 [Google Scholar]
 Newton, E. R., Mondrik, N., Irwin, J., Winters, J. G., & Charbonneau, D. 2018, AJ, 156, 217 [Google Scholar]
 Nicholson, B. A., & Aigrain, S. 2022, MNRAS, 515, 5251 [NASA ADS] [CrossRef] [Google Scholar]
 Oliphant, T. E. 2006, A Guide to NumPy, 1 (Trelgol Publishing USA) [Google Scholar]
 Pepe, F., Molaro, P., Cristiani, S., et al. 2014, Astron. Nachr., 335, 8 [Google Scholar]
 Perger, M., AngladaEscudé, G., Ribas, I., et al. 2021, A&A, 645, A58 [EDP Sciences] [Google Scholar]
 Petersburg, R. R., Ong, J. M. J., Zhao, L. L., et al. 2020, AJ, 159, 187 [NASA ADS] [CrossRef] [Google Scholar]
 Petrovay, K., & van DrielGesztelyi, L. 1997, Sol. Phys., 176, 249 [Google Scholar]
 Pinamonti, M., Sozzetti, A., Maldonado, J., et al. 2022, A&A 664, A65 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Popinchalk, M., Faherty, J. K., Kiman, R., et al. 2021, ApJ, 916, 77 [NASA ADS] [CrossRef] [Google Scholar]
 Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Quirrenbach, A., Amado, P. J., Caballero, J. A., et al. 2014, in Proc. SPIE, 9147, 91471F [Google Scholar]
 Rajpaul, V., Aigrain, S., Osborne, M. A., Reece, S., & Roberts, S. 2015, MNRAS, 452, 2269 [Google Scholar]
 Reiners, A., Shulyak, D., AngladaEscudé, G., et al. 2013, A&A, 552, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Ribas, I., Tuomi, M., Reiners, A., et al. 2018, Nature, 563, 365 [Google Scholar]
 Robertson, P., Mahadevan, S., Endl, M., & Roy, A. 2014, Science, 345, 440 [Google Scholar]
 RodríguezLópez, C. 2019, Front. Astron. Space Sci., 6, 76 [Google Scholar]
 Saar, S. H., Butler, R. P., & Marcy, G. W. 1998, ApJ, 498, L153 [NASA ADS] [CrossRef] [Google Scholar]
 Seifahrt, A., Stürmer, J., Bean, J. L., & Schwab, C. 2018, SPIE Conf. Ser., 10702, 107026D [NASA ADS] [Google Scholar]
 Speagle, J. S. 2020, MNRAS, 493, 3132 [Google Scholar]
 Stock, S., Kemmer, J., Reffert, S., et al. 2020a, A&A, 636, A119 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Stock, S., Nagel, E., Kemmer, J., et al. 2020b, A&A, 643, A112 [EDP Sciences] [Google Scholar]
 The pandas development team 2020, https://zenodo.org/record/7979740 [Google Scholar]
 Tofallis, C. 2015, J. Oper. Res. Soc., 66, 1352 [CrossRef] [Google Scholar]
 Trotta, R. 2008, Contemp. Phys., 49, 71 [Google Scholar]
 van DrielGesztelyi, L., & Green, L. M. 2015, Living Rev. Solar Phys., 12, 1 [NASA ADS] [CrossRef] [Google Scholar]
 Virtanen, P., Gommers, R., Oliphant Travis E., et al. 2020, Nat. Methods, 17, 261 [NASA ADS] [CrossRef] [Google Scholar]
 Waskom, M., Botvinnik, O., Ostblom, J., et al. 2020, Mwaskom/seaborn: v0.10.1 [Google Scholar]
 Zhang, L., Mursula, K., Usoskin, I., & Wang, H. 2011, A&A, 529, A23 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Zhao, L. L., Fischer, D. A., Ford, E. B., et al. 2022, AJ, 163, 171 [NASA ADS] [CrossRef] [Google Scholar]
See kernel definitions here: https://george.readthedocs.io/en/latest/user/kernels/
See how k_{1} is defined in the george tutorial here: https://george.readthedocs.io/en/latest/tutorials/hyper/
All Tables
MSA of the derived orbital periods and RV semiamplitudes and median of the derived eccentricities for different stellar activity configurations or all simulations.
Median value of the standard deviation distance between the derived and simulated orbital periods, RV semiamplitudes, and eccentricities for different stellar activity configurations or all simulations.
All Figures
Fig. 1 Four simulated example data sets. The black points represent the modeled StarSim RVs, and the red points represent the simulated observations with their error bars. Top left: random spot distribution, t_{spot} = P_{rot} = 22 days. Top right: two active longitudes, t_{spot} = P_{rot} = 22 days. Bottom left: random spot distribution, t_{spot} = 5P_{rot} = 110. Bottom right: two active longitudes, t_{spot} = 5P_{rott} = 110 days. 

In the text 
Fig. 2 l versus P diagrams showing the posterior samples of a single GP fit with uninformed priors to one example simulation. The horizontal dashed line indicates the simulated spot lifetime, and the vertical dashed line indicates the simulated rotation period. Left: posterior samples fitted to a star with random spot distribution, P_{rot} = 22 days, and P_{life} = 110 days. A wellbehaved peak is found where both the rotation period and the GP length scale are well constrained. Middle: posterior samples fitted to a star with random spot distribution, P_{rot} = 22 days, and P_{life} = 22 days. A posterior plateau with a short length scale spans the entire posterior volume. The posterior of the rotation period is not well constrained due to this plateau; however, some posterior samples form a “triangle” shape at the rotation period. Right: posterior samples fitted to a star with two active longitudes, P_{rot} = 22 days, and P_{life} = 11 days. The spots that appear on a similar longitude mimic a coherent signal whose length scale cannot be constrained by the GP, resulting in an “Ishaped” posterior distribution that reaches the upper boundary of the posterior volume. 

In the text 
Fig. 3 Violin plots showing the distribution of the derived median rotation periods and length scale hyperparameters of the QP–GP based on 100 simulations per spot lifetime. The blue distribution shows the results obtained from using GP Prior I (wide prior), and the orange distribution shows the results obtained by using GP Prior II (length scale constrained). For the period, the dashed line shows the simulated rotation period and the dashdotted line the first harmonic at P_{rot}/2. For the length scale, we show the simulated spot lifetime divided by the determined GP length scale so that the dashed line marks a 1:1 correspondence and the dashdotted line a factor of 4 between the two. Top: for a random spot distribution on the stellar surface. Bottom: with two active longitudes on the stellar surface. 

In the text 
Fig. 4 Relations of the derived GP parameters to the RV uncertainty. Top: derived GP length scale as a function of the ratio of the standard deviation of the stellar activity signal over the RV uncertainty or jitter. Note that only GPonly fits that had at least moderate significance (with ∆ ln 𝒵 > 2.5) compared to a white noise model that includes only jitter have been used. The curves were computed by applying a median filter with a window size incorporating 50 simulations. The red curve assumes a fixed jitter of σ_{jitter} = 0 m s^{−1} and changes the ratio σ_{model}/σ_{RV}, the black curve assumes a fixed jitter of σ_{jitter} = 1.2 m s^{−1} and changes the ratio σ_{model}/σ_{RV}, and the blue curve assumes a fixed uncertainty of σ_{RV} = 0.3 m s^{−1} and changes the ratio σ_{model}/σ_{jitter}. The colored shaded area shows the appropriate interquartile range. The vertical dotted line represents a ratio of 1. The horizontal dashed line represents the simulated spot lifetime. Bottom: analogous to the top plot, but for the GP rotation period. 

In the text 
Fig. B.1 Singlemodel comparison of activityonly data. We show the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all ten investigated stellar activity configurations, with each configuration consisting of 100 synthetic RV data sets (a total of 1000 data sets). 

In the text 
Fig. B.2 Singlemodel comparison of activityonly data. Shown is a breakdown of the values from Fig. B.1 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 

In the text 
Fig. B.3 Singlemodel comparison of activityonly data. Shown is the breakdown to the individual GP priors for the significant or moderately favored GP models from Fig. B.1. Note that this is only a qualitative comparison based on the highest evidence compared to the best nonGP model. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 

In the text 
Fig. B.4 Mixedmodel comparison of activityonly data. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 

In the text 
Fig. B.5 Model comparison on activity data with added Keplerian signals. Shown is the percentage of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models based on all configurations. 

In the text 
Fig. B.6 Model comparison on activity data with added Keplerian signals. Shown is the breakdown of the values from Fig. B.5 into the different activity configurations. The percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 

In the text 
Fig. B.7 Model comparison on activity data with added Keplerian signals. Each panel shows the fraction of significant (sig.), moderately favored (mod.), and indistinguishable (indist.) models for one of our GP prior settings. In each plot, the percentages, also indicated by the point sizes, sum up to 100% for each configuration parameter category highlighted by the grayshaded areas. 

In the text 
Fig. C.1 MSA (ξ) of the planetary orbital periods based on 100 ensembles, distinguishing between three differently injected orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d. 

In the text 
Fig. C.2 Same as Fig. C.1 but for the planetary RV semiamplitudes. 

In the text 
Fig. C.3 Same as Fig. C.1 but instead of the MSA, the median of the derived eccentrics for the 100 ensembles is shown. 

In the text 
Fig. C.4 Median of the absolute standard deviation distance between the injected and retrieved planetary orbital periods based on the median posteriors of 100 ensembles each. Each plot distinguishes between three different orbital periods (separated by the vertical dashed gray lines) and between the five differently applied models (distinguishable by the different color of the dots). The error bars show the 0.16 and 0.84 quartiles. The ratio of the injected planet amplitude to the modeled stellar rotational signal increases in favor of the planet signal from the left columns to right columns. The top eight plots show the results for a random spot distribution, and the lower eight plots show the results for two active longitudes on the stellar surface. These two cases are further separated into the top row, which shows the results for a simulated spot lifetime of 22 d, and the bottom row, which shows the results for a simulated spot lifetime of 110 d. 

In the text 
Fig. C.5 Same as Fig. C.4 but for the planetary RV semiamplitudes. 

In the text 
Fig. C.6 Same as Fig. C.4 but for the planetary RV semiamplitudes. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.