Accounting for stellar activity signals in radial-velocity data by using change point detection techniques

U. Simola; A. Bonfanti; X. Dumusque; J. Cisewski-Kehe; S. Kaski; J. Corander

doi:10.1051/0004-6361/202142941

Home

All issues

Volume 664 (August 2022)

A&A, 664 (2022) A127

Full HTML

Open Access

Issue		A&A Volume 664, August 2022


Article Number		A127
Number of page(s)		29
Section		Planets and planetary systems
DOI		https://doi.org/10.1051/0004-6361/202142941
Published online		23 August 2022

A&A 664, A127 (2022)

Accounting for stellar activity signals in radial-velocity data by using change point detection techniques^★

U. Simola¹, A. Bonfanti², X. Dumusque³^,★★, J. Cisewski-Kehe⁴, S. Kaski⁵ and J. Corander¹^,6

¹ Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
e-mail: umberto.simola@gmail.com
² Space Research Institute, Austrian Academy of Sciences, Schmiedlstrasse 6, 8042 Graz, Austria
e-mail: andrea.bonfanti@oeaw.ac.at
³ Observatoire de Genève, Université de Genève, 51 ch. des Maillettes, 1290 Versoix, Switzerland
⁴ Department of Statistics, University of Wisconsin – Madison, USA
⁵ Department of Computer Science, Aalto University, Espoo, Finland
⁶ Department of Biostatistics, University of Oslo, Oslo, Norway

Received: 17 December 2021
Accepted: 20 May 2022

Abstract

Context. Active regions on the photosphere of a star have been the major obstacle for detecting Earth-like exoplanets using the radial velocity (RV) method. A commonly employed solution for addressing stellar activity is to assume a linear relationship between the RV observations and the activity indicators along the entire time series, and then remove the estimated contribution of activity from the variation in RV data (overall correction method). However, since active regions evolve on the photosphere over time, correlations between the RV observations and the activity indicators will correspondingly be anisotropic.

Aims. We present an approach that recognizes the RV locations where the correlations between the RV and the activity indicators significantly change in order to better account for variations in RV caused by stellar activity.

Methods. The proposed approach uses a general family of statistical breakpoint methods, often referred to as change point detection (CPD) algorithms; several implementations of which are available in R and python. A thorough comparison is made between the breakpoint-based approach and the overall correction method. To ensure wide representativity, we use measurements from real stars that have different levels of stellar activity and whose spectra have different signal-to-noise ratios.

Results. When the corrections for stellar activity are applied separately to each temporal segment identified by the breakpoint method, the corresponding residuals in the RV time series are typically much smaller than those obtained by the overall correction method. Consequently, the generalized Lomb–Scargle periodogram contains a smaller number of peaks caused by active regions. The CPD algorithm is particularly effective when focusing on active stars with long time series, such as α Cen B. In that case, we demonstrate that the breakpoint method improves the detection limit of exoplanets by 74% on average with respect to the overall correction method.

Conclusions. CPD algorithms provide a useful statistical framework for estimating the presence of change points in a time series. Since the process underlying the RV measurements generates anisotropic data by its intrinsic properties, it is natural to use CPD to obtain cleaner signals from RV data. We anticipate that the improved exoplanet detection limit may lead to a widespread adoption of such an approach. Our test on the HD 192310 planetary system is encouraging, as we confirm the presence of the two hosted exoplanets and we determine orbital parameters consistent with the literature, also providing much more precise estimates for HD 192310 c.

Key words: techniques: radial velocities / methods: data analysis / stars: activity / planetary systems

^★

Based on observations collected at the La Silla Paranal Observatory, ESO (Chile), with the HARPS spectrograph at the 3.6-m telescope.

^★★

Branco Weiss Fellow–Society in Science (http://www.society-in-science.org)

© U. Simola et al. 2022

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe-to-Open model. Subscribe to A&A to support open access publication.

1 Introduction

The radial velocity (RV) method (e.g., Mayor & Queloz 1995; Lovis & Fischer 2010; Hatzes 2016) is one of the most successful techniques for detecting extrasolar planets (e.g., Fischer et al. 2016). Nevertheless, perturbations in the RV data caused by different types of stellar signals such as stellar oscillations, granulations, and the presence of active photospheric regions continuously plague the exoplanet community (e.g., Saar & Donahue 1997; Queloz et al. 2001; Lindegren & Dravins 2003; Desort et al. 2007; Meunier et al. 2010, 2017; Dumusque et al. 2011a; Dumusque 2016, 2018). In particular, stellar activity in the form of spots and faculae represents the main limitation in the full characterization of Earth-like exoplanets in RV surveys (e.g., Saar & Donahue 1997; Meunier et al. 2010; Dumusque et al. 2014; Dumusque 2018). New generation spectrographs such as the EXtreme PREcision Spectrometer (EXPRES, Jurgenson et al. 2016) and the Echelle SPectrograph for Rocky Exoplanet and Stable Spectroscopic Observations (ESPRESSO, Pepe et al. 2014) have recently been constructed to solve the issue of instrumental signal, and therefore to provide data only affected by stellar and planetary signals. However, disentangling the perturbations due to stellar activity from the RV variations caused by small size exoplanets remains the most crucial challenge (e.g., Davis et al. 2017; Dumusque 2018; Reiners et al. 2018; Simola et al. 2019), as the RV variations induced by active regions are an order of magnitude larger than those expected from Earth-like exoplanets (e.g., Dumusque 2018).

Efforts have been made to model the signals caused by different sources of stellar variability within the RV time series (e.g., Tuomi et al. 2013; Rajpaul et al. 2015; Davis et al. 2017; Simola et al. 2019). Several solutions have been successfully proposed in order to deal with stellar oscillations and granulation phenomena, such as: calculating stellar evolution sequences (e.g., Christensen-Dalsgaard et al. 1995); fitting a two-level structure tracking (TST) algorithm based on a two-level representation of granulation (Del Moro 2004); using daytime spectra of the Sun in order to measure the solar oscillations (e.g., Kjeldsen et al. 2008; Lefebvre et al. 2008); and characterizing the statistical properties of magnetic activity cycles focusing on HARPS observations (e.g., Pepe et al. 2011; Dumusque et al. 2011b). However, properly modeling the other sources of stellar activity remains extremely challenging (e.g., Nava et al. 2019). In the present work, we deal with the cross correlation function (CCF) that is derived from the stellar spectrum (e.g., Hatzes 1996; Hatzes & Cochran 2000; Fiorenzano et al. 2005). As it is well known, the CCF barycenter estimates the RV of the star. The asymmetry and the full width at half maximum (FWHM) of the CCF give a strong indication for stellar activity, meaning that variations in RV are caused by active regions rather than by an exoplanet (e.g., Hatzes 1996; Queloz et al. 2001; Boisse et al. 2011; Figueira et al. 2013; Simola et al. 2019). Several solutions have been successfully proposed for mitigating stellar activity perturbations when working with RV measurements, including: decorrelating RV data against activity indicators such as log ${R^{'}}_{HK}$ ${{R'}_{{\rm{HK}}}}$ (e.g., Wilson 1968; Noyes et al. 1984) or Hα (e.g., Robertson et al. 2014); modeling stellar activity by fitting Gaussian processes (GPs, Rasmussen & Williams 2005; Haywood et al. 2014; Rajpaul et al. 2015); or moving averages (e.g., Tuomi et al. 2013) to the RV data. A common statistic employed for identifying changes in the shape of the CCF is the bisector span (e.g., Hatzes 1996; Queloz et al. 2001).

As already pointed out, disentangling the signal caused by stellar active regions and the signal due to an Earth-like planet is extremely challenging, but some differences in the characteristics of the two signals may be of help. Active regions produced by spots and faculae evolve on the star’s photosphere and can generate RV variations that spread from a few days to several weeks, depending on the rotation period of the star and the intensity of the magnetic cycle (e.g., Noyes et al. 1984; Dumusque et al. 2014; Davis et al. 2017; Dumusque 2018; Nava et al. 2019). While exoplanets produce a Doppler-shift on the CCF without modifying its shape or its width, active regions produce variations in both the asymmetry and the FWHM of the CCF, whose effect is a variation in the barycenter of the CCF, and therefore in the estimated median of the RV CCF (see e.g., Hatzes 1996; Queloz et al. 2001; Boisse et al. 2011; Figueira et al. 2013; Dumusque et al. 2017; Simola et al. 2019; Thompson et al. 2020), hereafter indicated as RV. We refer the reader to Table 1 for the main RV-related symbols we adopt throughout paper. Moreover, while an exoplanet produces a persistent Keplerian signal, the signal produced by active regions is not persistent as it waxes, wanes, and changes as a function of time (e.g., Fischer et al. 2016; Davis et al. 2017; Dumusque 2018; Thompson et al. 2020).

By using the stellar activity parameters estimated from the study of the CCF, a linear model is often proposed in order to decorrelate the RV data against the RV variations due to stellar activity (e.g., Dumusque et al. 2017; Feng et al. 2017a; Simola et al. 2019). Rather than using a normal fit to the CCF and then calculating the bisector span, Simola et al. (2019) used a skew normal (SN) fit to the CCF. The SN distribution allows a parameter to be specified, hereafter γ, which describes the asymmetry of the CCF (e.g., Simola et al. 2019; Adcock & Azzalini 2020).

Simola et al. (2019) suggest using the median $\bar{R V}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^p$ as the barycenter of the SN fit to the CCF.

Given the benefits of fitting an SN to the CCF as emphasized by Simola et al. (2019), the model we employed in the analyses of this work is defined as follows: $R V_{activity}^{*} β_{0} + β_{1} A + β_{2} γ + β_{3} {FWHM}_{SN + ϵ},$ $RV_{{\rm{activity}}}^*{\beta _0} + {\beta _1}A + {\beta _2}\gamma + {\beta _3}{\rm{FWH}}{{\rm{M}}_{{\rm{SN}}\,{\rm{ + }}\,}},$ (1)

where β₀ is the coefficient corresponding to the intercept, A is the contrast parameter of the CCF (Fig. 2 in Dumusque et al. 2014), γ is the aforementioned CCF asymmetry parameter, FWHM_sn is the FWHM of the CCF fitted with the SN, and ϵ is the random error having a multivariate normal distribution with vector of means equal to 0 and variance-covariance matrix equal to σ²I (with I being the identity matrix).

The temporal evolution of active regions is reflected by the change of the activity indicators obtained from the CCF (i.e., the already defined A, γ, and FWHM_SN) and depends on the rotation period of the star and on its magnetic cycle (e.g., DeWarf et al. 2010; Dumusque et al. 2011b; Borgniet et al. 2015). It seems therefore reasonable to assume that stellar activity is not stationary, but rather “piecewise stationary”. By the term “piecewise stationary”, we mean that the stellar activity does not change significantly within certain properly selected temporal segments. If we were able to detect the bounds of those segments, we could properly split the RV time series and correct for stellar activity by applying Eq. (1) to each segment. This represents the core of the change point detection (CPD) methods to be compared with the overall correction (oc) method, where the correction model based on Eq. (1) is applied over the entire RV time series.

In this paper, we propose the breakpoint (bp) method using a CPD technique to correct for variations within an RV time series that are induced by active stellar regions. The bp method is compared to the oc method using four different stars, some of which have known exoplanets. We demonstrate that the bp method is better able to correct the RV time series variations, suggesting that the class of CPD methods may be helpful for detecting low-RV planetary signals such as the ones induced by Earth-like exoplanets.

The paper is organized as follows. In Sect. 2, we introduce the CPD methods, which constitute the statistical framework we used to perform our analyses. In Sect. 3, we compare the performances of the CPD method in use and the oc method, both using real observational data and developing a proper simulation study to quantify the threshold of detection of exoplanets. The discussion of the results and our conclusions are outlined in Sects. 4 and 5, respectively.

Table 1

Legend of the RV-related symbols adopted in this paper.

2 Change Point Detection Methods

CPD methods are widely used when the goal is to find changes and variations in a time series (e.g., Truong et al. 2020). The presence of change points in a time series is a strong indication that the data generating process has changed (e.g., van den Burg & Williams 2020). The first CPD method (the so-called “intercept-only” method) was originally introduced by Page (1954) in order to identify changes in the mean of an industrial quality control variable. CPD methods have since become a reliable and widely used solution in bioinformatics (e.g., Guédon 2013; Hocking et al. 2013; Truong et al. 2018), climatology (e.g., Reeves et al. 2007; Verbesselt et al. 2010; Maidstone 2016), financial analyses (e.g., Lavielle et al. 2006; Frick et al. 2014), medicine (e.g., Liu et al. 2018), network data traffic analysis (e.g., Lévy-Leduc 2009; Lung-Yut-Fong et al. 2011), signal processing analysis (e.g., Lavielle & Teyssiere 2007; Jandhyala et al. 2013; Haynes et al. 2017), and speech processing (e.g., Angelosante & Giannakis 2012).

Following Truong et al. (2020), a CPD method may be classified as either online or offline. The online methods are used when seeking real-time changes in the data generating process (e.g., Adams & MacKay 2007; Sahki et al. 2018). Instead, the so-called offline (referred to also as “a posteriori”) methods are based on the CPD algorithms designed to estimate change points in the generative process of a time series when the collecting data process is over (e.g., Truong et al. 2020; van den Burg & Williams 2020). If only a single-parameter change is monitored, then we talk about univariate processes and univariate CPD methods being employed; otherwise we deal with multivariate methods (e.g., Aminikhanghahi & Cook 2017).

2.1 The CPD Statistical Framework

In order to introduce the CPD methods, we start by defining the object of interest as an univariate time series $y = y_{1}, \dots, = {y_{i}}_{i = 1}^{T}$ $y = {y_1}, \ldots , = \left\{ {{y_i}} \right\}_{i = 1}^T$ made of T data points. The y time series is assumed to be “piece-wise stationary” (e.g., Chen & Gupta 2011; Truong et al. 2020; van den Burg & Williams 2020), which means that the temporal behavior of y changes at D different and (generally) unknown locations. The goal of the CPD methods is to estimate the set of indices l corresponding to the D locations, l = {l₁, …, l_D}, which delimit different stationary regions; by convention l₀ ≡ 1 and l_D+1 ≡ T.

From a statistical standpoint, CPD methods fall into the model selection problem, because the overall goal of estimating the vector l of change point locations is equivalent to retrieving the best segmentation of the time series among all the possibilities (e.g., Chen & Gupta 2011; Truong et al. 2020; van den Burg & Williams 2020).

In order to retrieve the best possible segmentation, CPD methods rely on the maximization of the penalized log-likelihood function ℒ_P(D; y), which is defined as: $\begin{array}{l} L_{P} (D; y) & = L (D; y) - λ P (T) \\ = \sum_{k = 1}^{D + 1} \log L (y_{[l_{k - 1} : l_{k} - 1]}) - λ P (T), \end{array}$ $\matrix{ {{L_P}\left( {D;y} \right)} \hfill & { = L\left( {D;y} \right) - \lambda P\left( T \right)} \hfill \cr {} \hfill & { = \sum\limits_{k = 1}^{D + 1} {\log \,L\left( {{y_{\left[ {{l_{k - 1}}:{l_k} - 1} \right]}}} \right) - \lambda P\left( T \right)} ,} \hfill \cr }$ (2)

where k is the index used to identify the D + 1 segments of y bounded by l₀, …, l_D+1, log $l_{0}, \dots, l_{D + 1}, \log L (y_{[l_{k - 1} : l_{k - 1}]})$ ${l_0}, \ldots ,{l_{D + 1}},\,\,\log L\left( {{y_{\left[ {{l_{k - 1}}:{l_{k - 1}}} \right]}}} \right)$ is the log-likelihood function evaluated at the kth segment, λ is a positive pre-selected constant, and P(T) is a penalty function that may be added to balance out the goodness-of-fit term given by the log-likelihood function $\begin{matrix} \hat{L} (D : y) & \equiv & \sum_{k = 1}^{D + 1} \log L (y_{[l_{k - 1} : l_{k - 1}]}) \end{matrix}$ $\matrix{ {\hat L\left( {D:y} \right)} & \equiv & {\sum\nolimits_{k = 1}^{D + 1} {\,\log \,L\left( {{y_{\left[ {{l_{k - 1}}:{l_{k - 1}}} \right]}}} \right)} } \cr }$ (e.g., Zeileis et al. 2002; Bai & Perron 2003; Truong et al. 2020). After evaluating the $\hat{D}$ ${\hat D}$ value for which ℒ(D; y) is maximum (i.e., $L (\hat{D}; y) = \hat{L} (y) = \max_{D} L (D; y), λ P (T)$ $L\left( {\hat D;\,y} \right)\, = \,\hat L\left( y \right)\, = \,{\max _D}L\left( {D;y} \right),\,\,\lambda P\left( T \right)$ (which is, in general, a function of $\hat{L} (\hat{D}; y)$ ${L_{\cal P}}\left( {\hat D;\,y} \right)$ , see e.g., Eq. (5)) is subtracted from $L (\hat{D}; y)$ $L\left( {\hat D;\,y} \right)$ to obtain $L_{P} (\hat{D}; y)$ ${L_P}\left( {\hat D;\,y} \right)$ . The best partition $\hat{l} = {{\hat{l}}_{k}}$ $\hat l = \left\{ {{{\hat l}_k}} \right\}$ for that given $\hat{D}$ $\hat D$ is the one corresponding to the maximum value of the penalized log-likelihood function, that is ${\hat{L}}_{P} = \max_{l} L_{P} (\hat{D}, y_{{l}}) .$ ${{\hat L}_P} = {\max _l}\,{L_P}\left( {\hat D,{y_{\left\{ l \right\}}}} \right).$ . In other words, the different ${\hat{L}}_{P} (\hat{D}, y_{{l}})$ ${{\hat L}_P}\left( {\hat D,{y_{\left\{ l \right\}}}} \right)$ values are used to address the model selection problem (i.e., the choice of the best segmentation), and ${\hat{L}}_{P}$ ${{\hat L}_P}$ identifies the optimal partition of the y time series made of $\hat{D} + 1$ $\hat D + 1$ segments. If the total number of change points is assigned a priori and only their locations along y is unknown, no penalty is added in Eq. (2).

2.2 The Breakpoint Method

Given the multiple sources of stellar activity, the most adequate methods to treat the changes in RV time series are the offline multivariate CPD methods, which allow a linear relationship to be expressed between the response variable y and a set of covariates (which may be arranged in a matrix, X), as the one defined in Eq. (1). Using those methods, the set of locations l = l₁, …, l_D is estimated not only by evaluating the changes in the mean and in the variance of y, but also by considering the relation between y and X (e.g., Hackl & Westlund 1989; Bai 1997a,b; Bai & Perron 1998, 2003; Zeileis et al. 2002, 2003). In the analyses presented in the following Sections, we use the CPD method proposed by Bai & Perron (2003), known as the breakpoint (bp) method.

In order to introduce the bp method, we start by providing the definition of the multivariate linear regression model made of p covariates for the vector of the observations y, which generalizes the model proposed in Eq. (1). In matrix form, this is: $y = X β + ϵ = β_{0} 1 + β_{1} X_{1} + \dots + β_{p} X_{p} + ϵ,$ $y = X\beta + = {\beta _0}1 + {\beta _1}{X_1} + \cdots + {\beta _p}{X_p} + ,$ (3)

where X is the matrix of covariates, β is the vector of parameters to be determined according to the CPD method in use¹, and ϵ is the already defined random error distributed according to a multivariate normal distribution having vector of means equal to 0 and variance-covariance matrix equal to σ²I. Assuming the whole time series y as stationary implies that all the coefficients β = (β₀, …, β_p) are constant over the entire temporal range of observations. Conversely, assuming piecewise stationarity, y can be divided in D + 1 segments over each of which the coefficients are constant, so that there is one set of coefficients per segment, for a total of D + 1 vectors β_k (k = 1, …, D + 1). With the piecewise stationarity assumption, following Bai & Perron (2003); Zeileis et al. (2003), we may update Eq. (3) as: $y_{[l_{k - 1} : l_{k} - 1]} = X_{[l_{k - 1} : l_{k} - 1],} . β_{k} + ϵ_{k} .$ ${y_{\left[ {{l_{k - 1}}:{l_k} - 1} \right]}} = {X_{\left[ {{l_{k - 1}}:{l_k} - 1} \right]\,\,,}}.{\beta _k} + {_k}.$ (4)

The index k = 1, … , D + 1 indicates the temporal segment over which the vector of the β_k coefficients is constant.

Bai & Perron (2003) retrieve the simultaneous estimation of multiple breakpoints l₁, …, l_D by calculating the residual sum of squares (RSS) of Eq. (4). As ϵ is distributed according to a multivariate normal distribution, we recall that the least squares solution proposed by Bai & Perron (2003) is equivalent to the maximum likelihood solution retrieved by maximizing Eq. (2), as long as the same penalty is used in both procedures. In particular, following the implementation of Bai & Perron (2003), λ = 1 and the penalty function P(T) is given by the Bayesian information criterion (BIC, Schwarz 1978), leading to $P (T) = (p + 1) (D + 1) \ln (T) - 2 \ln \hat{L} (y),$ $P\left( T \right) = \left( {p + 1} \right)\left( {D + 1} \right)\ln \left( T \right) - 2\,\ln \,\hat L\left( y \right),$ (5)

where the multiplicative term (p + 1)(D + 1) is equal to the overall number of parameters used by the bp method. According to Eq. (1), p + 1 = 4 parameters (i.e., β₀, …, β₃) are used inside each segment.

Since D > 1 is unknown, the optimization problem might be computationally challenging. In order to tackle the computational burden of estimating $\hat{D}$ ${\hat D}$ , Bai & Perron (2003) used the dynamic programming algorithm originally proposed by Fisher (1958) and extended by Bellman & Roth (1969) and Guthery (1974). The core idea of the dynamic programming approach is to first compute a set of triangular RSS matrices for each D = 1, …, D_max value (where D_max is an integer number tuned by the researcher, indicating the maximum number of breaks allowed in that analysis²). In other words, within a given set of triangular RSS matrices, the number of segments D is fixed, while the starting and ending locations of each segment change. Then, the bp method returns that partition, which minimizes the RSS of Eq. (4) (and hence maximizes ℒ_P(D; y) of Eq. (2)) among all the previously created matrices. The main challenge of the dynamic programming approach is given by the computational effort needed to compute the sets of triangular RSS matrices. Further details about the dynamic programming algorithm and the bp method can be found in Bai & Perron (2003); Zeileis et al. (2002, 2003). In particular, this work uses the version proposed by Zeileis et al. (2003).

3 Data Analysis

3.1 Preliminary Considerations

To test the performances in modeling and removing stellar activity from RV time series, we carried out several analyses of RV time series to compare the bp method presented in Sect. 2.2 with the oc method that corrects for stellar activity over the entire temporal range considered as a whole segment. We considered real RV observations of four different stars, namely α Cen B, HD 215152, HD 10700, and HD 192310, which were selected because of their high signal-to-noise ratio (S/N) and long-baseline coverage with an intensive observational cadence.

In particular, Rajpaul et al. (2015) analyzed the 459 HARPS RV data points of α Cen B obtained by Dumusque et al. (2012) (a subsample of the RV observations used in this work). After properly modeling the RV time series, they concluded that all the significant variations in the time series are compatible with stellar activity. Therefore, as α Cen B does not likely host any planet, by applying both the bp and oc methods to its RV time series, we may test the performances of the two methods in removing stellar sources of noise and retrieving a “cleaned” time series. Only if a further signal is present will it then be worth wondering whether it is exoplanetary in origin.

We considered RV data points derived only from spectra with S/N ≥ 10 at 550 nm. The average S/N values at 550 nm, the number of CCF data points, and the rotation period P_rot of the four stars analyzed in this work are listed in Table 2.

All the results were obtained using the statistical software R (R Core Team 2019). In particular, the vector listing the locations of change points within the bp method was estimated by using the R package strucchange³. Since the presence of outliers in the data could have led to an exaggerated number of change point locations (e.g., Fearnhead & Rigaill 2019), before applying the bp method and the oc method to the data, we preliminarily rejected all those measurements falling off the range between the 5th and 95th percentile for each considered variable (e.g., Ghosh & Vogt 2012; Pollet & van der Meij 2017).

Table 2

Average signal to noise ratio (S/N) of the available spectra, number of RV data points (# CCFs), and stellar rotational periods (P_rot) for each of the four targets of interest.

3.2 Analyses of a Real Star: α Cen B

The RV data of α Cen B consist of T = 17627 CCFs taken from the end of February 2008 to the end of May 2013 by the HARPS spectrograph. Firstly we used the SN fit to the CCF in order to estimate the parameters $\bar{R V}$ $\overline {RV}$ , A, γ, and FWHM_SN. Secondly, following Dumusque et al. (2012), all the measurements were preliminarily corrected from the contamination that was induced by the close-in α Cen A. Finally, the outliers were clipped as specified above, which resulted in a final set of T = 16451 measurements.

After cleaning the data, the response variable becomes $y = \bar{R V},$ $y\, = \overline {\,RV} ,$ , while the matrix of the covariates is X = (1, A, γ, FWHM_SN). Since X also contains the intercept term X₀ ≡ 1, we emphasize again that the parameter vector β is made of p + 1 = 4 elements, where p is the number of covariates.

We applied both the oc and bp methods to the $y = \bar{R V}$ $y\, = \overline {\,RV}$ time series, further assuming as input the linear regression model of Eq. (1) that specifies the relationship between the response variable y and the matrix of covariates X. The bp method found $\hat{D} = 4$ $\hat D\, = \,4$ change points (i.e., $\hat{D} + 1 = 5$ $\hat D\,{\rm{ + }}\,1$ piecewise stationary segments), whose locations are summarized in Table 3, where the error bars at the1σ level highlight the precision of the bp method. The variability of the activity indicators inside each segment is extremely small, while the variability between each segment is larger, suggesting indeed that the activity of the star changes where a new segment starts. To highlight the locality, the spread, and the skewness of the data synthesized in Table3, we produced the boxplots (DuToit et al. 2012) shown in Fig. 1. For each segment, Fig. 1 displays the boxplots of the response variable $\bar{R V}$ $\overline {RV}$ and of the 3 covariates A, γ, and FWHM_SN. The

difference between contiguous boxplots describing the covariate distributions in contiguous segments visually marks the change in the correlation between $\bar{R V}$ $\overline {RV}$ and the activity indicators. From a quantitative point of view, the Mann-Whitney test (McKnight & Najab 2010) – which assumes that the two samples to be compared are not statistically different as null hypothesis – confirms the differences of the covariates distributions between contiguous segments. In fact, for all the covariates, it gives p-values ~10⁻⁸ for each pair of contiguous segments. Moreover, we note that A globally decreases, while γ and FWHM_SN increase by moving from one segment to the next, which suggests a significant temporal change in the stellar activity level. Overall, Table 3 and Fig. 1 show the characteristics and the quality of the optimal segmentation determined by the bp method for α Cen B.

Another instrument used to evaluate the quality of the solution recommended by the bp method is the correlation–pairs plot. Figure 2 displays the correlation plots between $\bar{R V}$ $\overline {RV}$ and the activity indicators when the oc method is applied to the data (top left panel) that is to be compared to the results obtained on each of the five segments found by the bp method (other panels). The oc method is not able to catch all the variations over the entire time series. In fact, the correlation parameters sensibly differ from one segment to the other, suggesting that the piecewise stationary assumption of $\bar{R V}$ $\overline {RV}$ as a function of A, γ, and FWHM_SN is reasonable, as it is also highlighted in Table 4.

We can further stress the benefits of using the bp method by comparing $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ and $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ , which are defined as the residuals obtained using the bp and oc methods, respectively, when the estimated activity level of the star $R V_{activity}^{*}$ $RV_{{\rm{activity}}}^*$ is subtracted from the $\bar{R V}$ $\overline {RV}$ time series: $Δ R V_{bp}^{*} = \bar{R V} - \sum_{k = 1}^{D + 1} R V_{activity, k,}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^* = \overline {RV} - \sum\limits_{k = 1}^{D + 1} {RV_{{\rm{activity,}}\,k,}^*}$ (6) $Δ R V_{oc}^{*} = \bar{R V} - R V_{activity}^{*} .$ ${\rm{\Delta }}RV_{{\rm{oc}}}^* = \overline {RV} - RV_{{\rm{activity}}}^*.$ (7)

The rms of $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ vs. $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ are 3.48 m s⁻¹ and 1.75 m s⁻¹ respectively, so we reach similar conclusions in preferring the piecewise linear regression strategy, which lowers the rms by 50%. In addition, by comparing the rms of the modeled stellar activity with the rms of the $\bar{R V}$ $\overline {RV}$ signal, it turns out that the bp method explains 89% of the variability of the $\bar{R V}$ $\overline {RV}$ signal in terms of stellar activity, while the oc method is able to model only 40% of the $\bar{R V}$ $\overline {RV}$ signal (see Fig. 3). Since we expect the $\bar{R V}$ $\overline {RV}$ time series to be entirely produced by stellar activity, adopting Eq. (1) in each of the D + 1 segments for the bp method significantly improves the detrending performances.

The detailed rms values of the relevant quantities for both methods are listed in Table 5. In particular, the ∆ BIC between the oc and the bp methods is +22439, thus favoring the bp method according to the BIC minimization criterion. We recall that HARPS is built to obtain RV precision of the order of 1 m s⁻¹.

A fourth approach used to compare the performances of the two methods is the evaluation of the generalized Lomb– Scargle (GLS) periodograms (e.g., Lomb 1976; Scargle 1982; Zechmeister & Kürster 2009) on both the $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ residuals. The GLS periodograms for both methods are presented in Fig. 4. Thanks to the bp method, the majority of the peaks caused by stellar activity are successfully removed from the GLS periodogram (dash-dotted blue line in Fig. 4).

Conversely, the oc method is not able to properly tackle the peaks caused by active regions (dashed red line in Fig. 4), especially for periods longer than 100 days.

We used the Cramér-von-Mises (CvM) distance minimization criterion (Cramér 1928), which defines the measure we refer to as the critical value (cv), to further compare the GLS periodograms obtained from the two different methods. Going

into further detail, after the GLS is obtained, the main goal is to check whether any periods are significant. A period is considered significant if its GLS periodogram peak statistically differs from the distribution that would result from the absence of periodic fluctuations (null hypothesis). To determine the significance level of the peak, this distribution needs to be known or estimated. A Beta distribution $ℬ (α, β)$ ${\cal B}\left( {\alpha ,\beta } \right)$ whose parameters are $α = \frac{m - 1}{2}$ $\alpha = {{m - 1} \over 2}$ and $β = \frac{n - m}{2}$ $\beta = {{n - m} \over 2}$ , where n is the number of data points and m is the number of GLS parameters, is usually assumed (see e.g., Schwarzenberg-Czerny 1998; Seber & Lee 2003; Gupta & Nadarajah 2004). However, adopting those α and β values is not flexible enough, as some period values identified as significant may turn out to be false positives (see e.g., Thieler 2014; Thieler et al. 2016). In order to avoid the selection of those false positive periods, Thieler (2014) and Thieler et al. (2016) propose relaxing the assumption on the Beta distribution, by not defining its parameters “a priori”, which makes the distribution more flexible. Once the assumptions on the Beta distribution are relaxed, Thieler (2014) and Thieler et al. (2016) suggest estimating the parameters of the $ℬ (α, β)$ ${\cal B}\left( {\alpha ,\beta } \right)$ by minimizing the following CvM distance: $\begin{array}{l} CvM (θ) & = \int_{0}^{+ \infty} {(F_{n} (u) - F_{θ} (u))}^{2} d F_{θ} (u) \\ = \frac{1}{n} \sum_{i = 1}^{n} {(F_{θ} (u_{(i)}) - \frac{i - 0.5}{n})}^{2} + \frac{1}{12 n^{2}}, \end{array}$ $\matrix{ {{\rm{CvM}}\left( \theta \right)} \hfill & { = \int_0^{ + \infty } {{{\left( {{F_n}\left( u \right) - {F_\theta }\left( u \right)} \right)}^2}{\rm{d}}{F_\theta }\left( u \right)} } \hfill \cr {} \hfill & { = {1 \over n}\sum\limits_{i = 1}^n {{{\left( {{F_\theta }\left( {{u_{\left( i \right)}}} \right) - {{i - 0.5} \over n}} \right)}^2} + {1 \over {12{n^2}}},} } \hfill \cr }$ (8)

where θ is the parameter space (θ = (α, β) in our case), u = u(1), …, u(n) is the vector of the ordered set of the observations (data points), F_n(u) is the empirical distribution function (i.e., the empirical cumulative distribution function built from the data), and F_θ(u) is the theoretical distribution function (the Beta distribution in our case). Once an estimate for θ is found (i.e., $\hat{θ} = (\hat{α}, \hat{β})$ $\hat \theta = \left( {\hat \alpha ,\hat \beta } \right)$ , the cv is calculated as the $\sqrt[q]{0.95}$ $\root q \of {0.95}$ – quantile of the CvM-fitted Beta distribution $ℬ (\hat{α}, \hat{β})$ ${\cal B}\left( {\hat \alpha ,\hat \beta } \right)$ , where q indicates the number of period grid-points used to build the GLS periodogram. According to this criterion, the GLS peaks above the computed cv value are significant and deserve further investigation. In this work, we used the cv rather than the well known false alarm probability (FAP) because the cv is more robust (e.g., Thieler 2014; Thieler et al. 2016).

The cv computation for $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ yields cv_oc = 0.43 and cv_bp = 0.022. Peaks in the GLS below the cv are to be considered as caused by stellar activity, while more detailed analyses might be needed for peaks above the cv to discover their nature. All in all, the high cv_oc value suggests quite a noisy GLS, which prevents an effective detection of exoplanetary signals. Instead, the much lower cv_bp indicates a cleaner GLS, meaning that it is worth investigating even low-powered peaks, which in principle might reveal the presence of small exoplanets. We specifically checked the behavior of the GLS periodogram, by inspecting the period interval that includes the rotation period of α Cen B. Looking at Fig. 5, we note that the oc method is still unable to remove the peak at P ~ 39 days, which is close to P_rot of α Cen B (e.g., DeWarf et al. 2010). In fact, the 39-day peaks in the original and in the oc-corrected GLS periodograms both have normalized powers $P_{\bar{R V}} \approx P_{oc} \approx 0.17$ ${P_{\overline {RV} }} \approx {P_{{\rm{oc}}}} \approx 0.17$ . Instead, in the bp method, the normalized power of this peak is sensibly reduced to P_bp ≈ 0.027. Both P_oc and P_bp cannot bring us to immediately postulate the existence of a candidate exoplanet with an orbital period of 39 days. In fact P_oc < cv_oc = 0.43, while P_bp is quite close to cv_bp = 0.022, although greater. Regardless, as our main goal is cleaning the original $\bar{R V}$ $\overline {RV}$ time series from stellar activity,

it is better to deal with the bp method because it recognizes and reduces the GLS-P(P_rot) (although the corrected peak is higher than cv_bp, which will encourage further investigatations to better understand its nature), rather than dealing with the oc method where the unchanged GLS-P(P_rot) (which is below cv_oc) may lead to a false negative case. As such, since P_bp > cv_bp, we refer the reader to Table 6 and to our discussion in Sect. 4, where we investigate the significance of all the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ -peaks whose normalized powers are greater than the cv threshold. We anticipate that no Keplerian-like signals have been detected.

Finally, to test the strength of the bp method, we also applied it to the RV data set analyzed by Dumusque et al. (2012) and

then by Rajpaul et al. (2015). As already noted, that data set made of 459 CCFs is a subset of the RV time series considered in this work. The GP framework presented in Rajpaul et al. (2015) contains significantly fewer free parameters than the model by Dumusque et al. (2012) (14 vs. 23). Using our bp method, we found 3 piecewise stationary segments. This makes our model comparable to the GP model of Rajpaul et al. (2015) in terms of free parameters (12 in our case), with the advantage that the linear correction we propose in Eq. (1) is simpler than any GP framework. According to our analyses, whose consequent GLS periodograms are displayed in Fig. 6, there is not any statistical evidence showing the presence of an exoplanet having an orbital period of ~3.2 days, disproving the discovery by Dumusque et al. (2012) and confirming the conclusions presented in Rajpaul et al. (2015). In fact, our GLS normalized powers are well below the respective cvbp and cv_oc thresholds for a wide neighborhood of 3 days (see Fig. 6). Therefore, we conclude that the GLS signal at ~3.2 days visible in Fig. 4 of Dumusque et al. (2012) and announced as an exoplanet is instead likely caused by an overfitting issue.

Table 3

Locations of the change points (CPL) estimated by the bp method for α Cen B.

Fig. 1

Boxplots of the response variable $\bar{R V}$ $\overline {RV}$ and of the activity indicators A, γ, and FWHM_SN for each segment of the optimal partition found by the bp method when applied to the α Cen B RV time series. We recall that a boxplot is a graphical tool that visualizes a distribution; the median is shown by the thick horizontal line, while the box extension defines the interquartile range (IQR), being bounded by the first (Q₁) and third (Q₃) quartiles. The upper and lower limits of the vertical dashed lines (the whiskers) are defined as min {max(X), Q₃ + 1.5 · IQR} and max {min (X), Q₁ − 1.5 · IQR}, respectively, where X indicates the sample to be represented. Outliers beyond the whiskers (if any) are represented as empty dots.

Table 4

Correlation ρ between $\bar{R V}$ $\overline {RV}$ and the activity indicators of α Cen B for each of the piecewise-stationary segments detected by the bp method.

Table 5

Variability of the $\bar{R V}, R V_{activity}^{*}$ $\overline {RV} ,RV_{{\rm{activity}}}^*$ , and ∆RV^* time series of α Cen B quantified through the rms.

Fig. 2

α Cen B: correlation plots between $\bar{R V}$ $\overline {RV}$ and the activity indicators (A, γ, and FWHM_SN); the contrast parameter A is shown after having been multiplied by a factor ten. Top left panel: the entire data are taken into account as a result of the oc method. Other panels: correlations are shown for each segment found by the bp method. A nonparametric local regression for each pair of considered variables (continuous red line) and its barycenter (i.e., the median, red dot) are also shown.

Fig. 3

Modeling of the stellar activity of α Cen B, as performed by the bp and oc methods. Top panel: signal due to stellar activity as estimated by the bp method (blue crosses) and by the oc method (red triangles) superimposed to the $\bar{R V}$ $\overline {RV}$ time series (black circles) of α Cen B. Bottom panel: ∆RV^* as computed by the bp method (blue crosses) and by the oc method (red triangles). The black circles on the background show the original $\bar{R V}$ $\overline {RV}$ time series, enabling a visual comparison of the improvement achieved by the bp method in the data correction. The change point locations (Table 3) are displayed in both plots as vertical orange dashed lines.

Fig. 4

GLS periodograms of α Cen B derived from the original $\bar{R V}$ $\overline {RV}$ time series (solid black line), $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ (dashed red line), and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ (dash-dotted blue line). The x-axis uses the base-10 logarithmic transformation of the period in order to improve the readability of the plot. The GLS periodogram obtained from $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ does not show any of the periodical peaks caused by stellar signals, which are instead present in the GLS periodogram obtained from $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ . The cv values for $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ are displayed as solid red and blue lines, respectively.

Fig. 5

GLS periodograms of α Cen B derived from the original $\bar{R V}$ $\overline {RV}$ time series (solid black line), $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ (dashed red line), and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ (dash-dotted blue line), focusing on a subset of the epochs including the rotation period of the star equal to 39 days (highlighted by the vertical gray line). The 39-day peaks have normalized powers $P_{\bar{R V}} = 0.173$ ${P_{\overline {RV} }} = 0.173$ and P_oc = 0.167 in the original and in the oc-corrected GLS, respectively. The normalized power of this peak is reduced at P_bp = 0.027 for the bp-corrected GLS. The cv value for $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ is displayed as the horizontal blue line and is equal to 0.022; we recall that the cv for $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ is 0.43.

Table 6

Periods P and corresponding normalized powers P found in the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ GLS periodogram of α Cen B, whose peaks are above the critical value of 0.022.

Table 7

rms statistic of α Cen B RV time series for different partitions (D is the number of breakpoints); we recall that rms $(\bar{R V})$ $\left( {\overline {RV} } \right)$ m s⁻¹.

3.3 Comparison between the Optimal and the Suboptimal Solutions of the Bp Method

The bp method does not only return the best $\hat{D}$ $\hat D$ and the corresponding change point locations in the time series; the bp method also returns the best partition for each D = 1, …, D_max. We can interpret those solutions as the best segmentations for each fixed D. When working with α Cen B, we found $\hat{D} = 4$ $\hat D = 4$ as the best number of breakpoints, by applying the RSS criterion to Eq. (4), combined with the BIC-based penalty function of Eq. (5). Indeed the $\hat{D} = 4$ $\hat D = 4$ -model has the lowest BIC among all the other models, as emphasized by the ∆BIC values reported in Table 7.

We decided to further compare the results obtained from the best partition (i.e., the optimal solution $\hat{D} = 4$ $\hat D = 4$ ) with the other suboptimal solutions derived from the other D values. From a statistical standpoint, we found that the solutions

for D = 2, D = 3 or D = 5 did not differ noticeably from the optimal one in terms of rms computed on $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ ; the results are presented in Table 7. In particular, we used the Wilcoxon test (e.g., Zimmerman & Zumbo 1993) to compare the rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ for $\hat{D} = 4$ $\hat D = 4$ with the rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ for the other available cases (null hypothesis: the compared samples are not statistically different). The p-values inferred from the Wilcoxon tests were 0.27, 0.83, and 0.68, obtained by comparing the $\hat{D} = 4$ $\hat D = 4$ optimal solution with the suboptimal solutions derived from D = 2, 3, and 5, respectively. The results of the Wilcoxon tests suggest that the solutions

for D = 2, 3, 4, and 5 are not statistically different for the commonly employed significance levels: α_fixed = (0.01, 0.05, 0.1). However, the solution for $\hat{D} = 4$ $\hat D = 4$ leads not only to the smallest rms of the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ time series, but also to the smallest cv among all the considered solutions, as shown in Fig. 7. We further investigated the robustness of the estimated cv by performing a bootstrap simulation study (e.g., Efron & Tibshirani 1993). The results confirmed that the cv found when using $\hat{D} = 4$ $\hat D = 4$ is significantly smaller than the cv retrieved by using suboptimal solutions; in fact the standard deviation of each cv is of the order of 10⁻³. Since all the GLS peaks below the cv threshold are ignored, dealing with low cv values means not cancelling out low GLS peaks, which might represent putative low-mass planets. As a consequence, from an astronomical standpoint, the optimal solution retrieved by the bp method is always preferable, especially when seeking super–Earths or Earth–like exoplanets.

Fig. 6

GLS periodograms of α Cen B derived from the subset of 459 RVs, also analyzed by Dumusque et al. (2012); Rajpaul et al. (2015). In detail, the periodograms have been computed from the original $\bar{R V}$ $\overline {RV}$ time series (solid black line), $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ (dashed red line), and $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ (dash-dotted blue line). The plot focuses on the neighborhood of 3.2357 days (highlighted as a vertical gray line), which corresponds to the orbital period of the exoplanet announced by Dumusque et al. (2012) and then disproved by Rajpaul et al. (2015). We note that all the GLS peaks are below the cv thresholds for both of the methods.

Fig. 7

Generalized Lomb–Scargle (GLS) periodograms of α Cen B derived from the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ when using $\hat{D} = 4$ $\hat D = 4$ (solid black line; this solution is the same reported in Fig. 4 as the dash-dotted blue line). The GLS periodograms derived from the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ when using D = 2, 3, and 5 are displayed as a dotted blue line, a dash-dotted orange line and a dashed red line, respectively. The x-axis uses the base-10 logarithmic transformation of the period in order to improve the readability of the plot. The cv values for $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ are displayed as black, blue, orange and red lines for the cases D = 4, 2, 3, and 5, respectively. Although the rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ for the considered cases are not statistically different, the bp-optimal solution ( $\hat{D} = 4$ $\hat D = 4$ ) leads to the smallest cv.

3.4 Exoplanets Detection Limit

3.4.1 Simulation Study based on α Cen B Data

After the results were obtained by analysing real RV data of α Cen B, we decided to perform a simulation study to test how well the bp method works in detecting exoplanets. As already stressed, the RV time series of α Cen B is essentially made of stellar activity signals since no exoplanets have been detected (e.g., Rajpaul et al. 2015). Therefore, we added artificial Keplerian signals (RV_K) that would be generated by exoplanets in circular orbits to the original RV time series, we applied both the bp and oc methods to clean the data set for stellar activity (obtaining $Δ R V_{bp}^{p}$ $\Delta RV_{{\rm{bp}}}^p$ and $Δ R V_{oc}^{p}$ $\Delta RV_{{\rm{oc}}}^p$ , respectively), and we checked to which extent the two methods are able to find the simulated planets, distinguishing the planetary signals from stellar activity. Any Keplerian signal added to the RV time series translates in a shifting of the CCFs that changes according to the period, the amplitude, and the phase that is imposed. In particular, we used the following grid of values: periods from 1 to 500 days, by steps of 1 day for periods shorter than 50 days, and then by steps of 10 days for longer orbital periods, semiamplitudes from 0.1 to 15m s⁻¹ by steps of 0.1 m s⁻¹and 51 evenly sampled phases between 0 and 2π, for a total of 726 750 simulated planets.

Similar simulation studies have already been proposed by Howard et al. (2010); Mayor et al. (2011) and Simola et al. (2019). In this work, the detection of an exoplanet was tested by inspecting the GLS produced from the $Δ R V_{bp}^{p}$ $\Delta RV_{{\rm{bp}}}^p$ and $Δ R V_{oc}^{p}$ $\Delta RV_{{\rm{oc}}}^p$ data. This was implemented by first flagging as a positive finding those GLS features that are above the cv previously calculated for $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ (cv_oc = 0.43) and for $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ (cv_bp = 0.022)⁴. Then, we checked that the normalized power at P_p (i.e., P_p = P(P_p)) was statistically comparable to the theoretical normalized power ${\hat{P}}_{p}$ ${\hat P_p}$ . This would appear from the time series only made of the Keplerian signal of the synthetic planet that had been perturbed with a normal distribution having a mean equal to 0 and a standard deviation computed by subtracting in quadrature the rms induced by RV_K from the rms of the ∆RV^p time series. Because the theoretical normalized powers produced by any given exoplanet ${\hat{P}}_{p}$ ${\hat P_p}$ follow a normal distribution having constant standard deviation equal to $σ_{\hat{P}} = 0.028$ ${\sigma _{\hat P}} = 0.028$ (which can be interpreted as the inner variability of the GLS periodogram), P_p is compared to ${\hat{P}}_{p}$ ${\hat P_p}$ with the commonly used Wald test (e.g., Gourieroux et al. 1982). In particular, for a given exoplanet, the test checks whether P_p belongs to the 99% confidence interval inferred from $N ({\hat{P}}_{p}, σ_{\hat{p}})$ $N\left( {{{\hat P}_p},{\sigma _{\hat p}}} \right)$ . This test enables us to establish whether the expected and observed RV semiamplitudes are consistent, so to declare the synthetic exoplanet as detected.

This procedure is repeated for each of the 726 750 simulated planets. In order to quantify the K detection threshold, we search for the minimum RV amplitude at which, for a given orbital period, at least 90% of the planets having different phases are detected. The results of the simulation study, presented in Fig. 8, confirm the intuition that properly dividing the $\bar{R V}$ $\overline {RV}$ data into segments (where each segment is piecewise stationary) significantly improves the chances of detecting exoplanets that have smaller amplitudes with respect to the oc method. In fact, by applying the correction for stellar activity based on the bp method, the detection threshold is on average lower by 74% with respect to the detection threshold estimated using the oc method. The median threshold for the bp method is 2 m s⁻¹, while the median threshold for the oc method is 7.55 m s⁻¹.

A statistical test to compare the two vectors of minimum RV amplitudes was also carried out. Assuming as null hypothesis that the two groups are not statistically different, the Wilcoxon test estimated a p-value equal to 2.22 × 10⁻¹⁶, which means that the null hypothesis is strongly rejected and that the two groups are statistically different. The detection threshold of an exoplanet lowers by 78% on average when focusing on planets up to an orbital period of 250 days. Figure 8 displays the orbital periods for which the oc method and the bp method were not able to detect any planet having K lower than our grid upper limit. In order to explain this situation, we further explored the behavior of the GLS, highlighting our findings in Sect. 3.4.2.

Fig. 8

Upper panels: GLS periodograms derived from synthetic Keplerian RV signals of K = 15, 10, 5, 2.5, 1, and 0.5 m s⁻¹, which were perturbed by injecting white noise whose amplitude follows a normal distribution with standard deviation of 1.75 m s⁻¹ (top) and 3.48 m s⁻¹ (middle). The unevenly sampled temporal baseline (i.e., the temporal baseline of α Cen B) is responsible for the power decrease at 365 days, as pointed out in Sect. 3.4.2. Bottom panel: detection thresholds of synthetic exoplanets injected into the α Cen B RV time series. It is the minimum semiamplitude K at which a synthetic exoplanet of a given period is recovered from the two different GLS periodograms. The minimum K values are inferred from the $Δ R V_{bp}^{p}$ $\Delta RV_{{\rm{bp}}}^p$ time series (bp method, dashed blue line) and from the $Δ R V_{oc}^{p}$ $\Delta RV_{{\rm{oc}}}^p$ time series (oc method, solid red line). Blue crosses (bp method) or red crosses (oc method) highlight the null detection of any exoplanet having K ≤ 15 m s⁻¹. For both methods, a null detection occurs at ~365 days, where the GLS normalized power drops. Overall, by correcting for stellar activity on each of the five temporal segments found by the bp method, the detection threshold of an exoplanet lowers by 74% on average. The two boxplots represented in the top right inset show the distributions of the minimum K for the two methods. The two distributions are statistically different as quantified by the Wilcoxon test through its p-value = 2.22 × 10⁻¹⁶. The detection threshold of an exoplanet lowers by 78% on average, when focusing on planets up to an orbital period of 250 days.

Fig. 9

Relation between the peak normalized power P of the GLS periodogram obtained from a synthetic Keplerian RV signal and the period of that signal (i.e., the orbital period of the synthetic exoplanet). The different colors represent the different K values imposed on the Keplerian signals, which increase from bottom to top. The ruggedness of the lines is due to the white noise that was also injected into the RV data series. Regardless of the considered K (i.e., the color), P decreases at ~365 days (close to the revolution period of the Earth), as expected in ground-based observations.

3.4.2 Behavior of the GLS Periodogram

We investigated the behavior of the GLS periodogram in order to understand the inability to detect exoplanets. In particular, we were interested in understanding if the unavailability of detections at a given P_orb were caused by the criteria we employed for our simulation study or rather by an issue related to the GLS periodogram. We generated several Keplerian signals sampled upon the epochs of observations of α Cen B using the following grid of values⁵: periods P_orb from 1 to 500 days, by steps of 1 day, semiamplitudes K from 0.1 to 15 m s⁻¹ by steps of 0.1 m s⁻¹ and a phase value, randomly sampled from the [0, 2π] interval. Each vector of RV was perturbed by using a normal distribution having a mean equal to 0 and a standard deviation of 1 m s⁻¹, which is the error expected by HARPS. At this stage, as vector of RV, just the Keplerian signal and the error term are used.

When plotting P as a function of P_orb for any given value of K, rather than finding constant behavior, there is a problematic behavior when considering short orbital periods. Moreover, we see a systematic decrease of P at some specific P_orb values, which is caused by the data sampling due to α Cen B visibility. In particular, a clear decrease occurs at P_orb ~365 days, as expected in ground-based observations. These conclusions are confirmed in our simulation study that aims to establish the detection threshold of the bp and oc methods, as both the methods have some issues in detecting planets that have a short orbital period, and both methods are unable to detect planets with P_orb ~365 days (Fig. 9).

3.5 Comparison of the Oc and the Bp Methods Applied to HD 215152, HD 10700, and HD 192310

As done for the RV time series of α Cen B, we repeated the comparison between the oc and the bp methods for three other main sequence stars: HD 215152 (K3V, Delisle et al. 2018), HD 10700 (G8V, Feng et al. 2017b), and HD 192310 (K2V, Pepe et al. 2011). Unlike α Cen B, which does not host any known planet, HD 215152 and HD 10700 host four planets (Delisle et al. 2018; Feng et al. 2017b, each), while HD 192310 hosts two planets (Pepe et al. 2011). Therefore, to obtain ∆RV^*-like indicators that are consistent with those derived for α Cen B, we should also remove the exoplanetary signals from these RV time series, after applying the correction for stellar activity. Since the discovered exoplanets orbiting HD 215152 and HD 10700 are at the level of instrumental precision, removing those planetary signals would have led to biased results. For this reason, in the following analyses we decided to remove the planetary signals only for the HD 192310 case.

We used the adaptive Markov chain Monte-Carlo (A-MCMC) algorithm (Haario et al. 2005) to estimate the orbital parameters of the two known exoplanets orbiting HD 192310. In detail, following a step-by-step procedure, first we corrected the original $\bar{R V}$ $\overline {RV}$ time series by using both the oc and the bp methods. As a result, a clear peak due to the exoplanet with P_orb ≈ 75 days appeared in the two GLS periodograms (although only in case of the bp method P_bp(75 d) > cv_bp; top panel of Fig. 10). Therefore, we estimated the orbital parameters of this exoplanet by using the A-MCMC algorithm, and removed its signal (RV_P:75) from the ∆RV^* time series. After producing new GLS periodograms from the ∆RV* – RV_P:75 time series, for both methods we saw another peak above the respective cv values, which was compatible with P_orb ≈ 526 days of the second exoplanet (second panel of Fig. 10). Then we used the A-MCMC algorithm again to estimate the orbital parameters of the exoplanet with P_orb ≈ 526 days, we removed its signal (RV_P:526) from the previously used RV time series, and we produced new GLS periodograms based on the ∆RV* – RV_P:75 – RV_P:526 time series. The inspection of these GLS periodograms did not show any other significant peak (third panel of Fig. 10) and we concluded that only the bp method finds both the two known exoplanets following the cv-threshold criterion. Finally, we launched a global A-MCMC run

accounting for both planets at the same time to obtain unbiased exoplanetary parameters, which were used to build our final GLS periodograms, where both the planetary signals were subtracted from ∆RV^* (bottom panel of Fig. 10).

Our outcomes for the HD 192310 planetary system are listed in Table 8 and compared with the results obtained by Pepe et al. (2011). The precision we got for the planet b parameters is comparable with that obtained by Pepe et al. (2011) and all those parameters are consistent within ~1.5σ, except for the orbital period P_b. Our $P_{b} = {74.06}_{- 0.09}^{+ 0.10}$ ${P_b} = 74.06_{ - 0.09}^{ + 0.10}$ days is 4.8σ away from the Pepe et al. (2011) estimate. However, it is in excellent agreement with the recent revision by Rosenthal et al. (2021), who found a period of 74.062 ± 0.085 days. Regarding planet c, instead, we sensibly improve the precision on the determined parameters (from a factor of two up to a factor of four) with respect to the results provided by Pepe et al. (2011). All the respective estimates are within 2σ, except for K_c, for which we infer a higher value after applying our bp method for cleaning the RV time series. Even if a detailed analysis of the orbital parameters is beyond the scope of this paper, we note that our estimates for HD 192310 c are the most precise available in the literature so far⁶. Coming to the effectiveness of modeling the stellar activity, when compared to the rms of the $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ time series, the rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ is lower by 0.21 m s⁻¹ (HD 215152), 0.06 m s⁻¹ (HD 10700), and 0.1 m s⁻¹ (HD 192310, after the planets removal). This means that using Eq. (6) rather than Eq. (7) for correcting $R V$ ${RV}$ , we better explain the $R V$ ${RV}$ -variability by gaining 9.5%, 4.3%, and 6.6% for HD 215152, HD 10700, and HD 192310, respectively. The bp method has to be preferred also following the BIC minimization criterion for model selection; in fact, ∆BIC = BIC_oc − BIC_bp = +79, +503, and +174 for HD 215152, HD 10700, and HD 192310, respectively.

The number of available data points per time series is significantly smaller if compared to the α Cen B time series, especially for HD 215152 and HD 192310. It seems therefore reasonable that the bp method finds a lower number of piecewise-stationary segments as optimal solutions for those stars ( $\hat{D} = 1$ $\hat D = 1$ and 2 for HD 215152 and HD 192310, respectively). Instead, concerning HD 10700, a larger number of observations spread on a longer baseline is available, which yields to $\hat{D} = 5$ $\hat D = 5$ optimal breakpoints.

As a final note, since HD 10700 is a quiet star, the bp vs. oc improvement in modeling the stellar activity is lower (i.e., we registered a lower gain when comparing the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ vs. $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ rms, as expected). However, the bp method still models the stellar activity better than the oc method and the ∆BIC confirms a strong preference for the bp method. Overall, the obtained results confirm our assumption that the CPD algorithm is particularly effective when focusing on active stars. Relevant tables and plots synthesizing our results can be found in Appendix A.

Fig. 10

Similar to Fig. 4, but for HD 192310. The vertical dashed orange lines at ~75 days and ~526 days highlight the orbital periods of the two Neptune-mass planets found by Pepe et al. (2011). Top panel: as a result of the stellar activity correction, the bp-peak at ~75 days is above the respective cv value, revealing the planetary signal. Second panel: once the P ~ 75 days planetary signal (RV_P:75) is removed from the original time series, another peak corresponding to the P ~ 526 days planetary signal pops up (this time both the bp- and oc-peaks are above the respective cv values). Third panel: after removing also the second planetary signal (RV_P:526), the GLS periodogram does not show any other significant peak. Bottom panel: from the activity-corrected RV time series, we subtracted the Keplerian signals of the two planets (RV_P:74&P:532) inferred from the global fit of the joint A-MCMC analysis. It turned out that P_b ~ 75 days and P_c ~ 532 days. The main orbital parameters are summarized in Table 8.

Table 8

Planetary parameters as derived in this work and by Pepe et al. (2011) of both HD 192310 b and HD 192310 c.

4 Discussion

We tested the bp method by using real measurements taken from four stars, carrying out comparisons with the oc method that considers a single correction for stellar activity on the entire time series. Before performing the comparisons, the data were properly cleaned by removing the outliers from the set of $\bar{R V}$ $\overline {RV}$ values. The results suggest that properly dividing the $\bar{R V}$ $\overline {RV}$ time series into segments (where each segment is piecewise stationary) is a helpful operation when trying to account for higher variations in $\bar{R V}$ $\overline {RV}$ caused by stellar activity. For all the considered stars, the rms on $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ are smaller than the rms on the corresponding $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ .

The stronger the activity signals within the time series, the more effective the bp method is at modeling the $\bar{R V}$ $\overline {RV}$ variations. In addition, the longer the time series, the more likely sensible variations in stellar activity have occurred, and so the bp method is particularly suitable to detrend RV data. This is especially the case of α Cen B, which is characterized by strong solar-like activity signals (e.g., Thompson et al. 2017; Dumusque 2018). By dividing the RV measurements of α Cen B into five piecewise stationary segments, we reach an rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ of 1.75 m s⁻¹ (to be compared with an rms of $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ of 3.48 m s⁻¹ resulting from the application of the oc method). Also, the activity indicators’ variability within each segment is smaller than the variability between each segment, as displayed in Fig. 1 and summarized by Table 3, which provides strong evidence that the segmentation proposed by the bp method captures those time series locations where the stellar activity changes significantly. As a consequence, the bp method is able to interpret a larger fraction of $\bar{R V}$ $\overline {RV}$ variability in terms of stellar activity: 89% vs. 40% for the oc method.

The α Cen B GLS periodogram obtained with the bp method shows a much smaller number of peaks caused by active regions with respect to the GLS periodogram obtained by performing the oc method. When producing the GLS periodogram from the $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ time series, we found that there were no peaks above cv_oc. Conversely, when considering the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ time series, 11 peaks in the GLS were above the critical value of cv_bp (see Table 6). We further investigated the nature of those peaks to check whether they could be produced by exoplanets. For a given exoplanet candidate, starting from $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ and assuming a Keplerian model, we used the A-MCMC algorithm to retrieve the posterior distribution of the parameters of interest. Then, we used the marginal posterior means for each parameter in order to estimate the RV-variations caused by the candidate planet. Finally, we compared the rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ with the new rms obtained by subtracting to $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ the RV Keplerian signal that would be caused by the candidate planet. We did not find any strong signal suggesting the possible presence of an exoplanet, confirming the conclusions provided in Rajpaul et al. (2015).

When repeating the comparative analyses for HD 215152, HD 10700, and HD 192310, the bp method produces $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ time series having rms lower by 0.21 m s⁻¹ for HD 215152, 0.06 m s⁻¹ for HD 10700, and 0.1 m s⁻¹ for HD 192310 when compared to the rms of $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ . In other words, the bp method increases the fraction of $\bar{R V}$ $\overline {RV}$ variability that can be explained in terms of stellar activity by 9.5%, 4.3%, and 6.6% for HD 215152, HD 10700, and HD 192310, respectively. The improvement given by the bp method is less evident in these three cases, as these stars are less active than α Cen B and their time series span a shorter temporal range. Since these time series already appear to be piecewise stationary as a whole, the bp and oc methods are comparable. However, thanks to its better cleaning performance, only the bp method was able to detect both the exoplanets hosted by HD 192310, following our cv-threshold criterion. In particular, we essentially confirm the orbital parameters estimates already available in the literature, but we sensibly improve the precision in the case of HD 192310 c. Given the key role of the planetary mass when studying the composition of exoplanets, we emphasize that our M_p sin i estimates are affected by an error of ~5% for both planet b and planet c. We reached the same precision level of Pepe et al. (2011) for planet b, while we improved the precision of M_p sin i of HD 192310 c by a factor of approximately four.

To further evaluate the ability of the bp and oc methods to detect exoplanets, we designed a simulation study starting from α Cen B data. We added several synthetic Keplerian signals to the time series and checked the exoplanet detection effectiveness of both the bp and oc methods when dealing with RV data points contaminated by solar-like activity signals. We found that the bp method lowers the detection threshold (i.e., the minimum K of the Keplerian signal at which a planet is detected from the inspection of the GLS periodogram) with respect to the oc method by 74% when considering planets up to an orbital period of 500 days.

5 Conclusion

Stellar activity, in the form of active regions evolving on a star’s photosphere, has so far been the major obstacle for the detection and the characterization of Earth-like exoplanets when using the RV method. Spots and faculae cause variations in the shape and in the width of the CCF, changing the correlations between $\bar{R V}$ $\overline {RV}$ and the indicators of stellar activity, such as A, γ, and FWHM_SN. Since an exoplanet would not change the shape of the CCF, but just its barycenter, a common strategy to account for stellar activity is to employ a linear correction of the RV time series involving the activity parameters. In fact, it is well known that variations in the correlations between $\bar{R V}$ $\overline {RV}$ and these activity indicators suggest the presence of active regions evolving on the stellar photosphere over time. A simple way to model the changes in $\bar{R V}$ $\overline {RV}$ caused by stellar activity is provided by Eq. (1). Since RV surveys often spread over years of measurements, it seems reasonable to assume that the stellar activity level changes multiple times during the observational period. Rather than using an overall correction for stellar activity on the entire time series, we still rely on Eq. (1), but we propose performing multiple corrections by suitably dividing the overall time series into segments. The number of segments depends on how often the correlations between $\bar{R V}$ $\overline {RV}$ and the activity indicators significantly change.

In order to estimate the time series locations where the dependence of stellar activity upon activity indicators significantly changes, we draw attention to the family of CPD algorithms.

In particular, in this paper we used the CPD-based bp method (e.g., Bai & Perron 2003) to properly model the variations of $\bar{R V}$ $\overline {RV}$ caused by stellar activity. We compared the effectiveness of the bp method with the commonly employed oc method, by using real observations taken on four different stars. The results show that identifying the locations in the RV data where the correlations between $\bar{R V}$ $\overline {RV}$ and the indicators of stellar activity significantly change, produces much cleaner RV time series as we model the stellar activity signals on each of the piecewise stationary segments. The GLS periodograms are then less contaminated by the presence of spurious periodical peaks caused by stellar activity. As demonstrated by our simulation study on α Cen B, the bp method was able to detect exoplanets that produce RV amplitudes 74% smaller than those detected by the oc method.

Finally, we note that the bp method is most effective when working with active stars whose RV time series are made of several hundreds of data points. In fact, the longer the time series, the more likely sensible variations in stellar activity have occurred, suggesting that the bp technique is a suitable statistical tool for removing activity-induced variations from the RV data.

Acknowledgements

The authors are extremely thankful to the CSC-IT Center for Science, Finland, for the computational resources provided to perform the analyses presented in this work. US was funded by Academy of Finland grant no. 320182. J.C. was funded by the ERC grant no. 742158. X.D. is grateful to The Branco Weiss Fellowship-Society in Science for its financial support. J.C.K. was partially supported by the National Science Foundation under Grant AST 1616086 and 2009528, and by the National Aeronautics and Space Administration under grant 80NSSC18K0443. The authors are grateful to all technical and scientific collaborators of the HARPS Consortium, ESO Headquarters and ESO La Silla who have contributed with their extraordinary passion and valuable work to the success of the HARPS project. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement SCORE No 851555). This work has been carried out within the framework of the NCCR PlanetS supported by the Swiss National Science Foundation.

Appendix A HD 215152, HD 10700, and HD 192310

In this Appendix we show the relevant Tables and Figures summarizing the results we obtained for the other stars of our sample: HD 215152, HD 10700, and HD 192310. Similarly to α Cen B, we show the change point locations detected in each RV time series and the covariate correlations within each stationary segment. In addition, we show the effectiveness of both the bp and oc methods in cleaning the RV time series and the consequent GLS periodograms.

Table A.1

Similar to Table 3, but for HD 215152.

Table A.2

Similar to Table 4, but for HD 215152.

Table A.3

Similar to Table 5, but for HD 215152.

Table A.4

Similar to Table 3, but for HD 10700.

Table A.5

Similar to Table 4, but for HD 10700.

Table A.6

Similar to Table 5, but for HD 10700.

Table A.7

Similar to Table 3, but for HD 192310.

Table A.8

Similar to Table 4, but for HD 192310.

Table A.9

Similar to Table 5, but for HD 192310.

Fig. A.1

Similar to Fig. 2, but for HD 215152.

Fig. A.2

Similar to Fig. 3, but for HD 215152.

Fig. A.3

Similar to Fig. 4, but for HD 215152. The orbital periods of the already discovered exoplanets (Delisle et al. 2018) are displayed as orange dashed lines. The planetary signals have not been removed from the time series, because they are at the level of the instrumental precision.

Fig. A.4

Similar to Fig. 2, but for HD 10700.

Fig. A.5

Similar to Fig. 3, but for HD10700.

Fig. A.6

Similar to Fig. 4, but for HD 10700. The orbital periods of the already discovered exoplanets (Feng et al. 2017b) are displayed as orange dashed lines. The planetary signals have not been removed from the time series, because they are at the level of the instrumental precision.

Fig. A.7

Similar to Fig. 2, but for HD 192310.

Fig. A.8

Similar to Fig. 3, but for HD 192310. We recall that the two planetary signals were removed from our analyses after the bp method and the oc method were used, in order to better focus on the residuals of the two methods.

Table A.10

Normalized power P and cv values inferred from the bp-corrected GLS (P_bp, cv_bp) and oc-corrected GLS (P_oc, cv_oc) for HD 192310. They are evaluated at the stellar rotation period (P_rot) and at the at the orbital periods (P_orb) of the hosted exoplanets found by Pepe et al. (2011). The first two rows show P inferred from the ∆RV^* signals, while in the last row P are computed after removing the Keplerian signal of P_orb,1 = 75 days from the ∆RV^* signals. Both the two planets would be visible in the GLS after applying the bp method and the iterative removal of planetary signals, as P_bp are always above the respective cv_bp values.

References

Adams, R. P., & MacKay, D. J. 2007, ArXiv e-prints [arXiv:0710.3742] [Google Scholar]
Adcock, C., & Azzalini, A. 2020, Symmetry, 12, 118 [CrossRef] [Google Scholar]
Aminikhanghahi, S., & Cook, D. J. 2017, Knowl. Inform. Syst., 51, 339 [CrossRef] [Google Scholar]
Angelosante, D., & Giannakis, G. B. 2012, EURASIP J. Adv. Signal Process., 2012, 70 [CrossRef] [Google Scholar]
Bai, J. 1997a, Econometric Theory, 13, 315 [CrossRef] [Google Scholar]
Bai, J. 1997b, Rev. Econ. Stat., 79, 551 [CrossRef] [Google Scholar]
Bai, J., & Perron, P. 1998, Econometrica, 47 [CrossRef] [Google Scholar]
Bai, J., & Perron, P. 2003, J. Appl. Econometrics, 18, 1 [CrossRef] [Google Scholar]
Baliunas, S., Sokoloff, D., & Soon, W. 1996, ApJL, 457, L99 [NASA ADS] [CrossRef] [Google Scholar]
Bellman, R., & Roth, R. 1969, J. Am. Stat. Assoc., 64, 1079 [CrossRef] [Google Scholar]
Boisse, I., Bouchy, F., Hébrard, G., et al. 2011, Astron. Astrophys., 528, A4 [CrossRef] [EDP Sciences] [Google Scholar]
Borgniet, S., Meunier, N., & Lagrange, A.-M. 2015, A&A, 581, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Chen, J., & Gupta, A. K. 2011, Parametric Statistical Change Point Analysis: with Applications to Genetics, Medicine, and Finance (Springer Science & Business Media) [Google Scholar]
Christensen-Dalsgaard, J., Bedding, T. R., & Kjeldsen, H. 1995, ApJ, 443, L29 [NASA ADS] [CrossRef] [Google Scholar]
Cramér, H. 1928, Scand. Actuar. J., 1928, 13 [CrossRef] [Google Scholar]
Davis, A. B., Cisewski, J., Dumusque, X., Fischer, D. A., & Ford, E. B. 2017, ApJ, 846, 59 [NASA ADS] [CrossRef] [Google Scholar]
Delisle, J.-B., Ségransan, D., Dumusque, X., et al. 2018, A&A, 614, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Del Moro, D. 2004, A&A, 428, 1007 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Desort, M., Lagrange, A.-M., Galland, F., Udry, S., & Mayor, M. 2007, A&A, 473, 983 [CrossRef] [EDP Sciences] [Google Scholar]
DeWarf, L. E., Datin, K. M., & Guinan, E. F. 2010, ApJ, 722, 343 [NASA ADS] [CrossRef] [Google Scholar]
Dumusque, X. 2016, A&A, 593, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Dumusque, X. 2018, A&A, 620, A47 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Dumusque, X., Lovis, C., Ségransan, D., et al. 2011b, A&A, 535, A55 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Dumusque, X., Udry, S., Lovis, C., Santos, N. C., & Monteiro, M. 2011a, A&A, 525, A140 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Dumusque, X., Pepe, F., Lovis, C., et al. 2012, Nature, 491, 207 [Google Scholar]
Dumusque, X., Boisse, I., & Santos, N. 2014, ApJ, 796, 132 [Google Scholar]
Dumusque, X., Borsa, F., Damasso, M., et al. 2017, A&A, 598, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
DuToit, S. H., Steyn, A. G. W., & Stumpf, R. H. 2012, Graphical Exploratory Data Analysis (Springer Science & Business Media) [Google Scholar]
Efron, B., & Tibshirani, R. 1993, An Introduction to the Bootstrap (Boca Raton), 57 [Google Scholar]
Fearnhead, P., & Rigaill, G. 2019, J. Am. Stat. Assoc., 114, 169 [CrossRef] [Google Scholar]
Feng, F., Tuomi, M., & Jones, H. R. A. 2017a, A&A, 605, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Feng, F., Tuomi, M., Jones, H. R. A., et al. 2017b, AJ, 154, 135 [Google Scholar]
Figueira, P., Santos, N., Pepe, F., Lovis, C., & Nardetto, N. 2013, A&A, 557, A93 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Fiorenzano, A. M., Gratton, R., Desidera, S., Cosentino, R., & Endl, M. 2005, A&A, 442, 775 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Fischer, D. A., Anglada-Escude, G., Arriagada, P., et al. 2016, PASP, 128, 066001 [Google Scholar]
Fisher, W. D. 1958, J. Am. Stat. Assoc., 53, 789 [CrossRef] [Google Scholar]
Frick, K., Munk, A., & Sieling, H. 2014, J. R. Stat. Soc. B (Stat. Methodol.), 76, 495 [CrossRef] [Google Scholar]
Ghosh, D., & Vogt, A. 2012, Retrieved from American Statistical Association’s Section on Survey Research Methods Proceedings, [Google Scholar]
Gourieroux, C., Holly, A., & Monfort, A. 1982, Econometrica, 63 [CrossRef] [Google Scholar]
Guédon, Y. 2013, Comput. Stat., 28, 2641 [CrossRef] [Google Scholar]
Gupta, A. K., & Nadarajah, S. 2004, Handbook of Beta Distribution and its Applications (CRC Press) [CrossRef] [Google Scholar]
Guthery, S. B. 1974, J. Am. Stat. Assoc., 69, 945 [CrossRef] [Google Scholar]
Haario, H., Saksman, E., & Tamminen, J. 2005, Comput. Stat., 20, 265 [CrossRef] [Google Scholar]
Hackl, P., & Westlund, A. H. 1989, in Econometrics of Structural Change (Springer), 103 [CrossRef] [Google Scholar]
Hatzes, A. P. 1996, PASP, 108, 839 [NASA ADS] [CrossRef] [Google Scholar]
Hatzes, A. P. 2016, in Methods of Detecting Exoplanets (Springer), 3 [NASA ADS] [CrossRef] [Google Scholar]
Hatzes, A. P., & Cochran, W. D. 2000, AJ, 120, 979 [CrossRef] [Google Scholar]
Haynes, K., Eckley, I. A., & Fearnhead, P. 2017, J. Comput. Graph. Stat., 26, 134 [CrossRef] [Google Scholar]
Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517 [Google Scholar]
Hocking, T. D., Schleiermacher, G., Janoueix-Lerosey, I., et al. 2013, BMC Bioinformatics, 14, 164 [CrossRef] [Google Scholar]
Howard, A. W., Marcy, G. W., Johnson, J. A., et al. 2010, Science, 330, 653 [Google Scholar]
Jandhyala, V., Fotopoulos, S., MacNeill, I., & Liu, P. 2013, J. Time Ser. Anal., 34, 423 [CrossRef] [Google Scholar]
Jurgenson, C., Fischer, D., McCracken, T., et al. 2016, SPIE Conf. Ser., 9908, 99086T [Google Scholar]
Kjeldsen, H., Bedding, T. R., Arentoft, T., et al. 2008, ApJ, 682, 1370 [Google Scholar]
Lavielle, M., & Teyssiere, G. 2007, in Long Memory in Economics (Springer), 129 [CrossRef] [Google Scholar]
Lavielle, M., Teyssiere, G., & Stochastique, M. 2006, Lietuvos Matematikos Rinikinys, 46, 25 [Google Scholar]
Lefebvre, S., García, R., Jiménez-Reyes, S., Turck-Chièze, S., & Mathur, S. 2008, A&A, 490, 1143 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Lévy-Leduc, C., Roueff, F., 2009, Ann. Appl. Stat., 3, 637 [Google Scholar]
Lindegren, L., & Dravins, D. 2003, A&A, 401, 1185 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Liu, S., Wright, A., & Hauskrecht, M. 2018, Artif. Intell. Med., 91, 49 [CrossRef] [Google Scholar]
Lomb, N. R. 1976, Astrophys. Space Sci., 39, 447 [Google Scholar]
Lovis, C., & Fischer, D. 2010, Exoplanets, 27 [Google Scholar]
Lung-Yut-Fong, A., Lévy-Leduc, C., & Cappé, O. 2011, ArXiv e-prints [arXiv: 1107.1971] [Google Scholar]
Maidstone, R. 2016, PhD thesis, Lancaster University [Google Scholar]
Mayor, M., & Queloz, D. 1995, Nature, 378, 355 [Google Scholar]
Mayor, M., Marmier, M., Lovis, C., et al. 2011, ArXiv e-prints [arXiv: 1109.2497] [Google Scholar]
McKnight, P. E., & Najab, J. 2010, The Corsini encyclopedia of psychology, 1 [Google Scholar]
Meunier, N., Desort, M., & Lagrange, A.-M. 2010, A&A, 512, A39 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Meunier, N., Lagrange, A.-M., Kabuiku, L. M., et al. 2017, A&A, 597, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Nava, C., López-Morales, M., Haywood, R. D., & Giles, H. A. 2019, AJ, 159, 23 [CrossRef] [Google Scholar]
Noyes, R., Hartmann, L., Baliunas, S., Duncan, D., & Vaughan, A. 1984, ApJ, 279, 763 [NASA ADS] [CrossRef] [Google Scholar]
Page, E. S. 1954, Biometrika, 41, 100 [CrossRef] [Google Scholar]
Pepe, F., Lovis, C., Ségransan, D., et al. 2011, A&A, 534, A58 [CrossRef] [EDP Sciences] [Google Scholar]
Pepe, F., Molaro, P., Cristiani, S., et al. 2014, Astron. Nachr., 335, 8 [Google Scholar]
Pollet, T. V., & van der Meij, L. 2017, Adapt. Hum. Behav. Physiol., 3, 43 [CrossRef] [Google Scholar]
Queloz, D., Henry, G., Sivan, J., et al. 2001, A&A, 379, 279 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
R Core Team. 2019, R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing) [Google Scholar]
Rajpaul, V., Aigrain, S., & Roberts, S. 2015, MNRAS, 456, L6 [Google Scholar]
Rasmussen, C. E., & Williams, C. K. I. 2005, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) (The MIT Press) [Google Scholar]
Reeves, J., Chen, J., Wang, X. L., Lund, R., & Lu, Q. Q. 2007, J. Appl. Meteorol. Climatol., 46, 900 [CrossRef] [Google Scholar]
Reiners, A., Zechmeister, M., Caballero, J., et al. 2018, A&A, 612, A49 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Robertson, P., Mahadevan, S., Endl, M., & Roy, A. 2014, Science, 345, 440 [Google Scholar]
Rosenthal, L. J., Fulton, B. J., Hirsch, L. A., et al. 2021, ApJS, 255, 8 [NASA ADS] [CrossRef] [Google Scholar]
Saar, S. H., & Donahue, R. A. 1997, Astrophys. J., 485, 319 [NASA ADS] [CrossRef] [Google Scholar]
Sahki, N., Gégout-Petit, A., & Mézières-Wantz, S. 2018, in ENBIS 2018-18th Annual Conference of the European Network for Business and Industrial Statistics [Google Scholar]
Scargle, J. D. 1982, ApJ, 263, 835 [Google Scholar]
Schwarz, G., 1978, Ann. Stat., 6, 461 [Google Scholar]
Schwarzenberg-Czerny, A. 1998, MNRAS, 301, 831 [NASA ADS] [CrossRef] [Google Scholar]
Seber, G. A., & Lee, A. J. 2003, Linear Regression Analysis, 165 [CrossRef] [Google Scholar]
Simola, U., Dumusque, X., & Cisewski-Kehe, J. 2019, A&A, 622, A131 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Thieler, A. M. 2014, PhD thesis, Dissertation, Dortmund, Technische Universität, 2014 [Google Scholar]
Thieler, A. M., Fried, R., & Rathjens, J. 2016, J. Stat. Softw., 69, 1 [CrossRef] [Google Scholar]
Thompson, A., Watson, C., de Mooij, E., & Jess, D. 2017, MNRAS, 468, L16 [NASA ADS] [CrossRef] [Google Scholar]
Thompson, A., Watson, C., Haywood, R., et al. 2020, MNRAS, 494, 4279 [NASA ADS] [CrossRef] [Google Scholar]
Truong, C., Oudre, L., & Vayatis, N. 2018, ArXiv e-prints [arXiv: 1801.00826] [Google Scholar]
Truong, C., Oudre, L., & Vayatis, N. 2020, Signal Process., 167, 107299 [CrossRef] [Google Scholar]
Tuomi, M., Anglada-Escudé, G., Gerlach, E., et al. 2013, A&A, 549, A48 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
van den Burg, G.J., & Williams, C.K. 2020, ArXiv e-prints [arXiv:2003.06222] [Google Scholar]
Verbesselt, J., Hyndman, R., Newnham, G., & Culvenor, D. 2010, Rem. Sens. Environ., 114, 106 [NASA ADS] [CrossRef] [Google Scholar]
Wilson, O. 1968, ApJ, 153, 221 [CrossRef] [Google Scholar]
Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577 [CrossRef] [EDP Sciences] [Google Scholar]
Zeileis, A., Leisch, F., Hornik, K., & Kleiber, C. 2002, J. Stat. Softw., 7, 1 [CrossRef] [Google Scholar]
Zeileis, A., Kleiber, C., Krämer, W., & Hornik, K. 2003, Comput. Stat. Data Anal., 44, 109 [CrossRef] [Google Scholar]
Zimmerman, D. W., & Zumbo, B. D. 1993, J. Exp. Educ., 62, 75 [CrossRef] [Google Scholar]

^★★

Branco Weiss Fellow–Society in Science (http://www.society-in-science.org)

¹

Since y is a vector of dimension T × 1, the matrix of covariates has dimensions T × (p + 1), where p is the number of covariates. The columns of X are X₁, …, X_p plus an initial extra column X₀ padded with a vector of ones that is indicated as 1. Consequently, the length of the parameters’ vector β is p + 1.

²

We set D_max = 8 throughout the analyses of this paper, which turned out to be high enough to avoid putting a strong a priori constrain on the number of breakpoints. In fact, the bp method usually converges only up to $\bar{D} < D_{\max}$ $\bar D < {D_{\max }}$ , and therefore the set of RSS matrices is computed only for those models.

³

https://cran.r-project.org/web/packages/strucchange/. The same method is also available in PYTHON through the ruptures package (Truong et al. 2018).

⁴

Despite the addition of the Keplerian signal, we noted that the ∆RV^p cvs are basically the same as the cvs computed from $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ and $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ , and therefore decided to use the already computed cv_oc = 0.43 and cv_bp = 0.022 throughout the following analysis.

⁵

This grid produces fewer synthetic exoplanets than before. This made the present simulation quicker, without penalizing the strength of our conclusions.

⁶

According to the Nasa Exoplanet Archive (https://exoplanetarchive.ipac.caltech.edu/overview/HD192310), where only the set of parameters by Pepe et al. (2011) is provided.

All Tables

Table 1

Legend of the RV-related symbols adopted in this paper.

In the text

Table 2

Average signal to noise ratio (S/N) of the available spectra, number of RV data points (# CCFs), and stellar rotational periods (P_rot) for each of the four targets of interest.

In the text

Table 3

Locations of the change points (CPL) estimated by the bp method for α Cen B.

In the text

Table 4

Correlation ρ between $\bar{R V}$ $\overline {RV}$ and the activity indicators of α Cen B for each of the piecewise-stationary segments detected by the bp method.

In the text

Table 5

Variability of the $\bar{R V}, R V_{activity}^{*}$ $\overline {RV} ,RV_{{\rm{activity}}}^*$ , and ∆RV^* time series of α Cen B quantified through the rms.

In the text

Table 6

Periods P and corresponding normalized powers P found in the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ GLS periodogram of α Cen B, whose peaks are above the critical value of 0.022.

In the text

Table 7

rms statistic of α Cen B RV time series for different partitions (D is the number of breakpoints); we recall that rms $(\bar{R V})$ $\left( {\overline {RV} } \right)$ m s⁻¹.

In the text

Table 8

Planetary parameters as derived in this work and by Pepe et al. (2011) of both HD 192310 b and HD 192310 c.

In the text

Table A.1

Similar to Table 3, but for HD 215152.

In the text

Table A.2

Similar to Table 4, but for HD 215152.

In the text

Table A.3

Similar to Table 5, but for HD 215152.

In the text

Table A.4

Similar to Table 3, but for HD 10700.

In the text

Table A.5

Similar to Table 4, but for HD 10700.

In the text

Table A.6

Similar to Table 5, but for HD 10700.

In the text

Table A.7

Similar to Table 3, but for HD 192310.

In the text

Table A.8

Similar to Table 4, but for HD 192310.

In the text

Table A.9

Similar to Table 5, but for HD 192310.

In the text

Table A.10

Normalized power P and cv values inferred from the bp-corrected GLS (P_bp, cv_bp) and oc-corrected GLS (P_oc, cv_oc) for HD 192310. They are evaluated at the stellar rotation period (P_rot) and at the at the orbital periods (P_orb) of the hosted exoplanets found by Pepe et al. (2011). The first two rows show P inferred from the ∆RV^* signals, while in the last row P are computed after removing the Keplerian signal of P_orb,1 = 75 days from the ∆RV^* signals. Both the two planets would be visible in the GLS after applying the bp method and the iterative removal of planetary signals, as P_bp are always above the respective cv_bp values.

In the text

All Figures

Fig. 1

Boxplots of the response variable $\bar{R V}$ $\overline {RV}$ and of the activity indicators A, γ, and FWHM_SN for each segment of the optimal partition found by the bp method when applied to the α Cen B RV time series. We recall that a boxplot is a graphical tool that visualizes a distribution; the median is shown by the thick horizontal line, while the box extension defines the interquartile range (IQR), being bounded by the first (Q₁) and third (Q₃) quartiles. The upper and lower limits of the vertical dashed lines (the whiskers) are defined as min {max(X), Q₃ + 1.5 · IQR} and max {min (X), Q₁ − 1.5 · IQR}, respectively, where X indicates the sample to be represented. Outliers beyond the whiskers (if any) are represented as empty dots.

In the text

Fig. 2

α Cen B: correlation plots between $\bar{R V}$ $\overline {RV}$ and the activity indicators (A, γ, and FWHM_SN); the contrast parameter A is shown after having been multiplied by a factor ten. Top left panel: the entire data are taken into account as a result of the oc method. Other panels: correlations are shown for each segment found by the bp method. A nonparametric local regression for each pair of considered variables (continuous red line) and its barycenter (i.e., the median, red dot) are also shown.

In the text

Fig. 3

Modeling of the stellar activity of α Cen B, as performed by the bp and oc methods. Top panel: signal due to stellar activity as estimated by the bp method (blue crosses) and by the oc method (red triangles) superimposed to the $\bar{R V}$ $\overline {RV}$ time series (black circles) of α Cen B. Bottom panel: ∆RV^* as computed by the bp method (blue crosses) and by the oc method (red triangles). The black circles on the background show the original $\bar{R V}$ $\overline {RV}$ time series, enabling a visual comparison of the improvement achieved by the bp method in the data correction. The change point locations (Table 3) are displayed in both plots as vertical orange dashed lines.

In the text

Fig. 4

GLS periodograms of α Cen B derived from the original $\bar{R V}$ $\overline {RV}$ time series (solid black line), $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ (dashed red line), and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ (dash-dotted blue line). The x-axis uses the base-10 logarithmic transformation of the period in order to improve the readability of the plot. The GLS periodogram obtained from $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ does not show any of the periodical peaks caused by stellar signals, which are instead present in the GLS periodogram obtained from $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ . The cv values for $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ are displayed as solid red and blue lines, respectively.

In the text

Fig. 5

GLS periodograms of α Cen B derived from the original $\bar{R V}$ $\overline {RV}$ time series (solid black line), $Δ R V_{oc}^{*}$ ${\rm{\Delta }}RV_{{\rm{oc}}}^*$ (dashed red line), and $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ (dash-dotted blue line), focusing on a subset of the epochs including the rotation period of the star equal to 39 days (highlighted by the vertical gray line). The 39-day peaks have normalized powers $P_{\bar{R V}} = 0.173$ ${P_{\overline {RV} }} = 0.173$ and P_oc = 0.167 in the original and in the oc-corrected GLS, respectively. The normalized power of this peak is reduced at P_bp = 0.027 for the bp-corrected GLS. The cv value for $Δ R V_{bp}^{*}$ ${\rm{\Delta }}RV_{{\rm{bp}}}^*$ is displayed as the horizontal blue line and is equal to 0.022; we recall that the cv for $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ is 0.43.

In the text

Fig. 6

GLS periodograms of α Cen B derived from the subset of 459 RVs, also analyzed by Dumusque et al. (2012); Rajpaul et al. (2015). In detail, the periodograms have been computed from the original $\bar{R V}$ $\overline {RV}$ time series (solid black line), $Δ R V_{oc}^{*}$ $\Delta RV_{{\rm{oc}}}^*$ (dashed red line), and $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ (dash-dotted blue line). The plot focuses on the neighborhood of 3.2357 days (highlighted as a vertical gray line), which corresponds to the orbital period of the exoplanet announced by Dumusque et al. (2012) and then disproved by Rajpaul et al. (2015). We note that all the GLS peaks are below the cv thresholds for both of the methods.

In the text

Fig. 7

Generalized Lomb–Scargle (GLS) periodograms of α Cen B derived from the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ when using $\hat{D} = 4$ $\hat D = 4$ (solid black line; this solution is the same reported in Fig. 4 as the dash-dotted blue line). The GLS periodograms derived from the $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ when using D = 2, 3, and 5 are displayed as a dotted blue line, a dash-dotted orange line and a dashed red line, respectively. The x-axis uses the base-10 logarithmic transformation of the period in order to improve the readability of the plot. The cv values for $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ are displayed as black, blue, orange and red lines for the cases D = 4, 2, 3, and 5, respectively. Although the rms of $Δ R V_{bp}^{*}$ $\Delta RV_{{\rm{bp}}}^*$ for the considered cases are not statistically different, the bp-optimal solution ( $\hat{D} = 4$ $\hat D = 4$ ) leads to the smallest cv.

In the text

Fig. 8

Upper panels: GLS periodograms derived from synthetic Keplerian RV signals of K = 15, 10, 5, 2.5, 1, and 0.5 m s⁻¹, which were perturbed by injecting white noise whose amplitude follows a normal distribution with standard deviation of 1.75 m s⁻¹ (top) and 3.48 m s⁻¹ (middle). The unevenly sampled temporal baseline (i.e., the temporal baseline of α Cen B) is responsible for the power decrease at 365 days, as pointed out in Sect. 3.4.2. Bottom panel: detection thresholds of synthetic exoplanets injected into the α Cen B RV time series. It is the minimum semiamplitude K at which a synthetic exoplanet of a given period is recovered from the two different GLS periodograms. The minimum K values are inferred from the $Δ R V_{bp}^{p}$ $\Delta RV_{{\rm{bp}}}^p$ time series (bp method, dashed blue line) and from the $Δ R V_{oc}^{p}$ $\Delta RV_{{\rm{oc}}}^p$ time series (oc method, solid red line). Blue crosses (bp method) or red crosses (oc method) highlight the null detection of any exoplanet having K ≤ 15 m s⁻¹. For both methods, a null detection occurs at ~365 days, where the GLS normalized power drops. Overall, by correcting for stellar activity on each of the five temporal segments found by the bp method, the detection threshold of an exoplanet lowers by 74% on average. The two boxplots represented in the top right inset show the distributions of the minimum K for the two methods. The two distributions are statistically different as quantified by the Wilcoxon test through its p-value = 2.22 × 10⁻¹⁶. The detection threshold of an exoplanet lowers by 78% on average, when focusing on planets up to an orbital period of 250 days.

In the text

Fig. 9

Relation between the peak normalized power P of the GLS periodogram obtained from a synthetic Keplerian RV signal and the period of that signal (i.e., the orbital period of the synthetic exoplanet). The different colors represent the different K values imposed on the Keplerian signals, which increase from bottom to top. The ruggedness of the lines is due to the white noise that was also injected into the RV data series. Regardless of the considered K (i.e., the color), P decreases at ~365 days (close to the revolution period of the Earth), as expected in ground-based observations.

In the text

Fig. 10

Similar to Fig. 4, but for HD 192310. The vertical dashed orange lines at ~75 days and ~526 days highlight the orbital periods of the two Neptune-mass planets found by Pepe et al. (2011). Top panel: as a result of the stellar activity correction, the bp-peak at ~75 days is above the respective cv value, revealing the planetary signal. Second panel: once the P ~ 75 days planetary signal (RV_P:75) is removed from the original time series, another peak corresponding to the P ~ 526 days planetary signal pops up (this time both the bp- and oc-peaks are above the respective cv values). Third panel: after removing also the second planetary signal (RV_P:526), the GLS periodogram does not show any other significant peak. Bottom panel: from the activity-corrected RV time series, we subtracted the Keplerian signals of the two planets (RV_P:74&P:532) inferred from the global fit of the joint A-MCMC analysis. It turned out that P_b ~ 75 days and P_c ~ 532 days. The main orbital parameters are summarized in Table 8.

In the text

	Fig. A.1 Similar to Fig. 2, but for HD 215152.
In the text

	Fig. A.2 Similar to Fig. 3, but for HD 215152.
In the text

	Fig. A.3 Similar to Fig. 4, but for HD 215152. The orbital periods of the already discovered exoplanets (Delisle et al. 2018) are displayed as orange dashed lines. The planetary signals have not been removed from the time series, because they are at the level of the instrumental precision.
In the text

	Fig. A.4 Similar to Fig. 2, but for HD 10700.
In the text

	Fig. A.5 Similar to Fig. 3, but for HD10700.
In the text

	Fig. A.6 Similar to Fig. 4, but for HD 10700. The orbital periods of the already discovered exoplanets (Feng et al. 2017b) are displayed as orange dashed lines. The planetary signals have not been removed from the time series, because they are at the level of the instrumental precision.
In the text

	Fig. A.7 Similar to Fig. 2, but for HD 192310.
In the text

	Fig. A.8 Similar to Fig. 3, but for HD 192310. We recall that the two planetary signals were removed from our analyses after the bp method and the oc method were used, in order to better focus on the residuals of the two methods.
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Adams, R. P., & MacKay, D. J. 2007, ArXiv e-prints [arXiv:0710.3742] [Google Scholar]

[2] Adcock, C., & Azzalini, A. 2020, Symmetry, 12, 118 [CrossRef] [Google Scholar]

[3] Aminikhanghahi, S., & Cook, D. J. 2017, Knowl. Inform. Syst., 51, 339 [CrossRef] [Google Scholar]

[4] Angelosante, D., & Giannakis, G. B. 2012, EURASIP J. Adv. Signal Process., 2012, 70 [CrossRef] [Google Scholar]

[5] Bai, J. 1997a, Econometric Theory, 13, 315 [CrossRef] [Google Scholar]

[6] Bai, J. 1997b, Rev. Econ. Stat., 79, 551 [CrossRef] [Google Scholar]

[7] Bai, J., & Perron, P. 1998, Econometrica, 47 [CrossRef] [Google Scholar]

[8] Bai, J., & Perron, P. 2003, J. Appl. Econometrics, 18, 1 [CrossRef] [Google Scholar]

[9] Baliunas, S., Sokoloff, D., & Soon, W. 1996, ApJL, 457, L99 [NASA ADS] [CrossRef] [Google Scholar]

[10] Bellman, R., & Roth, R. 1969, J. Am. Stat. Assoc., 64, 1079 [CrossRef] [Google Scholar]

[11] Boisse, I., Bouchy, F., Hébrard, G., et al. 2011, Astron. Astrophys., 528, A4 [CrossRef] [EDP Sciences] [Google Scholar]

[12] Borgniet, S., Meunier, N., & Lagrange, A.-M. 2015, A&A, 581, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[13] Chen, J., & Gupta, A. K. 2011, Parametric Statistical Change Point Analysis: with Applications to Genetics, Medicine, and Finance (Springer Science & Business Media) [Google Scholar]

[14] Christensen-Dalsgaard, J., Bedding, T. R., & Kjeldsen, H. 1995, ApJ, 443, L29 [NASA ADS] [CrossRef] [Google Scholar]

[15] Cramér, H. 1928, Scand. Actuar. J., 1928, 13 [CrossRef] [Google Scholar]

[16] Davis, A. B., Cisewski, J., Dumusque, X., Fischer, D. A., & Ford, E. B. 2017, ApJ, 846, 59 [NASA ADS] [CrossRef] [Google Scholar]

[17] Delisle, J.-B., Ségransan, D., Dumusque, X., et al. 2018, A&A, 614, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[18] Del Moro, D. 2004, A&A, 428, 1007 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[19] Desort, M., Lagrange, A.-M., Galland, F., Udry, S., & Mayor, M. 2007, A&A, 473, 983 [CrossRef] [EDP Sciences] [Google Scholar]

[20] DeWarf, L. E., Datin, K. M., & Guinan, E. F. 2010, ApJ, 722, 343 [NASA ADS] [CrossRef] [Google Scholar]

[21] Dumusque, X. 2016, A&A, 593, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[22] Dumusque, X. 2018, A&A, 620, A47 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[23] Dumusque, X., Lovis, C., Ségransan, D., et al. 2011b, A&A, 535, A55 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[24] Dumusque, X., Udry, S., Lovis, C., Santos, N. C., & Monteiro, M. 2011a, A&A, 525, A140 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[25] Dumusque, X., Pepe, F., Lovis, C., et al. 2012, Nature, 491, 207 [Google Scholar]

[26] Dumusque, X., Boisse, I., & Santos, N. 2014, ApJ, 796, 132 [Google Scholar]

[27] Dumusque, X., Borsa, F., Damasso, M., et al. 2017, A&A, 598, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[28] DuToit, S. H., Steyn, A. G. W., & Stumpf, R. H. 2012, Graphical Exploratory Data Analysis (Springer Science & Business Media) [Google Scholar]

[29] Efron, B., & Tibshirani, R. 1993, An Introduction to the Bootstrap (Boca Raton), 57 [Google Scholar]

[30] Fearnhead, P., & Rigaill, G. 2019, J. Am. Stat. Assoc., 114, 169 [CrossRef] [Google Scholar]

[31] Feng, F., Tuomi, M., & Jones, H. R. A. 2017a, A&A, 605, A103 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[32] Feng, F., Tuomi, M., Jones, H. R. A., et al. 2017b, AJ, 154, 135 [Google Scholar]

[33] Figueira, P., Santos, N., Pepe, F., Lovis, C., & Nardetto, N. 2013, A&A, 557, A93 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[34] Fiorenzano, A. M., Gratton, R., Desidera, S., Cosentino, R., & Endl, M. 2005, A&A, 442, 775 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[35] Fischer, D. A., Anglada-Escude, G., Arriagada, P., et al. 2016, PASP, 128, 066001 [Google Scholar]

[36] Fisher, W. D. 1958, J. Am. Stat. Assoc., 53, 789 [CrossRef] [Google Scholar]

[37] Frick, K., Munk, A., & Sieling, H. 2014, J. R. Stat. Soc. B (Stat. Methodol.), 76, 495 [CrossRef] [Google Scholar]

[38] Ghosh, D., & Vogt, A. 2012, Retrieved from American Statistical Association’s Section on Survey Research Methods Proceedings, [Google Scholar]

[39] Gourieroux, C., Holly, A., & Monfort, A. 1982, Econometrica, 63 [CrossRef] [Google Scholar]

[40] Guédon, Y. 2013, Comput. Stat., 28, 2641 [CrossRef] [Google Scholar]

[41] Gupta, A. K., & Nadarajah, S. 2004, Handbook of Beta Distribution and its Applications (CRC Press) [CrossRef] [Google Scholar]

[42] Guthery, S. B. 1974, J. Am. Stat. Assoc., 69, 945 [CrossRef] [Google Scholar]

[43] Haario, H., Saksman, E., & Tamminen, J. 2005, Comput. Stat., 20, 265 [CrossRef] [Google Scholar]

[44] Hackl, P., & Westlund, A. H. 1989, in Econometrics of Structural Change (Springer), 103 [CrossRef] [Google Scholar]

[45] Hatzes, A. P. 1996, PASP, 108, 839 [NASA ADS] [CrossRef] [Google Scholar]

[46] Hatzes, A. P. 2016, in Methods of Detecting Exoplanets (Springer), 3 [NASA ADS] [CrossRef] [Google Scholar]

[47] Hatzes, A. P., & Cochran, W. D. 2000, AJ, 120, 979 [CrossRef] [Google Scholar]

[48] Haynes, K., Eckley, I. A., & Fearnhead, P. 2017, J. Comput. Graph. Stat., 26, 134 [CrossRef] [Google Scholar]

[49] Haywood, R. D., Collier Cameron, A., Queloz, D., et al. 2014, MNRAS, 443, 2517 [Google Scholar]

[50] Hocking, T. D., Schleiermacher, G., Janoueix-Lerosey, I., et al. 2013, BMC Bioinformatics, 14, 164 [CrossRef] [Google Scholar]

[51] Howard, A. W., Marcy, G. W., Johnson, J. A., et al. 2010, Science, 330, 653 [Google Scholar]

[52] Jandhyala, V., Fotopoulos, S., MacNeill, I., & Liu, P. 2013, J. Time Ser. Anal., 34, 423 [CrossRef] [Google Scholar]

[53] Jurgenson, C., Fischer, D., McCracken, T., et al. 2016, SPIE Conf. Ser., 9908, 99086T [Google Scholar]

[54] Kjeldsen, H., Bedding, T. R., Arentoft, T., et al. 2008, ApJ, 682, 1370 [Google Scholar]

[55] Lavielle, M., & Teyssiere, G. 2007, in Long Memory in Economics (Springer), 129 [CrossRef] [Google Scholar]

[56] Lavielle, M., Teyssiere, G., & Stochastique, M. 2006, Lietuvos Matematikos Rinikinys, 46, 25 [Google Scholar]

[57] Lefebvre, S., García, R., Jiménez-Reyes, S., Turck-Chièze, S., & Mathur, S. 2008, A&A, 490, 1143 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[58] Lévy-Leduc, C., Roueff, F., 2009, Ann. Appl. Stat., 3, 637 [Google Scholar]

[59] Lindegren, L., & Dravins, D. 2003, A&A, 401, 1185 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[60] Liu, S., Wright, A., & Hauskrecht, M. 2018, Artif. Intell. Med., 91, 49 [CrossRef] [Google Scholar]

[61] Lomb, N. R. 1976, Astrophys. Space Sci., 39, 447 [Google Scholar]

[62] Lovis, C., & Fischer, D. 2010, Exoplanets, 27 [Google Scholar]

[63] Lung-Yut-Fong, A., Lévy-Leduc, C., & Cappé, O. 2011, ArXiv e-prints [arXiv: 1107.1971] [Google Scholar]

[64] Maidstone, R. 2016, PhD thesis, Lancaster University [Google Scholar]

[65] Mayor, M., & Queloz, D. 1995, Nature, 378, 355 [Google Scholar]

[66] Mayor, M., Marmier, M., Lovis, C., et al. 2011, ArXiv e-prints [arXiv: 1109.2497] [Google Scholar]

[67] McKnight, P. E., & Najab, J. 2010, The Corsini encyclopedia of psychology, 1 [Google Scholar]

[68] Meunier, N., Desort, M., & Lagrange, A.-M. 2010, A&A, 512, A39 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[69] Meunier, N., Lagrange, A.-M., Kabuiku, L. M., et al. 2017, A&A, 597, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[70] Nava, C., López-Morales, M., Haywood, R. D., & Giles, H. A. 2019, AJ, 159, 23 [CrossRef] [Google Scholar]

[71] Noyes, R., Hartmann, L., Baliunas, S., Duncan, D., & Vaughan, A. 1984, ApJ, 279, 763 [NASA ADS] [CrossRef] [Google Scholar]

[72] Page, E. S. 1954, Biometrika, 41, 100 [CrossRef] [Google Scholar]

[73] Pepe, F., Lovis, C., Ségransan, D., et al. 2011, A&A, 534, A58 [CrossRef] [EDP Sciences] [Google Scholar]

[74] Pepe, F., Molaro, P., Cristiani, S., et al. 2014, Astron. Nachr., 335, 8 [Google Scholar]

[75] Pollet, T. V., & van der Meij, L. 2017, Adapt. Hum. Behav. Physiol., 3, 43 [CrossRef] [Google Scholar]

[76] Queloz, D., Henry, G., Sivan, J., et al. 2001, A&A, 379, 279 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[77] R Core Team. 2019, R: A Language and Environment for Statistical Computing (Vienna, Austria: R Foundation for Statistical Computing) [Google Scholar]

[78] Rajpaul, V., Aigrain, S., & Roberts, S. 2015, MNRAS, 456, L6 [Google Scholar]

[79] Rasmussen, C. E., & Williams, C. K. I. 2005, Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) (The MIT Press) [Google Scholar]

[80] Reeves, J., Chen, J., Wang, X. L., Lund, R., & Lu, Q. Q. 2007, J. Appl. Meteorol. Climatol., 46, 900 [CrossRef] [Google Scholar]

[81] Reiners, A., Zechmeister, M., Caballero, J., et al. 2018, A&A, 612, A49 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[82] Robertson, P., Mahadevan, S., Endl, M., & Roy, A. 2014, Science, 345, 440 [Google Scholar]

[83] Rosenthal, L. J., Fulton, B. J., Hirsch, L. A., et al. 2021, ApJS, 255, 8 [NASA ADS] [CrossRef] [Google Scholar]

[84] Saar, S. H., & Donahue, R. A. 1997, Astrophys. J., 485, 319 [NASA ADS] [CrossRef] [Google Scholar]

[85] Sahki, N., Gégout-Petit, A., & Mézières-Wantz, S. 2018, in ENBIS 2018-18th Annual Conference of the European Network for Business and Industrial Statistics [Google Scholar]

[86] Scargle, J. D. 1982, ApJ, 263, 835 [Google Scholar]

[87] Schwarz, G., 1978, Ann. Stat., 6, 461 [Google Scholar]

[88] Schwarzenberg-Czerny, A. 1998, MNRAS, 301, 831 [NASA ADS] [CrossRef] [Google Scholar]

[89] Seber, G. A., & Lee, A. J. 2003, Linear Regression Analysis, 165 [CrossRef] [Google Scholar]

[90] Simola, U., Dumusque, X., & Cisewski-Kehe, J. 2019, A&A, 622, A131 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[91] Thieler, A. M. 2014, PhD thesis, Dissertation, Dortmund, Technische Universität, 2014 [Google Scholar]

[92] Thieler, A. M., Fried, R., & Rathjens, J. 2016, J. Stat. Softw., 69, 1 [CrossRef] [Google Scholar]

[93] Thompson, A., Watson, C., de Mooij, E., & Jess, D. 2017, MNRAS, 468, L16 [NASA ADS] [CrossRef] [Google Scholar]

[94] Thompson, A., Watson, C., Haywood, R., et al. 2020, MNRAS, 494, 4279 [NASA ADS] [CrossRef] [Google Scholar]

[95] Truong, C., Oudre, L., & Vayatis, N. 2018, ArXiv e-prints [arXiv: 1801.00826] [Google Scholar]

[96] Truong, C., Oudre, L., & Vayatis, N. 2020, Signal Process., 167, 107299 [CrossRef] [Google Scholar]

[97] Tuomi, M., Anglada-Escudé, G., Gerlach, E., et al. 2013, A&A, 549, A48 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[98] van den Burg, G.J., & Williams, C.K. 2020, ArXiv e-prints [arXiv:2003.06222] [Google Scholar]

[99] Verbesselt, J., Hyndman, R., Newnham, G., & Culvenor, D. 2010, Rem. Sens. Environ., 114, 106 [NASA ADS] [CrossRef] [Google Scholar]

[100] Wilson, O. 1968, ApJ, 153, 221 [CrossRef] [Google Scholar]

[101] Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577 [CrossRef] [EDP Sciences] [Google Scholar]

[102] Zeileis, A., Leisch, F., Hornik, K., & Kleiber, C. 2002, J. Stat. Softw., 7, 1 [CrossRef] [Google Scholar]

[103] Zeileis, A., Kleiber, C., Krämer, W., & Hornik, K. 2003, Comput. Stat. Data Anal., 44, 109 [CrossRef] [Google Scholar]

[104] Zimmerman, D. W., & Zumbo, B. D. 1993, J. Exp. Educ., 62, 75 [CrossRef] [Google Scholar]

Accounting for stellar activity signals in radial-velocity data by using change point detection techniques★

1 Introduction

2 Change Point Detection Methods

2.1 The CPD Statistical Framework

2.2 The Breakpoint Method

3 Data Analysis

3.1 Preliminary Considerations

3.2 Analyses of a Real Star: α Cen B

3.3 Comparison between the Optimal and the Suboptimal Solutions of the Bp Method

3.4 Exoplanets Detection Limit

3.4.1 Simulation Study based on α Cen B Data

3.4.2 Behavior of the GLS Periodogram

3.5 Comparison of the Oc and the Bp Methods Applied to HD 215152, HD 10700, and HD 192310

4 Discussion

5 Conclusion

Acknowledgements

Appendix A HD 215152, HD 10700, and HD 192310

References

All Tables

All Figures

Accounting for stellar activity signals in radial-velocity data by using change point detection techniques^★