Open Access
Issue
A&A
Volume 675, July 2023
Article Number A187
Number of page(s) 18
Section Planets and planetary systems
DOI https://doi.org/10.1051/0004-6361/202346472
Published online 21 July 2023

© The Authors 2023

Licence Creative CommonsOpen Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.

1 Introduction

Over the past few years, the development of precise radial velocity (RV) instruments in the near-infrared (near-IR) domain, such as HPF (Mahadevan et al. 2012), IRD (Kotani et al. 2014), CARMENES-NIR (Becerril et al. 2017), NIRPS (Bouchy & Doyon 2018) and SPIRou (Donati et al. 2020), has facilitated the detection and characterization of new exoplanetary systems (Cadieux et al. 2022; Martioli et al. 2022). These instruments have enabled the expansion of research on M-type stars to the YJHK spectral bandpasses (Quirrenbach et al. 2018; Reiners & Zechmeister 2020). Those stars are of particular interest because they constitute the majority of main sequence stars in the solar neighborhood (Henry et al. 2006). Moreover, thanks to their low masses, M dwarfs are particularly well suited for the detection of planet RV signatures of a few ms−1 (Anglada-Escudé et al. 2016; Bonfils et al. 2013; Faria et al. 2022; Mascareño et al. 2023) and they are known to harbor more rocky planets than stars of other types (Bonfils et al. 2013; Dressing & Charbonneau 2015; Gaidos et al. 2016; Hsu et al. 2020; Sabotta et al. 2021).

However, M dwarfs pose challenges for exoplanet detection due to their high levels of magnetic activity, which produce spurious RV signals (Bonfils et al. 2007; da Silva et al. 2012; Newton et al. 2016; Hébrard et al. 2016). Additonally, similar to other instruments operating in the optical range, some systemat-ics may originate from the instrument itself, such as an average intra-night drift and a nightly zero-point variation that can be corrected (Trifonov et al. 2020; Tal-Or et al. 2019), or instrumental defects (Dumusque et al. 2015). Furthermore, the near-IR range required to analyze the red spectral energy distribution of M dwarfs is prone to strong telluric absorption lines which can create an additional source of uncertainty in the measurement of RV (Artigau et al. 2014). The near-IR poses other challenges, including persistence in infrared arrays. Most notably, this phenomenon is characterized by the fact that not only is the preceding frame of significance, but the entire illumination history spanning the last few hours, is as well. Specifically, bright targets observed at the outset of a night can impact the observation of fainter targets later on in the same night (Artigau et al. 2018).

Despite efforts to correct these systematics, it is possible that spurious signals may still remain after applying existing techniques, potentially due to incorrect assumptions, technical limitations, or unidentified sources. To address this issue, we present a new data-driven method for correcting systematics in RV data. We focus our work on data collected with the SPIRou instrument, a high-precision RV spectrograph in the near-IR domain with two science topics that are at the forefront: the quest for Earth-like planets in the habitable zones of very-low-mass stars, and the study of low-mass stars and planet formation in the presence of magnetic fields (Donati et al. 2020).

To achieve these science goals, SPIRou has a large program (LP) called the SPIRou Legacy Survey (SLS)1 which was allocated 310 nights over seven semesters (2019a to 2022a) on the Canada-France-Hawaii Telescope. The SLS-Planet Search program, which is a part of this LP, aims to detect and characterize planetary systems around ~50 nearby M dwarfs (Moutou et al., in prep.).

The SPIRou data are prone to the previously mentioned sys-tematics and it is necessary to develop methods to correct them. Our method, Weighted-principAl comPonent analysIs reconsTructIon, referred to as Wapiti2, is agnostic to the origin of the systematics and reconstructs the per-line RV time series. This method takes advantage of the new line-by-line (LBL) algorithm (Artigau et al. 2022) used to compute RV data with SPIRou, which provides a RV time series for each line of the spectrum. The application of principal component analysis (PCA) to mitigate systematics in RV data is a well-established technique in the exoplanet community. However, existing methods are primarily aimed at specifically reducing the impact of stellar activity, such as the methods presented in Collier Cameron et al. (2021) and Cretignier et al. (2022). PCA based methods were also explored to correct for telluric or instrumental effects in prior studies (Artigau et al. 2014; Cretignier et al. 2021). Our method, on the other hand, is unique in that it does not focus solely on reducing the impact of stellar activity or telluric effects but instead is a versatile and flexible approach that can be applied to any target, as long as LBL data are available. We demonstrate the effectiveness of Wapiti by applying it to the star GJ 251, a target with a known planet in the SLS-Planet Search program.

This paper is structured as follows. Section 2 provides an overview of the observations, data reduction, and LBL algorithm used for the study. Section 3 presents the Wapiti method in detail, including how it determines the optimal number of principal vectors. Section 4 demonstrates the efficiency of the Wapiti method by eliminating systematic peaks and confirming the detection of a planet with a 14.2-day period. Finally, Sect. 5 summarizes the main results and discusses potential future work.

2 Observations

2.1 GJ251

The star GJ 251 (HD 265866) is a red dwarf with a spectral type of M3V (Kirkpatrick et al. 1991) located 5.5846 ± 0.0009 pc away (Gaia Collaboration 2020) from Earth. From the analysis of SPIRou spectra, Cristofari et al. (2022) derived the stellar parameters listed in Table 1 that we used in our study.

In Butler et al. (2017), the detection of two planets with orbital periods of 1.74 and 601.32 days was claimed to be made using the HIRES spectrograph at the Keck observatory. However, in Stock et al. (2020), no such evidence was found in the CARMENES visible data. Instead, a planet with an orbital period of 14.238 ± 0.002 days was identified in the data, with a semi-amplitude K = 2.11 ± 0.20 ms−1, which falls within the category of temperate super-Earths, with a calculated mass of Mp sin i = 4.0 ± 0.4 M and an equilibrium temperature of Teq = 351 ± 1.4 K (Stock et al. 2020). The small amplitude of this planet makes GJ 251 an ideal candidate to demonstrate that SPIRou is also a suitable instrument for detecting Earth-like planets in the habitable zone of M dwarfs.

The observations were collected as series of circular polarization sequences of 4 individual sub exposures per night. The data collection for GJ,251 spans from December 12, 2018 to April 14, 2022, comprising a total of 769 subexpositions, with each visit comprising four subexposures. Out of the available data, three visits are incomplete, leaving us with a total of 191 complete visits. The airmass during these observations varied from a minimum of 1.028 to a maximum of 2.003, with a median value of 1.083. The signal-to-noise ratio per 2.28-km s−1 pixel bin in the middle of the H band, as measured with 61-s exposures, has a median value of 137.0.

2.2 The SPIRou spectropolarimeter

SPIRou is a high-resolution near-infrared spectropolarime-ter installed in 2018 on the Canada-France-Hawaii Telescope (CFHT). It has a spectral range from 0.98 to 2.35 µm at a spectral resolving power of about 70 000 (Donati et al. 2020). The instrument operates in vacuum at 73 K, regulated to sub-mK level, with a temperature-controlled environment (Challita et al. 2018). It has precise light injection devices (Parès et al. 2012) and splits light into two orthogonal polarization states using ZnSe quarter-wave rhombs and a Wollaston prism located in its Cassegrain unit. It uses two science fibers (A and B) and one calibration fiber (C; Micheau et al. 2018). The detector is a 15-micron science-grade HAWAII 4RGTM (H4RG) from Teledyne Systems used in its up-the-ramp readout mode (Artigau et al. 2018).

Table 1

Retrieved stellar parameters of GJ 251 (Cristofari et al. 2022).

2.3 SPIRou data reduction with APERO

The spectra of GJ 251 were reduced with the 0.7.275 version of the SPIRou data reduction system (DRS) called APERO (A PipelinE to Reduce Observations) which is detailed in Cook et al. (2022). The APERO pipeline processes science frames to produce 2D and 1D spectra from the two science channels and the calibration channel. The wavelength calibration uses a combination of exposures from a UNe hollow cathode lamp and a Fabry-Pérot etalon (Hobson et al. 2021). Barycentric Earth radial velocity (BERV) is calculated to shift the wavelengths to the barycentric frame of the solar system using barycorrpy (Kanodia & Wright 2018). Finally, the pipeline incorporates a correction for telluric absorptions and night-sky emission, which will be detailed in an upcoming publication (Artigau, in prep.).

2.4 LBL algorithm

Radial velocities of GJ 251 were obtained from the telluric-corrected spectra using the line-by-line (LBL) method described in the paper by Artigau et al. (2022). This method, which is based on the framework developed in Bouchy et al. (2001), has been proposed by Dumusque (2018) and further explored by Cretignier et al. (2020) in the context of optical RV observations with HARPS. The LBL calculates Doppler shifts for individual spectral lines. To achieve this, a template with low noise is required, as the RVs are inferred from the comparison between the residuals of the observed spectrum and the template, as well as the derivative of the template. In practice, a high signal-to-noise ratio (S/N) combined spectrum is used as a template for a given target, to ensure that any remaining noise is small compared to an individual spectrum. The LBL method has been compared to other methods such as the cross-correlation function and template matching in Martioli et al. (2022) and Artigau et al. (2022).

For each spectrum, the LBL algorithm combines the velocities of individual lines (27 702 lines in the case of GJ251) into a single RV measurement. This is achieved using a mixture model, which assumes that the velocities of individual lines are distributed according to a Gaussian distribution centered on the mean velocity, or another distribution of high-sigma outliers with multiple origins (e.g., bad pixels, cosmic rays, telluric residuals). The RV measurements are grouped by night by calculating the inverse-variance weighted mean of all observations for each night, resulting in a reduction from 769 to 177 data points. To address the variation in velocities of the day-to-day drift, a correction is applied using simultaneous Fabry–Pérot measurements. Furthermore, to take into account any long-term variation, a Gaussian Process (GP) regression was computed using the most observed stars in the SLS in order to estimate the zero point in the time domain. This procedure is similar to the one conducted for SPIRou data in Cadieux et al. (2022) or for SOPHIE (Courcol et al. 2015), HARPS (Trifonov et al. 2020) and HIRES (Tal-Or et al. 2019) spectrographs. In our case, we employed the celerite package (Foreman-Mackey et al. 2017) to fit a GP model using a Matérn 3/2 kernel. The estimation of the hyper-parameters was achieved by minimizing the likelihood function. Studies on common trends in SPIRou data are being conducted in parallel, and will address the differences between GPs and more commonly used tools such as a running median or nightly offsets in forthcoming work by the SPIRou team.

The final outcome of the data reduction is presented in Fig. 1, wherein the red data points represent the results obtained after applying zero point correction and the black data points represent the results without such correction. To identify periodicities in the time series, a generalized Lomb–Scargle (GLS) peri-odogram was used (Zechmeister & Kürster 2009), with a grid of 100000 points ranging from 1.1 days to 1000 days, using the astropy package (Astropy Collaboration 2018). The false alarm probability (FAP) of the maximum peak was calculated using the analytical formula in Baluev (2008).

As is illustrated, even with the careful processing steps in APERO, the LBL algorithm produced time series with spurious peaks at a period of 6 months and 1 yr. However, the planet as reported in Stock et al. (2020) was not detected. Notably, the zero-point correction did not alleviate these systematic effects but rather amplified them. As this 180-day peak is not present in the window function, it can be inferred that it did not arise from the observation strategy but from an unclear systematic effect. These signals may be attributed to a defective telluric correction; however, the underlying cause is poorly understood and a practical, physics-based correction is currently unavailable. Therefore, GJ 251 serves as an appropriate example of how our method can eliminate spurious signals from RV data.

thumbnail Fig. 1

Comparison of LBL RVs data for GJ 251 with and without zero point correction. The top panel shows the data with and without zero point correction, indicated by black and red points respectively. The zero point offset is shown in blue. The bottom panel shows the corresponding periodograms for the time series in the same color scheme, and the blue line represents the window function. The significance levels of FAP = 10−3 and FAP = 10−5 are indicated by the solid and dashed horizontal lines, respectively. The 1-yr period, the 180 days period, and the synodic (29.5 days) period of the Moon are represented by vertical black dashed lines, and the period of the claimed planet in Stock et al. (2020) is in red.

3 The Wapiti method

Our approach applies a PCA to the per-line RV time series produced by the LBL algorithm. Since these datasets may be noisy and contain missing data, we use the weighted principal component analysis (wPCA) method described in Delchambre (2015), implemented in the wpca module3 and each observation is weighted by the inverse of its uncertainty squared. The objective of the Wapiti method is to eliminate the sources of systematic error. This is achieved by performing a wPCA reconstruction of the original per-line RV time series and subtracting its reconstructions from it. The rationale behind this approach is that, in a time series dominated by spurious signals of unknown origin, selecting appropriate components can serve as a proxy for correcting the effects responsible for the resulting systematic errors.

To determine the appropriate number of principal vectors, we suggest the following approach. First, we discarded all individual line RV time series that have more than 50% of missing data, reducing the number of lines from 27 201 to 17 923. Such missing data may occur for a line when the RV value deviates by more than five sigma from the standard deviation in relation to the line’s RMS (Artigau et al. 2022). Subsequently, we binned each individual per-line RV time series by night, by computing the inverse-variance weighted mean of all observations for each night. Afterward, we night-binned in a similar fashion the Fabry-Pérot drift also computed with the LBL and removed it from each individual per-lines RV time series. Finally, we standardized the time series using their inverse-variance weighted mean and variance.

3.1 Reducing the number of components through a permutation test

The selection of the appropriate number of components for a wPCA is a critical aspect. Too few components may fail to effectively remove undesired contamination, while an excessive number of components may lead to over-fitting and an increase in the levels of random noise. To address this issue, a method is needed to determine which wPCA components are relevant and necessary for the analysis. In our work, we first used a permutation test (Vieira 2012) which involves independently shuffling the columns of the dataset, and then conducting a wPCA on the resulting dataset. We repeated this process 100 times and compared the explained variance of the original dataset with the permuted versions. If a principal component is relevant, we expect that the permuted explained variances of that component will not be greater than the original explained variance. We can identify the relevant components by examining the p-values and selecting those with low p-values (≤10−5). The result was that only the first 26 components cannot be explained by noise as it can be seen in Fig. 2.

thumbnail Fig. 2

p-values of each component as determined by the permutation test. Only the components with a p-value below a certain threshold (≤10−5) are considered to have a significant contribution, rather than being just random noise.

3.2 Leave-p-out cross-validation

To evaluate the significance of the number of components retained after the permutation test, we conducted a leave-p-out cross-validation. This approach has been used by Cretignier et al. (2022) to select relevant components, albeit with variations in the methodology. In this study, we generated 100 subsets of the per-lines RV dataset, each containing 80% of the lines randomly selected. The wPCA was then recalculated for each of these N datasets to assess the robustness of the 26 remaining components.

The underlying idea involves evaluating the robustness of the eigenvector basis, also known as principal vectors, by examining their stability under the removal of a few lines. Should a principal vector exhibit excessive sensitivity to the lines in use, it is deemed superfluous and may be discarded. We employed the Pearson coefficient of correlation as the evaluation metric, for the reasons stated in Cretignier et al. (2022) being that it is computationally efficient, bounded, and insensitive to the amplitudes of vectors. Specifically, we computed the Pearson coefficient between the nth principal vector obtained from the wPCA using the full dataset and the nth principal vector obtained from each of the N subsets. The absolute value of the coefficient was taken, as the wPCA vectors may change sign. The median Pearson coefficient value was calculated for each component number, and only those with a value above an arbitrary threshold of 95% were retained, as in Cretignier et al. (2022).

As depicted in Fig. 3, it was observed that the value first goes below the threshold after the 13th component. This indicates that the principal vectors after the 3th one are not correlated with their corresponding vectors in the original dataset, implying that these eigenvectors are reliant on the lines used. As a result, they were disregarded, leaving us with 13 components for the remainder of the Wapiti method. In future research, efforts will be conducted to study how the number of components obtained through the application of the two methods, namely the permutation test and the leave-p-out cross-validation, varies with the target being studied.

3.3 Removing outliers

Per-line RVs have the potential to be affected by short-term fluctuations that are not correlated with the most significant principal vectors. These outliers can adversely affect the ability to identify systematics and reduce the accuracy of the reconstruction. Collier Cameron et al. (2021) applied a singular value decomposition to the shape of the CCF and used a one-dimensional rejection mask in the time domain to remove outliers. For each principal component, they classified observations as “bad” if their absolute deviations are farther from the median value than a specified number kmad of the median absolute deviations (MAD), and “good” otherwise. If even one epoch in a principal component is an extreme outlier, it is likely that the entire observation is contaminated. In this study the authors determined that a kmad value of 6 yielded satisfactory results at the cost of rejecting 33 data points out of 886.

In our work, we considered the per-line RV time series and applied a similar method to the principal vectors. However, selecting an appropriate value for kmad is not a straightforward task. Although one could potentially choose a value such as kmad = 15 to correspond to a 10σ rejection, this would seem arbitrary. Furthermore, such an approach would not align with the aim of our method, which is to be able to operate independently on stars without requiring an intervention to select the appropriate value of kmad for each target. For this reason, we have chosen to employ a different strategy. We computed the ratio of the absolute deviation (AD) of the nnth principal vector Vn (t) to its MAD, and assigned the maximum value of this ratio to each epoch across all components (we recall that for a time-series X and naming its median X¯${\bar X}$ we have AD (X)=| XX¯ |$\left( X \right) = \left| {X - \bar X} \right|$ and MAD = median (AD(X))). We use the letter D to denote this value, which measures the degree of anomaly in an observation:

D(t)=maxnAD(Vn(t))MAD(Vn(t)).$D\left( t \right) = \mathop {{\rm{max}}}\limits_n {{{\rm{AD}}\left( {{V_n}\left( t \right)} \right)} \over {{\rm{MAD}}\left( {{V_n}\left( t \right)} \right)}}.$(1)

Consequently, we reorganized the epochs in a descending order based on the value of D, resulting in a list that ranks the degree of anomaly of each observation. We chose to discard the first N elements from this list. To identify the optimal value of N, we gradually removed them one by one and determined the number that maximizes the significance of the signal in the corrected RV once the Wapiti method had been applied. We found that removing the first seven outliers produced the best outcomes, reducing the time series from 178 to 171 data points. It is acknowledged that readers may perceive this method to be specific to individual stars. However, we contend that this is a justifiable approach for our selected target GJ 251, as the signal remains consistent regardless of the number of rejected outliers. Moreover, the main benefit of this approach is that it provides a framework to assess the level of anomaly for any target at a given epoch. For further information about this process, refer to Appendix A.

The top-left panel of Fig. 4 displays the RV time series with flagged outliers highlighted in red. The periodogram of the RV time series after the removal of outliers is displayed on the right. Furthermore, the first two principal vectors are presented on the left, with the previous principal vectors using all available data points depicted in red, and the new principal vectors obtained from applying the wPCA on the remaining data points shown in black. The periodograms of the new principal vectors are displayed on the right. From the periodograms, it can be observed that the presence of a 1-yr period and a 180-day period is evident in the two vectors, indicating that the wPCA method identified signals within the per-line RV dataset that may be contributing to the resulting systematic errors. We furthermore observed that the principal vectors did not exhibit any signs of the known 14.2 d planet, indicating that our approach did not inadvertently capture the Keplerian signal in our data. It can also be noted that a prominent signal with a period of 120 days appears in the first vector, while not present in the RV time series. This period could correspond to a harmonic of the 1 yr signal, however we also note that it is close to the reported star’ s rotation period of Prot = 124 ± 5 days in Stock et al. (2020). We attributed this signal to stellar activity for reasons that will be more detailed in subsequent sections.

thumbnail Fig. 3

Mean Pearson coefficients as a function of the components. Those with a value below 95% are discarded, leaving the number of components to 13.

4 Results

4.1 The Wapiti corrected time series and its analysis

The method outlined in the previous section has allowed us to determine the most appropriate number of components for our dataset on GJ 251. We subsequently subtracted the reconstruction of the original time series to itself to generate a new time series, which we refer to as the Wapiti corrected time series. We evaluated the effectiveness of the correction by analyzing the resulting periodogram as illustrated in Fig. 5. In this figure, the time series resulting from the correction is depicted in blue and the original data are represented in black. The same color scheme is used when plotting the periodograms below the time series.

As desired, the Wapiti correction effectively eliminated the 180 d and 365 days peaks that we attributed to systematics present in our data since those periods are no longer statistically significant and the planet previously reported by Stock et al. (2020) is now discernible in our data, making it the first detection of GJ 251b with near-IR velocities. More precisely, the most prominent peak went from a 184 d period peak with a FAP level of 9.15 × 10−5 to a 14.2days peak with aFAP level of 4.01 × 10−9 after correction. In Appendix B, we explore the impact of the number of components used for the wPCA on this detection.

To analyze the corrected RV time series obtained from the Wapiti algorithm, a model µ consisting of a Keplerian orbit with a period of 14.2 days and an offset was employed, using the radvel package (Fulton et al. 2018) on the corrected data. However, caution must be exercised when fitting this model, as the reconstruction process may degrade it. The reason being that although there were no signs of the Keplerian signal found in the principal vectors, it is important to note that they may not be orthogonal to it. Therefore, it is possible that when subtracting the wPCA projection from the time series, the projection of the Keplerian signal could still have a nonzero value, posing a potential issue when trying to correctly characterize the signal. Therefore, it could be necessary to take into account the filtering effect of the wPCA reconstruction when fitting the model. This can be achieved with minimal computational cost by using the wPCA to reconstruct the desired model and then fitting the difference between the original model µ and its wPCA reconstruction μ^${\hat \mu }$ to the corrected RV time series obtained from Wapiti. Concretely, denoting R the wPCA reconstruction we have μ^=R(μ)$\hat \mu = R\left( \mu \right)$ and we suggest to fit μμ^$\mu - \hat \mu $ to the corrected data instead of simply µ. This process bears resemblance to that described in Gibson et al. (2020), wherein they used this method to take into account the filtering impact of the SysRem algorithm (Tamuz et al. 2005) on their model. The periodogram of the residuals in Fig. 6 does not show any signs of one of the planets reported in Butler et al. (2017).

To obtain estimates of the uncertainties on the free parameters, we employed a Bayesian Markov chain Monte Carlo (MCMC) framework to sample the posterior distribution, with 100 000 samples, 10 000 burn-in samples, and 100 walkers using the emcee module (Foreman-Mackey et al. 2013). The orbital and derived planetary parameters are shown in Table 2 and compared to those obtained in Stock et al. (2020). In order to calculate the mass of the planet, Mp sin i, we used the mass of the star and its uncertainty obtained from Cristofari et al. (2022) and listed in Table 1. One may note that the eccentricity is poorly constrained from our RV data, prompting the question of whether a circular fit would yield a better result. To address this, we conducted a comparison of the maximum likelihood estimation (MLE) for both Keplerian and circular models, allowing us to compute their respective Bayesian Information Criterion (BIC; see Schwarz 1978). By taking the logarithm of the Bayes factor (log BF=BICcirc.BICkep.2$\left( {\log \,{\rm{BF}} = {{{\rm{BI}}{{\rm{C}}_{{\rm{circ}}{\rm{.}}}} - {\rm{BI}}{{\rm{C}}_{{\rm{kep}}{\rm{.}}}}} \over 2}} \right)$), we were able to compare the two models. Our analysis strongly favored the Keplerian model (log BF = 15), and as such, we decided not to perform a MCMC estimation of the orbital parameters using a circular model. The phase-folding of the best-fitted model to the data is displayed in Fig. 6 which also present the periodogram of the residuals with no significant signals within it.

thumbnail Fig. 4

Radial velocity time series and principal vectors. Left panels show the RV time series and and the first two principal vectors. Their respective periodograms are on the right. The significance levels of FAP = 10−3 and FAP = 10−5 are indicated by the solid and dashed horizontal lines, respectively.

4.2 Injection recovery test

In order to assess the effectiveness of the Wapiti method in retrieving correct planetary parameters and to verify our claim that it is necessary to take into account the wPCA reconstruction of our model, we performed an injection recovery test. We first removed the Wapiti corrected RV time series from each per-line RV time series. We then injected a signal consisting of a Keplerian with the parameters corresponding to the best-fit parameters of GJ 251b from our analysis, except for the semi-amplitude Kinjected that we made vary from 2 to 10 m s−1.

We chose a range of semi-amplitude values that explore the low and intermediate regime where the planetary signal is either of lower or of comparable magnitude with the systematics present in our data. In our case, the systematics consist of a 180 days signal of ~6 ms−1 amplitude. This range of semiamplitude allowed us to assess the ability of the Wapiti method to recover the injected planetary signal.

For each injected signal, we employed a simple model comprising of only the injected Keplerian with varying semiamplitude and an offset. Our aim was to quantitatively assess the effect of the Wapiti method on the estimated semi-amplitude Kcomputed. To accomplish this, we used a MCMC approach, which consisted of 100000 steps, 10000 burn-in samples, and 100 walkers. We set a uniform prior distribution between -50 and 50ms−1 over the semi-amplitude.

In order to evaluate the importance of subtracting the wPCA reconstruction μ^${\hat \mu }$ from the model µ, we conducted a comparative analysis by fitting both µ and μμ^$\mu - \hat \mu $ to the data and examining their impact on the estimated semi-amplitude. Our findings displayed in Fig. 7 demonstrate that the injection recovery test is generally successful and the Wapiti method accurately recovers the correct semi-amplitude. However, as the semi-amplitude increases, the exclusion of the reconstruction µ becomes increasingly significant. Fitting only µ results in an underestimation of the correct semi-amplitude.

Table 2

Orbital and derived planetary parameters from the analysis of GJ 251.

thumbnail Fig. 5

Effectiveness of the Wapiti correction method on the RV time series. The top panel of the figure shows the original data in black and the time series from the Wapiti correction in blue. The bottom panel shows the corresponding periodograms using the same color code, it can be seen that the correction has successfully eliminated the 180 and 365 days systematic peaks. The reported 14.2 days period planet is now detected at high significance in the blue periodogram with a FAP level of 4.01 × 10−9.

4.3 Stellar activity in GJ251

It is noteworthy that contrary to the method employed by Stock et al. (2020), we did not use a quasi-periodic GP to account for stellar activity in our analysis. In their paper, they reported a rotation period of 124 ± 5 days while Fouqué et al. (2023) detected a periodicity of 98.74.8+11.5$98.7_{ - 4.8}^{ + 11.5}$ days by analyzing the time series of the longitudinal magnetic field (Bl), yet we did not observe such signals in our RV periodogram. Nonetheless, for completeness, we fit two models consisting of one planet and one planet plus a quasi-periodic GP of the form

k(τ)=Aexp(τ22l2Γ sin2(πPτ)).$k\left( \tau \right) = A\,\exp \,\left( { - {{{\tau ^2}} \over {2{l^2}}} - {\rm{\Gamma }}\,{\rm{si}}{{\rm{n}}^2}\,\left( {{\pi \over P}\tau } \right)} \right).$(2)

The GP was computed using george (Ambikasaran et al. 2015). WecalculatedtheBayes factorto comparethetwo models and it was found that adding the quasi-periodic GP did not improve the fit (log BF = −5.5), reinforcing our conclusion that stellar activity does not impact our RV data. Ultimately, we determined that it was not necessary to account for the influence of stellar activity when characterizing the planet orbiting GJ 251.

However, when performing a wPCA on the differential line width (dLW; introduced in Zechmeister et al. 2018), which is another data product obtained from the LBL analysis that measures variations in the average width of line profiles relative to a template (Artigau et al. 2022), Cadieux et al. (in prep.) found that the first principal vectors can be sensitive to the presence of stellar activity. In our case, we observed that even though no periodicity can be detected in dLW, the first principal vector (W1) presents a signal that is similar to the one reported in Stock et al. (2020) as displayed in Fig. 8. We fit a quasi-periodic GP in order to estimate the value of the periodicity in W1 fitting a quasi-periodic kernel of the same form as in Eq. (2) as well as a jitter term and a constant. We computed an MCMC using 10 000 steps, 1000 burn-in samples, and 100 walkers which revealed a periodicity of 13716+37$137_{ - 16}^{ + 37}$ days comparable within uncertainties to the one obtained in Stock et al. (2020). This led us to conclude that W1 could serve as a good proxy for stellar activity and the specifics of this analysis can be found in Appendix D. In addition, in Donati et al. (2023) it was found when studying AUMic that the first time derivative of W1 correlated well with the RV which is not the case in our study (Pearson coefficient of −0.11). That said, a noteworthy result is the correlation between the first principal component of the wPCA applied to the RVs and W1 with a Pearson coefficient of 0.96, suggesting that the secondary peak of the first principal vector at 120 days may be related to stellar activity and not be a 1 yr harmonic. This highlights that despite the lack of any clear signs of activity in both RV and dLW time series, the presence of such activity can still be detected in their principal vectors. The results from both our data and Stock et al. (2020) demonstrate that the impact of stellar activity on RV data can manifest with a ~2σ difference to the periodicity reported in Fouqué et al. (2023) when studying Bl. This indicates that the effect of stellar activity on RV data may not always occur at the same period, for instance due to differential rotation, and must be evaluated with available indicators on a case-by-case basis.

thumbnail Fig. 6

Results of the RV analysis using the Wapiti corrected time series. The top panel illustrates the phase-folding of the planet with the best-fit 14.2-day period, represented by a solid black curve, and the black circles depict the binned RV measurements. The middle panel displays the residuals after the model has been fit, and the bottom panel presents the resulting periodogram, which indicates the absence of any remaining significant signals within the data.

thumbnail Fig. 7

Comparison of retrieved and injected semi-amplitudes. Top panel displays the retrieved semi-amplitude Kcomputed versus the injected semi-amplitude Kinjected. The bottom panel illustrates the residuals, demonstrating the successful recovery of the injected signal using the Wapiti method, with the inclusion of the wPCA reconstruction in the fit model.

5 Discussion, summary, and prospects

The Wapiti method successfully eliminated the spurious signals in the GJ 251 data and detected the planetary signal. Although our study demonstrated the effectiveness of the method in accurately estimating planetary parameters, it is important to carefully consider the characteristics of the data. For instance, in situations where a high semi-amplitude planetary signal or trend is present, it would be recommended to first fit these signals and then apply the method to the residuals.

Furthermore, due to its data-driven nature, this method does not require a model for the origins of the corrected systematic errors. Nevertheless, the Wapiti method does not aim to replace more physically motivated approaches, such as the zero point correction (which in the instance of GJ 251 is not sufficient), but rather to be a tool that works in conjunction with them. In fact, this method is quite helpful and convenient when investigating the sources of systematics, as outlined in the following discussion.

thumbnail Fig. 8

Analysis of dLW and W1 time series. The left panels display, at the top, the dLW time series and its corresponding periodogram at the bottom, which shows no indication of periodicity in the data. On the right, the first principal component of the per-line dLW time series obtained from the wPCA is displayed, which exhibits a pronounced 124-day period peak.

5.1 A BERV effect in SPIRou LBL RVs of GJ251

From the outset of our study, it was apparent that the spurious signals at 180 and 365 days in our dataset could be attributed to a residual telluric effect, as verified through visual examination of the data in the BERV space in Fig. 9. This analysis revealed the presence of a BERV effect, with significant fluctuations occurring in the original data at BERV values of approximately 10 and 25 km s−1. It is worth noting that the largest fluctuation observed in our data corresponds to a velocity of the star that is close to zero in the observer’s frame of reference due to the star’s velocity of 22.980.22+0.22km s1$22.98_{ - 0.22}^{ + 0.22}{\rm{km}}\,{s^{ - 1}}$.

The principal vectors successfully captured the BERV dependency, as evidenced by an evident structure in the BERV space. For the sake of readability we only displayed the first three vectors in Fig. 9. Therefore, it can be concluded that the Wapiti method was able to correct for systematics in the original data by removing the BERV dependency through sub-stracting the wPCA reconstruction. Although the origin of this BERV effect in the reduced data remains somewhat unclear, the next subsection describes how the Wapiti products can be used to investigate the source of systematics.

5.2 Investigating the sources of systematics

An approach to search for the underlying physical causes of systematic variations in our data would be to examine the scores obtained from the wPCA. These scores represent the transformed data in the new coordinate system established by the principal vectors. To quantify the influence of a specific principal component on a given line, we examined its score values Un. Our method involved computing the value U˜n=UnU¯nσ(Un),${{\tilde U}_n} = {{{U_n} - {{\bar U}_n}} \over {\sigma \left( {{U_n}} \right)}},$ where Ūn and σ (Un) denote the weighted mean and uncertainties of Un, respectively. For each line λ, we determined the maximum absolute Z-score value of Ũn (λ) over all the components. Recall that the Z-score of a value x in a sample with mean µ and standard deviation σ is given by Z(x)=xμσ$Z\left( x \right) = {{x - \mu } \over \sigma }$. Hence, the metric we employed to quantify the extent to which a line is affected by the systematics is determined for each line λ as follows:

z(λ)=maxn| Z(U˜n(λ)) |.$z\left( \lambda \right) = \mathop {\max }\limits_n \,\left| {Z\,\left( {{{\tilde U}_n}\,\left( \lambda \right)} \right)} \right|.$(3)

The higher the value of z, the more a line can be considered to be affected by at least one of the 13 components. In Fig. 10, we have constructed a plot that displays the lines with a z value above an arbitrary threshold of z0 = 10. This selection criterion resulted in a total of 63 lines being displayed on the plot. Additionally, we have overplotted in blue a synthetic telluric spectrum obtained using TAPAS (Bertaux et al. 2014) available on the APERO GitHub repository4 as well as the template of GJ 251 that was used to perform the LBL algorithm.

Figure 10 reveals that the lines most affected by the sys-tematics are predominantly situated within the H band (1500–1800 nm), as well as in the immediate vicinity of the telluric regions surrounding this band. However, we note that 7 of them are located outside this shared region. To try to establish a connection between affected lines and stellar lines, we computed atomic and molecular lines with VALD (Pakhomov et al. 2017) for a star with a temperature of 3500 K. The atomic mask consists of 2932 lines from 24 elements (H, Na, Mg, Al, Si, S, K, Ca, Sc, Ti, V, Cr, Mn, Fe, Co, Ni, Cu, Ga, Rb, Sr, Y, Zr, Ba, Lu). As for the molecular mask, it includes 4666 lines from 3 elements (OH, CO, CN). In Fig. 10, we employed a color-coding scheme to highlight any line from the two masks that met two criteria: first, it fell within the start and end points of an affected line, and second, its depth was above 0.7% to avoid being considered as noise due to the S/N of our observations. The depth of the line being determined as the contrast of the line’s background in comparison to the mean of the boundaries before and after it. Based on our analysis, we observed that 14 atomic lines (in blue), 8 OH lines (in red) and 9 CO lines (in green) could correspond to the affected lines, which accounts for 50% of the total affected lines. Furthermore, we observed that 27 of those lines that have a potential association with the star are located in the H band. For the remaining lines located in the telluric region, our findings suggest a possible complex interplay between telluric and stellar lines. Specifically, the overlapping of molecular or atomic lines with telluric lines may explain why both a 1-yr signal and the star’s rotational period can be distinguished in V1.

However, the origin of the 32 remaining lines with a z value above 10 is uncertain, some of them could be molecular lines as of yet unidentified but also of a completely different origin. While telluric or stellar activity could potentially be contributing factors, they may not fully explain the observed impact, suggesting the presence of other sources of spurious signals in the data. The complex interplay between different sources of systematics makes it challenging for the wPCA to distinguish them clearly, particularly since the wPCA is a linear method that cannot differentiate between nonlinear relationships between features in the data. Therefore, should the interplay between overlapping stellar lines and telluric lines be nonlinear, we may encounter such an issue. Ultimately, the purpose of this discussion is not to conclusively explain the origin of the systematics, but rather to demonstrate how the results obtained from the Wapiti method can lead to a better understanding of them. In order to identify the exact cause of the systematics, it would be necessary to conduct an investigation of the scores associated with the principal vectors across multiple targets, which is beyond the scope of this study.

We also wanted to examine if those systematics only impact a few lines and what would happen if we were to simply remove those high impacted lines from the calculation when computing the final RV time series. We tested this idea by removing lines for various threshold values of z. We computed the peri-odogram of the resulting time series and we see in Fig. 11 that even a very strict criterion of removing lines with z ≥ 3 only very weakly enhanced the 14.2 d period signal (FAP = 4.9 × 10−5) while not effectively removing the systematics from the RV data, at the cost of removing 1939 lines from the computation. These results suggest that simply removing the most impacted lines does not eliminate systematics from the RV time series. Instead, the wPCA reconstruction performed by the Wapiti method is necessary to successfully eliminate the systematics.

Further research could explore the potential of variational autoencoders (VAE; Kingma & Welling 2013) to reconstruct the per-line RV time series, which may be able to capture nonlinear relationships between the systematic errors. Previous research has explored the use of neural networks in the examination of RV time series by incorporating both RV and stellar activity indicators as input (Perger et al. 2023; Camacho et al. 2023). However, the present study concentrates on the performance of the Wapiti method and demonstrates that a wPCA, more straightforward, easier to follow, and more efficient computationally, already produces promising outcomes. Additional investigation would be necessary to assess the advantages of applying a VAE to per-line RV time series.

thumbnail Fig. 9

Variability of RV data and principal vectors with respect to BERV. The top panel displays the original data’s variability with respect to the BERV and how our method removed this dependency in the Wapiti corrected RV. The bottom panel demonstrates the behavior of the first three principal vectors with the BERV.

5.3 Conclusion and future work

In this paper, we have presented Wapiti, anew data-driven algorithm for correcting systematic errors in RV data. The algorithm used a dataset of per-line RV time series generated by the LBL algorithm and employed a weighted principal component analysis to subtract the wPCA reconstruction from the original data. This process created a new RV time series free of systematic errors. To determine the appropriate principal vectors, the algorithm incorporated a permutation test followed by leave-p-out cross-validation algorithm. Once the relevant number of components have been identified an outlier rejection step followed and the wPCA was applied to the remaining data points. The wPCA reconstruction was consequently removed from original RV time series, allowing for the derivation of a new Wapiti corrected RV time series.

The application of Wapiti to GJ 251 allowedto successfully free the RV data from spurious signals at 6 months and 1 yr while allowing the detection of a known temperate super-Earth with a period of 14.2 days. An examination of the sources of corrected systematics, as revealed by the z metric, suggested that the method was able to address various sources ofspurious signals in the data, including but not limited to stellar activity, telluric lines, and other unidentified sources. This highlights the potential for SPIRou to detect exoplanets, even in the presence of known or unknown sources of systematics. Furthermore, the algorithm has the potential to enhance the precision and accuracy of RV data from future exoplanet detection surveys, thereby advancing our understanding of the diversity and frequency of exoplanetary systems. Finally, this work provided further evidence to support the idea that the first principal vector W1 derived from the wPCA of the dLW time series can be employed as a reliable activity indicator to detect the stellar rotation period in our data.

While the Wapiti algorithm has shown encouraging results, there are still some remaining questions and challenges that need to be addressed. Future research should aim to evaluate the algorithm on a broader range of RV data and exoplanetary systems and better understand the sensitivity of planetary parameters to the method. A specific direction for future work will be to apply the Wapiti algorithm to targets of the SLS-Planet Search program that exhibit systematic signals of unknown origin at similar periods.

thumbnail Fig. 10

Scores of the most affected lines using all components. In the top panel, the lines that are significantly affected by the component are displayed in black. If these lines could be attributed to a OH line and a CO line, they are color-coded in red and green respectively, while if they correspond to an atomic line, they are colored in blue. The telluric absorption computed with TAPAS is also displayed, as well as the template of GJ 251 used by the LBL algorithm. The two lower panels provide zoomed-in views from 1400 to 1800 nm.

thumbnail Fig. 11

Impact of removing affected lines on the RVs. Top panel shows the z metric of all lines. Bottom panel displays the periodogram of the resulting time series in green, red and blue when respectively rejecting lines with a z value above 10, 5 and 3.

Acknowledgements

This project received funding from the European Research Council under the H2020 research & innovation program (grant #740651 New-Worlds). E.M. acknowledges funding from FAPEMIG under project number APQ-02493-22 and research productivity grant number 309829/2022-4 awarded by the CNPq, Brazil. J.H.C.M. is supported in the form of a work contract funded by Fundação para a Ciência e Tecnologia (FCT) with the reference DL 57/2016/CP1364/CT0007; and also supported from FCT through national funds and by FEDER-Fundo Europeu de Desenvolvimento Regional through COMPETE2020-Programa Operacional Competitividade e Internacionalização for these grants UIDB/04434/2020 & UIDP/04434/2020, PTDC/FIS-AST/32113/2017 & POCI-01-0145-FEDER-032113, PTDC/FIS-AST/28953/2017 & POCI-01-0145-FEDER-028953, PTDC/FIS-AST/29942/2017. This work has been carried out within the framework of the NCCR PlanetS supported by the Swiss National Science Foundation under grants 51NF40_182901 and 51NF40_205606. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement SCORE No 851555). We acknowledge funding from Agence Nationale pour la Recherche (ANR, project ANR-18-CE31-0019 SPlaSH). X.D. and A.C. acknowledge funding by the French National Research Agency in the framework of the Investissements d’Avenir program (ANR-15-IDEX-02), through the funding of the “Origin of Life” project of the Université Grenoble Alpes. B.K. ackowledges funding from the European Research Council (ERC) under the H2020 research & innovation programme (grant agreements #865624 GPRV). This research made use of the following software tools: NumPy; a fundamental package for scientific computing with Python. pandas; a library providing easy-to-use data structures and data analysis tools. Astropy; a community-developed core Python package for Astronomy. SciPy; a library for scientific computing and technical computing. RadVel; a python package for modeling radial velocity data. matplotlib; a plotting library for the Python programming language. tqdm; a library for creating progress bars in the command line. seaborn; a data visualization library based on matplotlib. wpca; a python package for weighted principal component analysis. emcee ; a python package for MCMC sampling. PyAstronomy; a collection of astronomical tools for Python. celerite; a fast Gaussian process library in Python.

Appendix A Supplementary information regarding the outliers rejection

The process of identifying outliers for rejection involved using the D metric which quantifies the degree of anomaly of an epoch. They were subsequently ranked in decreasing order of D and were removed one by one, and we stop when D ≤ 10. The Wapiti method was applied to the resulting time series and the one with the maximum statistical significance of the highest peak in the resulting periodogram was chosen, which in our case resulted in the rejection of 7 outliers. This significance was determined using the log false alarm probability (FAP), but we also verified that the same conclusion was reached using the log Bayes factor.

Such a criterion to remove outliers is justified for our data set from the observation that the Wapiti method yields a prominent peak at the same period independently of the number of rejected outliers as shown in Fig. A.1. From the definition of D it comes that those rejected epochs are outliers in at least one principal component, yet they do not visually appear in the RV time series and one may be interested in what could be their origin.

This is the reason why we tried to visualize their physical origin in Fig. A.2 using the most-straightforward parameters available from the APERO reduction, namely the BERV, the airmass (AIRMASS), the SPIRou Exposure Meter S/N Estimate (SPEMSNR), and the telluric water exponent (TLPEH2O) and other telluric molecules (TLPEOTR) calculated with TAPAS during the precleaning of the telluric correction. Regrettably, our investigation did not provide conclusive evidence to determine the source of the rejected outliers. These outliers exhibit a wide range of BERV values and are present at low airmasses (≤ 1 2). Furthermore, some of the rejected outliers have a SPEMSNR that is not necessarily lower than the average, even though we do notice that the first two outliers are the data points with the lowest SPEMSNR. Ultimately, it is likely that multiple factors contribute to the anomalous behavior of these observations, and a single parameter is insufficient to explain this anomaly. We advise that when applying this outlier removal method to other targets, it should be done with caution and the results should be carefully verified, as shown in Fig. A.1.

thumbnail Fig. A.1

Effect of removing outliers on the signal detection. The top panel displays the evolution of the log FAP of the signal as the number of outliers removed increases. The red dots overlaid on the plot represent the evolution of the log BF, which follows a similar pattern and reaches its maximum value at the same number of rejected outliers. The bottom panel of the figure shows the detected period in the resulting time series, which remains consistent within error-bars with the reported planetary period.

thumbnail Fig. A.2

Analysis of anomaly degree and rejected observations. All panels display the degree of anomaly D of the observations with respect to various parameters. The data points that appear in blue represent the rejected observations that were excluded to maximize the significance of the planetary signal in our data.

Appendix B Impact of the components on the signal

After determining that 13 components were required to accurately reconstruct our data using wPCA, one may wonder how the number of components affects the signals reported in our data. To address this question, we conducted a straightforward test by applying the method to time series with an increasing number of components ranging from 2 to 20. For each time series, we computed its periodogram and the FAP of its maximum peak period. Additionally, we performed a computation of the log BF to compare a Keplerian model with a 14.2-day orbital period and an offset with a model solely consisting of an offset. The outcomes are exhibited in Figure B.1. As our rejection of outliers is expected to favor the use of 13 components, we conducted this examination on both RV time series, both before and after outliers rejection.

Contrary to expectations, the use of 13 components did not yield the highest level of statistical significance for the 14.2-day signal, but instead required 12 components without outlier rejections and 10 components with such rejections. It is noteworthy that the planetary signal can be directly extracted using only 2 components in the time series data that underwent Wapiti correction and outlier rejection, compared to 3 components required without such rejection. Notably, in the case of no outliers rejection, the number of components had minimal impact on the results after reaching 5 components, with the signal remaining above a log BF of 5 and a FAP value below 10−5.

To sum up, the results indicate that the number of components employed does not significantly impact the outcomes beyond a certain threshold. However, given that the objective is to use the Wapiti method on unidentified planetary systems where the orbital period is not a priori known, an unbiased and robust approach is necessary to determine the appropriate number of components. The identification of 13 as the optimal number of components through our independent preprocessing methods provides robust evidence of the effectiveness of our methodology.

thumbnail Fig. B.1

Impact of the choice of number of components on the signal detection. The top panel displays the peak associated with the maximum period in the periodogram when the wPCA reconstruction is eliminated from the data using varying numbers of components. In the middle panel, the logarithm of the FAP linked to this signal is presented, while the bottom panel depicts the log BF evaluating the significance of a 14.2 d Keplerian signal compared to an offset. The black and red data points correspond, respectively, to the outcomes obtained from the pre-and post-outlier rejection RV time series.

Appendix C MCMC results from the RV analysis

Orbital parameters computed using an MCMC over the Wapiti corrected RV data from SPIRou with 100 000 samples, 10 000 burn-in samples, and 100 walkers.

thumbnail Fig. C.1

Corner plot displaying the results of MCMC analysis on the Wapiti corrected RV time series is presented. This analysis involves fitting a model that consists of an offset γ0, an RV jitter s (in log), and a Keplerian with parameters including the time of periastron Tb, the orbital period Pb, the semi-amplitude Kb, and the eccentricity and argument of periastron represented by ecosω(hb)$\sqrt e \cos \omega \left( {{h_b}} \right)$ and esinω(kb)$\sqrt e \sin \omega \left( {{k_b}} \right)$, respectively.

Appendix D MCMC results from the W1 analysis

We fit a quasi-periodic GP in order to estimate the value of the periodicity in W1 fitting a quasi-periodic kernel of the form

k(τ)=Aexp(τ22l2Γ sin2(πPτ))$k\left( \tau \right) = A\,\exp \,\left( { - {{{\tau ^2}} \over {2{l^2}}} - {\rm{\Gamma }}\,{\rm{si}}{{\rm{n}}^2}\,\left( {{\pi \over P}\tau } \right)} \right)$(D.1)

as well as a jitter term s and a constant γ. We computed an MCMC using 10,000 steps, 1,000 burn-in samples, and 100 walkers. The result is displayed in Table D.1 and the corner plot is shown in Figure D.1.

Table D.1

Prior and posterior distributions of the quasi-periodic GP model of the SPIRou W1 time series.

thumbnail Fig. D.1

Corner plot displaying the results of MCMC analysis on the W1 time series. This analysis involves fitting a quasi-periodic GP over the data and an offset.

References

  1. Ambikasaran, S., Foreman-Mackey, D., Greengard, L., Hogg, D. W., & O’Neil, M. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 252 [Google Scholar]
  2. Anglada-Escudé, G., Amado, P. J., Barnes, J., et al. 2016, Nature, 536, 437 [Google Scholar]
  3. Artigau, É., Astudillo-Defru, N., Delfosse, X., et al. 2014, SPIE Conf. Ser., 9149, 914905 [NASA ADS] [Google Scholar]
  4. Artigau, É., Saint-Antoine, J., Lévesque, P.-L., et al. 2018, SPIE Conf. Ser., 10709, 107091P [NASA ADS] [Google Scholar]
  5. Artigau, É., Cadieux, C., Cook, N. J., et al. 2022, AJ, 164, 84 [NASA ADS] [CrossRef] [Google Scholar]
  6. Astropy Collaboration 2018, https://zenodo.org/record/4080996 [Google Scholar]
  7. Baluev, R. V. 2008, MNRAS, 385, 1279 [Google Scholar]
  8. Becerril, S., Mirabet, E., Lizon, J. L., et al. 2017, in Materials Science and Engineering Conference Series, 278, 012191 [NASA ADS] [CrossRef] [Google Scholar]
  9. Bertaux, J.-L., Lallement, R., Ferron, S., & Boonne, C. 2014, in 13th International HITRAN Conference, 8 [Google Scholar]
  10. Bonfils, X., Mayor, M., Delfosse, X., et al. 2007, A&A, 474, 293 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  11. Bonfils, X., Delfosse, X., Udry, S., et al. 2013, A&A, 549, A109 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  12. Bouchy, F., & Doyon, R. 2018, in European Planetary Science Congress, EPSC2018-1147 [Google Scholar]
  13. Bouchy, F., Pepe, F., & Queloz, D. 2001, A&A, 374, 733 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  14. Butler, R. P., Vogt, S. S., Laughlin, G., et al. 2017, AJ, 153, 208 [Google Scholar]
  15. Cadieux, C., Doyon, R., Plotnykov, M., et al. 2022, AJ, 164, 96 [NASA ADS] [CrossRef] [Google Scholar]
  16. Camacho, J. D., Faria, J. P., & Viana, P. T. P. 2023, MNRAS, 519, 5439 [NASA ADS] [CrossRef] [Google Scholar]
  17. Challita, Z., Reshetov, V., Baratchart, S., et al. 2018, SPIE Conf. Ser., 10702, 1070262 [Google Scholar]
  18. Collier Cameron, A., Ford, E. B., Shahaf, S., et al. 2021, MNRAS, 505, 1699 [NASA ADS] [CrossRef] [Google Scholar]
  19. Cook, N. J., Artigau, E., Doyon, R., et al. 2022, Astrophysics Source Code Library [record ascl:2211.019] [Google Scholar]
  20. Courcol, B., Bouchy, F., Pepe, F., et al. 2015, A&A, 581, A38 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Cretignier, M., Dumusque, X., Allart, R., Pepe, F., & Lovis, C. 2020, A&A, 633, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  22. Cretignier, M., Dumusque, X., Hara, N. C., & Pepe, F. 2021, A&A, 653, A43 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  23. Cretignier, M., Dumusque, X., & Pepe, F. 2022, A&A, 659, A68 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  24. Cristofari, P. I., Donati, J. F., Masseron, T., et al. 2022, MNRAS, 516, 3802 [NASA ADS] [CrossRef] [Google Scholar]
  25. da Silva, J. G., Santos, N. C., Bonfils, X., et al. 2012, A&A, 541, A9 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  26. Delchambre, L. 2015, MNRAS, 446, 3545 [NASA ADS] [CrossRef] [Google Scholar]
  27. Donati, J. F., Kouach, D., Moutou, C., et al. 2020, MNRAS, 498, 5684 [Google Scholar]
  28. Donati, J. F., Cristofari, P. I., Finociety, B., et al. 2023, MNRAS, accepted, https://doi.org/10.1093/mnras/stad1193 [Google Scholar]
  29. Dressing, C. D., & Charbonneau, D. 2015, ApJ, 807, 45 [Google Scholar]
  30. Dumusque, X. 2018, A&A, 620, A47 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  31. Dumusque, X., Pepe, F., Lovis, C., & Latham, D. W. 2015, ApJ, 808, 171 [NASA ADS] [CrossRef] [Google Scholar]
  32. Faria, J. P., Mascareño, A. S., Figueira, P., et al. 2022, A&A, 658, A115 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  33. Foreman-Mackey, D., Conley, A., Meierjurgen Farr, W., et al. 2013, Astrophysics Source Code Library [record ascl:1303.002] [Google Scholar]
  34. Foreman-Mackey, D., Agol, E., Angus, R., & Ambikasaran, S. 2017, Astrophysics Source Code Library [record ascl:1709.008] [Google Scholar]
  35. Fouqué, P., Martioli, E., Donati, J. F., et al. 2023, A&A, 672, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  36. Fulton, B. J., Petigura, E. A., Blunt, S., & Sinukoff, E. 2018, PASP, 130, 044504 [Google Scholar]
  37. Gaia Collaboration (Brown, A. G. A., et al.) 2020, VizieR Online Data Catalog: I/350 [Google Scholar]
  38. Gaidos, E., Mann, A. W., Kraus, A. L., & Ireland, M. 2016, MNRAS, 457, 2877 [Google Scholar]
  39. Gibson, N. P., Merritt, S., Nugroho, S. K., et al. 2020, MNRAS, 493, 2215 [Google Scholar]
  40. Hébrard, É. M., Donati, J. F., Delfosse, X., et al. 2016, MNRAS, 461, 1465 [CrossRef] [Google Scholar]
  41. Henry, T. J., Jao, W.-C., Subasavage, J. P., et al. 2006, AJ, 132, 2360 [Google Scholar]
  42. Hobson, M. J., Bouchy, F., Cook, N. J., et al. 2021, A&A, 648, A48 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  43. Hsu, D. C., Ford, E. B., & Terrien, R. 2020, MNRAS, 498, 2249 [Google Scholar]
  44. Kanodia, S., & Wright, J. 2018, RNAAS, 2, 4 [NASA ADS] [Google Scholar]
  45. Kingma, D. P., & Welling, M. 2013, arXiv e-prints [arXiv:1312.6114] [Google Scholar]
  46. Kirkpatrick, J. D., Henry, T. J., & McCarthy, D. W., Jr. 1991, ApJS, 77, 417 [Google Scholar]
  47. Kotani, T., Tamura, M., Suto, H., et al. 2014, SPIE Conf. Ser., 9147, 914714 [NASA ADS] [Google Scholar]
  48. Mahadevan, S., Ramsey, L., Bender, C., et al. 2012, SPIE Conf. Ser., 8446, 84461S [NASA ADS] [Google Scholar]
  49. Martioli, E., Hébrard, G., Fouqué, P., et al. 2022, A&A, 660, A86 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  50. Mascareño, A. S., González-Álvarez, E., Osorio, M. R. Z., et al. 2023, A&A, 670, A5 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  51. Micheau, Y., Kouach, D., Donati, J.-F., et al. 2018, SPIE Conf. Ser., 10702, 107025R [NASA ADS] [Google Scholar]
  52. Newton, E. R., Irwin, J., Charbonneau, D., Berta-Thompson, Z. K., & Dittmann, J. A. 2016, ApJ, 821, L19 [NASA ADS] [CrossRef] [Google Scholar]
  53. Pakhomov, Y., Piskunov, N., & Ryabchikova, T. 2017, ASP Conf. Ser., 510, 518 [NASA ADS] [Google Scholar]
  54. Parès, L., Donati, J. F., Dupieux, M., et al. 2012, SPIE Conf. Ser., 8446, 84462E [Google Scholar]
  55. Perger, M., Anglada-Escudé, G., Baroch, D., et al. 2023, A&A, 672, A118 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  56. Quirrenbach, A., Amado, P. J., Ribas, I., et al. 2018, Proc. SPIE, 10702, 32 [Google Scholar]
  57. Reiners, A., & Zechmeister, M. 2020, ApJS, 247, 11 [CrossRef] [Google Scholar]
  58. Sabotta, S., Schlecker, M., Chaturvedi, P., et al. 2021, A&A, 653, A114 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  59. Schwarz, G. 1978, Ann. Stat., 6, 461 [Google Scholar]
  60. Stock, S., Nagel, E., Kemmer, J., et al. 2020, A&A, 643, A112 [EDP Sciences] [Google Scholar]
  61. Tal-Or, L., Trifonov, T., Zucker, S., Mazeh, T., & Zechmeister, M. 2019, MNRAS, 484, L8 [Google Scholar]
  62. Tamuz, O., Mazeh, T., & Zucker, S. 2005, MNRAS, 356, 1466 [Google Scholar]
  63. Trifonov, T., Tal-Or, L., Zechmeister, M., et al. 2020, A&A, 636, A74 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  64. Vieira, V. 2012, Comput. Ecol. Softw., 2, 103 [Google Scholar]
  65. Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577 [CrossRef] [EDP Sciences] [Google Scholar]
  66. Zechmeister, M., Reiners, A., Amado, P. J., et al. 2018, A&A, 609, A12 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

All Tables

Table 1

Retrieved stellar parameters of GJ 251 (Cristofari et al. 2022).

Table 2

Orbital and derived planetary parameters from the analysis of GJ 251.

Table D.1

Prior and posterior distributions of the quasi-periodic GP model of the SPIRou W1 time series.

All Figures

thumbnail Fig. 1

Comparison of LBL RVs data for GJ 251 with and without zero point correction. The top panel shows the data with and without zero point correction, indicated by black and red points respectively. The zero point offset is shown in blue. The bottom panel shows the corresponding periodograms for the time series in the same color scheme, and the blue line represents the window function. The significance levels of FAP = 10−3 and FAP = 10−5 are indicated by the solid and dashed horizontal lines, respectively. The 1-yr period, the 180 days period, and the synodic (29.5 days) period of the Moon are represented by vertical black dashed lines, and the period of the claimed planet in Stock et al. (2020) is in red.

In the text
thumbnail Fig. 2

p-values of each component as determined by the permutation test. Only the components with a p-value below a certain threshold (≤10−5) are considered to have a significant contribution, rather than being just random noise.

In the text
thumbnail Fig. 3

Mean Pearson coefficients as a function of the components. Those with a value below 95% are discarded, leaving the number of components to 13.

In the text
thumbnail Fig. 4

Radial velocity time series and principal vectors. Left panels show the RV time series and and the first two principal vectors. Their respective periodograms are on the right. The significance levels of FAP = 10−3 and FAP = 10−5 are indicated by the solid and dashed horizontal lines, respectively.

In the text
thumbnail Fig. 5

Effectiveness of the Wapiti correction method on the RV time series. The top panel of the figure shows the original data in black and the time series from the Wapiti correction in blue. The bottom panel shows the corresponding periodograms using the same color code, it can be seen that the correction has successfully eliminated the 180 and 365 days systematic peaks. The reported 14.2 days period planet is now detected at high significance in the blue periodogram with a FAP level of 4.01 × 10−9.

In the text
thumbnail Fig. 6

Results of the RV analysis using the Wapiti corrected time series. The top panel illustrates the phase-folding of the planet with the best-fit 14.2-day period, represented by a solid black curve, and the black circles depict the binned RV measurements. The middle panel displays the residuals after the model has been fit, and the bottom panel presents the resulting periodogram, which indicates the absence of any remaining significant signals within the data.

In the text
thumbnail Fig. 7

Comparison of retrieved and injected semi-amplitudes. Top panel displays the retrieved semi-amplitude Kcomputed versus the injected semi-amplitude Kinjected. The bottom panel illustrates the residuals, demonstrating the successful recovery of the injected signal using the Wapiti method, with the inclusion of the wPCA reconstruction in the fit model.

In the text
thumbnail Fig. 8

Analysis of dLW and W1 time series. The left panels display, at the top, the dLW time series and its corresponding periodogram at the bottom, which shows no indication of periodicity in the data. On the right, the first principal component of the per-line dLW time series obtained from the wPCA is displayed, which exhibits a pronounced 124-day period peak.

In the text
thumbnail Fig. 9

Variability of RV data and principal vectors with respect to BERV. The top panel displays the original data’s variability with respect to the BERV and how our method removed this dependency in the Wapiti corrected RV. The bottom panel demonstrates the behavior of the first three principal vectors with the BERV.

In the text
thumbnail Fig. 10

Scores of the most affected lines using all components. In the top panel, the lines that are significantly affected by the component are displayed in black. If these lines could be attributed to a OH line and a CO line, they are color-coded in red and green respectively, while if they correspond to an atomic line, they are colored in blue. The telluric absorption computed with TAPAS is also displayed, as well as the template of GJ 251 used by the LBL algorithm. The two lower panels provide zoomed-in views from 1400 to 1800 nm.

In the text
thumbnail Fig. 11

Impact of removing affected lines on the RVs. Top panel shows the z metric of all lines. Bottom panel displays the periodogram of the resulting time series in green, red and blue when respectively rejecting lines with a z value above 10, 5 and 3.

In the text
thumbnail Fig. A.1

Effect of removing outliers on the signal detection. The top panel displays the evolution of the log FAP of the signal as the number of outliers removed increases. The red dots overlaid on the plot represent the evolution of the log BF, which follows a similar pattern and reaches its maximum value at the same number of rejected outliers. The bottom panel of the figure shows the detected period in the resulting time series, which remains consistent within error-bars with the reported planetary period.

In the text
thumbnail Fig. A.2

Analysis of anomaly degree and rejected observations. All panels display the degree of anomaly D of the observations with respect to various parameters. The data points that appear in blue represent the rejected observations that were excluded to maximize the significance of the planetary signal in our data.

In the text
thumbnail Fig. B.1

Impact of the choice of number of components on the signal detection. The top panel displays the peak associated with the maximum period in the periodogram when the wPCA reconstruction is eliminated from the data using varying numbers of components. In the middle panel, the logarithm of the FAP linked to this signal is presented, while the bottom panel depicts the log BF evaluating the significance of a 14.2 d Keplerian signal compared to an offset. The black and red data points correspond, respectively, to the outcomes obtained from the pre-and post-outlier rejection RV time series.

In the text
thumbnail Fig. C.1

Corner plot displaying the results of MCMC analysis on the Wapiti corrected RV time series is presented. This analysis involves fitting a model that consists of an offset γ0, an RV jitter s (in log), and a Keplerian with parameters including the time of periastron Tb, the orbital period Pb, the semi-amplitude Kb, and the eccentricity and argument of periastron represented by ecosω(hb)$\sqrt e \cos \omega \left( {{h_b}} \right)$ and esinω(kb)$\sqrt e \sin \omega \left( {{k_b}} \right)$, respectively.

In the text
thumbnail Fig. D.1

Corner plot displaying the results of MCMC analysis on the W1 time series. This analysis involves fitting a quasi-periodic GP over the data and an offset.

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.