Issue 
A&A
Volume 605, September 2017



Article Number  A103  
Number of page(s)  11  
Section  Planets and planetary systems  
DOI  https://doi.org/10.1051/00046361/201730406  
Published online  18 September 2017 
Evidence for at least three planet candidates orbiting HD 20794^{⋆}
Centre for Astrophysics Research, School of Physics, Astronomy and Mathematics, University of Hertfordshire, College Lane, Hatfield, AL10 9AB, UK
email: f.feng@herts.ac.uk, fengfabo@gmail.com
Received: 6 January 2017
Accepted: 14 May 2017
Aims. We explore the feasibility of detecting Earth analogs around Sunlike stars using the radial velocity method by investigating one of the largest radial velocities data sets for the one of the most stable radialvelocity stars HD 20794.
Methods. We proceed by disentangling the Keplerian signals from correlated noise and activityinduced variability. We diagnose the noise using the differences between radial velocities measured at different wavelength ranges, socalled “differential radial velocities”.
Results. We apply this method to the radial velocities measured by HARPS, and identify four signals at 18, 89, 147 and 330 d. The two signals at periods of 18 and 89 d are previously reported and are better quantified in this work. The signal at a period of about 147 d is reported for the first time, and corresponds to a superEarth with a minimum mass of 4.59 Earth mass located 0.51 AU from HD 20794. We also find a significant signal at a period of about 330 d corresponding to a superEarth or Neptune in the habitable zone. Since this signal is close to the annual sampling period and significant periodogram power in some noise proxies are found close to this signal, further observations and analyses are required to confirm it. The analyses of the eccentricity and consistency of signals provide weak evidence for the existence of the previously reported 43 d signal and a new signal at a period of about 11.9 d with a semiamplitude of 0.4 m/s.
Conclusions. We find that the detection of a number of signals with radial velocity variations around 0.5 m/s that are likely caused by low mass planet candidates demonstrates the important role of noise modeling in searching for Earth analogs.
Key words: methods: statistical / methods: numerical / techniques: radial velocities / stars: individual: HD 20794
Radial velocity tables are available at the CDS via anonymous ftp to cdsarc.ustrasbg.fr (130.79.128.5) or via http://cdsarc.ustrasbg.fr/vizbin/qcat?J/A+A/605/A103
© ESO, 2017
1. Introduction
High precision spectrometers enable us to find superEarths by measuring the Doppler shift of the stellar spectra caused by the periodic perturbations of exoplanets orbiting the target stars. Unlike the transit method, which relies on a particular orbital orientation, the radial velocity (RV) method can provide useful information about planets around all stars. The recent detection of the nearest habitablezone planet candidate Proxima Centauri b demonstrates the important role of the RV technique in detecting Earth analogs (AngladaEscudé et al. 2016).
However, the stellar spectrum is contaminated by various noise sources, such as stellar activity, instrumental noise, and nonperfect data reduction. An analysis of highcadence spectroscopy of M dwarfs shows that the noise floor (about 1 m/s) of current Doppler surveys has to a large extent an instrumental origin (Berdiñas et al. 2016). Our analysis is also concerned with demonstration of the strong dependence of activityinduced noise on wavelength. The minimization of both instrumental and stellar noise requires a noise model accounting for noise correlated in time and wavelength.
Considering that complex models typically lead to false negatives while simple models lead to false positives (Feng et al. 2016), we follow the Goldilocks principle introduced in Feng et al. (2016) to select the best noise model for a given RV data set. In other words, we devise a noise model framework to diagnose the noise in a given data set. In this work, we apply this noise model framework to the HARPS RV measurements of HD 20794.
HD 20794 is a highvelocity and metaldeficient G8 star with a mass of M_{⊙} (Ramírez et al. 2013). It has been reported to host at least three planets and a dust disk (Pepe et al. 2011; Kennedy et al. 2015). The postulated planets are likely superEarths with orbital periods of 18.1, 40.1, and 90.3 d Pepe et al. (2011; hereafter P11). These results of P11 were obtained via periodogram analysis, which assumes white RV noise and circular planetary orbits. Although HD 20794 is known to have planets, it has long been chosen as a special RV reference target due to its notable stability over long time baselines found by the southern radial velocity programs of the AngloAustralian Planet Search (e.g., Fig. 2 of Butler et al. 2001) and its continuous presence on the HARPS guaranteed time observation list^{1}.
This paper is structured as follows. We introduce the RV data in Sect. 2. In Sect. 3, we introduce differential RVs to model wavelengthdependent noise and choose the socalled Goldilocks noise model in the Bayesian framework. In Sect. 4, we apply our models to the data and report the results. Finally, we discuss and conclude in Sect. 5.
2. HARPS Doppler measurements for HD 20794
In the European Southern Observatory archive, there are 5150 publicly available RVs measured by HARPS from September 2003 to September 2013 for HD 20794 from programs Mayor, 60.A9036, 072.C0488, 083.C1001, 084.C0229, 086.C0230, 087.C0990, 088.C0011, 089.C0050, 090.C0849, and 091.C0936. We processed the Doppler measurements with the TERRA algorithm (AngladaEscudé & Butler 2012), which generates RVs for 72 echelle orders. The stellar activity is recorded by the following indices: the Sindex measured from the Ca II H&K emission, line bisector shapes (BIS), and the width of the spectral lines (FWHM). We excluded the points with RVs or activity indices, which deviate from the mean by more than 5σ. We do this iteratively to omit any visible outliers. Moreover, we also removed the data obtained before JD 2 453 500 due to the large dispersion in the activity indices, FWHM in particular. Such a feature is also observed in the τ Ceti data (Feng et al. 2017b), although the cause for this may be different. Finally, we obtain a data set of 4882 RVs together with activity indices. These data and the discarded epochs are shown in Fig. 1. We see a great variation of FWHM before JD 2 453 500, and this feature is also found in the data measured for τ Ceti (Feng et al. 2017b)^{2}.
Fig. 1 Measurements of RV, Sindex, BIS and FWHM by HARPS for HD 20794. The dashed line denotes JD 2 453 500 before which the data is probably affected by instrumental effects. 

Open with DEXTER 
We used the RV differences between the 72 echelle orders to remove wavelengthdependent noise. We divided the 72 spectral orders into groups and averaged the RVs in each group to generate the socalled “aperture data sets”. For n groups of spectral orders, we generated nAPj aperture data sets, where j is a natural number not larger than n. We defined the RV differences between aperture data sets as differential RVs. We denoted them by nAPj − i, where n is the number of groups of spectral orders, and j and i denote different aperture data sets. For n independent aperture data sets, there are n − 1 independent differential RVs in total. The schematic of this data reduction process is shown in Fig. 2.
Fig. 2 Schematic diagram of data reduction and modeling process. 

Open with DEXTER 
3. Noise model framework and Bayesian inference
3.1. Wavelengthdependent noise models
The RV variation induced by stellar activity and rotation is typically wavelengthdependent (Tuomi et al. 2013a). Thus the average of RVs measured at different wavelength ranges (or spectral orders) is biased because previous studies have ignored the influence of such a dependence. To remove such bias, we introduced “differential RVs” to weight the spectral orders a posteriori. Differential RVs are RV differences between different spectral orders (or wavelength ranges) and thus do not contain Keplerian variation since Keplerian variation is color independent. Following the Goldilocks principle of noise modeling introduced by Feng et al. (2016), we reduced the number of different RVs by deriving them from averaged spectral orders, socalled aperture data sets (see Fig. 2).
The differential RVs are then (1)where v(t_{i},λ_{j}) is the RV measured at time t_{i} within a wavelength range centered at λ_{j}. We define the noise component of the RV model as (2)where N_{λ} is the number of wavelength ranges or aperture divisions and thus N_{D} ≡ N_{λ} − 1 is the number of independent differential RVs, d_{m} characterizes the linear dependence of RV noise on the mth differential RV, a is the linear acceleration caused by stellar companions or activity cycles and b is the reference velocity. The linear dependence of RVs on activity index I_{k} is parameterized by constant c_{k}. We consider that a and b can be wavelength dependent, because besides fitting for trends due to companions, those parameters can also absorb any drift due to longterm activity not correlated with activity indices or longterm instrumental noise variation. We linearly combine with n Keplerian components to define the basic RV model, which is (3)where f_{k}(t_{i}) is the kth Keplerian component, and is the noise component. In the Keplerian function, K_{k}, ω_{k}, ν_{k}, and e_{k} are the amplitude, longitude of periastron, true anomaly, and eccentricity of the kth Keplerian signal.
Since the moving average (MA) model can efficient account for red noise in RVs and avoid false negatives (Feng et al. 2016), we model the red noise using a general MA model with exponential smoothing which is (4)where w_{k} and τ are the amplitude and timescale of the moving average, and ϵ_{i − k} is . Hereafter, we define “nP+MA(q)+mD” as the nplanet model combined with qthorder moving average and m differential RVs deriving from m + 1 aperture data sets (see Fig. 2). The white noise model is , denoted by MA(0).
We also account for a white excess noise with an amplitude of s_{J} in the likelihood which is (5)where σ_{i,j} is the measurement noise at time t_{i} in the jth aperture data set with wavelength range centered at λ_{j}, s_{J}(λ_{j}) is the jitter level, and v_{i,j} is the RV measured at time t_{i} in the jth aperture data set. Following Feng et al. (2016), we adopt a Gaussian prior distribution centered at zero and with a standard deviation of 0.1 for eccentricity, logarithmic uniform distributions for period P and correlation timescale τ, and uniform distributions for other parameters. The specific prior distributions are given in Table 1.
Prior distributions of model parameters.
In almost all previous research, only the averaged data or 1AP1 is analyzed without accounting for wavelengthdependent noise. We justify the usage of differential RVs in the removal of wavelengthdependent noise by applying the model defined in Eqs. (4) and (5) to the 3AP1, 3AP2, 3AP3 and 1AP1 data sets. We model these data sets using the noise models of 0P+MA(1)+0D, 0P+MA(1)+2D, 0P+MA(0)+0D, and 0P+MA(0)+2D. We calculate the generalized periodogram with floating trend (GLST; Feng et al. 2017a) for each data set and each model to visualize the consistency of signals in wavelength. Following Cumming et al. (1999), the GLST is normalized with respect to the residual variance and the FAP is calculated accordingly.
The GLSTs for the 3AP1, 3AP2, 3AP3, and 1AP1 data sets and their residuals after subtracting the bestfitting noise model are shown in Fig. 3. The interpretation of specific peaks in plotted GLSTs is not important. Rather, our concern is the consistency between GLSTs of RVs and their residuals measured at different wavelengths. The FAPs are calculated assuming the null hypothesis of white noise and thus are not used to measure the significance of signals in the data contaminated by correlated noise. In the figure, we observe significant differences between GLSTs for 3AP1, 3AP2, 3AP3, and 1AP1, indicating significant wavelengthdependent noise in the RV data. On the contrary, the residuals of all RV data sets have similar GLSTs after subtracting the bestfitting 0P+MA(0)+2D or 0P+MA(1)+2D from the data. Such consistency is not sensitive to the change of timecorrelated noise component (i.e., MA(0) and MA(1)), as seen in the rightmost two columns. Therefore differential RVs are essential in removing wavelengthdependent noise.
Fig. 3 Illustration of the existence of wavelengthdependent noise and the necessity of removing it using differential RVs. We show a series of GLSTs across the page. Each row of plots is given to a different aperture data set labeled 3AP1, 3AP2, 3AP3, and 1AP1. The following columns show GLSTs for the corresponding residuals after subtracting the bestfitting models of 0P+MA(0)+0D (white noise), 0P+MA(1)+0D (moving average), and 0P+MA(0)+2D (white noise with differential RVs), 0P+MA(1)+2D (moving average with differential RVs). The logarithm BF of a model with respect to the MA(0) model are shown for the residuals of each data set after subtracting the best model prediction. In each panel, the red dotted line denotes the period with the highest power. The false alarm probability (FAP) threshold of 0.1, 0.01 and 0.001 are shown by dashed lines. 

Open with DEXTER 
In the following sections, we focus our analysis on the 1AP1 data set because it has a higher signaltonoise ratio with respect to the other aperture data sets. As seen in Fig. 3, the wavelengthdependent noise in 1AP1 is properly removed by differential RVs a posteriori.
Logarithmic BICestimated BFs for noise models with respect to the model without any MA components or differential RVs.
Logarithmic BICestimated BFs for models with various numbers of planets for Keplerian and circular solutions.
3.2. Goldilocks noise model
We applied the adaptive Metropolis Markov chain Monte Carlo (AM) introduced by Haario et al. (2001) to sample the posterior. Specifically, we used tempered (or hot) AM chains to identify potential signals, and then used untempered (or cold) chains to constrain signals. Similar methods have been introduced by Gregory (2011). The hot chain is defined as a chain moving according to the modified posterior with 0 <β ≤ 1, where and P(θ  M) are the likelihood and prior of parameters θ for model M and data . We used four parallel chains for inference if the posterior samples drawn by these chains converged to a stationary distribution. In other words, the ratio of variances between and within these chains should be less than 1.1, which is the socalled “GelmanRubin criteria” (Gelman & Rubin 1992). We typically use the AM algorithm to draw a few million posterior samples for model and/or parameter inference. The detailed results are available online at http://starwww.herts.ac.uk/~ffeng/HD20794_supplementary/results
From the likelihoods of the posterior samples drawn by cold chains, we estimated the Bayes factor (BF) using the Bayesian information criterion (BIC). Following Feng et al. (2016), we regard model A as favored over model B if the logarithmic BF of the former with respect to the latter is larger than 5. To confirm a signal, we also adopted the criteria used by Tuomi (2012) that the period should be constrained from above and below in the posterior distribution of the period. In other words, converges to a stationary distribution.
Following the Goldilocks principle introduced by Feng et al. (2016), we applied the BF criterion to determine which noise model to use for removing noise. We generated a set of noise models by setting q ∈ { 0,1,2,3,4,5,6 } and N_{D} ∈ { 0,2,5,8 }, and apply them to the data. The results of the model comparison are shown in Table 2. We find that the fourth order MA combined with five differential RVs is the best choice. Apart from q and N_{D}, all other model parameters are free. Specifically there are 2 parameters for the trend, 1 for jitter, 3 for activity indices, 5 for differential RVs and 5 for the MA model. There are 16 free parameters in total in the 0P+MA(4)+5D model. In the following section, we analyze the HARPS data using this model.
4. Keplerian candidates
We applied both Keplerian and sinusoidal functions to model the RV variations caused by planets, which are called the Keplerian solution and circular solution, respectively. We identified six signals in both solutions, and report the BFs for them in Table 3. Although the ~43 d signal identified in the Keplerian solution does not pass the BF threshold of 150, it satisfies the criteria for the circular solution. Moreover, without including the 43 d signal into the model, the eccentricity of the 89 d signal would be as high as 0.4. The 43 d and 89 d signals are approximately in 1:2 resonance, which is not rare for exoplanets (Steffen & Hwang 2015). Although the signal at a period of 11 d does not pass the BF threshold in both solutions, it is strong in the Keplerian solution, and increases the BF by at least one order of magnitude. In addition, it is visible in the periodograms for all observation seasons, as we see later. The periods in the circular solution are different from those in the Keplerian solution because of the assumption of periods in the circular solution.
To test whether the identified signals are caused by stellar and instrumental noise, we show the GLSTs for the data, the activity indices and differential RVs in Fig. 4. In the figure, we find that the RVs are strongly correlated with the FWHM and differential RVs, demonstrating the essential role of these noise proxies in removing correlated noise. We also find that periods around the ~330 d signal are strong in noise proxies such as FWHM and 6AP54, although these periods typically deviate from 330 d by more than 20 d. Such approximate overlaps are expected since the 330 d signal is close to the annual sampling frequency. Moreover, the probability of a random overlap between a signal and significant powers in the GLSTs of noise proxies increases with the number of proxies. Considering the minimization of correlated noise using noise proxies in the model, the 330 d signal is too strong to be caused by pure noise.
Fig. 4 GLSTs for the 1AP1 data set, various noise proxies, and the window function. The names of the times series are shown at the top left corner. The red and blue dotted lines denote the signals quantified for the 6planet model in the Keplerian and circular solutions respectively and reported in Table 3. The blue and red lines are usually too close to be distinguished for signals at periods larger than 12 d. We truncate the period range for optimal visualization of signals. We do not show the FAPs because the highly correlated noise in the data is not accounted for by white noise FAPs. 

Open with DEXTER 
We also calculated the GLSTs for the data subtracted by the optimal prediction of models with different numbers of Keplerian components and show these periodograms in Fig. 5. We see that all identified signals correspond to certain peaks in the periodogram for the original RVs. The subtraction of the noise components makes some signals more significant. We find that the signal identified by posterior sampling is not always the most significant one shown in the residual periodogram. Moreover, the 43 and 11 d signals are not strong in the corresponding residual periodograms, indicating the limitation of residual periodograms in identifying Keplerian signals. The residuals after subtracting the predictions of nplanet model are shown in Fig. A.1 in the appendix. The correlation between the RV residuals and noise proxies are shown in Fig. A.2.
Fig. 5 GLSTs for the original data and data subtracted by the optimal prediction of nplanet model. The value of n is shown in the upper right corner. The new signal identified by the n + 1planet model is shown by red dotted lines. The other elements are similar to those in Fig. 4. 

Open with DEXTER 
To show the consistency of the detected signals, we make a Bayesian periodogram for data within a moving time window. To make this periodogram, we model the RVs using a combination of sinusoidal functions and a linear trend and analytically marginalize the likelihood over the amplitudes of sinusoidal functions and the trend parameters, which are a and b in Eq. (2). Hence this new periodogram is called marginalized likelihood periodogram (MLP; Feng et al. 2017a). In addition to trend parameter a, the MLP also optimizes b, and thus extends the Bayesian generalized LombScargle periodogram (Mortier et al. 2015)^{3}. To make a 2D periodogram, we move the time window 100 steps to cover the whole data set, and compute the MLP for each step to construct a moving periodogram. We apply this method to the RVs from which the noise component in the sixplanet model prediction for the Keplerian solution has been subtracted. The moving periodograms with 1000 d and 2000 d time windows together with the data are shown in Fig. 6. Since more data is needed to cover the phase of long period signals, we recommend the 2000 d time window for testing the consistency of the 330 d and 147 d signals. We calculated the MLP for each time step and scale the logarithmic marginalized likelihood (ML) to be , where and ML_{max} are the mean and maximum ML. The RMLs are encoded by colors to show the consistency of signals in time rather than to estimate the significance of signals.
As we can see, the 11 d signal together with its aliases, although weak, are visible over the whole time span. The 147 and 330 d signals are consistently identified in the 2000 d moving periodogram. Notably the significance of these signals may vary with time because long period signals are sensitive to data sampling and size, which typically vary with time. There are also strong powers around the harmonics of the 330 d signal at a period of about 680 d and the annual alias of the 147 d signal at a period of about 250 d. Considering that the 330 d signal is also strong in some noise proxies, further observations and analyses are required to explore the nature of this long period signal.
Fig. 6 Noisesubtracted RVs (top panels) and the moving MLPbased periodograms (bottom panels) made within the 1000 d (left panels) and 2000 d (right panels) moving window. In each moving periodogram, the bottom left panel shows the periodogram for periods ranging from 10 d to 1000 d while the bottom right panel is focused on visualization of periods below 30 d. The colors encode the RML which is truncated to optimize the visualization of signals. The periods of signals identified in the 6planet Keplerian solution are shown by horizontal dashed lines. The logarithmic ML is truncated to optimize the visualization of signals. 

Open with DEXTER 
The 147 d signal is strong in many observation seasons despite the slight variation of its period. The high periodogram power around 110 d is probably a combination of the aliases of 89 and 147 d since the subtraction of the 89 d signal would reduce its significance. The signals around 89 d, 43 d and 18 d are also consistently identified over the whole time span. These signals are previously found by P11 in the circular solutions. Notably the periods of ~43 and 147 d slightly deviate from the optimized values shown in the periodogram because the periodogram assumes zero eccentricity. In addition to the identified signals, the periodogram maps in Fig. 6 also show annual alias of 330 d around 174 d, and annual aliases of 89 d around 113 and 70 d. The annual aliases of 43, 18 and 11 d are also visible. Therefore we interpret all strong signals based on the moving periodograms which visualize the genuine signals as well as their aliases.
Based on the above analyses, we interpret the signals around 18, 89, 147 and 330 d as planet candidates, although investigation of the 330 d signal is necessary to fully confirm its Keplerian nature. Since there is only weak evidence for the signals at periods of 11 and 43 d, more data and further analyses are required to confirm these potential candidates. We show the phase curves for these six signals in Fig. 7. We see that the sixplanet model well fits the data, supporting the existence of 6 signals despite considerable eccentricity for signals around 18, 43 and 147 d.
Fig. 7 Phasefolded data and planet model prediction. For the signal shown in each panel, the noise components and other signals are subtracted from the data. For each panel, the phasefolded times are original observation times mod the optimal period with respect to the minimum observation time. 

Open with DEXTER 
In Table 4, we report the parameters of the noise model and the four planet candidates, which are estimated in the 6planet Keplerian solution. In the table, we see high MAP values of linear parameters such as c_{3} and d_{3}^{4}, suggesting the existence of correlated noise as high as 1 m/s. This correlated noise would prevent any reliable detections of submeter signals if not minimized through the application of noise proxies such as activity indices and differential RVs. However, the application of noise proxies may introduce extra noise in the fitting (Feng et al. 2017b) due to an oversimplification of the complex/nonlinear dependence of RVs on proxies by linear functions. This is also part of the reason why we do not interpret the weak signals at periods of 11 and 43 d as planet candidates. On the other hand, the MA model plays an important role in removing timecorrelated noise, which is probably intranight RV variability caused by instrumental or reductionprocess effects (Berdiñas et al. 2016).
Maximum a posteriori (MAP) estimation of the parameters for the noise model (lower rows) and the five signals (upper rows) detected in the TERRAreduced HARPS data for HD 20794.
In Table 4, the signal at a period of 147 d corresponds to a superEarth perturbing HD 20794 with a mean semiamplitude^{5} of about 0.5 m/s. The candidate with 89 d period has a minimum mass lower than the candidate around 90 d reported by P11 most likely because we accounted for correlated noise that might have been interpreted as signals by P11. The 11 d signal caused a radial velocity variation of about 0.35 m/s. Regardless of whether this signal is Keplerian or not, the detection of such a weak signal demonstrates the important role of noise modeling in detecting low mass planets using the radial velocity technique.
According to the results of RV challenge, the MA model in combination of Bayesian methods can identify signals with a K/N ratio as low as 5 without announcing a false positive (Dumusque et al. 2017). The K/N ratio for signal with semiamplitude of K is defined as , where RV_{rms} is the standard deviation of RVs after removing the bestfitting trend and correlation with noise proxies, and N_{obs} is the number of observations. If we consider measurements in a 15 min bin as an independent observation (Mayor et al. 2003; O’Toole et al. 2008)^{6}, we get 713 independent observations and RV_{rms} = 1.61 m/s. Then the K/N ratios are 12, 10, 10, 19, 6.7, and 9.6 for signals at periods of 18, 89, 147, 330, 11, and 43 d. Hence the submeter signals we identified for HD 20794 b, d, and e are well above the K/N = 7.5 threshold of reliable detections of RV signals. Even the smallest signal at 11 d is close to this threshold and is higher than the K/N = 5 limit reached by our team.
However, the nominal eccentricities of HD 20794 b, d, and e are different from zero, which probably lead to dynamical instability of the system according to the Lagrange stability analyses (Barnes & Greenberg 2006). But we find that the system is stable if lower eccentricity within the uncertainty interval is adopted for HD 20794 b, d and e. The high eccentricity reported in Table 4 might arise from instrumental noise (Feng et al. 2017b). Specifically, the instrumental noise may cause shortterm RV variations that favor high eccentricity solutions. The high eccentricity may also be caused by the intrinsic bias in estimating nonnegative eccentricity (Zakamska et al. 2011). Therefore the proposed candidates probably have orbits that are more circular than those reported in this work.
Only the candidate with a period of about 330 d is located in the habitable zone (Kopparapu et al. 2013), although the confirmation of this candidate requires further observations and analyses. Its habitability would be influenced by the frequent impacts of objects from the massive circumstellar disk (Kennedy et al. 2015). All the other candidates are located too close to the host star to allow the existence of liquid water, and thus cannot be considered habitable.
5. Discussions and conclusions
We analyze the HARPS data of HD 20794 in the Bayesian framework. We find strong dependence of the RV noise on wavelengths. This wavelengthdependent noise cannot be removed by averaging all spectral orders as previous studies did. To deal with this noise, we use differential RVs to weight the spectral orders a posteriori. We apply this method to data sets measured within different wavelength ranges, and find that differential RVs efficiently remove wavelengthdependent noise. Therefore we propose a combination of the MA model and differential RVs to remove the time and wavelengthdependent noise in RV data sets.
By modeling the RV noise correlated in time and wavelength for HD 20794, we identify three firm Keplerian candidates at periods of about 18, 89 and 147 d. The signal at a period of 330 d is probably Keplerian, although similar periods have strong powers in some proxies. There is also weak evidence for the existence of signals at periods of 43 and 11 d. While the 43 d signal does not pass the Bayes factor criterion, the inclusion of this signal in the model can reduce the eccentricity of the 89 d signal. Although the 11 d signal only increases the Bayes factor by a factor of about 20, it consistently appears in different data chunks.
Thus by applying noise modeling to the P11 and newer HARPS data, we confirm the two candidates at periods of 18 and 90 d reported by P11 and quantify them better. But we are suspicious about the existence of the reported 40 d candidate because it does not pass the signal detection criteria. We also find a new planet candidate, HD 20794 e, at a period of about 147 d, corresponding to a superEarth. The signal at a period of about 330 d probably correspond to a Neptune located in the habitable zone of HD 20794. The estimated orbital eccentricity of HD 20794 b, d and e is larger than 0.2.
This considerable eccentricity is probably caused by instrumental noise and the fitting bias, as concluded by (Feng et al. 2017b) and Zakamska et al. (2011). Notably such eccentric solutions are also found in the HARPS data of τ Ceti (Feng et al. 2017b). This indicates a connection between high eccentricity and significant red noise in large RV data sets of bright stars that are observed with high cadence. Thus it is a viable hypothesis that the four planet candidates have eccentricities that are low enough to form a dynamically stable system.
Our detection of signals with semiamplitudes to below 0.5 m/s demonstrates the ability of the RV method to find Earth analogs orbiting Sunlike stars. This work supports the conclusion of Dumusque et al. (2017) that the combination of Bayesian methods with the modeling of correlated noise is essential to finding small planets. In particular, the noise model framework developed by Tuomi et al. (2013b) and Feng et al. (2017b) and applied in this work plays a key role in disentangling small signals from noise.
The data are available in http://starwww.herts.ac.uk/~ffeng/HD20794_supplementary/data
The relevant code for MLP and GLST is available in Github: https://github.com/phillippro/agatha, based on which a web app is also developed and linked at http://www.agatha.herts.ac.uk
These parameters measure the contribution of noise RV variations and are in units of m/s because the proxies are normalized. On the other hand, the Pearson coefficients shown in Fig. A.2 measure the correlation between RVs and proxies.
The RVs were measured with high cadence and are thus highly correlated in time due to stellar oscillations and systematic errors (Teixeira et al. 2009). It is necessary to bin the data to calculate the number of independent observations.
Acknowledgments
The authors are supported by the Leverhulme Trust (RPG2014281) and the Science and Technology Facilities Council (ST/M001008/1).We used the ESO Science Archive Facility to collect radial velocity data sets. The authors gratefully acknowledge the HARPSESO team’s continuous improvement of the instrument, data and data reduction pipelines that made this work possible. Finally, the authors thank the referee, Xavier Dumusque, for his valuable comments that helped improve the manuscript significantly.
References
 AngladaEscudé, G., & Butler, R. P. 2012, ApJS, 200, 15 [NASA ADS] [CrossRef] [Google Scholar]
 AngladaEscudé, G., Amado, P. J., Barnes, J., et al. 2016, Nature, 536, 437 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
 Barnes, R., & Greenberg, R. 2006, ApJ, 647, L163 [NASA ADS] [CrossRef] [Google Scholar]
 Berdiñas, Z. M., Amado, P. J., AngladaEscudé, G., RodríguezLópez, C., & Barnes, J. 2016, MNRAS, 459, 3551 [NASA ADS] [CrossRef] [Google Scholar]
 Butler, R. P., Tinney, C. G., Marcy, G. W., et al. 2001, ApJ, 555, 410 [NASA ADS] [CrossRef] [Google Scholar]
 Cumming, A., Marcy, G. W., & Butler, R. P. 1999, ApJ, 526, 890 [NASA ADS] [CrossRef] [Google Scholar]
 Dumusque, X., Borsa, F., Damasso, M., et al. 2017, A&A, 598, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Feng, F., Tuomi, M., Jones, H. R. A., Butler, R. P., & Vogt, S. 2016, MNRAS, 461, 2440 [NASA ADS] [CrossRef] [Google Scholar]
 Feng, F., Tuomi, M., & Jones, H. R. A. 2017a, MNRAS, 470, 4794 [NASA ADS] [CrossRef] [Google Scholar]
 Feng, F., Tuomi, M., Jones, H. R. A., et al. 2017b, AJ, submitted [Google Scholar]
 Gelman, A., & Rubin, D. B. 1992, Stat. Sci., 457 [Google Scholar]
 Gregory, P. C. 2011, MNRAS, 410, 94 [NASA ADS] [CrossRef] [Google Scholar]
 Haario, H., Saksman, E., & Tamminen, J. 2001, Bernoulli, 223 [Google Scholar]
 Kennedy, G. M., Matrà, L., Marmier, M., et al. 2015, MNRAS, 449, 3121 [NASA ADS] [CrossRef] [Google Scholar]
 Kopparapu, R. K., Ramirez, R., Kasting, J. F., et al. 2013, ApJ, 765, 131 [NASA ADS] [CrossRef] [Google Scholar]
 Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20 [NASA ADS] [Google Scholar]
 Mortier, A., Faria, J. P., Correia, C. M., Santerne, A., & Santos, N. C. 2015, A&A, 573, A101 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 O’Toole, S. J., Tinney, C. G., & Jones, H. R. A. 2008, MNRAS, 386, 516 [NASA ADS] [CrossRef] [Google Scholar]
 Pepe, F., Lovis, C., Ségransan, D., et al. 2011, A&A, 534, A58 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Ramírez, I., Allen de Prieto, C., & Lambert, D. L. 2013, ApJ, 764, 78 [NASA ADS] [CrossRef] [Google Scholar]
 Steffen, J. H., & Hwang, J. A. 2015, MNRAS, 448, 1956 [NASA ADS] [CrossRef] [Google Scholar]
 Teixeira, T. C., Kjeldsen, H., Bedding, T. R., et al. 2009, A&A, 494, 237 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Tuomi, M. 2012, A&A, 543, A52 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Tuomi, M., AngladaEscudé, G., Gerlach, E., et al. 2013a, A&A, 549, A48 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Tuomi, M., Jones, H. R. A., Jenkins, J. S., et al. 2013b, A&A, 551, A79 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Zakamska, N. L., Pan, M., & Ford, E. B. 2011, MNRAS, 410, 1895 [NASA ADS] [Google Scholar]
Appendix A: Residual visualization
Fig. A.1 Signals and residuals for nplanet model. In the left panels, the original data, noisesubtracted data, and signals are shown by black points, gray points, and red curve, respectively. The residuals after subtracting signals and the noise model predictions are shown in the middle panels with the same RV range as in the left panels. The distributions of residuals are shown in the right panels. The mean (μ), standard deviation (σ), skewness (μ^{3}), and kurtosis (μ^{4}) are shown for each distribution. 

Open with DEXTER 
Fig. A.2 Correlation between RV residuals and noise proxies. The RV residuals for a given proxy are calculated by subtracting the bestfitting sixplanet model from the 1AP1 data set, where the linear parameter corresponding to the proxy is set to zero. The Pearson correlation coefficient is shown in the top right corner. 

Open with DEXTER 
All Tables
Logarithmic BICestimated BFs for noise models with respect to the model without any MA components or differential RVs.
Logarithmic BICestimated BFs for models with various numbers of planets for Keplerian and circular solutions.
Maximum a posteriori (MAP) estimation of the parameters for the noise model (lower rows) and the five signals (upper rows) detected in the TERRAreduced HARPS data for HD 20794.
All Figures
Fig. 1 Measurements of RV, Sindex, BIS and FWHM by HARPS for HD 20794. The dashed line denotes JD 2 453 500 before which the data is probably affected by instrumental effects. 

Open with DEXTER  
In the text 
Fig. 2 Schematic diagram of data reduction and modeling process. 

Open with DEXTER  
In the text 
Fig. 3 Illustration of the existence of wavelengthdependent noise and the necessity of removing it using differential RVs. We show a series of GLSTs across the page. Each row of plots is given to a different aperture data set labeled 3AP1, 3AP2, 3AP3, and 1AP1. The following columns show GLSTs for the corresponding residuals after subtracting the bestfitting models of 0P+MA(0)+0D (white noise), 0P+MA(1)+0D (moving average), and 0P+MA(0)+2D (white noise with differential RVs), 0P+MA(1)+2D (moving average with differential RVs). The logarithm BF of a model with respect to the MA(0) model are shown for the residuals of each data set after subtracting the best model prediction. In each panel, the red dotted line denotes the period with the highest power. The false alarm probability (FAP) threshold of 0.1, 0.01 and 0.001 are shown by dashed lines. 

Open with DEXTER  
In the text 
Fig. 4 GLSTs for the 1AP1 data set, various noise proxies, and the window function. The names of the times series are shown at the top left corner. The red and blue dotted lines denote the signals quantified for the 6planet model in the Keplerian and circular solutions respectively and reported in Table 3. The blue and red lines are usually too close to be distinguished for signals at periods larger than 12 d. We truncate the period range for optimal visualization of signals. We do not show the FAPs because the highly correlated noise in the data is not accounted for by white noise FAPs. 

Open with DEXTER  
In the text 
Fig. 5 GLSTs for the original data and data subtracted by the optimal prediction of nplanet model. The value of n is shown in the upper right corner. The new signal identified by the n + 1planet model is shown by red dotted lines. The other elements are similar to those in Fig. 4. 

Open with DEXTER  
In the text 
Fig. 6 Noisesubtracted RVs (top panels) and the moving MLPbased periodograms (bottom panels) made within the 1000 d (left panels) and 2000 d (right panels) moving window. In each moving periodogram, the bottom left panel shows the periodogram for periods ranging from 10 d to 1000 d while the bottom right panel is focused on visualization of periods below 30 d. The colors encode the RML which is truncated to optimize the visualization of signals. The periods of signals identified in the 6planet Keplerian solution are shown by horizontal dashed lines. The logarithmic ML is truncated to optimize the visualization of signals. 

Open with DEXTER  
In the text 
Fig. 7 Phasefolded data and planet model prediction. For the signal shown in each panel, the noise components and other signals are subtracted from the data. For each panel, the phasefolded times are original observation times mod the optimal period with respect to the minimum observation time. 

Open with DEXTER  
In the text 
Fig. A.1 Signals and residuals for nplanet model. In the left panels, the original data, noisesubtracted data, and signals are shown by black points, gray points, and red curve, respectively. The residuals after subtracting signals and the noise model predictions are shown in the middle panels with the same RV range as in the left panels. The distributions of residuals are shown in the right panels. The mean (μ), standard deviation (σ), skewness (μ^{3}), and kurtosis (μ^{4}) are shown for each distribution. 

Open with DEXTER  
In the text 
Fig. A.2 Correlation between RV residuals and noise proxies. The RV residuals for a given proxy are calculated by subtracting the bestfitting sixplanet model from the 1AP1 data set, where the linear parameter corresponding to the proxy is set to zero. The Pearson correlation coefficient is shown in the top right corner. 

Open with DEXTER  
In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.