3D magneto-hydrodynamical simulations of stellar convective noise for improved exoplanet detection I. Case of regularly sampled radial velocity observations

Context. Convective motions at the stellar surface generate a stochastic colored noise source in the radial velocity (RV) data. This noise impedes the detection of small exoplanets. Moreover, the unknown statistics (amplitude, distribution) related to this noise make it difficult to estimate the false alarm probability (FAP) for exoplanet detection tests. Aims. In this paper, we investigate the possibility of using 3D magneto-hydrodynamical simulations (MHD) of stellar convection to design detection methods that can provide both a reliable estimate of the FAP and a high detection power. Methods. We tested the realism of 3D simulations in producing solar RV by comparing them with the observed disk integrated velocities taken by the GOLF instrument on board the SOHO spacecraft. We presented a new detection method based on periodograms standardized by these simulated time series, applying several detection tests to these standarized periodograms. Results. The power spectral density of the 3D synthetic convective noise is consistent with solar RV observations for short periods. For regularly sampled observations, the analytic expressions of FAP derived for several statistical tests applied to the periodogram standardized by 3D simulation noise are accurate. The adaptive tests considered in this work (Higher-Criticism, Berk-Jones), which are new in the exoplanet field, may offer better detection performance than classical tests (based on the highest periodogram value) in the case of multi-planetary systems and planets with eccentric orbits. Conclusions. 3D MHD simulations are now mature enough to produce reliable synthetic time series of the convective noise affecting RV data. These series can be used to access to the statistics of this noise and derive accurate FAP of tests that are a critical element in the detection of exoplanets down to the cm.s−1 level.


Introduction
At the time of this writing, 880 extrasolar planets have been discovered so far by the radial velocity (RV) technique 1 . In this sample, 52 % of these consist of planets that are more massive than Jupiter and 14 % that have a mass inferior to 10 Earthmasses (M ⊕ ). Since the detection of HD215152c (Mayor et al. 2011), only 19 planets have been found with a mass ≤ 2M ⊕ and all of the latter are short-period (≤ 50 days) planets orbiting stars that are less massive than the Sun.
Indeed, detecting planets is easier around low-mass stars (as the ratio of the planet to stellar masses is higher) and, thus, a first strategy consists in monitoring cool M dwarfs to increase the detection probability. This has been the purpose of recent surveys with spectrographs such as CARMENES (Quirrenbach et al. 2014) and SPIRou (Donati et al. 2017). On the other hand, new instruments such as ESPRESSO (Pepe et al. 2010) and EX-PRES (Jurgenson et al. 2016) have been developed to ensure the long-term stability that is needed to detect signals of terrestrial planets orbiting main sequence G-dwarf stars (with an amplitude around 10-30 cm.s −1 ). 1 Source: exoplanet.eu, confirmed planets (01/2020). However, detecting planet signatures at the cm.s −1 level remains challenging as spurious Doppler shifts of various origins may dominate the RV series. The activity at the surface of the host star is one of the main sources generating changes in depth, width, and asymmetries of the absorption lines. Disentangling the planetary signal from the stellar activity "noise" is an active research topic (see e.g., Desort et al. 2007;Aigrain et al. 2012;Haywood et al. 2014;Lagrange et al. 2010;Meunier et al. 2017a;Wise et al. 2018;Dumusque 2018;Chaplin et al. 2019;and references therein) and stellar activity has already led to several controversial planet detections in the past (e.g., αCenB b, Dumusque et al. 2012;Hatzes 2013;Rajpaul et al. 2016, GJ581 d andg, Vogt et al. 2010;Robertson et al. 2014, GJ667 c, andf Anglada-Escudé et al. 2013;Robertson & Mahadevan 2014).
This activity results from the contribution of various phenomena, which can be classified as a function of their correlation timescales. For main-sequence Solar-like stars, the three main noise sources originate from: 1) cyclic stellar oscillation eigenmodes (a few minutes), 2) stochastic surface convection motions (min-hrs), and 3) (quasi) periodic stellar activity -spots, plages, flares -modulated with the stellar rotation or cycle (days-years). We note that even if active regions are more frequent at the max- imum phases of the cycle, they can have lifetimes that are shorter than the rotation period (Saar & Donahue 1997).
In this paper, we aim to consider how the influence of the stochastic noise due to stellar convective motion could be counteracted. We ignore other sources of stellar RV variations (e.g., oscillations and active regions) and consider them as already corrected in our time series (e.g., through activity-sensitive lines (Baliunas et al. 1995;Wise et al. 2018) or dedicated filtering technique (Chaplin et al. 2019)).
Convective noise can significantly alter the detection of exoplanets at the sub-m.s −1 level (Meunier et al. 2015(Meunier et al. , 2017bMeunier, N. & Lagrange, A.-M. 2019;. The main technique proposed so far for mitigating its contribution in the RV series down to some tenth of cm.s −1 consists of averaging several (typically two or three) measurements of a target star during a night and separating them by at least two hours (Dumusque et al. 2011;Collier Cameron et al. 2019). However, the convection acts as a correlated noise over timescales longer than twoto-four hours (and even longer for supergranulation) and some correlations remain by using this observational procedure (Meunier et al. 2015). Moreover, this technique is performed at the cost of a small number of data points per night, leading to a critical lack of knowledge of the statistical properties for stellar activity as a whole. Other methods for dealing with convection noise consist of modeling the stellar activity as a correlated noise when fitting for RV planetary Keplerian signatures. Examples of common empirical models that we can find in the literature are red (i.e., power law) noise (Feroz & Hobson 2014), moving averaged noise (Tuomi et al. 2014), or Gaussian processes (Rajpaul et al. 2015). In practice, these empirical modelings should be used with caution as their results may lead to different conclusions, as shown in a recent RV challenge (Dumusque et al. 2017).
In this study, we question the reliability of traditional methods for determining the statistical significance of the detection in the presence of correlated noise. This significance is based on the value of the false alarm probability (FAP) of statistical tests (see review in Khan et al. 2017). Traditionally, the FAP is derived under the assumption that the noise within the data (or the data residuals) is an uncorrelated white Gaussian noise (WGN). In this work, we propose a new method to access the significance of the detection of (quasi-)periodic signals in the presence of a correlated noise, providing that we can generate reliable (nonparametric) time series of this noise. We propose to use stateof-the-art 3D magneto-hydrodynamical (MHD) simulations of stellar surfaces to generate the noise series. We note that such simulations have already been investigated to determine the impact of convection on exoplanet detection (Cegla et al. 2013(Cegla et al. , 2018. Our analysis focuses on evaluating the reliability of MHD simulations in reproducing the time series of solar convective noise and on investigating the statistical benefit of using such simulated RV for deriving accurate FAP. In this paper, the benefit of using these simulations in the detection process is described for the case of regularly sampled time series. The case of an irregular sampling will be developed in a second paper 2 , whereas the present analytical studies can nevertheless provide a useful proxy of the performance that can be expected in the case of an irregular sampling close to regular (e.g., one point per night at roughly the same hour). This paper is divided into five sections. In Sec. 2, we evaluate the realism of the 3D MHD simulations. In Sec. 3, we use the standardized periodogram and present several detection tests to exploit this periodogram. In particular, we discuss some tests that are new to the exoplanet field: namely, the Higher-Criticism (Donoho & Jin 2004) and the Berk-Jones tests (Berk & Jones 1979). In Sec. 4, we perform a numerical study to investigate the benefit of our procedure and present our conclusions in Sec. 5 and 6.

Simulated solar granulation noise
As a preamble, we aim to test the realism of 3D MHD simulated RV time series of convective (granulation) noise and compare it to the RV time series obtained using the spectrophotometer Global Oscillation at Low Frequencies (GOLF).

Space measurements of RV solar convective noise
Measurements from spaceborne missions represent an excellent opportunity to validate the simulated velocities of solar convection. Indeed, they are not affected by the alternation of day and night or any ground-based follow-up problems (e.g., the influence of the Earth's atmosphere) and provide regularly sampled time series at high cadence.
Since 1996, the GOLF spectrophotometer on board the Solar and Heliospheric Observatory (SoHO) spacecraft takes an almost continuous measurement of the solar disk-integrated position of the Sodium doublet. More particularly, it measures the position in the "blue" and "red" wings of the lines at ± 108 Å from the center of the lines, which are located at λ = 5895.924 Å (D 1 ) and 5889.950 Å (D 2 ). The solar light enters into a sodium vapor cell and a magnetic field splits the absorption lines (Zeeman effect). Then the Doppler shift (i.e., velocity) is evaluated as the flux ratio on these two points of the lines' wings (see p. 328, Unno et al. 1989): where F B and F R are the fluxes in the blue and red wings, respectively. For more technical details about this velocity extraction, we refer to Boumier & Dame (1993); Gabriel et al. (1995); Garcia et al. (2005) and Appourchaux et al. (2018). After roughly one year of GOLF measurements, an instrumental failure happened and the velocity extraction was done using only one side of the sodium doublet: the blue wings (where the solar intensity comes from the bottom of the photosphere) from 1996 to 1998 and from 2002 until now and the red wings (where the solar intensity comes from the upper layers of the photosphere) between these dates (Garcia et al. 2005). Therefore, a careful calibration of the GOLF data was needed to obtain consistent velocities and several calibrations have been proposed. We chose to use the recent level-2 GOLF data 3 calibrated as described in Appourchaux et al. (2018). In order to have the same sampling as in our MHD simulations, we sampled the GOLF time series every minute 4 . Moreover, we divided the GOLF time series into two-day sequences to study the RV correlations on the granulation timescales (from a few minutes up to several hours) and to validate them with Monte Carlo (MC) simulations based on a large number of solar subseries (see the statistical results presented in Sec. 3). From the entire sample of two-day sequences, we removed the ones containing observation gaps to have a perfectly regularly sampled time series (as our working hypothesis throughout this paper).
Finally, we applied a low-pass filter of 1620 µHz (i.e. ,10.3 minutes) passband to filter out the oscillations modes and to restrict sensitivity to pick up only the convective noise. We computed the velocity root-mean-square (rms) of each 182 sequences available on the 1996 dataset (i.e., at solar cycle minimum, no calibration problem) and obtained an average value of 49 cm.s −1 , which is in agreement with Pallé et al. (1999). An example of a two-day sequence and the corresponding periodogram will be shown in Sec. We use the state-of-the-art radiative MHD code (STAGGER CODE, Nordlund & Galsgaard 1995) to simulate the surface convection and stratification of the Sun. In a 3D local-box model of the solar atmosphere (size: 8000 × 8000 kms and +500 and −3400 km above and below the surface at optical depth τ = 1), the code solves the full set of conservative MHD equations coupled to an accurate treatment of the radiative transfer. The horizontal sizes of the domain are defined to contain a sufficient number of granules at each time-step. The code is based on a sixth-order explicit finite difference scheme. The equations are solved on a staggered mesh where the thermodynamic variables are cellcentered, while the flux is shifted to the cell edge. The domain of simulation contains the entropy minimum located at the surface (photosphere) and is extended deep enough to have a flat entropy profile at the bottom (adiabatic regime). The code uses periodic boundary conditions horizontally and opened boundaries vertically. At the bottom of the simulation, the inflows have constant entropy and pressure. The outflows are not constrained and are free to pass through the boundary. We used a realistic equation-of-state that accounts for ionization, recombination, dissociation (Mihalas et al. 1988), and continuous line opacity (Gustafsson et al. 2008). Radiative transfer is crucial since it drives convection through entropy losses at the surface (Stein & Nordlund 1998) and is solved using the Feautrier's scheme along with several inclined rays (one vertical, eight inclined) through each grid point. The wavelength dependence of the radiative transfer is taken into account using a binning scheme, in which the monochromatic lines are collected into 12 bins. The numerical resolution used for the present simulation is 120 3 . The choice of this modest resolution is a compromise between sufficient fine grid to catch enough of the inhomogeneities and sufficiently small to minimize the computing and storing costs of very long-run simulation. The solar parameters that define our 3D model are T eff = 5775 ± 30 K, log g = 4.44 and a solar chemical composition ). The uncertainty in T eff represents the fluctuations due to convection and p-modes. The average magnetic field in our simulation is ∼ 100 G, as observed by (Hanle) spectropolarimetry (Trujillo Bueno et al. 2004).
In this work, we use an exceptionally long series of solar snapshots computed to study the properties of solar p-modes (Bigot et al. (in prep. for A&A). It represents 53.14 days with a sampling of 60 seconds (i.e., 76 528 snapshots). To our knowledge, this is the longest series ever generated with such a 3D code. For the present study, we filtered out these modes since they have unrealistic large amplitudes (due to their small iner-tia) in such shallow boxes of granulation simulation. The synthetic sodium doublet lines are obtained for each snapshot by a monochromatic line transfer within [5884.000, 5901.945] Å and at a resolution of 20 000.
The synthetic line intensities I(t, x, y, λ, µ, φ) and the continuum C(t, x, y, λ, µ, φ) are computed for each x and y, the horizontal Cartesian coordinates of the simulation box and for several inclined rays defined by µ, the cosine of the six limb angles, and four azimuthal angles φ. The chosen discrete µ i values are defined by the Gauss-Radau procedure. For six angles, we then have µ i = {0.12, 0.39, 0.60, 0.80, 0.92, 1.00}. We averaged these intensities both horizontally and in azimuth to obtain our time-dependent center-to-limb intensity I(t, λ, µ) and continuum C(t, λ, µ), from which we will extract the radial velocities the following sub-sections.

RV dependence on the center-to-limb position
The radial velocities associated to each value of µ i are obtained using (1). To compute (1), we generated the fluxes F(t, λ, µ i ) as the ratio of I(t, λ, µ i ) over C(t, λ, µ i ) for each µ i . We then extracted the mean line profile F 0 (λ, µ i ) by averaging the fluxes F(t, λ, µ i ) over t and used this reference profile to evaluate the fluxes ratio involved in (1). Finally, we translated these Doppler shifts into velocities using a proportional factor (κ) that results from a Taylor development around the considered wavelength λ 0 (see p. 328, Unno et al. 1989). This factor needs to be evaluated for each line of the Sodium doublet. It writes: with c the speed of light and λ 0 the wavelength corresponding to an intensity level of reference. We set this level of reference to F 0 (λ B 0 , µ i ) = F 0 (λ R 0 , µ i ) = 0.5 with λ B 0 and λ R 0 the wavelengths in the blue and the red wings, respectively. The RV time series associated with three of the discrete {µ i }-values are shown in Fig. 1.
The extracted radial velocities are strongly decreasing from the limb to the disk center, with typical rms velocities of 234.2 m.s −1 at the limb (µ = 0.12) and 23.3 m.s −1 at the disk center (µ = 1), as it is observed for the Sun (Löhner-Böttcher et al. 2018). This is explained by the fact that the observer does not see the same components of convective flows at the limb and the disk center. Indeed, the radial velocity is the projection of the total convective velocity, which includes both the vertical and horizontal velocities. At the disk center, radial and vertical velocities are the same, but at the limb, we only see the horizontal component. Since convection is strongly decelerating and horizontally diverging at the surface so that the gas overturns back to the interior in vertical downflows, the horizontal speeds are much larger than the vertical ones (Stein & Nordlund 1998;Nordlund et al. 2009). This explains the much larger values found at the limb than at the disk center. Moreover, the contribution of the small vertical velocity at the limb is strictly zero due to the projection effect. Despite the large rms velocities at the limb, we see in the following section that its contribution is limited due to surface projection effect when considering the disk-integrated velocities.

Disk-integrated RV
A single 3D simulation box represents a tiny fraction of the solar surface. Typically, we need N B = 2πR 2 / 2 ≈ 4.7 × 10 4 simulation boxes to cover the visible solar disk (i.e., half of the solar surface) with R the solar radius and = 8 Mm the hori- The idea is to use the 76 528 available synthetic line profiles to patch a surface equivalent to the solar disk. Contrary to the previously mentioned studies, which used very short time series of a couple of hours, we have approximately 2 × N B boxes to patch the solar disk at a given time, t, that allows us to cover the entire disk without duplication of the same snapshots. This allows us to avoid using the same snapshot in the patching procedure multiple times, which could lead to unavoidable correlations. In the present study, we randomly distributed the snapshots all over the surface, with the condition that two consecutive patches should correspond to times that are separated by at least 20 min to minimize possible correlations. For each patch k, we have an associated emergent intensity I(t, λ, µ k ). The values of µ k are calculated depending to the position of the patch on the grid and the intensities I(t, λ, µ k ) interpolated from the six Gauss-Radau values using a second-order polynomial function. Then we let each of these patches evolves independently; that is, for each patch k, we performed a new interpolation from the six Gauss-Radau values to derive the new value for I(t+1, λ, µ k ) corresponding to the considered µ k . For each t and λ, we evaluated the disk integrated emergent flux as a function of wavelength, and we normalized the flux (3) by its corresponding value in the continuum F C (t, λ) = N B k=1 C(t, λ, µ k ) µ k . As in Sec. 2.2.3, we then generated the mean line profile F 0 (λ), which is our reference spectral line, to calculate the final Doppler shifts resulting from these disk-integrated synthetic Sodium line spectra. Fi-nally, we extracted the Doppler velocity by measuring the flux ratio in the two points of each of the lines' wings using (1) with the proportional factor given in (2).
The acoustic modes are naturally generated by the convective fluctuations inside a simulation box. However, in one shallow box, the modes have much lower inertia than the real observed p-modes. They have therefore much larger amplitudes. Hence, we eliminated their contribution to the RV time series by using a low-frequency filter of 1620 µHz passband (the same applied to GOLF time series; see Sec. 2.1). The rms of the final synthetic RV time series is 0.507 m.s −1 , which is a value very close to the observed rms from space with GOLF (Pallé et al. 1999), and from the ground with HARPS-N (Collier Cameron et al. 2019). We note that our rms value does not take into account the possible contribution of the granulation noise to the high frequencies (ν > 1620 µHz) as we filtered them to remove the contribution of the stellar p-modes.
Other rms values due to granulation can be found in the literature. For example, Meunier et al. (2015) derived an rms that is twice higher (80 cm.s −1 ), Cegla et al. (2012) derived a similar value (40 cm.s −1 ) and  derived a smaller value (10 cm.s −1 ). The latter authors discuss the influence of the magnetic field that can reduce the velocity of the granulation flows in the 3D simulations.

Comparison between RV observations and simulations
To compare the synthetic velocity time series extracted from our 3D simulations with the GOLF observations, we added to both datasets a synthetic WGN to replace the high-frequency part of their power spectral density (PSD) that had been filtered out due to the presence of the acoustic modes (see Sec. 2.2.3). The variance of this high-frequency noise was evaluated using the PSD at ν > 1620 µHz of the non-filtered GOLF data. This WGN does not affect the lower frequency part of the periodogram. We note that the influence of the four second exposure time of GOLF has been neglected in the computation of the PSD of the synthetic velocities.
The final comparison of the velocities is shown in the left panel of Fig. 2 for two selected two-day sequences. The corresponding (averaged) periodograms (see Eq. (8) in Sec. 3.2), resulting from the average of L = 26 regularly sampled twoday sequences, are shown in the right panel. The third periodogram represents the PSD of GOLF observations before the filtering of the acoustic modes. Toward the lower frequencies, we observe the frequency-dependent behavior of the solar granulation in all periodograms. When using high-resolution observations, the RV contribution of the stellar granulation acts as a frequency-dependent noise source that drastically differs from a WGN (characterized by a flat power over all frequencies). We observe a good match between the PSDs of the observed and simulated velocities until ν = 56 µHz (i.e., ∼ 5 hours) corresponding to the correlation regime dominated by the granulation process (i.e., ν ∈ [50, 1000] µHz). At lower frequencies, the solar PSD becomes dominated by supergranulation and magnetic activity phenomena (spots and plages) and the comparison of these simulations of granulation with observations becomes obsolete (even if the granulation signal affects also the low frequencies of the PSD). We note that supergranules have longer lifetimes and should generate noise correlated over several days. They are not included in our present MHD simulations but they are also reproducible through 3D simulations, although their computing takes a longer time (e.g., Stein et al. 2009).

Detection
This section presents the considered statistical model and detection tests. Several results detailed in Sulis et al. (2017a) are summarized below for the sake of completeness since Sec. 4 is aimed at validating the theoretical results from this study on real astrophysical data. The purpose of the approach is to detect (possibly quasi-) periodic components in a stationary colored noise with partially unknown statistics. By "partially" we mean that a training dataset of this colored noise is available (through MHD simulations). This noise dataset is independent of the observations (see Sec. 2). In the following, we assume the training data set contains all the noise sources that can affect the dataset under test. We note that currently, the MHD simulations per se cannot reflect the presence of active regions (spots, plages) due to the finite model precision. Hence, this study shows what can be done in the absence of such noise sources or in the situation where activity signatures can be identified by other means and added to the simulation.

Hypothesis testing problem
Let us consider a time series X(t j ) with N points, evenly sampled on times t j = j × dt for j = 1, . . . , N with dt the sampling time step. We consider a binary hypothesis problem of the form: where, under the null hypothesis, H 0 , the data contain only the colored noise E(t j ) (of which a training set is available). The noise E is defined as a zero-mean second-order stationary Gaussian noise with unknown power spectral density S E and absolutely integrable autocorrelation function r E (see Sulis et al. (2017a)). The alternative hypothesis, H 1 , represents the case where an unknown RV planetary signal R(t j ) is melded with the colored noise. As illustrated, for example in Sulis et al. (2016), RV Keplerian signatures can be well approximated by a limited number of pure oscillations : where the vector θ R collects all the unknown amplitudes α q ∈ R + * , frequencies f q ∈ R + * , and phases, ϕ q ∈ [0, 2π[, of the N s sinusoids. If a star reflects N p planetary signatures, N s is, in general, larger than N p . The case N s ≈ N p corresponds to N p planets with circular orbits and frequencies close to the Fourier grid. In all situations, N s is much smaller than the number of Fourier frequencies.

Detection approach: a standardized periodogram
For simplicity, we consider for this section a unit time sampling dt = 1 and N even. When the observation sampling is regular, the search of periodic components can be done using the classical periodogram (Schuster 1898) defined as: We note that to express P in units of density (m 2 /s 2 /Hz), expression (6) has to be divided by the passband 1/dt (see Eq. (11.6), Chap. 11 of Percival 1994). This leads us to consider in (6) a discrete Fourier frequencies defined as: Owing to the hermitian symmetry of the Fourier transform and because we are not interested in the null frequency, below we consider P(ν) in (6) only as evaluated on a subset of N 2 − 1 independent Fourier frequencies corresponding to k ∈ Ω := {1, . . . , N 2 − 1}. Asymptotically, P is an unbiased 5 (but inconsistent 6 ) estimate of the PSD (see Brillinger 1981, Theorems 5.2.1 and 5.2.4). The asymptotic distributions of P under both hypotheses are known ∀k ∈ Ω: Brillinger 1981, Theorem 5.2.6), with S E the (unknown) noise PSD and λ k = λ(ν k ; S E , θ R ) a noncentrality parameter. For N s sinusoidal components involved under H 1 , the expression of this parameter can be found in Eq.
(6) of Sulis et al. (2017a). We note that if the noise PSD S E is unknown, the distribution of P given in (7) is also unknown. Assuming now that L time series of the colored noise (denoted by {X }, = 1, . . . , L below), can be generated under H 0 as a training dataset, we propose to use them as an estimate of the noise PSD to calibrate the periodogram of the data under test. Based on these L time series, we compute an averaged periodogram defined as: We note that this periodogram has been initially introduced by Bartlett (1950), and used on subseries) to reduce the variance of the classical periodogram given in (6). This averaged periodogram is an asymptotically consistent and unbiased estimator of the PSD. Following the same reasoning as for (7), the asymptotic distribution of P L can be easily derived ∀k ∈ Ω as: Using (8) to calibrate (6), we define the standardized periodogram as: Thanks to the known distributions of the numerator and denominator of (10) and to their mutual independence, we can also derive the distribution of the standardized periodogram. Using (7) and (9), we obtain a ratio of two independent χ 2 variables. This ratio leads, under H 0 and H 1 , to a central and non-central F-distribution with respectively 2 and 2L degrees of freedom: We note that under H 0 , the distribution of the standardized periodogram P is now asymptotically independent of the noise PSD S E . This important property makes tests applied to P act as Constant False Alarm Rate detectors (Scharf & Friedlander 1994): their false alarm rate is independent of the noise PSD. This is a very desirable feature in practice since it allows to control the false positive rate despite the unknown noise PSD. Under H 1 , the distribution depends on the noise PSD through the non-centrality parameter λ k . The definition and the analysis of the theoretical performance of tests based on (10) are summarized in the following section.

Analysis of tests applied to the standardized periodogram
Before introducing the tests, it is convenient to consider vectors of random variables, noted in bold. For instance, the vector collecting the periodogram ordinates is written as: Notation x|y denotes a standardization of the entries of x by those of y. For instance, the vector of periodogram ordinates is standardized as in (10) and defined on the frequency set Ω. It is written as:

Test designed for a single periodicity
A common test consists of comparing the maximum periodogram value to a detection threshold γ ∈ R + that determines the false alarm rate: This test is most efficient when a single periodicity on the Fourier grid is present under H 1 (Donoho & Jin 2004). As the asymptotic distribution of P is known at each frequency (see Eq. (11)), the false alarm and detection probabilities (noted P FA and P DET respectively), as well as their relationship (P DET (P FA )), can be derived analytically (Sulis et al. 2017a): where N i := N 2 − 1 is the number of frequencies effectively considered in the test, : is the cumulative distribution function (CDF) of a non-central F variable with non centrality parameter λ k . The P DET expressions given in (14) and (15) are approximations due to the approximate independence of the periodogram ordinates under H 1 (see Li 2014, Theorem. 6.5). However, the analytic formulae above are quite accurate for values of N i considered in practice as shown in Sulis et al. (2017a). These results allow saving a substantial amount of computation time for comparing the tests (in comparison with a MC simulation-based approach). They also allow gaining theoretical insight into the relative performances of the tests. Using the relation P DET (P FA ), receiver operating characteristic (ROC) curves can be computed to compare the performances of the statistical tests. Furthermore, these analytical results can also be used to design detectability studies (see Sec. 4.2).

Tests designed for multiple periodicities
Testing for the largest peak in the periodogram may not be the best strategy for the case of multiple (quasi-) periodic signals. Chiu (1989) showed that in such cases tests exploiting order statistics of the periodogram may be more powerful than T M (which looks at the maximum value only). In the case where the number of periodogram ordinates at Fourier frequencies affected by the planetary signature can be guessed or estimated a priori (let N C denote this number), a generalization of test T M replaces the maximum by the N th C largest periodogram components. For such a test, analytic expressions for both the P DET and P FA can also be derived (see test T C in Sulis et al. 2017a).
In practice, however, N th C is often unknown and it is necessary to turn to tests that are adaptive with regard to the number of periodicities contained in the total Keplerian signature. Such tests are based on the P-values (noted v below) of the standardized periodogram. In the framework considered here, the P-values of an observed random variable (periodogram, or test statistic) is defined as the probability, under the null hypothesis, of obtaining a more extreme value than the observed one. Precisely, the P-values of P(ν k ) are defined ∀k ∈ Ω as: and BJ( P | P L ) := max where v P,(k) denotes the order statistics of the P-values of the standardized periodogram (which have a beta distribution, David & Nagaraja 2003), α 0 is a constant ∈ [ 1 N , 1], and I denotes the CDF of a beta variable.
Such tests as HC or BJ consist of setting a multiple testing problem, in which a set of test statistics (in our case, this refers to the periodogram at different frequencies) is taken and each of them are simultaneously considered in order to discriminate between the two hypotheses. In essence, these tests compare the maximal deviation of the empirical CDF of the ordered periodogram's P-values to their true CDF under H 0 . The definition of the deviation depends on the test; both can be seen as variants of a generic divergence (Zhang et al. 2017).
In periodograms under H 1 , the planetary signature affects only a small fraction of the total number of ordinates; furthermore, this is by only a very small amount, leading to a very difficult "needle in a haystack" detection problem. Donoho & Jin (2004) and Moscovich et al. (2016) demonstrate theoretically that HC and BJ present optimal guarantees in this regime. For finite values of N i , the studies of Zhang et al. (2017) and Sulis et al. (2017a) show that BJ can be more powerful than other tests in case of weak and non extremely sparse signatures (e.g., multiplanetary systems of small planets with off-grid orbital frequencies and with high eccentricity orbits). We note that, in the case of irregular sampling -that will be the subject of a second paper, RV planet signatures can be much less sparse in the Fourier domain than for regular sampling owing to the sidelobes of the spectral window. Interestingly, efficient and accurate analytic calculations for the distribution of several adaptive tests, such as the HC and BJ under the null and the alternative hypotheses, have been recently included in Zhang et al. (2017). These features make adaptive tests particularly interesting for exoplanets detection, as illustrated in the numerical study below.

Numerical study
In this section, we first evaluate the validity of the statistical method presented in Sec.3 using the solar observed and synthetic RV time series presented in Sec. 2. In a second step, we perform detectability studies for different planet signatures in the presence of solar convective noise by exploiting our analytical results. Finally, we compare the power of classical and adaptive detection tests for different Keplerian signatures.

Control of the false alarm: comparison of methods
The first part of this numerical study aims to compare the reliability of different false alarm probability estimates. We compare in particular bootstrap approaches to periodogram standardization (assuming a noise training data set is available). For the sake of concision, we focus on one test: the test of the maximum (see Eq. (12)). As for the considered dataset, we selected the regularly sampled two-day GOLF time series that are available for the first ten years of GOLF observations. In this sample, we removed sequences that are affected by strong outliers due to instrumental defects. This corresponds to a set of N series = 1640 GOLF times series, with N = 2880 data points each. As described in Sec. 2, we filtered out the acoustic modes and added to each time series a WGN of standard deviation σ = 49 cm.s −1 . This dataset represents our sample of solar observations under H 0 , as none of them contains any signs of the Solar System planets (the shortest period, of Mercury, is ≈ 88 days or 1.31×10 −7 Hz) nor the stellar oscillations modes (affecting mostly the frequencies in the range 1-5 × 10 −3 Hz) that have been filtered out. In the following, we will run tests on the frequency range that is dominated by the granulation noise: ν ∈ [50 − 8333] µHz.
As discussed in the introduction, a traditional approach in RV planet detection for evaluating FAP thresholds is based on bootstrap procedures. These methods assume that the observations (or their residuals in the case where some periodicities have been removed) contain only noise and that this noise is further uncorrelated with unknown variance. The noise statistics are estimated from the observations (see e.g., Jenkins et al. 2013, Hobson, M. J. et al. 2018, Trifonov et al. 2018, Ment et al. 2018). The FAP is evaluated by estimating the distribution of the test statistic using fake data, typically obtained by shuffling the data.
Let us consider first the (scaled) max test, which, by definition, has FAP defined as where Φ M is the CDF of T M . If the data contains a pure WGN of known variance σ 2 , it can be shown (see e.g., Sec. 2.4.2, Sulis 2017) that  Table. 1. with N i = N/2 − 1 the number of considered (independent) periodogram components.
In the case where the variance σ 2 is unknown, the Max test, taking an estimate of the variance, σ 2 , uses T M (2 P| σ 2 ) as a test statistic. The bootstrap procedure consists in this case of estimating the variance and repeating the following steps: i) shuffle the observed time series, ii) compute the resulting periodogram on the new data set, and iii) evaluate the test statistics (18) with σ 2 replacing σ 2 . After generating a large number of realizations of test's statistics, the FAP is derived as in (19), with the empirical distribution Φ M replacing Φ M . This numerical procedure gives good results when the noise is white and Gaussian. This is illustrated in panel (a) of Fig. 3. This panel shows three FAP as a function of the detection threshold for the Max test. First, the blue line shows the FAP of test T M (2 P|σ 2 ): this is the case for which σ 2 is known and the FAP is obtained using (20) in (19). Second, since the bootstrap procedure describes above estimates σ 2 for the time series and follows steps i) to iii) above, the estimated function P FA (γ) depends on the original data set used to generate the "fake" data set obtained by shuffling. One FAP estimate obtained for one particular data set, a WGN with standard deviation σ = 49 cm.s −1 , is shown by the red curve, while the true FAP of T M (2 P| σ 2 ) (evaluated on a WGN of variance σ 2 instead of σ 2 ), is shown by the dark green curve. Third, if we investigate the dependence of the FAP estimate with the original dataset (used to estimate σ 2 ), we obtain the orange curves of the panel (a): here we show 100 curves corresponding to 100 different original data sets. We see from this panel that the bootstrap procedure is quite stable with respect to the considered dataset.
When the noise is colored, the data shuffling breaks the correlations present within the data and the situation changes. This is shown in panel (b). The red curve shows the FAP of the test T M (2 P| σ 2 ) estimated by bootstrap on one particular time series of estimated variance σ 2 . The true FAP of this test as estimated from the remaining N series − 1 is shown by the dark green curve. The orange (resp. light green) curves show the same as the red (resp. dark green) curve for all other times series. In contrast to the WGN case, the evaluation of the FAP is not robust nor reliable in this case. Hence, if noise correlations are ignored, a classical bootstrap procedure may severely underestimate the FAP and derive irrelevant thresholds. This is further illustrated in panels (d) and (e) of Fig. 3. In all bottom panels, the estimated distributions of test T M are shown in grey. The empirical thresholds corresponding to FA rates of 1% and 10% are represented by the vertical solid and dotted lines, respectively. Their numerical values can be read in Table 1. Panel (d) shows this distribution as obtained from one bootstrap procedure in the case of WGN. In this case, the thresholds estimated by bootstrap are close to the values they should have to ensure the target FAP. However, for the case of solar observations (panel (b)), these estimates are incorrect and lead to FAP that can be an order of magnitude larger than the target value. For instance, for one series (see red curve in panel (b)), the bootstrap procedure derives for a FAP of 1% a threshold value of γ = 23.65, whereas this value is clearly underestimated: at this threshold, the true FAP, as estimated using all other time series, is in the range of [70.3%, 95.1%] (see green light curves).
To conclude this part of the analysis of results on GOLF data, we now turn to test T M applied to the standardized periodogram (see Eq. (12)). In this case, the theoretical FAP is known (although the noise DSP is analytically unknown) and given by (13). To verify this expression, we standardize each of the periodograms of the GOLF sequences by the averaged periodogram computed using the L = 20 noise training datasets generated by the MHD simulations of the granulation (see Sec. 2). We then apply test T M ( P | P L ) and derive the associated P FA as in (13). The results are shown in panels (c) and (f) of Fig. 3. This time, we observe in both cases a very good match between the theoretical FAP and the empirical values (see also last columns of Table. 1).
We conclude the presentation of this first study with a short discussion. Of course, our point is not to show that the bootstrap is doomed to fail in case of colored noise; rather, it might possible to design bootstrap procedures that would take benefit from a training data set (as the approach of panel (c) does) or would use pre-whitening to obtain more robust FAP estimates than shown in panels (b) and (e). Our point here is primarily to show that noise correlation caused by stellar convection severely impacts FAP estimates and that the proposed approach based on standardization achieves the desired robustness in estimating the FAP. These results validate the MHD simulation-based standardization approach for the control of the FAP and in particular the accuracy of the analytic calculations for test T M ( P | P L ) on real data. Since the principle of the approach based on accurate MHD simulations would be unchanged for a different spectral type, these results suggest that it can be used for detecting exoplanets orbiting any type of convective star.

Detectability study
As we have seen in Sec. 3, exploiting MHD simulations of the granulation noise opens up the possibility for analytically controlling the false alarm rate and extending the power of the tests for any values of the observation parameters. Comparing the impact of these parameters on the probability of detection for a fixed FAP is very useful in designing observational strategies, for instance.
Let us consider again the test T M given in (12), for which the detection probability can be computed using expression (15) for a given P FA . Given a specific planetary signature, we want to evaluate the observation duration (T obs ) that is required to allow for the detection of this planet with a large probability (say, P DET = 80% at P FA = 1%). We simulated for this study different planetary signatures under H 1 with circular orbits and orbital frequencies on the Fourier grid (we slightly adjusted the time sampling step dt as T obs increases to guarantee that the period is exactly on the grid). For such signatures, only one periodogram ordinate is affected under H 1 , while T M is optimal (Donoho & Jin 2004). For periodogram standardization, again we used the simulated velocities discussed in Sec. 2. The considered convection noise corresponds to a Sun-like star.
Some results are shown in Fig. 4. Each panel of the figure investigates the influence of a different parameter (see legend). The black curves correspond to a configuration where a 0.5 Earth-mass planet orbits circularly its host star with a period of 17.5 hours, the regular time sampling step dt is 2 hours and L = 20 MHD simulations time series are available for periodogram standardization. This setting corresponds to a RV signature of semi-amplitude K = 0.35 m.s −1 and an orbital frequency of f p = 1.58 × 10 −5 Hz. In each panel, the pale blue dot indicates a detection probability of 80% in this configuration for this planet. The dashed lines in the first three panels represent the detectability in the case where the noise is white (instead of colored) but with the same standard deviation as the colored convection noise (σ = 49 cm.s −1 ). The analytical expression for this probability is (see Eq. (2.53) in Sulis 2017): .P DET (γ; T M (2 P|σ 2 )) : where Φ χ 2 2 ,λ k is the CDF of a non central χ 2 2 distribution with two degrees of freedom and the non-centrality parameter λ k (see Eq. (6) of Sulis et al. 2017a).
The panel (a) of Fig. 4 shows, for instance, that in the considered configuration, an observation run totaling T obs ≈ 12.4 days (corresponding to N ≈ 150 sample points with dt = 2 hours) would allow the detection of a 0.5 Earth-mass planet with a probability of 80%, while ensuring a false alarm rate of 1%. We note that, in contrast, we would only need T obs = 3.0 days to reach the same trade-off P DET vs P FA if the noise was uncorrelated (see the dashed circle). This factor ≈ 4 in duration is the price that has to be paid in order to fight against correlation caused by convection noise. If the planet mass is lower (panel (a)), the required observational time T obs can increase extensively. For example, for a planet with a mass similar to Mars (≈ 0.1 M ⊕ leading to an RV semi-amplitude K = 0.07 m.s −1 ), we would need at least 457 days of observations to achieve the same performances. Similarly, if the planet's period increases, the needed observational Fig. 4. Detection probability as a function of the observation time for a single planet in circular orbit with period 17.5 hours around a solar-type star, for test T M ( P|P L ), at P FA = 1%. The orbital inclination is set to 90 degrees. The different panels show the influence of the planet's mass (a), the orbital period (with the corresponding orbital frequency on (black) or off (gray) -Fourier grid) (b), the time sampling step (c) and the size of the training data set (i.e., the number of available noise times series) (d). In each panel, the black curve indicates the P DET obtained for a configuration in which the planet has a mass of 0.5 M ⊕ , a circular orbit and L = 20 HD time series are available for periodogram standardization. The point where P DET reaches 80% for this configuration is indicated by the blue disks. In each panel, the legends indicate the parameters under study. The dashed lines represent the planet's detectability in the case where the noise is white (instead of colored) but with the same standard deviation as the colored convection noise (σ = 49 cm.s −1 ). time increases (because the amplitude of the Keplerian signature decreases, which is not shown).
The test's performance depends also on the sampling of the orbital frequency (panel (b)). In our example, if the orbital frequency is not on the Fourier frequency grid, the detection performance of this test decreases, with a loss in P DET that can reach a factor of 2.
Increasing the sampling time step (panel (c)) or decreasing the number of used training data set (panel (d)), increases also T obs . We also note that there is a very small improvement of the test performance brought by increasing L as soon as L is sufficiently large (for L = 50 and L = 1000, when the required observation durations are T obs = 10.5 and 9.5 days, respectively). This fact is particularly interesting since the MHD simulations are computationally heavy and L = 1000 may remain outside of the reach of the coming decades.
These plots are examples of false alarm versus power tradeoffs that can be achieved by exploiting reliable time series of the convective colored noise. We note that the values indicated in this study are drastically different from those reported in Sulis et al. (2017a). For instance, we reported T obs = 250 days for an 1.1 M ⊕ planet orbiting its star in 3.2 days with dt = 4h and L = 100, while with these same parameters, we find now T obs ≈ 17 days. The reason is that the PSD considered to represent the solar granulation noise source is different from that given in these first works: the considered PSD is now more realistic and deeply checked against Solar observations (see Sec. 2).
We now give an example of an application of adaptive tests, which are less well known in the exoplanet community than test T M , although they can sometimes present advantages over the latter.

Detectability of general Keplerian signatures
In this section, we compare the performances of the different tests presented in Sec. 3, i.e., T M (12), HC (16) and BJ (17) for different types of Keplerian signatures. The combination of Keplerian parameters influences the shape of the RV signature which, in turn, influences the sparsity of the signature in the Fourier domain, that is, the number of periodogram components affected by the presence of a planetary signature (e.g., see Sulis et al. 2016 for a detailed study of the influence of Keplerian parameters on sparsity). Here we define the sparsity coefficient S β as the proportion of non-zero coefficients and, as in Donoho & Jin (2004), we parameterize S β as: with β ∈ [0, 1] a sparsity parameter. The value β = 1 corresponds to an extremely sparse signal (i.e., a single periodogram frequency is affected by the periodic signal), and β → 0 to a non sparse signature. RV signatures correspond in general to sparse signatures (β is typically in the range [ 1 2 , 1]). The less sparse signatures are obtained for multiple systems, planets having highly eccentric orbits and planets with off-Fourier grid orbital frequencies.

Adaptive tests
We compare the detection probability of tests T M , HC and BJ for two types of planet signatures. For the first case, we consider the signal of a 0.4 M ⊕ planet in a circular orbit with frequency on the Fourier grid. In the second case, we consider the same planet but with a highly eccentric orbit (e = 0.9) and a slightly off-grid orbital frequency. The RV signatures of these two planets and the corresponding periodograms are shown in the top panels of Fig. 5. We see a difference in the number of significant peaks in their corresponding periodograms.
We performed MC simulations to evaluate the ROC curves of tests T M , HC and BJ. As the number of available MHD simulations is limited, we generated the noise under H 0 using a 20 order autoregressive (AR) process, with parameters fitted to the MHD simulated time series. We generated 10 4 realizations of this AR process under H 0 and 10 4 other realizations under H 1 with the two RV planetary signatures. For each realization, we computed the standardized periodogram in Eq. (10) using L = 20 series for the denominator in Eq. (8), and applied the different tests. Results are shown in the bottom panel of Fig. 5. Comparing the performances of test T M with the adaptive tests, we observe for the "sparse signal" the best results for T M over the other two tests, with HC close to T M at low FAP. However, in Fig. 5. Top: Synthetic RV time series (left) and corresponding periodograms (right) for a single planet in circular (black) or eccentric (red) orbit around a Solar-type star. For both signals, the planet mass is set to 0.4 M ⊕ , the period to 17.5 hours, the orbital inclination to 90 degrees, the argument at periastron to π/2 radian, the time sampling step to 2 hours and the number of data points is N = 300. For the eccentric planet, the planet orbital frequency is slightly off the Fourier-frequency grid. Bottom: ROC curves of tests T M (solid), HC (dashed) and BJ (dotted) applied to P|P L with L = 20 for the considered circular (left) and eccentric (right) orbital signals. the case of a high-eccentricity planetary orbit, tests BJ and HC show better performances than T M at all FAP. We see that these adaptive tests present another important side advantage over T M : their test statistics can be used to reliably estimate the frequency content for complex planetary signatures.

Ability of adaptive tests in recovering the signal's frequency support
Here we evaluate the ability of each test in detecting the correct number of periodic components at their true location (i.e., the true signal frequencies in the periodogram). We note that this problem is different from that of discriminating between the "noise only" vs "planet plus noise" hypotheses. To be able to quantify easily the number of periodic components to be detected under H 1 , we consider periodic signals in the form of a sum of N s pure sinusoidal signals with frequencies on the Fourier frequency grid. Moreover, to generate a large amount of MC simulations, the colored noise is generated as a low order AR process. The detection thresholds for a target FAP of 5% were derived for all considered tests by MC simulations on 10 4 noise sequences under H 0 . For the training dataset used for peri-odogram standardization (see Eq. (10)), we generated L = 20 synthetic separate noise time series for each of these 10 4 sequences.
In each case, we computed tests T M (12), HC (16), and BJ (17) on P|P L . By definition, test T M only focuses on the largest component of the (standardized) periodogram. In contrast, tests HC and BJ focus on one particular ordered P-value: the one for which the corresponding test statistic is maximum. This ordered P-value, say v P,i corresponds to the i largest periodogram components. Consequently, when a detection is made, the i largest periodogram ordinates can be used to estimate the signal's frequency support. In the following, we estimate by MC simulations the probability that i = N s and that the identified frequencies correspond to the true signal frequencies, at a fixed FAP of 5%.
Under H 1 , we varied the number N s of sinusoidal signals (by varying parameter β) added to each of the colored noise time series: N s ∈ [1, 100], corresponding to β ∈ [0.33, 1]. The N s sinusoids' amplitudes denoted by A in Fig. 6 were taken as equal, with A varied in the range [0.1, 4.5] m.s −1 , while their N s frequency locations f q were picked randomly in the Fourier grid. Fig. 6. Probability of detecting the correct support (i.e., to find the true number of sinusoids with the correct frequencies) for tests T M (left), HC (middle) and BJ (right) applied to P|P L with L = 20. For each given value of β, the N s amplitudes are equal. The left panel represents this probability as a function of the sinusoid's amplitude (N s = β = 1) and the performances of tests HC and BJ have been added for comparison. The two other panels represent the adaptive tests' performances as a function of the sinusoids' amplitude A and the sparsity parameter β. In these two panels, the probability of correct recovery is indicated in color.
The results are shown in Fig. 6 as a function of the signal parameters (amplitude and sparsity) for tests T M , HC, and BJ (from left to right, respectively). For T M , we only show the case of β = 1 as this test statistic focuses on the largest periodogram component. In the left panel, test T M appears to be the best to correctly locate the signal frequency when only one frequency is present (the adaptive tests are shown for comparison). For N s = 1 (β = 1), the probability of correctly locating the sinusoid frequency (P R ) grows faster to 1 for this test than for the other tests (with HC close to T M ). In contrast to T M , tests HC and BJ allow both for the detection and characterization of the planetary frequency support (middle and right panel of Fig. 6). We also note the particularly good performances of BJ vs HC for recovering the frequency support over a large sparsity regime. These tests, which benefit from theoretical optimal results (see Donoho &Jin 2004 andMoscovich et al. 2016), can be exploitable in the context of exoplanet detection by RV thanks to the considered periodogram standardization. Their good performances in theory and practice make them particularly interesting for the detection and characterization of extrasolar planetary systems.

Scope of the proposed method and a zoom on USP planets
Convective noise affects all components of the periodogram but its effects impact mostly the frequency range corresponding to periods between some minutes and several hours for solar-like stars. In Sec. 2, we obtained a good match between the observed and simulated PSD of solar RV in this frequency range. In practice, the convective noise cannot be "corrected" as magnetic activity may be (e.g., with chromospheric indicators, Baliunas et al. 1995;Wise et al. 2018) and constitutes a noise barrier, for which the statistical properties need to be known to reliably claim any planet detection at the cm.s −1 level. This was the purpose of our study and its presentation of the formalism of the approach. In a subsequent work, we will apply this formalism to other Solarlike stars having different convective properties. The proposed method of basing such studies on a standardized periodogram could be directly applied to improve the de-termination of the FAP in the case of ultra-short period (USP) planets (defined with periods < 1 day) under H 1 . USP planets are known to be tidally locked to their host star (leading to circular orbit) and of small size or mass (< 10M ⊕ ). They exist, in general, in multi-planetary systems. According to Winn et al. (2018), this category of planets is as frequent as hot-Jupiters (defined with periods ranging up to 10 days), with one over 200 Sun-like stars hosting such planets. To illustrate the performance of the proposed technique for detecting USP planets, we performed a similar detectability study as described in Sec. 4.2 for some known USP planets (we assume their hosting star is similar to the Sun). The results for a sampling rate of dt = 12 hrs and for L = 2, available synthetic noise light curves are reported in Table. 2 and displayed in Fig. 7. The figure represents the observational time required to reach a P DET of 80% for a P FA equal to 1%, using test T M given in (12), as a function of the planet's orbital period. The colored curves show the observational time for virtual planets of different periods and masses. The black crosses correspond to real planets. Logically, we observe the increase of the observational time with the decrease of the planet mass and the increase of the planet period. The table indicates that most USP planets are detectable at the levels specified above with our technique within a couple of weeks (< 21 days). For small mass USP planets, with M p < 2M ⊕ , it would take a couple of months to achieve the same performances.
Finally, we expect the method presented here to be easily extended to larger period ranges by computing MHD supergranulation instead of granulation, that is, by extending the simulation domain (making it larger and deeper) with exactly the same simulation setup (Rincon & Rieutord 2018). Since this would be more demanding in terms of CPU and storage, while retaining, in principle, what is shown here, we restrict the scope of this study to granulation scale simulations.

Benefits and limitations of the method
MHD simulations are non parametric, meaning that they do not rely on any adjustable parameter to fit the observed data (see Sec. 2.1). In practice, the frequency dependence of the granulation is often estimated using parametric laws, such as Harveylike profiles (Harvey 1985). However, as demonstrated in Sec- Fig. 7. Observation time as a function of the planet orbital period that is needed to achieve P DET = 80% and P FA = 1% with test T M (colored curves). The fixed parameters are dt = 12 hrs and L = 2. The observation time associated with some known USP exoplanets (see Table. 2, assuming a Solar-like star) are represented by black crosses.
tions VI and VII.D of Sulis et al. (2017a), the estimation of the parameters of these models leads to the injection of an estimation noise in the detection process, the statistics of which are difficult to capture. Besides, the noise parameters derived in this way may be contaminated by the signal to be detected and the choice of the noise parametric model can be subjective. As we show in this study, using an MHD simulation-based approach allows us to accurately control the estimation noise through the number L, whose impact on the tests performances can be exhibited analytically (see Eqs. (14)- (13)).
This study needs to be extended to other convective stars. The impact of granulation changes throughout the HR diagram: the larger is the pressure scale height at the surface (that is for larger effective temperatures or lower gravities), the larger are the fluctuations induced by the convective motions. The current limitation in the present method is the computational cost of the MHD simulations to generate a substantially long time series of velocities. However, in the coming years, the increased speed of CPU resources will allow for such computations to be carried out in a more systematic way. In a subsequent work, we will explore these effects for selected targets in the HR diagram. Based on the realism of the 3D MHD simulations, our results suggest that the proposed method can be a powerful and reliable way of detecting RV exoplanet signatures at the cm.s −1 level in the presence of convective noise.
A second limitation is the regular sampling involved in this study. In practice, the RV data are irregularly sampled and the FAP of any test based on any periodogram (Schuster 1898;Scargle 1982;Zechmeister & Kürster 2009) cannot be controlled by analytical expressions in the case of correlated noise because the periodogram components are interdependent. However, as mentioned earlier in this paper, we underline that if the irregularity of the considered sampling remains weak, the analytical studies presented here may provide a useful proxy of the tests' performance in practical situations. For example, this can be used to design detectability studies. For strongly irregular samplings, the techniques based on the MHD standardized periodogram presented here need to be adapted by dedicated bootstrap procedures (see, i.e., Sulis et al. (2017b)). The application Table 2. Table of some known USP planets given in the exoplanetarchive.ipac.caltech.edu catalog. For all targets, the stellar mass is assumed to be 1 M and the eccentricity 0. Columns indicate the planet's name, orbital period, mass and the observational time we need with the proposed technique to achieve P DET = 80% and P FA = 1% with test T M ( P | P L ) computed for L = 2 and dt = 12 hrs. Symbols ( ) indicate that the target is detectable with probability higher than 80% for the considered parameters (mass and period) and sampling rate. For instance, WASP-47 e would be detectable with probability > 97% as soon as the observation time is superior to 6 days and 55 Cnc e with a probability > 99%. of this procedure to the real data deserves a full study that will be the purpose of a second paper.

Conclusions
In cases where the effective temperature, surface gravity, and metallicity of the star are precisely known (thanks to asteroseismology, interferometry, or spectroscopy), 3D MHD simulations are capable of generating realistic RV time series of the stellar granulation. This has been demonstrated for the Sun as part of studies involving the comparison of velocities extracted from 3D spectra of the sodium doublet and GOLF/SoHO observations. Following the theoretical analysis described in Sulis et al. (2017a), we used these synthetic time series of the granulation colored noise to design standardized periodograms. These new standardized periodograms allow for the application of tests that are both powerful and for which we can derive accurate FAP. We present extensive numerical results based on real and synthetic data, including studies on the robustness, the detectability, and the frequency support recovery. In particular, we introduced adaptive tests, which are new in the field of RV planet detection. Even if the objective of this study is to detect planets down to the cm.s −1 level, the proposed procedure is a general approach that can be applied to many periodicity detection problems in astrophysics (and beyond).