Issue 
A&A
Volume 614, June 2018



Article Number  A30  
Number of page(s)  16  
Section  Planets and planetary systems  
DOI  https://doi.org/10.1051/00046361/201730921  
Published online  08 June 2018 
Astrometry and exoplanets in the Gaia era: a Bayesian approach to detection and parameter recovery
^{1}
Lund Observatory,
Box 43,
22100
Lund, Sweden
email: piero@astro.lu.se
^{2}
INAF – Osservatorio Astronomico di Bologna,
via Gobetti 93/3,
40129
Bologna, Italy
Received:
3
April
2017
Accepted:
25
January
2018
The Gaia mission is expected to make a significant contribution to the knowledge of exoplanet systems, both in terms of their number and of their physical properties. We develop Bayesian methods and detection criteria for orbital fitting, and revise the detectability of exoplanets in light of the inflight properties of Gaia. Limiting ourselves to oneplanet systems as a first step of the development, we simulate Gaia data for exoplanet systems over a grid of S/N, orbital period, and eccentricity. The simulations are then fit using Markov chain Monte Carlo methods. We investigate the detection rate according to three information criteria and the Δχ^{2}. For the Δχ^{2}, the effective number of degrees of freedom depends on the mission length. We find that the choice of the Markov chain starting point can affect the quality of the results; we therefore consider two limit possibilities: an ideal case, and a very simple method that finds the starting point assuming circular orbits. We use 6644 and 4402 simulations to assess the fraction of false positive detections in a 5 yr and in a 10 yr mission, respectively; and 4968 and 4706 simulations to assess the detection rate and how the parameters are recovered. Using Jeffreys’ scale of evidence, the fraction of false positives passing a strong evidence criterion is ≲0.2% (0.6%) when considering a 5 yr (10 yr) mission and using the Akaike information criterion or the Watanabe–Akaike information criterion, and <0.02% (<0.06%) when using the Bayesian information criterion. We find that there is a 50% chance of detecting a planet with a minimum S∕N = 2.3 (1.7). This sets the maximum distance to which a planet is detectable to ~70 pc and ~3.5 pc for a Jupitermass and Neptunemass planets, respectively, assuming a 10 yr mission, a 4 au semimajor axis, and a 1 M_{⊙} star. We show the distribution of the accuracy and precision with which orbital parameters are recovered. The period is the orbital parameter that can be determined with the best accuracy, with a median relative difference between input and output periods of 4.2% (2.9%) assuming a 5 yr (10 yr) mission. The median accuracy of the semimajor axis of the orbit can be recovered with a median relative error of 7% (6%). The eccentricity can also be recovered with a median absolute accuracy of 0.07 (0.06).
Key words: methods: statistical / astrometry / methods: numerical / techniques: miscellaneous / celestial mechanics / planetary systems
© ESO 2018
1 Introduction
The detection of exoplanets with astrometric techniques is simple in principle. Although an exoplanet is usually too faint to be observed with current instruments, its existence can be inferred from the motion of its host star around the centre of mass of the starplanet system. In the past, several claims of detections have been made and later disproved starting from Jacob (1855) (see Quirrenbach 2010 for a brief historical account). The difficulty lies in the small magnitude of the host star astrometric shift, which even in the most favourable cases is ≲1 mas (see e.g. Sect. 3).
The astrometric satellite HIPPARCOS (ESA 1997) could measure the alongscan position of a star with an error of a few mas. Several studies tried to derive orbital constraints for candidate planets suggested by radial velocity searches (Mazeh et al. 1999; Zucker & Mazeh 2000; Gatewood et al. 2001; Han et al. 2001), but fitting orbital models to HIPPARCOS data hardly improved the astrometric fit and led instead to spurious features (Pourbaix 2001).
The Gaia mission (Gaia Collaboration 2016b) offers a ~30fold improvement in astrometric precision with respect to HIPPARCOS, and can therefore finally allow the astrometric detection of planets to become routine. The Gaia capabilities for planet detection were soon recognised (Bernstein & Bastian 1995; Casertano et al. 1995), and explored in depth by Casertano et al. (2008, hereafter C08) and later revised by Perryman et al. (2014, hereafter P14). The number of detectable planets was estimated by P14 to be ~21 000 for a 5 yr mission, after considering a model of the Galaxy population and fitting planetary orbits with the leastsquares method.
In this paper we consider the application of Bayesian methods to the problem of planetary orbit fitting. We focus on two aspects of Bayesian inference: the derivation of joint errors for all parameters of interest, and the comparison of models. The first builds on the ability of Markov chain Monte Carlo (MCMC) methods to track errors beyond the regime in which linear approximations are acceptable. The second is especially important in light of the difficulties that arise for calculating goodness of fit for nonlinear models. Our discussion also includes a traditional metric, Δχ^{2}, that can be used to achieve results similar to those obtained using information criteria. Bayesian methods are already in use for the Gaia data processing, and thus we expect our results to better anticipate the outcome of future Gaia data releases.
Similarly to C08 and at variance with P14, we focus more on the detectability as a function of the orbital characteristics than on the number of detectable planets in the solar neighbourhood. While C08 was published five years before the launch of Gaia and therefore reflects prelaunch expectations, in this paper we use the current knowledge of the satellite to reassess what fraction of planets can be detected and how well their orbital parameters can be recovered.
In this paper we focus on developing methods, and so we only consider singleplanet systems. While this may be a reasonable approximation in some cases, realworld planetary systems can have more than one planet with a nonnegligible S/N, in which case the results discussed here may not apply. Doubleplanet and more complicated systems will be considered in a followup paper.
The structure of this paper is as follows. In Sect. 2 we describe our adopted formalism for the description of Keplerian orbits in an astrometry setting. In Sect. 3 we consider the expected S/N for a oneplanet system. In Sect. 4 we describe our simulations. In Sect. 5 we consider Bayesian methods and detection metrics. In Sect. 6we estimatethe fraction of false positive detections. In Sect. 7 we show the detection rate. In Sect. 8 we discuss how the orbital parameters are recovered. In Sect. 9 we discuss our results, and include an assessment of the average accuracy and precision which a parameter can be recovered. Finally, in Sect. 10, we present our conclusions.
2 Astrometry and Keplerian orbits
2.1 Formalism
An exoplanet orbit can be determined in the same way as that of astrometric binary stars, where only one element of the binary system is visible. Here we summarise the method; secondary references include Binnendijk (1960), Taff (1985), Quirrenbach (2010), and Perryman (2011).
Let us consider a system made of a star and one planet. In an inertial reference frame, both bodies follow elliptical orbits around the system’s centre of mass. The semimajor axis of the star’s orbit is (1)
where a_{p} is the semimajor axis of the planet orbit, M_{p} is the planet mass, and M_{*} is the stellar mass. The astrometric signature^{1} υ is the angular size of the stellar orbit (2)
where d is the distance from the observer to the planetary system.
The orbital elements define the orbit on the true plane. Three of them are called dynamical elements as they define the motion in the orbit: the period P, the time of periastron passage T_{p}, and the eccentricity e. Four geometrical elements describe the size of the orbit and its orientation with respect to the plane of the sky: the semimajor axis a_{*} , the longitude of the ascending node Ω, the argument of periapsis ω, and the inclination i (see Fig. 1).
The instantaneous position on the sky of a star at time t is determined by a fixed displacement (Δα_{0}, Δ δ_{0}) from its nominal position (α_{0}, δ_{0}) at a reference time t_{0}, the proper motion vector^{2} (μ_{α} , μ_{δ}), the parallax ϖ, and the orbital elements. In Cartesian coordinates lying on a plane tangent to the sky and directed along right ascension and declination, respectively (Michalik et al. 2014), the position can be written as
where Π_{α} and Π _{δ} are the components of the parallax direction, X and Y are the displacement per unit amplitude due to planetary motion in the true plane projected onto the sky through the Thiele–Innes (TI) constants, also called natural elements A, B, F, G (Thiele 1883; van den Bos 1926, 1932):
In the true plane the coordinates X and Y depend on the eccentric anomaly E and on the eccentricity e:
The eccentric anomaly is obtained from Kepler’s equation (11)
where M is the mean anomaly: (12)
Within the TI formalism, the determination of the orbital elements from astrometric observations is a nonlinear problem in P, e, and T_{p} , and linear in all other variables.
Fig. 1 Illustration of the four geometrical elements: the semimajor axis a_{*}, the longitude of the ascending node Ω (also known as receding node), the inclination i, and the argument of periapsis ω. The light cyan polygon represents the sky plane in a perspective view; the reference direction is taken towards the north (N). The line of nodes, shown by the black arrow pointing towards the ascending node, is the intersection between the sky and the orbit planes. The orbit is shown as the red ellipse. The small red arrow is perpendicular to the true plane and shows the inclination. 
2.2 Degeneracies
In some cases, some parameters can become degenerate. For instance, in the case of circular orbits, the argument of periapsis ω loses its geometrical meaning as direction of the semimajor axis of the orbital ellipse, and it becomes degenerate with the periapsis transit time T_{p}.
A second case is that of faceon orbits, where cos i = ±1. Here, the true plane of the orbit coincides with the sky plane, so that both the longitude of the ascending node Ω and the argument of periapsis ω are measured on the same plane. The intersection of the two planes, which defines the ascending node, becomes undefined. If Eqs. (5)–(8) are left unchanged, then the sum Ω + ω identifies the reference direction from which the time of periastron passage T_{p} is measured, but Ω and ω themselves become undefined. A possible change to the model could be to use a different definition for ω (e.g. measure it from the north direction).
A third case happens because astrometric data cannot distinguish the ascending node from the descending one. In principle, both the longitude of the ascending node Ω and the argument of periapsis ω vary between − 180° and 180°. However, the same projected orbit can be obtained by adding 180° to both Ω and ω, i.e. by exchanging the ascending node for the descending node and the periapsis for the apoapsis. It is customary when analysing astrometric data to restrict the allowed range of Ω to the [0°, 180°] interval (Binnendijk 1960; Perryman et al. 2014). This degeneracy could be broken if one were to jointly analyse astrometric and radialvelocity data.
The above degeneracies are observed in our analysis and the affected systems will be identified in Sect. 6.
3 Detectability as a function of the S/N
Previous works by C08 and P14 have shown that the main parameters, upon which the detectability of a planet depends, are the period P and the S/N (13)
where υ is the astrometric signature and σ_{A L} is the singlescan uncertainty on our observable, the shift in the alongscan (AL) field angle η (defined in Sect. 4.1).
Detectability may also depend on the period because orbit sampling becomes difficult or incomplete either if P is shorter than the average time between two Gaia scans, or if P is longer than the mission length. This is explored in Sect. 7.
Any value of S∕N can be mapped, using Eq. (2), to a subspace of M_{p}, M_{*} , a_{p} , d, and of the stellar magnitude V , which would produce the same simulated data. This is illustrated for a few example cases in Fig. 2, where planets of different masses (1 Neptune mass, 1 Jupiter mass, and 13 Jupiter masses^{3}) are assumedto orbit a bright star at different semimajor axes and distances. Both C08 and P14 suggested that a S∕N ≳ 3 is needed to detect a planet; therefore, Fig. 2 also shows an approximate estimate of the distance from Earth up to which a planet can be detected.
Therefore, in the following we will parameterise our simulations over a grid of P and S∕N. Only when relevant do we trace the S∕N back to the parameters on which it depends.
Fig. 2 Expected (S/N) for different planet masses as a function of distance. The lines show the cases for 13 Jupiter masses, 1 Jupiter mass, and 1 Neptune mass; with circular orbits at 1, 3, and 4 au (it is only possible to observe a 4 au orbit if the mission length is ~ 10 yr; see Sect. 8). A 1 M_{⊙} star is assumed for the solid lines, while a 0.5 M_{⊙} star is assumed for the dashed lines. 
4 Simulated Gaia observations
4.1 Fieldangles in the Gaia field of view
The Gaia spacecraft scans the sky along great circles whose orientation changes in time, following a complex pattern which we refer to as the Gaia scanning law. For each star transiting any of the two fields of view, the timedependent position in the AL direction is measured on the sky mapper chargecoupled devices (CCD) and on each of the nine astrometric field CCDs. The formal uncertainty σ_{A L} results fromcombining all ten measurements, and it depends on the stellar magnitude, with σ_{A L} = 0.034 mas for V ≤ 14 for a single scan, and larger for fainter stars (de Bruijne 2012; de Bruijne et al. 2014). A coarser determination of the position in the acrossscan (AC) direction is also done, but issues of pixel size and binning make AC measurements less useful for astrometry. Gaia returns on average 70 times to each star during its 5 yr mission (de Bruijne et al. 2010), and each time the transit in the fieldofviewmeasurement occurs with a different orientation, thus the stellar position can be pinpointed with AL information only.
If θ(t) is the position angle of the Gaia field of view, the shift in the AL field angle η is given by (14)
and Eqs. (3) and (4) can be expressed in terms of AL displacement: (15)
Probability distributions of randomised parameters for the singleplanet simulations.
4.2 Singleplanet systems: simulations over a grid of parameters
Ideally, we aim to test how the detection methods perform over the possible range of variation of all parameters. However, the high dimensionality of the problem of determining astrometric and Keplerian parameters together prevents a full exploration of all parameters on a grid.
For our singleplanet simulations, we selected three parameters to be explored on a grid:

the S/N;

the period P;

the eccentricity e.
All the other parameters were randomised using the probability distributions listed in Table 1 and explained in Sect. 4.3.
We produced a first batch of simulations for a 5 yr mission (the nominal Gaia mission length), and a second batch for a 10 yr mission to investigate the improvements in case the mission is extended. In both batches, our grids ran with logarithmic increments over the ranges S∕N = 0.2–10, P = 0.08–10 yr, and e = 0.01–0.97. To smooth the grid and make the plots more readable, we added small random amounts to the S/N, P, and e of each simulation. The upper bound to the eccentricity corresponds to an ellipse with a 1:4 ratio between the semiminor and semimajor axis.
4.3 Randomised astrometric parameters
We set the RA and Dec coordinates of the host star to random values, uniformly distributed over the celestial sphere. The Gaia scanning law was computed for each position using the AGISLab software (Holl et al. 2012), which was developed as part of the Gaia astrometric global iterative solution (AGIS; Lindegren et al. 2012) and is available to users within the Gaia Data Processing and Analysis Consortium (DPAC).
We set the astrometric parameters (Δα, Δ δ, ϖ, μ_{α} , μ_{δ}) to random values according to the probability distributions listed in Table 1. We assumed Δ α and Δ δ to be normallydistributed, with a standard deviation of 1 mas. The remaining parameters depend on the distance d, which was assumed to be lognormally distributed around a value which depends on the S/N. The range of values covered by d was chosen so that the S/N could include the cases of a Jupitermass planet at 3 au around a 1 M_{⊙} star, and of a Neptunemass planet at 4 au around a 0.5 M_{⊙} star. The parallax was obtained from the distance. The proper motions should reflect the larger spread in velocities of nearby stars with respect to distant stars; therefore, we drew proper motions from the Gaia data release 1 (DR1; Gaia Collaboration 2016a; Lindegren et al. 2016).
We set the remaining orbital elements (T_{p}, Ω , ω, i) to random values, different for each simulation, uniformly distributed over their natural ranges of variation: 0 ≤ T_{p} < P, 0 ≤ Ω < 180°, − 180° ≤ ω < 180°, − 1 ≤ cos i ≤ 1.
Kepler’s equation needs to be solved for every simulated Gaia scan; we used the numeric solver vpasolve in the MATLAB package, which provides solutions with doubleprecision accuracy^{4} .
4.4 Simulations of systems with no planet
To help set up the thresholds for planet detection, we also simulated noplanet systems. These systems follow the same specifications of the previous sections for the astrometric parameters, noise, and scanning law, but no Keplerian signal is added.
5 Bayesian analysis
In this section, we assume some familiarity with Bayesian inference and its nomenclature (prior, likelihood, posterior, Markov chains, etc). The interested reader can find an introduction geared towards astrophysics in Trotta (2008) or a more general discussion in the textbook by Gregory (2005).
5.1 Modelling
From a model point of view, Bayesian inference works similarly to maximum likelihood in that a series of tentative sets of parameters are generated, and each set is used to simulate an observable quantity which is then compared against the data. In our setting the data are η_{0,s} , i.e. the observed AL displacements, one per Gaia scan s, computed at simulation time using Eq. (15). Let η_{i,s} be the displacements computed during the fit for a set i of astrometric and orbital parameters. The likelihood can then be obtained by assuming that the difference η_{0,s} − η_{i,s} is normally distributed, with mean zero and standard deviation σ_{A L}: (16)
We used the Stan package for statistical inference (Carpenter et al. 2016, version 2.12) to model our simulated Gaia observations. The Stan package allows models to be specified in a programming language very similar to that of popular packages such as BUGS (Lunn et al. 2000) or JAGS (Plummer et al. 2003). All these packages implement generalpurpose MCMC samplers together with a highlevel programming language to specify the model.
Our implementation follows the formalism of Sect. 2.1. For each simulated system, we define two models:

a purely astrometric model, which only contains Δα_{0}, Δδ, ϖ, μ_{α}, and μ_{δ}. This model can be obtained by setting A = B = F = G = 0 in Eq. (15). This model is the baseline against which the full model is compared when evaluating a detection;

a full model, containing all the parameters of Eq. (15).
Kepler’s equation needs to be solved at each MCMC iteration, but Stan does not provide a solver in its current version; therefore, we use the Markley (1995) approximation, which is fast to compute and provides, in our implementation, a relative error of the order of 10^{−14}, close to the limits of a doubleprecision variable.
Priors for the purely astrometric model.
5.2 Model priors
5.2.1 Purely astrometric model
The prior probability distributions for the model parameters should be set according to the expected range of variation, and to the expected distribution for their values.
For the astrometric model we chose very weakly informative priors which only set the physical scale of the parameters (Table 2). For Δα_{0} and Δδ we used normal distributions with mean 0 and standard deviation 0.04 mas, consistently with the expected perturbation induced by a Jupitermass planet. For the proper motions μ_{α} and μ_{δ} we used normal distributions with mean 0 and standard deviation that depends on the parallax to reproduce the fact that more distant stars have lower velocities. By fitting the Gaia DR1 stars within ~200 pc, we found and σ_{δ} ∝ ϖ^{0.89}. For the parallax ϖ, we used a lognormal prior (with ln(mean) = 2.8, corresponding to 16 mas, and ln(standard deviation) = 7) which we found to reasonably reproduce the HIPPARCOS and Gaia DR1 data. We also added the constraint that ϖ ≥ 0.
5.2.2 Full model
For some parameters (Ω, ω, cos i, and T_{p}) the obvious choice is to set the priors to constant distributions according to the parameter range (see Table 3). A uniform distribution is therefore our choice for cos i. However, Ω and ω are angles, and T_{p} can also be treated as such, thus one needs to ensure that the prior distribution is defined over a circular support so as not to introduce any artificial barrier for when the Markov chains explore the likelihood space. This is achieved by a von Mises distribution, with concentration parameter κ ~ 0, which is the circular equivalent ofa uniform distribution^{5} (Jammalamadaka & SenGupta 2001).
For the other orbital parameters we chose noninformative priors as follows:

For P, we considered a prior where lnP is uniformly distributed, so that no preference for any scale is introduced. Periods shorter than 1 day or longer than 150 yr were excluded.

For e, we also considered a uniform prior. An upper bound e = 1 cannot be used because it corresponds to an open orbit. Therefore, we took e = 0.9999 as the upper bound (which corresponds to an ellipse with a ratio between the major and minor axes of ~70).

For the angular size of the semimajor axis υ we chose a lognormal distribution with ln(mean) = −2 (corresponding to 0.14 mas) and ln(standard deviation) = 1, with an upper bound at 10 mas. This distribution puts 90% of the probability on systems with υ < 0.5 mas, yet it allows for some rarer systems closer to Earth where wider angular orbits might be expected.
Finally, astrometric parameters in the full model have the same priors as in the purely astrometric model.
Priors for the orbital elements of the planet.
5.3 Chain initialisation
Markov chains (MC) need a starting point. From there, the MC will start to explore the parameter space. If the starting point is not in a highprobability region, it is necessary to discard the first iterations (usually called the burnin period or warmup period) to ensure that the rest of the chain represents a sample of the posterior distribution, with no bias due to the presence of many initial steps exploring a lowprobability region.
In principle, the starting point can be taken randomly, though this may result in a very long burnin period. Therefore, statistical literature often suggests starting the Markov chains at a point reasonably close to the likelihood global maximum if it is known (e.g. Geyer 2011).
Our tests have shown that the MC burnin can be prohibitively long for the needs of a survey unless a starting point close to the correct values for the parameters is used. However, in our case the structure of the likelihood space presents many very narrow peaks^{6} , which prevent common optimisation methods (e.g. Amoeba, LevenbergMarquardt, or Broyden–Fletcher–Goldfarb–Shanno) from finding the global likelihood maximum.
We have considered two possibilities for the starting point: first, a perfect starting point corresponding to the true value of the parameters; second, a bestfit point obtained from a simplified model that assumes circular orbits. The first option is motivated by our aim to test our Bayesian model and check how it behaves in an ideal case. The second option is more realistic; our motivation is to test whether a very simple and fast model can be used as a reasonable starting point or if a more refined approximation is needed. In the reality of any future survey using Gaia data, the first option will of course be unavailable, while more refined methods used to find a starting point can be used (e.g. Sahlmann et al. 2013).
In the circular orbits model, the period P is the only parameter which appears nonlinearly. We defined a grid of frequencies f = 1∕P, starting from f = 0.067 and increasing to 12.5 yr^{−1}, with a step of 0.03 yr^{−1}, covering the allowed interval for P. For every value of f, we obtained the bestfit values for all other parameters and the χ^{2} using linear leastsquares. The values which scored the lowest χ^{2} provided the starting point for the MC.
The circularorbit model only includes ten parameters: e does not appear, and ω and T_{p} become degenerate, so that only one of them is needed. To make our implementation easier, we dropped T_{p} and retained ω. Therefore, we were left with the need to specify starting values for e and T_{p}. We chose e = 0.1 and T_{p} = 0.
5.4 Chain convergence
In the long run, the set of points touched by a MC forms a nonbiased sample of the target probability distribution; when this happens, it is said that the MC has converged to the target. The main risk associated with nonconverging chains is that, since the MCMC samples would not be representative of the posterior distribution, any inference on the model parameters might be severely biased. The purpose of the burnin period is to assure that the chains have converged before posterior samples are obtained from them. However, there is little guidance on how long the burnin period should be, so one has to rely on experience and on inspection of the MCMC samples.
We initially conduced some test runs of our Stan implementation to understand the number of iterations needed for the Markov chains burnin period. Visual inspection of the chain traces showed that the burnin period lasted less than a few hundred iterations in many cases where S∕N ≳ 3−5, but in some cases it could be considerably longer.
Therefore, for the simulations described in Sect. 4.2, we initially decided to use a burnin period of 500 iterations. Then we checked whether the chain had converged after this period, or if more burnin iterations were needed, according tothe procedure described as follows.
After the 500 burnin iterations, we collected the next 500 iterations for further analysis. On the latter samples we computed for each parameter i the potential scale reduction , which measures how much sharper an estimate might become if the MCs were run indefinitely (Gelman & Rubin 1992; Eq. (11.4) in Gelman et al. 2013). The compares the variance among the chains versus the variance within a single chain: if all chains have converged and are sampling the same space, both variances should be approximately equal, and . A value of is usually considered as pointing towards convergence problems. The opposite is not necessarily true (it is possible to have and still have convergence problems) because convergence can never be guaranteed when the number of samples is finite, though in our experience the criterion has matched very well any conclusion that could be made based on visual inspection.
We found that, for any given simulated system, the can vary among the parameters, with some of them showing convergence problems, while other seem to be fine. Therefore, we also computed the average . Whenever a simulated system was found to have , we discarded its samples and we ran the chains anew, this time with 15 000 burnin iterations.
After that, if was still unsatisfying, we deemed the chains as not converging and excluded the planet from further consideration.
There may be different reasons for not fulfilling the convergence criterion, some of which do not prevent meaningful information to be retrieved. However, it would be difficult to express how confident one could be in any estimate coming from nonconverging chains. Therefore, in the context of a blind survey of exoplanets, information from nonconverging chains might be of little use; in the context of extracting the most information from Gaia data of a particular planet (say, by joint fitting of astrometric and radial velocity, or transit data) there would probably be more prior information to help find a different (perhaps better) starting point for the chain.
5.5 Exoplanet detection as model selection
Deciding whether an exoplanet has been detected is essentially a comparison of model performances. The null model is that no planet is present, and that the Gaia measurements can be adequately explained with just the astrometric parameters. The alternative model is that a planet is present, and both the astrometric and Keplerian parameters are needed.
The classical (frequentist) approach is to use hypothesis testing: the difference between the models’ χ^{2} is computed and linked, through the number of degrees of freedom (dof), to a pvalue. It can be applied to the case of exoplanet detection, provided that the Thiele–Innes parameterisation is adopted. One would therefore obtain the A, B, F, and G parameters from the fit, and only later would they be reconnected to the orbital elements υ, ω, Ω , and cos i. Uncertainties on the bestfit values should then be propagated from the Thiele–Innes parameters back to the orbital elements.
However, a complication is that the number of dof is unambiguously defined only for linear models. Nonlinear models behave as if they have a larger number of dof than a linear model with the same number of parameters. Also, the “effective dof” of a nonlinear model are datadependent (through the variance of the parameters; see e.g. Eq. (19) below), and they are larger when the data are noisy (Andrae et al. 2010); we will show this happening in our simulations in Sect. 6.2.
Conversely, the Bayesian approach has the following advantages: the orbital elements can be the subject of inference, the uncertainties on the fit parameters need not be estimated using asymptotic theory, and there are model comparison methods which explicitly account for the effective dof.
An easy way to perform Bayesian model comparison is to use information criteria (IC) because only the posterior samples are needed for their calculation. The theory of most IC is based on the notion of predictive accuracy, i.e. how well a model fit can predict new data produced from the true datagenerating process. A notable exception is the Bayesian information criterion (BIC, see below) which is motivated by the marginal probability of the data under the model. A introduction to both criteria from an astrophysicsperspective can be found in Liddle (2007); among the statistical sources, we refer to Gelman et al. (2013, Chap. 7) and Burnham & Anderson (2002).
The IC operate in practice by computing a function of the likelihood, either at a specified point or averaged over the posterior. The function includes a penalty, based on the number of dof, to account for overfitting by the more complex model.
Some IC also find application in the context of frequentist statistics as they can be applied to maximum likelihood estimates. Therefore some of our results might also find application in future work that does not use the Bayesian framework.
The best known IC are the Akaike information criterion (AIC; Akaike 1973), (17)
and the Bayesian information criterion (BIC; Schwarz 1978), (18)
where is the maximum likelihood, k is the number of free parameters, and n is the number of data points. The main difference is that the penalty on BIC also depends on the number of data points.
Both the AIC and the BIC rely on a point estimate of the model, and they still involve counting the free parameters. A recent development on the AIC is the widely applicable (or Watanabe–Akaike) information criterion (WAIC), which has been proposed as a fully Bayesian approach of estimating the predictive accuracy (Watanabe 2010; see Ranalli et al. 2016 for an application to astrophysics). It uses all available samples of the posterior distribution and gets the effective dof from the parameter variance: (19)
Here, the index i runs on the samples from the posterior distribution, and the index s runs on the data points (the Gaia scans), so both indices keep the same meaning they had in Eq. (16). The symbol is the likelihood of a single data point (scan). The angular brackets with the i subscript indicate that the average over the samples should be taken; the squared difference is therefore the variance of the loglikelihood^{7}.
In all cases, the lower the IC, the more favoured the model. However, the absolute scale of the IC does not carry statistical meaning; the important quantity is the difference between the IC of different models, not the values of the IC themselves. Therefore, a large ΔIC can indicate a strong preference for one model over the other, while a small Δ IC indicates that one model does not perform appreciably better than the other.
The ΔIC can also be interpreted using Jeffreys’ scale of evidence (Efron & Gous 2001), which rates Δ IC ≳ 1, 3, 20, and 150 as “barely worth reporting”, “weak”, “strong”, and “very strong”, respectively.
The preference can be quantified by considering the relative likelihood w of model M_{1} over model M_{2}. Assuming IC(M_{1}) > IC(M_{2}) (so that M_{2} is the preferred model), (20)
where ΔIC_{12} = IC(M_{1}) −IC(M_{2}) (Burnham & Anderson 2002, see their Sects. 2.8 and 2.9). Strong evidence, in the sense of Jeffreys’ scale, would correspond to a relative likelihood of w ~ 4.5×10^{−5} for the worse model over the preferred one.
Reflecting the different definitions of the IC, the actual values of w will depend on which IC is considered. The most important factor is the ln n term that appears in BIC (Eq. (18)), but not in AIC or WAIC (Eqs. (17) and (19)). The BIC in fact has the property that in the limit of large n the penalty tends to infinity. The AIC and WAIC do not have this property.
The importance of this property is more evident if one considers the case of a real multiplanet system whose data are fit by a set of models that approximate our physics knowledge better and better (e.g. no planet; one planet; two planets ignoring their mutual attraction; two planets including their mutual attraction; three planets, etc.). Obviously, some models are more easily distinguished (no planet vs. one planet), while some others are more difficult to distinguish (mutual attractions). In the limit of large n, BIC will discard models that overfit data more easily than AIC or WAIC, while AIC and WAIC will instead identify a subset of similarly behaving models. Whether this property of BIC is desirable depends on whether one expects that the “true” model belongs in the considered set. One seldom or probably never has such an expectation in general. However, planetary motions may represent a niche case where one can expect to include a model that, for a given data quality, is indistinguishable from one that includes all known physical effects.
For Gaia data, n = 73 ± 23 and 147 ± 46 for the 5 yr and 10 yr mission, respectively, which means that from Eqs. (17) and (18) one expects an average Δ AIC −ΔBIC ~ 16 and 21, respectively. These values are close to the threshold for strong evidence on Jeffreys’ scale. Therefore, we expect that different IC may yield different fractions of false positives if the strong evidence threshold is used. From the defining equations of the IC and for systems with S∕N ~ 0, we expect that the BIC signals a preference for the simpler model (the astrometric model), while the AIC and WAIC signal that the more complex model (the planetary model) does not offer an advantage over the simpler model.
One might ask what advantage the IC offer over a Δχ^{2} criterion. Ina nonlinear setting, because of the difficulties in determining the number of dof, a Δ χ^{2} is equivalent to an information criterion where one does not know beforehand what penalty to apply to correct for overfitting. It would still be possible to use the Δχ^{2} for model choice after calibrating a threshold based on an allowed fraction of false positives.
6 Fraction of falsepositive detections
In this section we apply all criteria to our noplanet simulations, so we expect that the purely astrometric model should be preferred in all cases. Our samples consist of 6644 and 4402 noplanet systems for the 5 yr and 10 yr missions, respectively.
Fig. 3 Histograms for the IC (left panels) and for Δχ^{2} (right panels) in the case of noplanet simulations. The upper panels consider the 5 yr mission, the lower panels the 10 yr mission. Leftpanels: green solid lines, WAIC; red dashed lines, AIC; blue dotted lines, BIC; the vertical dashed lines show the IC thresholds for weak and strong evidence according to Jeffreys’ scale. Right panels: solid and dashed curves show χ^{2} distributions with the dof reported in the legend. The empirical density does not follow a χ^{2} distribution with the nominal number of dof (7, solid red line). It can however be approximated, by a χ^{2} distribution with a larger number of dof: 11 dof for the 5 yr mission, 15 dof (or 16 dof for Δ χ^{2} ≳ 30) for the 10 yr mission. 
6.1 Information criteria
We define ΔIC = IC(M_{a}) −IC(M_{k}), where M_{a} is the astrometric model, and M_{k} is the model with the Keplerian orbit. With our definition, a positive Δ IC indicates a preference for the Keplerian model, and vice versa. Values of ΔIC ~ 0 indicate no preference for one model over the other; therefore, for reasons of simplicity, the simpler model (i.e. the astrometric model) should be preferred. For ΔIC > 0, we use Jeffreys’ scale of evidence to interpret the results, and then we define empirical Δ IC thresholds based on the number of false positives.
In Fig. 3 (left panels), we show the histograms of Δ IC for the noplanet systems. The peaks of the histograms are safely below the IC > 3 threshold for weak evidence according to Jeffreys’ scale, though both AIC and WAIC have a tail in the weakevidence zone, especially in the 10 yr mission case. Such a tail may be better seen in the plots of the cumulative distributions (Fig. 4). For the 5 yr mission, the strong evidence threshold (IC > 20) is exceeded by AIC in 2 cases (0.03%) and by WAIC in 11 cases (0.17%) out of 6644 cases; for the 10 yr mission it is exceeded by AIC in 12 cases (0.27%) and by WAIC in 25 cases (0.57%) out of 4402. The strong evidence threshold is never exceeded by BIC. We list the 99% and 99.73% quantiles in Table 4. The value for the 99.73% quantile should be considered as approximate because of the small number of systems whose ICs fall above the quantiles (18 and 12 for the 5 yr and 10 yr mission, respectively). When going from the 5 yr to the 10 yr mission, there is an increase in both quantiles for all ICs and for Δ χ^{2} .
Compared to ΔAIC and ΔBIC, Δ BIC appears to be negative in almost all cases and therefore indicates a preference for the astrometric model.
6.2 Δχ^{2}
Considering two nested linear models with a different number of parameters (e.g. the astrometric and the full model), and data distributed according to the null model (in our case, the astrometric model), the difference between the χ^{2} values of the best fits under the two models still follows a χ^{2} distribution, with a number of dof equal to the difference between the dof of the two models. As discussed in Sect. 5.5, the dof become illdefined when the model is nonlinear. We show this in Fig. 3 (right panels), where we plot the empirical Δ χ^{2} density for the noplanet simulations against a χ^{2} distribution for seven dof (corresponding to the difference between 12 and 5 parameters in the full and astrometric models, respectively). The empirical density is better described by a χ^{2} distribution with 11 dof for the 5 yr mission; and with 16 dof for the 10 yr mission (though 15 dof seem to better reproduce the tail at Δχ^{2} ≳ 30).
The cumulative distribution is shown in Fig. 4. The 99% and 99.73% quantiles obtained from the empirical distribution are listed in Table 4. As in the case of Δ AIC and Δ BIC, the quantile increases when going from the 5 yr to the 10 yr mission.
Fig. 4 Cumulative distributions of the IC and Δχ^{2} for noplanet systems in the 5 yr (upper panel) and 10 yr (lower panel) mission simulations. The four curves show (from left to right): BIC (blue), AIC (red), WAIC (green), Δ χ^{2} (black). The horizontal dashed lines show the 100% and 99.73% (3σ) quantiles. The vertical dashed lines show the IC thresholds for weak and strong evidence according to Jeffreys’ scale. 
Information criteria and Δχ^{2} quantiles for falsepositive detections in noplanet simulations, for the four different indicators and for the 5 yr and 10 yr missions.
7 Detection rate
In this and in the following sections, we consider simulations of oneplanet systems. Our samples consist of 4968 and 4706 systems for the 5 yr and 10 yr missions, respectively.
We chose to put our detection threshold at ΔBIC = 20, which is equivalent to ΔAIC = 70, Δ WAIC = 70, and Δ χ^{2} = 80. This has the effect that, with respect to ΔAIC = 20, the number of false positives is minimised, and the quality of detection is improved (our tests showed that in systems with very low S/N and hence a contrasting AIC versus BIC verdict, the period was incorrectly recovered in approximately onequarter of the cases).
In Fig. 5 we show the detection fraction as a function of both S/N and period. The fraction of detected systems depends mainly on the S/N. Detecting systems with periods longer than the mission length requires a S/N that is higher by an amount that increases with the period.
The 50% detection rate for systems with period shorter than the mission length is found at S∕N ~ 2.3 and 1.7 for the 5 yr and 10 yr mission, respectively. Starting the MC with the circular orbit model or at the ideal point makes little difference because the detection rates vary by less than 10%.
The above numbers only consider systems for which the MC achieved convergence within the allowed number of samples, according to the criteriondescribed in Sect. 5.4. We found that the fraction of nonconverging chains depends on the starting point: in the ideal starting point case, 99% of chains converged, while in the case of the circular orbits starting point, the fraction lowers to ~80%. This fraction might increase by using a more refined method to select a starting point (as was done in e.g. Sahlmann et al. 2013).
Fig. 5 Detection fractions as a function of S/N and period. Left panel: 5 yr mission; right panel: 10 yr mission. The colours show the detection fraction and range from light yellow (low detection fraction) to dark green (high detection fraction). The transition between nondetections and detections is sharper and occurs at lower S/N in the 10 yr mission than in the 5 yr mission. Planets with periods longer than the mission length require a stronger S/N to be detected than planets withlower periods. 
8 Parameter recovery
In this section we consider candidate detections, and look at how well the orbital parameters are recovered. The results depend on how close the initial point for the MC is to the true value of the parameters. To show the impact of the MC starting point, we present the results for the ideal case (starting from true values) for period and eccentricity. We then show, for all orbital parameters, the results obtained by starting the MC at the bestfit points obtained using the circular orbits model.
For every parameter, we calculated the median (the circular median is used for Ω , ω, and T_{p} ; see Pewsey et al. 2013) and an estimate of the associated error: for Ω, ω, and T_{p} we used the circular standard deviation^{8}; for every other parameter, we used the 68.3% highest posterior density interval (HPDI) from the posterior samples. These quantities are plotted in Figs. 6–10.
8.1 Period
Figure 6 (upper panels) shows that the periods are almost always correctly recovered in the ideal case, when the MC are started using the true values of the parameters.
Instead, the use of imperfect starting points produces a small number of nonrecovered periods (the systems with 0.1 ≲ P_{output} ≲ 0.3 yr). Also, a small number of systems are fit with periods that are somewhat shorter or longer than the input value by a factor of ~40–60% (they are visible in Fig. 6 as the two sequences right above and below the onetoone line); these systems are also misfit as edgeon, higheccentricity systems (they have incorrect cos i_{output} ~ 0 and e_{output} ~ 1). The fraction of such systems depends on the mission length but not on the S/N and is, considering only inputs with 0.2 ≤ P_{input} ≤ 3.5 yr, 12% for the 5 yr mission and 3.2% for the 10 yr mission. The two sequences can be interpreted as parallel sequences in terms of frequency f =1∕P, where the frequency of each sequence is f_{output} = f_{input} ± 0.3.
8.2 Eccentricity
The eccentricity exhibits quite a large scatter, both in terms of the location of the median e_{output} and of the size of the HPDI. The scatter is somewhat larger when the MC are started using the circular orbit model.
When the MC are started using the circular orbit model, a fraction of systems are fit with e_{output} ~ 1 with no dependence on e_{input}; this happens in 9.6% and 6.7% of the systems with e_{input} < 0.8 for the 5 yr and 10 yr mission, respectively. Such systems also exhibit incorrect fits in other variables: they have ω_{output} ~±90°, an incorrect T_{p} , and cos i_{output} ~ 0 (edgeon); their P_{output} also run on the parallel sequences to the onetoone relationships.
8.3 Size of the orbit
The angular size of the semimajor axis of the orbit is given by the astrometric signature υ; its input value υ_{input} can be obtained from the S/N through Eq. (13). In Fig. 8, we show that the onetoone relationship is well recovered.
Conversely, most of the systems with deviations on the left side (i.e. with υ_{output} > υ_{input}) are also misfit as edgeon, high eccentricity systems (cos i_{output} ~ 0 and e_{output} ~ 1).
Fig. 6 Simulated vs. recovered period. The left and right panels show the 5 yr and 10 yr missions, respectively. The upper and lower panels differ only for the starting point of the MC: the ideal case (MC started at the true value) in the upper panel, and the circularorbit model bestfit point in the lower panel. The colours show the density of points where they overlap, going from blue (no overlap) to red (maximum overlap). The error bars show the 68.3% highest posterior density interval. The red solid lines show the onetoone relationship. The period is correctly recovered in the majority of cases. The two sequences parallel to the main relationship are populated with systems that are incorrectly fit with edgeon, higheccentricity orbits. 
8.4 Inclination
The inclination is recovered correctly in the majority of cases (Fig. 9); the scatter of the onetoone relationship is reduced when going from the nominal to the extended mission.
8.5 Longitude of ascending node
The longitude of the ascending node (Ω) is correctly recovered in most cases; see Fig. 10. There does not seem to be any particular structure in the points that deviatemost from the onetoone relationship.
8.6 Argument of periapsis
Whether theargument of periapsis (ω) is correctly recovered depends on the eccentricity. As discussed in Sect. 5.3, when orbits are nearly circular, ω loses its geometrical meaning and starts to track the zeropoint of the orbit as T_{p} . Therefore, for low eccentricity the relationship between ω_{input} and ω_{output} is lost.
The relationship is recovered for larger eccentricities e_{input} ≳ 0.1; the relationship for ω appears somewhat noisier than the analogous relationships that hold for Ω and T_{p} .
8.7 Periapsis transit time
As already seen in the case of ω, the periapsis transit time T_{p} is only recovered when orbits are not circular. For systems with e_{input} > 0.1, we found that the onetoone relationship seems well recovered, with no particular structure in the points deviating from it.
Fig. 7 Simulated vs. recovered eccentricity. Panels, symbols, and colours as in Fig. 6. The eccentricity is recovered in most cases, albeit with large uncertainties. When the MC are started using the circular orbits model (lower panels), some systems with e_{input} ≲ 0.4 are incorrectly fit with very elliptical orbits (e_{output} ~ 1), especially in the 5 yr mission; these systems are the same as those that lie in the parallel sequences of Fig. 6. 
9 Discussion
9.1 Maximum distance of detectable planets
Previous studies (C08, P14) had suggested a minimum S∕N = 3 for the detection of planets. This corresponds to a 74% chance of detecting a planet in the 5 yr mission (92% for the 10 yr mission). A 50% detection chance occurs at S∕N = 2.3 and 1.7, respectively (Sect. 7).
We can therefore take a second look at Fig. 2, where we present the S/N of a few representative classes of planets, as a function of their distance from the solar system. Assuming the S/N that allow a 50% detection chance, the maximum distance up to which a Jupitermass planet around a 1 M_{⊙} star with a semimajor axis of 1 au could be detected is 13 pc for the 5 yr mission, and 17 pc for the 10 yr mission. Considering a semimajor axis of 3 au, the distances would increase to 39 pc and 52 pc, respectively. A 10 yr mission also allows the detections of planets with semimajor axes up to 4 au, yielding a maximum distance of 70 pc.
For a Neptunemass planet, the distances are shorter. Even in the favourable case of a 3 au semimajor axis, the distances would be 1.9 pc and 2.6 pc, respectively. For the 10 yr mission, a 4 au semimajor axis would yield a distance of 3.5 pc. In all cases, there is a better chance to detect planets if one also considers stars with a lower mass than the 1 M_{⊙} considered here.
In a real survey, it is possible that some false positives will be included among the detections of planets with low S/N. The Δ BIC = 20 threshold that we have considered has allowed zero false positives among our noplanet simulations; therefore, considering the number of simulated systems, upper limits to the false positive fraction can be put as < 1.5×10^{−4} and < 2.3×10^{−4} for the 5 yrand and 10 yr mission, respectively. To put such numbers in the context of a real survey, we consider the number of stars within 100 pc of the Sun. Considering only dwarf stars of the F, G, and K spectral types (as done previously by C08 and P14), the Gaia universe model snapshot (GUMS; Robin et al. 2012) includes 6.3×10^{4} stars. Therefore, from the above upper limits to the false positive fraction, one could estimate that ≲9 and ≲14 false positives could be present in an allsky survey.
Fig. 8 Simulated vs. recovered angular size of the orbit. Symbols and colours as in Fig. 6. The two panels showonly the results when the MC have been started using the circular orbit model. Most of the points with υ_{output} ≪ υ_{input} have input periods longer than the mission length, and form the horizontal plume at large P_{input} in Fig. 6. 
Fig. 9 Simulated vs. recovered inclination. Panels, symbols, and colours as in Fig. 8. The inclination is correctly recovered for most systems. Especially in the 5 yr mission, there is a fraction of systems which are mistaken as edgeon (cos i_{output} ~ 0) which seems to be independent of the input inclination, but which is often associated with high eccentricity (e_{output}~ 1) and with the parallel sequences in the period. 
9.2 Accuracy and precision of fit parameters
We examinewhich parameters are recovered better, and with what accuracy and precision they are on average recovered. We define accuracy as the absolute difference between the input and recovered values, and precision as the random error that affects the recovered values (i.e. the length of an error bar, given by the length of the 68.3% HPDI). For the two scale parameters (P and υ, that vary over several orders of magnitude) we consider relative accuracy and precision; for the other parameters, that set the shape of theorbit, we consider absolute accuracy and precision.
In this way we obtain for each parameter a distribution that spans all simulated planetary system; the medians of such distributions can be taken as the characteristic measure of accuracy or precision for each parameter. In Tables 5 and 6 we show the medians of such distributions, along with the 25% and 75% quartiles. For example, the period (P) has a relative precision smaller than ±0.41% in 25% of the cases in the 5 yr mission.
The accuracy and precision distributions are presented in Fig. 11, where they are shown in the form of a violin plot. In this kind of plot, a kernel density estimate is calculated for each parameter distribution, and plotted twice, to form a symmetric figure with normalised area. At every value of the abscissa, the figure height is proportional to the density of the distribution.
9.3 Independence from sky position
We assume in Sect. 4.3 that the starplanet systems are uniformly distributed on the celestial sphere. In principle, this assumption might not necessarily be true. We have found no dependence of either the detection rate or of any aspect of parameter recovery on the sky position; therefore, no position bias should be present in our results.
Fig. 10 Simulated vs. recovered longitude of ascending node (Ω). Panels, symbols, and colours as in Fig. 8. Since Ω is an angle, we plot the data points twice (at nominal position and at ±180°) to better show the structure of the noise. 
10 Conclusions
We have presented an investigation of the planet detection capabilities of Gaia, using the most uptodate inflight properties of the spacecraft. We have tested a simple Bayesian model of astrometric data of planetary orbits that relies on offtheshelf software and standard procedures for inference and model selection.
We have considered two models: one accounting only for astrometric parameters (position, parallax, proper motion), and another accounting for both astrometric and orbital parameters. We investigated model selection using three information criteria (AIC, BIC, WAIC) and the Δχ^{2}.
Using simulations of Gaia observations of stars with no planets (one set for a nominal 5 yr mission, and another set for an extended 10 yr mission), we have compared the information criteria against the threshold of Δ IC = 20 suggested for strong evidence in favour of one model over another by Jeffreys’ scale for Bayesian evidence. We have found that the use of AIC or WAIC with the above threshold allows false detection fractions that are always ≲0.6%, with AIC being somewhat stricter than WAIC. The BIC criterion, that puts a stronger penalty on the more complex model, allowed no false positive out of 6644 cases (5 yr) and 4402 cases (10 yr). The Δ χ^{2} can also be used, but the nonlinearity of the model prevents the use of a reference χ^{2} distribution to obtain a pvalue; therefore the choice ofa Δχ^{2} threshold needs an empirical calibration.
We have investigated parameter recovery using simulations of stars with one planet, whose S/N and periods were assigned according toa logarithmically spaced grid, and whose eccentricities were assigned on a linear grid. All other parameters are assignedrandomly. Realistic scanning laws are calculated according to a randomly assigned position on the sky.
We have applied our detection method to 4968 and 4706 simulations of stars with one planet over a mission of 5 yr and 10 yr, respectively. We have chosen a detection threshold of Δ BIC = 20 (i.e. with BIC at the strong evidence point according to Jeffreys’ scale) which is equivalent to Δ AIC = 70, Δ BIC = 20, Δ χ^{2} = 80. This not only minimises the number of false positives, but has also guaranteed that the detected systems show a good quality of fit.
We have confirmed the finding of previous studies that the S/N is the main parameter that determines whether a planet can be detected. The 50% detection threshold occurs at S∕N = 2.3 and 1.7 for the 5 yr and 10 yr mission, respectively. Therefore, some approximate maximum distances to which planets around a 1 M_{⊙} star can be detected with a 50% chance are 39 pc (a Jupitermass planet orbiting at 3 au; 5 yr mission), 70 pc (Jupitermass, 4 au, 10 yr), 1.9 pc (Neptunemass, 3 au, 5 yr), and 3.5 pc (Neptunemass, 4 au, 10 yr).
The orbital parameters of planets with periods up to the mission length can be recovered with different degrees of accuracy and precision. We define accuracy as the difference between the input value and the median of the output posterior distribution, and precision as the length of the 68.3% HPDI. Considering the period, in the case of a 5 yr mission we find a median relative accuracy of 0.70% and a median relative precision of 1.4%. For the angular size of the semimajor axis of the orbit, we find 7.9% and 23%, respectively. For the eccentricity, we find 0.077 and 0.19, respectively.
A followup of this work should deal with the detectability of multiple planets in the same systems, and of very longperiod planets, and should consider data from both Gaia and a future astrometric mission.
Distribution of the accuracy with which fit parameters are recovered.
Fig. 11 Accuracy and precision of the fit parameters. Left panels: 5 yr mission; right panels: 10 yr mission. For each orbital parameter, the accuracy is the absolute difference between the input value and the median of the output posterior distribution. The precision is the length of the 68.3% highest posterior density interval (HPDI). The figures depicted by the black curves show the distribution of either accuracy or precision; they have normalised areas, so that at every value of the abscissa, their height is proportional to the density of the distribution. The red dots mark the median of each distribution. Relative quantities are shown for quantities that vary over several orders of magnitude, namely P and υ; absolute quantities for the remaining parameters. We consider T_{p}∕P (instead of just T_{p}) because it is always contained in the [0, 1] interval; thus, T_{p} has a median precision of 5% P in the 5 yr mission. 
Distribution of the precision with which the fit parameters are recovered.
Acknowledgements
We thank an anonymous referee whose comments have greatly contributed to improve this paper. We thank Martin Gustavsson for providing an initial version of our simulation code, and Alessandro Sozzetti for useful discussions. We gratefully acknowledge support from the Swedish National Space Board (SNSB Dnr 183/14 and SNSB Dnr 74/14) and the Royal Physiographic Society in Lund.
References
 Akaike, H. 1973, in 2nd International Symposium on Information Theory, Academiai Kiado, reprinted in Selected Papers of Hirotugu Akaike (Berlin, Heidelberg: Springer, 1998), 199 [Google Scholar]
 Andrae, R., SchulzeHartung, T., & Melchior, P. 2010 ArXiv eprints [arXiv:1012.3754] [Google Scholar]
 Bernstein, H.H., & Bastian, U. 1995, in Proc. of the RGO  Future Possibilities for Astrometry in Space, eds. M. A. C. Perryman, & F. van Leeuwen, ESA SP, 379, 55 [NASA ADS] [Google Scholar]
 Binnendijk, L. 1960, Properties of Double Stars: A Survey of Parallaxes and Orbits (Philadelphia: Pennsylvania University Press) [Google Scholar]
 Boss, A. P., Butler, R. P., Hubbard, W. B., et al. 2007, IAU Trans., 26A, 183 [Google Scholar]
 Burnham, K. P., & Anderson, D. R. B. 2002, Model Selection and Multimodel Inference: A Practical Informationtheoretic Approach (New York: Springer) [Google Scholar]
 Carpenter, B., Gelman, A., Hoffman, M., et al. 2016, J. Stat. Softw., 76, 20 [Google Scholar]
 Casertano, S., Lattanzi, M. G., & Perryman, M. A. C. 1995, in Future Possibilities for astrometry in Space, eds. M. A. C. Perryman, & F. van Leeuwen, ESA SP, 47 [Google Scholar]
 Casertano, S., Lattanzi, M. G., Sozzetti, A., et al. 2008, A&A, 482, 699 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 de Bruijne, J., Siddiqui, H., Lammers, U., et al. 2010, in Relativity in Fundamental Astronomy: Dynamics, Reference Frames, and Data Analysis, eds. S. A. Klioner, P. K. Seidelmann, & M. H. Soffel, IAU Symp., 261, 331 [NASA ADS] [Google Scholar]
 de Bruijne, J. H. J. 2012, Ap&SS, 341, 31 [NASA ADS] [CrossRef] [Google Scholar]
 de Bruijne, J. H. J., Rygl, K. L. J., & Antoja, T. 2014, in EAS Pub. Ser., 67, 23 [CrossRef] [Google Scholar]
 Efron, B., & Gous, A. 2001, in Model Selection, ed. P. Lahiri, IMS Lecture Notes Monograph Series, 38, 208 [Google Scholar]
 ESA 1997, The HIPPARCOS and TYCHO catalogues. Astrometric and Photometric Star Catalogues Derived from the ESA HIPPARCOS Space Astrometry Mission, ESA SP, 1200 [Google Scholar]
 Gaia Collaboration (Brown, A. G. A., et al.) 2016a, A&A, 595, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Gaia Collaboration (Prusti, T., et al.) 2016b, A&A, 595, A1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Gatewood, G., Han, I., & Black, D. C. 2001, ApJ, 548, L61 [NASA ADS] [CrossRef] [Google Scholar]
 Gelman, A., & Rubin, D. B. 1992, Stat. Sci., 7, 457 [Google Scholar]
 Gelman, A., Carlin, J., Stern, H., et al. 2013, Bayesian Data Analysis, 3rd edn., (Boca Raton, FL: CRC Press) [Google Scholar]
 Geyer, C. 2011, in Handbook of Markov Chain Monte Carlo Handbooks of Modern Statistical Methods, eds. S. Brooks, A. Gelman, G. L. Jones, & X.L. Meng (London: Chapman and Hall/CRC), 3 [Google Scholar]
 Gregory, P. 2005, Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica® Support (Cambridge: Cambridge University Press) [CrossRef] [Google Scholar]
 Han, I., Black, D. C., & Gatewood, G. 2001, ApJ, 548, L57 [NASA ADS] [CrossRef] [Google Scholar]
 Holl, B., Lindegren, L., & Hobbs, D. 2012, A&A, 543, A15 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Jacob, W. S. 1855, MNRAS, 15, 228 [Google Scholar]
 Jammalamadaka, S., & SenGupta, A. 2001, in Series on Multivariate Analysis, ed. M. M. Rao (Singapore: World Scientific), 5 [Google Scholar]
 Liddle, A. R. 2007, MNRAS, 377, L74 [NASA ADS] [Google Scholar]
 Lindegren, L., Lammers, U., Hobbs, D., et al. 2012, A&A, 538, A78 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Lindegren, L., Lammers, U., Bastian, U., et al. 2016, A&A, 595, A4 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. 2000, Stat. Comput., 10, 325 [Google Scholar]
 Markley, F. L. 1995, Celest. Mech. Dyn. Astron., 63, 101 [NASA ADS] [CrossRef] [Google Scholar]
 Mazeh, T., Zucker, S., Dalla Torre, A., & van Leeuwen, F. 1999, ApJ, 522, L149 [NASA ADS] [CrossRef] [Google Scholar]
 Michalik, D., Lindegren, L., Hobbs, D., & Lammers, U. 2014, A&A, 571, A85 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Mikkola, S. 1987, Celestial Mechanics, 40, 329 [NASA ADS] [CrossRef] [Google Scholar]
 Perryman, M. A. C. 2011, The Exoplanet Handbook (Cambridge, NY: Cambridge University Press) [CrossRef] [Google Scholar]
 Perryman, M., Hartman, J., Bakos, G. Á., & Lindegren, L. 2014, ApJ, 797, 14 [NASA ADS] [CrossRef] [Google Scholar]
 Pewsey, A., Neuhäuser, M., & Ruxton, G. 2013, Circular Statistics in R, EBLSchweitzer (Oxford: Oxford University Press) [Google Scholar]
 Plummer, M. 2003, in Proceedings of the 3rd International Workshop on Distributed Statistical Computing in Vienna. vol. 124, eds. K., Hornik, F., Leisch & A., Zeileis, 125 [Google Scholar]
 Pourbaix, D. 2001, A&A, 369, L22 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Quirrenbach, A. 2010, in Exoplanets, ed. S. Seager (Tucson: The University of Arizona Press), 157 [Google Scholar]
 Ranalli, P., Koulouridis, E., Georgantopoulos, I., et al. 2016, A&A, 590, A80 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Robin, A. C., Luri, X., Reylé, C., et al. 2012, A&A, 543, A100 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Sahlmann, J., Lazorenko, P. F., Ségransan, D., et al. 2013, A&A, 556, A133 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Schwarz, G. 1978, Ann. Statist., 6, 461 [Google Scholar]
 Taff, L. G. 1985, Celestial Mechanics: A Computational Guide for the Practitioner (New York: WileyInterscience) [Google Scholar]
 Thiele, T. N. 1883, Astron. Nachr., 104, 245 [Google Scholar]
 Trotta, R. 2008, Contemporary Phys., 49, 71 [Google Scholar]
 van den Bos, W. H. 1926, CIUO, 68, 352 [NASA ADS] [Google Scholar]
 van den Bos, W. H. 1932, CIUO, 86, 261 [NASA ADS] [Google Scholar]
 Watanabe, S. 2010, J. Mach. Learn. Res., 11, 3571 [Google Scholar]
 Watanabe, S. 2013, J. Mach. Learn. Res., 14, 867 [Google Scholar]
 Zucker, S., & Mazeh, T. 2000, ApJ, 531, L67 [NASA ADS] [CrossRef] [MathSciNet] [PubMed] [Google Scholar]
The proper motion component in the right ascension includes the cos δ factor, and corresponds to the in Appendix A of Michalik et al. (2014).
A mass of 13 M_{Jup} is the approximate limit mass for deuterium burning, and it is usually taken as the upper limit for planetary masses (Boss et al. 2007).
We also experimented with the Mikkola (1987) cubic approximation, which has a relative error within 0.2%. We found that the use of the approximation was hindering the successful fit of the simulations with the method described in Sect. 5.
The von Mises distribution is a circular analogue of a normal distribution. It is a continuous probability distribution whose support is a circle, and it is defined as , where I_{0}(κ) is the modified Bessel function of order 0. The parameters are the location of its peak μ_{vM} and the concentration κ (analogous to the inverse of the variance of a normal; distributions with higher κ have sharper peaks). When κ = 0, no peak is present and the location becomes undefined. Ideally, one should set κ = 0; however, the current Stan implementation of the von Mises distribution is restricted to κ > 0, so we chose a κ low enough to be practically indistinguishable from 0.
The highest secondary peaks are randomly located and do not seem to be aliases of the true period; there are usually only a few for very high S/N (e.g. S∕N ~ 10). In addition, a large number of lower peaks are present even at S∕N ~ 0, which can just be attributed to noise. The peak narrowness does not depend on the S/N.
Watanabe (2013) also proposed an extension of the BIC, the widely applicable Bayesian information criterion (WBIC). However, to compute the WBIC samples from both the posterior and the prior are needed. The latter are not provided by the current Stan version, and therefore we do not consider the WBIC here.
Defined as , where is the sample mean resultant length, for a sample of n of angles γ_{i} whose mean is , and calculated using the sd.circular function in the circular package for the R software (see Pewsey et al. 2013).
All Tables
Probability distributions of randomised parameters for the singleplanet simulations.
Information criteria and Δχ^{2} quantiles for falsepositive detections in noplanet simulations, for the four different indicators and for the 5 yr and 10 yr missions.
All Figures
Fig. 1 Illustration of the four geometrical elements: the semimajor axis a_{*}, the longitude of the ascending node Ω (also known as receding node), the inclination i, and the argument of periapsis ω. The light cyan polygon represents the sky plane in a perspective view; the reference direction is taken towards the north (N). The line of nodes, shown by the black arrow pointing towards the ascending node, is the intersection between the sky and the orbit planes. The orbit is shown as the red ellipse. The small red arrow is perpendicular to the true plane and shows the inclination. 

In the text 
Fig. 2 Expected (S/N) for different planet masses as a function of distance. The lines show the cases for 13 Jupiter masses, 1 Jupiter mass, and 1 Neptune mass; with circular orbits at 1, 3, and 4 au (it is only possible to observe a 4 au orbit if the mission length is ~ 10 yr; see Sect. 8). A 1 M_{⊙} star is assumed for the solid lines, while a 0.5 M_{⊙} star is assumed for the dashed lines. 

In the text 
Fig. 3 Histograms for the IC (left panels) and for Δχ^{2} (right panels) in the case of noplanet simulations. The upper panels consider the 5 yr mission, the lower panels the 10 yr mission. Leftpanels: green solid lines, WAIC; red dashed lines, AIC; blue dotted lines, BIC; the vertical dashed lines show the IC thresholds for weak and strong evidence according to Jeffreys’ scale. Right panels: solid and dashed curves show χ^{2} distributions with the dof reported in the legend. The empirical density does not follow a χ^{2} distribution with the nominal number of dof (7, solid red line). It can however be approximated, by a χ^{2} distribution with a larger number of dof: 11 dof for the 5 yr mission, 15 dof (or 16 dof for Δ χ^{2} ≳ 30) for the 10 yr mission. 

In the text 
Fig. 4 Cumulative distributions of the IC and Δχ^{2} for noplanet systems in the 5 yr (upper panel) and 10 yr (lower panel) mission simulations. The four curves show (from left to right): BIC (blue), AIC (red), WAIC (green), Δ χ^{2} (black). The horizontal dashed lines show the 100% and 99.73% (3σ) quantiles. The vertical dashed lines show the IC thresholds for weak and strong evidence according to Jeffreys’ scale. 

In the text 
Fig. 5 Detection fractions as a function of S/N and period. Left panel: 5 yr mission; right panel: 10 yr mission. The colours show the detection fraction and range from light yellow (low detection fraction) to dark green (high detection fraction). The transition between nondetections and detections is sharper and occurs at lower S/N in the 10 yr mission than in the 5 yr mission. Planets with periods longer than the mission length require a stronger S/N to be detected than planets withlower periods. 

In the text 
Fig. 6 Simulated vs. recovered period. The left and right panels show the 5 yr and 10 yr missions, respectively. The upper and lower panels differ only for the starting point of the MC: the ideal case (MC started at the true value) in the upper panel, and the circularorbit model bestfit point in the lower panel. The colours show the density of points where they overlap, going from blue (no overlap) to red (maximum overlap). The error bars show the 68.3% highest posterior density interval. The red solid lines show the onetoone relationship. The period is correctly recovered in the majority of cases. The two sequences parallel to the main relationship are populated with systems that are incorrectly fit with edgeon, higheccentricity orbits. 

In the text 
Fig. 7 Simulated vs. recovered eccentricity. Panels, symbols, and colours as in Fig. 6. The eccentricity is recovered in most cases, albeit with large uncertainties. When the MC are started using the circular orbits model (lower panels), some systems with e_{input} ≲ 0.4 are incorrectly fit with very elliptical orbits (e_{output} ~ 1), especially in the 5 yr mission; these systems are the same as those that lie in the parallel sequences of Fig. 6. 

In the text 
Fig. 8 Simulated vs. recovered angular size of the orbit. Symbols and colours as in Fig. 6. The two panels showonly the results when the MC have been started using the circular orbit model. Most of the points with υ_{output} ≪ υ_{input} have input periods longer than the mission length, and form the horizontal plume at large P_{input} in Fig. 6. 

In the text 
Fig. 9 Simulated vs. recovered inclination. Panels, symbols, and colours as in Fig. 8. The inclination is correctly recovered for most systems. Especially in the 5 yr mission, there is a fraction of systems which are mistaken as edgeon (cos i_{output} ~ 0) which seems to be independent of the input inclination, but which is often associated with high eccentricity (e_{output}~ 1) and with the parallel sequences in the period. 

In the text 
Fig. 10 Simulated vs. recovered longitude of ascending node (Ω). Panels, symbols, and colours as in Fig. 8. Since Ω is an angle, we plot the data points twice (at nominal position and at ±180°) to better show the structure of the noise. 

In the text 
Fig. 11 Accuracy and precision of the fit parameters. Left panels: 5 yr mission; right panels: 10 yr mission. For each orbital parameter, the accuracy is the absolute difference between the input value and the median of the output posterior distribution. The precision is the length of the 68.3% highest posterior density interval (HPDI). The figures depicted by the black curves show the distribution of either accuracy or precision; they have normalised areas, so that at every value of the abscissa, their height is proportional to the density of the distribution. The red dots mark the median of each distribution. Relative quantities are shown for quantities that vary over several orders of magnitude, namely P and υ; absolute quantities for the remaining parameters. We consider T_{p}∕P (instead of just T_{p}) because it is always contained in the [0, 1] interval; thus, T_{p} has a median precision of 5% P in the 5 yr mission. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.