To evaluate the performance of the algorithm, we used the same method as Doyle et al. (2000). For each set of trial parameters the algorithm was run first on a set of one hundred simulated light curves containing only Gaussian noise and no transits. Subsequently it was run on another set of one hundred simulated light curves containing Jovian-type planetary transits with the characteristics described in Sect. 2.1, with the same level but different realizations of the photon noise, and with uniformly distributed random phases

For each simulation, the modified posterior probabilities were plotted versus period and the value of the maximum was noted. This maximum is our "detection statistic'', on the basis of which we determine whether there is a transit or not. We then plot a histogram of the detection statistics measured from running the algorithm over all the light curves with transits and one histogram for all the light curves with noise only. In other words, one histogram corresponds to the cases where the transit hypothesis is correct and one to the cases where the null hypothesis is correct. Ideally, the two distributions would be completely separated, with no overlap, and choosing a detection threshold located between the two histograms would guarantee a 100% detection rate and a 0% false alarm rate. In practice, for the cases of real interest, close to the noise level, the two histograms will show an overlap. A compromise has to be found by choosing a threshold which minimises a penalty factor designed to take into account both false alarm and missed detection rate. This is illustrated in Fig. 3.

$\begin{figure} \par\includegraphics[width=8.8cm,clip]{h3360f3.eps} \end{figure}$

Figure 3: Schematic diagram of the method used to set the optimal threshold and compute the false alarm and missed detection rate. Solid line: distribution of the detection statistics obtained for lightcurves with noise + transit. Dashed line: distribution of the detection statistics obtained for lightcurves with noise only. Vertical solid line: threshold. The hashed area, to the left of the threshold but under the "transits'' distribution, corresponds to the missed detection rate. The filled area, to the right of the threshold but under the "noise only'' distribution, corresponds to the false alarm rate.

Depending on the circumstances, it may be more important to minimise the false alarm rate than the missed detection rate. This is the approach followed by Jenkins et al. (2002), on the basis that detections from space experiments are hard to follow-up from the ground. An alternative view is any real transit that is rejected is a loss of valuable scientific information. As long as the false alarm rate is kept to a manageable level, further analysis of the light curves will prune out the false events. We have opted here for an intermediate position, and our penalty factor is simply the sum of the missed detection rate ${N}_{{\rm MD}}$ and the false alarm rate ${N}_{{\rm FA}}$ :

5.2 Results

5.2.1 An ideal case

In Defaÿ et al. (2001a), analysis performed on the basis of 200 bootstrap samples for the COROT observations of a star with magnitude 13 and an Earth-sized planet showed, with 6 transits lasting 5 hr each, a probability of true detection of around 0.3. We performed the simulations described in Sect. 5.1 for a similar case: Earth-sized planet orbiting a K5V type star with V=13 with a period of $932 \times 15$ min and a transit duration of 5 hr. The light curve is sampled with 15 min bins. The noise is different from the COROT case, as we concentrate uniquely on the photon noise expected for Eddington.

The results are shown in Fig. 4 for period and phase separately. As the distributions for the noise only and transit light curves are completely separated, each parameter alone is sufficient to determine a threshold ensuring null false alarm and missed detection rates.

$\begin{figure} \par\includegraphics[width=7.5cm,clip]{h3360f4.eps} \end{figure}$

Figure 4: Distributions of the detection statistics for an Earth-sized planet orbiting a V=13 star with period $P=932\times 15$ min. Solid line: lightcurves with noise + transit. Dashed line: lightcurves with noise only. Vertical solid line: threshold value. a) Period. b) Phase. Over 100 realizations there were no false alarms and no missed detections.

5.2.2 Performance of the algorithm at the noise limit

Given that the key scientific goal of Eddington in the field of planet-finding is the detection of habitable planets, the performance of the algorithm was extensively tested for habitable planets at (or close to) the noise limit of Eddington. The case of an Earth-like planet orbiting a K dwarf in a habitable orbit was used as benchmark. The light curve was simulated for a system with the following parameters:

With the results of the simulations, an example of which is shown in Fig. 6, the analysis described in Sect. 5.1 was performed for all three magnitudes, confirming that the combined use of the two statistics improves the results. This is illustrated for the V=14.5 case in Figs. 7, 8 (for this particular case 1000 rather than 100 runs were computed to improve precision).

$\begin{figure} \par\includegraphics[width=7.5cm,clip]{h3360f5.eps} \end{figure}$

Figure 5: An example light curve containing 4 transits of an Earth-like planet orbiting a K5V star with V=14.5. a) Full light curve. b) Portion around a transit. c) The four transits phase-folded.

$\begin{figure} \par\includegraphics[width=8.8cm,clip]{h3360f6.eps} \end{figure}$

Figure 6: Example of posterior probability distributions arising from the lightcurve shown in Fig. 5 (arbitrary units). a) Period: real value = 2912 hours, error = -2 hours. b) Phase: real value = 0.885, error = 0.005.

$\begin{figure} \par\includegraphics[width=7.4cm,clip]{h3360f7.eps} \end{figure}$

Figure 7: Distributions of the detection statistics for an Earth-sized planet orbiting a V=14.5 star with period P=4 months. Solid line: lightcurves with noise + transit. Dashed line: lightcurves with noise only. Vertical solid line: threshold value. a) Period: 190 false alarms and 185 missed detections over 1000 realizations. b) Phase: 27 false alarms and 14 missed detections over 1000 realizations.

$\begin{figure} \par\includegraphics[width=7.5cm,clip]{h3360f8a.eps}\par\includegraphics[width=6.2cm,clip]{h3360f8b.eps} \end{figure}$

Figure 8: a) Contour plot and b) 3-D representation of the two-dimensional distributions of the period and phase detection statistics for an Earth-sized planet orbiting a V=14.5 star with period P=4 months. Black: lightcurves with noise + transit. Grey: lightcurves with noise only. Solid line: Optimal threshold line. ( $S_{{\rm ph}} = 42.47 - 1.191 \times S_{{\rm per}}$ ), yielding 29 false alarms and 9 missed detections over 1000 realisations.

$\begin{figure} \par\includegraphics[width=8.5cm,clip]{h3360f9.eps} \end{figure}$

Figure 9: Evolution of the algorithm's performance (in terms of fractional error rates) with magnitude (P=4 months, V=14.5, Light curve duration 16 months). a) Using the period statistic only. b) Using the phase statistic only c) combining the two statistics dotted line: false alarm rate. Dashed line: Missed detection rate. Solid line: mean error rate.

As illustrated in Fig. 9, a mean error rate of <3% can be achieved up to magnitude 14.5. This magnitude is therefore taken as the performance limit for the algorithm for an Earth-sized planet around a K5V-type star. However this analysis is not complete enough to allow a precise determination of the limit, as the noise treatment is incomplete (photon noise only being considered) and one would need more runs per simulations to compute meaningful errors on the false alarm and missed detection rates (sets of 1000 runs, as was done for the limiting V=14.5 case, should be computed for all cases).

The asymmetric shape of the distributions shown in Figs. 4, 7 and 8 implies that, even though the thresholds are chosen to minimise false alarms and missed detections equally, the optimal threshold results in more false alarms than missed detections. This could easily be avoided, if needed, by replacing Eq. (9) by:

As in any unbiased search for periodicity in a time-series, the inclusion of a larger range of periods in the search will lead to a higher chance of finding a spurious (noise-induced) period signal in the data. The simulations used here to assess the algorithm's performance are based on a search through a relatively small range of periods. In practice, lacking any a priori knowledge of the possible periodicity of planetary orbits around the star being observed, one will have to test a large range of periods, ranging from few days (the physical limit of the period of planetary orbits) all the way to the duration of the data set (searching for individual transit events).

5.2.3 Data gaps

Any realistic data set will suffer from gaps in the data. While the orbits of both Eddington and Kepler have been chosen to minimize gaps, 100% availability is not realistic, and gaps will be present due to e.g. telemetry dropouts, spacecraft momentum dumping maneuvers, showers of solar protons during large solar flares, etc. For this reason any realistic algorithm must be robust against the presence gaps in the data, showing graceful degradation as a function of the fraction of data missing from the time series.

We have therefore tested the algorithm discussed here using simulated light curves with 5%, 10% and 20% data gaps, randomly distributed in the data, i.e. 5% of the points in the time series are selected randomly with a uniform distribution and removed from the light curve. The gaps will probably not be randomly distributed in reality, but as the typical gap duration is expected to be of order 1 or 2 hours, simulated random gaps can already be used to test the algorithm's robustness. For reasons of computing time, to avoid having to recalculate the "window function'' at each run, the distribution of the data gaps is the same for all runs of a simulation. As the gaps are chosen one by one there are rarely gaps of more than two consecutive time steps, i.e. 2 hours. Note that e.g. the Eddington mission is designed to produce light curves with a duty cycle $\ge$ $90\%$ , so that the case with 20% data gaps represents a worst case analysis.

The results are shown in Fig. 10. There is visibly very little degradation up to 20% data gaps. When using $S_{{\rm per}}$ alone or the two statistics combined there is no perceptible difference. We can therefore say this algorithm is robust at least for data gaps of the type likely to occur due to e.g. telemetry dropouts, which last only a few hours. One would also expect the algorithm to perform well in the presence of longer gaps: the effect of gaps is to render the number of samples per bin uneven, and this is already the case for this particular method with no gaps at all.

$\begin{figure} \par\includegraphics[width=8.5cm,clip]{h3360f10.eps} \end{figure}$

Figure 10: Evolution of the algorithm's performance with data gaps (P=4 months, V=14.0, Light curve duration 16 months). a) Using the period statistic only. b) Using the phase statistic only. c) Combining the two statistics. Dotted line: false alarm rate. Dashed line: missed detection rate. Solid line: mean error rate.

5.2.4 Number of transits in the data

The planetary transits detection phase of the Eddington mission is planned to last 3 years with a single pointing for the entire duration of that phase. There will therefore be three or four transits in the light curve for a typical habitable planet. However, other missions such as COROT are planned with shorter (5 months) pointings and it is of interest for this type of mission to study the degradation of the algorithm's performance as the number of transits in the light curve reduces. If the algorithm performs well with 2 or less transits, in the context of Eddington it may also allow the detection of "cool Jupiters'', i.e. Jupiter-sized planets with orbits more similar to those of the gaseous giants in our solar system. This would be of relevance to the question of how typical our solar system is.

Sets of 100 runs with the characteristics specified in Sect. 5.2.2 for a star of magnitude 14.5 were computed for light curve durations of 4, 8, 12, 16 and 20 months, containing between 1 and 5 transits. The results are shown in Fig. 11. The degradation only becomes significant when less than three transits are present. However, even mono-transits could be detectable for larger planets at that magnitude.

Defaÿ et al. (2001a) compared a matched filter approach with a Bayesian method based on the decomposition of the light curve into its Fourier coefficients. Their results suggest that the performance degradation in the low number of transits case is faster for the Bayesian method than for the matched filter. This is because the matched filter makes use of assumptions about the transit shape. It is also shown that when the Bayesian method fails to detect a transit, it can still reconstruct it if the detection is performed using a matched filter. Our algorithm has not been directly compared to a matched filter. Its very design is based on the search for a short periodic signal in an otherwise flat lightcurve, which is itself an assumption about the shape of the signal. The matched filter makes use of more detailed knowledge of the transit shape and is therefore likely to perform better in the low transit number limit. However our algorithm with n=1 may provide already a very good approximation to the relatively simple shape that is a transit, and therefore perform nearly as well.

$\begin{figure} \par\includegraphics[width=8.5cm,clip]{h3360f11.eps} \end{figure}$

Figure 11: Evolution of the algorithm's performance with the number of transits in the light curve, i.e. the light curve duration (P=4 months, V=14.5). a) Using the period statistic only. b) Using the phase statistic only. c) Combining the two statistics. Dotted line: false alarm rate. Dashed line: missed detection rate. Solid line: mean error rate.

5.2.5 Differences in the two statistics

The two a posteriori probabilities show a different behavior. In general the phase statistic is far more discriminatory than the period statistic. The period statistic's lesser effectiveness may be explained in the following way. If the phase is wrong, even if the period is right, it is likely none of the transits will be matched. If the phase is right, whatever the period, at least the first transit will be matched by the model. First we consider the likelihood distribution a function of phase, normalised over all periods. For an incorrect phase the contribution from the correct period is nil as all transits are missed, but for the correct phase all trial periods produce a non-negligible contribution (the correct period of course contributing most). The likelihood distribution as a function of phase is therefore sharply peaked. Then we consider the likelihood distribution as a function of period, normalised over all phases. The contribution from the correct phase is non-negligible whatever the period. When the period is correct, the contribution from the correct phase is washed out by the contributions from all the incorrect phases. The likelihood distribution as a function of period is therefore less sharply peaked.

However the combined use of the two parameters is more successful than the phase statistic alone. The reason for this is illustrated in Fig. 8: in 2-D space the two distributions are aligned on a diagonal, such that no single value cutoff is optimal in either direction, compared to the line shown. In an upcoming paper, the direct use of a combined statistic shall be investigated. The global odds ratio described in Sect. 3.3 could be used for such a purpose. We have noted in Sect. 3.5 that the global odds ratio for a given lightcurve cannot be used as an absolute statitstic in the context of the present method. It can however be used as relative detection statistic, like $S_{{\rm per}}$ & $S_{{\rm ph}}$ , combined with bootstrap simulations.