Automated eccentricity measurement from raw eclipsing binary light curves with intrinsic variability



Introduction
Eclipsing binary (EB) systems are an important source for determining the physical properties of stars (Torres et al. 2010). By being locked in a gravitational dance they provide the opportunity to directly measure individual stellar masses, and by eclipsing each other they provide the opportunity to measure the stellar radii (see Southworth 2012, for a brief but clear introduction to the topic). To achieve these measurements, the right observational data are needed: high-precision photometric time series of the brightness of the EB and enough data points of the radial velocities of both stars obtained through spectroscopy. Due to the nature of the observations, obtaining spectra is far more time-intensive than photometry, although both types of observations are seeing big boosts in the available amount of data: for example, from the Convection, Rotation and planetary Transits (CoRoT; Auvergne et al. 2009), Kepler (Koch et al. 2010), and Transiting Exoplanet Survey Satellite (TESS; Ricker et al. 2015) missions for photometry and the Gaia-ESO (Gilmore et al. 2012), GALactic Archaeology with HERMES (GALAH; De Silva et al. 2015), SDSS-IV Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2; Blanton et al. 2017), and forthcoming SDSS-V Milky Way Mapper (MWM; Kollmeier et al. 2017) surveys for spectroscopy.
The orbital period determined from the eclipses can inform spectroscopic follow-up, and it is also an important ingredient in a further analysis of the system. Orbital eccentricities give the shape of the orbit and, by being tied to dissipation within the stars, provide a window into the stellar properties. The argument of periastron and inclination define two of the three angles of the orientation of the orbit.¹ The sum of the radii scaled by the semi-major axis and the ratio of radii provide the sizes of the stars relative to the size of the orbit, which also ties back to dissipative processes. The surface brightness ratio is related to the effective temperature ratio and, when combined with the scaled radii, tells us whether the two stars are more like twins or distant cousins.
There are several more properties that influence the shape of the light curve, one example being limb darkening, but they are usually subtler effects. Furthermore, many stars, including those in binaries, are found to exhibit intrinsic variability, such as pulsations (Lampens 2021, and citations therein). Systems that show both eclipses and pulsations offer the biggest opportunity to learn by virtue of the information encoded in their light variations (Huber 2015). Various studies have taken advantage of this, as shown by the in-depth analyses by, for example, Maceroni et al. (2009), Schmid et al. (2015), Johnston et al. (2019b, 2021, 2023), Maxted et al. (2020), Sekaran et al. (2020, 2021), Guo et al. (2022), Southworth & Bowman (2022), and Southworth & Van Reeth (2022). However, these studies only treated between one and several dozen systems. The analysis of these systems is often complicated: the amplitudes of pulsations can be comparable to or even exceed the depth of eclipses in the light curve, rendering it impossible to treat a single source of light variation separately. Any generalised approach should therefore account for both the eclipses present and any intrinsic variability that one or both stars exhibit.
Many interesting eclipsing systems have been manually investigated in depth to great effect, providing physical insights on a case-by-case basis (e.g. Schmid & Aerts 2016; Johnston et al. 2019a; Jennings et al. 2023; Pavlovski et al. 2023; Rosu et al. 2023, to name just a few). It is through these kinds of studies that we can calibrate in detail our models of stars, for example with regard to the discrepancy between the dynamical and evolutionary masses of intermediate- to high-mass stars (e.g. Burkholder et al. 1997; Guinan et al. 2000; Weidner & Vink 2010; Massey et al. 2012; Tkachenko et al. 2020; Johnston 2021). This underlines the value of these systems for the calibration of stellar models. With the number of known EBs in the thousands and increasing, investigating every one of them by hand in a reasonable amount of time is infeasible. If a portion of the analysis could be done in an automated way, this would give the researcher more time to take a more detailed look at a select number of promising systems. Additionally, there is a lot of power in numbers, as certain aspects of stellar physics can only be constrained by being compared with the population as a whole (see e.g. Van Eylen et al. 2016; Lurie et al. 2017; Wells & Prša 2021). Altogether, high levels of automation in data analysis are needed to achieve many of the goals in stellar astrophysical research today.
Notable work towards automated pipelines for EB analysis includes Detached Eclipsing Binary Light curve fitter (DEBiL; Devor 2005), Eclipsing Binary Automated Solver (EBAS; Tamuz et al. 2006), and Eclipsing Binaries via Artificial Intelligence (EBAI; Prša et al. 2008). The DEBiL analysis code is part of a larger EB detection pipeline and uses simulated annealing and the downhill simplex method (Nelder & Mead 1965) to solve the inverse problem. EBAS has a strategy for making an initial guess of the parameters using grid searches, gradient descent, and light curve geometry, and then employs simulated annealing as well. DEBiL uses its own light curve model implementation of limb-darkened spherical stars, whereas EBAS uses the more complex EBOP light curve generator. With EBAI, Prša et al. (2008) took a different approach and used machine learning with neural networks trained on the rigorous Wilson & Devinney (1971) model to arrive at a fast estimator for EB parameters. All three pipelines were applied to the published database of Optical Gravitational Lensing Experiment (OGLE; Udalski et al. 1992) EBs (Wyrzykowski et al. 2003), which at the time typically had of the order of a few hundred data points per target. However, while two of the pipelines include a measure of ellipsoidal variability, none account for the intrinsic variability in the form of pulsations that we are specifically pursuing in this work, given its high astrophysical value.
We present a new and fully automated method and its implementation, STAR SHADOW, for the analysis of EB light curves with and without intrinsic variability. Starting from an input light curve, the algorithm can provide the orbital period, several orbit- and eclipse-specific parameters such as eccentricity, and a set of sinusoids that are added to the model of eclipses to fully describe the light variations in the time series. If further investigation into a system's eclipses or the remaining variability is desired, the removal of the other signal, be it the eclipses or the pulsations, is greatly simplified. We validated our method on artificial data and applied it to a population of EBs from the Kepler database (Prša et al. 2011; Slawson et al. 2011; Kirk et al. 2016) to further validate the computed orbital periods. For the Kepler population, we determined the orbital configurations, in particular eccentricities, and the properties of the variability intrinsic to the binary components. We provide a detailed description of the method and its application to a number of realistic synthetic test cases in this paper.

Light curve analysis
We developed an automated recipe for the analysis of EB light curves to handle large numbers of targets in a consistent, time-efficient way and without manual involvement. The implementation of the recipe, called STAR SHADOW (Satellite Time series Analysis Routine using Sinusoids and Harmonics Automatedly for Double stars with Occultations and Waves), is publicly available on GitHub² (version 1.1.7a was used here) and comes with a short guide to get started. Since one of the main aims is the application to data from the TESS mission, the algorithm works best for high-duty-cycle satellite time series photometry, and there are a few TESS-specific features like loading in TESS data products and recognition of TESS observing sectors. The algorithm is otherwise fully agnostic to the source of the input light curve, so in the following text, if we mention 'sectors', this can be taken to mean any subdivision of the data in the time domain. The method is meant to be applied to known EBs with space-based time series data of tens of ppm precision, which covers the TESS, Kepler, and PLAnetary Transits and Oscillations of stars (PLATO; Rauer et al. 2022) missions, and it is assumed that care has been taken in detrending the raw light curves. Our own two main goals are the determination of the orbital eccentricity and a measure for the significance of variability in the light curve after subtraction of the eclipses.

Eclipses
Fig. 1. Flowchart of the STAR SHADOW analysis steps. Blue rectangles contain analysis steps that are themselves grouped into four more general functionalities. Read and write operations are in yellow parallelograms, and the orange diamond denotes a decision point to redo a number of steps concerning the orbital period.

Top level overview
We provide a visual representation of the flow of the analysis steps in Figure 1, with additional example analysis output given in Appendix B. The recipe starts with an iterative pre-whitening procedure meant to capture all the statistically relevant (with regard to the Bayesian information criterion) variability in the light curve in a mathematical model of sine waves. The parameters of the sinusoids are optimised and, if not given beforehand, the orbital period is determined. Frequencies in the previously extracted list that match a theoretical orbital harmonic within a tolerance are forced to the exact multiple of the orbital frequency, thus replacing all frequencies within the tolerance with a single harmonic. If frequencies are closely spaced around the harmonics, this approach will remove too many frequencies from the model: an additional round of iterative pre-whitening ensures that the model of the light curve is complete. The model is again optimised, now including the orbital period as a free parameter, although its value tends to only change by a fraction of the period error estimate at this point, if at all.
The eclipses are located in the time series using their signatures in the first and second time derivatives, using only the model of harmonic sinusoids for the time derivatives to reduce the chance of mistaking intrinsic variability for eclipses. Examples of what is meant by such eclipse signatures can be seen in Figure 2. The eclipse timings and depths are measured by finding specific extrema and zero points in the two time derivatives. The eclipse detection is based on the methodology described in IJspeert et al. (2021, Section 3).
The eclipse timings and depths are translated into some of the orbital and physical parameters of the system: eccentricity, argument of periastron, inclination, sum of the scaled radii, ratio of radii, and ratio of surface brightness. This is done using Kepler's second law and by iteratively solving a set of equations that relate specific points in the orbit to the system's properties. This is discussed further in Sect. 2.2.3.
The orbital and physical parameters are used as a starting point to fit a simple light curve model of eclipses to the data, to potentially increase the accuracy of some of the parameters. The model assumes spherical stars of uniform surface brightness, and we refer to it later as just the 'eclipse model' for brevity. The fit is done in three steps: fit only the eclipse model to the original light curve, subtract the eclipse model and iteratively pre-whiten the residuals, and then optimise the full model of eclipses and sinusoids together.
Finally, the ratios of eclipse depth to variability amplitude are computed to have a quick statistic of the balance between the two signals. At every step that adds, removes, or otherwise alters sinusoids, the extracted sinusoids are checked for significance against a signal-to-noise threshold, but not removed based on this check.

Detailed description
The model of the light curve constructed during the iterative pre-whitening procedure also includes a linear trend for each sector. For TESS data, the linear trend is split up according to the first and second half of each sector; each piece is described by an independent slope and a y-intercept. This captures the average level of the light curve as well as remaining linear instrumental trends in the TESS data products, which may include sudden jumps in the light curve of a sector due to momentum dumps of the gyroscopes. Where the linear trend is split up can be defined by the user. The slope and y-intercept are calculated through their maximum likelihood estimators (MLEs), and updated at each step of the subsequent iterations of the pre-whitening as well as during optimisation procedures. The formulae of the MLEs are provided in Appendix A. The time axis of the data is mean-subtracted throughout the analysis, which ensures a minimal correlation between parameters such as the slope and the y-intercept, and between the frequency and the phase. This means that sinusoid phases and eclipse times are measured with respect to this reference time. For the linear trends, the mean is computed for each separate sector, so their y-intercepts are with respect to those time points.
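As a minimal sketch of the per-sector trend estimation (our own illustration, not the STAR SHADOW source): under white Gaussian noise the MLE of a linear trend reduces to ordinary least squares, and mean-subtracting the time axis decorrelates the slope from the y-intercept.

```python
import numpy as np

def linear_trend_mle(t, y):
    """OLS slope and y-intercept of one sector; these are the Gaussian MLEs.

    The sector's time axis is mean-subtracted, so the y-intercept estimate
    is simply the mean flux and is uncorrelated with the slope estimate.
    """
    tc = t - t.mean()                       # mean-subtracted time axis
    slope = np.sum(tc * y) / np.sum(tc**2)  # least-squares slope
    intercept = y.mean()                    # intercept at the mean time
    return slope, intercept
```

Each sector (or half-sector, in the TESS case) would get its own call, and the estimates are cheap enough to recompute after every pre-whitening iteration.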

Frequency analysis
This subsection discusses the 'sinusoids' block in the flow diagram (Figure 1) and the visualisation in Figure B.1. In the iterative pre-whitening procedure we build a model of the light curve in terms of sine waves,

model(t) = Σᵢ Aᵢ sin(2π fᵢ t + ϕᵢ),    (1)

where t is time, f is frequency, A is amplitude, and ϕ is phase. In this procedure, the next sinusoid to be subtracted is picked by finding the highest-amplitude peak in the Lomb-Scargle periodogram (Lomb 1976; Scargle 1982). The periodogram is oversampled by a factor of 10 with respect to the Rayleigh criterion given by 1/T, where T stands for the total time base of observations.³ The highest peak in the periodogram is oversampled by an additional factor of 100 to obtain a precise position (i.e. frequency) and amplitude. The phase of the sinusoid is determined directly from the periodogram as well, using the formula described by Hocke (1998, Equation 12).
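The extraction step can be illustrated with a naive amplitude spectrum standing in for a true Lomb-Scargle implementation (a sketch with our own function names, assuming near-even sampling; the real procedure also takes the phase from the periodogram):

```python
import numpy as np

def amplitude_spectrum(t, y, freqs):
    # Project the light curve onto sine and cosine at each trial frequency.
    arg = 2 * np.pi * freqs[:, None] * t[None, :]
    re = (y * np.cos(arg)).sum(axis=1)
    im = (y * np.sin(arg)).sum(axis=1)
    return 2 * np.sqrt(re**2 + im**2) / len(t)  # sinusoid amplitude estimate

def highest_peak(t, y, oversample=10, refine=100):
    """Frequency and amplitude of the highest peak, with two-stage zoom."""
    T = t.max() - t.min()                 # total time base
    df = 1 / T / oversample               # Rayleigh criterion / oversample
    f_nyq = 0.5 / np.median(np.diff(t))   # approximate Nyquist frequency
    freqs = np.arange(df, f_nyq, df)
    amp = amplitude_spectrum(t, y, freqs)
    f0 = freqs[np.argmax(amp)]
    # Zoom in around the coarse peak with an extra refinement factor.
    fine = np.linspace(f0 - df, f0 + df, 2 * refine)
    amp_fine = amplitude_spectrum(t, y, fine)
    i = int(np.argmax(amp_fine))
    return fine[i], amp_fine[i]
```

For production use, a fast Lomb-Scargle routine would replace the quadratic-cost projection above.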
A sinusoid is only accepted if the Bayesian information criterion (BIC) of the residuals is reduced by at least two, indicating at least positive evidence of its presence by rule of thumb (Raftery 1995; Kass & Raftery 1995). The BIC decreases with more white-noise-like residuals (i.e. residuals containing less information) but increases with additional free parameters, of which each sinusoid initially has three (see Appendix A for a brief derivation). If the next sinusoid does not meet the requirement, the iteration is stopped. This was found to be a suitable procedure for CoRoT space photometry (Degroote et al. 2009). If red noise is present, and depending on its power spectral density, frequency peaks of sufficiently high signal-to-noise may be missed when their frequency lies above the regime affected by the red noise but their amplitude lies below that of the red noise.
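The acceptance rule can be sketched as follows (a simplified BIC under the usual Gaussian-noise assumption with constant terms dropped, not the exact Appendix A expression; function names are ours):

```python
import numpy as np

def bic(residuals, n_param):
    # n*ln(RSS/n) + k*ln(n): lower is better; each new sinusoid adds k += 3.
    n = len(residuals)
    return n * np.log(np.sum(residuals**2) / n) + n_param * np.log(n)

def accept_sinusoid(resid_before, resid_after, n_param_before):
    # Keep the candidate sinusoid only if the BIC drops by at least two.
    delta = bic(resid_before, n_param_before) - bic(resid_after, n_param_before + 3)
    return delta >= 2
```

Because the penalty term grows with ln(n), a sinusoid must explain progressively more residual variance to be accepted in longer time series.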
If at any point during the iterative pre-whitening a frequency is found within 1.5 times the Rayleigh criterion (Loumos & Deeming 1978) of a previously extracted frequency, their peaks in the periodogram will have been blended prior to subtraction. This blending means that the top of the true peak in the periodogram was shifted and the extracted frequency is in a different location. A mechanism is introduced to counteract this shift, which checks each frequency that is being extracted for closely spaced frequencies in the previously extracted list. If a chain of closely spaced frequencies is found around the currently extracted frequency, all frequencies in the chain are iteratively updated by removing one of them at a time and re-extracting it from the updated periodogram around that frequency. Here 'chain' is used to mean any consecutive set of frequencies that are closer than the frequency resolution (1.5/T) to their direct neighbour. This extends to frequencies that are further than the frequency resolution away from the currently extracted frequency. The iteration of this sub-loop is terminated when the BIC of the residuals no longer decreases.
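A chain in this sense can be found with a single pass outward from the newly extracted frequency (a sketch with our own naming, assuming a sorted frequency list):

```python
import numpy as np

def find_chain(freqs, f_new, f_res):
    """Indices of the consecutive run of frequencies around f_new in which
    every direct neighbour pair is closer than the frequency resolution f_res.

    `freqs` is assumed sorted in ascending order.
    """
    idx = int(np.argmin(np.abs(freqs - f_new)))  # nearest extracted frequency
    lo = hi = idx
    # Extend left while direct neighbours are closer than f_res.
    while lo > 0 and freqs[lo] - freqs[lo - 1] < f_res:
        lo -= 1
    # Extend right likewise.
    while hi < len(freqs) - 1 and freqs[hi + 1] - freqs[hi] < f_res:
        hi += 1
    return list(range(lo, hi + 1))
```

Note how the chain can reach members further than f_res from f_new itself, as long as each link in the chain is below f_res, matching the description above.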
The sinusoid parameters may now have shifted with respect to their initial extraction, and the algorithm may have to remove some sinusoids again based on the BIC. This is done in two parts, starting with removing individual sinusoids one by one and checking whether it results in a decrease in the BIC of more than two. Secondly, it is checked whether chains of closely spaced frequencies (or any consecutive sub-chains) can be replaced by a single frequency and yield an improvement.
At the end of each step that involves extracting or updating sinusoid parameters, their significance is checked based on several criteria and stored alongside the parameters. This has no further effect on the analysis process. Sinusoids with an amplitude that is consistent with zero within three sigma are marked insignificant, and for frequencies that fall within three sigma of each other, the lowest-amplitude sine wave is marked insignificant. The frequency-dependent periodogram noise level is computed by taking a running average over the Lomb-Scargle periodogram of the residuals of the light curve using a frequency window of one per day, as is common in the literature. The signal-to-noise for each sinusoid is the amplitude divided by this periodogram noise level, and those not meeting the threshold are marked insignificant. We used the signal-to-noise threshold as determined by Baran & Koen (2021, Equation 6), which is a function of the number of data points and computed for a false alarm probability of 0.1%. We selected candidate harmonics as well, marking only the sinusoids that are within three sigma of a theoretical harmonic frequency (this selection is only relevant later in the analysis).
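The running-average noise level can be sketched as follows (our own function; the window is in the same unit as the frequency grid, so a window of 1.0 corresponds to one per day for frequencies in cycles per day):

```python
import numpy as np

def periodogram_noise(f_grid, amps, window=1.0):
    """Frequency-dependent noise level: running mean of the residual
    amplitude spectrum in a window centred on each grid frequency."""
    noise = np.empty_like(amps)
    for i, f in enumerate(f_grid):
        sel = np.abs(f_grid - f) <= window / 2
        noise[i] = amps[sel].mean()
    return noise
```

The signal-to-noise of a sinusoid is then its amplitude divided by this noise level evaluated at its frequency.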
At the end of the frequency extraction, it is necessary to do a multi-sinusoid non-linear optimisation to get closer still to the real frequencies present in the light curve (Bowman & Michielsen 2021, Sect. 3). We used the minimize function from the SciPy (Virtanen et al. 2020) optimisation module with the Limited-memory Broyden-Fletcher-Goldfarb-Shanno with Bounds (L-BFGS-B) method and analytical gradients (see Appendix C for more insight into the choice of method and the mentioned gradient formulae). Doing such an optimisation whereby all sinusoids are simultaneously fitted is extremely time-consuming. Therefore, to limit the time taken, the algorithm fits in groups of 20 to 25 sinusoids. In small-scale simulated tests, this reduced the fit performance, in terms of the final BIC value, by about one-quarter of what a fully simultaneous fit could have reached, but the time saved is highly disproportional to this performance decrease (orders of magnitude). Groups are based on the amplitude of the sinusoids, bundling those with similar amplitudes together with the idea that those will have a similar level of influence on each other and on the likelihood function. The division between groups is made by sorting the amplitudes in descending order and making the cut where the largest jump in amplitude occurs within the given constraints of minimum and maximum group size. Group size is not fixed to one number in order to increase the amplitude disparity between groups. The frequencies, amplitudes, and phases of the sinusoids in each group are fitted as free parameters in turn until all sinusoids have been optimised. The slopes and y-intercepts of all linear trends always remain free parameters throughout the fitting process.
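The amplitude-based grouping can be sketched as follows (our own implementation of the cut-at-largest-jump rule described above; the real code may handle edge cases differently, for example a remainder smaller than the minimum group size):

```python
import numpy as np

def group_by_amplitude(amps, min_size=20, max_size=25):
    """Split sinusoid indices into groups of similar amplitude.

    Sorts amplitudes in descending order and cuts each group where the
    largest amplitude jump occurs within the allowed group-size window.
    """
    order = np.argsort(amps)[::-1]  # indices, largest amplitude first
    groups, start = [], 0
    while len(order) - start > max_size:
        # Candidate cut positions give group sizes in [min_size, max_size].
        window = order[start + min_size - 1: start + max_size + 1]
        jumps = -np.diff(amps[window])          # drops between neighbours
        cut = start + min_size + int(np.argmax(jumps))
        groups.append(list(order[start:cut]))
        start = cut
    groups.append(list(order[start:]))  # remainder forms the last group
    return groups
```

Each returned group is then fitted in turn, largest amplitudes first, so the dominant sinusoids are refined before the weaker ones.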

Frequency analysis with harmonics
This subsection discusses the 'harmonics' block in the flow diagram (Figure 1) and the visualisation in Figure B.2. We now introduce the orbital period as a free parameter.⁴ If known, an initial period can be provided, which is then further optimised. Alternatively, a period-finding algorithm is used to obtain the unknown orbital period.
The period-finding algorithm makes use of the already obtained information of sinusoids by only searching at the extracted frequencies and fractions of those. A combined phase dispersion minimisation measure (Stellingwerf 1978) and Lomb-Scargle amplitude statistic is computed as described in Saha & Vivas (2017). In addition, this is multiplied by two more statistics that use the known list of frequencies and are geared towards improving the odds of finding the correct EB orbital period. These two statistics are calculated as follows: (1) The number of harmonic sinusoids present in the extracted frequencies. At each tested orbital frequency, the theoretical orbital harmonic frequencies are computed and compared to the list of extracted frequencies. Those extracted frequencies that fall within the frequency resolution (1.5/T) of a theoretical harmonic are selected and count towards the length of the harmonic series for that orbital frequency. (2) The filling factor of the harmonic series. The second statistic is also based on the harmonic series and captures the filling factor, calculated as the number of harmonics present in the extracted frequency list divided by the total number of theoretically possible harmonics below the Nyquist frequency. The Nyquist frequency is defined as the sampling rate divided by two; we used one over the minimum time step for the sampling rate.
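The two harmonic statistics can be sketched as follows (our own function; all frequencies in the same unit):

```python
import numpy as np

def harmonic_stats(f_orb, freqs, f_res, f_nyquist):
    """Harmonic series length and filling factor at a trial orbital
    frequency f_orb, given the list of extracted frequencies."""
    n_max = int(f_nyquist // f_orb)               # harmonics below Nyquist
    harmonics = f_orb * np.arange(1, n_max + 1)
    close = np.abs(freqs[:, None] - harmonics[None, :]) < f_res
    n_harm = int(np.any(close, axis=1).sum())     # extracted freqs on a harmonic
    filling = np.any(close, axis=0).sum() / n_max # fraction of orders present
    return n_harm, filling
```

Multiplying these two with the combined phase-dispersion and amplitude statistic then favours trial frequencies whose harmonic comb is both long and densely filled.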
After multiplying these four statistics, the highest-scoring frequency is taken as the initial orbital frequency. The best orbital frequency is further refined locally by minimising the sum of squared distances between the series of extracted harmonic frequencies and the theoretical harmonic frequencies, in a grid with a sampling of 0.001% over a 1% interval around the orbital frequency.
As a final check, the two statistics on the series of harmonics described above are calculated for several multiples of the orbital period: factors of 1/2, 2, 3, 4, and 5. There are two conditions for changing to a period multiple, concerning the filling factor itself and the product of the two statistics (filling factor times harmonic series length). If this product for a multiple exceeds 1.1 times its value for the original period, and the filling factor is at least 0.85 times that of the original period, the final orbital period is set to the multiple achieving the best score. Alternatively, specifically for doubling the period, the period is doubled if both of the mentioned ratios exceed 0.95. The threshold values for this decision were empirically determined based on the tests described in Sections 3 and 4.
The error on the period is estimated by taking the simple linear regression (SLR) uncertainty, where each observed eclipse is represented by a single timestamp. This error is only a function of the number of observed eclipses and the uncertainty on the individual timestamps, which is taken to be half the integration time of the observations at this stage. The number of observed eclipses is approximated by rounding down the time base divided by the orbital period, thus assuming no gaps.
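The SLR uncertainty follows from the slope error of a straight-line fit of eclipse time versus integer epoch; with N equally weighted timestamps of uncertainty sigma_t it has a closed form (a sketch with our own naming):

```python
import numpy as np

def period_error_slr(n_eclipses, sigma_t):
    # Slope uncertainty of the time-vs-epoch regression with epochs 0..N-1:
    # sigma_P = sigma_t / sqrt(sum((n - mean(n))**2))
    #         = sigma_t * sqrt(12 / (N * (N**2 - 1))).
    n = np.arange(n_eclipses)
    return sigma_t / np.sqrt(np.sum((n - n.mean())**2))
```

The estimate shrinks roughly as N^(-3/2), so long time bases with many observed eclipses yield very precise periods even with modest per-eclipse timing precision.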
Once the algorithm has a binary period, it selects the orbital harmonic frequencies. All frequencies falling within 1.5 times the Rayleigh criterion of the theoretical harmonics are removed and replaced by exact multiples of the orbital frequency. The new amplitudes and phases are obtained from the periodogram. From this point on, the orbital harmonics have a reduced number of free parameters (from three for unconstrained sinusoids to two), and the orbital period is one additional free parameter. Despite the reduction in parameters and the updated harmonic sinusoids, it is possible for this step to increase the overall BIC of the residuals, as too many sinusoids may have been removed.
Since harmonics have one free parameter fewer than unconstrained sinusoids, it is possible that additional harmonics can be extracted from the periodogram while decreasing the BIC of the residuals by more than two. This is attempted by calculating the amplitude and phase at harmonic frequencies not yet occupied in the frequency list, up to the Nyquist frequency, and checking the significance criterion before they are added to the sinusoidal model of the light curve. Additionally, due to the removal of candidate harmonics, it is likely that more unconstrained sinusoids can be extracted, analogously to the initial pre-whitening. After adding sinusoids to the list, the algorithm checks again whether individual frequencies have to be removed, or whether closely spaced chains of frequencies can be replaced by a single frequency, based on the BIC of the residuals. If a chain of close frequencies contains a coupled harmonic sinusoid, the algorithm attempts to replace the chain with the harmonic (and therefore does not remove the harmonic). The harmonic does not change in frequency, but its amplitude and phase can change. Finally, a second multi-sinusoid non-linear optimisation, more or less analogous to the first, is performed. The only differences are that a set of harmonic sinusoids is now coupled to the orbital period and that the period is optimised along with the amplitudes and phases of the harmonics.

Eclipse timing measurements
This subsection discusses the 'timings' block in the flow diagram (Figure 1) and the visualisation in Figure B.3. For the purpose of locating the eclipses, they are separated from other variability in the light curve as much as possible by only using the model of orbital harmonic sinusoids (henceforth: harmonic model). Variability at the harmonic frequencies other than the eclipses cannot yet be separated, which can cause problems when that variability has high frequencies relative to the orbital frequency. For this reason, the algorithm initially determines the approximate eclipse positions using orbital harmonics up to and including the twentieth harmonic. Including fewer harmonics does not help, as the eclipses lose too much detail. If this fails, it is attempted again with the first forty harmonics and finally with all the orbital harmonics. We provide a visualisation in Figure 3 to aid in the explanation of the analysis steps below, matching the markers in the figure with numbers and letters between brackets in the text.
The first and second analytically computed time derivatives of the harmonic model inform the automated recognition of the eclipse signatures. The highest peaks in the absolute first derivative belong to the steepest slopes, which in most EB light curves will mean an eclipse ingress or egress. The sign of the first derivative at these locations allows the algorithm to distinguish the start of an eclipse from its end. The algorithm proceeds by determining the first and last contact points⁵ of the eclipses by following several steps, as described below. Starting from the extrema in the first derivative (1) and moving away from the centre of the eclipse, we find its first zero point (2). Between the extrema and these zero points, there is a minimum in the second derivative (3) where the curvature of the eclipse is greatest (most negative). The minima in the second derivative are taken to be the inner limits for the position of the first and last contact points. The algorithm checks for the presence of local minima in the absolute first derivative (4) between the zero points in the first derivative and the minima in the second derivative. If present, these local minima replace the zero points in the first derivative as the outer limits for the position of the first and last contact points. The times of first and last contact (a) are taken to be midway between the described inner and outer limits, and the interval (3-4) is taken to be the three-sigma error region.
Towards the centre of the eclipse, the algorithm determines the first and last internal tangency points⁶, analogously to the contact points albeit in the opposite direction. Starting again from the extrema in the first derivative (1) and moving towards the centre of the eclipse, we find its first inner zero point (5). Between the extrema and these zero points, there is a maximum in the second derivative (6) where the curvature of the eclipse is greatest (most positive). The maxima in the second derivative are taken to be the outer limits for the position of the first and last tangency points. The algorithm checks for the presence of local minima in the absolute first derivative (7) between the zero points in the first derivative and the maxima in the second derivative. If present, these local minima replace the inner zero points in the first derivative as the inner limits for the position of the first and last tangency points. If the algorithm finds two distinct maxima in the second derivative and the second derivative reaches zero in between them, the first and last internal tangency points (b) are defined as midway between their respective limits, and the interval (6-7) is taken to be the three-sigma error region. The internal tangency points then delimit the flattened-off bottom of the eclipse. If, instead, the algorithm finds that these two maxima coincide, or the curvature does not reach zero in between them, no such flat bottom is defined. In the case of a highly slanted (asymmetric) eclipse, the second derivative might not reach a strong enough maximum on one side of the eclipse for the algorithm to identify that edge.
The eclipse minima (c) are defined to be in the middle between the internal tangency points, and not as the actual minimum in the harmonic model. This is more robust in cases with variability at harmonic frequencies that does not belong to the eclipses, and it corresponds more closely to the time of conjunction. Possible asymmetry can still be accounted for because the internal tangency points may be anywhere between the contact points.
The best consecutive primary and secondary eclipse combination is chosen by taking the pair with the largest combined depth. If no secondary is found, or no eclipses are found at all, the analysis is stopped. The method for the determination of the orbital eccentricity does not hold for systems with no detected secondary eclipse. This method of locating the eclipses was already described in IJspeert et al. (2021, Section 3), but it was adapted and simplified to work on analytic sums of sine waves instead of raw data. We note that in the left panel of first derivatives in Figure 3, points 7 are not truly local minima in the portrayed derivative of the theoretical eclipse model. Using finite sums of sine waves generally results in many more local minima and maxima than is the case for a theoretical model of eclipses, because of the inflection points around (sharp) bends, thus allowing for the simple identification of points 7 as local minima in the figure if a flat eclipse bottom is present.
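Because the harmonic model is an analytic sum of sinusoids, the derivatives used in the steps above are available in closed form rather than through finite differences (a sketch with our own naming, for sinusoids of the form A sin(2π f t + ϕ)):

```python
import numpy as np

def harmonic_model_derivatives(t, freqs, amps, phases):
    """Harmonic model and its first two analytic time derivatives."""
    arg = 2 * np.pi * np.outer(freqs, t) + phases[:, None]
    model = (amps[:, None] * np.sin(arg)).sum(axis=0)
    d1 = ((2 * np.pi * freqs * amps)[:, None] * np.cos(arg)).sum(axis=0)
    d2 = -(((2 * np.pi * freqs) ** 2 * amps)[:, None] * np.sin(arg)).sum(axis=0)
    return model, d1, d2
```

The extrema and zero points of d1 and d2 on a dense time grid then give candidate markers (1) through (7) for the eclipse edges.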
Once two eclipses have been detected, their timing measurements are refined using all significant harmonics. Limiting to lower harmonics makes the detection more robust, but the downside is that the eclipses look wider and shallower. An altered version of the eclipse detection algorithm is used that has the benefit of starting from the previously detected positions and is more robust against high-frequency interference. The characteristic points measured are the same, but the method of arriving at them is different.
Starting from the previously determined inner zero points in the first derivative (5), the algorithm adapts them to the different number of sinusoids included in the first derivative by moving to the nearest zero point. Subsequently, the algorithm determines the updated extrema in the first derivative (1) by searching between the inner zero points and the previous extrema in the first derivative. Moving away from the eclipse centre from the newly found extrema in the first derivative, we find the outer zero points (2). If secondary (local) extrema occur within the outer zero points, and these are sufficiently high (>80% of the height), the extrema in the first derivative (1) are shifted to those locations. The minima in the second derivative (3) are then found between the extrema and outer zero points in the first derivative. Local minima in the first derivative (4) are found by moving outwards from the minima in the second derivative. The inner zero points (5) are then adjusted again, by moving inwards to zero from the extrema in the first derivative. The maxima in the second derivative (6) are found between the extrema and inner zero points in the first derivative. Inner local minima in the first derivative (7) are obtained by moving inwards along the first derivative curve from the maxima in the second derivative. If the ingresses or egresses are found to be less than a quarter of the width of the initial localisations (obtained with fewer harmonics), this may indicate that the algorithm did not capture the full extent of the eclipse due to variability not related to the eclipses. The algorithm then adjusts the outer eclipse points (1, 2, 3, and 4) farther outwards by moving to the next extrema in the first derivative (1) and repeating the steps to find the other points. This adjustment is only accepted if it does not cross the original outer edges (4) as determined in the localisation step, and if it increases the depth by more than 20 percent. If the depth is not substantially increased, we are likely already outside of the eclipse.
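The "move to the nearest zero point" step that anchors this refinement can be sketched as follows; this is a minimal illustration with a hypothetical helper, not the pipeline's actual code:

```python
import numpy as np

def nearest_zero_crossing(deriv, start):
    """Return the index of the sign change in `deriv` nearest to `start`.

    Hypothetical helper illustrating the refinement step: when the number
    of harmonics in the model changes, a previously found zero point in
    the first derivative is re-anchored by moving to the closest sign
    change of the newly evaluated derivative.
    """
    signs = np.sign(deriv)
    # indices i where deriv changes sign between samples i and i+1
    crossings = np.flatnonzero(signs[:-1] * signs[1:] < 0)
    if crossings.size == 0:
        return start
    return int(crossings[np.argmin(np.abs(crossings - start))])
```

In the pipeline this would be applied to the harmonic-model derivative sampled on a fine phase grid.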
Improved error estimates for the individual timings, and error estimates for the individual depths, are determined using the noise level and the slope of the eclipse ingress and egress. The noise level is calculated as the standard deviation of the residuals of the light curve. For errors in the individual eclipse times of ingress and egress, the algorithm determines an average slope of each ingress and egress by fitting a straight line to each side of the eclipse. Dividing the noise level by this slope gives the time it takes the flux level to cross the noise level, thus providing an estimate of the timing precision for contact points. Errors in the individual eclipse times are derived from this by taking half of it and adding it in quadrature to the error estimates obtained during eclipse detection: σ_t² = σ_detect² + (1/4)(σ_noise/slope)². (2) The factor 1/4 comes from halving the 'noise-crossing time' to turn it into an error estimate. Analogous to the orbital period error, the error on the final eclipse timings, over all observed eclipses, is determined from the SLR uncertainty, which is only a function of the number of observed eclipses and the uncertainty on the individual timings given by Equation 2. The error on the orbital period is also recalculated, now using the averaged error on the primary and secondary eclipse times as input instead of the time stamp error.
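As a hedged sketch, the combination of the noise-crossing time and the detection-stage error described above could look like this (function name and interface are illustrative):

```python
import numpy as np

def timing_error(noise_level, slope, sigma_detect):
    """Timing error estimate for an eclipse contact point (sketch).

    Half the 'noise-crossing time' (noise level divided by the in/egress
    slope) is added in quadrature to the detection-stage error estimate;
    the factor 1/4 in the text is the square of that factor of one half.
    """
    t_cross = noise_level / abs(slope)
    return np.sqrt(sigma_detect**2 + 0.25 * t_cross**2)
```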
The error in the flux level at first and last contact is estimated as the noise level, and the error at the bottom of the eclipse is either the noise level (if there is no flat bottom) or the standard deviation of the flux measurements from first to last internal tangency (if there is a flat bottom). These flux level errors then translate into depth errors by adding them in quadrature: σ_d² = σ_bottom² + (1/4)(σ_first² + σ_last²). (3) The factor 1/4 comes from averaging the flux levels of the first and last contact points in the calculation of the eclipse depth.
To obtain the error on the final eclipse depths, over all observed eclipses, we divided Equation 3 by the square root of the number of eclipses.
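A minimal sketch of this depth-error propagation, assuming the depth is the mean of the two contact flux levels minus the bottom flux level (names are illustrative):

```python
import numpy as np

def depth_error(sigma_bottom, sigma_first, sigma_last, n_eclipses=1):
    """Depth error from the flux-level errors at the eclipse bottom and at
    first/last contact (sketch). The depth is the mean of the two contact
    flux levels minus the bottom level, hence the factor 1/4 on the
    contact-point variances; averaging over all observed eclipses divides
    the result by sqrt(N).
    """
    sigma_d = np.sqrt(sigma_bottom**2 + 0.25 * (sigma_first**2 + sigma_last**2))
    return sigma_d / np.sqrt(n_eclipses)
```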
Eclipses are rejected under certain conditions, at the end of the detection phase but also at points during the determination of the best candidates for the primary and secondary eclipse. Most notably, the algorithm applies a significance criterion to the depth and duration of the eclipses based on the estimated errors and a signal-to-noise threshold. The depth must exceed one sigma of the estimated error and one half of the standard deviation of the residuals, whereas the duration must exceed three sigma of the estimated error. Examples of other criteria, applied during the selection of candidate eclipses, look at differences between overlapping candidates and whether a candidate changes greatly in depth between harmonic models with all or only low-frequency harmonics (indicating the occurrence of unrelated signal). These selection criteria are meant to filter out as many false positive candidates as possible before choosing the final set of eclipses to use in the rest of the analysis. However, these criteria or the conditions applied at the end may result in only a primary eclipse (or none at all), in which case this is logged and the analysis ends.
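The depth and duration significance criterion described above can be sketched as a simple predicate; this is an illustrative reading of the text, not the pipeline's exact implementation:

```python
def eclipse_is_significant(depth, depth_err, duration, duration_err, resid_std):
    """Significance criterion sketch: the depth must exceed its one-sigma
    error and half the residual standard deviation, and the duration must
    exceed three times its estimated error.
    """
    deep_enough = depth > depth_err and depth > 0.5 * resid_std
    long_enough = duration > 3.0 * duration_err
    return deep_enough and long_enough
```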

Eclipse analysis
This subsection discusses the 'eclipses' block in the flow diagram (Figure 1) and the visualisation in Figure B.4. Determination of the orbital and physical properties - including the eccentricity, argument of periastron, orbital inclination, sum of the scaled radii, ratio of the radii, and surface brightness ratio - is initially done through formulae coupling these properties to the timings and depths of the primary and secondary eclipse. These formulae can be found in, for example, Kopal (1959), but some are repeated here in altered form for completeness. As the main link between time and orbital elements, we need Kepler's second law in integral form: t_b − t_a = (P/2π)(1 − e²)^{3/2} ∫_{ν_a}^{ν_b} dν/(1 + e cos ν)², (4) where t_a and t_b represent time points in the orbit, P is the orbital period, e the eccentricity, and ν_a and ν_b the true anomaly angles corresponding to the time points. This equation has an analytic solution, which we show in Appendix A (Equation A.32). Since we do not know the true anomaly (ν) angles beforehand, we break it down into the argument of periastron (ω) and the phase angle (θ). The argument of periastron is a free parameter defining part of the orbital orientation in the sky; the phase angle gives the position along the orbit and is zero near primary minimum and π near secondary minimum. The algorithm now only needs to determine the phase angles at several strategic points in the orbit to calculate useful values with the integral (4). These points are the times of minima, the times of contact, and, in certain cases, the times of internal tangency as well. Times of minima are reached when the projected orbital separation reaches a minimum, hence when its derivative becomes zero (Equation 6, with i the orbital inclination). From this formula, we can see that in circular orbits the minima are at phase angles of exactly zero and π. Times of contact are reached when the projected distance between the two stars is equal to the sum of both radii (scaled by the semi-major axis), leading to Equation 7, where we introduce the angles ϕ, which are the phase angles of the contact points as measured from zero and from π for the primary and secondary eclipse, respectively. In their exact form, Equations 6 and 7 do not have analytical solutions. Approximate analytical solutions do exist, but these are only valid for low eccentricities.
To obtain accurate results for all eccentricities, we used an iterative form of Kepler's second law and let the algorithm solve it numerically for ψ, where ψ relates to the component e cos(ω) of the eccentricity. We write an expression for e cos(ω) that depends on an estimate of e sin(ω). For e sin(ω) we write a formula with an explicit dependence on inclination, as well as on the auxiliary angle ϕ_0, which further simplifies under the assumption of small ϕ_0 and i = 90°. Here we introduce the notation τ for the time duration from conjunction to either contact point; the subscript in τ_{n,m} indicates the primary (n = 1) or secondary (n = 2) eclipse and the first (m = 1) or last (m = 2) contact. Eclipse durations are denoted d_n. We used Equation 10 with i = 90° and the approximate formula for the auxiliary angle, ϕ_0, as in Kopal (1959). The angle ϕ_0 relates to the sum of the scaled radii, with a the semi-major axis. After obtaining the components of the eccentricity, the algorithm uses an iterative procedure to improve the value of ϕ_0 by computing the sum of the eclipse durations using Kepler's second law (Equation A.32) and Equation 7. A fast global minimisation method is then used to find the inclination, the ratio of radii, and the ratio of surface brightnesses by comparing the measured eclipse times and depths against the values computed using Equations A.32, 6, 7, and 14. All of the above assumes spherical stars of uniform brightness. Error estimates are made using a combination of analytical error formulae and the Monte Carlo approach of importance sampling. We provide more details for this step and the error formulae in Appendix C.0.5.
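For reference, the analytic solution of Kepler's second law between two true anomalies follows from the standard conversion to eccentric and mean anomaly; the sketch below uses those textbook relations, not the paper's exact Equation A.32:

```python
import numpy as np

def time_between_anomalies(nu_a, nu_b, period, ecc):
    """Time elapsed between two true anomalies in an eccentric orbit.

    Standard two-body relations: convert true anomaly nu to eccentric
    anomaly E via tan(E/2) = sqrt((1-e)/(1+e)) tan(nu/2), then to mean
    anomaly M = E - e*sin(E); the elapsed time is P/(2*pi) times the
    difference in mean anomaly.
    """
    def mean_anomaly(nu):
        e_anom = 2.0 * np.arctan2(np.sqrt(1.0 - ecc) * np.sin(nu / 2.0),
                                  np.sqrt(1.0 + ecc) * np.cos(nu / 2.0))
        return e_anom - ecc * np.sin(e_anom)

    d_mean = mean_anomaly(nu_b) - mean_anomaly(nu_a)
    return period * d_mean / (2.0 * np.pi)
```

By symmetry, the time from periastron to apastron is half the period for any eccentricity, which makes a convenient sanity check.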
Fitting eclipse models to a light curve with no prior information usually does not work since the parameter space is degenerate and the optimisation is prone to getting stuck in local minima.Having obtained the parameters from the eclipse timings and depths in the previous step, our pipeline avoids this problem by being able to start close to the optimal solution.
We constructed a simple physical eclipse model by assuming spherical stars of uniform surface brightness and calculating the fraction of light lost due to one star covering the other, which we refer to as the eclipse model. We have not included light from a third body in this model. Subtracting the fractional light loss from one gives the remaining normalised flux, l, where sb_1 and sb_2 stand for the surface brightness of stars 1 and 2, respectively. This formula gives the normalised flux for the primary eclipse; we multiply the right-hand fraction by sb_2/sb_1 to obtain the flux for the secondary eclipse. The covered area A_covered is the overlap of two circles as a function of the distance between their centres, s. The separation between the two centres is expressed in terms of the orbital parameters and as a function of the phase angle (Kopal 1959). In the equations above, the stellar radii r_i and the separation s are in units of the semi-major axis a, but this is left implicit for the simplicity of the formulae. The model is calculated at a regular interval of 0.001 rad in θ, which is converted into times using Kepler's law (Equation A.32). Linear interpolation is used to get the values at the times of the observations. This eclipse model is first fit to the data, starting from the previously obtained parameters, after which it is subtracted from the light curve and the residuals are pre-whitened analogously to the earlier steps. The obtained model of sinusoids, the linear model, and the eclipse model are then optimised simultaneously. We implemented two approaches here: Markov chain Monte Carlo (MCMC) sampling with PyMC3 (Salvatier et al. 2016) and SciPy's minimize with method L-BFGS-B as before (the default option). Since MCMC can deal with high dimensionality, it gives the full correlation structure as an output and thus additionally provides statistical error estimates. However, these estimates are found to be small compared to our other error estimates (and are thus underestimated; see Section 3). The MCMC procedure is also slower than the fitting method. We note that the orbital period and the time of primary minimum are fixed and no longer free parameters at this point. As time-dependent parameters are not included in our eclipse model at this moment, we recommend performing the analysis of the light curve in parts. This can reveal a possible time dependence of the parameters due to a changing orbit.
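The covered area of two overlapping circles used in this simple eclipse model follows standard circle-circle intersection geometry; a sketch assuming the usual lens-area formula, with all lengths in units of the semi-major axis:

```python
import numpy as np

def covered_area(r1, r2, s):
    """Overlap area of two circles with radii r1 and r2 whose centres are a
    distance s apart (sketch). Three regimes: no overlap, total eclipse of
    the smaller disc, and partial (lens-shaped) overlap.
    """
    if s >= r1 + r2:          # discs do not touch: no eclipse
        return 0.0
    if s <= abs(r1 - r2):     # smaller disc fully inside the larger one
        return np.pi * min(r1, r2)**2
    # lens-shaped overlap of two intersecting circles
    d1 = (s**2 + r1**2 - r2**2) / (2.0 * s)   # distance from centre 1 to chord
    d2 = s - d1
    a1 = r1**2 * np.arccos(d1 / r1) - d1 * np.sqrt(r1**2 - d1**2)
    a2 = r2**2 * np.arccos(d2 / r2) - d2 * np.sqrt(r2**2 - d2**2)
    return a1 + a2
```

Dividing this area (weighted by the covered star's surface brightness) by the total light of both discs gives the fractional light loss described in the text.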

Remaining variability
The level of intrinsic variability compared to the eclipse depths is computed to gain insight into the strength of potential pulsations or of harmonic sinusoids that are not captured by the eclipse model. Standard deviations are computed for the residuals of four different regression models: an eclipse model with no extra sinusoids, a model with only the first and second harmonic sinusoids, a model with only non-harmonic sinusoids, and the full model including all sinusoids. (For the fitting method, the sinusoids are again subdivided into groups.) All four models include the linear trend and the eclipse model. The algorithm then takes the ratio of the primary and secondary eclipse depths to each of these four standard deviations. While we have not further explored the variability in this work, additional steps of analysis on the remaining variability could easily be integrated into the framework. (Notes to Table 1: Frequency and amplitude ranges differ on a case-by-case basis, so we state the ranges of minima and maxima over all cases.)
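The depth-to-residual-scatter ratios could be computed along these lines; this is a hedged sketch in which the model names and the use of the smaller eclipse depth are illustrative choices:

```python
import numpy as np

def variability_ratios(depths, residuals_by_model):
    """Ratio of the smaller eclipse depth to the residual standard deviation
    of each regression model (sketch). `residuals_by_model` maps a model
    label (e.g. 'eclipse_only', 'low_harmonics', 'non_harmonic', 'full')
    to its residual array; higher ratios mean the eclipses dominate over
    the leftover variability.
    """
    min_depth = min(depths)
    return {name: min_depth / np.std(res)
            for name, res in residuals_by_model.items()}
```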

Application to synthetic data
We produced a set of 100 synthetic test light curves with randomly sampled parameters, including the orbital period, eccentricity, argument of periastron, inclination, scaled radii, surface brightness ratio, third light, and a number of pulsations represented by sinusoids. Light curves of a single TESS sector (27.4 days) and of a full year (13 sectors) at a 30-minute cadence were constructed to mimic TESS time series. The EB light curve models were produced with ellc (Maxted 2016), and sine waves of various frequencies and amplitudes were added to represent intrinsic variability, with Gaussian noise added on top to imitate real data. Table 1 summarises the ranges of input parameters considered. Appendix D.0.2 includes examples of some of the generated light curves; all light curves and their parameters are made available through the GitHub page of STAR SHADOW.
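A toy version of such a generator, with illustrative parameter ranges rather than those of Table 1 and a flat placeholder instead of an ellc eclipse model, might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

def synthetic_light_curve(time, eclipse_flux, n_sines=3,
                          freq_range=(0.1, 10.0), amp_range=(1e-4, 1e-2),
                          noise=5e-4):
    """Toy test-data generator sketch: start from an eclipse-model flux,
    add randomly drawn sinusoids for intrinsic variability, then add
    Gaussian noise. All ranges here are illustrative.
    """
    flux = eclipse_flux.copy()
    for _ in range(n_sines):
        freq = rng.uniform(*freq_range)          # cycles per day
        amp = rng.uniform(*amp_range)            # relative flux
        phase = rng.uniform(0.0, 2.0 * np.pi)
        flux += amp * np.sin(2.0 * np.pi * freq * time + phase)
    flux += rng.normal(0.0, noise, size=time.size)
    return flux

# one TESS sector (27.4 days) at 30-minute cadence
t = np.arange(0.0, 27.4, 30.0 / 60.0 / 24.0)
lc = synthetic_light_curve(t, np.ones_like(t))
```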
Due to the random nature of the generated parameters, some of the synthetic light curves contain no eclipses or have eclipses so shallow that they are impossible to spot by eye. These systems are marked in Figure 4 and later removed for a cleaner overview of the results. From the viewpoint of the analysis, there are several indicators for such difficult cases, most importantly the completion of all stages of the analysis. If the analysis was completed, the following parameters provide a handle on possibly erroneous results: the number of eclipse cycles (time base over period) and the depth of the secondary eclipse. We go into more detail on both of these in the following.
The analysis algorithm was run with no prior knowledge of the orbital period, which is the most important parameter for the rest of the analysis to produce accurate results. We show the performance of the global period search in Figure 4. In all relevant cases (n_cycles ≥ 2, visible eclipses), the algorithm was able to pick the correct multiple of the input period to better than 0.34% precision. More specifically, using the same selection criteria, the mean absolute deviation from the input period is 0.031%, with a median of 0.0009%. Further restricting our selection to cases with more than ten eclipse cycles reduces those numbers to a mean of 0.0011% and a median of 0.00048%.
As Figure 4 is sorted by the number of cycles, we can see the importance of this parameter as an indicator of the accuracy of the orbital period. This dependence on the number of cycles is taken into account in determining the error values, and in the majority of cases the estimated SLR errors include the input period value within one or two sigma. Figure 5 shows the difference between the measured and input periods divided by the SLR errors (χ_P) in a histogram and a kernel density estimation (KDE); the KDE is scaled such that its height is comparable to the histogram. The distribution has a broad peak at zero (width > 1), indicating accurate period measurements with precision estimates that overall underestimate the true errors.

Eccentricities
Further analysis depends on the secondary eclipse and thus on its correct identification and on the measurements of the time of minimum, the points of ingress and egress, and the depth. Figure 6 shows the eccentricity measurements and inputs versus the depth of the secondary eclipse in the top panel, and the deviation from the input in the bottom panel. The identification of the secondary becomes more difficult with decreasing secondary eclipse depth, as evidenced by the increasing scatter to the left in Figure 6. What makes the measurements more difficult at lower secondary depths is both the decrease in signal-to-noise and the ratio between the secondary eclipse depth and the amplitude of any intrinsic variability at high frequency compared to the orbital frequency. Both of these affect the precision with which we can localise the eclipse. Intrinsic variability can additionally impact the accuracy by shifting the eclipse position or, in the most extreme cases, by causing the method to miss the second eclipse entirely. For secondary eclipse depths above 0.05, we see that the tangential components only constitute a fraction of the difference, up to about 0.03 in e cos(ω).
Deviations in eccentricity generally stay within 0.1. Although the error estimates are representative in most cases, we also see cases with underestimated errors that are not explained by their secondary eclipse depth. We see a second maximum in the distribution of differences between measured and input eccentricities (see Figure 7), between values of -0.2 and -0.1, underestimating the input eccentricities. Inspecting these cases, we find no overarching indicator for their deviation, as their eclipses are correctly identified. We do find their deviations to stem almost completely from the radial component, e sin(ω). This is caused by an inaccurate determination of the eclipse start and end times. These cases are sometimes easily seen in their analysis output, but not always (see Figures D.28 and D.31 for examples). Figure 8 shows the distribution of the true deviations relative to the estimated errors, χ_e = (e_measured − e_input)/σ_e, which has a large width (> 1), meaning the errors are underestimated. The bi-modality of the former distribution has largely disappeared, which indicates that the errors are sufficiently large for the second maximum to be incorporated into the main distribution of points. The eccentricities obtained from translating the timings (not shown here; see Appendix D.0.1) and those obtained from the eclipse model (shown here) have a similar spread. The mean of both distributions lies at zero, suggesting no significant bias towards lower or higher values.
The MCMC method also provides error values (not shown here). These are generally smaller than the previous estimates and more often than not miss the input values. This shows that while the MCMC sampling provides an overview of the full correlation structure, it does not capture the full extent of the error; it only captures the statistical error made when fitting a model with no uncertainties to noisy data. Results obtained using the L-BFGS-B fitting method are qualitatively the same as those obtained through the MCMC sampling method.
Distributions of the deviations from the input for the parameters argument of periastron, inclination, sum of the scaled radii, ratio of radii, and ratio of surface brightnesses are given in Appendix D.0.1. We note a few key takeaways from those figures. There is good agreement between the overall distributions of the parameter values from translating the timings and those obtained after the eclipse model optimisation step. Two exceptions are the radial component of eccentricity, where the distribution for the eclipse model is shifted slightly towards overestimating this parameter by about two percent, and the auxiliary angle ϕ_0, which is slightly underestimated by the eclipse model, by less than a percent. The tangential component of the eccentricity is accurate and precise, with a distribution width of about 0.01, but the errors in a minority of cases are underestimated, resulting in long tails in its distribution of true deviations relative to the estimated errors. The sum of the scaled radii shows good agreement with the inputs: its distribution is less than 0.05 in width, albeit with a heavy tail on the high end. The orbital inclination shows a slightly wider spread, and no correlation is seen between inclination deviations and third light. The ratio of radii performs worst overall: we attribute this to the fact that for grazing eclipses, and ignoring limb darkening, there is not enough information in the light curve to strongly constrain the individual stellar radii. We do note that the errors for the ratio of radii appear overestimated in the majority of cases, narrowing the central peak in its distribution of deviations relative to the error estimates. The addition of limb darkening to our eclipse model may improve the predictive power in this area, but for this specific set of synthetic light curves, other variability would dominate over this effect in most cases. The surface brightness ratio performs better than the radii ratio on an absolute scale, but its error estimates are more strongly underestimated, giving it a poor result on the relative scale.
We also note that we see no correlation between ϕ_0 (or, similarly, the sum of radii) and the deviation in eccentricity across our parameter space, up to a maximum of ϕ_0 = 0.63. The limitation of the spherical eclipse model is not a simple function of one parameter, but rather depends on the shape of the eclipses as well as on the shape of the light curve at out-of-eclipse phases.
The number of sinusoids used in generating the synthetic light curves is randomly generated, as are their parameters. It is not evident a priori that we would recover this number: iterative pre-whitening procedures tend to vary in the number of sinusoids found, depending on the stopping criteria and other aspects (e.g. Van Beeck et al. 2021). Our approach is to rely on a statistical measure of the information held in the data (the BIC) during pre-whitening, and only select based on significance criteria at the very end. Our final light curve model consists of an eclipse model, a linear trend, and a sum of sinusoids. We now look only at those sinusoids in the final model that pass the significance criteria, which include the signal-to-noise threshold. Most of the harmonics found in this filtered list contain the leftovers of the eclipse signal not captured by the eclipse model, such as ellipsoidal variability and the effects of limb darkening. Therefore, it makes sense that we do not see a correlation between the number of harmonics in this list and the number of input sinusoids. We subtract the number of harmonics (n_h) from the total number of filtered sinusoids (n) to arrive at the number of independent frequencies (n − n_h), which is shown in Figure 9. We find that this number strongly correlates with the input number of frequencies and that the results for longer time series tend to lie closer to the diagonal overall.
We take the square root of n − n_h as the error estimate and compute the histogram and KDE of the deviations from the input value divided by the error, to arrive at the distribution shown in Figure 10. When separated into cases with month-long and year-long light curves, the former have consistently slightly underestimated counts.

Kepler EB Catalog
To further test the period-finding capability of STAR SHADOW against a vetted sample of real stars, we apply our methods to the 30-minute cadence light curves of the Kepler EB Catalog targets collected by Prša et al. (2011), Slawson et al. (2011), and Kirk et al. (2016). In addition, we present the eccentricities found for this dataset. Of the 2920 detrended light curves in the catalogue, all completed the analysis at least up to and including the period measurement, and for 2001 of those we obtained an eccentricity measurement with our analysis (for all morphologies in the catalogue).
Generally, we see a high level of agreement between the periods found by our method and the periods reported in the catalogue. The proportion of targets where the period is within one percent of the reported period is 80.5%. The proportion of targets where the period is within a percent of half the reported period is 7.9%, while other close multiplicative factors together constitute 1.3%, adding up to a total agreement level of 89.7%. Cases where half the period is designated as the orbital period are systems with a primary and secondary eclipse that look very similar, and thus create only weak evidence in Fourier space and in phase dispersion for the correct multiple of the period. Figure 12 shows the fractional differences between the measured and catalogue periods around the zero point: we see a very sharp central peak and wide but low tails extending to around 7 × 10^-5 that are imperceptible on the scale of the plot. In the subset with the correct period multiple, we see a mean absolute fractional difference of 0.0027% and a median of 0.00010%. We divide the differences from the catalogue period by our estimated errors to get the distribution shown in Figure 13. The strong central peak seen in the fractional differences remains, but the wings of the distribution become more pronounced, indicating an underestimation of a subset of the period errors.
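Checking which small-integer multiple links a measured period to the catalogue value can be sketched as follows; the tolerance and the list of candidate factors are illustrative:

```python
def period_multiple(p_measured, p_catalogue, tol=0.01):
    """Return the small-integer ratio (num, den) such that
    p_measured ≈ (num/den) * p_catalogue within fractional tolerance
    `tol`, or None if no candidate factor matches (sketch).
    """
    candidates = [(1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)]
    for num, den in candidates:
        factor = num / den
        if abs(p_measured - factor * p_catalogue) < tol * factor * p_catalogue:
            return (num, den)
    return None
```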
The remaining 10.3% of cases that did not match a multiple of the catalogued period consist of three groups of light curves: (1) very low signal-to-noise cases (62%); (2) those where the binary signal is close to a pure sinusoid, which does not result in a strong series of harmonics to pick up on (36%); and (3) those from multiple systems that have more than one period reported in the catalogue; for these systems, the strongest periodic signal is found (correctly), but we then miss the other periods as we do not continue to search for signals caused by higher-order multiples (4%).
The residuals of the 2001 fully analysed light curves show a range in noise level (standard deviation) from 24 ppm to 8.8 ppt, with a median of 526 ppm and a mean of 731 ppm. The signal-to-noise, measured as the smallest eclipse depth divided by the noise level, has a minimum of 0.4, a median of 68, and a mean of 151.
Article number, page 13 of 43

Eccentricities
The Kepler EB Catalog does not provide eccentricity measurements to directly compare ours against. However, we can present our results in an eccentricity-versus-period plot, which gives an idea of whether our measured eccentricity values are globally as we expect them to be. Generally speaking, the orbital circularisation timescale is strongly dependent on the orbital period (Tassoul & Tassoul 1990), which means we would expect all short-period binaries to have zero or otherwise small eccentricities. Conversely, we expect a wide distribution of eccentricities at the long-period end, as these systems are still in their birth configurations.
Figure 14 shows the eccentricity results for a selection of Kepler targets. The selection criteria are an eccentricity measurement error below 0.1 and a catalogue morphology parameter below 0.5, leaving 715 of the 2001 systems with an eccentricity measurement. The catalogue morphology parameter is an indicator of the level of detachment between the stars in the system, running from zero to one, with lower values being more detached. We show results that include all morphologies in Figure D.38. As an integral part of our methodology, we analyse the sinusoidal content present in the light curve in addition to the eclipses. To separate targets that are likely pulsating at independent frequencies, we compute the standard deviation of the residuals after subtracting the eclipse model and all harmonic sinusoids from the light curve, and divide this by the standard deviation of the residuals of the full light curve model. This is interpreted as a measure of the signal-to-noise of the intrinsic variability at non-harmonic frequencies, and we select targets above a value of 6. These are shown in Figure 15. Grey markers indicate systems that were found, by manual inspection, either to be wrongly assigned to this selection of variables due to an ill-fitting eclipse model, or to have a faulty eccentricity measurement due to the absence of one or both eclipses.
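The non-harmonic variability signal-to-noise used for this selection can be sketched as a ratio of residual standard deviations (illustrative function name):

```python
import numpy as np

def nonharmonic_snr(resid_without_nonharmonics, resid_full_model):
    """Signal-to-noise proxy for intrinsic variability at non-harmonic
    frequencies (sketch): std of the residuals after removing only the
    eclipse model and harmonic sinusoids, divided by the std of the
    residuals of the full model. Targets above a threshold (6 in the
    text) are flagged as likely independent pulsators.
    """
    return np.std(resid_without_nonharmonics) / np.std(resid_full_model)
```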
The overall picture is consistent with expectations: we see a buildup of systems at zero eccentricity below a period of some ten days, and a large range above that period. There are a few clear outliers among the lowest periods that have significant eccentricity measurements, the majority of which are due to an inaccurate identification or measurement of the secondary eclipse. As mentioned in Van Eylen et al. (2016), the measurement of the radial component of eccentricity, e sin(ω), is less straightforward than that of the tangential component, e cos(ω). This is also apparent in our methodology, as becomes visible when we look at the measurements of the tangential component in Figure 16. Apart from the much smaller error estimates, the scatter around zero tangential eccentricity for the stars below ten days orbital period is tight.
The theory of close binary tides predicts the circularisation timescales of binary orbits (see e.g. Zahn 1975, 1977, 1989; Zahn & Bouchet 1989; Goldreich & Nicholson 1977; Savonije & Papaloizou 1983, 1984). Circularisation is the result of dissipation mechanisms (tidal friction) in the stellar interior, which in turn decrease the orbital energy. Mechanisms include, but are not limited to, the damping of the tidal deformation through turbulent flow in a convective region, and the damping of tidally forced oscillations through radiative diffusion. The state of circularisation of a single binary depends on its age, which is generally not straightforward to constrain. Instead, the timescales on which binaries of certain types are expected to circularise can be used to predict the distribution of eccentricities in a population of binary systems. As explained by Van Eylen et al. (2016), age is involved as a factor for high-temperature stars. Due to their much shorter lifespans, comparable to or shorter than the circularisation timescale, they have not had time to circularise even at short periods where tidal friction is strongest. Additionally, hot stars with radiative envelopes have a different mechanism for circularisation than cool stars with convective envelopes. Therefore, we expect hot stars to exhibit larger eccentricities at shorter orbital periods than cool stars.

[Figure caption fragment: '... Equation 3, with the cut-off period set to 5 and 6.5 days, to plot the dashed black and grey lines. Grey points indicate systems that do not belong in this selection or have otherwise faulty measurements.']

Fig. 16. Measurements of the tangential component of eccentricity for a subset of targets from the Kepler EB Catalog. The subset results from cut-offs in eccentricity error (<0.1) and morphology parameter (<0.5), the latter being the more stringent criterion.
The Kepler sample studied by Van Eylen et al. (2016), and revisited here, has a majority of cool stars. If we use the same differentiating temperature of 6250 K, only one-sixth of the selected eccentricity measurements remain in the high-temperature group. In that publication, the authors used the tangential component of eccentricity, e cos(ω), as a proxy for eccentricity, whereas we determine both components in our analysis. The two temperature groups look broadly consistent with each other in our measurements, due in part to outliers that may or may not be at correct measurement values. We looked in more detail at some of the clear outliers at low periods with eccentricity measurements at or above 0.1. Most of these measurements arise from misidentifications of noise as a secondary eclipse, or of broad ellipsoidal variability as an eclipse (the Kepler EB Catalog includes singly eclipsing and even non-eclipsing systems with ellipsoidal variability). Nevertheless, there is a small number of correctly identified systems in this regime, around an eccentricity of 0.15, with well-fitting light curve solutions: KIC 4544587, 7943535, 8196180, and 11867071 (orange points in Figure 14), plus one system with a poorer light curve fit whose tangential component of eccentricity is high enough to be certain of its classification in this high-eccentricity group. Using the temperatures provided by the Kepler EB Catalog, all four of these systems fall in the high-temperature category, two of them having temperatures above 7000 K. This seems to support the conclusion of Van Eylen et al. (2016), who reported an observed difference between the tangential eccentricity distributions of hot-hot binaries as compared to cool-hot and cool-cool binaries, with hot-hot systems being more eccentric at shorter periods. This is broadly consistent with theory.
We use a simple formula from Halbwachs et al. (2005, Equation 3), based on the strength of tides varying with distance as 1/r^6 (Lecar et al. 1976), to visualise the envelope of maximum eccentricity in Figure 14 more clearly. The cut-off period, or circularisation period, was set to the value reported by Van Eylen et al. (2016), calculated for cool stars. Apart from the outliers, both those misidentified and the hot stars discussed above, this envelope qualitatively fits the sample. We see the same extension of the low-eccentricity cluster towards periods longer than 5 days as in the Van Eylen et al. (2016) sample, which may arise from dependencies of the circularisation timescale on age and/or temperature. Our selection of systems with intrinsic variability at non-harmonic frequencies, Figure 15, contains 108 systems. Even though only a few systems constrain the maximum-eccentricity envelope in this selection, we may visually fit a line at a slightly longer cut-off period of 6.5 days. This could be an effect of the low numbers, but it may be a physical effect if pulsations aid the redistribution of orbital angular momentum, increasing the effectiveness of tides. We will investigate this effect further in a future study based on a TESS sample with about four times more systems.
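To illustrate the kind of maximum-eccentricity envelope plotted here, a sketch assuming an angular-momentum-conserving circularisation, P(1 − e²)^{3/2} = P_cut; note that this form is our reconstruction of such an envelope, and the exact expression of Halbwachs et al. (2005, Equation 3) should be taken from that paper:

```python
import numpy as np

def eccentricity_envelope(period, p_cut):
    """Maximum-eccentricity envelope versus orbital period (sketch).

    ASSUMED form: orbital angular momentum is conserved while tides
    circularise the orbit, P * (1 - e^2)^(3/2) = P_cut, giving
    e_max = sqrt(1 - (P_cut / P)^(2/3)) for P > P_cut and zero below
    the cut-off period. This is illustrative only; consult Halbwachs
    et al. (2005) for their exact Equation 3.
    """
    period = np.asarray(period, dtype=float)
    ratio = np.clip(p_cut / period, None, 1.0)   # clamp so e = 0 below cut-off
    return np.sqrt(1.0 - ratio**(2.0 / 3.0))
```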
The summarising catalogue of our Kepler analysis results is available at the CDS. The results are also made available through the Kepler EB Catalog main web page.

Conclusions
We have presented a novel photometric time series analysis methodology and its implementation, STAR SHADOW. Its aim is, in the first place, to analyse large numbers of EBs homogeneously, but it is also broadly applicable for computationally efficient iterative pre-whitening. The method has been tested against a set of synthetic light curves with known input parameters and high levels of intrinsic variability from sources other than eclipses. We find good levels of agreement between the input and extracted sinusoid frequencies and amplitudes, as well as between the input and computed binary parameters. The accuracy and precision of the binary parameters vary, with the tangential component of eccentricity being the most accurate and precise and the radial component approximately an order of magnitude less well constrained. This is unsurprising, as most of the information about the radial component is in radial velocity curves. The number of extracted frequencies after the subtraction of orbital harmonics and after filtering with significance criteria strongly correlates with the number of input sinusoids for the one-year-long simulated light curves. The number is underestimated for shorter light curves of one month in duration.
We further tested the period-finding capabilities of our implementation against the Kepler EB Catalog and examined the eccentricities obtained for this sample. The method finds the correct orbital period multiple, or half the orbital period, in 90% of cases, showing its capability to be used with minimal predetermined knowledge of the data. Cases where it finds half the period are those where the primary and secondary eclipses are nearly identical. Period error estimates are found to be appropriate for the vast majority of targets, with heavy tails on the distribution of the deviation over the error revealing an underestimation for a fraction of targets.
A few of our Kepler eccentricity determinations were compared to results obtained during an in-depth analysis by Kjurkchieva & Vasileva (2015, 2018) as a benchmark. Out of 13 targets, all but two eccentricity values agree within two sigma of our error estimates. The mean absolute deviation in eccentricity is 0.015, giving us confidence in the precision of the method for this parameter. We plot the eccentricities and the two components of eccentricity of our result against those found in the literature in separate panels of Figure 17. Inspecting the output of our methodology, we find one clear outlier (KIC 8111622) for which the secondary eclipse is misidentified; this measurement is indicated in red. The reason for this misidentification is a very narrow eclipse that is also shallow; because of this, determining its position based on the time derivatives of the time series (limited to low frequencies) is difficult. The linear regression coefficient of the eccentricities, excluding this outlier, is 1.00±0.41. In the bottom panel of Figure 17, the diagonal with a negative slope indicates the position where points are expected if the definition of the primary and secondary eclipse is swapped. The one system where this is the case (KIC 6949550) happens to have equally deep eclipses, so assigning the tag primary or secondary to its eclipses is an ambiguous choice.
The eccentricities obtained for the Kepler targets generally follow the expected pattern as a function of the orbital period. Below around ten days, eccentricities are broadly consistent with zero, and above ten days there is a wide distribution of higher eccentricities. We do see a few outliers that are caused by a misidentification of the secondary eclipse, due to a combination of its low eclipse depth and the presence of other variability in the out-of-eclipse part of the light curve. Our methodology was developed with the TESS mission in mind, since our aim is to apply it to the sample of EBs from IJspeert et al. (2021).
We qualitatively compared our Kepler results to the work by Van Eylen et al. (2016), who analysed their own selection of Kepler EBs to obtain the tangential part of the eccentricities. Our results support their conclusion that, at low orbital periods, hot stars (>6250 K) have higher eccentricities than cool stars, which is broadly consistent with theory. Our eccentricity distribution looks qualitatively consistent with the cut-off period of 5 days computed by Van Eylen et al. (2016) for cool stars with convective envelopes.
The average run time of the analysis was ten minutes for simulated light curves with a one-year time base, with an average of two minutes of that spent on the frequency analysis. For the Kepler light curves, the recorded average time to completion is just over two hours, including one hour for the frequency analysis. The scaling with the time series length (four years for Kepler) is not linear, since the number of extracted sinusoids also increases with the time base. This rapid performance is achieved through various optimisations in the algorithms for iterative pre-whitening and multi-sinusoid non-linear optimisation; the main optimisations are described further in Appendix C. We encourage others to implement a gradient-based approach for sinusoid optimisation for the free performance boost it gives, in order to save on both time and energy.
Notes. The methods of the function scipy.optimize.minimize have different features and requirements in terms of using the Jacobian or Hessian matrix and in being able to handle bounds or constraints.
Although it is not strictly necessary for every minimisation step in the algorithm, we do want to impose boundaries on our parameters. We tested methods that have the option to numerically compute gradients, both with and without the analytical Jacobian, to see what the performance gain would be. Both Powell and Sequential Least SQuares Programming (SLSQP) (without Jacobian) are more than an order of magnitude faster than NM, but both diverge from the true solution instead of converging on it. SLSQP becomes slightly faster when the Jacobian is provided and diverges slightly less dramatically. Method trust-constr (without Jacobian) takes a factor of a few longer than NM, while reaching worse cost-function values. However, when provided with the Jacobian, it redeems itself by becoming faster than NM by a factor of 2 and obtaining the same cost-function value. Unfortunately, providing the Hessian as well makes it much slower again; this could be due to the high computational cost of the Hessian function. Methods Conjugate Gradient (CG), Broyden-Fletcher-Goldfarb-Shanno (BFGS), L-BFGS-B, and Truncated Newton Constrained (TNC) perform similarly to NM in terms of cost function, but only BFGS and L-BFGS-B are faster, by a noticeable factor of a few, without the Jacobian. CG does not benefit time-wise from the Jacobian, but the other three do: they reach similar speeds of more than an order of magnitude faster than NM. Newton-CG, which requires the Jacobian to function, falls into the same performance category as those three in terms of both cost function and speed. It does not benefit meaningfully from the Hessian. All four of the methods that require the Hessian take longer than NM, by a factor of more than 2.
Since CG, BFGS, and Newton-CG cannot handle bounds or constraints, L-BFGS-B and TNC go on to a second round of testing. Constrained Optimization by Linear Approximation (COBYLA) also makes it to the second round by being fast and only marginally worse in terms of the cost function; its lack of support for bounds can be mitigated by its support for constraints.
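The method comparison above can be sketched in code. This is a hedged illustration only: the objective here is the standard Rosenbrock test function shipped with SciPy, a stand-in for the paper's sinusoid likelihood, and the timing numbers are machine-dependent.

```python
import time
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Sketch of the benchmark: time a few scipy.optimize.minimize methods on a
# stand-in objective (Rosenbrock), with and without an analytic Jacobian.
x0 = np.full(20, 1.5)
rows = []
for method, jac in [("Nelder-Mead", None),
                    ("L-BFGS-B", None),
                    ("L-BFGS-B", rosen_der),
                    ("TNC", rosen_der)]:
    t0 = time.perf_counter()
    res = minimize(rosen, x0, method=method, jac=jac)
    # record: method name, whether a Jacobian was supplied, wall time, final cost
    rows.append((method, jac is not None, time.perf_counter() - t0, res.fun))
```

Gradient-based methods supplied with the analytic Jacobian typically reach a much lower cost in far less time than the derivative-free Nelder-Mead baseline, which is the pattern reported in the text.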
In a more demanding test with a longer light curve, the COBYLA method (which does not use a Jacobian) shows comparable speed to L-BFGS-B and TNC (with Jacobian), but diverges and does not return a sensible result. L-BFGS-B and TNC (compared both with and without Jacobian) perform very similarly: the final choice for L-BFGS-B is made based on its faster speed, and even though the cost-function value it reaches is not quite as good, the difference in the actual light curve model is imperceptibly small.

Appendix C.0.3: Analytical gradients

As mentioned above, the optimisation makes good use of analytical gradient information. We include the mathematical form here. Our objective function is the negative log-likelihood, with symbol definitions as in Appendix A. The Jacobian is the vector of partial derivatives of this objective, where each θ_j stands for one of the model parameters. Using the generalised product rule, the derivatives can be made explicit; for a sinusoid frequency f_j we get

∂ŷ_i/∂f_j = 2π(t_i − t_mean) a_j cos(2π f_j (t_i − t_mean) + ϕ_j), (C.6)

and analogously for the orbital period, where the sum then runs over all harmonic sinusoids. We note that for the eclipse model parameters we cannot compute gradients analytically. However, this does not prevent us from using the gradients that we do know during the optimisation of the eclipse model plus sinusoids. We use scipy.optimize.approx_fprime to obtain numerical gradients for the six parameters of the eclipse model and fill those in at their respective locations in the Jacobian. This provides a similar, although smaller, speed increase as for the pure-sinusoid optimisations.
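A gradient-based sinusoid optimisation of this kind can be sketched as follows. This is a minimal illustration, not the STAR SHADOW implementation: it minimises a plain sum of squared residuals (standing in for the negative log-likelihood at fixed noise level) for a model ŷ = Σ_j a_j sin(2π f_j (t − t_mean) + ϕ_j), supplying the analytic Jacobian to L-BFGS-B; the derivative with respect to f_j matches Equation C.6.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sinusoids(t, y, f0, a0, ph0):
    """Sketch: least-squares multi-sinusoid fit with an analytic Jacobian.
    The parameter vector packs [f_1..f_n, a_1..a_n, ph_1..ph_n]."""
    tau = t - t.mean()
    n = len(f0)

    def cost_and_grad(p):
        f, a, ph = p[:n], p[n:2 * n], p[2 * n:]
        arg = 2 * np.pi * np.outer(tau, f) + ph      # shape (len(t), n)
        r = y - np.sin(arg) @ a                      # residuals
        cost = 0.5 * np.sum(r ** 2)
        # d yhat_i / d f_j = 2*pi*tau_i * a_j * cos(arg_ij)   (cf. Eq. C.6)
        dy_df = 2 * np.pi * tau[:, None] * a * np.cos(arg)
        dy_da = np.sin(arg)
        dy_dph = a * np.cos(arg)
        # d cost / d theta = -sum_i r_i * d yhat_i / d theta
        grad = -np.concatenate([r @ dy_df, r @ dy_da, r @ dy_dph])
        return cost, grad

    p0 = np.concatenate([f0, a0, ph0])
    res = minimize(cost_and_grad, p0, jac=True, method="L-BFGS-B")
    return res.x[:n], res.x[n:2 * n], res.x[2 * n:]
```

Setting jac=True tells scipy.optimize.minimize that the objective returns the cost and its gradient together, which avoids the expensive finite-difference gradient evaluations.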
Appendix C.0.4: Parameter space

The parameter space we work in for the six free parameters in our eclipse model is made up of ecos(ω), esin(ω), cos(i), ϕ_0, log10(r2/r1), and log10(sb2/sb1). This choice of parameters ensures two things: minimal correlation between the free parameters and a flatter space across the relevant domain. In the case of ecos(ω), esin(ω), and ϕ_0, both of these reasons apply, with the reduction of correlation being the driving factor compared to their counterparts e, ω, and r_sum. Reducing correlation has a strong effect on the ability of optimisation algorithms to accurately and quickly converge. Another benefit is that the correlation structures, and thus the error estimates, are most reliable and simplest to interpret in the least correlated parameter space.
Parameters r2/r1 and sb2/sb1 have an extremely curved parameter space, in the sense that a step of 0.1 has a completely different meaning depending on where in the parameter domain we are situated. Using logarithms, log10(r2/r1) and log10(sb2/sb1), has two main advantages similar to those mentioned for the reduction in correlation: it makes it simpler for many optimisation algorithms to traverse the parameter space, for a more accurate and sometimes speedier result, and it improves the reliability and interpretability of the error estimates (in that parameter space).
Appendix C.0.5: Translating eclipse timings to eccentricity

Here, we describe in more detail the steps for obtaining physical parameters from the time measurements.
Although a multitude of approaches was tested, we settled on one that provides the best combination of speed and accuracy. The initial step is to set cos(i) = 0 and compute the approximate ϕ_0 with Equation 12. Sufficient accuracy is retained when solving Equation 8 and plugging in the approximate Equations 9 and 10. Iteratively updating ϕ_0 using the exact formulae for the eclipse durations does, however, provide a noticeable improvement in the measurement of the sum of the scaled radii through Equation 13.
To find the inclination, the ratio of radii, and the ratio of surface brightnesses, we construct a cost function based on iteratively solving Equations 6 and A.33, plugging the obtained angles into Equation 5 and then A.32 to obtain accurate theoretical eclipse times, and computing theoretical depths with Equation 14. Minimisation is done with a global optimiser from SciPy, simplicial homology global optimisation (SHGO), with reasonable boundaries on the parameters for feasible systems.
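The SHGO call can be sketched as follows. This is a hedged illustration: the cost function here is a toy quadratic misfit around hypothetical "true" values, standing in for the paper's comparison of theoretical and measured eclipse times and depths; the bounds are placeholder values, not the ones used in STAR SHADOW.

```python
import numpy as np
from scipy.optimize import shgo

# Hypothetical target values standing in for the measured eclipse
# times and depths; the real cost compares theory with measurements.
TRUE = np.array([0.05, -0.2, 0.1])   # cos(i), log10(r2/r1), log10(sb2/sb1)

def cost(p):
    # Toy stand-in cost: quadratic misfit around TRUE.
    return float(np.sum((p - TRUE) ** 2))

# Placeholder boundaries for feasible systems.
bounds = [(0.0, 0.5),    # cos(i)
          (-1.0, 1.0),   # log10 radius ratio
          (-2.0, 2.0)]   # log10 surface brightness ratio

result = shgo(cost, bounds)
cosi, log_rr, log_sb = result.x
```

SHGO samples the bounded space globally and refines candidate minima with a local optimiser, which suits a cost landscape that may have several local minima in the eclipse geometry.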
Minimisation is performed in the parameter space cos(i), log10(r2/r1), log10(sb2/sb1). This has the benefit of homogenising the step size across the parameter space; the two ratio parameters especially suffer from a large difference in step size at either end of their physical ranges. Since we compute the components of eccentricity, ecos(ω) and esin(ω), we also need to convert them to e and ω:

e = sqrt((ecos(ω))^2 + (esin(ω))^2), ω = arctan(esin(ω)/ecos(ω)),

where, code-wise, an arctan2 function is used instead of arctan to obtain correct angles across the full range. ϕ_0 is converted to the sum of the scaled radii with Equation 13.

Error estimation was done with a combination of analytical error formulae and the Monte Carlo approach of importance sampling. The analytical formulae are used as minimum error estimates, but we do not have approximate analytical formulae for all parameters. Furthermore, for the analytical error formulae we do not have an a priori value for the inclination error, so we estimate one informed by the tests on synthetic light curves. Starting with an error estimate of 0.035 radians (2 degrees) in inclination, we use error formulae for the given subset of parameters, abbreviating the partial derivatives, where the sums go over primary (m = 1) and secondary (m = 2) ingress (n = 1) and egress (n = 2).
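The conversion from the eccentricity components to e and ω can be written compactly; this is a minimal sketch of the conversion described above, using arctan2 to recover the correct quadrant.

```python
import numpy as np

def eccentricity_from_components(ecosw, esinw):
    """Convert the measured components ecos(w), esin(w) to e and omega.
    arctan2 gives the correct quadrant across the full range of omega."""
    e = np.hypot(ecosw, esinw)
    w = np.arctan2(esinw, ecosw) % (2 * np.pi)   # map omega to [0, 2*pi)
    return e, w
```

For example, ecos(ω) = esin(ω) = −0.1 yields e ≈ 0.141 and ω = 5π/4, a third-quadrant angle that a plain arctan of the ratio would fold into the first quadrant.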
σ_e = sqrt( cos^2(ω) σ^2_ecos(ω) + sin^2(ω) σ^2_esin(ω) ), (C.14)

The error formulae for esin(ω) and ϕ_0 were found lacking in an earlier version of our synthetic test (Section 3), likely because they both rely on the measurement of the widths of the eclipses, which can be inaccurate and is more easily disturbed by additional variability. Based on the same test, we constructed simple scaling relations that apply to the minimum error in the concerned parameters, with a constant C of 0.2 for esin(ω) and 0.05 for ϕ_0. The specific scaling variable used in the exponent was found to correlate strongest with the absolute deviation of these parameters from the input (a correlation of -0.45 for esin(ω)), much more so than the secondary eclipse depth alone (a correlation of -0.24 for esin(ω)).
The inclusion of the depth of the secondary eclipse is intuitive, as shallower secondary eclipses are harder to measure. The other parameter is less intuitive: it is the sum over the amplitudes of low- and high-frequency harmonics; specifically, the sum runs over harmonic orders up to and including 5, and over harmonic orders above 15. The dependence on the higher-frequency harmonics may be understood in terms of the presence of additional variability in the light curve, which reduces the accuracy of the measurements of the eclipse edge points. The dependence on the low-frequency harmonics is likely to do with a stronger presence of these harmonics, indicating either very wide eclipses or out-of-eclipse variability, which cross over into the regime where the simplified physics of the applied formulae starts to lose accuracy.
To obtain minimum error values for the two ratios and their logarithms, we compute a scale factor by dividing the computed value of σ_e by the error estimate obtained for e in the importance sampling (explained below), taking the maximum of the upper and lower error values. We then multiply the importance-sampling errors of the logarithm of the radius ratio and the logarithm of the surface brightness ratio by this factor, again taking the maximum of the upper and lower error values. Finally, computing the relative error in log space and multiplying by the respective parameters in linear space, we obtain the minimum errors for both representations of these two parameters.
In the importance sampling, we generated input normal distributions of a thousand samples for the measured values of the eclipse times (minima, contact, and tangency) within their respective error regions, and computed the output of the iterative 'translation' scheme outlined above for each of the generated input vectors. This results in a set of output distributions for each computed parameter. Estimates of the errors are then obtained using the highest density interval at 0.683 probability. We note that the resulting error region can exclude the point estimate made before; this is fixed by imposing the minimum errors. We choose not to use the mean of the distribution as a point estimate because the distributions are often multi-modal and may not accurately represent the optimal solution.
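The sampling step and the highest-density-interval estimate can be sketched as follows. This is a minimal illustration: the translate argument is a hypothetical stand-in for the iterative translation scheme, and the HDI is taken as the shortest interval containing 68.3% of the sorted samples.

```python
import numpy as np

def hdi_683(samples):
    """Highest density interval at 0.683 probability: the shortest
    interval containing 68.3% of the samples."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.ceil(0.683 * n))          # samples per candidate interval
    widths = s[k - 1:] - s[:n - k + 1]   # width of every k-sample window
    i = int(np.argmin(widths))           # shortest window = HDI
    return s[i], s[i + k - 1]

def propagate(translate, timings, timing_errors, n_samples=1000, seed=1):
    """Sketch of the sampling step: draw normally distributed eclipse
    timings and push each draw through a user-supplied 'translate'
    function (a stand-in for the iterative translation scheme)."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(timings, timing_errors,
                       size=(n_samples, len(timings)))
    return np.array([translate(d) for d in draws])
```

Unlike a central credible interval, the HDI remains meaningful for the skewed and multi-modal output distributions mentioned above, since it tracks where the probability mass actually concentrates.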
Article number, page 27 of 43

Fig. 2. Typical eclipse signatures in the derivatives of the light curve. The left column shows a flat-bottomed eclipse resulting in two separated features in the derivatives, while the V-shaped eclipse on the right leads to peaks merged into one feature. Adapted with permission from IJspeert et al. (2021).

Fig. 3. Typical eclipse signatures in the derivatives of the light curve, with markers pointing to important features for the automated eclipse detection and timing measurement. Note that points 7 in the left panel of first derivatives are not truly local minima but are marked as such for illustrative purposes. Adapted with permission from IJspeert et al. (2021).

Fig. 4. Performance of the period search algorithm as a function of the number of orbital cycles. The blue points with error bars represent the fractional difference between the measured and the input orbital period with SLR error estimates, sorted by descending number of cycles. The grey line with points shows the number of cycles, calculated as the time base divided by the period, plotted on the right axis. The period error scales strongly with the number of cycles. Note that the number of cycles for cases ∼60 to 100 is close to one, and cases that are shaded grey and hatched do not show eclipses.

Fig. 5. Histogram and KDE of the orbital period χ values ((P_measured − P_input)/σ_P), using the SLR uncertainties. The KDE is scaled to the height of the histogram.

Fig. 6. Eccentricity results for the synthetic test cases. Top panel: input (grey) and measured (coloured) eccentricities. Bottom panel: deviation from the input eccentricity against the secondary eclipse depth. The scatter in the bottom panel increases to the left, with decreasing eclipse depth, as the eccentricity depends critically on the identification and measurement of the secondary eclipse.

Fig. 7. Histogram and KDE of the differences between measured and input eccentricity. Next to the sharp peak at zero, and within the 0.1 deviation, there is a relatively small second maximum where eccentricities are underestimated by between 0.1 and 0.2. The KDE is scaled to the height of the histogram.

Fig. 8. Histogram and KDE of the eccentricity χ values ((e_measured − e_input)/σ_e), using the error formula uncertainties. The bi-modality seen in the absolute differences has largely disappeared. The KDE is scaled to the height of the histogram.

Fig. 9. Number of extracted independent sinusoids (n − n_h) versus the number of input sinusoids put into the light curve (n_input). Harmonic sinusoids are filtered out, as they capture residuals related to the eclipse model that are unrelated to the number of input sinusoids. Points are divided into month-long time series (blue) and year-long time series (orange), and the error bars are calculated as the square root of the number n − n_h.

Fig. 10. Histogram and KDE of the number-of-sinusoids χ values. Errors are taken to be the square root of the number (n − n_h). The histogram is split into light curves of a month in length and those of a year in length: the shorter time base tends to lead to an underestimation of the number of sinusoids present. The KDE is scaled to the height of the histogram.

Fig. 11. Histogram and KDE of the frequency χ values ((f_measured − f_input)/σ_f) for one case of the synthetic light curves. Errors are calculated with the standard formulae. The KDE is scaled to the height of the histogram.

Fig. 12. Histogram and KDE of the fractional difference between the Kepler catalogue orbital periods and those measured in this work. Here we select the periods within 1% of the catalogue values. The KDE is scaled to the height of the histogram.

Fig. 13. Histogram and KDE of the orbital period χ values ((P_measured − P_catalogue)/σ_P) comparing the Kepler catalogue values to those measured in this work. We select the periods within 1% of the catalogue values. The relatively pronounced wings of the distribution indicate underestimated period errors for a portion of systems. The KDE is scaled to the height of the histogram.

Fig. 14. Eccentricity measurements for a subset of targets from the Kepler EB Catalog. Selections were made in eccentricity error (<0.1) and morphology parameter (<0.5), the latter being the more stringent criterion. We use the maximum eccentricity as a function of orbital period from Halbwachs et al. (2005, Equation 3) with the cut-off period set to 5 days to plot the dashed line. Orange points indicate four systems of interest, with well-fitting models.

Fig. 15. Eccentricity measurements for a subset of targets from the Kepler EB Catalog. Selections were made in eccentricity error (<0.1), morphology parameter (<0.5), and a measure of the signal-to-noise of the intrinsic variability at non-harmonic frequencies (>6). We use the maximum eccentricity as a function of orbital period from Halbwachs et al. (2005, Equation 3) with the cut-off period set to 5 and 6.5 days to plot the dashed black and grey lines. Grey points indicate systems that do not belong in this selection or have otherwise faulty measurements.

Fig. B.2. 'Harmonic' part of the analysis. The orbital period is determined (if not given), and the orbital harmonics found in the extracted list of frequencies are coupled to the orbital frequency, after which another non-linear optimisation is done. This step is broadly applicable to EBs and other light curves with a periodic, non-sinusoidal signal that results in a series of harmonics in the periodogram. The second-to-last panel shows the orbital harmonic sinusoid model in orange, and the bottom panel shows all non-harmonic sinusoids.

Fig. D.1. Performance of the period search algorithm as a function of the primary eclipse depth. Blue points with error bars show the fractional difference between the measured and the input orbital period with SLR error estimates, sorted by descending primary eclipse depth. The grey line with points shows the primary eclipse depths, plotted on the right axis. Note that cases shaded grey and hatched do not show eclipses, and cases shaded orange have fewer than three eclipse cycles.

Fig. D.2. Histogram and KDE of the eccentricity deviations from the input.

Fig. D.3. Histogram and KDE of the eccentricity χ values (deviation from the input divided by the error estimate).

Fig. D.4. Histogram and KDE of the argument of periastron deviations from the input.

Fig. D.5. Histogram and KDE of the argument of periastron χ values (deviation from the input divided by the error estimate).

Fig. D.6. Histogram and KDE of the inclination deviations from the input.

Fig. D.7. Histogram and KDE of the inclination χ values (deviation from the input divided by the error estimate).

Fig. D.8. Histogram and KDE of the sum of scaled radii deviations from the input.

Fig. D.9. Histogram and KDE of the sum of scaled radii χ values (deviation from the input divided by the error estimate).

Fig. D.10. Histogram and KDE of the ratio of radii deviations from the input.

Fig. D.11. Histogram and KDE of the ratio of radii χ values (deviation from the input divided by the error estimate).

Fig. D.12. Histogram and KDE of the ratio of surface brightness deviations from the input.

Fig. D.13. Histogram and KDE of the ratio of surface brightness χ values (deviation from the input divided by the error estimate).

Fig. D.14. Histogram and KDE of the tangential component of eccentricity deviations from the input.

Fig. D.15. Histogram and KDE of the tangential component of eccentricity χ values (deviation from the input divided by the error estimate).

Fig. D.16. Histogram and KDE of the radial component of eccentricity deviations from the input.

Fig. D.17. Histogram and KDE of the radial component of eccentricity χ values (deviation from the input divided by the error estimate).

Fig. D.18. Histogram and KDE of the cosine of inclination deviations from the input.

Fig. D.19. Histogram and KDE of the cosine of inclination χ values (deviation from the input divided by the error estimate).

Fig. D.20. Histogram and KDE of the auxiliary angle, ϕ_0, deviations from the input.

Fig. D.21. Histogram and KDE of the auxiliary angle, ϕ_0, χ values (deviation from the input divided by the error estimate).

Fig. D.22. Histogram and KDE of the logarithm of the ratio of radii deviations from the input.

Fig. D.23. Histogram and KDE of the logarithm of the ratio of radii χ values (deviation from the input divided by the error estimate).

Fig. D.24. Histogram and KDE of the logarithm of the ratio of surface brightness deviations from the input.

Fig. D.25. Histogram and KDE of the logarithm of the ratio of surface brightness χ values (deviation from the input divided by the error estimate).

Fig. D.26. Tangential part of eccentricity (top panel) and deviation from the input (bottom panel) versus secondary eclipse depth.

Fig. D.27. Inclinations (top panel) and deviation from the input (bottom panel) versus third light (fraction of the total light). Note the absence of an apparent correlation between the accuracy of the inclination and third light.

Fig. D.28. Example (full) synthetic light curve of case 7. The top panel shows the final eclipse model in red and the full model of the light curve, including sinusoids, in grey. The bottom panel shows the residuals of subtracting the eclipse model (blue) and those of subtracting the full model (orange). The model of sinusoids is also plotted separately in red here. The slight overestimation of the primary eclipse width can be identified by the sharp residual peaks here, and leads to an underestimation of the radial part of eccentricity of about 0.2 for this case.

Fig. D.29. Fourier amplitude spectrum of case 7. In red and pink are the extracted sinusoids, of which the red ones qualify as harmonics. At an orbital period of 16.56 days, the spacing of the harmonics is quite dense (grey grid lines).

Fig. D.30. Fourier amplitude spectrum of case 7 after subtracting the eclipse model. In red are the extracted sinusoids that pass all criteria, including the signal-to-noise threshold shown in green (calculated in a window of 1 d⁻¹).

Fig. D.31. Example synthetic light curve of case 99, folded over the orbital period. The top panel shows the final eclipse model in red and the full model of the light curve, including sinusoids, in grey. The orange model is the starting point for the light curve fit. The bottom panel shows the residuals of subtracting the eclipse model (blue) and those of subtracting the full model (orange). The model of sinusoids is also plotted separately in red here. The model starts from an overestimated primary eclipse width, and the fit is unable to fully correct for this, as the residuals show. This leads to an underestimated tangential eccentricity by about 0.2 for this case.

Table 1. Ranges of synthetic light curve parameters.