Photometric detection of internal gravity waves in upper main-sequence stars. III. Comparison of amplitude spectrum fitting and Gaussian process regression using celerite2

Studies of massive stars using space photometry have revealed that they commonly exhibit stochastic low-frequency (SLF) variability. This has been interpreted as being caused by internal gravity waves (IGWs) excited at the interface of convective and radiative regions within stellar interiors, such as the convective core or sub-surface convection zones. We aim to compare the properties of SLF variability in massive main-sequence stars observed by the TESS mission determined by different statistical methods, and confirm the correlation between the morphology of SLF variability and a star's location in the HR diagram. From a sample of 30 massive stars observed by TESS, we compare the resultant parameters of SLF variability, in particular the characteristic frequency, obtained from fitting the amplitude spectrum of the light curve with those inferred from fitting the covariance structure of the light curve using the celerite2 Gaussian process (GP) regression software and a damped SHO kernel. We find a difference in the characteristic frequency obtained from the amplitude spectrum fitting and from fitting the co-variance structure of the light curve using a GP regression with celerite2 for only a minority of the considered sample. However, the trends among mass, age and the properties of SLF variability previously reported remain unaffected. GP regression is a useful and novel methodology to efficiently characterise SLF variability in massive stars compared to previous techniques used in the literature. We conclude that the correlation between a star's SLF variability, in particular the characteristic frequency, and its location in the HR diagram is robust for main-sequence massive stars. There also exists a distribution in the stochasticity of SLF variability in massive stars, which indicates that the coherency of SLF variability is also a function of mass and age in massive stars.


Introduction
The treasure trove of time series photometric data assembled by space telescopes including CoRoT (Auvergne et al. 2009), Kepler/K2 Koch et al. 2010;Howell et al. 2014) and TESS (Ricker et al. 2015) in the last decade has revealed a remarkable range of diverse variability mechanisms in massive stars (Bowman 2020). A recent discovery is that essentially all early-type main-sequence stars show stochastic lowfrequency variability (SLF) in time series photometry (Bowman et al. 2019a(Bowman et al. ,b, 2020, in addition to other variability mechanisms including coherent heat-driven pulsation modes, rotational modulation, and the photometric signatures of binarity (Degroote et al. 2009a;Blomme et al. 2011;Pedersen et al. 2019;Burssens et al. 2020). Similarly, SLF variability has also been reported in later evolutionary stages of massive stars (see e.g., Bowman et al. 2019b;Dorn-Wallenstein et al. 2019, 2020Nazé et al. 2021;Elliott et al. 2022;Lenoir-Craig et al. 2022), which shows similar morphologies to main sequence stars. It currently remains unclear which of the several plausible mechanisms that can generate SLF variability dominate in particular parameter regimes. Hence the investigation of SLF variability has opened up a novel method to understanding the interiors of massive stars (Bowman 2020).
Coupled with spectroscopy, the analysis of time series photometry is able to extract detailed and precise constraints on the physics of stellar interiors thanks to asteroseismology Aerts 2021). In early-type stars, a heat-engine excitation mechanism operating in the iron opacity bump at 200 000 K gives rise to pressure (p) and gravity (g) modes (Dziembowski Article Miglio et al. 2007;Szewczuk & Daszyńska-Daszkiewicz 2017), which probe predominantly the envelope and near-core physics, respectively. The analysis of coherent pulsations has revealed the interior rotation profiles and a diversity in the amounts of convective boundary and envelope mixing profiles in late-B stars (Degroote et al. 2009b(Degroote et al. , 2010Daszyńska-Daszkiewicz et al. 2013;Moravveji et al. 2015Moravveji et al. , 2016Szewczuk & Daszyńska-Daszkiewicz 2018;Szewczuk et al. 2022;Pedersen et al. 2021;Michielsen et al. 2021;, and tight constraints on differential radial rotation profiles and (core) masses and ages for late-O and early-B stars (Aerts et al. 2003;Pamyatnykh et al. 2004;Dziembowski & Pamyatnykh 2008;Burssens et al. 2022).
Similarly to asteroseismology of coherent pulsation modes, the exploitation of SLF variability in time series photometry, which is seen across a wide range in mass, age and metallicity in massive stars, offers us the opportunity to improve our understanding of stellar structure and evolution (Bowman 2020). In their original methodology and characterisation study, Bowman et al. (2019a) used a sample of 35 early-type stars observed by the CoRoT mission to demonstrate how scaling laws based on solar granulation are inconsistent with the observed SLF variability 1 (Paper I). Based on a much larger sample of more than 160 massive stars observed by Kepler/K2 and TESS, which included metal-rich galactic stars and metal-poor stars in the Large Magellanic Cloud (LMC), Bowman et al. (2019b) demonstrated that a common morphology in the SLF variability is observed for essentially all stars with M 3 M . The SLF variability was determined to be dependent on the brightness of the star and by extension also its luminosity, and hence its location in the Hertzsprung-Russell (HR) diagram. More recently, Bowman et al. (2020) used a sample of galactic O and B stars observed by the TESS mission and combined it with high resolution spectroscopy (Simón-Díaz et al. 2011;Burssens et al. 2020) to show that the morphology of SLF variability is correlated with a star's mass, age, and amount of macroturbulence (Paper II).
The detection of SLF variability in massive stars has inspired various theoretical works to explain this new type of observational signal. Motivated by the omnipresence of a convective core during the main-sequence evolution of a massive star led Aerts & Rogers (2015) to postulate that SLF variability observed at the surface of three O-type stars could be explained by convectively driven internal gravity waves (IGWs) excited by core convection. This conclusion was supported based on the qualitatively similar frequency spectra observed for these stars with those predicted from 2D hydrodynamical simulations (Rogers et al. 2013;Rogers 2015;Rogers & McElwaine 2017). More recent 2D and 3D simulations of core convection with different numerical setups also show that IGWs excited by core convection produce detectable velocity and temperature perturbations (Edelmann et al. 2019;Horst et al. 2020;Vanon et al. 2022;Herwig et al. 2022; Thompson et al. 2022). On the other hand, other studies have concluded that the cause of SLF variability in massive stars is IGWs excited by sub-surface convection zones (Cantiello et al. 2009 since IGWs excited from core convection are likely damped before they reach the surface (Lecoanet et al. 2019. Using simulations the dynamics of turbulent convection in massive star envelopes has also been shown to be an important cause of SLF variability (Jiang et al. 2015(Jiang et al. , 2018Schultz et al. 2022). However, the efficiency of convection in sub-surface zones, and hence the driving 1 Sometimes referred to as a low-frequency power excess or 'red noise' in the literature. of IGWs from these regions, is strongly metallicity dependent (Jermyn et al. 2022). Finally, another plausible mechanism for SLF variability, which is especially important for evolved (i.e post-main sequence) massive stars, is clumping in their optically thick winds (see e.g. Krtička & Feldmeier 2018;Krtička & Feldmeier 2021).
It remains an open question as to which excitation mechanism(s) dominates in specific parameter regimes of the HR diagram, as the properties of (sub-surface) convection, winds, and the driving and propagation of IGWs depend on stellar properties such as mass, radius, metallicity and rotation. What is clear is that a mechanism that aims to unanimously explain SLF variability in main-sequence early-type stars must do so for stars that span a broad range of masses (i.e. 3 M 100 M ), ages from the zero-age main sequence (ZAMS) to the terminal-age main sequence (TAMS) and beyond, rotation rates between essentially zero and close to critical, and for both metal-poor (i.e. Z ≤ 0.002) and metal-rich stars (i.e. Z ≥ 0.014). A mechanism must also explain the broad frequency range of SLF variability and how its properties correlate with mass and age (Bowman et al. 2019b. In this paper, we compare the difference in the measured properties of SLF variability when using different fitting methods: directly fitting the amplitude spectrum of the light curve as previously performed by Bowman et al. (2020), and covariance structure fitting of the light curve using the celerite2 Gaussian process (GP) regression software package (Foreman-Mackey et al. 2017;Foreman-Mackey 2018;Foreman-Mackey et al. 2021). Our goal is to demonstrate a more efficient method for extracting the characteristic frequency of the SLF variability for a large number of massive stars observed by the ongoing TESS mission. In section 2, we outline the approach to using GP regression and briefly describe the stellar sample. In section 3, we present the results of our comparison and we conclude in section 4.

Method
Given that SLF variability observed in massive stars is inherently a representation of a stochastic process, it is difficult to parametrise it with a physically motivated function in the time domain. This has led previous studies to characterise SLF variability instead in the Fourier domain (Blomme et al. 2011;Bowman et al. 2019aBowman et al. ,b, 2020Dorn-Wallenstein et al. 2019, 2020Nazé et al. 2021;Elliott et al. 2022). However, there are various statistical tools and methodologies, such as GP regression (Rasmussen & Williams 2006;Aigrain & Foreman-Mackey 2022), low-order linear stochastic differential equations (Koen 2005), and continuous-time autoregressive moving average models (Kelly et al. 2014), that yield a model for stochastic signals in the time domain based on the statistical properties of time-series data.

Previous studies parameterising SLF variability
Previous efforts in parameterising SLF variability in massive stars have used a semi-Lorentzian function to fit the amplitude spectrum of the light curve with a from similar to: where α 0 represents the amplitude at a frequency of zero, γ is the logarithmic amplitude gradient, ν char is the characteristic fre-quency, and C w is a frequency-independent (i.e. white) noise term (Bowman et al. 2019a). We note that C w is a parameter used to evaluate the white noise floor when fitting an amplitude spectrum, which is set by Poisson and instrumental errors and is not part of the spectrum of SLF variability. Hence, stars can have significantly different C w values dependent on the quality of their light curves. Typically, to avoid potential bias in the resultant fit when comparing various stars using Eqn. (1), the light curves of massive stars have undergone iterative pre-whitening to remove dominant periodicities a priori. This means that periodic variability caused by coherent pulsation modes or rotational modulation is removed from the light curve by means of subtracting sinusoids with optimised amplitudes, frequencies and phases to create a residual light curve (Bowman et al. 2019a;. This is particularly important for B-type stars, such as β Cep and SPB stars, which are commonly multiperiodic pulsators in addition to having SLF variability (Bowman et al. 2019b;Burssens et al. 2020). On the other hand, a twostep process of pre-whitening pulsation modes a priori and determining their frequencies, amplitudes, and phases may yield differences in the inferred morphology of the SLF variability compared to fitting the coherent modes and SLF variability simultaneously. Investigating the impact of this is beyond the scope of the current paper but will be explored in future work.
The fitting of the amplitude spectrum in previous studies has been achieved using a Markov chain Monte Carlo (MCMC) numerical framework (e.g. using the python emcee package; Foreman-Mackey et al. 2013, 2019c. It was used to sample the posterior probability distribution to determine the optimum parameters for Eqn. (1) using the amplitude spectrum of a star's light curve as observables. Yet the calculation of an amplitude spectrum and subsequent MCMC numerical framework is relatively computationally expensive and is also potentially quite sensitive to the explicit algorithm used to transform the light curve data from the time domain into the frequency domain. For example, the choices for the frequency over-sampling factor, frequency range, normalisation constant (e.g. the number of data points and/or the data set length when using the numerical implementation of a (discrete) Fourier transform (e.g . Deeming 1975;Kurtz 1985) or (variant of) a Lomb-Scargle periodogram (e.g. Scargle 1982;Press & Rybicki 1989) should ideally not impact the results. Not all studies in the literature make comparable choices -see discussion by . Moreover, differences in approaches to gap-filling a light curve to maximise its duty cycle, remove aliases and aim to minimise the noise floor in the amplitude spectrum, but different forms of post-processing data such as smoothing the amplitude spectrum prior to fitting may lead to different results. Although useful, these amplitude spectrum fitting methods implicitly include assumptions for the signal and noise components of a light curve and can affect subsequent (asteroseismic) inference.

Advantages of GP regression
The application of GP regression has been highly successful in modelling radial velocity time series data, the light curves of transiting exoplanet host stars, and stellar variability including spot modulation in low-mass stars (Brewer & Stello 2009;Barclay et al. 2015;Grunblatt et al. 2017;Pereira et al. 2019;Nicholson & Aigrain 2022). In particular, a GP regression framework has been shown to be very effective at characterising stellar variability with periods shorter than the total time span of a time series, even in the case of degraded and noisy data sets (e.g. Nicholson & Aigrain 2022). As pointed out by Pereira et al. (2019), previous applications of GP regression in astrophysical contexts have typically focussed on determining accurate nonparametric models for time series data without particularly paying close attention to the physical processes that cause the variability. However, GP regression can be used under certain conditions to estimate physical parameters (e.g. such as characteristic timescales) that set the stochastic processes in time series data, such as modelling the signatures of granulation in the light curves of evolved low-mass stars (Pereira et al. 2019). In this work, we extend and test the application of covariance structure fitting of a light curve within a GP regression framework to extract physical parameters, in particular the characteristic frequency, of SLF variability in massive stars.
Because of the potential impact of different choices when calculating amplitude spectra, it can be argued that fitting the covariance structure of a light curve with a GP regression framework, such as the celerite2 software package, is the more robust strategy since it is directly fitting the original measurement data set (i.e. the light curve). Relying on the light curve as the data set and avoiding transformations into the Fourier domain, a GP regression method avoids these potential discrepancies. See also the application and discussion of the advantages of GP regression regarding the window function of a time series by Nicholson & Aigrain (2022). On the other hand, covariance structure fitting methods also include assumptions, such as that the mean of the light curve is constant and stationary, hence have their own limitations.
In this work, we investigated the efficacy of using GP regression for determining the properties of SLF variability in massive star light curves. In so doing, we restricted our sample to the 30 galactic O-type stars from the TESS sample previously studied by Bowman et al. (2020) that do not have significant coherent pulsation modes and thus do not require iterative pre-whitening prior to fitting Eqn. (1). Therefore, our comparison of fitting the amplitude spectrum and GP regression of the light curve was not influenced by any post-processing of the light curves, such as iterative pre-whitening.

Defining a GP kernel
A GP regression model is a non-parametric model that describes correlated stochastic variability in a time series by fitting hyperparameters to define a covariance matrix. Each data point in a time series is described as a correlated random variable with a mean and a variance, for which any finite collection of variables has a multi-variate Gaussian distribution (Rasmussen & Williams 2006;Aigrain & Foreman-Mackey 2022). In GP regression, a function is not directly fitted to a data set, but rather hyperparameters are fit to define a covariance matrix. This yields a posterior distribution that is a representation of a distribution of functions with a given structure in their covariance matrices, which are then conditioned on the data to evaluate the best representation of the input time series. In this work, it is assumed that the variance of the input time series is constant and generally that the observed SLF variability is stationary.
In the application of GP regression, a kernel to describe the correlation among data points is chosen, which is then applied to determine the best set of parameters to reproduce the observed time series (see e.g. Pereira et al. 2019). Posterior probability distributions for the relevant hyperparameters of the GP can be sampled using suitable priors, and the GP likelihood function (see equation 1 of Foreman-Mackey 2018). Computing the likelihood function is typically the computational bottleneck, there-fore the selection of an appropriate kernel, GP implementation, and sampling algorithm is crucial (Foreman-Mackey et al. 2017).
In this work, we used the python celerite2 and exoplanet software packages (Foreman-Mackey et al. 2017;Foreman-Mackey 2018;Foreman-Mackey et al. 2021) 2 to fit a covariance structure for TESS light curves within a GP regression framework. Inspired by Pereira et al. (2019) in their application of GP regression to study the granulation signal of pulsating red giants and guided by the excellent documentation of the celerite2 software package, we utilised a GP kernel that corresponds to a damped simple harmonic oscillator (SHO), whose power spectral density is given by where ω 0 is the natural frequency of the oscillator, S 0 is related to the amplitude of the variability, and Q is the quality factor. For very large values of Q, Eqn.
(2) corresponds to an undamped SHO. As stated in equation 23 of Foreman-Mackey et al. (2017), the kernel in the time domain is given by where η = |1−(4Q 2 ) −1 | 1/2 , and τ is the (absolute) time separation between measurements needed to construct the covariance matrix (i.e., the i j element of the covariance matrix uses τ = |t i −t j |).
In the limit of Q = 1/ √ 2 (see Foreman-Mackey et al. 2017 for the derivation; omitted here for brevity), the power spectral density in Eqn. 2 reduces to A GP regression of a time series can be parameterised using a characteristic variability timescale, ρ char = 2π/ω 0 , a characteristic amplitude 3 , σ A = √ S 0 ω 0 Q, and a characteristic damping timescale, τ damp = 2Q/ω 0 . The form of Eqn. (4) is similar to the semi-Lorentzian function used to fit SLF variability in massive stars (cf. Eqn. (1)). Furthermore, a damped SHO kernel is a convenient starting point when a physically motivated kernel may not be known (see Foreman-Mackey et al. 2017). Given that SLF variability represents a stochastic oscillatory process, we deem this kernel appropriate for modelling TESS light curves of massive stars. Testing its efficacy to extract effective timescales (e.g. ν char ) from light curves is one of the goals of this work.

Input parameter guesses and priors
In our framework we used the kernel of a damped SHO and input guesses for its parameters as follows: ω 0 as the characteristic angular frequency (i.e. 2 π ν char ); Q = 1/ √ 2; and σ A = α 0 √ ω 0 Q × (π/2) 0.25 based on roughly matching terms between Eqns. (1) and (4). We note that our results are in fact not sensitive to the input guesses for these parameters as long as the domains of the priors are large enough to contain the true value.
Given the pristine quality of space photometry and the dominance of the astrophysical variability signal over measurement errors, in this work we assume that all possible sources of instrumental noise are white (see e.g. Kallinger et al. 2014). This may not be generally true for all stars observed by the TESS mission, but is a valid assumption for the bright massive stars dominated by SLF variability studied in this work. We emulated a white noise floor in the Fourier domain by including a noise jitter term in the GP regression following Pereira et al. (2019). The jitter term is a constant added in quadrature to the diagonal of the covariance matrix and has a kernel function of: where i and j are matrix indices and C 2 jitter is the GP regression jitter term. Physically, the jitter term is used to represent uncorrelated noise in the data that cannot be captured by the GP regression, and hence represents a proxy of the measurement uncertainties in a light curve ( In running a GP regression for a given star, there are five main parameters to be determined: (i) the characteristic timescale, ρ char (from which the characteristic frequency can be determined: ω 0 = 2πν char , where ν char = 1/ρ char ); (ii) the characteristic damping timescale, τ damp (from which the characteristic damping frequency can be calculated: ν damp = 1/τ damp ); (iii) the characteristic amplitude, σ A ; (iv) a jitter term to emulate uncorrelated noise in the observations, C jitter ; and (v) the mean of the light curve. We take the ν char parameters from Bowman et al. (2020) multiplied by 2π to calculate ω 0,guess as input guesses, and assign non-informative (i.e. uniform) priors in logarithmic space to all input parameter guesses except the mean of the light curve, which is assigned a normal prior based on the measured mean of the TESS light curve, which a priori has been set to be zero. We specified liberal lower and upper bounds for the four uniform priors as given in Table 1, but are results are not sensitive to these choices as long as the domains of the priors are sufficiently large. An input guess of the quality factor of the GP regression model is set as Q guess = 1/ √ 2, and is evaluated as Q = πτ damp /ρ char .

Parameter and uncertainty estimation
To estimate the best-fitting model parameters and their uncertainties we used the python pymc3 software package (Salvatier et al. 2016) 4 and performed a Hamiltonian Monte-Carlo with a no U-turn sampler to determine parameter confidence intervals. In a standard MCMC approach, parameter chains make stochastic walks within the multi-dimensional parameter space, whereas a no U-turn sampler employs a Hamiltonian description of the probability distribution to sample the posterior probability distribution for the model parameters following Bayes' theorem. This makes a Hamiltonian Monte-Carlo no U-turn samplers typically much more efficient in finding global minima (Salvatier et al. 2016).
The best-fitting GP regression model is evaluated on the hyperparameters with maximum a posteriori probability. We use pymc3 and the maximum a posteriori parameters as an initial trial point to begin sampling the posterior. In our implementation of the Hamiltonian Monte-Carlo with a no U-turn sampler, we used four parameter chains, and at each iteration each chain is used to construct a model subject to a GP marginalised log likelihood function within the pymc3 model that also includes the log priors. A burn-in period of fewer than 1000 iterations was typically needed, which were discarded before parameter and uncertainty estimation were performed using another 1000 iterations. Convergence of the parameter chains is confirmed using the parameter variance criteria from Gelman & Rubin (1992). We used the highest density interval (HDI) and 94% of the posterior probability density to estimate the lower and upper confidence intervals for each marginalised parameter distribution.
Goodness-of-fit is assessed both by ensuring that good convergence statistics are obtained following parameter variance criteria (Gelman & Rubin 1992) and visual inspection of the marginalised posterior distributions (i.e. corner plots; Foreman-Mackey 2016). We also visually inspect the posterior probability distribution, GP regression model and its corresponding power spectral density in the time and Fourier domain, respectively.
We emphasise that our GP regression framework is a fit to the light curve and not a fit to an amplitude spectrum. But to aid in making our results visually comparable to the amplitude spectrum fitting results of Bowman et al. (2020), we calculated the power spectral density (i.e. amplitude squared per unit frequency as a function of frequency) of the TESS light curve and the resultant GP regression model, which was normalised using Parseval's theorem following Pereira et al. (2019). Similarly, we also converted the fits from Bowman et al. (2020) into units of power spectral density. We normalised all power spectral density figures to have the same amplitude at zero frequency (i.e. α 0 = 1), such that they converge at unity at zero frequency for ease of visual comparison. This is necessary to allow a consistent comparison of the power spectral density of the GP regression model including the jitter term, the power spectral density of the TESS data, and the converted fits of Bowman et al. (2020) in the Fourier domain. Furthermore, this approach allowed us to perform a sanity check judgement of the quality of the best-fitting GP regression model to the data, since poor fits are more obvious in the Fourier domain 5 . The conversion of the results into a power spectral density does not impact the inferred parameters, such as ν char , as these are determined by the GP regression in the time domain.

Results
We applied our GP regression framework to the exact same TESS light curves of 30 massive stars previously analysed by Bowman et al. (2020). For each star, we provide a summary figure in Appendix A, which contains the TESS observations and the best-fitting GP regression model in the top panel, and the power spectral densities of the TESS observations, the converted best-fitting model from Bowman et al. (2020), and the GP regression model in the bottom panel. A full summary of our parameter results from the GP regression and the upper and lower confidence intervals for each parameter from the Hamiltonian Monte-Carlo with no U-turn sampler is given in Table B.1. We note that for all summary figures in Appendix A that the minimal informative frequency of the data set (i.e. resolution set by the inverse of the TESS light curve time span of ≥ 28 d) is significantly smaller than the chosen 0.1 d −1 lower limit of the x-axes. An immediate major advantage of using a damped SHO model within a covariance structure fitting framework with the GP regression celerite2 software is that it provides additional useful inferred parameters of the SLF variability in massive stars, such as ν damp and Q, which are not provided by fitting Eqn.
(1) to an amplitude spectrum. Therefore, in addition to the characteristic timescale of the SLF variability, we are able to make inferences on its coherency and quasi-periodicity. We show the histogram of the inferred Q values for the sample of 30 massive stars in Fig. 1, which has been colour-coded by the spectroscopic effective luminosity of each star defined as L := T 4 eff /g (Langer & Kudritzki 2014). We find a range of Q values spanning from 0.18 to 2.15 that cluster around a value approximately equal to 1/ √ 2 and have a median value of 0.48. The inferred Q values represent a wide range in the coherency of SLF variability of massive stars. The two stars with the largest Q values are HD 154368 (TIC 41792209) and HD 152424 (TIC 247267245). Both of these stars have large amplitude and quasi-coherent SLF variability in their light curves (see Fig. A.3) and are among the most massive and luminous stars in our sample.

Comparison of Bowman et al. (2020) and GP regression
In the top-and bottom-left panels of Fig. 2, we show the pairwise comparison of the ν char values determined in this work using GP regression and those determined by Bowman et al. (2020), which are colour coded by the σ A and ν damp GP regression parameters, respectively. On average, a fairly good agreement is seen between the two sets of ν char values. Yet a small fraction of the sample of stars have ν char values that differ by more than a factor of two. In some cases this leads to significant differences when considering the relatively small uncertainties on this pa- rameter, but this varies from star to star. The discrepancies in some stars are not necessarily systematic with ν char . However, we note that stars with a significant difference in ν char values between the two methods have lower σ A and higher ν damp values (i.e. the SLF variability is relatively low amplitude and more stochastic). Also, in cases when a significant difference is found, the GP regression value for ν char is typically larger than that of Bowman et al. (2020). This is possibly related to the difference between a GP regression incorporating a different model and being a fit to the light curve rather than a fit to the amplitude spectrum. In other words, the sensitivity of the GP regression to the window function of the time series data is in the time domain rather than the Fourier domain, such that stars with lower amplitude SLF variability have larger discrepancies between the two methods. However, a much larger sample than the 30 stars considered in this work is needed to further investigate this if this effect is astrophysical or methodological in origin.
In the top-right panel of Fig. 2, we show the relationship between the steepness of the SLF variability in the amplitude spectrum, γ, as determined by Bowman et al. (2020) and the τ damp timescale colour-coded by σ A as determined using GP regression in this work. Whilst there is considerable scatter between these two parameters, a positive correlation between γ and τ damp can be seen. This correlation is caused by stars with SLF vari-ability characterised with longer damping timescales, τ damp , being less stochastic and more quasi-periodic (i.e. higher values of Q). In turn this leads to larger values of γ, and the majority of power in the SLF variability being concentrated near ν char as opposed to being spread over a wide range in frequency.
In the bottom-right panel of Fig. 2, we show the relationship between the amplitude of the SLF variability at zero frequency, α 0 , as determined by Bowman et al. (2020) and the characteristic amplitude, σ A , in this work. A clear correlation between these parameters is seen, which demonstrates that they can serve as proxies of each other. We note that the relationship is almost linear in log-log space, with a linear regression of α 0 and σ A yielding a Pearson correlation coefficient of r = 0.97 (p < 10 −10 ) and a gradient of 0.73 ± 0.03. This is expected given the equivalence of these parameters in both methodologies, but the gradient of order unity and the extremely high degree of correlation is a useful sanity check.
Our comparison of the GP regression parameters and those of Bowman et al. (2020) leads us to define two sub-groups of stars: (i) those with high α 0 (i.e. high σ A ), low ν char (i.e. high τ char ), and low ν damp values; and conversely (ii) those with low σ A , high ν char , and high ν damp values. We term these the 'yellow' and 'blue' sub-groups, respectively, as indicated by the colour coding in Fig. 2. In other words, stars in the 'yellow' sub-group typically have high-amplitude, quasi-periodic lowfrequency variability, and stars in the 'blue' group have lowamplitude, stochastic variability spanning a broad range in frequency, on average. These are not distinct populations and our sample in fact covers the transition between these two extremes. However, the definition of these two sub-groups serves to highlight the differences in the properties of SLF variability among a sample of massive stars.

Correlations with mass and age in HR diagram
In Fig. 3 we show the spectroscopic HR diagram for the 30 galactic O stars studied by Bowman et al. (2020), which we have remodelled using GP regression in this work. As discussed in section 3.1, there are some star-to-star differences in the determined ν char parameters between the results of Bowman et al. (2020) and from our GP regression. However, an important result from this work is that the dependence of ν char on the mass and age of a star remains unaffected since both independent fitting methods yield the same correlation. Therefore, as demonstrated by comparing the left and right panels of Fig. 3, our new modelling approach further strengthens the conclusion of Bowman et al. (2020) that the light curve of a massive star encodes valuable constraints on its mass and age. More specifically, younger stars near the ZAMS have larger ν char values of approximately 7 d −1 , on average, whereas older stars near the TAMS have smaller ν char values of approximately 1 d −1 .
Furthermore, we note that the 'yellow' sub-group of stars (cf. section 3.1) are typically higher mass and more evolved and the 'blue' sub-group of stars are closer to the ZAMS on average. This is a new result of our work. Not only does the value of ν char of a star's SLF variability probe its mass and age, but the degree of coherency and stochasticity of a massive star's light curve is also a function of age. More specifically, more massive and more evolved stars typically have larger Q values, as shown in Fig. 4.
The larger the Q value in an oscillator, the narrower and sharper the peak in the power spectral density becomes because the system is more selective of what range of frequencies are driven to resonance. For example, systems with low quality fac-tors (i.e. Q < 1/ √ 2) are referred to as 'overdamped' because damping dominates over driving. After a perturbation, such a system is returned asymptotically to its equilibrium steady state by means of an exponential decay. Whereas systems with high quality factors (i.e. Q > 1/ √ 2) are referred to as 'underdamped', which means they continue to oscillate at their resonant frequency whilst experiencing a decaying signal amplitude. This is because an underdamped system has driving that is able to overcome damping. For a perfect oscillation with zero damping the quality factor is infinite. Hence, coherent pulsators driven by the heat-engine mechanism, would have extremely high Q values.
Given the measured Q values of the stars shown in Fig. 4, our results demonstrate that SLF variability in massive stars transitions from being well described as an overdamped SHO to an underdamped SHO between the ZAMS and the TAMS in the HR diagram. Therefore, the coherency of SLF variability as inferred using GP regression is a new avenue for constraining the responsible physical mechanism in different parameter regimes of the HR diagram.

Is SLF variability quasi-periodic or stochastic?
Two prime examples of stars in the 'yellow' sub-group are HD 152424 (TIC 247267245) and HD 154368 (TIC 41792209), which have Q = 2.0 +1.3 −0.8 and Q = 2.2 +1.0 −0.7 , respectively (see Table B.1). We note that there are relatively large uncertainties for the Q values, the largest of all the stars, but place the stars in the 'underdamped' regime. These stars have almost 'perfect' GP regression fits to their light curves (see Appendix A), and have light curves that are clearly dominated by astrophysical signal and negligible instrumental noise. Stars in the 'yellow' subgroup are those that deserve special attention for the purpose of identifying and extracting quasi-coherent pulsation modes from their apparent SLF variability for the purpose of asteroseismology (see e.g. Bowman 2020). However, such an attempt would require long-term light curves spanning dozens of TESS sectors to resolve individual modes owing to their apparent intrinsically damped nature. This is beyond the scope of our current paper but is the subject of future work.
We also perform a second modelling setup and perform GP regression for all stars in which we fix the quality factor to be Q = 1/ √ 2. As a consequence, the damping timescale, τ damp , is no longer included as a parameter. This allows for a better understanding of the measured Q values when it is an inferred parameter and of the power spectral density predictions derived from the GP regression. We provide the parameters from the fixed-Q GP regression in Table B.2. Fixing Q as a parameter forces the shape of the power spectral density prediction to be similar to that of Eqn. 1. This has the potential to explain the differences in ν char values between those from Bowman et al. (2020) and from the non-fixed Q-value GP regression. However, the ν char values from the GP regressions with fixed and non-fixed Q values are within 1σ of each other, which demonstrates that we are somewhat insensitive to Q when using GP regression or there is significant degeneracy between the Q and ν char parameters.
We also calculated and compared the Bayesian Information Criteria (BIC) values, which penalises models based on their complexity, using the GP log likelihood value returned by celerite2 software for the GP regression models when Q is a free and a fixed parameter. In most cases, 20 of the 30 TESS light curves, the BIC values indicate that the solutions with Q as a free parameter are strongly preferred. This is understandable since the added complexity is needed to more accurately  describe the morphology of SLF variability in the time domain. A damped SHO model with a fixed Q value is less flexible, with the light curves of massive stars containing added complexity that is not well captured in this overly simplistic model. Moreover, the 10 stars in which the model with a fixed Q value is preferred are generally the stars with noisier light curves (i.e. 'blue' subgroup members) and have generally poorer fits, as can be seen when comparing the power spectral density of the GP solution and the TESS data in the Fourier domain. Therefore, in order to accurately capture the shape of SLF variability we conclude that the use of GP regression with Q (and thus also τ damp ) as a free parameter is the more robust approach, on average.

Testing the impact of poorer quality observations
We have demonstrated that the best-fitting parameters characterising SLF variability in massive stars are sometimes dependent on the choice of methodology, but this affects each star differently. This is expected because we are comparing the results of two different models each with different fitting methods. On the other hand, there is a general consistency in ν char values obtained from the two methods, but it generally depends on the amplitude of the SLF variability and the noise properties of the TESS light curve. For the 'yellow' sub-group of stars, the astrophysical signal (i.e. SLF variability) strongly dominates over the noise. Whereas, some stars in the 'blue' sub-group have SLF variability that is comparable to the white noise, and only detected because of the high precision and long duration of TESS light curves. It is the 'blue' sub-group of stars that have, on average, larger inconsistencies in their inferred ν char values between the two methods (cf. Fig. 2). In order to study SLF variability across a wide range of masses and ages, thus including the full range of SLF variability amplitudes (i.e. tens of µmag to tens of mmag; Bowman et al. 2019a,b), high-photometric precision and short-cadence space light curves are necessary to yield the necessarily low instrumental noise.
For the sample at large, there is fairly good agreement between the results in this work and the alternative approach taken previously by Bowman et al. (2020). This leads us to wonder which of the two approaches is more sensitive to the quality of the observations and perhaps less robust considering the notable difference in ν char values for some stars. For example, not all Fig. 5. Results of GP regression and amplitude spectrum fitting for the noise-enhanced light curves of HD 112244 (TIC 406050497) and HD 96715 (TIC 306491594), which are typical 'yellow' and 'blue' subgroup members, are shown in the left and right columns, respectively. From top to bottom, increasingly larger amounts of white noise has been added. The blue and green shaded regions denote the 94% confidence interval for ν char for each method. The ν char values from the GP regression are more consistent, on average, than those from fitting the amplitude spectrum.
Article number, page 9 of 18 A&A proofs: manuscript no. IGWs_GP_accepted_arxiv stars in our sample are the same brightness, such that the underlying random and instrumental noise contributions in their light curves are different. Perhaps the covariance structure fitting of a light curve is more sensitive to these noise properties compared to the amplitude spectrum fitting in the Fourier domain.
To test the sensitivity of the GP regression to the quality of the input time series, we degraded the quality of the TESS light curves of the 30 massive stars by adding additional frequency independent (i.e. white) noise. This increased the value of C w (cf. Eqn. 1) in the amplitude spectrum and consequently also the contribution of the jitter term in the GP regression. For each star in our sample, we created four additional light curves which had 0.5, 1, 2 and 3 times the standard deviation of the original light curve added to them as white noise to emulate increasingly noisier data sets. The latter of which can be considered an extremely and unrealistic noise-dominated light curve, but we include it for completeness. We then re-modelled these noisy light curves using our GP regression framework and the method of Bowman et al. (2020) and compare the resultant ν char values.
In Fig. 5, we show examples of the results of the 'yellow' (left) and 'blue' (right) sub-group members, HD 112244 (TIC 406050497) and HD 96715 (TIC 306491594), respectively. The GP regression typically returns the same ν char value for all noisy light curves of a star within the respective uncertainties, which are indicated by shaded regions that denote the 94% confidence interval for ν char from each fitting method. Naturally, the confidence intervals become larger for noisier light curves, but we note that they are always much smaller for the MCMC amplitude spectrum method compared to those from the GP regression framework. Importantly, there is considerably larger scatter in ν char values determined from the amplitude spectrum fitting methodology of Bowman et al. (2020) to the same artificially noise-enhanced data sets. The values of ν char of the latter are significantly different from each other because of the typically much smaller uncertainties returned by the method of amplitude spectrum fitting. In almost all cases, and as demonstrated by the example stars in Fig. 5, the ν char value returned by the amplitude spectrum fitting methodology systematically decreases to lower values when the white noise contribution is larger. Whereas, the ν char value from the GP regression stays the same within its (much larger) uncertainties, hence not significantly different among the noise-enhanced data sets given its relatively large uncertainties.
Given the consistency of the ν char values of the GP regression, regardless of the underlying signal-to-noise ratio of SLF variability over the white noise, we conclude that it is the more robust methodology. The reason for this is because of the fundamental difference between the two fitting methods. The GP regression is a fit of the covariance structure in the light curve and incorporates a different model, in which the implemented jitter term more effectively handles the included white noise compared to directly fitting the amplitude spectrum. Therefore, large differences in the C w term in Eqn. (1) can lead to significantly different ν char values, especially given the small uncertainties typically returned from this method.

Application to synthetic data
Considering that our tests in section 3.4 were applied to observed data sets of real stars with unknown underlying frequency spectra, it is possible that the GP regression fitting and amplitude spectrum include different biases. In this section, we test this hypothesis using synthetic light curve data sets generated from different combinations of hyperparameters for the damped SHO kernel generated using celerite2. We applied both fitting methods and compared the derived ν char values to investigate which approach is more robust in returning the true input value for ν char .
We generated 100 artificial light curves that span 26 d in length, consist of 1000 data points, and contain a 1-d gap in the middle to emulate one sector of TESS data. Each light curve was generated containing different noise properties and hyperparameters for a damped SHO kernel. We applied our covariance structure fitting method within a GP regression framework and the amplitude spectrum fitting method of Bowman et al. (2020) to these 100 synthetic light curves. For all cases, the derived ν char values are comparable between the two fitting methods and are approximately within a factor of two of each other, as was the case in section 3.4 when applied to real and noise-enhanced TESS light curves. However, the ν char values determined from the methodology of Bowman et al. (2020) are sometimes up to approximately a factor of 1.4-1.7 larger than the true input values. As before, this difference is caused by the two approaches employing two different models and fitting methods. Moreover, the GP regression returns ν char values that are completely consistent with the true input values. Therefore, we conclude that the GP regression methodology is generally more robust, especially in cases of noisier input light curves. We show several examples of the fitted synthetic light curves and corresponding power density spectra in Fig. 6, which include the power spectral density of the noiseless damped SHO input model for comparison to the two fitting methods.
We note that even though the GP regression is a fit to the light curve, and the method of Bowman et al. (2020) is a fit to the amplitude spectrum, the normalised power density spectra from the best-fitting GP regression models are typically better (visual) fits in the cases when Q > 1 (i.e. left panels of Fig. 6). This is because the shape of Eqn. (1) is quite rigid and unable to capture the complete morphology of the SLF variability in the Fourier domain. For low-Q data sets, the form of Eqn. (1) and the resultant power density spectrum of the best-fitting GP regression model are similar in shape, as shown in the right column panels of Fig. 6. However, the amplitude spectrum fitting methodology typically overestimates the true input value of ν char by a factor of ∼ 1.5. We refer the interested reader to Foreman-Mackey et al. (2017) for similar hare-and-hound exercises with celerie2 using simulated time series data.

Discussion and Conclusions
The vast majority of massive stars have SLF variability in time series photometric observations, if not all when sufficiently high enough quality data are available, which probes the physical properties of the star, such as mass and age. In this work, we have tested the use of GP regression, specifically by fitting a covariance structure with the celerite2 software package and a damped SHO kernel (Foreman-Mackey et al. 2017;Foreman-Mackey 2018;Foreman-Mackey et al. 2021), to the time series photometric observations for a sample of 30 massive stars observed by the TESS mission and previously studied by Bowman et al. (2020). We use the same input light curves from the TESS mission and compare our GP regression results, paying particular attention to the characteristic frequency of the SLF variability, ν char , to the previous amplitude spectrum fitting method by Bowman et al. (2020). We find that the ν char determinations by the two different methods are the same, within their uncertainties, for the majority of massive stars observed by TESS. This is encouraging and demonstrates that the previously found correlation of ν char with mass and age is robust given that the GP Fig. 6. Examples of results of GP regression and amplitude spectrum fitting for synthetic light curves. The left and right columns show three examples that resemble 'yellow' and 'blue' subgroup members in terms of their quality factor. The blue and green shaded regions denote the 94% confidence interval for ν char for each method. In all cases, the true input value of ν char is recovered well by the GP model with celerite2, whereas the MCMC method of Bowman et al. (2020) sometimes overestimates the value of ν char because of the more rigid profile imposed in the Fourier domain, as demonstrated by a comparison with the noiseless input SHO model. regression method is a fit to the light curve using a damped SHO model, and the method of Bowman et al. (2020) is a fit to the amplitude spectrum using a semi-Lorentzian (i.e. Harvey-like;Harvey 1985) model that is not necessarily physically motivated. A minority of stars have different and systematically higher values of ν char by up to a factor two when using the GP regression compared to amplitude spectrum fitting, as demonstrated by the examples shown in Fig. 6. Specifically, the damped SHO model of the GP regression is more flexible in fitting the SLF variability in the time domain compared to the rigid shape imposed by the semi-Lorentzian profile in the Fourier domain given in Eqn. (1).
We have demonstrated that GP regression is a viable method for characterising SLF variability in time series photometry of massive stars. Our work lends further support to the conclusion of Bowman et al. (2020) that SLF variability probes the mass and age of a massive star, demonstrated by the correlation between ν char and a star's location in the HR diagram, as shown in Fig. 3. Therefore, we conclude that the absence of coherent, heat-driven pulsation modes is not necessarily a prerequisite for asteroseismology of massive stars, given the probing power of ν char on bulk properties like mass and age. Furthermore, we have shown that a fraction of massive stars with SLF variability exhibit the characteristics of underdamped quasi-periodic variabil-ity, suggesting the presence of damped pulsations (i.e. stochastic IGWs). Although a range of quality factors exist in the SLF variability of massive stars, a transition from overdamped to underdamped variability is found to coincide with a transition from the ZAMS to the TAMS, as shown in Fig. 4. We have also demonstrated that a GP regression is more robust against noisier input time series data, especially in the cases where the SLF variability is comparable in amplitude to the white noise.
Several recent theoretical studies have focussed on understanding the possible physical causes that give rise to SLF variability in time series photometry. One possible mechanism is that stochastically excited IGWs from the convective core reach the surface with detectable amplitudes and produce a broad frequency spectrum of damped modes (Rogers et al. 2013;Rogers 2015;Rogers & McElwaine 2017;Edelmann et al. 2019;Horst et al. 2020;Vanon et al. 2022;Herwig et al. 2022;Thompson et al. 2022). However, such an interpretation has been challenged by other studies employing numerical simulations showing that core-excited IGWs are damped before they reach the surface Lecoanet et al. (2019Lecoanet et al. ( , 2021. This latter result hence supports instead that SLF variability originates from sub-surface convection zones Schultz et al. 2022). The presence and location of sub-surface convection zones and inferring their existence in stellar evolution models, especially for stars of low metallicity, is clearly important for understanding IGW excitation mechanisms (Jermyn et al. 2022). From stellar evolution theory, it is expected that the size of sub-surface convection zones grow from being negligible to relatively large fractions of a star's radius as it evolves from the ZAMS to the TAMS. Hence, the contribution to SLF variability from sub-surface convection zones is plausibly also expected to increase as stars evolve. On the other hand, the increasing radius as stars evolve produces different contributing factors to the driving, damping and propagation of IGWs. Detailed (rotating) hydrodynamical simulations of massive stars at different masses and ages are needed to explore this transition and the relative importance of IGWs from both the core and sub-surface convection zones.
In this work we have tested and implemented the use of GP regression to infer the characteristic frequency, ν char of massive stars from TESS light curves using the celerite2 software package. We have demonstrated that a transition from a broad range of low-amplitude SLF variability to high-amplitude quasiperiodic variability exists between the ZAMS and TAMS. We postulate that the observed transition discovered in our work corresponds to an astrophysical scenario in which the dominant cause of SLF variability in massive stars transitions from core convection to sub-surface convection as stars evolve from the ZAMS to the TAMS.
Other complementary methods to study SLF variability in astronomical time series data include low-order linear stochastic differential equations (e.g. Koen 2005), and continuous-time autoregressive moving average models (see Kelly et al. 2014). There are thus a variety of statistical tools discussed in the literature that can aid in unlocking the large quantity of TESS light curves of massive stars. In the future, we will expand our analysis to include such methods for inferring properties of the SLF variability in massive stars. We will also apply our methodology to additional stars observed by TESS to extract the morphology of their SLF variability. This will allow us to further populate Fig. 3 and study the origin of SLF variability and its correlation to stellar properties such as mass and age, but also macroturbulence as shown by Bowman et al. (2020). The TESS mission is ongoing and currently providing high precision light curves of thousands of massive stars, which also span much longer time spans than those considered in this initial study. In our next study, we will expand our method to apply it to a much larger sample and use it test if masses and radii can be reliably inferred from the light curves of massive stars.