FliPer$_{Class}$: In search of solar-like pulsators among TESS targets

NASA's Transiting Exoplanet Survey Satellite (TESS) is about to provide full-frame images of almost the entire sky. The amount of stellar data to be analysed represents hundreds of millions of stars, which is several orders of magnitude more than the number of stars observed by the CoRoT, Kepler, or K2 missions. We aim to automatically classify the newly observed stars with near real-time algorithms to better guide their subsequent detailed studies. In this paper, we present a classification algorithm built to recognise solar-like pulsators among classical pulsators, which relies on the global amount of power contained in the power spectrum density (PSD), also known as the FliPer (Flicker in spectral Power density). As each type of pulsating star has a characteristic background or pulsation pattern, the shape of the PSD at different frequencies can be used to characterise the type of pulsating star. The FliPer Classifier (FliPer$_{Class}$) uses different FliPer parameters along with the effective temperature as input parameters to feed a machine-learning algorithm in order to automatically classify the pulsating stars observed by TESS. Using noisy TESS-simulated data from the TESS Asteroseismic Science Consortium (TASC), we manage to classify pulsators with a 98% accuracy. Among them, solar-like pulsating stars are recognised with a 99% accuracy, which is of great interest for further seismic analysis of these Sun-like stars. Similar results are obtained when training our classifier on 27-day subsets of real Kepler data and applying it to them. FliPer$_{Class}$ is part of the large TASC classification pipeline developed by the TESS Data for Asteroseismology (T'DA) classification working group.


Introduction
Starting with the Convection, Rotation and planetary Transits satellite (CoRoT), and showing its full potential with Kepler, asteroseismology is now the most precise way to obtain estimates of the masses and radii of field stars (e.g. Lebreton & Goupil 2014), except for eclipsing binaries, for which spectroscopy prevails. Asteroseismic parameters such as the frequency of maximum power ν max and the large frequency separation ∆ν of the oscillation modes of solar-like pulsators (i.e. with modes excited by turbulent convection, Goldreich & Keeley 1977) are obtained from the power spectrum density using global seismic pipelines (e.g. Mosser & Appourchaux 2009; Huber et al. 2009; Mathur et al. 2010). These global seismic parameters are key constraints for stellar evolution models: using them leads to age estimates that are much more precise than estimates obtained with other classical methods (e.g. Lebreton & Goupil 2014).
The Transiting Exoplanet Survey Satellite (TESS), launched on 18 April 2018, conducts a photometric survey of 90% of the sky during its two-year nominal operations (Ricker et al. 2014). It will search for extrasolar planets that mostly orbit M-type stars. The TESS fields cover 26 sky sectors that each cover four 24° × 24° areas from the galactic pole to nearly the ecliptic plane. Each field of view remains unchanged for 27 continuous days. The satellite will specifically observe no fewer than 200,000 main-sequence dwarf stars, 30 to 100 times brighter (with an apparent magnitude lower than ∼10, Stassun et al. 2018) than those observed by the Kepler satellite. All these conditions are suitable for seismic detections in solar-like stars, mostly in high-luminosity main-sequence (MS) and subgiant stars (a detailed study of the potential asteroseismic yields of the TESS mission is given by Campante et al. 2016). In addition, more than 400 million stars will be observed in the full-frame images with a 30-minute observational cadence.
The first step for the large asteroseismic survey analysis is to distinguish solar-like pulsators from all other pulsating stars. An accurate stellar classification can be computationally expensive, but efforts have been made to classify CoRoT and Kepler targets (Debosscher et al. 2009; Molnár et al. 2018). For example, Mathur et al. (2016b) showed three years after the end of the Kepler main mission that more than 800 red giants (RGs) (corresponding to about 3% of the total number of observed RGs) were still misclassified as cool dwarfs (see also Hon et al. 2019). However, no public real-time automatic algorithm was developed to classify stars that were observed during these missions. In view of the huge amount of data to be delivered by TESS, it would be advantageous to have an automatic method to classify solar-like stars, and even other pulsator types. FliPer is a method for estimating surface gravities (from 0.3 to 4.5 dex) or ν max of solar-type stars (Bugnet et al. 2017, 2018a). It relies on the use of the global amount of power contained in the power spectrum density (PSD) of a solar-type pulsator, which depends on its evolutionary state (Mathur et al. 2011; Kallinger et al. 2016). The method is automatic, and takes advantage of a random forest machine-learning (ML) regressor (Breiman 2001) to estimate precise surface gravities. The algorithm is trained to learn how to predict log g from thousands of precise seismic estimates made with the A2Z seismic pipeline (Mathur et al. 2010). In this way, FliPer gives estimates with a precision that is better than can be obtained from spectroscopy alone.
Machine-learning methods such as neural networks (e.g. Bai et al. 2005), algorithms based on decision trees (e.g. Pérez-Ortiz et al. 2017; Veljanoski et al. 2018), or AdaBoost (e.g. Viquar et al. 2018) already give good results for characterising stars. For instance, Hon et al. (2018) showed that they were able to distinguish core helium-burning clump stars from hydrogen shell-burning RG stars using a convolutional neural network. In our study, we use FliPer parameters to identify solar-like pulsators among all pulsating stars: instead of using a regressor (see Bugnet et al. 2018a) to estimate physical parameters, we use a classifier algorithm trained with the FliPer parameters and the effective temperature of each star. After describing the data in Section 2, we explain in Section 3 how the FliPer Class algorithm uses FliPer parameters (F p,i ) along with the effective temperature to distinguish between the different pulsator types. Then we present results from the classification of TESS-simulated data and of a known sample of Kepler main mission data.

Data preparation
In order to test the algorithms, the T'DA working group simulated datasets of TESS observations (Lund et al. 2017). We used 10,812 simulated stars that can be studied with a stellar signal alone (designated as "clean" data), with additional white noise ("noisy" data), or with both additional white noise and instrumental systematics ("sysnoisy" data). Because systematics can be corrected (using methods similar to those applied to the K2 data, Aigrain et al. 2016), we chose to focus our study on the "noisy" dataset. The sample is described in Tab. 2. Part of the γ-Doradus sample consists of γ-Doradus and δ-Scuti hybrid stars.
To determine the reliability of our method on real data, we also used power spectrum densities of a sample of 1,442 Kepler targets observed in the long-cadence observation mode (corresponding to an acquisition every 30 min) for which we know the classification. Tab. 2 displays the number of stars in the Kepler sample belonging to each classification (Reed et al. 2018; McNamara et al. 2012; Li et al. 2019; Balona et al. 2011; Balona 2013; Sachkov 2014; Smalley et al. 2015; Serenelli et al. 2017). Long-period variability stars are not represented in the Kepler sample because they can be easily classified by using the effective temperature alone (the FliPer Class is not required for these stars). The Kepler light curves (calibrated following García et al. 2011) considered in this work were observed for approximately four years. This results in a much higher frequency resolution in the PSD than what is expected for most TESS targets, which are

FliPer Class: a tool for classifying pulsating stars
FliPer (Bugnet et al. 2018a) is a method for estimating the surface gravity of solar-like pulsating stars based on the measure of the amount of power in their PSD. For solar-type pulsators, the PSD is dominated by the power of the convective background, stellar oscillation modes, and the rotation period signals. All these effects vary when the star evolves from the MS to the red giant branch (RGB). FliPer thus gives constraints on the evolutionary stage of the solar-like pulsator. We define the FliPer metric as
$$F_p = \overline{\mathrm{PSD}} - P_n,$$
where $\overline{\mathrm{PSD}}$ represents the averaged value of the PSD from a given frequency to the Nyquist frequency, and $P_n$ is the photon noise (see Bugnet et al. 2017, for more information).

FliPer parameters: F p,i
For each star we calculated different FliPer parameters, F p,i , as the FliPer metric starting from different lower frequency boundaries (i ∈ [0.7, 7, 20, 50] µHz) in the calculation of PSD. The four different frequency domains used for the F p,i calculation are represented by the coloured area in Fig. 1. By combining these different F p,i , we extracted information from different regions of the PSD of the star. A previous study (see Bugnet et al. 2017) indicated that the two F p,0.7 and F p,7 parameters are easily dominated by rotation peaks for MS stars, but are perfectly suitable to take the power of the modes for high-luminosity giants into account. The other parameters, F p,20 and F p,50 , allow precise estimates for MS stars but they do not take the mode power in high-luminosity RGs into account. FliPer gives great results when MS stars are distinguished from RGs by estimating their surface gravity, as discussed in Bugnet et al. (2018a). By combining the different F p,i for all stellar types, we attempted to classify not only solar-like stars, but all pulsator types.
Fig. 1. PSD of six different simulated stars belonging to different classes (solar-like, β-Cephei, γ-Dor, RRLyrae, LPV, and Cepheid) as described by the y-axis labels of each panel. Coloured areas (red, grey, orange, and green) represent the different ranges of frequency used for the F p,i calculation (from 0.7, 7, 20, and 50 µHz, respectively, to the Nyquist frequency). Coloured circles represent the class identifiers used in Fig. 2.
Each pulsator type has a typical amount of power associated for a given frequency range in the PSD. Figure 1 shows the TESS-simulated PSD for six different pulsator types. First, we observe that each type of star presents a characteristic signature in the PSD.
By calculating F p,0.7 (red areas on Fig. 1), it is easy to distinguish a solar-like star from a long-period variable (LPV) because their granulation power differs by several orders of magnitude. However, it is harder to distinguish a Cepheid from a RRLyrae using only F p,0.7 because they both present a PSD background with the same order of magnitude. With a higher frequency boundary such as 50 µHz for the F p,i calculation, we can distinguish a Cepheid well from a RRLyrae. However, by simultaneously using the different F p,i , it is possible to distinguish the different types of stellar pulsators.
As previously discussed, the type of variability shown by the star affects the range of values that it can have for each F p,i . Figure 2 represents the total sample of TESS-simulated data in the log(T eff ) versus F p,i diagram for i = 0.7 µHz (left panel) and i = 20 µHz (right panel). In addition, the stars shown in Fig. 1 are represented in the diagrams with stars with the same colour code as in Fig. 2. We also represent with a star the first planet-host star observed by TESS, in the π-Mensae system (Huang et al. 2018; Gandolfi et al. 2018). This star is properly classified as a solar-like pulsator based on its FliPer values, as shown in Fig. 2.
Using the TESS simulated dataset, we note that each type of star covers a given region of the T eff versus F p,i diagrams. This means for instance that using only one F p,i , solar-like pulsators are already well separated from Cepheids and RRLyrae. However, we extended the analysis of Fig. 1 and show with Fig. 2 that using only one F p,i does not allow us to clearly distinguish between Cepheids and RRLyrae. In addition, we observe by comparing the two panels of Fig. 2 that the area corresponding to a given type of star changes when a different F p,i is used: each pulsator type evolves differently in the diagram when we modify the starting frequency of the F p,i∈ [0.7,7,20,50] calculation (when we switch from the left to the right panel in Fig. 2). We therefore expect to be able to separate RRLyrae from Cepheids by comparing their different F p,i (i ∈ [0.7, 7, 20, 50] µHz).
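The F p,i computation described in this section can be illustrated with a short Python sketch. It is not the authors' released code: the function names, the toy PSD, and the choice of estimating the photon noise as the mean power over the last 100 bins before the Nyquist frequency are assumptions made for illustration only.

```python
import numpy as np

# Lower frequency boundaries (in µHz) used for the four FliPer parameters.
LOWER_BOUNDS_UHZ = (0.7, 7.0, 20.0, 50.0)


def estimate_photon_noise(psd, n_tail=100):
    """Hypothetical noise estimate: mean power in the last n_tail bins.

    The number of bins is an assumption of this sketch, not a value
    taken from the paper.
    """
    return float(np.mean(np.asarray(psd, dtype=float)[-n_tail:]))


def fliper_parameters(freq_uhz, psd, photon_noise=None):
    """Return {lower bound: F_p} with F_p = mean(PSD above bound) - P_n."""
    freq = np.asarray(freq_uhz, dtype=float)
    power = np.asarray(psd, dtype=float)
    if photon_noise is None:
        photon_noise = estimate_photon_noise(power)
    return {
        low: float(np.mean(power[freq >= low]) - photon_noise)
        for low in LOWER_BOUNDS_UHZ
    }


# Toy PSD: flat noise level plus a low-frequency background component,
# sampled up to the Nyquist frequency of a 30-minute cadence (~277.8 µHz).
freq = np.linspace(0.1, 277.8, 5000)
psd = 10.0 + 1000.0 / (1.0 + (freq / 5.0) ** 2)
fp = fliper_parameters(freq, psd)
```

Because the toy PSD decreases with frequency, the four values decrease as the lower boundary increases; a real star with strong power at intermediate frequencies would show a different pattern, which is what the classifier exploits.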

FliPer Class classification algorithm
In the previous section we explained that stars can be manually classified according to their F p,i∈[0.7,7,20,50] . In view of the amount of TESS data to be released, the classification of each individual pulsator has to be automatic. A random forest classifier (Breiman 2001) is a supervised machine-learning (ML) algorithm that classifies data from a given set of input parameters (see Appendix A for more details about the classifier). Random forest algorithms have been proven to be efficient in distinguishing between MS stars and RGs (Bugnet et al. 2018a) when F p,i (i ∈ [0.7, 7, 20, 50] µHz) is used as input parameters.
We classified the pulsators using the "RandomForestClassifier" function from the "sklearn.ensemble" Python library (Pedregosa et al. 2011). We split the simulated dataset into two random samples. The training set contains 80% of the total number of stars, while the test sample contains the remaining 20% of the stars. The same method was applied to the Kepler set. The supervised classifier FliPer Class was trained on the training dataset to learn how to predict the output classification using F p,i (i ∈ [0.7, 7, 20, 50] µHz) and T eff as input parameters for each star. The maximum number of features considered at each split point is p = √m, where m = 5 is the number of input parameters (see Appendix A for details about the classifier). The previously trained algorithm, along with the code to use it, can be downloaded from GitHub. Each parameter has a different effect on the training process, which is represented in Fig. 3 for the TESS-simulated dataset. Feature importance is the number of times a feature is used to split a node normalised by the total number of nodes. Uncertainties are calculated by taking the standard deviation of each feature importance from the individual trees. The effective temperature has the highest weight in classifying the type of stars. However, the importance of the F p,i parameters shows that all input parameters are useful. This shows that F p,i parameters, coupled with T eff , are suitable parameters for classifying stars. Similar results are obtained when the Kepler training sample is used.
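The training procedure can be sketched as follows with scikit-learn. This is a minimal illustration, not the released FliPer Class code: the three classes and their synthetic feature values are invented stand-ins, the random seeds are arbitrary, and `n_estimators=200` follows the number of trees quoted in the text. Note also that scikit-learn's `feature_importances_` attribute is impurity-based, which may differ from the split-count definition given above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): three hypothetical classes, each
# described by m = 5 features (four log F_p,i values and log T_eff).
n_per_class = 200
centres = np.array([
    [1.0, 0.8, 0.6, 0.4, 3.75],
    [4.0, 3.8, 3.5, 3.0, 3.55],
    [3.0, 3.0, 2.9, 2.8, 3.85],
])
X = np.vstack([c + 0.1 * rng.standard_normal((n_per_class, 5)) for c in centres])
y = np.repeat(np.arange(3), n_per_class)

# Random 80% / 20% split into training and test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_features="sqrt" considers about sqrt(m) features at each split.
clf = RandomForestClassifier(
    n_estimators=200, max_features="sqrt", oob_score=True, random_state=0
)
clf.fit(X_train, y_train)

oob_error = 1.0 - clf.oob_score_           # out-of-bag error estimate
test_accuracy = clf.score(X_test, y_test)  # accuracy on the held-out 20%

# Mean importance per feature and its spread over the individual trees.
importances = clf.feature_importances_
importance_std = np.std(
    [tree.feature_importances_ for tree in clf.estimators_], axis=0
)
```

With real data, `X` would hold the measured F p,i and T eff of each star and `y` the known pulsator classes.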

Classification of TESS-simulated data
We obtained an out-of-bag (OOB) error of the training on TESS-simulated data of about 0.011. This number gives estimates of the error rate of the classifier when n t = 200 trees are used by classifying a sub-sample of stars that were not used in the building of the last learner. The OOB error can be biased depending on the hyperparameters of the algorithm (number of trees (n t ), number of features considered at each split point (m), etc.; Mitchell 2011). This study thus provides another estimate of the classification accuracy using the TESS test sample to examine the performance of the trained algorithm.
The ∼2,000 stars that belong to the TESS test sample were automatically classified amongst the classes reported in Tab. 2 by FliPer Class trained on the training sample. The results are represented in Tab. 2: the numbers in each row represent for a given pulsator class the number of stars that were classified in each output class by FliPer Class . The higher the value on the diagonal (a high value corresponds to a dark-coloured cell), the more accurate the algorithm for the corresponding class. From this table, we first conclude that ∼98% of stars in the test set are well characterised by the algorithm.
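How a confusion matrix of this kind is read can be shown with a small NumPy sketch. The labels below are invented toy values, not results from the paper, and the helper name is hypothetical.

```python
import numpy as np


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when an index pair occurs repeatedly.
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix


# Toy labels for ten stars and three hypothetical classes (0, 1, 2).
true_labels = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
predicted = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 1])

cm = confusion_matrix(true_labels, predicted, n_classes=3)

# Overall accuracy: fraction of stars on the diagonal.
overall_accuracy = np.trace(cm) / cm.sum()

# Per-class accuracy: diagonal value divided by the row total.
per_class_accuracy = np.diag(cm) / cm.sum(axis=1)
```

Off-diagonal entries identify which classes are confused with each other, which is how the δ-Scuti and γ-Doradus confusion discussed in this section would appear.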
Then, considering specifically the misclassified stars, we show that most classification errors concern classical pulsators. In particular, δ-Scutis are classified as γ-Doradus, which can be explained by the fact that there are not enough δ-Scuti stars in the sample for the algorithm to learn how to recognise this type of star, and also because the γ-Doradus training sample contains some γ-Doradus and δ-Scuti hybrid stars. This misclassification problem should be solved with real TESS data as the training of the algorithm will be made on a larger set of stars that belong to each of the categories, including new hybrid categories, which will allow us to separate similar classes such as δ-Scuti and γ-Dor well. From studying the spectra, we note that most misclassified solar-like and roAp stars show nearly flat power spectra, except at very low frequency. It is already known from Bugnet et al. (2018a) that FliPer, and thus FliPer Class , is not efficient for this type of noise-dominated spectrum because it compares the global amount of power in the power spectrum with the power at high frequency (representing the photon noise).

Classification based on 27-day segments of real Kepler data
To estimate the accuracy of the method on real data, we trained (tested) the algorithm on 80% (20%) of the global set of Kepler data. As the number of some types of classical pulsators (such as sdBV, β-Cep, RoAp, and Cepheid stars) observed by the Kepler main mission is very small (see Tab. 1), it is too ambitious to train and test the algorithm to recognise all different types of stellar pulsators. To avoid misclassification due to the lack of stars in the Kepler catalogue, we chose to group several pulsators into categories dependent upon their position in the Hertzsprung-Russell diagram:
- δ-Scuti, RoAp, and sdBV stars have a low luminosity (10 L⊙ < L < 100 L⊙).
- β-Cep and SPB stars have a high luminosity (100 L⊙ < L < 100,000 L⊙) and high effective temperatures (4 < log10(T eff ) < 4.5).
- Cepheids and RRLyrae have a high luminosity (30 L⊙ < L < 100,000 L⊙) and low effective temperature (3.6 < log10(T eff ) < 3.9).
We then considered the five different classes reported in Tab. 3, which represents the confusion matrix for stars in the Kepler test set. As in Tab. 2, values in each row represent for a given class the number of stars classified in each output class by the FliPer Class . The accuracy of the classification of the Kepler test sample is approximately 99%. We point out that all solar-like stars are correctly classified by the algorithm (which we recall was our main goal). Most misclassifications concern classical pulsators, with a low corresponding number of stars in the training set, which means that the training was probably more difficult for these types of stars. This problem should be solved by training the algorithm on a much larger set of TESS observations.

Effect of the effective temperature for the classification
A distribution of input parameter importance very similar to that shown in Fig. 3 was obtained when we trained on the Kepler sample. The effective temperature thus seems to play a much larger role in the classification process than the different FliPer parameters. We decided to show classification results when the effective temperature was removed from the input parameters in order to explain to what extent the effective temperature is needed for the classification.
When the classifier is trained and tested on Kepler data and the effective temperature is not used as an input parameter, the classification only depends on the FliPer parameters. With this configuration, we obtain a 96% accuracy on the classification of the test set. Solar-like stars are still very well classified, and most errors concern the SPB/β-Cephei class (see Tab. 4). Indeed Fig. 1 shows that F p,i (i ∈ [0.7, 7, 20, 50] µHz) values for SPB/β-Cephei are quite similar to those of solar-like stars and of δ-Scuti, RoAp, and sdBV.
Table 3. Confusion matrix of the Kepler data test sample when F p,i (i ∈ [0.7, 7, 20, 50] µHz) and T eff are used as input parameters for each star. Values represent the number of stars, and italic numbers in parentheses represent the percentage accuracy for the class. The colour code is normalised for each row by the total number of stars in each true class. Numbers that do not belong to the diagonal represent classification errors by FliPer Class .
With this study, we point out that the FliPer parameters alone as input to the algorithm are enough to recognise all solar-like stars. However, adding physical parameters (such as T eff ) to the classifier allows FliPer Class to perform well for all pulsators, and also to avoid false detection of solar-like stars, as shown in Tab. 3.

Taking uncertainties on input parameters into account
In order to test the robustness of the classifier regarding uncertainties on the input parameters, we tested the algorithm on the Kepler dataset with modified input parameter values. Uncertainties on FliPer arise from the photon noise in the spectra (following a chi-squared distribution with two degrees of freedom). Hence, the uncertainty on the FliPer (see Bugnet et al. 2018a, for more details) parameters can be explicitly written as We used the central limit theorem and re-binned the spectrum by a factor of n = 50. The total amount of power in the spectrum is where P n,j follows a quasi-normal distribution with 2n degrees of freedom. It assumes that the signal does not change dramatically over this range of 50 bins, which is a strong assumption for classical pulsators. This leads to a global uncertainty on FliPer values of The effective temperature values for the Kepler set are taken directly from the  catalogue. As long as no spectroscopic follow-up surveys are available, only the effective temperature coming from the TIC will be available for most TESS data. Large uncertainties are expected because on average, δT eff ∼ 170 K, according to the first sector data. To be representative of future TESS data, we decided to use δT eff = 170 K instead of the uncertainties from the  catalogue for the whole Kepler test sample.
We then included the effect of these errors on the different parameters during the testing of the algorithm. We performed a Monte Carlo simulation by generating for each star in our test sample 100 artificial parameter sets from their corresponding normal distributions. We computed for each parameter X (F p,0.7 , F p,7 , F p,20 , F p,50 , and T eff ) 100 new values $X_i$ following
$$X_i = X + \delta X \, G_i,$$
with the $G_i$ being 100 random values following the standard normal distribution. Each new group of $X_i$ parameters describes a new star to test the algorithm.
We continued to train the algorithm using the original Kepler training set. The new test set now contained a hundred times more stars than the original test set in order to include the effect of uncertainties on the input parameters. We are able to classify these new stars with a 99% accuracy. We thus conclude that the uncertainties on the chosen input parameters do not affect the classification of stars, because the classes are well separated in the log(T eff ) versus log(F p,0.7 ) diagram (see Fig. 2). In particular, large uncertainties on T eff that are representative of future TIC effective temperatures do not perturb the pulsator classification.
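The Monte Carlo test described in this section can be sketched in Python as follows. The sketch assumes that each parameter is perturbed as X + δX·G with G drawn from the standard normal distribution; the function name, the two example stars, and the 5% uncertainty on the FliPer values are invented for illustration, while δT eff = 170 K follows the text.

```python
import numpy as np


def monte_carlo_test_set(X, delta_X, labels, n_draws=100, seed=0):
    """Build an enlarged test set of n_draws perturbed copies per star.

    X and delta_X have shape (n_stars, n_features); labels has shape
    (n_stars,). Each copy is X + delta_X * G with G ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    delta_X = np.asarray(delta_X, dtype=float)
    G = rng.standard_normal((n_draws,) + X.shape)
    perturbed = X[None, :, :] + delta_X[None, :, :] * G
    # Rows are ordered draw by draw, so the labels are tiled accordingly.
    X_new = perturbed.reshape(-1, X.shape[1])
    y_new = np.tile(np.asarray(labels), n_draws)
    return X_new, y_new


# Two hypothetical stars: columns are F_p,0.7, F_p,7, F_p,20, F_p,50, T_eff.
X_test = np.array([
    [100.0, 50.0, 20.0, 10.0, 5800.0],
    [1.0e4, 8.0e3, 5.0e3, 2.0e3, 4800.0],
])
delta = np.empty_like(X_test)
delta[:, :4] = 0.05 * X_test[:, :4]  # assumed 5% uncertainty on F_p,i
delta[:, 4] = 170.0                  # delta T_eff = 170 K, as in the text

X_mc, y_mc = monte_carlo_test_set(X_test, delta, labels=[0, 1])
```

The enlarged arrays `X_mc` and `y_mc` would then be passed to an already trained classifier, which is not retrained, and the resulting accuracy compared with that obtained on the unperturbed test set.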

Conclusion
The study on Kepler data confirms the results obtained by using TESS-simulated data. As expected, FliPer Class is a great method to recognise solar-like stars based on the shape of their PSD. Using F p,i (i ∈ [0.7,7,20,50] µHz) along with T eff as input parameters in a random forest algorithm, we classified more than 98% of TESS-simulated and almost all the Kepler solar-like pulsators within the test set amongst other pulsators. We plan to improve the F p,i calculation (especially for stars observed by TESS with a low signal-to-noise ratio) by empirically calibrating the photon noise as a function of the TESS magnitude of the star (similar to the study by Jenkins et al. 2010, for the Kepler data) instead of measuring the power at high frequencies, which can be biased by astrophysical signal. By comparing the results on noisy data with previous results obtained using clean simulated data (Bugnet et al. 2018b), we note that the performance of FliPer Class is only slightly diminished by photometric noise. This is also auspicious for the applicability of the method to real TESS data. This study will help the massive seismic analysis of TESS solar-like stars with global seismic pipelines by providing a list of stars that are predicted to be solar-like stars.
FliPer Class gives a high weight to seismology through the use of the F p,i parameters. We chose not to incorporate any Gaia parameters in the FliPer Class to remain as general as possible. For example, for faint stars such as the Kepler RGs at the deep end of the Milky Way (Mathur et al. 2016a) or for polluted systems, Gaia luminosities could have large uncertainties or might even be biased. Hence, seismic parameters coupled to effective temperature could be a better choice, as shown by Huber et al. (2017). Therefore, the FliPer Class as defined here could be complemented by any additional precise astrometric, photometric, or spectroscopic parameters, which could then be applied to any observations from Kepler, K2, or TESS missions.
FliPer parameters are integrated as features in the TASOC/T'DA random forest classifier that will be used to automatically classify all TESS targets. This enlarged random forest is itself part of a larger classifier that includes convolutional neural networks (Hon et al. 2018), clustering, etc. The pipeline (Tkatchenko et al., in prep) is currently being built to be efficient in classifying all types of pulsators, and should demonstrate a high level of performance even for stars with complicated pulsation patterns.