Deciphering stellar chorus: apollinaire, a Python 3 module for Bayesian peakbagging in helio- and asteroseismology

Since the asteroseismic revolution, availability of efficient and reliable methods to extract stellar-oscillation mode parameters has been one of the keystone of modern stellar physics. In the helio- and asteroseismology fields, these methods are usually referred as peakbagging. We introduce in this paper the apollinaire module, a new Python 3 open-source Markov Chains Monte Carlo (MCMC) framework dedicated to peakbagging. The theoretical framework necessary to understand MCMC peakbagging methods for disk-integrated helio- and asteroseismic observations are extensively described. In particular, we present the models that are used to estimate the posterior probability function in a peakbagging framework. A description of the apollinaire module is then provided. We explain how the module enables stellar background, p-mode global pattern and individual-mode parameters extraction. By taking into account instrumental specificities, stellar inclination angle, rotational splittings, and asymmetries, the module allows fitting a large variety of p-mode models suited for solar as well as stellar data analysis with different instruments. After having been validated through a Monte Carlo fitting trial on synthetic data, the module is benchmarked by comparing its outputs with results obtained with other peakbagging codes. An analysis of the PSD of 89 one-year subseries of GOLF observations is performed. Six stars are also selected from the Kepler LEGACY sample in order to demonstrate the code abilities on asteroseismic data. The parameters we extract with apollinaire are in good agreement with those presented in the literature and demonstrate the precision and reliability of the module.


Introduction
Models implemented in stellar evolution codes require strong constraints in order to accurately infer stellar ages (Appourchaux et al. 2012;Davies et al. 2016;Silva Aguirre et al. 2017;Tayar et al. 2020). When available, stellar seismic parameters such as pressure-mode (p-mode) frequencies provide excellent constraints on the inputs required for stellar modelling. Reliable tools to infer those parameters from observations are therefore necessary.
The theory behind p modes has been extensively described (e.g. Unno et al. 1989;Christensen-Dalsgaard 2008, and references therein) while solar-like oscillations have been abundantly observed in the Sun with helioseismic facilities such as the Global Oscillations at Low Frequency instrument (GOLF, Gabriel et al. 1995), the Variability of solar IRradiance and Gravity Oscillations instrument (VIRGO, Fröhlich et al. 1995), the Solar Oscillations Investigation's Michelson Doppler Imager instrument (SOI/MDI, Scherrer et al. 1995), the Helioseismic Magnetic Imager instrument (HMI, Scherrer et al. 2012), the Global Oscillations Network Group (GONG, Harvey et al. 1996), the Birmingham Solar Oscillations Network (BiSON, Chaplin et al. 1996), and the solar counterpart of the Stellar Observations Network Group (Solar-SONG, Pallé et al. 2013;Fredslund Andersen et al. 2019;Breton et al. 2022), as well in main sequence solar-like stars (e.g. Appourchaux et al. 2014;, subgiants (e.g. , and red giants, from the red giant branch to the clump (e.g. Beck et al. 2011;Mosser et al. 2011;Bedding et al. 2011). Following space missions such as the Microvariability and Oscillations of STars mission (MOST, Matthews et al. 2000), the Convection, Rotation, and planetary Transit satellite (CoRoT, Auvergne et al. 2009) and especially Kepler/K2 (Borucki et al. 2011;Howell et al. 2014) 1 , the golden years of solar-like-star asteroseismology are not over yet, with the Transiting Exoplanet Survey Satellite (TESS, Ricker et al. 2015) currently taking place and the launch of the PLAnetary Transits and Oscillations of stars mission (PLATO, Rauer et al. 2014) on the horizon in 2026. Ground-based stellar observations provided by networks such as SONG (Grundahl et al. 2007(Grundahl et al. , 2017 also provide asteroseismic data that will benefit from our analysis tools. Helio-and asteroseismic parameter fitting (usually referred to as peakbagging) is a topic that has been extensively discussed in the literature over recent decades. The parameter inference may follow a frequentist approach through the use of maximumlikelihood estimator (MLE; see e.g. Toutain & Appourchaux 1994;Appourchaux et al. 1998) methods or a Bayesian approach, with for example implementation of maximum a posteriori algorithms (MAP; see e.g. Gaulme et al. 2009), Monte Carlo Markov Chains methods (MCMC, see e.g. Benomar et al. 2009;Gruberbauer et al. 2009;Handberg & Campante 2011;Deheuvels et al. 2015;Davies et al. 2016;Nielsen et al. 2021), or nested Monte Carlo (see e.g. Corsaro & De Ridder 2014;Corsaro et al. 2020).
The apollinaire 2 package is designed to provide a readyto-use, consistent, and flexible open-source MCMC peakbagging framework for solar and stellar time series. It has already been used in several publications (Hill et al. 2021;Breton et al. 2022;Huber et al. 2022;Mathur et al. 2022;Smith et al. 2022). The code is fully written in Python 3. Available solar time series being unequaled in quality and length, their specificities were fully taken into account during the development phase of the module. The difficulties of stellar peakbagging were also exhaustively considered in order to provide an automated process for p-mode parameter extraction. Therefore, apollinaire is able to perform fits globally, by order ( = {0, 1, 2, 3}), by pair ( = {0, 2} or {1, 3}), or for a single mode, and to extract parameters such as mode splittings, stellar inclination angles, and the choice of symmetric or asymmetric Lorentzian profiles (see e.g. Nigam & Kosovichev 1998;Korzennik 2005). Similar automated frameworks with the Interactive Data Language (IDL) or Python interface have already been developed in the past few years (see e.g. Fast and AutoMated pEak bagging with DIAMONDS (FAMED) from Corsaro et al. 2020, or PBJam from Nielsen et al. 2021. With the upcoming PLATO mission at the end of the decade, the existence of a wide diversity of peakbagging open source modules will be a considerable asset for the asteroseismic community, as it will allow non-expert peakbaggers to easily access ready-to-use frameworks for p-mode parameters extraction.
The layout of this paper is as follows. Section 2 presents the principles of parameter fitting in a Bayesian framework and extensively describes the set of models that are implemented in apollinaire. Section 3 provides a detailed presentation of how these models are used in the different steps of the apollinaire framework. Section 4 presents an extended benchmark of the module, with comparison of published results from helioseimic and asteroseismic data. Usual fitting strategies are discussed in Sect. 5 while conclusions and perspectives for improvement of the current apollinaire releases are provided in Sect. 6.

Model spectrum
In this paper, we exclusively focus on peakbagging methods for full-disk-integrated time series. The single-sided power spectral density (PSD) is taken as the squared modulus of the Fourier transform of a given time series with the following calibration (e.g. Press et al. 1992;García 2015) verifying the Parseval theorem: where ν N is the Nyquist frequency of the spectrum and σ is the rms value of the temporal signal. The goal of the peakbagging process is to extract stellar background parameters and global and individual oscillation mode parameters from the PSD. After briefly describing the principle of mode fitting in a Bayesian framework, we present and extensively describe the background and p-mode models implemented in apollinaire in the following subsections.

Statistics
The PSD follows a χ 2 distribution with two degrees of freedom (Woodard 1984). The likelihood of an ideal spectrum S parametrised by a set of parameters θ and considered against an observed spectrum S obs at a given set of k frequency bins ν i is: This expression of the likelihood assumes that all frequency bins are independent, an assumption that is theoretically fulfilled only for uninterrupted, evenly sampled observations. However, in the case of high-duty-cycle time series, the independence assumption can be made without introducing any bias in the parameter estimation (Stahn & Gizon 2008;Davies et al. 2016). The cases of significant gaps is discussed in Sect. 2.7.
The goal of a Bayesian approach is to sample the posterior probability p(θ|S obs ) defined as: where p(S obs |θ) is the likelihood L, p(θ) is the prior probability, and p(S obs ) is a normalisation factor. The prior probability forms the core of the Bayesian approach and contains the information we have before confronting the model to the data. In practice, the function that will be sampled is a normalised measure of the posterior distribution, L(S obs , θ)p(θ). The strength of this approach is that it allows the fitter to evaluate the shape of the probability distribution of the model parameters. Moreover, parameter uncertainties can be extracted directly from it, while the MLE approach only provides a lower bound on the uncertainty, obtained through a Hessian inversion (for more details about the Hessian matrix see Toutain & Appourchaux 1994).

Background model
The power distribution in the PSD can be separated between the p-mode contribution and a stellar background (see e.g. Mathur et al. 2010;Kallinger et al. 2014). The global spectrum can then be modelled in the following way: with B the background contribution and P the p-mode contribution. The background is dominated at high frequency by a photon noise term P n . At low frequency, the effects of stellar activity and surface rotation are visible together with long-period instrumental variations (see e.g. García 2015). In the presence of stellar activity alone, these low-frequency regions can be modelled through a power law (Mathur et al. 2010). However, it is difficult to define a general functional profile for both rotational modulations and instrumental effects, and these contributions are not taken into account in apollinaire. Therefore, when power excesses due to such effects are identified in the PSD, the minimal frequency chosen for the background analysis should be set sufficiently high to prevent their contribution from biasing the fitted profile. In the case of Kepler, the typical period of the instrumental effects is ∼40-45 days (see e.g. Santos et al. 2019;Breton et al. 2021) which corresponds to frequency regions below ∼0.3 µHz. The location of the rotational modulations in the PSD depends of course on the surface rotation period of the considered stars, but also on the power distribution between the different harmonics of the signal. The fastest main sequence solar-like stars with detected surface rotation still exhibit a power contribution at a few tens of µHz. Nevertheless, most of the stars are rotating slowly enough for their rotational power contribution to only influence the shape of the PSD below a few µHz. Concerning the Kepler nominal survey, stars exhibiting photometric rotational modulations can be identified with the Santos et al. (2019Santos et al. ( , 2021 reference catalogues. Surface convection is the main process shaping the profile of the remaining components of the background. Harvey (1985) suggested that these trends could be described through empirical laws (referred to as Harvey models or sometimes super-Lorentzians, a nomenclature discussion on the term is provided by Kallinger et al. 2014) of the following form: where A is the reduced amplitude component, ν c the characteristic frequency, and γ is an exponent that can be linked to the amount of memory in the physical process described by the Harvey model (see e.g. García & Ballot 2019).
There is no clear consensus in the community on the best number of Harvey models to consider in order to optimally fit the background. Mathur et al. (2010) combined one Harvey model and a power law in their model, while Kallinger et al. (2014) decided to fit the data with two Harvey models after using a Bayesian framework to compare six possible models.
For the sake of generality, the considered limit background is then taken as the sum of k Harvey models, a power law, and a white noise term: where a and b are the power-law parameters. In the cases of long cadence observations from Kepler or TESS, it can be necessary to include a damping factor η to take into account the power loss in the components of the signals close to the Nyquist frequency (Chaplin et al. 2011;Kallinger et al. 2014): The photon noise is not affected by this effect and Eq. 6 therefore becomes

The Lorentzian model
In asteroseismology, p-mode profiles are usually described with symmetric Lorentzian profiles. However, previous helioseismic observations provide evidence that the p-mode spectral profile was actually asymmetric (Duvall et al. 1993;Toutain et al. 1998). Two solutions were suggested to model p-mode asymmetric Lorentzian profiles. The first one was proposed by Nigam & Kosovichev (1998): while the second possibility was put forward by Korzennik (2005): In the two previous equations, H is the height of the Lorentzian, α is the asymmetry parameter, and x is the reduced frequency, defined as where ν is the frequency, ν 0 the Lorentzian central frequency, and Γ the Lorentzian full width at half maximum (FWHM). If α = 0, the modelled profile is a standard symmetric Lorentzian. It is finally important to note that H and Γ can be related to the mode amplitude A through (Fletcher et al. 2006;Chaplin et al. 2008;:

Low resolution and sinc model
In the PSD, the ideal mode profile is in reality convolved by the Fourier transform of the observational window. In case of continuous observations of length T obs , this Fourier transform has the shape of a sinc function. There is no significant bias in the observed height H (or amplitude A) and width Γ of the mode in the case T obs 1/Γ. However, as illustrated in Fig. 1, when we represent the result of the convolution of the ideal signal by the Fourier transform of the observational window for different ΓT obs values, the effect of the convolution appears as soon as ΓT obs is of the order of several times unity, and the sinc function profile dominates the Lorentzian profile when ΓT obs < 1. Therefore, if the considered modes have a long lifetime with regards to the duration of the observation, the Lorentzian nature of their profile may not appear clearly because of insufficient resolution (frequency bins) to properly characterise the profile. In this case, it is more adequate to model the p-mode peaks with squared sinc functions instead of Lorentzians:

Mode description
Under the effect of slow rotation (Ledoux 1951), an acoustic mode M of order n and given degree is described as a multiplet of 2 + 1 components (modelled with symmetric Lorentzian profiles, asymmetric Lorentzian profiles, or sinc profiles as explained above) according to the following equation: M n, (ν) = m=− L(ν, ν n, + ms n, , Γ n, , r ,m H n, , α) , with s n, being the mode splittings, and r ,m the m-height ratio (with m= m=− r ,m = 1). The r ,m term is, in principle, purely geometric and depends on the stellar inclination angle i (Dziembowski 1977; Toutain & Gouttebroze 1993;Gizon & Solanki 2003;Ballot et al. 2006). Typical r ,m values considered for different instruments are given in Table 3.

Mode visibilities
It is often useful to be able to define the mode-height ratio between close-frequency modes. This can be achieved by computing the mode visibility, V , which depends on the stellar limb darkening and links the mode surface amplitude to the diskintegrated mode amplitude.
where P is the th-order Legendre polynomial and w is a weighting function. Assuming energy equipartition at close frequencies, the height ratio can be computed as the visibility ratio V /V 0 . For some instruments, the dependence is more complex as the instrument response does not depend only on µ. Hence, the relations given by Eq. 15 cannot be directly used (for an example with GOLF, see García et al. 1999;Salabert et al. 2011b). The values for V /V 0 used in apollinaire are given in Table 2.

Modelling the modes for time-series with observational gaps
The apollinaire package is designed to deal with time series with large temporal gaps and was used for this purpose in Breton et al. (2022). The effect of the presence of gaps in the observations is the convolution of the Fourier transform of the ideal time series by the Fourier transform of the observational window. One of the consequences of this convolution is that the hypothesis on the frequency-bin independence is no longer valid (e.g. Gabriel 1994). However, the form of the likelihood that takes this effect into account is computationally challenging and not well suited for an optimised implementation.
The historically considered solution is to ignore the independence loss  and to consider the likelihood given in Eq. 2. To go further, the model can be corrected to take into account the side lobes generated by the window convolution in the PSD (e.g. Salabert et al. 2002Salabert et al. , 2004. We use such an approach in apollinaire to modify the model when fitting PSD from time series with low duty cycle. In order to approximate the power redistribution of one peak in the model, we define the observation window f of the time series. The value of f is 1 at the time-stamps where data were acquired, and 0 otherwise. The Fourier transformf of f is then computed. The frequency shift ν i (relative to the zero-frequency peak of |f | 2 ) and the amplitudes a i of the k peaks above a given threshold in |f | 2 are then stored. The amplitudes are normalised to verify k i=0 a i = 1 .
The mode description given in Eq. 14 is then replaced by: L(ν, ν n, + ms n, + ν i , Γ n, , a i r ,m H n, , α) . (17) The side lobe power redistribution of the mode is illustrated in Fig. 2, where we consider an observational window with a regular observation cycle of 720 minutes with observations followed by 720 minutes without observations. We consider a = 1, 3 pair with i = 90 o and ν s = 0.4 µHz and we represent the model with 100% duty cycle for comparison. The power redistribution of the modes appears clearly when the formula from Eq. 17 is applied to compute the mode profiles, and we see that the modified model -approximating |f | 2 as the sum of k + 1 Dirac functions-is extremely close to the profile we obtain when we actually perform the convolution operation between the PSD and the | f | 2 .

Description of the framework
The apollinaire package is designed to perform MCMC samplings for each step of the peakbagging procedure: background fit, asymptotic parameters fit to determine the global modes pattern, and extraction of individual mode parameters. Those three operations are performed one after another when using the Frequency (µHz) Eq. 14 PSD * |f| 2 Eq. 17 stellar_framework function but can also be performed independently. The global flowchart of an analysis performed with the stellar_framework function is represented in Fig. 3.

Sampling the posterior probability with MCMC
The MCMCs are implemented with the Python package emcee (Foreman-Mackey et al. 2013). The sampling strategy follows Goodman & Weare (2010). The sampler, which can be seen as an improvement of the single site Metropolis scheme (Sokal 1997;Liu 2009), is designed as an affine invariant ensemble of walkers. Walker positions are updated one after another by the algorithm. The proposal for the new position of a given walker is created by taking into account the positions of other walkers. Acceptance or rejection of each move is assessed through the Metropolis-Hastings rule (Metropolis et al. 1953;Hastings 1970). In every part of the framework (background, global mode pattern, individual mode parameters), the user is free to choose the number of walkers and the number of steps to perform in order to sample the distribution, as well as the number of steps discarded as part of the burn-in phase.
The results yielded by a Bayesian approach strongly rely on the choice of prior functions made for the different parameters. At the same time, the prior function reflects the level of knowledge that the Bayesian fitters already possesses concerning the model for which they want to sample the posterior distribution. From this point of view, the choice of the prior functions to use in order to sample the posterior is necessarily somewhat arbitrary. Discussing considerations over the expected mode profile, several possibilities have been suggested in the Bayesian peakbagging literature. Following Benomar et al. (2012), Davies et al. (2016) considered for example a smoothness condition for the mode frequencies while using uniform priors for every other parameters. On the contrary, Lund et al. (2017) considered uniform priors for frequencies and inclination angle while using modified Jeffrey's priors (Handberg & Campante 2011) for amplitudes and widths. Corsaro et al. (2020) adopted uniform priors for all free parameters, justifying this choice by the fast computation that it allowed in their framework. In apollinaire, prior functions are taken to be uniform distributions within two bounds with the exception of the inclination angle, i, the heights H (or amplitudes A), and the widths Γ. It should be underlined that the uniform priors confine the sampled distribution to the compact support defined by the bounds and represent in this sense a strong constraint on the posterior. However, this can be justified by the fact that it is for example reasonable to suppose that, for a given mode, the actual mode frequency cannot be located outside a small frequency window around the mode power excess.
Most of the usual priors defined on a compact support can actually be linked to a uniform prior through a change of variable. For H (or A) and Γ, we therefore consider uniform prior distributions in log H (or log A) and log Γ. This is equivalent to constraining H (or A) and Γ with Jeffrey's priors. This way, in the limit of the fixed boundaries, the prior does not contain information on the parameter scaling. For i, the prior distribution is p(i) = sin i in order to fulfill the a priori isotropic distribution of stellar rotation axes (García & Ballot 2019).
In the stellar_framework function, the automatically generated priors have been set in order to cover a sufficiently wide range to ensure that the sampled distribution is not biased by boundary effects. Additional difficulties that may arise in the specific case of frequencies are discussed in Sect. 3.5.
When a MCMC is sampled, the code automatically extracts summary statistics from it. For a given parameter, it returns the median of the marginalised sampled distribution, y. In order to obtain the uncertainties σ − and σ + over y, 16th and 84th percentiles y 16 and y 84 are also extracted. In the case of a Gaussian distribution, σ − = σ + = σ with σ being the standard deviation of the distribution. Some apollinaire output files (see Appendix D and online documentation for more details) only provide a σ sym over y: In this case, if it is the natural logarithm of the parameters that has been sampled, median, 16th, and 84th percentiles are transformed again before computing and returning σ sym .

Inputs and outputs
The guess and priors for background and global pattern fits are automatically generated by apollinaire. The user can also manually provide guess and priors if needed. The inputs for individual mode-parameter extractions are more complex, and for them, apollinaire uses text files with a specific syntax: the a2z files, which were originally developed as part of the A2Z pipeline (Mathur et al. 2010). The syntax of the file is dedicated to providing a simple and straightforward way to specify the nature, extent, initial value, and bounds for each parameter to fit. These files will be auto-generated by the stellar_framework function but can also be manually created in order to directly use the peakbagging function. They can be read as a pandas 3 DataFrame through the auxiliary function read_a2z.
The chains are stored as Hierarchical Data Format version 5 (hdf5) files. The code provides functions to read these files for the user that would need to perform a more thorough analysis on the chains than the extraction of summary statistics described in Sect. 3.1. Using these files, it is for example straightforward to obtain the marginalised distribution for each parameter or the covariance matrix between two parameters. Corner plots visually summarising the sampled distributions can be saved as pdf or png files if filenames are specified. They represent both the marginalised distribution for each parameter and the covariance distribution for each pair of parameters.
Parameters fitted by the perform_mle_background, perform_mle_pattern, explore_distribution_background, and explore_distribution_pattern functions are returned as numpy arrays. When these functions are called by the stellar_framework functions, the results they provide are stored in text files with dedicated headers. For convenience, the peakbagging function returns both an a2z DataFrame, which can be saved to an a2z file with the auxiliary function save_a2z, and a so-called pkb 4 array, which can be saved with the save_a2z (when the function is called by stellar_framework, this is done automatically). The returned a2z DataFrame is useful for performing new MCMC samplings with modified input values while the pkb array provides a mode-by-mode summary statistics and allows the user to simply reconstruct the best-fit p-mode model computed by apollinaire. More details about a2z files, a2z DataFrame, pkb files, and pkb arrays can be found in Appendix D, and example a2z and pkb files are also provided.

Background fit
Besides the PSD, the only additional inputs needed by apollinaire to automatically compute initial guesses and priors for the background fit are the stellar effective temperature, T eff , the asymptotic large spacing ∆ν, and the frequency at maximum power ν max . If no previous estimations of ∆ν and ν max are available, it is possible to provide the code with the stellar mass, M, and radius, R, in order to compute an estimate of ∆ν and ν max through the scaling laws where ∆ν , M , R , and T eff, are the reference solar values for the asymptotic large spacing, mass, radius, and effective temperature, respectively. These values are set to 135 µHz, 1 M , 1 R , and 5770 K in apollinaire . The limit spectrum fitted on the data is the sum of the background term given by Eq. 6 and a Gaussian p-mode envelope term: with H max being the maximal height of the p-mode Gaussian envelope and W gauss its width or standard deviation.
When only one Harvey model is considered, the initial γ value is fixed to 2; it is fixed to 4 when two Harvey models are considered. This parameter can be fixed (Kallinger et al. 2014) or set to vary. In order to fit more than two Harvey models, the user should manually provide the guesses. Otherwise, input profile guesses are automatically generated. If the spectrum has been acquired with photometric observations, the values provided by Table 2 of Kallinger et al. (2014) are used. If the code has to deal with a PSD obtained from a solar radial-velocity time series, initial guess values have also been implemented using GOLF as a reference.
In order to obtain significant gains in computing time, it is possible to resample the PSD to a version with many fewer data points. The resampling techniques are described in Appendix A. As shown in Sect. 4.3, this method can be reliably used in order to extract mode frequencies but the main caveat is that the background parameters obtained by fitting the resampled PSD should be considered with caution. In particular, the resampling will filter out the power in the p-mode region, and H max , ν max , and W env obtained with this method will be significantly biased.

Limit p-mode spectrum
To fit the p modes, apollinaire divides the observed spectrum by the fitted background profile B in order to get the signal-tonoise-ratio (S/N) spectrum: This way, the background contribution fitted in the previous step is removed. The limit spectrum, S SN , that is then adjusted to S obs,SN is with M n, given by Eq. 14, and b corresponding to an additive factor to locally adjust the background.

Pattern fit
In order to constrain the priors of the main individual mode parameters, ν n, , H n, , and Γ n, , apollinaire performs a global pattern fit on the orders located around ν max . Tassoul (1980) presented the following asymptotic relation for mode frequencies, within the approximation n : where is a phase constant. Slightly modifying the formalism adopted in Lund et al. (2017), this relation can be approximated, in the ν max neighbourhood, to where the small separations δν 0 are given by while α and β 0 are respectively the curvature terms on ∆ν and δν 0 . The parameter n max , which is not an integer, follows the relation Mode heights in the limit spectrum can be approximated through the p-mode envelope parameters H max and W gauss , considering while Γ is taken as a FWHM value common to all modes. Using Eq. 24 and 27, the pattern fit step is designed to approximate the mode pattern around ν max with a given set θ of global parameters: , α, ∆ν, ν max , H max , W gauss , Γ, δν 02 , β 02 , δν 01 , β 01 , and δν 13 (with δν 13 = δν 03 − δν 01 ), β 03 . The last four parameters can be ignored, for example if the star to fit presents = 1 mixed modes: only pairs M n−1,2 , M n,0 will then be fitted (see Appendix B). It is also possible to ignore just the = 3 mode when the S/N of the considered PSD is insufficient. Indeed, in this situation, the δν 13 and β 03 parameters will be difficult to constrain and the sampled distribution will be prior dominated.
Guesses for , α, H max , Γ, δν 02 , δν 01 , and δν 13 can be manually provided, otherwise they will be automatically computed. If a guess is given for a δν 0k , the initial value for the corresponding β 0k will be set to zero. To determine initial automated guesses for parameters, we adopted prescriptions from Corsaro et al. (2012) for stars with ∆ν < 14 µHz and ν max < 450 µHz. For main sequence stars, we derived well performing initial values from the results obtained by . Proper guesses for stars with 450 µHz < ν max < 1000 µHz are not implemented in the current version of apollinaire but a recipe for how to proceed with these stars is given in Appendix B.
The pattern fit step will fit θ by considering the k orders closest to ν max . By default, the stellar_framework function uses k = 3. The bounds of the fitting window are set 0.2 ∆ν below and above the smallest and largest mode frequency included in the pattern. In the standard procedure, the MCMC exploration is directly performed from the initial rough guess computed by apollinaire. It is possible to use a fast MLE run to quickly refine this initial guess and to use the values yielded by the MLE as a starting point for the MCMC sampling. This can save some computing time by reducing the number of discarded steps at the beginning of the MCMC exploration, but may also reduce the opportunities of the walkers to explore different regions of the distribution to sample in the case of multimodal distribution.
As underlined by Lund et al. (2017), the discrepancies between the actual position of the modes and the frequencies yielded by the relation given by Eq. 24 can be physically explained by acoustic glitches (e.g. Mazumdar et al. 2014;Houdayer et al. 2021). Moreover, when optimised only on the central orders, the formula given by Eq. 24 should be extrapolated with caution for modes located far from ν max . Indeed, in this case, there can be significant discrepancies between the mode frequencies predicted by the formula and the actual position of the mode in the PSD. This can be straightforwardly checked using an échelle diagram and is illustrated in particular in Sect. 4.3. It is of course possible to include more orders when sampling the distribution of pattern parameters, but this will require more computing time; on one hand because the likelihood will be computed on more data points, and on the other because the chain will take more steps to converge. In this situation, the results obtained with the summary statistics alone should also be considered with caution as the inclusion of low-S/N modes can strongly influence the multi-modality of the sampled distribution.
An échelle diagram may also be particularly useful in order to check that the code correctly identified the = 0 and the = 1 mode. When this is not the case, the failure in the sampling is generally related to an improper prior for . In this situation, the solution is to manually impose new initial values and prior bounds for .

Selecting orders to fit
The result of the pattern fit is used to obtain guess values for ν n, , H n, , and Γ n, for each mode. These values are provided inside an a2z DataFrame. There are two ways to determine the modes for which the function will provide guess values. With the first method, the function will simply build the a2z DataFrame for a given number of orders symmetrically distributed around ν max . The second method performs a H0 screening (e.g. Appourchaux et al. 2009;Broomhall et al. 2010;Appourchaux et al. 2012;Davies et al. 2016;) on the p-mode region to assess the strong ( = 0 or 1) mode detectability. Following the procedure presented in Davies et al. (2016), the PSD is rebinned over t bins and a χ 2 2t statistics is considered. For each mode, the rebinning is performed considering an odd number of bins, the central bin being the one with the frequency closest to the mode frequency estimated with the fitted global parameters. The adopted threshold for the rejection of the null hypothesis H0 is p = 0.001. The maximal value t max considered for the rebinning is 99, or the t value corresponding to a bin width of 5 µHz, respectively. If the H0 hypothesis is rejected for more than one-third of the considered rebinning, we consider the mode as detectable. A guess for this mode will be added in the a2z DataFrame along with the guess for the closest = {2, 3} mode. It should be stated that it is also possible to manually provide a a2z DataFrame that will override the guess automatically generated by the function. In what follows, we refer to the M n−1,2 , M n,0 , M n−1,3 , and M n,1 group of modes as a peakbagging order to avoid any confusion. Mode parameters can be fitted globally, by peakbagging order, or by pair. In the latter case, modes M n−1,2 , M n,0 or M n−1,3 , and M n,1 are then fitted together, respectively. The reader should note that the pair-fitting strategy is also suited to fitting any individual mode if the parameters of only one mode of a given pair is specified in the a2z input. The model used to compute the likelihood and posterior probability follows Eq. 22. The initial value for the local S/N background term b is taken as 1 and set to vary between 10 −6 and 5.
The minimal set of parameters that will be fitted by the code are mode frequencies, ν n, , heights, H n, (or amplitudes, A n, ), and FWHM, Γ n, . Additional parameters that can be fitted are splittings, s n, , inclination angle, i, and asymmetries, α n, . It is possible to fit the projected splittings, s = s sin i (Ballot et al. 2006(Ballot et al. , 2008, instead of s. Except for mode frequencies, every input parameter can be set to parametrise all modes, a whole order, a pair, or only one mode. In other words, it is possible to impose each mode of a given peakbagging order to have, for example, the same FWHM.
If the user does not provide a fitting window size, the data to fit are restricted to an adaptive fitting window: only PSD elements within [ν − , ν + ] are considered, with: where ν low and ν up are respectively the minimal and maximal guess frequencies of the modes to fit. The value of d is then determined by the considered method: if the fit is made by order, d = 3, if the fit is made by pair, d = 1.
For modes above 800 µHz and if the fit is made by pair, another possibility offered by the code is to use the ∆ν value to constrain the fitting window. The 800 µHz value has been chosen to ensure that the considered mode does not exhibit avoided crossings, which might add additional difficulties due to the po-tential presence of mixed modes in the fitting window. In this case, the window bounds are: where ν center is the centre of the window given by the mean of the guess frequencies of the two modes to fit (or simply the guess frequency of the mode to fit if only parameters for one mode are specified), and the reference value ∆ν = 135 µHz. The possible values of w w are summarised in Table 1. As underlined in section 2, the mode visibilities and mheight ratios have instrumental dependencies. There are two ways to deal with mode visibilities: on one hand, the user can choose to fit amplitude parameters individually for each mode, and on the other, it is possible to specify the amplitude ratios V /V 0 in the a2z input file. The usual ratios (Salabert et al. 2011a) are provided in Table 2.
The m-height ratios to use are selected through the instrument argument of the peakbagging function. The implemented ratios for each instrument are given in Table 3.
It should also be noted that for H and Γ, it is the natural logarithms of the parameters that are expected to have a Gaussian distribution (Toutain & Appourchaux 1994;García & Ballot 2019). For parameters of these types, the peakbagging function thus samples the natural logarithm. However, it should be remembered that it is straightforward to obtain the sampled distribution of the parameter itself by a simple transformation of the MCMC elements.

Dealing with = 4 and = 5 leaks in solar spectra
Due to the very high S/N of radial-velocity helioseismic data, some = 4 and = 5 modes (referred as intermediate-degree modes) may arise above noise. It has been shown that omitting their contribution to the power distribution of the spectrum can introduce bias in the fitted frequencies of up to three times the uncertainty (Jiménez-Reyes et al. 2008). If guesses for intermediate-degree modes are specified in the input a2z file (e.g. using the theoretical frequency value for those modes), the code will check for their presence inside the defined window.
If, when reading the a2z DataFrame, it appears that an intermediate-degree mode is present in the fitting window, its frequency will be added to the parameters to fit. Heights and FWHMs of intermediate-degree mode are computed considering the ratio presented in Table 2 and using the closest = 0 mode as a reference. If no = 0 is fitted at this time, the closest = 1 is used instead. Power is distributed between the m-components of the mode following Table 3. The splittings are set to 400 nHz and do not vary. No asymmetry is considered for these modes. The diagram shown in Fig. 4 summarises the possible options described in Sect. 3.6, 3.7 and 3.8.

Quality assurance
Several frequentist and Bayesian metrics for peakbagging quality assurance have been suggested over the years (see e.g. Appourchaux et al. 2012; Davies et al. 2016;). The quality assurance metric implemented in apollinaire takes inspiration from the Bayesian machinery described in Davies et al. (2016) but avoids resampling a MCMC to obtain the probability of the different models to compare, which saves a large amount of computing time.
The apollinaire package implements a Bayesian quality assurance computing tool where three possibilities are assumed for each fitted pair of modes (odd or even): (1) The strong ( = 0 or 1) and the weak ( = 2 or 3) modes are both detected (model M sw with associated probability p sw ); (2) only the strong mode is detected (model M s with associated probability p s ); or (3) neither of the two modes is detected (model M 0 with associated probability p 0 ). We have assumed that a weak mode could not be detected if the strong mode was not also detected. Assuming that a sufficient number of initial steps has been discarded, the MCMC contains n sets of parameters θ. A subset of k (k ≤ n) elements still representative of the MCMC distribution can be selected by thinning the chain. Indeed, this subgroup of parameters should follow the same distribution as the full MCMC if enough elements are considered. For each of those k sets of parameters θ k , likelihoods p(D|M sw , θ k ), p(D|M s , θ k ), p(D|M 0 , θ k ) corresponding to the three hypothesises are then compared (considering the data D within the same frequency window that was used for the actual fit). For the M 0 model, as the frequency win-dow is narrow enough, we consider a flat background which is computed as the median of the power distribution of the frequency bins inside the window.
The estimated probability p α (marginalised over the parameter distribution) that a model M α is the most likely considering the data is then computed as follows: where given an ensemble E, #E denotes its cardinal. The indexes α, β, and γ in Eq. 30 should be properly replaced by sw, s, and 0 depending on the considered case. We use Eq. 30 to estimate the probability p α as the fraction of explored parameter sets θ k for which the model α is the most likely among the three models. If we consider the specific case where n = k, Eq. 30 gives the exact proportion of cases in the sampled distribution where the M α model is the most likely. The probability p α is therefore the likelihood of M α for this distribution. The thinning step to reduce to k samples allows a significant gain in computing time to obtain p α while conserving the sampled distribution properties. p 0 is the probability of the null hypothesis H0 for the detection of the strong mode while p s + p 0 is the H0 probability for the weak mode. For the considered pair, the natural logarithm of the Bayes factor K for the detection of a mode of degree is then given by: Here we reiterate the interpretation of the ln K value for the evidence against H0, as outlined by Kass & Raftery (1995) : favours H0 0 to 1 not worth more than a bare mention 1 to 3 positive 3 to 5 strong > 5 very strong.
It is easy to see that, if the model p sw , for example, is favoured in any case, this will correspond to p sw = 1 and therefore ln K > 5. If we have p sw = 2/3 and p 0 = 1/3, we will have ln K ≈ 0.69, and the M sw model is only barely favoured compared to the H0 hypothesis. Modes with ln K < 1 should be cautiously considered when exploiting the peakbagging results for modelling purposes.

Benchmark with Monte Carlo synthetic spectra and published peakbagging results
In this section, in order to assess the reliability of the code, we present the results of a Monte Carlo benchmark with synthetic spectra. We then compare apollinaire results with results obtained with other peakbagging codes. As the method has been designed to perform helioseismic and asteroseismic analysis, we decided to perform our benchmark with both solar and stellar data 5 .

Monte Carlo trial
Considering three different sets of parameters (see Table 4), we generate a limit spectrum for three pairs of modes. This limit spectrum is then multiplied by a noise vector with a power distribution following a χ 2 with two degrees of freedom. The pairs are generated for two frequency vectors: the first with one-year resolution and the second with four-year resolution. An example of such a synthetic pair is shown in Fig. 5. Each pair is then fitted using the peakbagging function. For each pair and each resolution, we repeat this process 200 times. For each considered pair and resolution, Tables 5 and 6 summarise the percentage of times for which the true value, considering the uncertainties, is outside the 68% and 99.7% credible intervals, defined by the 1σ and 3σ departure from the fitted value, respectively, with the fitted value taken as the median of the sampled distribution. The percentage of fits for which we find the true value to be in the 68% and 99.7% credible intervals is therefore close to expectations, the standard deviation for the frequency of success being 3.3 % in the case of 200 draws following a binomial law of parameters (n draw = 200, p = 0.68), and 0.4 % for (n draw = 200, p = 0.997), with n draw being the number of draws. When we consider the 2400 fitted frequencies at once, we find the uncertainties obtained with apollinaire to be conservative in this experiment. Indeed, 71.5% of the fitted frequencies are in the 68% credible interval. With a standard deviation of 0.95% for 2400 independent experiments, this is approximately four standard deviations away from the 68% expected value. We note that the uncertainties obtained for the 1460-day pairs are significantly smaller than for the 365-day pairs. An example comparison between the true value and apollinaire fitted value is shown in Figs. 6 and 7, where the frequency error ν fitted − ν true between the fitted value and the true value is represented for the = 1 mode of pair 2. The spread reduction of the error distribution appears clearly in the histograms. Figure 7 also shows the (ν fitted − ν true )/σ distribution where we specify the mean value (ν fitted − ν true )/σ . As expected, the standard deviation for the (ν fitted −ν true )/σ distribution is close to 1 in both cases. We find no systematic bias in the fitted parameters, except for the splittings in pair 3 which are systematically underestimated due to the large mode width.

GOLF solar PSD analysis
To study the p-mode frequency shifts induced by the magnetic solar cycle, Salabert et al. (2015) performed an analysis of 69 one-year subseries of the GOLF instrument, spanning from 1996 April 11 to 2014 March 5, with 91.25 days overlap. The modes were fitted using a MLE method (Salabert et al. 2007). In order to compare the results from this method with apollinaire fits, we use the same MLE code to perform a similar analysis by considering subseries of one-year spanning from 1996 April 11   to 2020 July 6, with a 91.25 days overlap 6 . We remove five subseries that have a duty-cycle of below 90% and therefore consider 89 subseries. For each of these subseries, we fit modes = 0, 1, 2, 3. However, as = 3 have low signal-to-noise ratios in the GOLF data, we compare only the results obtained for mode = 0, 1, 2 in the following. The lowest frequency considered mode is n = 11, = 2 while the highest frequency considered mode is n = 26, = 1. This means that we compare fit results between apollinaire and the MLE method for 4005 modes. The procedure to fit the series with apollinaire is the following. Frequency bins below 50 µHz are not considered to fit the background. The S/N spectrum is then computed by di-viding the PSD by the fitted background model. Modes are fitted by pair. Heights H n, , FWHMs Γ n, and splittings s n, are fitted independently for each mode. As asymmetries are expected to depend on frequency only, one asymmetry value is fitted for each pair. Finally, = {4, 5} power leakages are accounted for during the fit. For the background fit, the MCMC are sampled with 500 walkers iterated over 500 steps, with the 100 first steps discarded as burn-in. For the individual mode fit, the MCMC are sampled using 500 walkers iterated over 1000 steps. The 400 first steps are discarded to take the burn-in phase into account. We consider the 1σ and 3σ intervals relative to the values and corresponding uncertainties σ fitted with apollinaire. Among the 4005 fitted modes, the frequencies fitted by the MLE method, ν MLE , lay outside the 1σ interval for 38 modes (0.9%) and outside the 3σ interval for 10 modes (0.2%). The MLE heights H lay outside the 1σ interval for 464 modes (11.5%) and outside the 3σ interval for 23 modes (0.6%). The MLE FWHMs Γ lay outside the 1σ interval for 185 modes (4.6%) and outside the 3σ interval for 18 modes (0.4%). The relatively high number of H and Γ values laying outside the 1σ interval can be explained by the strong anti-correlation that exists between these two parameters, especially at high frequency. We also identify at least two modes that have not been correctly fitted by the MLE method. Finally, we represent the (ν apn − ν MLE )/ σ 2 apn + σ 2 MLE distribution in Fig. 8 in order to show the comparison of the two methods exhibits no systematic bias.

The Kepler main sequence LEGACY catalogue
In order to extract strong modelling constraints to be used in stellar evolution codes Silva , hereafter L17, performed a thorough peakbagging analysis of the so-called Kepler LEGACY sample, which contains 66 main sequence stars with stochastically excited p modes 7 . We select six targets to analyse within the LEGACY sample, namely KIC 5184732, 6106415, 6225718, 6603624, 12069424, and 12069449, and we compare the apollinaire results to the reference values provided by L17. Some fundamental stellar properties of the se-7 The mode frequency, amplitude and width tables are publicly available on VizieR at: http://cdsarc.u-strasbg.fr/viz-bin/Cat? J/ApJ/835/172 A&A proofs: manuscript no. output lected targets are given in Table 7, which are taken from the Kepler data release 25 (DR25, Mathur et al. 2017). As we want to show how apollinaire behaves when used blindly without any previous knowledge of the seismic values, effective temperatures T eff , masses M, and radii R are used to estimate ν max and ∆ν values from the global seismic scaling laws. We could also have used the ν max and ∆ν values from L17 as input. The PSDs we analyse were obtained from KEPSEISMICcalibrated 8 light curves (García et al. 2011). The background profile is fitted considering two Harvey models and a flat noise contribution (see Sect. 2.2). We fit the background with the complete PSD, but in order to assess the effect of a background fit performed on a rebinned PSD on the mode frequencies, we also sample the posterior probability of our background model on a PSD resampled following the method described in Appendix A. Frequency bins below 50 µHz are not considered. We use 500 walkers iterated over 5000 steps, with the 100 first iterations discarded as burn-in. We deliberately choose a large number of steps to ensure the convergence of the chains. The global mode pattern is adjusted on the S/N spectrum obtained by dividing the PSD by the background model, according to the strategy described in Sect. 3.5 and considering = {0, 1, 2, 3} of the three orders closest to ν max . We use 500 walkers iterated over 5000 steps and we discard the 250 first drawn points. We visually check that the fitted pattern has a satisfying profile in order to provide correct frequency guesses for the individual modeparameter extraction. In the individual mode parameter fit, for each target, we limit our analysis to the seven orders closest to ν max . All the chosen targets have a p-mode S/N that is sufficient to fit mode parameters farther from ν max , but we make this choice in order to ensure that the frequency estimates yielded by Eq. 24 are correct and to avoid a manual redefinition of the priors for low-and high-order modes before sampling the mode parameters. The MCMCs are sampled using 500 walkers iterated over 5000 steps. The 200 first steps are discarded to take the burn-in phase into account.
For each fitted parameter, L17 specified two uncertainty values (in the same way apollinaire allows computing σ + and σ − ). For sake of simplicity and clarity in the comparison of results, we consider only the larger uncertainty value for each parameter. L17 epsilon values are compared to the results yielded by apollinaire in Fig. 9. The échelle diagrams obtained from the apollinaire fits are provided in Fig. 10 where we also represent the frequencies obtained from Eq. 24 and the corresponding pattern sampling. As already underlined in Sect. 3.5, the échelle diagram clearly shows that, as we constrained the parameters from Eq. 24 on only the three central orders, the predicted value we obtain for modes far from ν max does not follow the actual node ridge in the échelle diagrams. are compared with the values from L17 in Fig. 11. The same type of comparison is performed for mode amplitudes and FWHMs in Fig. 12. The apollinaire fitted values and reference from L17 can be found in Tables 8, D.3, and D.4, along with the quality assurance values ln K computed according to the method presented in Sect. 3.9. The results are globally in agreement for frequencies. Considering the 1σ and 3σ interval relative to the apollinaire value again over the 154 fitted frequencies, the L17 frequency is beyond 1σ for 39 modes and beyond 3σ for three modes. We find larger discrepancies for mode amplitudes and widths: over the 42 fitted widths (one per order), the L17 value is outside the 1σ interval 25 times and outside the 3σ interval 8 times. Over the 42 fitted amplitudes, the L17 value is outside the 1σ interval 31 times and outside the 3σ interval 12 times. We find no systematic bias in the frequency, amplitude, and width comparison. Concerning the mode frequencies, the largest observed discrepancy between L17 and apollinaire is found to be on the KIC 006603624 mode M 18,3 . We have ν 18,3,apn = 2299.52 ± 0.20 µHz and ν 18,3,L17 = 2305.47 ± 0.13 µHz. However, inspecting the KIC 006603624 spectrum and the corresponding échelle diagram, it seems more credible to us that the correct position of this mode is the one yielded by apollinaire.
The discrepancies in the values obtained for fitted parameters can have multiple explanations. While we used KEPSEISMIC data, L17 exploited light curves corrected with the KASOC filter (Handberg & Lund 2014). The data calibration influences the power redistribution in the PSD frequency bins and may therefore be responsible for some of the observed discrepancies between the mode widths and amplitudes, and, to a lesser extent, for frequency discrepancies. However, when comparing peakbagging codes, it is found that the main discrepancies are induced by the differences in the methods used to fit the background (Appourchaux et al. 2014). It should also be underlined S.N. Breton et al.: The apollinaire package Fig. 10. Échelle diagram of the six selected targets apollinaire fitted p-mode frequencies (black diamonds) with corresponding error bars. The blue circles signal the frequency initial guesses inferred from Eq. 24. that there are several differences in the models we use when compared to those used by L17. The first main difference resides in the fact that L17 included the V parameters for which posterior probability distributions are sampled while these parameters were fixed in our model (see Sect. 2.6). The second difference is that, as we fit order by order, the inclination angle i is poorly constrained with this choice of strategy. It is also important to keep in mind that the uncertainty values yielded by a MCMC process are only formal uncertainties related to the variance of the parameters inside a given model confronted to the data. It has already been demonstrated that an inaccurate description (e.g. by neglecting the = 4, 5 contribution or the wing power of the modes outside the fitting window) of the mode profiles could bias the estimation of the real values by several times the uncertainty values (Jiménez-Reyes et al. 2008).
Concerning values, L17 and White et al. (2011) noted that a strong anti-correlation exists between ∆ν and . We also note that for most ν max , ∆ν, and , the apollinaire uncertainty values are much larger than for the L17 values. For KIC 006225718, for example, apollinaire yielded ∆ν = 106.78 ± 0.43 µHz while L17 gave ∆ν = 105.695 ± 0.018 µHz. This can be explained by the fact that, even if our Eq. 24 is really close to Eq. 27 of L17, Article number, page 13 of 25 A&A proofs: manuscript no. output the strategy exploited to sample the parameter distribution is totally different. We use Eq. 24 together with Eq. 27 and a common Γ value in order to compute a mode profile for the three central orders and directly constrain our parameters with the PSD. On the contrary, L17 used their Eq. 27 to constrain the pattern parameters from the individual mode frequencies they had already obtained from the PSD analysis.
Finally, Fig.  13 shows the (ν full − ν resampled )/ σ 2 full + σ 2 resampled distribution, where ν full and σ full are the frequencies and uncertainties obtained by consider-ing the complete PSD for the background fit, and ν resampled and σ resampled are the frequencies and uncertainties obtained after using the background fitted on the resampled PSD. This shows that, for mode frequencies, the discrepancies between the two methods are negligible with respect to the fitted uncertainties, which means that frequencies obtained with the PSD resampling method can be used without risk as input for modelling codes. This resampling strategy is made relevant in the optic of the PLATO mission preparation, where it will be necessary to  Fig. 12. Comparison of p-mode FWHMs and amplitudes yielded by apollinaire (dark blue and red, respectively) and reference values from L17 (light blue and salmon, respectively). The error bars overlapping intervals are represented in medium blue and dark salmon, respectively. perform the peakbagging analysis for tens of thousands of stars in an optimised computing time.

Discussion
The possibilities offered by MCMC sampling methods and the flexibility of the possible strategies allowed by apollinaire merit a brief discussion. Another question that arises when using MCMC is the hyper-parameters value that must be chosen in order to correctly dimension the problem. We decided in this paper to use 500 walkers for all the sampling we perform, which is a typical value used in several peakbagging studies (see e.g. Davies et al. 2016;. Choosing the optimal number of steps to perform when sampling the chain and the proportion of steps that should be burned in at the beginning of the run represents a complex issue that has no universal solution. This can only be done through test and trial, especially when the initial guesses appear to be quite far from the posterior probability optimum. One way to check that the chain is fully converged is to split it in half after having removed the burned in elements and to verify that the shapes of the two subdistributions are identical. It is also possible to estimate the auto-correlation time τ of the A&A proofs: manuscript no. output  chain to assess its convergence. However, obtaining a reliable estimate of τ requires running the sampling for at least 50τ steps, which can prove extremely computationally extensive. Having run a sampling experiment with KIC 6603624 as reference, we find that, when the posterior is data dominated, τ values are typically of a few hundred steps. Parameters for which the posterior is prior dominated typically have longer auto-correlation times.
It is also important to keep in mind that some fitting strategies may have to be preferred considering the target being analysed. For example, for high-resolution solar data, we will favour a fit by pair, with the height and the FWHM of each mode set to vary freely. Mode asymmetries should be fitted for modes above 2450 µHz. In stellar data, especially with low-resolution low-S/N data, modes will be fitted globally or by order, with a constraint on the mode-relative amplitudes. It will only be possible to constrain the stellar inclination angle where good S/N data are available; in this case, it will be necessary to perform a global fit in order to consider the m-height ratio of the largest possible number of modes at the same time. Mode asymmetries are usually not considered when fitting stellar data. The user may keep in mind that when fitting all the modes at once with a global method, the MCMC will converge more slowly because of the increased number of dimensions in the parameter space being explored. For power spectra obtained from short time series, because of the mode stochastic excitation, allowing the mode FWHM and height to vary freely may sometimes yield better results than using the ratios specified in Table 2.

Conclusion
In this paper, we introduced the apollinaire module, an opensource Python 3 package designed for helio-and asteroseismic disk-integrated peakbagging. The module implements a set of functions that are able to extract background, p-mode global pattern, and p-mode individual parameters through MCMC sampling implemented with emcee. The implementation of these functions is designed to provide a flexible framework. They can be used independently or combined depending on the task the user wants to achieve.
We analysed data from the GOLF instrument and the Kepler LEGACY sample in order to compare apollinaire results with values available in the literature. We find good agreement between our results and the reference values. The discrepancies can be explained by differences in data calibrations, fitting strategies, and adopted models.
The code is in active development and the online repository is regularly updated. An online documentation of the package is available and includes several tutorials that should allow any interested user to quickly start working with apollinaire and use the different functions to analyse their own set of data. With the increasing number of TESS targets of asteroseismic interest and the preparation of the PLATO mission, we hope that apollinaire will become a widely used tool in the community.
Acknowledgements. The authors thanks the anonymous referee for constructive comments and precious suggestions that greatly enriched and completed the material of the paper. S.N.B. and R.A.G acknowledge the support from PLATO and GOLF CNES grants. J.B. acknowledges the support from CNES. V.D. acknowledges the support of the IAC (Tenerife, Spain). The authors are deeply grateful to G.R. Davies who originally designed the a2z data structures in collaboration with R.A.G. They also thank L. Bugnet, M. Delorme, A. Jiménez, S. Mathur, and V. Prat for fruitful discussions. A special thank to A. Finley for his precious suggestions concerning the name of the paper. This paper includes data collected by the Kepler mission.  (Ambikasaran et al. 2014). The previous references should be cited together with apollinaire when using the module. Packaged version of apollinaire can be installed through pip and conda-forge (conda-forge community 2015). The source code and most recent versions of apollinaire can be found at gitlab.com/sybreton/ apollinaire while the module documentation is hosted at https:// apollinaire.readthedocs.io/.

Appendix A: PSD resampling for background fit
We included several resampling strategies in apollinaire in order to perform quicker background fits (the computing time of the MCMC process being mainly constrained by the length of the vector on which the posterior probability is computed at each step). We describe here the resampling methods implemented in the module. The user has the choice between two resampling strategies, reboxing and advanced_reboxing.
The reboxing strategy uses the numpy.logspace function to create a new frequency vector with logarithmic frequency spacing and a number of data points specified by the user. The logarithmic spacing is preferred over a linear spacing in order to increase the proportion of low-frequency data-point in the fit. Each bin of the original PSD is then attributed to a box considering the closest frequency bin in the new frequency vector. The median of each box is then considered in order to compute the resampled PSD.
In order to apply a distinct resampling to the p-mode region, the advanced_reboxing extends this approach with a more subtle resampling techniques. In this case, the user specifies a target number N of data points in the resampled PSD outside the ν max region and a number M of data points in the ν max region. Considering consecutive frequency bins ν k and ν k+1 , they are directly put in the resampled frequency vector under the condition We define k 0 as the frequency bin verifying: and three frequency intervals I j = [ν a , ν b ] such as For each frequency interval, the box size parameter S j is (A.9) and the parameter C is given by with n bin the number of frequency bins inside the box.

Appendix B: Subgiant mixed-mode fitting recipe
Although apollinaire was not initially designed to perform the automatic analysis of subgiant stars, it is possible to exploit the global fitting strategy in order to characterise stars with an important number of mixed modes. In this appendix we propose a recipe to analyse such stars. The user who would like to use this recipe should be aware that in its current version, apollinaire provides no method to automatically compute guesses for mixed-mode parameters. Such methods were presented, for example, in Mosser et al. (2015) or Appourchaux (2020). We provide here an example 9 of such a fit performed on KIC 5723165, a subgiant star observed by Kepler in short cadence in Q1, Q5, and continuously from Q7 to Q13 for a total of ∼760 days (e.g. Appourchaux 2020), which has not been peakbagged yet. The apollinaire KIC 5723165 peakbagging summary plot is shown in Fig. B.1.
The sequence of steps to be followed in order to apply the recipe is therefore as follows: 1. We start by fitting the background with the explore_distribution_background function, including the fitting of a Gaussian function to take into account the power hump of the p and mixed modes. 2. The presence of mixed modes can perturb the analysis of the mode pattern and the calculation of ∆ν. Therefore, after dividing the PSD by the background to work in S/N, we remove the region of the odd modes from the PSD prior to fit the universal pattern using only the even modes. To do so, we provide a Jupyter notebook in the documentation with an easy procedure to select the regions of the even modes by directly clicking on the PSD or by providing a list of frequencies determining the boudaries of these bands. Then, a median clipping of the n orders selected by the user around ν max is done. To help performing this manual selection, the user can use ν max obtained from the fitting of the Gaussian when the background is determined. A first estimation of ∆ν can then be obtained using the seismic scaling relations. 3. After median clipping the region of the odd modes, the fitting of the universal pattern can be performed using the masked PSD (see Fig.B.2). In general, if the masking is well done, the standard guesses of the universal pattern provided by apollinaire would be good enough. However, the user can try to change these original guesses and bounds for δν 0,2 , as they are too low for subgiants. A good choice could be ∼4 µHz, and the upper bound should be higher than 6 µHz. The upper bound for the mode width Γ could also be lowered. Indeed, it is important to note that having a small δν 0,2 guess value and a large one for Γ often leads to an inaccurate fit, as the = 0 and = 2 are considered as part of the same mode. 4. After fitting the Universal pattern, the code provides an a2z file with the guesses for the even modes to be fitted by the peak bagging module. A first characterisation of those modes can be performed directly if we are not interested in the odd ones.
5. To fit the odd modes (see Fig. B.1), the a2z guess file needs to be modified manually. The only way to do the fitting is global and not by order as the mixed modes cannot be assigned to any particular order. In the a2z guess file, we suggest to write the radial order n of each mixed mode starting by 100. The order is not important and these values are given only as an indication to the user that the corresponding modes are mixed modes. We recommend that each mixed mode is fitted with an individual frequency, width, and height.  An example of a pkb file is presented in Table D.2. In apollinaire procedures, pkb arrays (numpy arrays with the shape specified in the previous paragraph) are usually created by converting a2z DataFrame with the a2Z_to_pkb function. Those pkb arrays are used to build the models S SN (ν) which are necessary to compute the likelihood and the posterior probability.

Appendix D.3: The extended pkb array
The structure of the extended pkb array is close to the classical pkb array structure, except that it contains 20 columns. Instead of σ sym , two uncertainties values are provided for each parameters: σ − and σ + (see Sect. 3.1).