Indications for very high metallicity and absence of methane for the eccentric exo-Saturn WASP-117b

{We investigate the atmospheric composition of the long period ($P_{\rm orb}=$~10 days), eccentric exo-Saturn WASP-117b. WASP-117b could be in atmospheric temperature and chemistry similar to WASP-107b. In mass and radius WASP-117b is similar to WASP-39b, which allows a comparative study of these planets.}
{We analyze a near-infrared transmission spectrum of WASP-117b taken with Hubble Space Telescope/WFC3 G141, which was reduced with two independent pipelines to ensure a robust detection of the $3 \sigma$-water spectrum. High resolution measurements were taken with VLT/ESPRESSO in the optical.}
{Using a 1D atmosphere model with isothermal temperature, uniform cloud deck and equilibrium chemistry, the Bayesian evidence of a retrieval analysis of the transmission spectrum indicates a preference for a high atmospheric metallicity ${\rm [Fe/H]}=2.58^{+0.26}_{-0.37}$ and clear skies. The data are also consistent with a lower-metallicity composition ${\rm [Fe/H]}<1.75$ and a cloud deck between $10^{-2.2} - 10^{-5.1}$~bar, but with weaker Bayesian preference. We retrieve a low CH$_4$ abundance of $<10^{-4}$ volume fraction within $1 \sigma$. We cannot constrain the temperature between theoretically imposed limits of 700 and 1000~K. Further observations are needed to confirm quenching of CH$_4$ with $K_{zz}\geq 10^8$~cm$^2$/s. We report indications of Na and K in the VLT/ESPRESSO high resolution spectrum with substantial Bayesian evidence in combination with HST data.}


Introduction
The past years have revealed a large diversity in the atmospheres of transiting extrasolar gas planets colder than 1000 K and smaller than Jupiter, as well as a lack of methane in their atmospheres (e.g. Kreidberg 2015; Kreidberg et al. 2018; Wakeford et al. 2017, 2018; Benneke et al. 2019; Chachan et al. 2019). Whereas the atmospheric chemistry of hot Jupiters can be explained mainly with equilibrium chemistry, disequilibrium chemistry is expected to become more important for these cooler planets via vertical quenching, which results in an under-abundance of CH$_4$ compared to predictions from equilibrium chemistry (Crossfield 2015). In principle, CH$_4$ is readily detectable alongside H$_2$O in the near- to mid-infrared.
While the quenching of CH$_4$ has been confirmed first in brown dwarfs and later in directly imaged exoplanets (Barman et al. 2011b,a; Moses et al. 2016; Miles et al. 2018; Janson et al. 2013), observing disequilibrium chemistry in transiting, tidally locked extrasolar gas planets has been more challenging. Quantifying CH$_4$ quenching reliably in transiting exoplanets to compare these disequilibrium chemistry processes to those occurring in brown dwarfs and directly imaged planets will be very illuminating to explore dynamical differences in different substellar atmospheres.
arXiv:2006.05382v1 [astro-ph.EP] 9 Jun 2020. A&A proofs: manuscript no. WASP117b
Disequilibrium chemistry depends on vertical atmospheric mixing ($K_{zz}$), which provides a link between the observable atmosphere and deeper layers (see e.g. Agúndez et al. 2014). Dynamical processes are further expected to be very different in tidally locked exoplanets compared to brown dwarfs, owing to the different rotation and irradiation regimes (Showman et al. 2019). For example, tidal locking slows the rotation periods of transiting exoplanets to a few days, which is very slow in comparison to e.g. brown dwarfs with rotation periods of less than 1 day (see e.g. Apai et al. 2017). Studying disequilibrium chemistry processes across a whole range of substellar atmospheres could thus shed light on deep atmospheric processes and on why the detection of methane has so far proven very difficult in transiting exoplanets.
To date, both methane and water have only been reliably detected for one extrasolar gas planet: on the day side of the warm (600-850 K) Jupiter HD 102195b (Guilluy et al. 2019). For HAT-P-11b, the presence of methane is inferred with 1D atmosphere models for retrieval by Chachan et al. (2019), who find that the observed steep rise in transit depth for wavelengths ≥ 1.5 µm in their HST/WFC3 G141 data is not present when they simulate the spectrum after removing CH$_4$ opacities from the model. The authors could not, however, constrain CH$_4$ abundances further (Chachan et al. 2019). For the warm Super-Neptune WASP-107b (Kreidberg et al. 2018), methane quenching is suggested by the absence of methane (no discernible opacity source that would translate to an increase in transit depth for λ ≥ 1.5 µm in their HST/WFC3 G141 data), while water was clearly detected. Benneke et al. (2019) also report methane depletion compared to equilibrium chemistry models for the mini-Neptune GJ 3470b.
Many of the transiting exoplanets for which disequilibrium chemistry may play a role were also found to be less massive than Jupiter, ranging from mini-Neptune to Saturn mass. These objects show a spread in metallicity that ranges from very low, ≤ 4× solar, for the Neptune HAT-P-11b (Chachan et al. 2019) to very high, > 100× solar, for the exo-Saturn WASP-39b (Wakeford et al. 2018).
In this paper, the transiting exo-Saturn WASP-117b joins the ranks of super-Neptune-mass exoplanets with atmospheric composition constraints. In mass (0.277 $M_{\rm Jup}$) and radius (1.021 $R_{\rm Jup}$), WASP-117b is close to WASP-39b, which was found to be metal-rich (Wakeford et al. 2018). In temperature, it could be 900 K or colder and thus similar in atmospheric chemistry to WASP-107b (Kreidberg et al. 2018). Therefore, this planet will aid a comparative analysis of methane content and metallicity from the Neptune- to Saturn-mass range.
Furthermore, WASP-117b orbits its quiet F-type main-sequence star on an eccentric orbit (e = 0.3) with a relatively long orbital period of ≈ 10 days (Lendl et al. 2014; Mallonn et al. 2019). Using the same formalism as Lendl et al. (2014) to calculate atmospheric temperatures during one orbit, the planet would spend several days of each orbit in the warm (T < 1000 K) regime, with potentially disequilibrium chemistry depending on the strength of vertical mixing ($K_{zz}$) and atmospheric metallicity, and several days in the hot (T > 1000 K) regime with equilibrium chemistry (Figure 1). This is a different situation compared to WASP-107b and WASP-39b, which reside on tighter circular orbits and thus remain in the same temperature and chemistry regime at all times.

Fig. 1. The orbit orientation of WASP-117b and the expected equilibrium temperature, assuming albedo α = 0 and, to first order, instant adjustment of the atmospheric temperature to changes in incoming stellar flux, as used in Lendl et al. (2014). Temperatures around 1000 K for solar metallicity denote the transition in equilibrium chemistry between CO (for the hot regime, $T_{\rm eq}$ > 1000 K) and CH$_4$ (for the warm regime, $T_{\rm eq}$ < 1000 K) (Crossfield 2015). In the latter case, however, formation of CH$_4$ is expected to be quenched due to disequilibrium chemistry. The planet is expected to remain 4.6 days in the hot regime and 5.4 days in the colder regime.
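The temperature swing along the eccentric orbit can be illustrated with a minimal sketch. It assumes the common zero-albedo, full-redistribution expression $T_{\rm eq} = T_\star (1-\alpha)^{1/4} \sqrt{R_\star / 2d}$ with instant adjustment to the instantaneous star-planet separation $d$; whether this matches the formalism of Lendl et al. (2014) in every detail is an assumption. The stellar temperature and scaled semi-major axis below are illustrative placeholders, not values taken from the text.

```python
import math

def equilibrium_temperature(t_star, a_over_rstar, ecc, true_anomaly, albedo=0.0):
    """Equilibrium temperature (K) at a given true anomaly (radians).

    Assumes full heat redistribution and instant adjustment of the
    atmosphere to the instantaneous stellar flux.
    """
    # Star-planet separation in units of the stellar radius
    d_over_rstar = a_over_rstar * (1.0 - ecc**2) / (1.0 + ecc * math.cos(true_anomaly))
    return t_star * (1.0 - albedo) ** 0.25 * math.sqrt(1.0 / (2.0 * d_over_rstar))

# Illustrative placeholder values (assumptions, not taken from the paper)
T_STAR = 6000.0       # stellar effective temperature in K
A_OVER_RSTAR = 17.0   # scaled semi-major axis a/R*
ECC = 0.3             # orbital eccentricity quoted in the text

t_peri = equilibrium_temperature(T_STAR, A_OVER_RSTAR, ECC, 0.0)
t_apo = equilibrium_temperature(T_STAR, A_OVER_RSTAR, ECC, math.pi)
print(f"periastron: {t_peri:.0f} K, apastron: {t_apo:.0f} K")
```

With these placeholder inputs the planet crosses the 1000 K boundary between periastron and apastron, which is the qualitative behaviour described above; the ratio of the two temperatures depends only on the eccentricity.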
The atmospheric properties of WASP-117b will thus shed further light on the diversity in basic atmospheric properties of transiting Super-Neptunes, such as CH$_4$ quenching at atmospheric temperatures colder than 1000 K and atmospheric metallicity. In addition, due to its relatively long-period eccentric orbit, this exo-Saturn will also allow us to understand how exoplanets on wider non-circular orbits, which are subjected to varying irradiation and atmospheric erosion, differ from exoplanets on tighter, circular orbits.
Last but not least, we will show that it is possible and worthwhile to characterize exo-Saturns on orbital periods of 10 days with single-epoch observations from space. Thus, our WASP-117b observations are prototype observations for other transiting exoplanets with orbital periods of 10 days and longer, like K2-287 b and similar objects that were discovered after WASP-117b (Brahm et al. 2018; Jordán et al. 2020; Rodriguez et al. 2019).
We report here the first observations to characterize the atmosphere of the exo-Saturn WASP-117b (Section 2) in the near-

Observations
We report here a near-infrared transmission spectrum measured with HST (Program GO 15301, PI: L. Carone) and a high-resolution optical spectrum measured with VLT/ESPRESSO (PI: F. Yan). In the following, both sets of observations and their data reduction are described. For the HST/WFC3 raw data, we employed two completely independent data reduction pipelines for a robust retrieval of the atmospheric signal.
The relatively long transit duration of 6 hours required long and stable observations by both HST and VLT/ESPRESSO, which were successfully carried out with both instruments.

HST observation and data reduction
We observed one transit of WASP-117 b with HST's Wide Field Camera 3 (WFC3) instrument on UT 20-21 September 2019. The transit observation consists of eleven consecutive HST orbits. This HST observation benefited from re-observations of the WASP-117b transit in July 2017 with broad-band photometry using two small telescopes: one in Chile, the Chilean-Hungarian Automated Telescope 0.7m (CHAT, PI: Jordán), and one in South Africa (1m, Las Cumbres Observatory). The combined observations reduced the uncertainty in the mid-transit time from 2 hours (based on Lendl et al. 2014) to 2 minutes (Mallonn et al. 2019).
For the HST measurement, at the start of each orbit, an image of the target with the F126N filter, using NSAMP=3, rapid readout mode was taken. This image was used to anchor the wavelength calibration for each orbit. Otherwise, we obtained time series spectra with the G141 grism, which covers the wavelength range 1.1-1.7 µm. We used the NSAMP=15, SPARS 10 readout mode for these, in spatial scanning mode with a scan rate of 0.12 arcseconds/sec and scan direction "round trip".
For raw data reduction, we employed two separate and completely independent pipelines to ensure that the derived atmospheric spectra are robust. We call the two pipelines henceforth nominal pipeline and CASCADE pipeline; their respective data reduction procedures are described in the following.

Nominal pipeline
We started our data reduction with the ima files, which were produced by the CALWFC3 data reduction pipeline. The ima files contain all calibrated non-destructive reads. Before extracting the light curves, we first measured the sub-pixel-level telescope pointing drifts using the cross-correlation method and corrected the drift-induced wavelength calibration error by aligning images in the x-direction (dispersion direction) using a linear-interpolation shift. We then followed the standard procedures (Deming et al. 2013) to extract light curves from the ima files. First, differential images, which represent the spectrum collected in each non-destructive read, were obtained by subtracting each ima frame from its previous read. Second, we identified and corrected bad pixels and cosmic rays. We first marked pixels with data quality flags of 4, 16, 32, 256 as bad pixels. We then searched for spurious pixels caused by cosmic ray hits by comparing the differential images with their median-filtered images and selected the 5σ outliers as cosmic ray pixels. Pixels identified as either bad or cosmic rays were replaced by linear interpolations of their neighboring pixels. Third, we subtracted the sky background from each differential image. The sky background was estimated by taking the median value of pixels that were neither illuminated by any astrophysical source nor flagged as bad. To select the background region, we manually constructed pixel masks to exclude all visible spectral traces. In addition, we applied 5σ, 10-iteration sigma-clipping on the unmasked images to remove remaining bright pixels from the background estimation. Finally, we measured the raw light curve of each column by taking a R = pixels aperture photometry. We integrated spectroscopic light curves using a 5-pixel wide bin. The raw light curves are presented in Figure 2.
The raw light curves demonstrate systematic trends that are typical of WFC3/IR observations (the "ramp effect"). Zhou et al. (2017) introduced a physically motivated model for this effect (RECTE) by assuming that the rise of the light curve at the beginning of each orbit is introduced by stimuli gradually filling charge traps that are caused by detector defects. In the RECTE model, there are two populations of charge traps differing in their trapping efficiencies and sustaining lifetimes. We corrected the systematics and fit the transit profile simultaneously, using RECTE to model the systematics. For the transit profile, we adopted the batman code (Kreidberg 2015) and assumed a linear limb-darkening law. The parameters related to the properties of the charge traps, including their numbers, trapping efficiencies, and trapping lifetimes, were pre-determined. There were four free parameters in the RECTE model: the initially trapped charges and the amounts of trapped charges added during Earth occultations, for both trap populations. We also included linear trends (two free parameters) for light curves in both forward and backward scanning directions as part of the systematics model. We noted that the light curves demonstrated correlated changes with the telescope pointing variations in the y-direction (cross-dispersion direction), which could be adequately modeled by a linear function of ∆y. We added this ∆y linear term as part of the systematics model as well.
We first conducted the joint transit+systematics fit on the 1.10 to 1.70 µm broadband light curve. In addition to the parameters for the systematics model described above, the transit mid-time, the transit depth, and the limb-darkening coefficient were optimized using the emcee code (Foreman-Mackey et al. 2013). The other planetary-system parameters, including the orbital period, system semi-major axis, eccentricity, inclination, and the longitude of periastron, were fixed to values from the literature (Lendl et al. 2014; Mallonn et al. 2019), which were sufficiently precise for the fit. We then conducted the fit to the spectroscopic light curves using the same method, except that the transit mid-time was fixed to the best-fitting value from the broadband fit. The corrected light curves and their best-fitting transit profile models are presented in a figure analogous to Figure 2: the systematics are divided out of the light curves, the light curves are color-coded identically to Figure 2, and the gray curves represent the best-fitting transit profile models.
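The sky-background step of the nominal pipeline (a median over unmasked pixels after 5σ, 10-iteration sigma clipping) can be sketched as follows. This is a minimal standard-library illustration, not the pipeline's own code; the function name and the pixel values are hypothetical, and the clipping is centred on the median, which is an assumption about the implementation.

```python
import statistics

def sigma_clipped_median(values, n_sigma=5.0, max_iter=10):
    """Median of `values` after iteratively rejecting n_sigma outliers."""
    kept = list(values)
    for _ in range(max_iter):
        center = statistics.median(kept)
        spread = statistics.pstdev(kept)
        if spread == 0.0:
            break
        clipped = [v for v in kept if abs(v - center) <= n_sigma * spread]
        if len(clipped) == len(kept):
            break  # converged: nothing more to reject
        kept = clipped
    return statistics.median(kept)

# Hypothetical background pixels (counts): 40 values near 10 plus one bright residual pixel
pixels = [10.0, 10.2, 9.8, 10.1, 9.9] * 8 + [500.0]
background = sigma_clipped_median(pixels)
```

A single bright pixel can only be rejected at 5σ when enough ordinary pixels are present, because the outlier itself inflates the standard deviation; this is one reason to estimate the background over a large unmasked region.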

CASCADE
In addition to the nominal pipeline, we also used the Calibration of trAnsit Spectroscopy using CAusal Data (CASCADe) data reduction package developed within the Exoplanet Atmosphere New Emission Transmission Spectra Analysis (ExoplANETS-A) Horizon-2020 programme. The CASCADe pipeline also starts the data reduction with the ima intermediate data product, which was produced by the CALWFC3 data reduction pipeline. As the data were observed using the SPARS10 readout mode with NSAMP = 15, the first non-destructive readout of the 16 samples of the detector ramp stored in the ima data product was removed, as it has a much shorter readout time than the other readouts. For the remaining readouts, we constructed pairwise differences between consecutive readouts, resulting in 14 signal measurements per detector ramp.
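The bookkeeping of this step (16 stored reads, first read dropped, 14 pairwise differences) can be made explicit with a short sketch. The function name and the ramp values are hypothetical and only illustrate the counting; they are not taken from the CASCADe package.

```python
def pairwise_differences(ramp):
    """Drop the first read of an up-the-ramp sequence and difference the rest."""
    reads = ramp[1:]  # discard the short first readout
    return [later - earlier for earlier, later in zip(reads, reads[1:])]

# Hypothetical cumulative counts for one pixel over 16 non-destructive reads
ramp = [5.0] + [5.0 + 100.0 * i for i in range(1, 16)]
signals = pairwise_differences(ramp)
print(len(signals))
```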
To subtract the background signal from our target signal, we used the background subtraction procedure outlined in Brammer et al. (2015). This entails fitting and subtracting "master sky" images for the zodiacal light and a 1.083 µm emission line from the Earth's upper atmosphere appearing in exposures outside of the Earth's shadow. For details on the exact procedure, we refer to Section 6 of Brammer et al. (2015).
To identify and clean bad pixels, we first masked all pixels with a quality flag other than zero. We then employed an edge-preserving filtering technique similar to Nagao & Matsuyama (1979). We refer to this paper for details on the method. This procedure ensures that the spatial profile of the dispersed light is preserved. As directional filters, we use a series of two-dimensional Gaussian kernels with standard deviations of 0.1 and 3 pixels, respectively, and different orientations. We flag all pixels deviating more than 4 sigma from the mean determined by the filter profile. After the filtering, we have three data products: one where all bad pixels are masked; a cleaned data set where all bad pixels are replaced by a mean value determined by the optimal filter for those pixels, which we will use to determine the relative telescope movements; and a smoothed data product on which we base our spectral extraction profile.
We use the image of the target with the F126N filter taken at the start of the first orbit to determine the initial wavelength solution and position of the time series of spectral images taken with the G141 grism. We implemented the method outlined in Pirzkal et al. (2016) to determine the wavelength calibration and spectral trace from the acquisition image, using the G141.F126N.V4.32.conf configuration file for the correct polynomial solution. The initial wavelength and trace solution is expected to have a 0.1 pixel accuracy for the trace description and better than 0.5 pixel (∼ 20 Å) accuracy for the overall wavelength solution. As our observations are performed using the spatial-scan mode, and thus move back and forth around the initial detector position of the target system, we determined the center-of-light position of the first signal measurement to determine the absolute offset to the initial trace position. We then employ a cross-correlation technique in Fourier space, using the register_translation and warp_polar functions in the scikit-image package (van der Walt et al. 2014), version 0.16.2, to determine the relative positional shifts (∆x, ∆y), rotation (∆Ω) and scale changes of all subsequent exposures. The average change per scan can be seen in Figure 4. We use these relative movements and rotations to update our wavelength solution for the spectral images at each time step.
We then optimally extract (see Horne 1986) the one-dimensional spectra from the spectral images, using the normalized, filtered and smoothed spectral images as the extraction profile for each time step. After the spectral extraction step, we rebin the spectra to a uniform wavelength grid using the method outlined in Carnall (2017) to determine the corresponding flux values and error estimates, after which we create an average spectrum per spectral scan. The resulting white light curve of the uncalibrated time series data can be seen in the top panel of Figure 5.
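The core of optimal extraction in the sense of Horne (1986) is a profile- and variance-weighted sum per wavelength column. The sketch below shows that weighting for a single column; it omits cosmic-ray masking and profile fitting, and the function name, profile, and noise model are illustrative assumptions rather than the CASCADe implementation.

```python
def optimal_extract(data, profile, variance):
    """Profile-weighted flux estimate and its variance for one wavelength column.

    data: background-subtracted pixel values along the spatial direction
    profile: normalized spatial profile (sums to 1)
    variance: per-pixel variance estimates
    """
    numerator = sum(p * d / v for p, d, v in zip(profile, data, variance))
    denominator = sum(p * p / v for p, v in zip(profile, variance))
    return numerator / denominator, 1.0 / denominator

# Hypothetical noiseless column: flux of 1000 spread over a 5-pixel profile
profile = [0.1, 0.2, 0.4, 0.2, 0.1]
true_flux = 1000.0
data = [true_flux * p for p in profile]
variance = [d + 25.0 for d in data]  # photon noise plus an assumed read-noise term
flux, flux_var = optimal_extract(data, profile, variance)
```

For noiseless data that follow the profile exactly, the estimator returns the input flux regardless of the variances; the weights only matter for how noise in individual pixels propagates into the extracted spectrum.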
To calibrate the extracted spectral time-series data and to extract the transmission spectrum of WASP-117b, we make use of the half-sibling-regression methodology developed by Schölkopf et al. (2016), which uses causal connections within the dataset to model both the transit signal and any systematics, and which has been successfully applied to transit observations from the Kepler mission and to field-stabilized imaging data (Samland et al. 2020). For a detailed discussion and the mathematical proof of this method, we refer to Schölkopf et al. (2016). Here we briefly discuss the main concepts and our implementation of the method for the HST spectroscopic data. Let us assume we have a time series of (spectral) image data $D_{i,k} \in \mathbb{R}^{M \times T}$, where $M$ is the number of detector pixels and $T$ the number of discrete measurement times in the time series. The main idea is that all data $D_{i,k}$, as a function of detector position $x_i$ at time $t_k$, can be described by an additive noise model:
$$D_{i,k} = S(x_i, t_k) + g(x_i, t_k, N) + \epsilon(x_i, t_k),$$
where $S$ is the (transit) signal contribution we wish to measure, $g$ denotes the functional form of the systematics affecting this measurement, and $\epsilon$ is the stochastic noise (e.g., photon noise). $N$ here is a stand-in for all hidden parameters that may be responsible for causing the systematics. We can now split the dataset into two subsets:
$$D = P_s \cup P_c, \qquad P_s \cap P_c = \emptyset,$$
where the set $P_s$ consists of all detector data which "see" the signal $S$ and $P_c$ of all other detector data. In practice, one assigns to $P_c$ those detector data which are far enough removed from the position of the source $S$ on the detector, typically a few times the full width at half maximum of the point spread function. Further, as some of the systematics are unique to point sources (e.g. intrapixel sensitivity variations), one limits $P_c$ to those pixels seeing other point sources in the case of multi-object observations (see also Wang et al. 2016). An estimate of the signal $S$ can be made as follows:
$$\hat{S}_{i,k} = D_{i,k} - E\left[D_{i,k} \mid D_{j,k}, \Theta_k\right], \qquad D_{i,k} \in P_s, \; D_{j,k} \in P_c,$$
where $E$ is the expectation value, i.e. the regression model for the dataset $D_{i,k}$ given the other data $D_{j,k}$ and any auxiliary data $\Theta_k$, such as the derived telescope movement seen in Figure 4, correlated with the systematics.

After data reduction, as described in Sections 2.1.1 and 2.1.2, we derived two transmission spectra (Figure 6). We excluded points outside of the wavelength range 1.125-1.65 µm because the instrument transmission has steeper gradients in those regions compared to the centre of the spectrum. Jitter in the wavelength direction, combined with the strong variation of the transmission profile, can introduce large systematic errors. Since a strong methane absorption feature covered by the WFC3/G141 grism is centered at 1.62 µm, where H$_2$O has no absorption, we can still make judgements about the presence or absence of methane based on a spectrum covering 1.125-1.65 µm. This wavelength range was used for our further analysis.
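The half-sibling regression idea used by the CASCADe pipeline in this section (predict the target data from control data that share the systematics but not the transit, then subtract the prediction) can be illustrated with a deliberately small toy example. The sketch uses a single control time series and ordinary least squares with an intercept as a stand-in for the regression model; the real pipeline uses many regressors and auxiliary data, and all names and numbers here are hypothetical.

```python
def half_sibling_residual(target, regressor):
    """Subtract the least-squares prediction of `target` from `regressor`.

    The regression (with intercept) stands in for E[D_s | D_c]; what is left
    is the part of the target that the control data cannot explain.
    """
    n = len(target)
    mean_x = sum(regressor) / n
    mean_y = sum(target) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, target))
    var_x = sum((x - mean_x) ** 2 for x in regressor)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(regressor, target)]

# Toy data: a box-shaped transit plus a systematic shared with the control pixel
depth = 0.01
in_transit = [False, False, True, True, True, True, False, False]
hidden = [1, -1, 1, -1, 1, -1, 1, -1]  # hidden systematic N
target = [(-depth if t else 0.0) + 3.0 * n for t, n in zip(in_transit, hidden)]
control = [2.0 * n for n in hidden]    # sees N but not the transit

residual = half_sibling_residual(target, control)
inside = [r for r, t in zip(residual, in_transit) if t]
outside = [r for r, t in zip(residual, in_transit) if not t]
recovered_depth = sum(outside) / len(outside) - sum(inside) / len(inside)
```

In this toy case the systematic is constructed to be uncorrelated with the transit, so the depth is recovered exactly (up to a constant offset in the residual). With real data any correlation between signal and regressors biases the estimate, which is why the control data are chosen far from the source on the detector.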
The WASP-117b HST/WFC3 data indicate the presence of a muted water absorption spectrum with a clear peak in absorption centered at 1.45 µm and potentially a second peak towards shorter wavelengths at the edge of the spectral window. There is no strong CH$_4$ absorption feature towards the longest wavelengths in the observed spectral window. The feature is present in both spectra (nominal and CASCADE), which were derived with two independent reduction pipelines from the same data set. The nominal spectrum also appears slightly shifted towards deeper transit depths compared to the CASCADE spectrum; this shift is, however, not significant (Table 1).
We will use both data pipelines for a more robust interpretation of a single-epoch transit observation with HST/WFC3. In the following, we will mainly use results derived from the well-established nominal pipeline for further analysis, e.g. with VLT/ESPRESSO (Section 2.2) and TESS data (Section 3.6).

VLT/ESPRESSO observations
To also probe gas phase absorption of Na and K in the atmosphere of WASP-117b, we observed one transit of the planet on 24/25 October 2018 under the ESO program 0102.C-0347 with the ultra-stable fibre-fed echelle high-resolution spectrograph ESPRESSO (Pepe et al. 2010) mounted on the VLT. The observation was performed with the 1-UT high-resolution and fast read-out mode. We set the exposure time as 300 s and pointed fiber B on sky. We observed the target continuously from 23:56 UT to 09:28 UT and obtained 98 spectra in total. These spectra have a resolution of R ∼ 140 000 and a wavelength coverage of 380-788 nm.

Data reduction
We reduced the raw spectral images with the ESPRESSO data reduction pipeline (version 2.0.0). The pipeline produced sky-background-corrected spectra by subtracting the sky spectra measured with fiber B from the target spectra measured with fiber A. The pipeline also calculated the radial velocity (RV) of the stellar spectrum with the cross-correlation technique. We discarded six spectra that have relatively low signal-to-noise ratios. Among the remaining 92 spectra, 59 were observed during transit and 33 during the out-of-transit phase.

Transmission spectra of sodium and potassium
To obtain the planetary transmission spectra of sodium (Na) and potassium (K), we implemented the following procedures.
(1) Removal of telluric lines. There are telluric Na emission lines around the sodium doublet, and these emission lines were corrected using the fiber B spectra. We further corrected the telluric H$_2$O and O$_2$ absorption lines by employing a theoretical H$_2$O and O$_2$ transmission model. The corrections were performed in the Earth's rest frame.
(2) Removal of stellar lines. In order to remove the stellar lines, we first aligned all the spectra in the stellar rest frame by correcting for the barycentric Earth radial velocity (BERV) and the stellar systemic velocity. We then generated a master spectrum by averaging all the out-of-transit spectra and divided each observed spectrum by this master spectrum. The residual spectra were then filtered with a Gaussian function (σ ∼ 3 Å) to remove large-scale features.
(3) Correction of the CLV and RM effects. During the planet transit, the observed stellar line profile has variations originating from several effects. The Rossiter-McLaughlin (RM) effect (Queloz et al. 2000) and the center-to-limb variation (CLV) effect (Czesla et al. 2015; Yan et al. 2017) are the two main effects. We followed the method described in Yan & Henning (2018) and Yan et al. (2019) to model the RM and CLV effects simultaneously. The stellar spectrum was modelled with the Spectroscopy Made Easy tool (Piskunov & Valenti 2017) and the Kurucz ATLAS12 model (Kurucz 1993). We used the stellar parameters from Lendl et al. (2014), except for the v sin i and λ values, which are taken from the RM fit of the ESPRESSO RVs. The simulated line profile change due to the CLV and RM effects is weak and below the errors of the observed data. We subsequently corrected the obtained residual spectra for the CLV and RM effects.
(4) Obtaining the transmission spectrum. We shifted all the in-transit residual spectra to the planetary rest frame and added up all these shifted spectra. Because of the high orbital eccentricity, the planetary orbital velocity changes from +10 km s$^{-1}$ to +20 km s$^{-1}$ during transit. Therefore, in the planetary rest frame, the position of the stellar Na/K line is ∼ +15 km s$^{-1}$ away from the expected planetary signal.
We investigated the Na D doublet lines (5891.584 Å and 5897.555 Å) and the K D$_1$ line (7701.084 Å). The K D$_2$ line (7667.009 Å) is heavily affected by the dense O$_2$ lines and is therefore not used in the analysis. The listed wavelengths are in vacuum. The final transmission spectra are presented in Fig. 7. There are no strong absorption features (dips) at the expected wavelengths of the planetary Na/K lines within 3σ (indicated as dashed lines in the figure).
In the wavelength regions where the stellar Na lines are located (i.e. ∼ 0.3 Å away from the expected planetary signal), there are some spectral features. However, we attribute these features to the large errors of these points, because the flux inside the deep stellar Na line is significantly lower than in the adjacent continuum.
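The quoted offset of about 0.3 Å between the stellar Na line and the expected planetary signal follows directly from the ∼ 15 km s$^{-1}$ velocity separation. A minimal check, using the non-relativistic Doppler formula (the function name is illustrative):

```python
C_KM_S = 299792.458  # speed of light in km/s

def doppler_shift(wavelength, velocity_km_s):
    """Non-relativistic wavelength shift for a given radial velocity."""
    return wavelength * velocity_km_s / C_KM_S

NA_D2 = 5891.584  # vacuum wavelength in Angstrom, as listed in the text
shift = doppler_shift(NA_D2, 15.0)
print(f"{shift:.2f} Angstrom")
```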
The planetary Na and K lines in the VLT/ESPRESSO data are below 3σ significance. Thus, we decided not to perform atmospheric retrieval on the VLT/ESPRESSO data alone. Instead, we used the statistically significant atmosphere signal in the HST/WFC3 data in conjunction with VLT/ESPRESSO to strengthen the case that planetary Na and K lines are indeed present in the VLT/ESPRESSO data. This analysis will be presented in Section 3.1.

WASP-117b atmospheric properties and improved planetary parameters
In this section, we first explore the significance of the water detection in the HST/WFC3 and the possible existence of weak Na and K in the VLT/ESPRESSO data. We then perform atmospheric retrieval with different models to interpret the atmospheric properties of the exo-Saturn WASP-117b.

Significance of water and Na/K detection
To identify the significance of the H$_2$O detection in the HST/WFC3 spectrum, we follow the approach of Benneke & Seager (2013) and run a number of forward models of similar complexity against each other, using petitRADTRANS (Mollière et al. 2019). In this model, we assume an isothermal temperature and a grey uniform cloud. Furthermore, we included opacities of the following absorbers: H$_2$O, CH$_4$, N$_2$, CO and CO$_2$. The mass fractions of the absorbers are free parameters with priors in log space ranging from -10 to 0. For the remaining atmospheric mass, a mixture of H$_2$ and He is assumed with a ratio of 3:1. Furthermore, we use the reference atmospheric pressure $P_0$ to reproduce the apparent size of the planet in the WFC3 wavelength range.
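The way the absorber mass fractions and the H$_2$/He filler combine can be sketched in a few lines. The sketch assumes that the 3:1 ratio is a mass ratio and that the filler takes up whatever mass the absorbers leave over; the function name and the sample abundances are hypothetical and this is not the petitRADTRANS interface.

```python
def fill_with_h2_he(absorber_mass_fractions, h2_to_he=3.0):
    """Assign the mass not taken by absorbers to H2 and He at a fixed ratio."""
    remaining = 1.0 - sum(absorber_mass_fractions.values())
    if remaining < 0.0:
        raise ValueError("absorber mass fractions exceed unity")
    h2 = remaining * h2_to_he / (h2_to_he + 1.0)
    he = remaining / (h2_to_he + 1.0)
    return {**absorber_mass_fractions, "H2": h2, "He": he}

# Hypothetical log10 mass fractions drawn from the prior range [-10, 0]
log_x = {"H2O": -3.0, "CH4": -7.0, "N2": -5.0, "CO": -4.0, "CO2": -6.0}
composition = fill_with_h2_he({k: 10.0 ** v for k, v in log_x.items()})
```

Because each absorber prior extends up to a mass fraction of 1, a sampler can propose combinations that exceed unity; the explicit check makes such draws fail loudly instead of producing a negative filler abundance.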
We quantify the significance of the observed molecular absorption features by using the MultiNest sampling technique, which enables us to quantify and compare model parameters and their significance. MultiNest is implemented within the Python wrapper PyMultiNest (Buchner 2014).
Following Kass & Raftery (1995), we regard B values of 1-3, 3-20, 20-150, and > 150 as 'weak', 'substantial', 'strong', and 'very strong' preference for a given hypothesis, respectively. Table 3 of Benneke & Seager (2013), adopted from Trotta (2008), further allows us to translate B to lower limits on σ values. Table 2 lists the (natural) log evidences (ln Z) and the Bayes factor B for different hypotheses. B is calculated via $B = \exp(\ln Z_{\rm base} - \ln Z)$ from the MultiNest output for each model. Figure 8 shows the full model, including H$_2$O, versus the model without H$_2$O for data derived with both pipelines.
Our statistical analysis yields "strong" Bayesian preference (B > 20) for the detection of water in the WASP-117b observational data for both pipelines. This corresponds to a 3σ detection.
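The conversion from log evidences to a Bayes factor and a preference label can be written down directly from the definitions above. The log-evidence values in this sketch are hypothetical placeholders and not the entries of Table 2; the label for B < 1 is an added convention for completeness.

```python
import math

def bayes_factor(ln_z_base, ln_z_alt):
    """B = exp(ln Z_base - ln Z) for the base model against an alternative."""
    return math.exp(ln_z_base - ln_z_alt)

def preference(b):
    """Preference label for B, following the ranges quoted from Kass & Raftery (1995)."""
    if b < 1.0:
        return "none (alternative favored)"
    if b <= 3.0:
        return "weak"
    if b <= 20.0:
        return "substantial"
    if b <= 150.0:
        return "strong"
    return "very strong"

# Hypothetical log evidences, NOT the values from Table 2
ln_z_full, ln_z_no_h2o = -10.0, -14.0
b = bayes_factor(ln_z_full, ln_z_no_h2o)
print(preference(b))
```

Working with differences of ln Z rather than ratios of Z avoids numerical underflow, since nested-sampling evidences are typically very small numbers.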
The same approach is also used for the combined HST/WFC3 (nominal) and VLT/ESPRESSO spectrum and the possible detection of Na and K. Here, the full model is extended to also include K and Na opacities, again assuming their combined mass fraction to be a free parameter, but fixing the Na/K abundance ratio to the solar value (Asplund et al. 2009). Figure 9 shows the full model, including Na and K, versus the model without Na and K for data derived with the nominal pipeline. The combined analysis of the HST/WFC3 and VLT/ESPRESSO data still yields substantial (B > 3) evidence for the presence of Na and K in the data, which would correspond to ≈ 2.4σ, using Table 3 of Benneke & Seager (2013).
Corner plots of the full set of retrieved parameters for the full retrieval models and those without H$_2$O and without Na/K (B > 3) can be found in the appendix (Figure A.1 and following).
For all spectra, retrieved atmosphere models with confidence within 1σ (dark green) and 2σ (light green) are shown.

Atmospheric retrieval
To constrain the atmospheric parameters of WASP-117b further, we again ran forward atmospheric models using the open-source code petitRADTRANS (Mollière et al. 2019), this time retrieving the planetary metallicity and C/O value directly. Our retrieval setup was applied on only the HST/WFC3 data first, and then on the combination of the HST/WFC3 and ESPRESSO data.
We constructed the retrieval forward model with the following rationale: in principle, retrievals of transmission spectra allow for many properties of the atmosphere to be constrained, such as the average terminator temperature and temperature gradient, abundances of absorbers, as well as the cloud properties such as cloud base position, scale height, average particle sizes and cloudiness fraction of the terminator (e.g., Barstow et al. 2013;Rocchetto et al. 2016;Line & Parmentier 2016;MacDonald & Madhusudhan 2017;Mollière et al. 2019;Barstow 2020). Also differences between morning and evening terminators can likely be constrained or lead to erroneous conclusions, if ignored (MacDonald et al. 2020). The same holds for variations between the day and nightside, probed across the terminator (Caldas et al. 2019;Pluriel et al. 2020). The number of free parameters, that is, the complexity of the retrieval model, needs to be justified by the quality of the data, however. Too complex models should be avoided for observations of low S/N, and in general the number of free parameters should be less than the number of data points. This prevents overfitting of the data. A useful criterion to judge whether one model is better than another, given the data, is the Bayes factor (Kass & Raftery 1995). The Bayes factor analysis will punish those models which are too complex, given the data. However, a Bayes factor analysis should not be applied blindly.
A simple model with few parameters that gives a good fit to the data will always be favored, even if the model assumptions are highly unphysical. The caveat in using a Bayes factor analysis is that Bayes factors do not know physics.
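As a minimal sketch of how a Bayes factor follows from two log-evidences, the snippet below exponentiates the difference in ln Z and attaches a qualitative label. The thresholds follow the scale tabulated in Kass & Raftery (1995); the function names and the two log-evidence values are hypothetical and are not taken from the tables of this work.

```python
import math


def bayes_factor(ln_z_a: float, ln_z_b: float) -> float:
    """Bayes factor B = Z_a / Z_b computed from natural-log evidences."""
    return math.exp(ln_z_a - ln_z_b)


def evidence_label(b: float) -> str:
    """Qualitative label on the scale tabulated in Kass & Raftery (1995)."""
    if b < 1.0:
        return "favors the second model"
    if b < 10.0 ** 0.5:
        return "not worth more than a bare mention"
    if b < 10.0:
        return "substantial"
    if b < 100.0:
        return "strong"
    return "decisive"


# Hypothetical log-evidences, for illustration only.
ln_z_model_a = -12.0
ln_z_model_b = -14.0

b = bayes_factor(ln_z_model_a, ln_z_model_b)
print(f"B = {b:.2f} ({evidence_label(b)})")
```

The label only ranks models against each other given the data; as noted above, it says nothing about whether the favored model is physically plausible.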
Given the limited quality of our data, we decided to keep the complexity of the retrieval 1D forward model low, with a limited number of free parameters (see Table 3). More specifically, we assumed an isothermal temperature structure, whereas the absorber abundances are here not kept as free parameters but are modelled with a chemical equilibrium model, using the chemistry code described in the appendix of Mollière et al. (2017). Our abundance treatment for WASP-117b is thus analogous to Kreidberg et al. (2018) for WASP-107b. WASP-107b could be similar in temperature and atmosphere chemistry to WASP-117b during transit.
While petitRADTRANS offers a wide range of different cloud parameterizations, we decided to only retrieve a gray cloud deck pressure and, optionally, a scattering-slope opacity to model small-particle hazes. The haze opacity is parameterized in the retrieval model as

$$\kappa(\lambda) = \kappa_0 \left(\frac{\lambda}{\lambda_0}\right)^{\gamma},$$

where $\gamma$ determines the steepness of the scattering slope, which we constrain to $\gamma > -6$ to avoid unphysically steep slopes (Pinhas & Madhusudhan 2017;Barstow 2020). The parameters $\kappa_0$ and $\lambda_0$ are the reference opacity and wavelength, where the first is freely retrieved and the latter is set to $\lambda_0 = 0.35$ µm. Assuming a 1D temperature structure for transmission spectroscopy during transit could lead to underestimating the atmospheric temperature at the limbs, as indicated by MacDonald et al. (2020). In addition, patchy clouds could mimic situations of high enrichment (Line & Parmentier 2016), which is another possible limitation to keep in mind.

Fig. 9. Full retrieval model fit to the WASP-117b transmission spectrum in WFC3 and VLT/ESPRESSO with the nominal pipeline with Na and K. The WFC3 data are shown in the top panel, the middle panel shows the Na doublet and the bottom panel the K D$_1$ line. The retrieval model without Na and K (not shown) is a flat line in the optical and indistinguishable from the full model without Na and K in the WFC3 NIR range. For all spectra, retrieved atmosphere models with confidence within 1σ (dark green) and 2σ (light green) are shown.
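The haze parameterization described above can be sketched as a simple power law in wavelength, assuming the usual form κ(λ) = κ0 (λ/λ0)^γ with λ0 = 0.35 µm. The function name, the example values of κ0 and γ, and the opacity units are illustrative assumptions; this is not the petitRADTRANS implementation.

```python
def haze_opacity(wavelength_um: float, kappa0: float, gamma: float,
                 lambda0_um: float = 0.35) -> float:
    """Power-law scattering opacity kappa0 * (lambda / lambda0)**gamma.

    gamma <= -6 is rejected to mirror the prior bound quoted in the text.
    """
    if gamma <= -6.0:
        raise ValueError("gamma must be > -6")
    return kappa0 * (wavelength_um / lambda0_um) ** gamma


# Illustrative values only: a Rayleigh-like slope (gamma = -4) and an
# arbitrary reference opacity in arbitrary units.
for wl in (0.35, 0.7, 1.4):
    print(f"{wl:4.2f} um -> {haze_opacity(wl, kappa0=1.0, gamma=-4.0):.4f}")
```

With γ = −4 the opacity drops by a factor of 16 for every doubling in wavelength, which is why such a haze mainly affects the optical and only weakly the WFC3 near-infrared range.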

HST/WFC3 Atmosphere retrieval
In the following, we will discuss constraints on the atmospheric composition of WASP-117b, based on atmospheric retrieval of the HST/WFC3 transmission spectrum using the nominal (Section 2.1.1) and the CASCADE pipeline (Section 2.1.2), respectively.
We employed several models with physically motivated priors to constrain the atmospheric properties of WASP-117b. The models and their priors are listed in Table 3. Model 1 and Model 2 represent atmospheric models with isothermal temperature, equilibrium chemistry and a gray cloud deck. Model 1 sets only weak constraints on metallicity, with a prior of [Fe/H] within [−1.5, +3], whereas Model 2 restricts the prior to lower metallicities, [Fe/H] < 1.75. Model 3 assumes the same parameters and priors as Model 2 and additionally adopts a haze layer.

[...] Table 4. The retrieval based on the nominal spectrum yields a relatively high temperature of $1129^{+228}_{-289}$ K, whereas the retrieval based on the CASCADE spectrum indicates a rather low temperature of $821^{+264}_{-226}$ K. The discrepancy in temperature can be explained by differences in data reduction, which lead to an overall deeper transit depth in the nominal spectrum, that is, a more inflated atmosphere, which the retrieval can achieve with higher temperatures than for the CASCADE spectrum. However, both temperatures are within 1σ of each other, which again indicates that this shift is not significant. We thus conclude that the atmospheric temperature is not well constrained with our data, at least when using Model 1.
Retrieval with Model 1 further suggests very large metallicities, [Fe/H] > 2, based on both spectra. At the same time, a cloud top located relatively deep in the atmosphere ($p > 10^{-4}$ bar) is inferred, and the reference pressure $P_0$ is set high in the atmosphere ($P_0 < 10^{-5.5}$ bar) to reconcile the relatively large transit depth of this inflated exo-Saturn with the high metallicities. The high metallicity solutions thus indicate relatively clear skies with very small atmospheric scale heights. These results indicate that the water feature is strongly muted and that condensate cloud modelling alone cannot properly account for the shape of the water signal. Instead, the model assumes a high mean molecular weight with more than 100× solar metallicity, irrespective of which data reduction pipeline is used.
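The link between high mean molecular weight and small scale height can be illustrated with the standard isothermal expression H = k_B T / (μ m_u g). The sketch below is only an order-of-magnitude illustration: the temperature, the surface gravity and the two mean molecular weights are assumed placeholder values, not parameters retrieved in this work.

```python
K_B = 1.380649e-23        # Boltzmann constant in J/K
M_U = 1.66053906660e-27   # atomic mass unit in kg


def scale_height_km(temperature_k: float, mu: float, gravity_ms2: float) -> float:
    """Isothermal pressure scale height H = k_B T / (mu m_u g), in km."""
    return K_B * temperature_k / (mu * M_U * gravity_ms2) / 1e3


# Assumed illustrative inputs (hypothetical, not from the retrieval tables).
T_ATM = 1000.0   # K
GRAVITY = 6.5    # m/s^2
MU_LIGHT = 2.3   # H2/He-dominated gas
MU_HEAVY = 4.6   # strongly metal-enriched gas (placeholder value)

h_light = scale_height_km(T_ATM, MU_LIGHT, GRAVITY)
h_heavy = scale_height_km(T_ATM, MU_HEAVY, GRAVITY)
print(f"H(mu={MU_LIGHT}) = {h_light:.0f} km, H(mu={MU_HEAVY}) = {h_heavy:.0f} km")
```

Because the amplitude of spectral features scales with H, doubling μ halves the expected feature size, which is how a metal-rich, cloud-free atmosphere can mimic the muting otherwise attributed to clouds.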
Furthermore, retrievals based on both the nominal and CASCADE spectra tend to yield subsolar C/O values, with upper limits of 0.42 and 0.51, respectively. As a consequence, the CH$_4$ content at $p = 10^{-4}$ bar is retrieved within 1σ to be below $1.1 \times 10^{-4}$ volume fraction when using the CASCADE spectrum, and even below $5.7 \times 10^{-10}$ volume fraction when using the nominal pipeline. However, we note that the 3σ range is very large (see Figure A.15), such that we cannot claim strong significance for these low CH$_4$ abundances.
Similarly low C/O values were also found for WASP-107b by Kreidberg et al. (2018). These authors likewise attribute the low C/O values to the absence of CH$_4$ within the WFC3/G141 wavelength range, which is consistent with our retrieved low abundances of CH$_4$. If WASP-117b is colder than 800 K during transit and thus in a similar temperature range to WASP-107b, then the very low abundance of CH$_4$ could indicate disequilibrium chemistry. We explore this possibility in Section 4.
We further note that we find a tail of solutions with lower metallicity and high cloudiness in the retrieved parameters based on Model 1 fitted to both the nominal and the CASCADE spectrum (see Figures A.9 and A.10 in the [Fe/H] − $P_{\rm cloud}$ corners, indicated with a black ellipse). We will thus explore lower metallicity solutions more carefully in the next sections.

[...] Zhou et al. (2017). Right: Model 2 (prior [Fe/H] < 1.75 and grey cloud) fit to the WASP-117b transmission spectrum reduced with the data pipeline CASCADE. For both spectra, retrieved atmosphere models with confidence within 1σ (dark green) and 2σ (light green) are shown. These models were derived with petitRADTRANS (Mollière et al. 2019) and indicate a cloudy atmosphere with a muted water feature.

Model 2 - condensate clouds and [Fe/H] < 1.75
We designed Model 2 with a constraint on the allowed metallicities that is as agnostic as possible, in order to select the tail of cloudy, low-metallicity solutions of Model 1. We again adopted the simplest (uniform) cloud model for the retrieval, because the information content of the muted water spectrum observed with HST/WFC3 is too limited to warrant a more complex cloud model with, e.g., patchy clouds as proposed by Line & Parmentier (2016).
We set an upper limit of 56× solar metallicity, that is, we imposed a prior of [Fe/H] < 1.75. Figure 11 displays the best fit of Model 2 to the WASP-117b HST data, reduced with the nominal and CASCADE pipelines, respectively. The detailed corner plots with the retrieved properties based on Model 2 are displayed in Figures A.11, A.12 and A.16. Table 4 lists the retrieval results concisely.
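The correspondence between the logarithmic metallicity [Fe/H] and a multiple of the solar value is simply a power of ten, which the short sketch below makes explicit for the prior bounds and the retrieved value quoted in this work. The helper name is hypothetical.

```python
def feh_to_solar_multiple(feh: float) -> float:
    """Convert a logarithmic metallicity [Fe/H] into a multiple of solar."""
    return 10.0 ** feh


# Values quoted in the text: the Model 2 prior bound, the [Fe/H] > 2 regime
# of Model 1, the retrieved median of the abstract, and the Model 1 prior bound.
for label, feh in [("Model 2 upper prior bound", 1.75),
                   ("Model 1 high-metallicity regime", 2.0),
                   ("retrieved median (abstract)", 2.58),
                   ("Model 1 upper prior bound", 3.0)]:
    print(f"{label}: [Fe/H] = {feh} -> {feh_to_solar_multiple(feh):.0f}x solar")
```

This reproduces the 56× solar limit stated above and shows that the retrieved median of Model 1 corresponds to a few hundred times solar metallicity.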
We note that the retrieved atmospheric metallicity for WASP-117b is not well constrained for Model 2, with a tendency to still favor solutions at the higher end of the imposed prior. On the other hand, no matter which data reduction pipeline is used, nominal or CASCADE, we retrieve solutions with a better constrained cloud deck located higher in the atmosphere compared to Model 1. With Model 2, the cloud deck is retrieved to lie between $p = 10^{-5.1}$ and $p = 10^{-2.2}$ bar. Since we excluded higher metallicities that reduce the scale height to very low values, the reference pressure $P_0$ is deeper in the atmosphere compared to Model 1. In other words, in Model 2 cloudy solutions are favored to explain the muted water feature.
Sub-solar C/O values that indicate low abundances of CH$_4$ are also favored by Model 2. However, we report a division between possible solutions within our constrained metallicity range. There is a tail of high C/O > 1 solutions (Figures A.11 and A.12), which are correlated in the posterior of our retrieval with low metallicities ([Fe/H] < −0.5), whereas high metallicity solutions ([Fe/H] > 1) still strongly favor solar to sub-solar C/O ratios in Model 2 as well.
The low C/O ratios appear to correspond to low CH$_4$ abundances, where we retrieved within 1σ CH$_4$ abundances below $6.7 \times 10^{-4}$ volume fraction. Also for Model 2, the 3σ envelope of CH$_4$ abundances is very large (Figure A.16).

Another possibility, which may explain the muted observed water spectrum, could be the existence of a haze layer on top of the cloudy atmosphere. This possibility was explored with Model 3. Figure 12 displays the best fit of Model 3 to the WASP-117b HST data, reduced with the nominal and CASCADE pipelines, respectively. The detailed corner plots with the retrieved properties based on Model 3 are displayed in Figures A.13, A.14, and A.17. A summary of the parameters retrieved with this model is given in Table 4. For the CASCADE spectrum, Model 3 yields a bi-modal posterior distribution for the haze opacity, $\kappa_0 = -1.99^{+2.84}_{-5.22}$, which indicates that some of the models are hazy with a moderate scattering slope of $\gamma = -2.61^{+1.59}_{-2.09}$ (Figure 12, right). High values of $\kappa_0$ also correspond to a gray cloud deck that can lie deeper in the atmosphere compared to Model 2. For some models within the Model 3 framework, the water feature is muted by the combined effect of condensate clouds and the haze layer.
For the nominal spectrum, however, Model 3 is clearly a poorer fit (Figure 12, left). Comparison between the corner plots Figure [...] layer properties $\kappa_0$ and $\gamma$ in the nominal spectrum are more uniformly spread across the allowed parameter range compared to the CASCADE spectrum. Generally, however, the parameters retrieved with Model 3 based on the nominal spectrum are well within 1σ of the parameters retrieved based on the CASCADE spectrum. The subset of hazy solutions identified in the posterior of the Model 3 retrieval based on the CASCADE spectrum is thus not significant enough to indicate a strong disagreement between the data reduction pipelines used in this work.
As in Model 2, the metallicity is unconstrained in Model 3 within the imposed low-metallicity limits of [Fe/H] = [−1.5, 1.75], irrespective of which pipeline is used. The temperature is again not well constrained in Model 3, as can be seen when comparing Figure A.14 and Figure A.12.
In Model 3, we again identify a tail of high C/O ratio solutions, as in Model 2. Most solutions, especially those with [Fe/H] > 1, however, favor subsolar C/O ratios. As noted in previous subsections, low C/O values are indicative of low abundances of methane. This is reflected again by retrieved CH$_4$ abundances smaller than $10^{-4}$ volume fraction within 1σ for both data reduction cases.
The significance of the molecular absorption feature as well as that of a haze layer is investigated in the following.

Significance of differences in metallicity and haze layer
To compare the different retrieval models quantitatively, we now compare Models 1, 2 and 3 with each other, using the same method as in Section 3.1.
In Table 5, we again list the natural log evidences (ln Z) and the Bayes factors B. According to this statistical analysis, Model 1, a model with high atmospheric metallicity ([Fe/H] ≥ 2) and clear skies, fits the data better than the lower-metallicity, cloudy Models 2 and 3, where the latter also assumes a haze layer on top. This is true for both pipelines.
The Bayesian preference is, however, larger for the CASCADE spectrum than for the nominal spectrum. Based on data reduced with the CASCADE pipeline, the preference is "substantial" to "strong". Based on data reduced with the nominal pipeline, the preference is still "substantial", that is, 1 − 2σ.
Theoretical predictions of  indicate that up to 200× solar metallicity are in principle possible for Saturn-mass exoplanets like WASP-117b. We again point out the possible degeneracy of high metallicity with patchy cloud solutions (Line & Parmentier 2016).
Clearly, better and additional data e.g. in the optical range are needed for further constraints of cloud and haze coverage of WASP-117b and thus atmospheric metallicity. We al-  2) to support the HST/WFC3 observation. In addition, we can analyze WASP-117b transit data obtained by the TESS satellite (Section 3.6).

Rossiter-McLaughlin effect measured with VLT/ESPRESSO
From our high resolution ESPRESSO data we could also improve the constraints on the stellar rotation and spin-orbit alignment of WASP-117b via the Rossiter-McLaughlin (RM) effect. The RM effect of the system was initially measured by Lendl et al. (2014) using HARPS observations of one transit. We fitted the RM effect using our ESPRESSO data to update the obliquity parameters. We applied a Markov chain Monte Carlo (MCMC) approach using the emcee tool (Foreman-Mackey et al. 2013). The RM effect was modelled with the Python code RmcLell from the PyAstronomy library$^{4}$ (Czesla et al. 2019). The projected stellar rotation velocity ($v \sin i$), projected spin-orbit angle ($\lambda$), and systemic velocity ($V_{\rm sys}$) were set as free parameters, and we fixed the other parameters to the values in Lendl et al. (2014). The RV curve together with the best-fit model is presented in Fig. 13. The retrieved parameters are listed in Table 6. We did not use these updated parameters in the analyses for the rest of this work, because the original parameters by Lendl et al. (2014) were already sufficient to reduce the HST/WFC3 and TESS data before the updated VLT/ESPRESSO data were available to us. Still, we present the updated RM values here for completeness and for inclusion in future work.

$^{4}$ https://github.com/sczesla/PyAstronomy

TESS transit data
We further found that the Transiting Exoplanet Survey Satellite (TESS) acquired observations of WASP-117 between August 26 2018 and October 13 2018 during Sectors 2 and 3 of its primary mission, covering four consecutive transits of WASP-117b$^{5}$. These transit data are more accurate than the transit depth measured from the ground as reported in the discovery paper (Lendl et al. 2014) from the WASP South survey and could potentially put further constraints on the cloud properties of WASP-117b.
Interestingly, the reported TESS transit depth of 7200 ± 150 ppm was found to be very shallow (albeit within 3σ) compared to the value reported by Lendl et al. (2014) ($8000^{+550}_{-480}$ ppm) and to our HST/WFC3 transit depth (7563 ± 43 ppm, Table 1). We thus decided to perform an independent analysis on both the published TESS photometry and the target-pixel files (TPFs) of this target, to verify this slight discrepancy. For the former, we used the PDC lightcurve published at MAST$^{6}$, which we fitted using juliet. As priors for this fit, we used the eccentricity and argument of periastron reported by Lendl et al. (2014) (e = 0.302 ± 0.023, ω = 242 ± 2.7 deg). To account for the systematic trends in the data, we used a Gaussian Process (GP) with a Matérn 3/2 kernel, whose parameters had wide priors: the time-scale of the process had a prior between $10^{-5}$ and 1000 days, whereas the square root of the variance of the process had a prior between $10^{-3}$ and 100 ppm. Independent GPs were used for each sector. Each sector also had an added white-noise component, modelled as Gaussian noise added to the GP with a wide prior on the square root of its variance between 0.01 and 1000 ppm, and an independent out-of-transit flux offset with a Gaussian prior centered on 0 with a standard deviation of $10^{5}$ ppm. For the transit depth and impact parameter we used the uninformative sampling scheme proposed in Espinoza (2018), which samples the entire range of physically plausible values for those parameters. A wide log-uniform prior between 100 and 10,000 kg/m$^3$ was defined for the stellar density, and the efficient sampling scheme of Kipping (2013) was used to model the limb-darkening effect through a quadratic law, where the coefficients were left as free parameters of the fit.
Finally, relatively wide priors were defined for the ephemerides of the orbit: a normal distribution centered around 10 with a standard deviation of 0.1 days for the period, and a normal distribution centered around 2458357.6 with a standard deviation of 0.1 days for the time-of-transit center. No dilution contamination was applied in this fit, as this is corrected in the PDC photometry.
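For readers unfamiliar with the Matérn 3/2 kernel named above, the following sketch evaluates its standard covariance, k(τ) = σ² (1 + √3 |τ|/ρ) exp(−√3 |τ|/ρ), for a few time lags. The amplitude and time-scale values are arbitrary illustrations within the quoted prior ranges; the function name is hypothetical and this is not the juliet implementation.

```python
import math


def matern32(tau_days: float, sigma_ppm: float, rho_days: float) -> float:
    """Matern 3/2 covariance (in ppm^2) between two points tau_days apart."""
    x = math.sqrt(3.0) * abs(tau_days) / rho_days
    return sigma_ppm ** 2 * (1.0 + x) * math.exp(-x)


# Illustrative hyperparameters (assumed, not fitted values).
SIGMA_PPM = 50.0   # square root of the process variance
RHO_DAYS = 1.0     # time-scale of the process

for lag in (0.0, 0.5, 1.0, 5.0):
    cov = matern32(lag, SIGMA_PPM, RHO_DAYS)
    print(f"lag = {lag:3.1f} d -> covariance = {cov:8.2f} ppm^2")
```

The covariance equals the process variance at zero lag and decays smoothly on the time-scale ρ, which lets the GP absorb slow systematic trends without fitting the sharp transit ingress and egress.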
Our juliet analysis of the PDC lightcurve yields a transit depth within 1σ of the transit depth reported on ExoFOP. The transit depths from WASP, from our TESS analysis, and the ExoFOP TESS depth are listed in Table 7. Next, we tested whether the PDC algorithm might be diluting the transit signal due to, e.g., too strong detrending. To this end, we used the TPFs available at MAST to extract our own photometry for this target. We used apertures consisting of 1, 2 and 3 pixels around the brightest pixel, which corresponds to apertures of about 30", 40" and 50" around the target. To detrend the data we used Pixel-Level Decorrelation (PLD; Deming et al. 2015), which was initially developed for Spitzer but has been successfully applied to photometric data from missions like K2 (Luger et al. 2016). The detrending was performed simultaneously with the transit model defined above, in order to account for all the uncertainties simultaneously on each of the apertures. The results from this fit were in excellent agreement with the transit depths obtained from the PDC algorithm; for example, for our smallest aperture, we obtained a transit depth of 7144 ± 177 ppm. Given that no strong dilution is expected in this smaller aperture (judging from Gaia sources around WASP-117), this analysis gives us confidence that the relatively shallow TESS transit depths are real and not related to dilution or detrending methods. This result implies that the TESS transit depth is shallow due to a physical mechanism: either contamination by the host star WASP-117, or clouds and hazes in the atmosphere of WASP-117b.
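The statement that the TESS depth is shallow but still within 3σ of the other measurements can be checked with a simple estimate. The sketch below assumes independent Gaussian uncertainties added in quadrature, and for the asymmetric WASP value it uses the lower error bar because the TESS depth lies below it; both choices are simplifying assumptions.

```python
import math


def sigma_difference(depth_a: float, err_a: float,
                     depth_b: float, err_b: float) -> float:
    """Difference of two measurements in units of their combined 1-sigma error."""
    return abs(depth_a - depth_b) / math.hypot(err_a, err_b)


# Transit depths in ppm as quoted in the text.
TESS = (7200.0, 150.0)
HST_WFC3 = (7563.0, 43.0)
WASP_LOWER = (8000.0, 480.0)  # lower error bar of the asymmetric uncertainty

n_hst = sigma_difference(*TESS, *HST_WFC3)
n_wasp = sigma_difference(*TESS, *WASP_LOWER)
print(f"TESS vs HST/WFC3: {n_hst:.1f} sigma")
print(f"TESS vs WASP:     {n_wasp:.1f} sigma")
```

Note that the bandpasses differ, so part of any offset may be astrophysical rather than statistical; the estimate only quantifies how far apart the quoted numbers are.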

Stellar activity and clouds as a possible explanation for TESS data discrepancy
Stellar contamination can add systematic offsets to the measured transit depth of an exoplanet like WASP-117b, which can make it difficult to compare observations at different wavelength ranges and taken at different epochs (Rackham et al. 2019).
In the WASP-117b discovery paper, Lendl et al. (2014) did not detect any variability greater than 1.3 mmag with 95% confidence. We independently investigated the variability of WASP-117 using ASAS-SN Sky Patrol$^{7}$ (Shappee et al. 2014;Kochanek et al. 2017), which includes V-band photometric monitoring of WASP-117 for the 2014-2017 observing seasons taken with two cameras, 'bf' and 'bh'. We considered only the data for the 2014 and 2015 observing seasons taken with the 'bf' camera. Data collected with the 'bh' camera, including some of the 2015 season and all later seasons, appear to suffer from a strong instrumental systematic. The RMS scatter of the data included in this analysis is 11.5 mmag, while the mean measurement uncertainty is 4.5 mmag, suggesting that some variability is present. A Lomb-Scargle periodogram analysis (Lomb 1976;Scargle 1982), implemented in astropy$^{8}$, consistently reveals a periodicity with a full amplitude of 9.4 mmag, though the best period depends on the observing seasons included. Considering the ASAS-SN data, we conservatively adopt 9.4 mmag, or 1%, as the reference variability level for WASP-117 in this analysis.
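The conversion from the 9.4 mmag full amplitude to the roughly 1% variability level adopted above follows from the definition of the magnitude scale, Δm = −2.5 log10(F1/F2). A short sketch, with a hypothetical helper name:

```python
def mmag_to_percent(delta_mmag: float) -> float:
    """Relative flux change (in percent) corresponding to a magnitude difference."""
    flux_ratio = 10.0 ** (0.4 * delta_mmag / 1000.0)
    return (flux_ratio - 1.0) * 100.0


amplitude_percent = mmag_to_percent(9.4)
print(f"9.4 mmag corresponds to a flux change of {amplitude_percent:.2f}%")
```

The exact value is slightly below 1%, so rounding up to 1% is consistent with the conservative choice described in the text.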
We modeled the variability of WASP-117 using the method detailed by Rackham et al. (2018), including considerations for FGK dwarfs (Rackham et al. 2019). We set the photosphere temperature to the stellar effective temperature determined by Lendl et al. (2014). We further adopted the stellar surface gravity and metallicity from that work. We determined spot and facula temperatures from the scaling relations detailed by Rackham et al. (2019). We used the spot size and faculae-to-spot ratio from the 'solar-like case' outlined by Rackham et al. (2018). Table 8 summarizes the adopted parameters. Figure 14 illustrates the modeled variability of WASP-117 as a function of spot covering fraction. We find that the adopted [...] $_{-9}$%, respectively, at 1σ confidence. If present outside of the transit chord, these active regions could alter the observed transit depths. Figure 15 illustrates the wavelength-dependent stellar contamination factor $\epsilon_\lambda$ produced by these active region coverages over the HST/WFC3 G141 bandpass. We find that the effect of unocculted spots dominates over that of unocculted faculae, producing a net increase in transit depths ($\epsilon_\lambda > 1$). Over the complete G141 bandpass, we estimate that transit depths can be inflated by $1.4^{+3.1}_{-0.6}$% at 1σ confidence. This inflation is relative to a measurement in the same wavelength range that is completely unaffected by stellar activity. No strong spectral features are apparent in the contamination signal besides a slight increase toward shorter wavelengths: at 1.1 µm, transit depths may be increased by $1.5^{+3.8}_{-0.7}$%, while at 1.7 µm the increase is limited to $1.1^{+2.3}_{-0.5}$%.

$^{7}$ https://asas-sn.osu.edu/
$^{8}$ http://www.astropy.org
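The behaviour of the contamination factor can be illustrated with the expression given by Rackham et al. (2018) for unocculted spots and faculae, ε = 1 / [1 − f_spot (1 − F_spot/F_phot) − f_fac (1 − F_fac/F_phot)]. In the sketch below the flux ratios are approximated with blackbodies, and all temperatures and covering fractions are assumed placeholder values rather than the parameters of Table 8; the actual analysis relies on stellar model spectra.

```python
import math

HC_OVER_K = 1.438776877e-2  # second radiation constant hc/k_B in m K


def planck_ratio(wavelength_um: float, t_region: float, t_phot: float) -> float:
    """Blackbody flux ratio B(t_region) / B(t_phot) at a single wavelength."""
    wavelength_m = wavelength_um * 1e-6
    x_region = HC_OVER_K / (wavelength_m * t_region)
    x_phot = HC_OVER_K / (wavelength_m * t_phot)
    return math.expm1(x_phot) / math.expm1(x_region)


def contamination_factor(f_spot: float, f_fac: float,
                         ratio_spot: float, ratio_fac: float) -> float:
    """Multiplicative change of the transit depth due to unocculted regions."""
    return 1.0 / (1.0 - f_spot * (1.0 - ratio_spot) - f_fac * (1.0 - ratio_fac))


# Assumed placeholder values (hypothetical, for illustration only).
T_PHOT, T_SPOT, T_FAC = 6000.0, 4500.0, 6100.0
F_SPOT, F_FAC = 0.03, 0.20

for wl in (1.1, 1.7):
    eps = contamination_factor(F_SPOT, F_FAC,
                               planck_ratio(wl, T_SPOT, T_PHOT),
                               planck_ratio(wl, T_FAC, T_PHOT))
    print(f"{wl} um: epsilon = {eps:.4f}")
```

Dark spots push ε above 1 and bright faculae pull it below 1, with both effects weakening toward longer wavelengths where the flux contrasts shrink.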
We note that while the scale of $\epsilon_\lambda$ is comparable to the precision of our HST/WFC3 G141 spectrum, the lack of strong spectral features in the contamination spectrum provides reassurance that the spectral features we see in the near-infrared spectrum do not result from stellar contamination. Figure 16 shows a relative increase of the planetary radius in the near infrared (1.1 µm) of between 1% and 4%, together with the TESS transit depths and the HST/WFC3 spectrum. If the TESS data (observed in 9/2018) had been taken at a time when the host star showed low activity, that is, a very low coverage fraction of active regions compared to the time the HST/WFC3 data were taken (observed in 9/2019), then this difference in stellar activity between the two epochs could in principle explain the discrepancy between the optical and the near-infrared spectrum.
We note, however, that the raw HST/WFC3 data (Figure 3) do not display any obvious signs of stellar activity, e.g., in the form of spot-crossing events. Likewise, the VLT/ESPRESSO data (10/2019), taken a few weeks after the HST/WFC3 measurements, do not show strong activity in the stellar Na and K lines. In other words, with the data at hand, there are no indications that the host star was more active in one epoch compared to the other.
Alternatively, if the stellar activity between the two epochs did not differ drastically, unocculted bright regions (faculae) could be invoked to explain decreases in transit depths at visual wavelengths (that is, $\epsilon_\lambda < 1$). We consider this unlikely, however, because the TESS bandpass is redder than the wavelength where we would expect faculae to start dramatically decreasing transit depths, which is typically around 0.5 µm (Rackham et al. 2019, Fig. 5). Furthermore, the scale of the decrease between the TESS and WFC3 bands is too large. The transit depth is shallower by about 4% in the TESS band (Figure 16). We expect, however, less than a 1% decrease in transit depth due to the chromaticity of bright faculae between the TESS and WFC3 G141 bands (Figure 17).
As we will show in Section 5, no atmospheric model fitted to the HST/WFC3 data yields transit depths shallower than 7400 ppm in the optical wavelength range, including the very cloudy (Model 2) and hazy (Model 3) solutions. Without additional measurements, it is thus for now not possible to determine why the TESS transit depth is shallow (7200 ± 150 ppm) compared to the earlier WASP and the most recent HST/WFC3 measurements.

WASP-117b temperatures and chemistry composition
We could not use the TESS data to constrain atmospheric properties of WASP-117b. We further could not constrain the atmospheric temperature of WASP-117b during transit based on the muted water feature observed with HST/WFC3 (Table 4). We still think, however, that it is worthwhile to explore exemplary chemistry models for the atmospheric compositions and temperatures possible for WASP-117b based on the WFC3 data and theoretical reasoning. These physically based constraints a) act as sanity checks for the simplified 1D retrieval, b) still allow us to draw further conclusions and c) inform us about the benefit of future observations. We calculated theoretical planetary equilibrium temperatures $T_{\rm Pl,inst}$, assuming that the planet instantly adjusts to the stellar irradiation and that the planet is in equilibrium with the incoming stellar irradiation. This is the assumption that Lendl et al. (2014) used for their analysis, using albedo α = 0. Here, we use equation (2) in Méndez & Rivera-Valentín (2017) with β = 1 (efficient heat re-distribution over the whole planet) and broadband thermal emissivity $\epsilon = 1$, using stellar and planetary parameters from Lendl et al. (2014) and setting albedos α = 0, 0.3, 0.6 within the limits of cloud models for hot to warm Jupiters. $T_{\rm Pl,inst}$ gives the maximum possible temperature variation for WASP-117b during one orbit, including at transit (Table 10, temperatures during transit set in bold). These temperatures were compared to the assumption that the planet's temperature is constant and equal to the time-averaged planetary temperature $T_{\rm Pl,ave}$, where we used equation (16) of Méndez & Rivera-Valentín (2017), again with α = 0, 0.3, 0.6. We stress that these temperatures are first-order assumptions to estimate the maximum possible temperature range during transit.
A more comprehensive analysis of the temperature variation during one orbit requires the comparison between radiative and dynamical time scales, where the latter has to be estimated from a 3D circulation model. Table 10 shows that, depending on albedo and heat adjustment assumptions, WASP-117b could have a temperature between 700 K and 1000 K during transit. In the following, we investigate the implications of equilibrium and disequilibrium chemistry within this temperature range for Model 1 (high metallicity, [Fe/H] > 2) and Model 2 (low metallicity, [Fe/H] < 1.75).
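To illustrate how an instantaneous equilibrium temperature varies along an eccentric orbit, the sketch below uses the textbook balance T = T_star sqrt(R_star / 2r) (1 − α)^{1/4} for full heat redistribution and unit emissivity, with r following the Keplerian orbit equation. The stellar temperature and scaled semi-major axis are assumed placeholder values, the true anomaly at mid-transit is taken as 90° − ω (which depends on the ω convention), and the code is not a reproduction of the equations of Méndez & Rivera-Valentín (2017) or of Table 10.

```python
import math


def orbital_distance(a: float, e: float, true_anomaly_rad: float) -> float:
    """Star-planet separation on a Keplerian orbit, in the units of a."""
    return a * (1.0 - e ** 2) / (1.0 + e * math.cos(true_anomaly_rad))


def instantaneous_teq(t_star: float, a_over_rstar: float, e: float,
                      true_anomaly_rad: float, albedo: float) -> float:
    """Equilibrium temperature for full redistribution and unit emissivity."""
    r_over_rstar = orbital_distance(a_over_rstar, e, true_anomaly_rad)
    return t_star * math.sqrt(1.0 / (2.0 * r_over_rstar)) * (1.0 - albedo) ** 0.25


ECC = 0.302          # eccentricity quoted in the text
OMEGA_DEG = 242.0    # argument of periastron quoted in the text
T_STAR = 6000.0      # K, assumed placeholder
A_OVER_RSTAR = 17.0  # assumed placeholder

f_transit = math.radians(90.0 - OMEGA_DEG)  # assumed transit geometry

for albedo in (0.0, 0.3, 0.6):
    t_peri = instantaneous_teq(T_STAR, A_OVER_RSTAR, ECC, 0.0, albedo)
    t_apo = instantaneous_teq(T_STAR, A_OVER_RSTAR, ECC, math.pi, albedo)
    t_tr = instantaneous_teq(T_STAR, A_OVER_RSTAR, ECC, f_transit, albedo)
    print(f"albedo {albedo}: periastron {t_peri:.0f} K, "
          f"apastron {t_apo:.0f} K, transit {t_tr:.0f} K")
```

Under these assumptions the transit occurs closer to apastron than to periastron, so the instantaneous temperature at transit sits toward the cool end of the orbital range.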
We computed a self-consistent pressure-temperature profile for the exo-Saturn WASP-117b using petitCODE (Mollière et al. 2015, 2017) and multiple one-dimensional chemical kinetics models (Venot et al. 2012), incorporating vertical mixing and photochemistry. In order to systematically represent the best-fit retrieval models (see Table 4), as well as the parameter space of physically accepted solutions, we computed four base models with varying temperature and metallicity. We adopted atmospheric metallicities of 350 times and 5 times solar metallicity, consistent with Model 1 and Model 2, respectively. Further, we picked temperatures of 1000 K and 700 K to represent the theoretically possible range of temperatures during transit. A C/O ratio of 0.30 is adopted in accordance with the retrievals.
We acknowledge that a C/O ratio of 0.3 may be too low, because we assumed equilibrium chemistry during retrieval. We argue, however, that higher C/O ratios produce even higher CH 4 abundances for the same pressure and temperature range (see e.g. Madhusudhan 2012). Thus, any constraints on CH 4 quenching via vertical mixing (K zz ) that we find for such a low C/O ratio of 0.3, will also hold for higher C/O ratios and provide e.g. a lower limit for possible K zz . Therefore, we decided given the lack of other constraints that adopting C/O = 0.3 is the best approach for the time being.
Another possible concern in basing disequilibrium chemistry on retrieved parameters is the difference in the planetary atmospheric temperature structure used by the two models. Whereas the former uses a self-consistent 1D atmospheric temperature profile (Figure B.1, bottom right panel) based on petitCODE (Mollière et al. 2015, 2017), the retrieval code petitRADTRANS (Mollière et al. 2019) uses isothermal temperatures. However, we note that for $p \leq 10^{-3}$ bar the temperature is approximately isothermal in the self-consistent 1D exoplanet atmosphere model as well. Thus, we conclude that these differences in temperature do not play a strong role here.
Furthermore, for the pressure-temperature iteration we assumed a uniform day-side heat redistribution and a moderate intrinsic temperature of 400 K, in agreement with the planet's irradiation. Other stellar and planetary parameters were derived from Lendl et al. (2014). For the chemical kinetics model we applied a vertically constant eddy diffusion coefficient $K_{zz} = 10^{10}$ cm$^2$/s. We also applied photochemical reactions. To represent the spectral energy distribution of WASP-117, we used an ATLAS stellar model (T = 6000 K, log g = 4.5) (Castelli & Kurucz 2003) between 168 nm and 80 µm, and a time-averaged solar UV spectrum (Thuillier et al. 2004) between 1 nm and 168 nm.
As can be seen in Figure 18, the hot models of 1000 K naturally have a low methane abundance in chemical equilibrium. This is exacerbated when disequilibrium chemistry is taken into account, as methane gets quenched at even lower abundances. For the warm models of 700 K, methane starts to dominate over CO in chemical equilibrium in the higher atmospheric layers (p < 1 mbar for the high metallicity case and p < 40 mbar in the lower metallicity case). Also in these models, vertical mixing serves to reduce the methane abundance in the layers above the quenching level (p < 100 mbar for the high metallicity case and p < 1 mbar in the lower metallicity case).
For low atmospheric temperatures (700 K), disequilibrium chemistry through vertical mixing is required to keep a low methane abundance ($< 10^{-4}$ volume fraction) in the observable part of the atmosphere (p < 0.1 bar in transmission) for both low and high atmospheric metallicity. Vertical mixing would be even more important for a lower atmospheric metallicity than for high metallicity, because in the former more CH$_4$ is produced at a given pressure level. Provided WASP-117b is indeed at 700 K during transit, we could potentially further constrain $K_{zz} \geq 10^{8}$ cm$^2$/s (see Figure B.1 and Section B, where we present an abundance study for this case as a function of $K_{zz}$). We note that this assessment is based on the median abundances that we retrieve for CH$_4$, with 1σ uncertainties on the upper limit of the CH$_4$ abundances as large as 2 orders of magnitude, see Figures A.15, A.16, A.17. JWST data, for example, will be able to yield a much tighter constraint on the CH$_4$ abundances and atmospheric temperatures, and thus on disequilibrium chemistry in WASP-117b, as we investigate in the next section.
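Why a larger eddy diffusion coefficient quenches CH$_4$ more effectively can be seen from the commonly used order-of-magnitude estimate of the vertical mixing time, τ_mix ≈ H²/K_zz: species whose chemical conversion time exceeds τ_mix keep the abundance of deeper layers. The sketch below evaluates this estimate for the two K_zz values mentioned in this work. The scale height is an assumed placeholder, and this heuristic is not a substitute for the chemical kinetics models used here.

```python
def mixing_timescale_s(scale_height_cm: float, kzz_cm2_s: float) -> float:
    """Order-of-magnitude vertical mixing time H^2 / K_zz in seconds."""
    return scale_height_cm ** 2 / kzz_cm2_s


SCALE_HEIGHT_CM = 5.0e7  # 500 km, assumed placeholder value
SECONDS_PER_DAY = 86400.0

tau_kzz_1e8 = mixing_timescale_s(SCALE_HEIGHT_CM, 1.0e8)
tau_kzz_1e10 = mixing_timescale_s(SCALE_HEIGHT_CM, 1.0e10)
print(f"K_zz = 1e8  cm^2/s: tau_mix = {tau_kzz_1e8 / SECONDS_PER_DAY:.0f} days")
print(f"K_zz = 1e10 cm^2/s: tau_mix = {tau_kzz_1e10 / SECONDS_PER_DAY:.1f} days")
```

A factor of 100 in K_zz shortens the mixing time by the same factor, moving the quench level deeper into the atmosphere where the equilibrium CH$_4$ abundance differs from that at observable pressures.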

Prospects for future investigations with HST and JWST
We investigated to what extent future observations with HST in the UV and optical, and JWST in the near to mid-infrared could constrain the properties of WASP-117b. Synthetic spectra based on the results of Model 1 (high metallicity), Model 2 (low metallicity & cloudy) and Model 3 (hazy) (see Section 3.3, Table 4) are shown in Figure 19.
The possible atmospheric compositions retrieved for WASP-117b differ strongly in the wavelength range 0.3 − 0.5 µm (Figure 19, top panels). An accuracy of 200 ppm could already be sufficient to distinguish between the very hazy solutions (extreme hazy solutions of Model 3, based on the CASCADE spectrum) and the clear-sky, heavy-metal atmospheric composition (Model 1), based on the HST/WFC3 G141 grism spectrum reduced with the CASCADE pipeline (Figure 19, top right). An accuracy of 100 ppm or better would be needed between 0.3 and 0.5 µm to distinguish between the medians of the clear-sky, heavy-metal solutions, the grey cloudy solutions with lower metal content, and the hazy solutions, irrespective of the data pipeline used (Figure 19, top panels). Wakeford et al. (2020) recently showed that it is, in principle, possible to accurately measure a transmission spectrum of a transiting exoplanet in this wavelength range using the WFC3/UVIS G280 grism. The authors report an accuracy of 29 to 33 ppm for the broad band depth between 0.3 and 0.8 µm and 200 ppm in 10 nm spectroscopic bins. Wakeford et al. (2020) observed two transits of the hot Jupiter HAT-P-41b around a quiet F star with magnitude $m_V = 11.08$. Our target WASP-117b likewise orbits a quiet F star, which is even brighter than the host star of HAT-P-41b: WASP-117 has a magnitude of $m_V = 10.15$.

Fig. 18. Molecular abundances computed for hot (1000 K) or warm (700 K), and very (Z × 350) or moderately (Z × 5) enriched atmospheres using the chemical kinetics code of Venot et al. (2012). The colored lines denote the disequilibrium abundances for the main molecules in the planetary atmosphere. The gray lines indicate the chemical equilibrium state. A constant $K_{zz} = 10^{10}$ cm$^2$/s was adopted.
Thus, we estimate that two transit observations of WASP-117b could be sufficient to constrain a haze layer, and potentially also the atmospheric metallicity, using WFC3/UVIS. We did not consider the optical transits here, in particular the TESS transit depth of 7200 ± 150 ppm. All synthetic spectra have transit depths larger than 7400 ppm in the 0.8−1 µm range and therefore do not fit the TESS data within 1σ (see Section 3.6).
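The two-transit estimate can be made more concrete with a simple brightness scaling. The following is a minimal sketch, not the calculation used for this work: it assumes purely photon-noise-limited precision, the same number of transits, the same in-transit exposure time and the same 10 nm binning as for HAT-P-41b, and that the V-band magnitude is representative of the flux in the UVIS band. The function names are illustrative.

```python
import math


def flux_ratio(m_target: float, m_reference: float) -> float:
    """Flux of the target relative to the reference star, from magnitudes."""
    return 10.0 ** (-0.4 * (m_target - m_reference))


def scaled_precision(sigma_ref_ppm: float, m_target: float, m_reference: float) -> float:
    """Precision for the target, ASSUMING pure photon noise (sigma ~ 1/sqrt(flux))."""
    return sigma_ref_ppm / math.sqrt(flux_ratio(m_target, m_reference))


# Magnitudes quoted in the text: WASP-117 (m_V = 10.15), HAT-P-41 (m_V = 11.08).
ratio = flux_ratio(10.15, 11.08)
# 200 ppm per 10 nm bin was reported for HAT-P-41b by Wakeford et al. (2020).
sigma_bin_ppm = scaled_precision(200.0, 10.15, 11.08)

print(f"flux ratio WASP-117 / HAT-P-41: {ratio:.2f}")
print(f"scaled precision per 10 nm bin: {sigma_bin_ppm:.0f} ppm")
```

Under these assumptions the per-bin precision improves from 200 ppm to roughly 130 ppm, which lies between the 200 ppm and 100 ppm thresholds discussed above. Systematic noise and differences in transit duration are ignored here.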
With a single transit in each instrument, we expect clear H$_2$O features for the exo-Saturn WASP-117b in the JWST wavelength range (Figure 19, bottom panels). Furthermore, some low-metallicity solutions (within the Model 2 and Model 3 frameworks) yield CH$_4$ features of ∼100 ppm at 3.4 µm. For example, the best-fit solution of Model 2 based on the CASCADE spectrum predicts a clear CH$_4$ signal at 3.4 µm (Figure 19, bottom, left panel). This would correspond to a CH$_4$ abundance of $1.8 \times 10^{-5}$ volume fraction (Table 4).
Observing WASP-117b with JWST/NIRSpec should thus at the very least impose constraints on the CH$_4$ and H$_2$O content of the atmosphere. A detection of a relatively high CH$_4$ abundance of $10^{-5}$ volume fraction could also rule out a high metallicity via the chemistry constraints shown in Section 4 (Figure 18). A non-detection of CH$_4$ could, on the other hand, indicate either high metallicity, high atmospheric temperatures (T ≳ 1000 K) during transit, or disequilibrium chemistry. In the latter case, we could also constrain the vertical mixing ($K_{zz}$). It is also evident from Figure 19 that CO$_2$ is a useful probe of metallicity, with the high-metallicity cases showing increased CO$_2$ absorption at 4.3 µm and at the red edge of the spectra, towards 15 µm.
Taking into account the expected instrument performance confirms that NIRSpec should be capable, even with a single transit observation of WASP-117b, of resolving differences in CO$_2$ at 4.3 µm (NIRSpec/G395M) that arise from the different metallicities of the atmospheric models (Figure 20), as well as of constraining CH$_4$ at both 2.3 and 3.4 µm (NIRSpec/G235M). This is readily apparent in Figure 20, right panel, for CASCADE Model 2 (green), which has an enhanced CH$_4$ content compared to the other retrieved models (see Table 4).

Fig. 19. Synthetic spectra based on Model 1 (high metallicity, purple), Model 2 (low metallicity & cloudy, green) and Model 3 (haze, black) based on the WASP-117b spectrum reduced with the nominal pipeline (left panels), and the CASCADE pipeline (right panels). Colored areas give the 1σ envelope for the solutions for each model. Solid lines in grey, yellow and light-pink give the best fit for Model 1, Model 2 and Model 3, respectively. The spectra are shown for the wavelength ranges 0.3−0.9 µm (upper panels) and 1−15 µm (lower panels). In addition, a hypothetical data point with 50 ppm accuracy is shown for guidance.

[Figure panel titles: Nominal (left), CASCADE (right)]
The comparison between spectra derived from the nominal and CASCADE pipelines (Figure 20, left and right panels, respectively) also shows that the H$_2$O features in the NIRSpec/G235M wavelength range are significantly enhanced in the former case compared to the latter, regardless of which underlying model is used. The difference in the depth of the water absorption is mainly due to the different atmospheric temperatures of the model medians. Models retrieved from the nominal HST/WFC3 spectrum yield a median temperature of about 1000 K, whereas models retrieved from the CASCADE HST/WFC3 spectrum yield lower median temperatures of 800 K and cooler (Table 4).
The simultaneous measurement of H$_2$O, CH$_4$, and CO$_2$ with JWST promises to improve the constraint on the C/O ratio of the planetary atmosphere and thus either confirm or disprove the sub-solar C/O ratio that we tend to find for WASP-117b. JWST will further be extremely important to constrain the temperature, metallicity, and CH$_4$ content, and thus the methane chemistry, of the exo-Saturn WASP-117b during transit. HST/UVIS observations will be more important to constrain hazes and cloudiness, which could also constrain the atmospheric metallicity if an accuracy of 100 ppm or better is reached between 0.3 and 0.6 µm.

Discussion
This work shows that it is possible but also very challenging to investigate the atmospheric properties of an exoplanet with an orbital period of 10 days and longer.
Atmosphere retrieval favors, based on Bayesian analysis alone, a high-metallicity ([Fe/H] > 2), clear atmospheric composition (Model 1) "substantially" (B > 4) over the cloudy, lower-metallicity solution (Model 2) as a fit to the detected 3σ water spectrum. See Section 3.4 for an overview of the statistical evidence for the different models. We caution, however, against using Bayesian analysis of the HST/WFC3 data alone to rule out cloudy, lower-metallicity solutions completely. We also caution that patchy clouds could mimic a high-metallicity composition (Line & Parmentier 2016).
VLT/ESPRESSO observations of WASP-117b in the optical could not be used to constrain the atmospheric properties of WASP-117b further, because the planetary Na and K signal is too weak (2.4σ). Thus, we conclude that exoplanets with long orbital periods should only be observed with ground-based telescopes after space-based observations have yielded first constraints on the cloud coverage and thus on the feasibility of obtaining a sufficiently strong planetary signal in the optical.
The TESS data could also not be used for atmospheric constraints, because the TESS transit depth is abnormally shallow. We find that the TESS data are inconsistent with all atmospheric models that we explored, including cloudy and hazy atmosphere models (Section 5). Stellar activity is also unlikely to cause a 4% transit depth difference between the TESS and the WFC3/NIR wavelength ranges (Figure 16). This example shows, however, how difficult it can be to combine measurements taken at two different epochs in two different wavelength ranges, even when the respective atmospheric measurements are precise and the host star is quiet (Rackham et al. 2019).
We maintain that a better constraint on the metallicity in the atmosphere of the eccentric WASP-117b is warranted, as its properties would be complementary to those of other exoplanets on circular orbits in the same mass range. A clear-sky, high-metallicity atmospheric composition would place WASP-117b significantly outside of the mass-metallicity relation inferred from the Solar System$^{9}$ (Figure 21, left panel). The exo-Saturn WASP-117b would thus join ranks with planets like WASP-39b with comparatively high metallicity (Wakeford et al. 2018), and HAT-P-11b (Chachan et al. 2019), which also do not follow the Solar System mass-metallicity relation. In fact, a high atmospheric metallicity together with its mass and radius would make WASP-117b very similar to WASP-39b. A cloudy, lower-metallicity atmospheric composition (Model 2), on the other hand, would make WASP-117b more comparable to WASP-107b, WASP-43b and WASP-12b, which all follow the inferred Solar System mass-metallicity correlation.

Further, it is unclear to what extent erosion processes have influenced the atmospheric metallicity of extrasolar Neptune- and Saturn-mass exoplanets (see e.g. Owen & Lai 2018; dos Santos et al. 2019; Armstrong et al. 2020). Here, too, the atmospheric properties of the eccentric WASP-117b could be illuminating. Figure 21, right panel, shows that WASP-39b and WASP-107b lie in the Neptune$^{10}$ desert (Mazeh et al. 2016), which is thought to reflect significant mass loss in this planetary mass and orbital period regime. The super-Neptune WASP-107b was indeed found to experience loss of helium (Spake et al. 2018; Kirk et al. 2020) that could modify its atmospheric metallicity over time. The exo-Saturn WASP-117b, on the other hand, lies at 0.1168 AU during transit and should thus, at least then, be well outside of the Neptune desert.

$^{9}$ The values for the Solar System bodies come with the caveat that even there the inferred metallicities depend on uncertainties like core size and H$_2$O content.
The large distance during transit also explains the absence of deep Na and K features in the VLT/ESPRESSO data. The atmospheric metallicity of WASP-117b could thus shed light on whether a high-metallicity atmospheric composition (if confirmed) is a sign of atmospheric erosion processes or instead of formation processes different from those in the Solar System.
The atmosphere of WASP-117b could also be illustrative for the study of disequilibrium processes. We consistently found low CH$_4$ abundances ($< 10^{-4}$ volume fraction), reflected in the sub-solar C/O ratios found by our retrievals. This is true in particular when we assume the high-metallicity atmospheric composition (Model 1), which has the highest evidence. Interpreting the atmospheric chemistry represented by these parameters requires additional knowledge of the atmospheric temperature, which we could not constrain within the prescribed limits of 500 to 1500 K based on the WFC3 data alone.
We maintain, however, based on simple equilibrium temperature calculations following Méndez & Rivera-Valentín (2017), that temperatures between 700 and 1000 K are physical during the transit of the eccentric exo-Saturn WASP-117b (Section 4, Table 10). Chemical modelling at 700 K and 1000 K for high (350× solar) and low (5× solar) metallicities further confirms that very low CH$_4$ abundances in WASP-117b during transit are physically plausible. The 700 K models require disequilibrium chemistry to keep atmospheric CH$_4$ abundances below $10^{-4}$ volume fraction at $p = 10^{-4}$ bar, which we identified to first order as the main region probed in transmission. For the colder temperature models, we thus infer a vertical eddy diffusion coefficient of $K_{zz} \geq 10^{8}$ cm$^2$/s to explain the absence of methane via CH$_4$ quenching.
Generally, the value of the eddy diffusion coefficient in transiting extrasolar planets has so far only been constrained theoretically, with order-of-magnitude parametrizations from 3D GCM simulations. For the hot Jupiter HD 209458b, for example, $K_{zz} = 5 \times 10^{8}$ to $10^{9}$ cm$^2$/s is derived at 1 bar (Parmentier et al. 2013; Menou 2019). Atmospheres with higher metallicities, as we infer for WASP-117b, are expected to have more vertical mixing (Charnay et al. 2015). Our tentative constraint of $K_{zz} \geq 10^{8}$ cm$^2$/s for an atmospheric temperature of 700 K would thus fit theoretically motivated expectations.
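To give a feeling for what such eddy diffusion coefficients mean, the sketch below evaluates the standard order-of-magnitude mixing timescale $\tau_{\rm mix} \sim H^2 / K_{zz}$, where $H$ is the pressure scale height. Quenching sets in where this timescale becomes shorter than the chemical timescale; the chemical side is not modelled here. The mean molecular weight (2.3, appropriate for an H$_2$/He-dominated gas and too low for a strongly metal-enriched one) and the surface gravity (6.5 m/s$^2$) are illustrative assumptions that are not taken from this section, and using the full scale height as mixing length is itself an approximation.

```python
K_B = 1.380649e-23       # Boltzmann constant [J/K]
M_U = 1.66053907e-27     # atomic mass unit [kg]


def scale_height_cm(temperature_k: float, mean_mol_weight: float, gravity_m_s2: float) -> float:
    """Pressure scale height H = k T / (mu m_u g), returned in cm."""
    h_m = K_B * temperature_k / (mean_mol_weight * M_U * gravity_m_s2)
    return 100.0 * h_m


def mixing_timescale_s(kzz_cm2_s: float, h_cm: float) -> float:
    """Order-of-magnitude vertical mixing timescale tau ~ H^2 / K_zz."""
    return h_cm ** 2 / kzz_cm2_s


# ASSUMED illustrative inputs: T = 700 K (cool case), mu = 2.3, g = 6.5 m/s^2.
h_cm = scale_height_cm(700.0, 2.3, 6.5)
tau_1e8 = mixing_timescale_s(1e8, h_cm)    # lower limit inferred in the text
tau_1e10 = mixing_timescale_s(1e10, h_cm)  # value adopted for Fig. 18

print(f"H = {h_cm / 1e5:.0f} km")
print(f"tau_mix(K_zz = 1e8)  = {tau_1e8 / 86400.0:.0f} days")
print(f"tau_mix(K_zz = 1e10) = {tau_1e10 / 86400.0:.1f} days")
```

With these placeholder values the mixing timescale is of the order of $10^{7}$ s for $K_{zz} = 10^{8}$ cm$^2$/s and a factor of 100 shorter for $K_{zz} = 10^{10}$ cm$^2$/s, since $\tau_{\rm mix}$ scales inversely with $K_{zz}$.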
We argue that additional transit measurements of WASP-117b with HST/UVIS and the G280 grism could yield the necessary accuracy (100−200 ppm; Wakeford et al. 2020) to constrain a haze and cloud layer in WASP-117b. A low-resolution transmission spectrum in that range would reveal cloud properties through the different scattering slopes between 0.3 and 0.5 µm for a clear atmosphere with high metallicity and for grey cloudy and hazy atmospheres with low metallicity (Figure 5, upper panels). HST/UVIS measurements would thus already constrain the atmospheric metallicity, which is degenerate with cloud properties in the HST/WFC3 range. In addition, HST/UVIS data could potentially shed light on the origin of the shallow TESS transit depth. Such data would also warrant the use of a patchy cloud model (Line & Parmentier 2016).
Additional JWST/NIRSpec observations in the 0.8−5 µm region could further constrain CH$_4$ to at least $\leq 1.8 \times 10^{-5}$ volume fraction by resolving the 3.4 µm CH$_4$ feature. A further constraint on CH$_4$ would also improve our understanding of the atmospheric chemistry of WASP-117b, in particular in conjunction with secondary eclipse observations to constrain the temperature variations of WASP-117b during one orbit. The presence or absence of CH$_4$, or of CO$_2$ at 4.3 µm, would also constrain the atmospheric metallicity and, together with the H$_2$O features, confirm whether WASP-117b indeed has a sub-solar C/O ratio. See also Guzmán Mesa et al. (2020) for the capabilities of JWST/NIRSpec to characterize the atmospheric chemistry of warm Neptunes.

Conclusions
We detected a 3σ water spectrum in the HST/WFC3 near-infrared transmission spectrum of WASP-117b. Comparison to one-dimensional atmospheric models assuming vertically isothermal temperatures, a uniform grey cloud coverage and equilibrium chemistry (petitRADTRANS) shows that the water absorption is less pronounced than expected. Based on the preferences derived from the Bayes factor analysis alone, WASP-117b appears to have a high atmospheric metallicity (${\rm [Fe/H]} = 2.58^{+0.26}_{-0.37}$ or $380^{+311}_{-217}$ × solar, based on the nominal pipeline) with relatively clear skies. We cannot rule out, however, a lower atmospheric metallicity ([Fe/H] < 1.75) with clouds and potentially an additional haze layer. However, we conclude that the data quality is too limited to warrant the application of a hazy atmosphere layer. The high metallicities that are strongly preferred by the Bayes factor analysis are comparable to the similarly high atmospheric metallicity ($151^{+46}_{-48}$ × solar) found in WASP-39b, an exo-Saturn on a circular 4-day orbit around a cooler G-type star (Wakeford et al. 2018).
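The relation between the logarithmic metallicity and the quoted enrichment relative to solar is a simple power of ten, with asymmetric uncertainties because the transformation is non-linear. The short sketch below illustrates the conversion; the function name is illustrative, and transforming the interval end points in this way is a simplification of working with the full posterior.

```python
def feh_to_solar_multiple(feh: float, err_plus: float, err_minus: float):
    """Convert [Fe/H] with asymmetric errors to a multiple of solar metallicity.

    Returns (median, upper_error, lower_error) in units of solar metallicity,
    obtained by transforming the end points of the quoted interval.
    """
    median = 10.0 ** feh
    upper = 10.0 ** (feh + err_plus) - median
    lower = median - 10.0 ** (feh - err_minus)
    return median, upper, lower


# Values quoted in the text for the high-metallicity solution (nominal pipeline).
median, upper, lower = feh_to_solar_multiple(2.58, 0.26, 0.37)
# Upper bound of the cloudy, lower-metallicity solution, [Fe/H] < 1.75.
cloudy_limit = 10.0 ** 1.75

print(f"Z = {median:.0f} +{upper:.0f} -{lower:.0f} x solar")
print(f"cloudy solution: Z < {cloudy_limit:.0f} x solar")
```

The result agrees with the quoted $380^{+311}_{-217}$ × solar to within about one unit in the error bars, a difference that presumably reflects rounding of the logarithmic values.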
While the basic atmospheric composition could be constrained, the temperature of WASP-117b during transit is unconstrained between 500 and 1500 K. Thus, we cannot identify whether the planet changes temperature during its eccentric 10-day orbit. Based on basic energy balance equations and a realistic range of heat redistribution scenarios, we estimate that the planet could theoretically adopt equilibrium temperatures between 700 K and 1000 K, depending on albedo and heat redistribution.
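The dependence on albedo and heat redistribution can be illustrated with the textbook energy-balance expression $T_{\rm eq} = T_{\rm eff} \sqrt{R_\star / d}\, [f (1 - A)]^{1/4}$, where $f = 1/4$ corresponds to full redistribution and $f = 1/2$ to dayside-only re-radiation. This is a simplified stand-in, not the Méndez & Rivera-Valentín (2017) formalism used for this work. The star-planet distance of 0.1168 AU during transit is taken from the text; the stellar effective temperature (6040 K) and radius (1.17 $R_\odot$) are illustrative assumptions that are not given in this section.

```python
import math

R_SUN_AU = 0.00465047  # solar radius in astronomical units


def equilibrium_temperature(t_eff_k: float, r_star_rsun: float, distance_au: float,
                            bond_albedo: float = 0.0, redistribution: float = 0.25) -> float:
    """Instantaneous equilibrium temperature from simple energy balance.

    redistribution: 0.25 = full heat redistribution, 0.5 = dayside only.
    """
    r_over_d = r_star_rsun * R_SUN_AU / distance_au
    return t_eff_k * math.sqrt(r_over_d) * (redistribution * (1.0 - bond_albedo)) ** 0.25


# ASSUMED stellar parameters (illustrative); distance at transit from the text.
T_EFF, R_STAR, D_TRANSIT = 6040.0, 1.17, 0.1168

t_full = equilibrium_temperature(T_EFF, R_STAR, D_TRANSIT, bond_albedo=0.0)
t_reflective = equilibrium_temperature(T_EFF, R_STAR, D_TRANSIT, bond_albedo=0.6)

print(f"A = 0.0, full redistribution: {t_full:.0f} K")
print(f"A = 0.6, full redistribution: {t_reflective:.0f} K")
```

With these placeholder inputs, the zero-albedo, full-redistribution case comes out near 920 K, and a high albedo of 0.6 lowers it to roughly 730 K, which brackets a range comparable to the 700 to 1000 K quoted above.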
Because the temperature of WASP-117b during transit is not constrained, we cannot further test the multiple hypotheses that may explain the low abundance of CH$_4$ in the atmosphere of WASP-117b during transit. If the atmospheric temperature is above 1000 K, then CH$_4$ cannot form and we do not expect to detect measurable levels of CH$_4$. If the temperature of WASP-117b during transit is 700 K, then CH$_4$ abundances higher than $10^{-4}$ volume fraction would form in the atmospheric levels that we probe ($p < 10^{-4}$ bar), assuming equilibrium chemistry. Thus, in the latter case, we would need to invoke CH$_4$ quenching and we could constrain the eddy diffusion coefficient to $K_{zz} \geq 10^{8}$ cm$^2$/s. If WASP-117b is cool, it would thus show methane quenching as indicated for the super-Neptune WASP-107b (Kreidberg et al. 2018).
Using VLT/ESPRESSO data alone, we do not find strong indications for Na and K within 3σ. In combination with HST/WFC3 data, however, we still find "substantial" (2.4σ) preference for the presence of Na and K.
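The text does not state here how Bayes factors and σ-significances are related. As a hedged illustration only, the sketch below uses one commonly applied calibration, $\bar{B} = -1 / (e\, p \ln p)$ for $p < 1/e$, which gives an upper bound on the Bayes factor corresponding to a two-sided Gaussian p-value. Whether this is the conversion used for this work is an assumption.

```python
import math


def p_value_from_sigma(n_sigma: float) -> float:
    """Two-sided Gaussian tail probability for a given number of sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))


def bayes_factor_bound(p_value: float) -> float:
    """Upper bound on the Bayes factor, B = -1 / (e p ln p), valid for p < 1/e."""
    if not 0.0 < p_value < 1.0 / math.e:
        raise ValueError("calibration is only valid for 0 < p < 1/e")
    return -1.0 / (math.e * p_value * math.log(p_value))


# 2.4 sigma is the significance quoted in the text for Na and K.
b_na_k = bayes_factor_bound(p_value_from_sigma(2.4))
print(f"2.4 sigma -> Bayes factor bound of about {b_na_k:.1f}")
```

Under this calibration, 2.4σ corresponds to a Bayes factor bound between 5 and 6, which falls in the range usually labelled "substantial" on the Jeffreys scale.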
We also compared our WFC3 spectrum to TESS broadband observations of WASP-117b and found that the TESS transit depth is 4% shallower than the HST/WFC3 transit depth. We cannot explain this difference with systematic noise, stellar activity, or clouds.
Furthermore, with the new VLT/ESPRESSO data, we used the Rossiter-McLaughlin effect to improve the measurements of the stellar v sin i and of the projected spin-orbit angle to 1.46 ± 0.14 km/s and $-46.9^{+5.5}_{-4.8}$ deg, respectively.

Outlook
WASP-117b is one of the first warm eccentric exo-Saturns with a long orbital period (10 days) whose atmosphere has been characterized. While the observations proved to be challenging, they indicate that WASP-117b could have interesting properties, such that further observations with JWST and with HST in the optical range would be warranted. WASP-117b could be an ideal target to investigate how (disequilibrium) chemistry processes are controlled by precisely known, varying heating conditions on a 10-day orbit around a quiet F main-sequence star. These studies would complement studies of eccentric exoplanets on tighter orbits (e.g. Lewis et al. 2013, 2014; Agúndez et al. 2014). WASP-117b is also a complementary observation target to the super-Neptunes WASP-107b and WASP-39b on circular, tighter orbits, and could aid our understanding of how erosion processes shape atmospheric properties such as metallicity in the Neptune-to-Saturn mass range.
As of now, several promising exo-Saturns on eccentric orbits with $P_{\rm orb} \geq 10$ days around bright stars, suitable for in-depth atmospheric characterization, are known (Brahm et al. 2016, 2018; Jordán et al. 2019; Brahm et al. 2019; Rodriguez et al. 2019). One such planet, K2-287 b, orbits a G-type star and has a time-averaged equilibrium temperature of 800 K; that is, it is at least 200 K cooler than WASP-117b and would thus be even better suited to studying disequilibrium chemistry in warm eccentric gas planets on relatively wide orbits.

Posterior probability distributions of the joint analysis of the HST and VLT/ESPRESSO dataset for WASP-117b, reduced with the CASCADE pipeline. Here, the retrieval without Na and K was applied (see Section 3.1). Median parameter values and 68% confidence intervals for the marginalized 1D posterior probability distributions are indicated with horizontal error bars.