Peering into the formation history of beta Pictoris b with VLTI/GRAVITY long-baseline interferometry

Context. Beta Pictoris is arguably one of the most studied stellar systems outside of our own. Some 30 years of observations have revealed a highly-structured circumstellar disk, with rings, belts, and a giant planet: β Pictoris b. However very little is known about how this system came into being. Aims. Our objective is to estimate the C/O ratio in the atmosphere of β Pictoris b and obtain an estimate of the dynamical mass of the planet, as well as to reﬁne its orbital parameters using high-precision astrometry. Methods. We used the GRAVITY instrument with the four 8.2 m telescopes of the Very Large Telescope Interferometer to obtain K-band spectro-interferometric data on β Pic b. We extracted a medium resolution (R=500) K-band spectrum of the planet and a high-precision astrometric position. We estimated the planetary C/O ratio using two diﬀerent approaches (forward modeling and free retrieval) from two diﬀerent codes (ExoREM and petitRADTRANS, respectively). Finally, we used a simpliﬁed model of two formation scenarios (gravitational collapse and core-accretion) to determine which can best explain the measured C/O ratio. Results. Our new astrometry disfavors a circular orbit for β Pic b ( e = 0 . 15 +0 . 05 − 0 . 04 ). Combined with previous results and with Hipparcos/GAIA measurements, this astrometry points to a planet mass of M = 12 . 7 ± 2 . 2 M Jup . This value is compatible with the mass derived with the free-retrieval code petitRADTRANS using spectral data only. The forward modeling and free-retrieval approches yield very similar results regarding the atmosphere of β Pic b. In particular, the C/O ratios derived with the two codes are identical (0 . 43 ± 0 . 05 vs 0 . 43 +0 . 04 − 0 . 03 ). We argue that if the stellar C/O in β Pic is Solar, then this combination of a very high mass and a low C/O ratio for the planet suggests a formation through core-accretion, with strong planetesimal enrichment.


Introduction
The ever-increasing number of exoplanet detections (over 4000, at the time of this writing 1 ) proves that our instrumental capabilities are getting better and better at discovering these other worlds. But even though exoplanets are now routinely being observed, determining their physical properties (temperature, mass, composition), let alone the history of their formation, remains extremely challenging. And yet, these measurements are key to understanding the details of planetary formation processes.
Among all measurable quantities, element abundance ratios are emerging as some of the most promising for understanding planetary formation. The question of the supersolar abundances of heavy elements in the atmosphere of Jupiter is probably what motivated the first attempts to link abundance ratios to planetary formation, and several studies have been carried out to understand how planetesimal accretion can lead to heavy element enrichment (Helled et al. 2006;Helled & Schubert 2009;Owen et al. 1999;Alibert et al. 2005). On the exoplanet front, the work ofÖberg et al. (2011) was the first general attempt to show that element ratios in an exoplanet atmosphere can be an imprint of its formation history. This idea has since been investigated further by several authors (e.g., Ali-Dib et al. 2014;Thiabaud et al. 2014;Helling et al. 2014;Marboeuf et al. 2014a,b;Madhusudhan et al. 2014Madhusudhan et al. , 2017Mordasini et al. 2016;Öberg & Bergin 2016;Cridland et al. 2016;Eistrup et al. 2016Eistrup et al. , 2018. WhileÖberg et al. (2011) highlighted how gas disk abundances can influence the atmospheric composition, the importance of icy planetesimals for the atmospheric enrichment is stressed in Mordasini et al. (2016), where exoplanet spectra are derived from modeling full formation in the core-accretion paradigm.
Measuring the element ratios is not easy, and requires high-quality data. Madhusudhan et al. (2011) used a free retrieval method on a set of Spitzer and ground-based photometric data in 7 different bands to obtain the first exoplanetary C/O ratio on the hot Jupiter WASP-12b. But the value of C/O > 1 they obtained has since been ruled out by Kreidberg et al. (2015), showing the difficulty of obtaining reliable abundance ratios. Konopacky et al. (2013) used a different approach in their study of HR 8799 c. They obtained K-band spectroscopic observations of the planet with the spectrograph OSIRIS on the Keck II telescope, and were able to extract an estimate of the C/O ratio using model grid fitting. They found a value of C/O = 0.65±0.15. Looking at the same planetary system, Lavie et al. (2017) estimated the C/O ratio for four planets (HR 8799 b, c, d, and e), using a retrieval analysis method. In their analysis, they notably emphasized the importance of high-quality Kband spectroscopic data, which they found to be critical for a reliable measurement of the C/O and C/H ratios.
With the recent direct detection of the giant planet HR 8799 e with the GRAVITY instrument (Gravity Collaboration et al. 2019) on the Very Large Telescope Interferometer (VLTI), optical interferometry has become a new arrow in the quiver of exoplanet observers. By taking advantage of the angular-resolution offered by 100+ meter baselines, optical interferometers can separate a dim exoplanet from the overwhelming residual starlight, leading to accurate measurements of the astrometric position (up to 10 µas, Gravity Collaboration et al. 2018), and high signalto-noise spectroscopic data with absolute calibration of the continuum.
In this paper, we present observations of the giant planet β Pic b obtained with GRAVITY and we investigate the possibility of using this K-band spectro-interferometric data to determine the C/O ratio of the planet. The observations are presented in Section 2, together with a brief summary of the data reduction (a complete explanation is given in Appendix A). Section 3 focuses on the orbit and mass of β Pic b. We show in this section how the new GRAVITY astrometric data impacts the best orbital estimate currently available and we provide a new estimate of the dynamical mass of the planet. Section 4 is devoted to the measurement of atmospheric properties and, in particular, to the determination of the C/O ratio, using two different approaches: forward modeling and free retrieval. In Section 5, we discuss the C/O ratio obtained in the case of a formation of β Pic b through gravitational accretion and then through core-accretion. Our general conclusions can be found in Section 6.

Observations
Observations of β Pictoris b were obtained on September, 22, 2018, using the GRAVITY instrument (Gravity Collaboration et al. 2017), with the four 8 m Unit Telescopes (UTs) of the VLT. The instrument was set up in its medium resolution mode (R = 500), and observations were conducted in on-axis/dual-field mode.
The observing strategy was similar to the one described in Gravity Collaboration et al. (2019): the fringe-tracker ) was using the flux from the central star during the observing sequence, while the position of the science fiber was changed at each exposure, alternating between the central star and the position of the planet. Since the planet was not visible on the acquisition camera, the position used to center the fiber during the planet exposures was a theoretical position, based on predictions from previous monitoring (Wang et al. 2016;Lagrange et al. 2018).
A total of 16 exposures (resp. 17) were acquired on the star (resp. the planet). Each star exposure was made of 50 individual 0.3 s integrations. For the planet, which is ∼ 10 mag fainter than the star, the integration time was initially set to 30 s, with 10 integrations per exposure, and reduced to 10 s with 30 integrations at mid-course, since the observing conditions were good (seeing < 0.8 ). The complete dataset contains 1.4 hr of integration on the planet (and 0.35 hr of associated background exposures), and 4 min 30 s of integration on the central star (plus 1 min 15 s of sky background). The observing log is given in Table 1.

General data reduction
During planet exposures, the science fiber at each telescope is kept at an offset position with respect to the star, reducing significantly the star to fiber coupling ratio. But even though most of the stellar flux is rejected, speckle noise can still couple to the science fiber and dominate the exposures, hence the need for careful data reduction to disentangle the planet signal from the remaining coherent stellar flux.
The general data reduction method used to reduce the VLTI/GRAVITY observations of β Pic b is presented in details in Appendix A. It can be divided into different parts: pipeline reduction (common to all GRAVITY observations), astrometric extraction, and spectrum extraction. These steps are described in Appendix A.2, A.4, and A.5. The end products are an astrometric position for the planet with respect to the star (∆α, ∆δ), and a planet-tostar contrast spectrum C(λ) = S P (λ)/S (λ) which is the ratio between the spectra of the planet and of the star.

K-band spectrum
The contrast spectrum of β Pic b was converted to an absolute spectrum of the planet using a model of the stellar spectrum: S P (λ) = C(λ) × S (λ). We used a BT-NextGen model (Hauschildt et al. 1999), with a temperature of 8000 K, a surface gravity of log(g/g 0 ) = 4, and a Solar metallicity, as close as possible to the measured value for this star (Lanz et al. 1995;Gray et al. 2006). We scaled this synthetic spectrum to an ESO K-band magnitude of 3.495, taking into account the correct filter (van der Bliek et al. 1996). This strategy, based on the extraction of a contrast spectrum and the use of a model for the star, helps to reduce the impact of Earth's atmosphere on the final planet spectrum. The result is given in Figure 1.

Astrometry
Using the data reduction method described in Appendix A.4, we found a mean relative planet to This GRAVITY measurement is shown in the inset plot of Figure 2. In its dual-field mode, GRAVITY is limited to observations of planets above the diffraction limit of a single telescope (to separate the planet from the central star), but the relative astrometry derived from these observations still fully benefits from the length of the telescope array.

Orbital parameters
We fit a Keplerian orbit to the visual astrometry of the planet to characterize its dynamics. As our new GRAVITY point is more than an order of magnitude more precise than any other published astrometric point on the northeastern half of its orbit (c.f., Lagrange et al. 2019a), we expected a better constraint on the eccentricity of the planet's orbit. We used the published astrometry from Chauvin et al. (2012), Nielsen et al. (2014, and Wang et al. (2016) in this analysis. The orbit was fit using the open-source Python orbit fitting package orbitize! (Blunt et al. 2019). We included a custom likelihood to fit the GRAVITY measurement along the two principal axes of the error ellipse. We fit for the same eight parameters as Wang et al. (2016): semimajor axis (a), eccentricity (e), inclination (i), argument of periastron (ω), position angle of the ascending node (Ω), the first periastron passage after MJD = 55,000 in units of fractional orbital period (τ ), system parallax, and total system mass (M tot ). We generally used relatively unconstrained priors for most of the orbital parameters (see Table  2). For Ω, we constrained it to between π/10 and π/2 to account for the fact that Snellen et al. (2014) detected the RV signal of the planet. However, we chose not to explicitly include the RV in the fit as there could be systematics in the reported uncertainties. For the parallax, we used a normal distribution to represent the parallax of 51.44 ± 0.12 mas measured by Hipparcos (van Leeuwen 2007). We sampled the posterior using the parallel-temperature affine-invariant sampler in ptemcee (Foreman-Mackey et al. 2013;Vousden et al. 2016) with 20 temperatures, 1000 walkers per temperature. We discarded the first 15,000 steps to allow the walkers to converge. We assessed convergence using the autocorrelation time and by visual inspection of the samples. We then ran each walker for 5000 steps, keeping only ev- Plotted in black are possible orbits randomly drawn from the posterior using only relative astrometry (Section 3.1). Previous astrometric measurements used in the orbit fit are in blue. The GRAVITY measurement from this work is in red, with an inset plot that is zoomed in by a factor of ∼2000 to display the uncertainties on this measurement. ery tenth sample to mitigate correlations in the samples produced by any given walker.
Our constraints on the orbit of β Pic b using just astrometry of the planet are collected in Table 2 and plotted in Figure 2. We find that < 2% of allowed orbits have e < 0.05 and < 0.5% of orbits have e < 0.03, although there are still some allowed circular orbits. Dupuy et al. (2019) also proposed an e ≈ 0.25 when including astrometric and radial velocity data on the system. To statistically assess whether eccentric orbits are preferred, we refit the orbit fixing e = 0 and ω = 0 resulting in a fit with two less parameters. Similar to (Wang et al. 2018) in assessing the coplanarity of the HR 8799 planets, we compared the Bayesian Information Criterion (BIC) of the fit that allowed eccentric orbits with the fit that fixed the orbit to be circular, and found that the BIC disfavors the circular orbit by 9.9. The reduction in model parameters for a purely circular orbit does not compensate for an increase in fitting residuals, so we disfavor circular orbits for a single planet model. However, additional confusion on this measurement could be due to a second planet in the system (Lagrange et al. 2019b). The second planet β Pic c would induce epicycles in the apparent orbit of β Pic b around the star due to the gravitational influence of the second planet on the orbit of the host star. Using parameters for β Pic c from Lagrange et al. (2019b), the magnitude of these epicycles are several hundred µas, so well detectable by GRAVITY, but hidden beneath the uncertainty of previous astrometry. Thus, they would also bias this single GRAVITY measurement, and continued astrometric monitoring is required to separate out the signal of the separate planet from a possibly eccentric orbit of β Pic b.
However, a moderate eccentricity would fit nicely in the dynamics of the system. An e ≈ 0.15 is consistent with the picture of an eccentric β Pic b launching small bodies towards the star, causing spectroscopic and transiting signatures of exo-comets in observations of the star (Thébault & Beust 2001;Zieba et al. 2019). An interesting question is how such a massive planet acquired a significant eccentricity. The obvious conclusion would point to a second massive planet in the system, such as the radial velocity detected β Pic c (Lagrange et al. 2019b). Otherwise, Dupuy et al. (2019) proposed that if the planet had formed further out and migrated inwards, resonant interactions with the circumstellar disk could pump up its eccentricity to the values we observe today. Characterizing the detailed structure of the circumstellar dust in the system as well as the chemical composition of β Pic b could test this theory.
Generally, the other orbital parameters of β Pic b have already been sufficiently well constrained previously that out results agree with the conclusions drawn in previous works (Millar-Blanchaer et al. 2015;Wang et al. 2016;Lagrange et al. 2019a;Dupuy et al. 2019). We still find that the planet did not transit the star in 2017, and that the Hill sphere of the planet did transit. Assuming a planet mass of 12.9 ± 0.2 M Jup , we find a Hill sphere ingress at MJD 57852 ± 2 (2017 April 8) and a Hill sphere egress at MJD 58163 ± 2 (2018 February 13). The closest approach, which does not require an assumption on the planet's mass, is at MJD 58008 ± 1 (2017 September 11), with the planet passing 8.57 ± 0.13 mas from the star (0.166 ± 0.003 au in projection). The precise astrometry of the GRAVITY epoch post conjunction has significantly improved the transit ephemeris from Wang et al. (2016).

Dynamical mass determination
A significant astrometric acceleration for the star β Pic was detected when comparing its average velocity over the course of the Hipparcos mission and the average velocity inferred by the change in position of the star between the Hipparcos and Gaia missions (Snellen & Brown 2018;Kervella et al. 2019). Assuming this acceleration is due entirely to β Pic b, Snellen & Brown (2018) and Dupuy et al. (2019) used it in conjunction with the visual orbit to measure a dynamical mass for the planet. Snellen & Brown (2018) Table 2. Orbital Parameters of β Pic b. Listed are fits using just astrometry of the planet (Section 3.1) and also including measurements of the stellar orbit for dynamical mass estimates of the planet (Section 3.2). For each fit, the first column lists the 68% credible interval centered about the median. The second column lists the fit with the maximum posterior probability. We note that this the best fit orbit is generally not the best estimate of the true orbit. However, it is useful as a valid representative orbit, whereas using the median of all of the orbital parameters often is not a valid orbit due to complex covariances.  (2018) orbit fit, we include five additional parameters in the fit: the position and proper motion of the star in RA and DEC, as well as the mass of the planet. We also switch the prior on parallax to a uniform prior between 50.24 and 52.64 mas, since the Hipparcos intermediate astrometric data now constrains this parallax. To repeat the Dupuy et al. (2019) analysis, we only fit for changes in the tangential velocity of the host star, so we do not need to fit for its actual position and proper motion. We only include a RV offset and RV jitter term for the stellar RV data. We modified the orbitize! custom likelihood function to include these measurements of the host star, and repeated the orbit fit.
We list the orbital and mass constraints in Table 2, marginalizing over astrometric parameters of the host star and stellar RV calibration numbers in the two fits. We also plot the posterior probabilties for the mass of β Pic b in Figure 3. In the fit using the Hipparcos IAD, the semimajor axis and eccentricity posteriors now favor slightly higher values by 1σ. We find a dynamical mass of β Pic b of 12.7 ± 2.2 M Jup , which is consistent with the values from Snellen & Brown (2018). Conversely, using the recalibrated stellar astrometry from the Brandt (2018) HGCA catalog and the stellar RVs, we find a slightly lower semi-major axis and eccentricity by 1σ than the relative astrometry only fit. Despite these minor differences, all three fits considered in this work favor an eccentricity between 0.1-0.2. We do not find orbital solutions with e > 0.25 as has been suggested by Dupuy et al. (2019). In the HGCA fit, we also find a weaker dynamical mass constraint for β Pic b of 14.2 +3.7 −3.9 M Jup , which is consistent with Dupuy et al. (2019). The Hipparcos IAD method provides more stringent constraints on the planet mass, likely because it has smaller uncertainties. It is unclear whether this better constrain is unbiased, or if the uncertainties are underestimated due to calibration systematics or effects of other planets on the stellar astrometry. However, as seen in Figure 3, both fits agree with each other, and both dynamical masses are consistent with hot-start derived masses of 12.7±0.3 M Jup from Morzinski et al. (2015) and 12.9 ± 0.2 M Jup from Chilcote et al. (2017). More accurate stellar astrometry or RVs are necessary to test hot-start evolutionary models more stringently, given that the model-dependent hot-start masses have an order of magnitude better precision than the dynamical masses.

Previous work
Physical parameters of β Pictoris b have been reported in a number of previous studies (see Table 15 of Morzinski et al. (2015), Table 2 in Chilcote et al. (2017) for a summary of these results). The temperature of the planet has been estimated by several authors, using atmospheric or evolutionary model grid fitting, on photometric and/or spectroscopic data. The most extensive study to date was performed by Chilcote et al. (2017), who obtained GPI spectroscopic data at R 50 in Y, J, H, and K-band, as well as photometric points in different bands ranging from 1 to 5 µm. Using different atmospheric models (BT-Settl: Allard et al., 2012; DRIFT-PHOENIX: Woitke & Helling, 2003, Helling & Woitke, 2006; AMES-DUSTY: Chabrier et al., 2000;Allard et al., 2001), they obtained values ranging from 1650 K to 1800 K for the temperature, and 3.0 to 4.5 for log(g/g 0 ). These values are similar to what is reported in Bonnefoy et al. (2013Bonnefoy et al. ( , 2014, Chilcote et al. (2015), and Morzinski et al. (2015), with the same models.
The lower limit of the range of temperatures estimated comes from Baudino et al. (2015). Using their Exo-REM model grid, and a set of photometric data only (the GPI spectrum was not available at the time), they derived a temperature of 1550 K, and a surface gravity of log(g/g 0 ) = 3.5.

ExoREM atmospheric grid fitting
Using either the GRAVITY K-band spectrum only, or the GRAVITY K-band and GPI YJH bands spectra, we performed a grid model fitting using the newest ExoREM grid (Charnay et al. 2018).
We performed a χ 2 -based grid model fitting on the GRAVITY K-band only data, using the same ExoREM model grid as used to fit the GRAVITY HR 8799 e spectrum in Gravity Collaboration et al. (2019), ranging from 400 to 1800 K in temperature, with a step-size of 50 K, from 3.0 to 5.0 in log(g/g 0 ), with a step-size of 0.2, for a metalicity of [Fe/H] = −0.5, 0, and 0.5, and with a Solar C/O ratio. The best fit was obtained for a Solar metallicity, a temperature of 1750 K, and a log(g/g 0 ) of 3.30. However, this best fit also leads to a mass of 1.3 M Jup , more than 5 σ away from our estimate given in Table 2. To force the result of the fit to be in agreement with our mass estimate, we added a mass prior in the χ 2 calculation. We used a weight for the prior similar to the weight of the entire GRAVITY spectrum: in which F data and F model represent the flux from the data and from the model at the different wavelengths, σ F the error on the data, and m the mass derived from the flux level.
With this new definition of the χ 2 , the same ExoREM grid led to a best fit at T = 1500 K, log(g/g 0 ) = 4.0, for a Solar metallicity. The corresponding planet radius is 1.9 R Jup , and the mass is 14 M Jup , compatible with the estimate of Section 3. However, we find that the fit itself was not very good, with a χ 2 red value of 6.8. The CO region around 2.3 µm was particularly poorly fitted.
The fit was improved by generating a second ExoREM grid, which included the C/O ratio as an additional parameter. The grid was generated on the same range of temperature, surface gravity, and metallicity, for C/O values ranging from 0.3 to 0.8, with a step of 0.05.
Without the mass prior, the new grid yielded a best fit corresponding to a temperature of 1700 ± 50 K, a surface gravity of log(g/g 0 ) = 3.5, a metallicity of -0.5 (the lowest value available in our grid), and a C/O ratio < 0.3. The resulting planet mass remained too low, at 2 M Jup . Adding the mass prior in the definition of the χ 2 , as in Equation 2 led to a temperature of 1550 ± 20 K, a surface gravity log(g/g 0 ) of 4.0, a metallicity of [Fe/H] = 0.5 (highest value from our grid), and a C/O ratio of 0.41 ± 0.05, for a planet mass of 11.5 M Jup , in very good agreement with the result of Section 3. Contraining the fit to Solar metallicity resulted in a very low C/O ratio of 0.3, with similar temperature and surface gravity.
Including the GPI Y, J, and H band data from Chilcote et al. (2017) and allowing for a multiplicative scaling factor between GPI and GRAVITY resulted in a temperature of T = 1590±20 K, with a C/O of 0.43±0.05, for a metallicity of [Fe/H] = 0.5. For reference, the typical multiplicative factors needed to scale the GPI spectra on the ExoREM grid were 0.85 for the Y band, and 0.9 for J and H bands.
The results of these different fits are summarized in Table 3, and the best fit obtained using GRAVITY+GPI and a mass prior is shown in Figure 4.

Retrieval forward model
In addition to fitting a model grid to the β Pic b observation, we carried out a free retrieval. To this end, the spectra were compared to the predictions of a spectral synthesis code, where the atmospheric structure was parametrized. In such an approach more weight is given on atmospheric conditions as constrained by the data, while principles such as radiative-convective equilibrium do not have to be strictly fulfilled. This approach was motivated by the work of Line et al. (2015Line et al. ( , 2017; Zalesky et al. (2019) for clear, and Burningham et al. (2017) for cloudy brown dwarfs, in which the power of free retrievals to constrain condensation and cloud processes has been demonstrated.
Our "forward model", used for predicting the spectra, was constructed using petitRADTRANS . Because the atmosphere of β Pic b is expected to be cloudy, we added scattering to petitRADTRANS. We verified the calculations by comparing to spectra of selfconsistent models for cloudy, self-luminous planets obtained with petitCODE (Mollière et al. 2015, which agreed excellently. One benefit of using a free retrieval is that one of the most uncertain physical processes, namely the formation of clouds, can be parametrized, letting the observations constrain the cloud mass fraction and particle size distribution. A related approach was taken by Burningham et al. (2017), who carried out free retrievals for cloudy brown dwarfs for the first time. Here, we assume that our clouds consist of iron and silicate particles, which fixes the location of the cloud base for a given temperature profile. The cloud parameterization of Burningham et al. (2017) was even more general. One of them retrieved the cloud location (where it becomes optically thick), scale height, the single scattering albedo, as well as the power law slope of the opacity.
For the fits presented here, we parametrized the clouds using the Ackerman & Marley (2001) cloud model. However, in contrast to the usual treatment in grid models (see, e.g. Ackerman & Marley 2001;Marley et al. 2012;Morley et al. 2014;Mollière et al. 2017;Samland et al. 2017;Charnay et al. 2018), we retrieved all of its three parameters. First, the settling parameter f sed , which is the mass-averaged ratio between the settling and mixing velocity of the cloud particles. This determines the decrease of the cloud mass fraction with altitude, which we set to be ∝ P f sed . Second, the atmospheric mixing coefficient K zz , which sets the average particle size, once f sed is fixed. In grid models using the Ackerman & Marley (2001) cloud model, this parameter is usually fixed by mixing length theory (with overshooting) or held constant. Third, the width   of the log-normal particle size distribution σ g , which is normally also kept constant. The cloud mass fraction at the bottom of the cloud was a free parameter, whereas the position of the cloud base was found by intersecting the P -T profile with the saturation vapor pressure curves (taken from Ackerman & Marley 2001, in the corrected pressure units) of the cloud species we considered, Fe and MgSiO 3 .
In the future we plan to also test the Burningham et al. (2017) models of clouds, as they are more general, and do not assume the prevalence of a certain cloud species. Moreover, retrieving the power law slope and albedo of the cloud opacities may represent a better choice: for us this is encoded in our choice of cloud species, particle sizes and width of the log-normal particle size distribution, in a non-trivial way. Based on their findings, Burningham et al. (2017) suggest that a log-normal particle size distribution may not be the ideal choice, and that a Hansen distribution (Hansen 1971) may be better.
While carrying out verification retrievals of cloudy pe-titCODE spectra, we found that we had to be very careful with how the temperature was parametrized. If the temperature model was too flexible (e.g., independent layers + p-spline interpolation, as used in Line et al. 2015), test retrievals of cloudy synthetic spectra lead to clear, hot atmospheres with shallow temperature gradients, that well matched the synthetic input spectrum, but were inconsistent with the input temperature and cloud structure. This could indicate that the cloud-free solutions occupied a larger prior volume, and were thus favored when using a Markov Chain Monte Carlo (MCMC) retrieval.
Specifically, we found it to be necessary to impose a temperature profile in the photospheric region that follows the Eddington approximation, that is where T 0 is normally the internal temperature (taken to be a free nuisance parameter here) and τ the optical depth. This shape was used from τ = 0.1 to the radiativeconvective boundary, below which we forced the atmosphere onto a moist adiabat. The optical depth was modeled via where δ and α are free parameters. A quite strict prior was imposed on α. We rejected all models where |α −α| > 0.1, whereα is the power law index measured from the opacity structure of a given forward model realization. It was obtained from estimating the Rosseland mean opacity using the non-gray opacity of the atmosphere, across the spectral range of the observations. These altitude-dependent values were then used to calculate an optical depthτ , and from thisα = dlogτ dlogP .
Here denotes the average over the photospheric region. This prior ensures that the parametrized, pressuredependent opacity is consistent with the atmosphere's non-gray opacity structure. In future applications of the  Table 3. Results obtained with the ExoREM model grid and free parameter retrieval petitRADTRANS. (*) Using a mass prior in the fit. (a) Mean value of 100 posterior samples, assuming 17 free parameters, using the GRAVITY covariance matrix. (b) Mean value of 100 posterior samples, assuming 21 free parameters, using the GRAVITY covariance matrix.
parametrized P -T we will test to not downright reject models with too large |α −α|. Instead one could adapt the loglikelihood by adding and fitting for σ α as a free parameter. Moreover, other P-T parametrizations, for example that of Madhusudhan & Seager (2009), should be tested. This parametrization was also used in Burningham et al. (2017).
In order to prevent the location of the Eddington photosphere to be unrealistically deep in the atmosphere, we also rejected models where Above the photosphere the temperature was freely variable. We modeled these high altitudes by retrieving the temperature of three locations spaced equidistantly in log(P ) space, and spline interpolating between them. The chemical abundances and moist adiabat of the atmosphere were found by interpolating in a chemical equilibrium table which contained these quantities as a function of T , P , C/O and [Fe/H]. This table was calculated with the equilibrium chemistry code described in ). In addition, we also retrieved a quench pressure P quench . At pressures smaller than P quench the abundances of CH 4 , H 2 O and CO were held constant, so as to model the effect of chemical quenching in regions where the chemical reaction timescales become longer than the mixing timescales (see, e.g., Zahnle & Marley 2014).
Using the setup described above, we were able to successfully retrieve the spectrum and atmospheric parameters for a synthetic observation of a cloudy, self-consistent model obtained with petitCODE. The implementation of the retrieval forward model presented here will be described in detail in an upcoming paper. It will contain a description of how the scattering was added, and the verification thereof, as well as the verification retrieval test.
The parameter estimation was carried out using emcee (Foreman-Mackey et al. 2013). Due to the complex priors resulting from the temperature parametrization, the high dimensionality, and a potentially multimodal posterior, the acceptance fraction is low (of the order of 1-2 %) such that one million models were drawn (started around the best-fit position of a pre-burn) to obtain results where the walker positions had converged. As is common for all parameter estimations using an MCMC method, we cannot guarantee that the retrieval results have converged to the true global maximum of the log-probability. While the multi-modality of our model is an inherent property, the acceptance fraction can be improved by setting up the chain closely around the best-fit position of the pre-burnin run (Foreman-Mackey et al. 2013). In addition, we are currently working on implementing a parameter estimation using nested sampling, which also applies a clustering algorithm for the parameter estimation. This should alleviate the low acceptance rate problem, and lead to a complete sampling of the parameter space (Feroz & Hobson 2008;Feroz et al. 2009). With these current limitations in mind, we note that we retrieved similar values as in the grid retrieval with Exo-REM in Section 4.2, and could successfully retrieve self-consistent cloudy input models when testing our method.

Retrieval parameter results
Our forward model has 17 free parameters: 6 for the temperature model described above, 3 for the abundances (C/O, [Fe/H], P quench ), 5 for the clouds (the cloud mass fraction of the MgSiO 3 and Fe at the cloud base, the settling parameter f sed , the eddy diffusion coefficient K zz , the width of the log-normal particle size distribution σ g ), the gravity log(g), the planetary radius R Pl , and the abundance of FeH (currently not included in the chemical table). We used uniform or log-uniform priors for all parameters. In addition to the parameters above we allowed for an individual scaling of the GPI (Y, J, H) bands by up to ±50 %, and by up to ±2.5 % for the GRAVITY data. A large value for the scaling of the GPI data was chosen because a similar scaling value was found when comparing the GPI and SPHERE J-band for 51 Eri b in Samland et al. (2017). The maximum scaling we retrieve for β Pic b is 13 % in the GPI Y-band, see below.
Similar to the Exo-REM analysis in Section 4.2, we ran retrievals for a GRAVITY only and GRAVITY + GPI case. In Figure 5 we show the results when fitting the GRAVITY and GPI data together, but without imposing a prior on the mass of β Pic b. For producing this plot, we sampled the posterior distribution 100 times, and plot both the spectra and the accordingly scaled data points. We give the me-  As can be seen in Figure 5, the GRAVITY data can be well fit. At least two CO bandheads at ∼2.3 micron are visible in the data. The GPI data is less well fit, which may be partially due to the high S/N of the GRAVITY data, which dominates the fit. We found that the fit of the GPI data improved when increasing the error bars of the GRAVITY data. Likewise, we found that the slope in the red part of the GRAVITY spectrum can be fit better if the GPI data are neglected. The retrieved parameters are presented in Table 3, together with the ExoRem results for comparison. Most interestingly, petitRADTRANS retrieves a mass which is consistent with the values from the astrometric measurement in Section 3, without the need of imposing a prior on the mass. Here we find M P = 15.43 +2.91 −2.79 M Jup , which is consistent with the values presented in Table 2.
From the retrievals presented here, it appears that the GRAVITY K-band data are useful for constraining the planetary C/O ratio, while adding the GPI Y, J, and Hbands is important for obtaining better constraints on the planet's gravity, and hence mass (petitRADTRANS retrieves a too low mass if the GPI data are neglected, see Section 4.4 below). This is consistent with the sensitivity of the NIR YJH bands to gravity (atomic and molecular features, as well as the H band shape), while the CO absorption in the K band is not expected to probe the surface gravity very well (see, e.g., Béjar & Martín 2018). As demonstrated here, estimating the planetary mass from a planet's spectrum alone may become feasible when applying free retrievals over a broad spectral range, such as carried out here with petitRADTRANS.

Comparison between the grid and free retrieval
In agreement with the Exo-REM fit for GRAVITY+GPI, petitRADTRANS obtains a cloudy atmosphere, and a slightly sub-solar C/O of 0.43 +0.04 −0.03 (Exo-REM found 0.43 ± 0.05). The free retrieval obtains a metallicity of 0.68 +0.11 −0.08 . This is higher than Exo-REM, where 0.5 was found, but this was at the boundaries of the Exo-REM grid, and could likely be higher. This could also be the reason for the slightly higher log(g/g 0 ) value (4.34 +0.08 −0.09 , and 4 for Exo-REM), due to the gravity-metallicity correlation. pe-titRADTRANS finds an effective temperature which is higher than in the Exo-REM fit by about 150 K (1742 ± 10 K, compared to 1590 K for Exo-REM). The larger radius found by Exo-REM is likely due to the lower temperature it retrieved, so as to conserve the total amount of flux. At the estimated age of β Pic (24 ± 3 Myr; see Bell et al. 2015), a radius of 1.7 R Jup (the value Exo-REM retrieved) requires masses in excess of 20 M Jup , and effective temperatures of around 2500 K, when considering hot start models (Spiegel et al. 2011). Core accretion models under the warm 2 start assumption, which include deuterium burning, require similarly large masses and temperatures, but potentially somewhat younger ages (Mollière & Mordasini 2012;Mordasini et al. 2017), and would put the planet firmly into the mass regime of brown dwarfs currently undergoing deuterium burning. Hence, the large radius retrieved by Exo-REM is likely to be inconsistent with the retrieved mass and temperature. The values of the mass, temperature, and radius (1.36 R Jup ) retrieved by petitRADTRANS agree with both the cold and hot start predictions, given the age of β Pic.
Also when fitting only the GRAVITY data, the pe-titRADTRANS results are mostly consistent with Exo-REM. Without a mass prior we find C/O = 0.35 +0.07 Only the temperature is larger again, at 1847 ± 55 K (petitRAD-TRANS), compared to 1700 K (Exo-REM).
In summary, a free retrieval approach gives similar results to a more classical retrieval from a grid of forward models. We note that here a free retrieval appears to lead to physically more consistent results when constraining radii and effective temperatures. Another possible cause for the differences could be how the opacities of gas and clouds are treated. For the gas opacities we note that petitRAD-TRANS uses the opacity database of petitCODE, the latter of which was successfully benchmarked with Exo-REM in Baudino et al. (2017). Small remaining differences, identified to stem from the use of different line lists in Baudino et al. (2017), have since been removed by updating the opacity database of petitCODE/petitRADTRANS in 2017.

Comparison to Chilcote et al. (2017)
A substantial analysis of the NIR spectrocopy of β Pic b was carried out in Chilcote et al. (2017), using GPI YJHK band spectra. The data were compared to low gravity and field brown dwarf spectra, the derived bolometric luminosity was compared to evolutionary models, and spectral fits were carried out with four different model grids.
Comparing their bolometric luminosity to evolutionary models (hot start models of Baraffe et al. 2003), Chilcote et al. (2017) found a mass of 12.9 ± 0.2 M Jup , an effective temperature of 1724 ± 15 K, a surface gravity of log(g/g 0 ) = 4.18 ± 0.01 and a radius of 1.46 ± 0.01 R Jup . Their mass measurement is consistent both with our astrometrically and spectroscopically inferred mass values. Moreover, the other values inferred from the the YJHK fit of petitRADTRANS are close to the evolutionary values of Chilcote et al. (2017), but not within the uncertainties of one another (e.g., 1742 ± 10 K vs. 1724 ± 15 K). As noted in Chilcote et al. (2017), these uncertainties do not contain a contribution of the model uncertainties, and the true uncertainties must be larger. The same holds for our retrievals and fits carried out here. The Exo-REM fits with mass prior lead to a similar agreement in gravity, but the radii and temperatures are further away from the Chilcote et al. (2017) values, with the planet being cooler, and thus larger, in the Exo-REM fits.
The best grid model fit of the combined photometry and spectroscopy in Chilcote et al. (2017) was obtained with Drift-PHOENIX (e.g., Helling et al. 2008), where T eff = 1651 K, log(g) = 3 and R = 1.58 R Jup was found, leading to a mass of ∼1 M Jup . The AMES-Dusty (e.g., Allard et al. 2001) fit gave the highest mass (most consistent with our astrometric, spectroscopic and Chilcote et al. 2017's evolutionary mass), namely 17 M Jup . The AMES-Dusty best-fit values are also closer to the evolutionary parameters derived in Chilcote et al. (2017), at T eff = 1706 K, log(g) = 4.5 and R = 1.18 R Jup , but at an overall worse χ 2 than the Drift-PHOENIX fit.
In summary, the results of our spectral characterization compare well with the evolutionary values inferred for β Pic b in Chilcote et al. (2017). Especially the parameter values of the free retrieval carried out with peti-tRADTRANS are close to the evolutionary values. It is also noteworthy that our masses, inferred indepentently with astrometry or spectral retrieval, are consistent with the evolutionary mass of Chilcote et al. (2017). Holweger et al. (1997) have shown that the abundances of several elements (C, Ca, Ti, Cr, Fe, Sr, Ba) on the surface of β Pictoris are Solar. But measuring the abundance of oxygen in stars is a notoriously difficult task, due to line blending, deviations from local thermal equilibrium predictions, or sensitivity to the 3D temperature structure of the star (Asplund 2005). As a consequence, to our knowledge, the abundance of oxygen -and hence the C/O ratio -has not yet been reported in the literature. We note, though, that a subsolar C/O ratio (i.e., 0.4) would invalidate most of the following discussion.

Stellar and planetary C/O ratio
In our atmosphere analysis, both the ExoREM grid model fitting and the petitRADTRANS free retrieval point to the same result: the C/O number ratio in β Pictoris b is 0.43 ± 0.05, which is subsolar (the solar C/O ratio is 0.55, see Asplund et al., 2009).
A number of studies have been done in the recent years to try to find links between planetary formation processes   (2011) first attempted to relate the C/O ratio to the position of the different icelines in a protoplanetary system, and to the proportion of gas and solid material accreted by a young planet. They concluded that substellar C/O ratio was a sign of a formation by either gravitational collapse or coreaccretion, followed by icy planetesimal enrichment. The objective of this section is to show that the C/O ratio can possibly be used to disentangle the two formation scenarios.

General model for the evolution of the C/O ratio
In a similar fashion as toÖberg et al. (2011), we assume that the main sources of carbon and oxygen in the protoplanetary disk in which β Pic b formed were CO, CO 2 , H 2 O, silicates and carbon grains. Assuming a solar C/O ratio for the star, the table of relative abundances given in Oberg et al. (2011) is valid, and we use it as a baseline to set the abundances of each species (see Table 4).
In the framework developed byÖberg et al. (2011), the C/O ratio in the atmosphere of a planet can be calculated from the amount of solid and gaseous material entering its composition. We denote n X,s (resp. n X,g ) the abundance of element X in the solid phase (resp. gas phase) of the disk, given in number of atoms per unit of disk mass. We also write M solid (resp. M gas ) the total mass of solid (resp. gas) entering the composition of the atmosphere of the planet, and f s/g the dust-to-gas fraction in the disk, which we assume to be equal to 0.01. With these notations, the total number of elements X in the atmosphere of the planet is given by: And the C/O number ratio is then: Note that both the numerator and the denominator can be given relative to a reference species without affecting the validity of this Eq. (9). In Table 4 and in all the following, we implictly use abundances relative to H 2 O.
The exact values of n C,s , n C,g , n O,s , and n O,g depends on the abundances given in Table 4, and on the state (solid or gaseous) of each species. and hence on the location of the forming planet with respect to the different icelines.
Using ALMA observations, Qi et al. (2015) have shown that the CO iceline in the disk around HD 163296 was likely to be located at 90 AU from the star. Other observations of the same system, also performed with ALMA, led Notsu et al. (2019) to conclude that the water iceline was located at a distance of ≤ 20 AU. Since HD 163296 is also an A-type star, these two values give an idea of the possible location of the H 2 O and CO icelines in the β Pic system. However, little is known about the relationship between the current orbit of β Pic b and its exact formation location, and about possible variations of the locations of these icelines between systems. Thus, no definitive assumption can be made as to where the planet formed in comparison to the water iceline, and the two options must be considered: a formation within the water iceline, and a formation between the water and the CO 2 icelines.
From there, the terms n C,s , n C,g , n O,s , and n O,g from Eq. (9) can be determined from the values listed in Table 4. For a planet forming within the water iceline, we have: n O,g = n H2O + n CO + 2 × n CO2 = 3.33 n O,s = n O (silicates) = 2.12 n C,g = n CO + n CO2 = 2.0 n C,s = n C (grains) = 0.67 (10) And for a planet forming between the water and CO 2 icelines: n O,g = n CO + 2 × n CO2 = 2.33 n O,s = n H2O + n O (silicates) = 3.12 n C,g = n CO + n CO2 = 2.0 n C,s = n C (grains) = 0.67 (11)

C/O ratio in the gravitational collapse paradigm
We consider the case of a formation through gravitational collapse (Bodenheimer 1974), a violent mechanism which shares similarities with star formation. In this scenario, an entire region of the circumstellar disk becomes unstable, and rapidly collapses to form a protoplanet, which then slowly contracts and cools down. The total mass of solid entering in the composition of the atmosphere of a planet formed through gravitational collapse can be separated in two terms: the mass of solid initially contained in the disk fragment which collapsed to create the protoplanet, and the mass of solid planetesimals later accreted by the protoplanet. The solid mass contained in the initial clump is directly related to the dust-to-gas ratio of the disk, and we can write: This equation assumes that no core has formed in the young protoplanet, which, for a planet as massive as β Pic b is reasonable (Helled & Schubert 2008). For a planet less massive, for which a core could form, sedimentation of a fraction of the initial solid mass on the core should be taken into account.
Injecting the definition of M solid into Eq. (9), and using f s/g = 0.01, M planet = 12.7 M Jup , as well as the values for C and O abundances given in Eq. (10) or (11), it is possible to determine the C/O ratio as a function of the mass of accreted planetesimals M accreted in the gravitational collapse paradigm. The results is given in Figure 6, for two possible formation locations: within the water iceline, and between the water and carbon dioxide icelines. We have also added the 1 σ confidence intervals of our ExoREM and petitRADTRANS measurements on this graph. This figure shows that a formation bewteen the H 2 O and CO 2 icelines is more favorable to a large deviation from the stellar C/O ratio, mainly due to the injection of oxygen coming from solid water ice during planetesimal accretion.
The formation of a planet by gravitational instability can be separated in a few different steps (Bodenheimer 1974): formation of the initial clump in the disk, quasiequilibrium contraction, hydrodynamic collapse, and a new hydrostatic quasi-equilibrium phase. Accretion of planetesimals is thought to be efficient only during the pre-collapse phase (Helled & Schubert 2009). The duration of this phase decreases with increasing planet mass, and typical values ranges from a few 10 5 years for a Jupiter mass planet, to less than 10 3 years for more massive planets (Decampli & Cameron 1979;Bodenheimer et al. 1980). Using the model proposed by Helled & Schubert (2009), the mass of planetesimal accreted during the pre-collapse phase of β Pic b can be estimated using: Where t collapse is the time of collapse, R capture the protoplanet's capture radius, σ the surface density of solids in the disk at the location of the protoplanet, and Ω the orbital frequency. Andrews & Williams (2005) presented a large survey of 153 young stellar objects in the Taurus-Auriga star forming region. Among all these objects, AB Aur and V892 Tau are two A-type stars, for which they give an estimate of the mass: 0.004 M and 0.009 M . Considering all stellar types, the median disk-to-star mass ratio they found is 0.5%. More recent studies of protoplanetray disk demographics based on ALMA observations yielded similar results, with typical dust to star mass ratios of 10 −4.5 (Pascucci et al. 2016;Ansdell et al. 2017), i.e. disk-to-star mass ratios of 0.3%, assuming a dust-to-gas ratio of 1%.
Considering the upper limit of an extremely massive disk (M disk = 0.1 M ), and using a power-law for the surface density (σ = σ 0 (a/5 AU) −α , with α = 1.00), the solid density at a = 11 AU is: The orbital period of the planet is ∼ 20 yr (Wang et al. 2016;Lagrange et al. 2018, Section 3 of this work). The capture radius decreases with the contraction of the planet, but an optimistic value would be 2 to 3 × 10 12 cm for a 1 M Jup planet (Helled et al. 2006). For a planet 10 times more massive, the effective radius could be 5 × 10 12 cm. This yields: The corresponding accretion limit has been added to Figure 6, for a reasonable assumption of t collapse = 10 3 yr. Taking into account the effective time available for efficient planetesimal accretion during the pre-collapse stage, the low C/O ratio measured with GRAVITY is difficult to explaine, even in the case of a planet forming oustide the H 2 O iceline. For the C/O ratio to reach a value of 0.43, we need to assume a massive protoplanetary disk and an unusually long time for the pre-collapse phase, or an extremely efficient accretion (with an accretion rate of 4 × 10 −3 M Earth /yr).

C/O ratio in the core-accretion paradigm
Core-accretion is another formation mechanism, in which an initial solid core forms, and slowly accretes gas from the disk. When the mass of gas is roughly the same as the mass of the core, the protoplanet enters a phase of "runaway gas accretion", during which it gains significant amount of gas over a short time (Lissauer & Stevenson 2007). In this scenario, the formation of a planet is a much longer process than with gravitational instability, which gives more time to enrich the proto-atmosphere in solid material and to lower its C/O ratio. Mordasini et al. (2016) explored the effect of planetesimal enrichment coupled with disk composition, in a coreaccretion scenario. They focused on the case of Jupiter mass planets migrating to short period orbits ("hot Jupiters"), which is a different archetype than β Pic b. But the general sequence of events they use to form their planets in the core-accretion paradigm can still be applied to β Pic b, only leaving out the inward migration part. First, the core of the planet forms from the accretion of solid material. Then, once the core has formed, the protoplanet starts accreting a gaseous envelope which, during its formation, is enriched by the accretion of disintegrating planetesimals. When the planet reaches a critical mass, runaway accretion occurs, and the mass of the planet significantly increases. This runaway gas accretion clears a gap in the disk, and ends the formation of the planet.
In the gravitational instability scenario, because the formation of the planet happens so quickly compared to typical timescales of disk evolution, the gas and solid making the atmosphere necessarily have a stellar combined C/O. If the solid and gas in the atmosphere are in the same proportion as they are in the disk (M solid = f s/g M gas ), the C/O of the atmosphere is stellar. A deviation of the solid to gas proportion in the atmosphere is required to alter the C/O ratio.
In the case of core-accretion, the situation is different. Witout planetismal enrichment before the runaway gas accretion phase, the atmosphere of the planet would not be made of a mixture of gas and solid material, but purely of gas. Thus, without planetesimal enrichment, the atmospheric C/O ratio in the core-accretion paradigm can be expected to be close to the C/O ratio of the gas in the disk, that is, superstellar.
In this core-accretion paradigm, it is still possible to use Eq. (9) to calculate how the final C/O ratio of the atmo-sphere is impacted by the mass of solid material accreted before the runaway accretion phase. But in this case, all of the solid mass M solid corresponds to accreted material: M solid = M accreted , as opposed to Eq. (12).
In Figure 7, we show the evolution of the C/O ratio as a function of the mass of accreted planetesimals, for a formation thourgh core-accretion, within the water iceline, or between the water and CO 2 icelines. In this scenario, it is possible to reach C/O values compatible with our GRAVITY measurement with accretion of 80 M Earth , if the planet formed between the water and CO 2 icelines. A formation within the water iceline is more diffcult to explain, as it would require at least 150 M Earth of solid material enrichment to reach the upper limit of the 1 σ interval on the C/O measurement, and up to several 10 2 M Earth to reach a value of 0.43.

Summary and conclusions
In this work, we presented the first VLTI/GRAVITY spectro-interferometric observation of the giant planet β Pictoris b. Using an adequate data reduction technique detailed in the appendix of this paper, we extracted a high quality K-band spectrum of the planet, at a resolution of R = 500. We also derived the most precise relative astrometry obtained to date on this object, with an error of 40 µas.
We find that the astrometry disfavors circular orbits for β Pic b, with a value of e 0.15 +0.05 −0.04 . It remains unclear how a massive planet like β Pic b can acquire such a significant eccentricity. Using this new astrometric datapoint together with previously published visual astrometry and Hipparcos/Gaia data, we were able to derive an estimate of the dynamical mass of β Pictoris b, in a similar fashion as to what Snellen & Brown (2018) and Dupuy et al. (2019) did. Our value is compatible with these previous studies, with a best estimate of 12.7 ± 2.2 M Jup .
We were also able to retrieve a similar mass, albeit with larger error bars, using only the spectral data. Using a free retrieval, including the effect of scattering and clouds, with petitRADTRANS (Mollière & Snellen 2019) to fit the spectrum of β Pic b in Y, J, H, and K bands (Y, J, H from Chilcote et al. 2017, K from this work), we obtained a mass of 15.43 +2.91 −2.79 M Jup . This constitutes a rare case of validation of an atmospheric model with a model-independent measurement.
We performed an in-depth analysis of the K-band spectrum extracted from our GRAVITY observation using two different approaches: forward modeling with the ExoREM code (Charnay et al. 2018), and free-retrieval with peti-tRADTRANS. We found that both approaches point to a C/O ratio of C/O = 0.43 ± 0.05.
We showed that, if the C/O ratio of the host star β Pictoris is Solar, it is difficult to explain this C/O ratio with a gravitational collapse formation scenario. This is mainly due to the high mass of β Pictoris b, which has the dual consequence of requiring large amount of planetesimal enrichment to lower the initial C/O ratio, while at the same time making the whole formation process extremely short. In this case, it appears that a slower formation via coreaccretion, somewhere between the H 2 O and CO 2 icelines, is more likely. This scenario can potentially explain the subsolar C/O ratio if the planet was enriched in oxygen by icy planetesimal accretion.
The high metal enrichment we retrieve from the spectral fits appears to corroborate this assessment, with the exact value being quite high and at the edge of what is expected from classical core accretion Mordasini et al. (2016).
This model still comes with several important limitations. One of them is that the exact compositition of the initial protoplanetary disk around β Pic remains largely unknown. Another major issue is the efficiency of the planetesimal enrichment, which we have assumed to be of 100% (i.e., all the solid material accreted by the planet is disintegrated in the atmosphere). This is unlikely to be the case, as fraction of this material can be deposited into the planetary core, or can stay at the bottom of the atmosphere. This is particularly true for the core-accretion scenario, in which the solid material is accreted before most of the gas (Mordasini et al. 2016). Strong vertical mixing can potentially mitigate this problem, but further studies are required to be able to take into account these phenomena. Finally, disk chemistry may also play a role. For example, Eistrup et al. (2018) have shown that a large fraction of water molecules can be transformed into dioxygen (O 2 ) over a few Myr, along a chemical pathway detailed in Walsh et al. (2015). Oustide of the water iceline, such a chemical evolution can potentially deplete the solid material from its oxygen, while enriching the gas.
The observations of β Pictoris b presented in this paper show the potential of long-baseline optical interferometry with VLTI/GRAVITY for exoplanet science. The instrument gives access to medium resolution spectroscopy in K-band and high-precision astrometry, which are both extremely useful to characterise giant exoplanets and to start peering into their formation history. The data reduction used to extract the beta Pictoris b signal from the GRAVITY observations makes heavy use of complex linear algebra, complex error formalism, and maximum likelihood estimation. To avoid confusion and mistakes, complex numbers in this appendix are underlined (e.g., V ), whereas real numbers are not (e.g., X).
Most of the GRAVITY data manipulated are quantities which depends on the wavelength λ. These quantities can be represented as vectors of size n λ (the number of wavelength channels) by concatenating the individual values. These vectors are denoted using a bold font. For example, in the case of the complex visibility obtained on baseline b at time t, we denote: For a given DIT, it is also possible to concatenate all baselines to create a vector of size n b × n λ , where n b = 6 is the number of baselines. In this case, the subscript b is dropped: The complex-conjugate of a complex number V is denoted V * , and the complex-transpose of a vector or matrix A is denoted A † . It is defined by: A † = A * T where T is the transpose operator.
All the λ-vectors are understood as elements of a n λdimension complex linear space (i.e. a linear space for which the scalar field is the set of complex numbers C, rather than the set of real numbers R). Adding the natural scalar product operator (i.e. V 1 , V 2 = V † 1 V 2 ) makes this linear space an Euclidean space. This mathematical structure allows for several useful concepts: it is possible to compute othogonal projections, to use projector matrices, to define othogonal and/or orthonormal basis, etc.
The data set can be subdivided into two parts: the observations taken with the science fiber on the planet, and the observations taken on the star (see observing log in Table 1). On-planet and on-star phase-referenced visibilities are calculated from the coherent fluxes measured by GRAVITY, called VISDATA in the FITS files generated by the pipeline. The VISDATA are complex numbers, affected by noise. The GRAVITY pipeline reports these errors in another set of complex numbers, called VISERR. The real part of VISERR contains the uncertainties on the real part of VISDATA, and the imaginary part of VISERR contains the uncertainty on the imaginary part of VISDATA. These errors do not take into account any possible correlation between different spectral channels, or between the real and imaginary parts of the visibility. To take into account such correlations, it is necessary to use the covariance/pseudocovariance formalism of complex random variables.
In our data reduction algorithm, the GRAVITY pipeline errors are systematically replaced by an empirical estimate of the covariance and pseudo-covariance matrices of the visibilities. We assume that the noise affecting the measurements does not vary significantly over the individual DITs of a single exposure file (∼ 5 min), but can vary from file to file. We also allow for correlations between different spectral channels and/or between different baselines. Under these assumptions, the errors on the coherent fluxes are best represented by a set of n EXP (the number of exposure files) covariance matrices W k and n EXP pseudo-covariance matrices Z k , both of size n b × n λ , where n b = 6 is the number of baselines and n λ = 235 is the number of wavelength channels. The covariance and pseudo-covariance matrices for each exposure file are estimated directly from the DITs sequence: where the dummy t runs over the n DIT DITs of the k-th exposure.
The covariance and pseudo-covariance matrices are always related to the covariance of the real and imaginary parts by the following equations: The covariance and pseudo-covariance matrices can be propagated during the data reduction algorithm by using the complex error propagation equations: with A any complex matrix of appropriate size. The W and Z matrices can also be used to resolve linear equations involving complex data. In the case of an unknown real parameter vector X, the solution of the linear problem V = AX (in the sense of maximum likelihood) is:X For a complete mathematical derivation of Eq. (A.9), we refer the reader to Appendix B of Nowak (2019).

A.2. Pipeline reduction and phase referencing
The initial step uses the pipeline reduction and is common to all VLTI/GRAVITY observations. It consists in extracting the complex visibilities from the raw data, using the ESO pipeline (Lapeyrere et al. 2014). The pipeline takes care of the background subtraction, flat-field correction, bad-pixel interpolation and P2VM multiplication (Tatulli et al. 2007). It also corrects the phase of the visibilities using the metrology data, and combines all DITs within each exposure. This last step performed by the pipeline (averaging of all DITs within each exposure file) is unwanted for exoplanet observations (see Section A.1). Thus, for the β Pic b observations, an intermediate file product generated by the pipeline is used: the "astrored" files, in which all DITs are kept separate. The complex visibilities contained in the "astrored" files are not corrected for the metrology and fringe-tracker zero-point, and thus the correction must be applied manually (see recipe in Nowak, 2019).
For each baseline b, and each time t (i.e. for each DIT), the wavelength-dependent complex visibility VISDATA onstar and VISDATA onplanet extracted by the pipeline (and with the above-mentionned corrections) are then "phase-referenced" to the star: where arg (VISDATA onstar ) is the phase of the stellar complex visibility as measured by GRAVITY when the science fiber is positioned on the star. This phase-referencing step is performed both on the star exposures, and on the planet exposures. When dealing with a star exposure, phase-referencing the visibility is mathematically equivalent to extracting the modulus of the visibility. But when dealing with on-planet exposures, a problem arises: the instrument does not simultaneously observe both the planet and the star. Thus, the quantity arg (VISDATA onstar ) must be estimated from the available star exposures. In the observing strategy used for acquiring the β Pictoris data reported here, a star exposure was performed before and after each on-planet exposure. For each on-planet exposure, the phase reference is then simply estimated by taking the phase of the stellar complex visibily averaged on these two star exposures (before and after the on-planet observation).

A.3. A model for the on-planet visibility
In the absence of stellar flux, the on-planet visibility measured by the instrument and phase-referenced to the star can be written: in which V planet is the planet astrophysical visibility phasereferenced to the star, and G is the instrumental response. We note that the visibilities are not calibrated, meaning that the visibility at zero frequency is not 1, but the unnormalized flux. Therefore, as long as the planet remains unresolved by the instrument, its astrophysical visibility is given by: in which (U, V ) are the coordinates of baseline b in the UVplane, (∆α, ∆δ) the sky-coordinates of the planet relative to the star, and S planet (λ) the spectrum of the planet. Given the typical VLTI baseline lengths (between 45 and 130 m with the UTs), and the expected β Pic b planetto-star separation at time of observation ( 140 mas), the exponential term in the above equation should produce significant oscillations of the complex visibility over the GRAVITY wavelength range (1.9 to 2.35 µm). But the phase-referenced on-planet visibilities extracted from our β Pic b observations show no such oscillations. The reason is that the data are dominated by remaining starlight, which needs to be taken into account.
To take into account the coherent starlight leaking into the fiber, Eq. (A.15) must be modified with an additional term, proportional to the stellar phase-referenced visibility V star . In practice, since the leaking starlight does not originate in the direct coupling of the star to the fiber, but rather in the coupling of speckle noise to the fiber, this term needs to be multiplied by a polynomial in λ to account for its chromaticity. The model is now given by: with λ → Q(b, t, λ) a polynomial function in λ, whose coefficients vary with baseline b and time t.
If the star is not resolved by the instrument, its astrophysical phase-referenced visibility corresponds to its spectrum. If the star is partially resolved by the instrument, the spectrum needs to be multiplied by a term accounting for the resulting drop in visibility, which depends on the angular size of the star, limb-darkening model, etc. Explicitly separating these two terms, the referenced stellar astrophysical visibility can be written using the following equation, in which S (λ) is the star spectrum, and J a function accounting for the visibility drop due to the star geometry (typically, J is a bessel function of first order): Going back to Eq. (A.17), the planet term in the righthand side can be factored by V by introducing the planetto-star contrast spectrum C(λ) = S planet (λ)/S (λ): The on-star equivalent of Eq. (A.17) is simpler, as the reference visibility observed on-star only depends on the stellar referenced visibility and the instrumental response: This provides a natural way to estimate the term GV star in Eq. (A.19), and thus to calibrate V onplanet : Equation (A.21) shows how the physical quantities of interest (i.e. the contrast spectrum C(λ) and the planet separation ∆α, ∆δ) are encoded in the on-planet data. Even taking into account the filtering of the starlight by the offaxis fiber, as well as the only partly coherent nature of the speckle noise, the polynomial Q in the right-hand side of Eq. (A.21), which model the stellar residuals, is still a factor 20 to 30 superior to the planet signal. It is only because of the phase modulation naturally introduced by the planet separation that this planet signal can be retrieved. To do so, we proceed in two steps: we first extract the star-planet separation vector under some hypothesis on the contrast spectrum, and we then extract the contrast spectrum using the estimated separation vector. The two steps are iterated on time, to check the consistency of the results.
In matrix notations, the multiplications by G, J −1 , the exponential, or even the polynomial Q can all be represented by diagonal-matrix multiplications. We write: where m is the order of the polynomial Q, and the a k s are complex coefficients used to describe the polynomial. The vector C is defined from C(λ) using the notations introduced in Section A.1, 1 is a column vector filled with 1's, and the matrices Λ, J −1 , and Φ are all diagonal matrices of size n λ × n λ defined by:

A.4. Extracting the astrometry
At the initial iteration, the planet to star contrast spectrum C in Eq. (A.22) can most generally be replaced by a flat spectrum. In the case of β Pictoris b, the temperature and surface gravity of the planet are known from previous work (Chilcote et al. 2017). Thus, the contrast spectrum C can be set to a model value. We use a BT-Settl model (Baraffe et al. 2015), at T = 1700 K and log (g/g 0 ) = 4.0 (planet spectrum), divided by a BT-NextGen model (Hauschildt et al. 1999) at T = 8000 K and log(g/g 0 ) = 4.0 (the star). The contrast spectrum being set to a pre-determined value, Eq. (A.22) becomes a model at (m + 1) × n b n DIT complex parameters (the a k s), and 2 real parameters (∆α and ∆δ). Interestingly, this model is linear in all the a k , and nonlinear in ∆α, ∆δ. To fully benefit from this for the modelfitting, Eq. (A.22) can be re-arranged in a pseudo matrix form, with real parameters. Introducing: And A ∆α,∆δ b,t defined column by column: We have: For a given ∆α and ∆δ, the corresponding best estimate of x b,t (in the sense of the maximum likelihood) is given by: where W 2 is a matrix composed of the covariance and pseudo covariance matrices of U b,t as defined in Eq. (A.11).
The log-likelihood log L b,t (∆α, ∆β), restricted to baseline b, DIT t, and to the nonlinear parameter ∆α and ∆δ is then given by the following equation, in which the dependance in ∆α, ∆δ of the right-hand side is implicit in the definition of A andx.
28) The total log-likelihood can be obtained by summing over all baselines b and DITs t: With the expression of A b,t from Eq. (A.25) andx from Eq. (A.27), this gives a closed-form expression of the loglikelihood in ∆α, ∆δ, from which a map can be calculated, in order to extract the best estimate with the associated error bars.

A.5. Extracting the spectrum
Extracting the spectrum from the GRAVITY observations is more difficult than extracting the astrometry for two reasons: first, due to the dimensionality of the problem (n λ > 200), a log L map approach is impractical; second, the stellar residuals affecting the on-planet visibility can lead to a degenerated solution for the contrast spectrum C.
The impact of the stellar residuals on the contrast spectrum can be quantified by using Eq. (A.22) again. The calculations in the rest of this section are simpler when considering all visibilities shifted to the planet position. To do so, we multiply both sides of Eq. (A.22) by the inverse of Φ ∆α,∆δ b,t . From now on, we will denoteṼ b,t any "shifted" visibility V b,t . Eq. (A.22) becomes: From there, we can introduce the subspace C m [Λ]1 of C n λ , defined as the subspace generated by the family of m+ 1 vectors Λ 01 , . . . , Λ m1 (i.e., the subspace of vectors which are linear combinations of these m+1 vectors). Introducing this subspace if of course motivated by the fact that the stellar residual term is part of it. We can then introduce the projector matrix orthogonal to this subspace, which we denote 3 P C m  Figure  5), described in Section 4.3. See the text in Section B for a description of the parameters. Panel (b): pressure-temperature envelopes obtained for the same retrieval. At every pressure, we plot the 16 to 84-percentile envelopes in dark blue, and the 2.5 to 97.5 percentile envelopes in light blue. If the temperature values were following a Gauss distribution, this would correspond to the 1 and 2 σ envelopes, respectively.

Appendix B: Posterior of the petitRADTRANS fit
In Figure B.1 we show the posteriors of the GRAVITY+GPI fit with petitRADTRANS, described in Section 4.3. Panel (a) shows the corner plot for all but the temperature nuisance parameters. Panel (b) shows the retrieved temperature uncertainty envelopes.
The parameters shown are the following: the C/O, adjusted by changing the oxygen abundance at a given [Fe/H].
The metallicity [Fe/H], which was used to scale the number fraction of all atomic elements (except H and He) by 10 [Fe/H] . The quench pressure P quench of the atmosphere, here converged to a low enough value such that non-equilibrium chemistry effects are negligible. The mass fractions of Fe and MgSiO 3 at the cloud base. Here they are expressed in units of a log-ed decrease factor, which gets multiplied with the maximally allowed mass fraction, based on the elemental composition of the atmosphere. The stoichiometric factors of MgSiO 3 are used for finding this upper limit. The cloud settling parameter f sed , as described in Section 4.3. The log-ed eddy diffusion coefficient K zz , in units of cm 2 s −1 . This is used for calculating the cloud particle size, as described in Section 4.3. The planet's surface gravity log(g). The planetary radius R P , in units of Jupiter radii. The planetary mass M P , in units of Jupiter masses. This is calculated using the sampled log(g) and R P values. The width of the log-normal cloud particle size distribution σ g , as described in Section 4.3. The vertically constant mass fraction of FeH, expressed in units of a log-ed decrease factor. This factor gets multiplied with the maximally allowed mass fraction of FeH, based on the elemental composition of the atmosphere and the Fe atoms not yet incorporated into the Fe clouds. Finally the f GPI−Y , f GPI−J , f GPI−H and f GRAV factors describe the multiplicative scaling of the individual bands, which were allowed to vary by 50 % in the case of GPI, and by 5 % in the case of GRAVITY.
The temperature envelopes in Panel (b) of Figure B.1 are obtained by plotting, at every pressure, the 16 to 84percentile envelopes, and the 2.5 to 97.5 percentile envelopes. If the temperature values at a given pressure layer were following a normal distribution, this would correspond to the 1 and 2 σ envelopes, respectively.