Comparison of results on $N_\mathrm{eff}$ from various Planck likelihoods

In this paper, we study the estimation of the effective number of relativistic species from a combination of CMB and BAO data. We vary different ingredients of the analysis: the Planck high-$\ell$ likelihoods, the Boltzmann solvers, and the statistical approaches. The variation of the inferred values gives an indication of an additional systematic uncertainty, which is of the same order of magnitude as the error derived from each individual likelihood. We show that this systematic is essentially associated to the assumptions made in the high-$\ell$ likelihoods implementations, in particular for the foreground residuals modellings. We also compare a subset of likelihoods using only the TE power spectra, expected to be less sensitive to foreground residuals.


Introduction
The expansion rate in the early Universe depends on the energy density of relativistic particles, which is parameterised by N eff , the effective number of relativistic species. According to the Standard Model (SM) of particle physics, N eff would only receive contributions from the three neutrino species. Due to residual interactions, as the neutrinos were not completely decoupled during the electron-positron annihilation, N eff is expected to be equal to 3.046.
Any variation of the expansion rate of the Universe affects the CMB power spectra by changing the relative scales of the Silk damping relative to the sound horizon (see for instance Abazajian et al. 2013). Therefore, the current best constraint on N eff comes from the accurate measurements of the temperature and polarisation anisotropies performed by Planck.
In this paper we discuss in detail the estimation of N eff from CMB data and quantify the dependence of the results on the choices made in the analysis. We investigate different possible sources of systematic errors. We first compare the results obtained using two Boltzmann codes: CAMB and 2. Phenomenology and Methodology 2.1. Introduction N eff stands for the effective number of relativistic degrees of freedom. It relates the radiation (Ω rad ) and the photon (Ω γ ) energy densities relative to the critical density through: Under the assumption that only photons and standard light neutrinos contribute to the radiation energy density, N eff is equal to the effective number of neutrinos: N eff 3.046 (Mangano et al. 2005). This value has been derived from the number of neutrinos constrained by the measurement of the decay width of the Z boson (Beringer et al. 2012), and takes into account residual interactions during the electron-positron annihilation.

Data sets and likelihoods
The datasets and likelihoods that have been used in this paper are summarized together with their respec-tive acronyms in

low-likelihoods
At low multipoles ( < 50), the Planck public likelihood is lowTEB, based on Planck-LFI maps at 70GHz for polarization and a component-separated map using all Planck frequencies for temperature (Commander Planck Collaboration XI 2016).
In the following, we have also tested a combination of the Lollipop likelihood (Mangilli et al. 2015) with Commander in place of lowTEB, following what has been done for the latest Planck results on the reionisation optical depth (Planck Collaboration Int. XLVII 2016).

high-likelihoods
At high multipoles ( > 50), different likelihoods were developed within the Planck collaboration: Plik (Planck Collaboration XI 2016) being the one delivered to the community. They mostly differ in the assumptions made for the modelling of the foreground residuals. Since there is no valuable reason to favor an assumption or another, we use the different likelihoods to assess the impact of the various choices. We consider Plik, CamSpec (Planck Collaboration XI 2016), HiLLiPOP, and HiLLiPOPps (HiLLiPOPps being a revised version of the HiLLiPOP likelihood with a refined residual point sources model).
All those likelihoods are based on pseudo-C crossspectra between half-mission Planck-HFI maps at 100, 143 and 217 GHz (for more details, see Planck Collaboration XI 2016). The main differences are listed below: -HiLLiPOP makes use of all 15 cross-spectra from the 6 half-mission maps whereas Plik and CamSpec remove the 100 × 143 and 100 × 217 correlations together with two of the four 143 × 217 cross-spectra (for temperature data only). To avoid residual contamination from dust emission, HiLLiPOP and CamSpec do not use the multipoles below 500 for the 143 × 217 and 217 × 217 cross-spectra. -The Galactic masks used in temperature are very similar. Still, HiLLiPOP relies on a more refined procedure for the point source masks that preserves Galactic compact structures and ensures the completeness level at each frequency, but with a higher detection threshold (thus leaving more extra-Galactic diffuse sources residuals). In polarization, CamSpec uses a cut in polarization amplitude (P = Q 2 + U 2 ) to define diffuse Galactic polarization masks whereas HiLLiPOP and Plik use the same masks as in temperature. -The approximations used to calculate the covariance matrix which encompasses the -by-correlations between all the cross-power spectra are slightly different.
Plik and CamSpec assume a model for signal (from cosmological and astrophysical origin) and noise (with slight differences in the methods used to estimate noise).
In HiLLiPOP, it is estimated semi-analytically with Xpol (a polarized version of the power spectrum estimator described in Tristram et al. 2005) using a smoothed version of the estimated spectra (Couchot et al. 2017b). -HiLLiPOP uses templates for the Galactic dust emission derived from Planck measurements both for the shape of the power spectra (Planck Collaboration Int. XXX 2016) and for the Spectral Energy Distribution (Planck Collaboration Int. XXII 2015), rescaled by one amplitude for each polarisation mode (TT, EE and TE). In contrast, due to Galactic cirrus residuals that are included in their point source masks, Plik and CamSpec have to rely on an empirical fit of the spectrum mask difference at 545 GHz and fit one amplitude for each of the cross-frequency spectra with priors on the amplitude based on a power-law (with slightly different spectral index: −2.63 for Plik and −2.7 for CamSpec).
In polarization, CamSpec compresses all the frequency combinations of TE and EE spectra into single TE and EE spectra (weighted by the inverse of the diagonals of the appropriate covariance matrices), after foreground cleaning using the 353 GHz maps. As a consequence, CamSpec has no nuisance parameters describing polarized Galactic foregrounds. -The template spectra for thermal-SZ residual is based on a model for Plik and CamSpec; whereas it comes directly from Planck measurements in the case of HiLLiPOP. -In addition, HiLLiPOPps includes a 2-components point source model (including infrared dusty galaxies and extragalactic radio sources) with one amplitude for each component and a fixed SED whereas all the other likelihoods fit one point source amplitude for each crossfrequency.
Those high-likelihoods have been compared in Planck Collaboration XI (2016) when combined with a τ prior. It was shown that the ΛCDM parameters derived from temperature data were very compatible. Still, as described in Couchot et al. (2017c), when combining them with lowTEB, a disagreement was observed, especially on τ reio and A s , which was related to a discrepancy on the A L fitted value. Further studies on the impact of those differences on the measurement of the sum of the neutrino mass were also performed in Couchot et al. (2017a).
In the following we investigate the systematic effects hidden in the assumptions made for the derivation of those likelihoods. As the use of a single likelihood does not ensure the full propagation of errors, we base our analysis on a comparison of the results inferred from each of them and estimate the order of magnitude of the related errors.

Baryon Acoustic Oscillation (BAO) data
Informations on the late-time evolution of the Universe geometry are also included. In this work, we use the acousticscale distance ratio D V (z)/r drag measurements from the 6dF Galaxy Survey at z = 0.1 (Beutler et al. 2014). At higher redshift, we have also included the BOSS DR12 BAO measurements (Alam et al. 2017). They consist in constraints on (D M (z), H(z), f σ 8 (z)) in three redshift bins, which encompass both BOSS-LowZ and BOSS-CMASS DR11 results. The combination of those measurements is labelled BAO in the following. We note that this is an

Statistics and Boltzmann codes
We use the CAMEL software 1 (Henrot-Versillé et al. 2016) tuned to a high precision setting to perform the statistical analysis. It allows us to compare both the frequentist (profile likelihoods) and the Bayesian approaches. CAMEL includes a MCMC algorithm based on the Adaptative Metropolis method (Haario et al. 2001). It also encapsulates the CLASS Boltzmann solver (Blas et al. 2011). The CLASS and CAMB softwares have been extensively compared (Lesgourgues 2011), and lead to very close predictions in terms of CMB spectra. Still, the public Planck results are derived using CAMB: their comparison with the ones derived with CAMEL allows us to cross-check the compatibility of the theoretical predictions while fitting for N eff . For both setups, we are using the model of Takahashi et al. (2012) extended to massive neutrinos as described in Bird et al. (2012) to include non-linear effects on the matter power spectrum evolution. We have used the BBN predictions calculated with the PArthENoPE code (Consiglio et al. 2017) updated to the latest observational data on nuclear rates and assuming a neutron lifetime of 880.2 s, identical to the standard assumptions made in Planck Collaboration XIII (2016).
For ΛCDM, we assume that all the neutrino mass (Σm ν =0.06 eV) is carried out by only one heavy neutrino We also have performed fits with three neutrinos with a mass splitting scheme derived from the normal hierarchy (keeping Σm ν =0.06 eV) and we did obtain identical results.
We illustrate in Fig. 1 the relative variations of the temperature spectra (∆C /C ) between CAMB and CLASS. We show the impact of a negative shift of the N eff value in three cases: ∆N eff = −0.18 corresponding to the 1σ error reported by Planck, ∆N eff = −0.027 which is the forecasted uncertainty for CMB-S4, and ∆N eff = −0.01 which is close to the CAMB/CLASS difference. The non-linear effects have been deliberately neglected to produce this Figure.

TT+TE+EE+BAO results
In this section, we discuss the results obtained with the combination of Planck TT+TE+EE (so-called ALL) like-1 camel.in2p3.fr Figure 1. Relative variations of the predicted temperature spectra between CLASS and CAMB (blue). We also compare the spectra when we shift N eff toward negative values for ∆N eff = −0.18 (red), ∆N eff = −0.027 (black) and ∆N eff = −0.01 (green) with CLASS (θ is fixed, H 0 is therefore recalculated).
lihoods together with BAO data. They are given in Table 2, and are classified according to various kind of systematic errors. Each of them is further discussed in a dedicated subsection below: we first assess the impact of the choice of the Boltzmann solver, then we discuss the impact of the choice of the high− likelihood. Finally we compare the results using different statistical analysis.
All the values which are tagged with a are extracted from the PLA 2 .

Boltzmann code and sampler effects
In this subsection, we study the impact of the choice of the Boltzmann solver that is used to infer cosmological parameters. Within our setup we cannot disentangle the impact of the Boltzmann code from the one of the sampler used for the MCMC mutiparameter space exploration, as a consequence the estimation given in this section combine both effects.
The comparison of the results using Plik are given in  Figure). This shift is consistent with the difference shown on Fig. 1 between spectra predicted by both Boltzmann solvers, and is largely subdominant compared to the statistical uncertainty. We also tested the effect of changing the neutrino model. We have compared the results when attributing to each of the three neutrinos a mass derived from the Normal Hierarchy scenario expectation and found a 0.01 shift of the N eff results. Given the actual precisions on the CMB spectra, we can therefore safely assume a ΛCDM model with only one massive neutrino carrying all the mass.

Results
A possible source of systematic error to be estimated is the one related to the choice of the Planck high-likelihood. As discussed in Sect. 2.2, various assumptions can be made with respect to the modelling of residuals foregrounds. The comparison of the results from each likelihood allow to quantify the impact of the underlying assumptions.
A discrepancy between PlikALL and CamSpecALL is already mentioned in Planck Collaboration XIII (2016), which quotes ∆N eff 0.15. Using the HiLLiPOP likelihoods, we find differences of the same order of magnitude as quoted in Table 2 (line 1 vs. lines 3,4,5). This variation can reach a maximum of ∆N eff 0.17.
However, as stated in Sect. 2.2, there are more data in the HiLLiPOP likelihoods than in Plik and CamSpec. This can affect the interpretation of the shift, as part of it might be due to statistical fluctuations. To test this effect, we have derived the results using HiLLiPOP while removing the 100 × 143, 100 × 217 and two of the four 143 × 217 cross-spectra, and reducing the range (cf. Sect. 2.2.2): the results are quoted on line 9 for HiLLiPOP (labelled Pliklike). We see that a small part (up to 0.03) of those 0.17 may be attributed to a statistical effect.

Correlations with other parameters
In this section we investigate the correlation between N eff and the cosmological and nuisance parameters.
We give on Table 3 and Fig. 2 the results of the CAMEL MCMC sampler using the CLASS Boltzmann solver with PlikALL, hlpALL and hlpALLps (combined with lowTEB and BAO) compared to the Planck public chains for N eff plus the six ΛCDM parameters. Similarly to what is observed on N eff , we find variations of the parameters between likelihoods of the order of one sigma or less. The error bars of the HiLLiPOP likelihoods are slightly smaller due to the additional data that are used (cf. Sect. 2.2.2).
The correlations between N eff and the nuisance parameters are illustrated on Fig. 3 for the three likelihoods (from top to bottom: hlpALL, hlpALLps, PlikALL). For HiLLiPOP, the highest values of the coefficients are obtained for the nuisance parameters related to foregrounds which play a role at small scales: namely the point sources and/or the SZ sector. N eff is anti-correlated to the point source parameters when the related nuisance parameters are left free to vary (as this is the case of HiLLiPOPps). While, adding information in the point source model (as done in hlpTTps), this relation is broken. We also observe a correlation between N eff and A kSZ and A SZxCIB but those parameters are only very slightly constrained with Planck data. For Plik, the correlation level is lower for all nuisance parameters, but the number of parameters is higher.

Results
In this section, we sudy the impact of the choice of the statistical analysis (Bayesian vs. frequentist). The main purpose of such a comparison is to check for any volume effect that may impact significantly the results (see for example Hamann 2012). The N eff estimates for various Planck likelihoods using profile likelihoods are given on lines 6 to 8 of Table 2. A visual comparison of the results are shown on Fig. 4, where the profile analysis results are transformed in terms of L/L max and are superimposed to the MCMC posterior distributions.
The profile analysis results systematically lead to smaller mean values, keeping the error bars almost similar. This effect is also present in the PLA: for example, the   Table 3. Results on MCMC chains for all cosmological parameters obtained when combining the PlikALL, hlpALL and hlpALLps Planck likelihoods with lowTEB and BAO (errors are given at 68%CL).
N eff values extracted from the best fit procedure (which is exactly what is done in a profile analysis) quoted for the PlikALL+lowTEB+BAO combination is equal to 2.996, a value which is 0.04 smaller than the maximum of the MCMC posterior distributions. The variation is specific to each likelihood and not expected to be constant as it reflects its very shape in the multidimensional parameter space. The higher volume effect is observed for HiLLiPOP and does not exceed ∆N eff = 0.05.

Statistical and Nuisance error contribution
Following the procedure described in Aad et al. (2014), we have separately estimated the two contributions to the total error: the one coming from statistics and the one linked to the foreground and instrumental modelling (so-called nuisance error). We first built the usual profiles for each likelihood: they are shown in solid lines on Fig. 5 Table 4. Full error on N eff and contributions from Statistics and Nuisance derived using profile likelihoods and CLASS (cf. description of the procedure in Section 3.3.2) obtained when combining PlikALL, hlpALL and hlpALLps with lowTEB and BAO.
sance parameters to the values of the previously obtained best-fit. The errors derived from this second fit (shown in dashed line on Fig. 5) correspond to the ultimate error one would obtain if we knew the nuisance parameters perfectly (and they had the values given by the best-fit). Finally the nuisance error of each individual likelihood is deduced by quadratically subtracting the statistical uncertainty from the total error. The results are given in Table 4.  From this split, we show that the main difference between Plik and HiLLiPOP come from the nuisance. This is not surprising since the number of parameters is very different from one likelihood to another (see also Fig.3).
The slight difference in the derived statistical uncertainties is due to the additional data that are used in HiLLiPOP compared to Plik, and is fully compatible with

Other cosmological data
We have further tested the impact of CMB Lensing on the N eff measurement and found it to be very small, as expected (cf. Planck Collaboration XIII 2016), slightly lowering the overall results by 0.01.
We have also checked that the choice of the low-likelihood had no impact on the final results (replacing lowTEB with Lollipop+Commander as stated in Sect. 2.2.1).
It has to be noted that the SN data do not help to further constraint N eff once the BAO data are used, we therefore chose not to use them in this analysis. For completeness we note that the update of the BAO data from DR11 to DR12 does not impact the constraint on N eff Alam et al. (2017).

Summary
The results on N eff are summarized in Fig. 6. The shift of the mean values observed when using a likelihood or another is of the same order of magnitude as the error derived from each individual likelihood (∆N eff 0.17 vs. σ(N eff ) = 0.18). It has been shown to be mainly driven by the assumptions made for the foreground modelling. A small part of this variation (up to 0.03) has been identified to be linked to the data considered in HiLLiPOP and not in Plik. Still, it is high enough not to be neglected when constraining theoretical models from the N eff measurement only.

Fitting TT and TE separately
In the previous sections we have shown the results of the combination of temperature and polarisation CMB data. In the following, we estimate N eff for TT and TE separately to further compare the outcome of each likelihood.

TT+lowTEB+BAO results
In this section, we consider the combination of temperatureonly CMB likelihoods, together with BAO data. The results   Table 6. Results on N eff obtained when combining BAO with PlikTE, and hlpTE (errors are given at 68%CL). lowTEB has been used at low-.
are summarized in Table 5 for various configurations. For this specific combination the CamSpec results are not public, we therefore cannot use them in the comparison. From this Table, we obtain ∆N eff 0.18 from the largest difference observed between hlpTT-Profile/CLASS and PlikTT -Profile/CLASS.
As in the previous section, we have checked that the impact of the neutrino settings is almost negligible, as well as the impact of the SN data. In addition, the choice of the DR12 BAO data instead of DR11 has no effect.

TE+lowTEB+BAO results
Given the Planck noise level, the TE likelihoods lead to similar results than those obtained with TT on ΛCDM. In addition they are less sensitive to the foreground modellings (Galli et al. 2014;Couchot et al. 2017b). In this section we use of the TE likelihood in place of the TT one and compare the results obtained on N eff when combined with lowTEB and BAO on Table 6.
The remaining ∆N eff is of the order of 0.07, which is small with respect to the total error with TE only. It may still contain some residual systematics from temperature to polarisation leakage which study is beyond the scope of this paper.

Conclusions
We have studied in detail the estimation of the effective number of relativistic species from CMB Planck data. We have tested different ingredients of the analysis to further quantify their impact on the results: mainly the Boltzmann codes, the high− likelihoods (Plik, HiLLiPOP and CamSpec), and the statistical analysis.
-The estimated variation of N eff when switching from CAMB to CLASS is negligible, of the order of ∆N eff = 0.01. -The variation linked to the assumptions on foreground residuals modelling derived from the comparison of the high− likelihoods has been estimated to be of the order of ∆N eff = 0.17 on which a small part (up to 0.03) may be attributed to a statistical effect. We have also shown that, at least for HiLLiPOP, N eff was mainly correlated with nuisance parameters linked to foregrounds playing a role at small scales (ie. point sources and SZ). -We have found slight differences between the Bayesian and the frequentist inferred mean values, linked to particular likelihood volume effects. A shift between both methods has been estimated to be ∆N eff ≤ 0.05.
As an overall conclusion, we have shown that the variation of the mean N eff values is non-negligible. This foreground related systematic uncertainty is of the same order of magnitude as the error derived for each individual likelihood. In addition the results obtained with HiLLiPOP and CamSpec lead systematically to lower values than the ones derived from the public Planck likelihood. We have cross-checked the consistency of the results when considering TT and TE separately. When considering TE only (together with BAO and lowTEB), which is less sensitive to foreground residuals, this observed variation drops down to ∆N eff = 0.05 for the likelihoods we have been able to compare.
We have shown that likelihood modelling is an important challenge for the current Planck measurements for the N eff interpretation, even for temperature data. The shift discussed in this paper is very large compared to the σ(N eff ) = 0.027 statistical-only expectations for CMB-S4 (Abazajian et al. 2016). We however expect data from the next generation of CMB experiments to be more robust to such systematic error. The increase in constraining power from the TE power spectrum with respect to the TT one, as well as the better determination of the temperature power spectrum on small scales will reduce the impact of foregrounds mismodelling.