SUGAR: An improved empirical model of Type Ia Supernovae based on spectral features

Type Ia Supernovae (SNe Ia) are widely used to measure the expansion of the Universe. Improving distance measurements of SNe Ia is one technique to better constrain the acceleration of the expansion and determine its physical nature. This document develops a new SNe Ia spectral energy distribution (SED) model, called the SUpernova Generator And Reconstructor (SUGAR), which improves the spectral description of SNe Ia, and consequently could improve distance measurements. This model is constructed from SNe Ia spectral properties and spectrophotometric data from the Nearby Supernova Factory collaboration. In a first step, a PCA-like method is used on spectral features measured at maximum light, which allows us to extract the intrinsic properties of SNe Ia. Next, the intrinsic properties are used to extract the average extinction curve. Third, an interpolation using Gaussian Processes facilitates using data taken at different epochs during the lifetime of a SN Ia and projecting the data onto a fixed time grid. Finally, the three steps are combined to build the SED model as a function of time and wavelength: this is the SUGAR model. The main advancement in SUGAR is the addition of two parameters to characterize SNe Ia variability. The first is tied to the properties of the SNe Ia ejecta velocity, the second is correlated with their calcium lines. The addition of these parameters, as well as the high quality of the Nearby Supernova Factory data, makes SUGAR an accurate and efficient model for describing the spectra of normal SNe Ia as they brighten and fade. The performance of this model makes it an excellent SED model for experiments like ZTF, LSST, or WFIRST.


Introduction
Type Ia supernovae (SNe Ia) are excellent cosmological probes: they are very luminous objects, visible up to a redshift of z ∼ 2 (Guillochon et al. 2017), and their luminosity dispersion is naturally low and can be further reduced by an appropriate standardization process. As precise distance indicators, comparing their luminosity and their redshift allowed Perlmutter et al. (1998, 1999), Riess et al. (1998), and Schmidt et al. (1998) to demonstrate that the expansion of the Universe is accelerating, a feature that gave rise to the dark energy paradigm. This result, obtained with a small sample of SNe Ia, has since been repeatedly confirmed with larger samples (Astier et al. 2006; Guy et al. 2010; Suzuki et al. 2012; Rest et al. 2013; Betoule et al. 2014; Scolnic et al. 2018), and in combination with other cosmological probes like the Cosmic Microwave Background (CMB) (Planck Collaboration et al. 2016), Baryon Acoustic Oscillations (BAO) (Delubac et al. 2015), or cosmic shear (Troxel et al. 2018) gave birth to the so-called concordance model: the flat-ΛCDM model. Improving our description of SNe Ia as a cosmological probe is needed, not only to discriminate among alternate dark energy models (Copeland et al. 2006), but also to address existing tensions such as the 3.4σ difference between the values of the Hubble constant H_0 from the local measurement by Riess et al. (2016) and the cosmological fit from Planck Collaboration et al. (2016). The current uncertainty budget due to limited SN Ia statistics will be greatly improved by current surveys like ZTF (Bellm 2014) or next-generation surveys like LSST or WFIRST (LSST Dark Energy Science Collaboration 2012; Spergel et al. 2015). Since systematic and statistical uncertainties are already of the same order of magnitude (Betoule et al. 2014; Scolnic et al. 2018), a better understanding of systematics will be needed to improve the accuracy of SNe Ia as a cosmological probe.
Part of the systematic uncertainty comes from the standardization process: the observed flux of a SN Ia has to be corrected for variations observed from one object to another. Two main contributions to this variation have been observed. The first one, seen by Rust (1974) and Pskovskii (1977, 1984), is the correlation between the peak luminosity of a SN Ia and the light curve decline time, the so-called brighter-slower effect. Various parametrizations have been proposed for this effect, the most commonly used being ∆m_15 (Phillips 1993), stretch (Perlmutter et al. 1997), or X_1 (Guy et al. 2007). The second contribution, observed first by Hamuy et al. (1995) and Riess et al. (1996), is that the peak luminosity depends on color, the so-called brighter-bluer effect. This effect can be explained by the presence of dust in varying quantities along the line of sight, possibly combined with an intrinsic color-brightness correlation after the stretch effect has been taken into account. Most standardization techniques in photometry thus rely on a stretch and color relation known as the Tripp (1998) relation. The SALT2 model (Guy et al. 2007) is one such standardization method, and has over the years become the reference in cosmological analyses. Despite many attempts, no consensus has yet emerged on how to go beyond a 2-component model to describe SNe Ia light curves. However, the observation of a step in the standardized luminosity with respect to the host mass by Kelly et al. (2010) and others has led to the inclusion of a corrective term in subsequent cosmological analyses. This corrective term linked to the environment is a hint that more fundamental properties of SNe Ia physics are not captured by the stretch-color standardization scheme, and that there is room for improvement.
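The stretch and color standardization just described can be summarized in a few lines. The coefficient values below (M_B, alpha, beta) are illustrative round numbers of the order found in the literature, not values derived in this paper:

```python
def tripp_distance_modulus(m_B, x1, c, M_B=-19.1, alpha=0.14, beta=3.1):
    """Distance modulus from the Tripp (1998) relation:
    mu = m_B - M_B + alpha * x1 - beta * c,
    i.e. the brighter-slower (x1) and brighter-bluer (c) corrections."""
    return m_B - M_B + alpha * x1 - beta * c

# A slow (x1 = 1), slightly blue (c = -0.05) supernova:
mu = tripp_distance_modulus(m_B=15.0, x1=1.0, c=-0.05)
```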
However, the most obvious indication that the parameterization used is insufficient comes from the observation of a residual dispersion of SNe Ia luminosity around the Hubble diagram after standardization. In order to obtain statistically coherent results, Perlmutter et al. (1999) and Astier et al. (2006) introduced an intrinsic dispersion in luminosity as an additional uncertainty in the fit. Betoule et al. (2014) estimate this dispersion at 0.11 mag using a SALT2 standardization, for an observed total dispersion of 0.16 mag. Since the intrinsic dispersion is due to unmodeled SN Ia variations that may depend on redshift, it may induce a significant error on the extraction of cosmological parameters (Rigault et al. 2018). The precision of SNe Ia as cosmological probes therefore depends on this intrinsic dispersion in luminosity, although its estimated value depends strongly on the assumptions made about measurement uncertainties.
A way to characterize the overall accuracy obtained by a given standardization method is to use the weighted Root Mean Square (wRMS) metric on the standardized magnitude (e.g. Blondin et al. 2011). Thus, in order to reduce the effects of unmodeled variations of SNe Ia, an improved standardization procedure must also involve reducing the wRMS.
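The wRMS metric mentioned above can be written compactly; this sketch assumes per-object Hubble residuals and their total uncertainties are available:

```python
import numpy as np

def wrms(residuals, sigmas):
    """Weighted RMS of standardized-magnitude residuals, with weights
    w_i = 1 / sigma_i^2 (e.g. Blondin et al. 2011)."""
    w = 1.0 / np.asarray(sigmas) ** 2
    r = np.asarray(residuals)
    return np.sqrt(np.sum(w * r**2) / np.sum(w))
```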
Most efforts to improve the standardization of SNe Ia involve the search for a new parameter correlated with the intrinsic luminosity. After the discovery of the mass-step, many efforts focused on describing the environmental effects on SNe Ia (Kelly et al. 2010; Rigault et al. 2013, 2014; Roman et al. 2018; Rigault et al. 2018). A complementary approach consists of looking for this new parameter directly in the analysis of the light curves or observed spectra. For example, Mandel et al. (2017) proposed that the color of SNe Ia is a mixture of intrinsic color and extinction by dust, and proposed a Bayesian model to separate these two components. For their part, Chotard et al. (2011) showed that once the Ca ii H&K variability is taken into account, the color law is compatible with a Cardelli et al. (1989) extinction curve. Moreover, Mandel et al. (2014) highlighted the dependence of the extinction curve on the minima of P-Cygni profiles. Thus the use of additional variables related to spectral indicators seems a promising avenue for describing the variability of SNe Ia.
The use of spectral indicators to standardize SNe Ia has a long history (Nugent et al. 1995; Arsenijevic et al. 2008; Bailey et al. 2009; Wang et al. 2009; Foley & Kasen 2011; Chotard et al. 2011). For example, using only the flux ratio R_642/443, Bailey et al. (2009) were able to obtain a significantly better standardization in B-band than the classical Tripp (1998) relation. The value of using spectral information is also suggested by the analysis of Fakhouri et al. (2015), which shows that the best way to measure distances with SNe Ia is with spectroscopically twinned SNe Ia. Another recent study, by Nordin et al. (2018), has shown that using spectral information from the UV part of the spectra improves distance measurement compared to the Tripp (1998) relation. However, those methods do not currently lead to a spectral energy distribution (SED) model, which is necessary for cosmological analyses of purely photometric data, such as those of LSST (LSST Dark Energy Science Collaboration 2012). Thus we propose here a full SED model based on spectral indicators, generalizing the procedure originally developed in Chotard et al. (2011). This method uses spectral features, which allows for the addition of more than one intrinsic parameter and offers a possible way to separate intrinsic properties from extrinsic properties. From it, we build a full SED model which may be used for purely photometric surveys.
The aim of this study is to revisit the parametrization of SNe Ia SED in light of the SNfactory spectrophotometric dataset (Aldering et al. 2002), and to seek new sources of variability by statistical analysis. Our model is trained using spectral indicators, derived around maximum light in B-band, as features to describe the model. They provide both a reduced dimensionality description of spectra, and a description which is linked to the physics of the explosion. In addition, we select indicators insensitive to reddening in order to decouple the characterization of the reddening from effects purely linked to the intrinsic part of the explosion. This new SNe Ia SED model is named the SUpernova Generator And Reconstructor (SUGAR) model.
Another approach to develop a new SED model was undertaken in parallel by Saunders et al. (2018), using the same dataset and based on a "SALT2-like" framework. Saunders et al. (2018) generalized a strategy originally proposed by Guy et al. (2007), first applied to broad-band photometry from the SNfactory dataset in Kim et al. (2013). In brief, Saunders et al. (2018) performed a Principal Component Analysis-like study on interpolated spectral time series in order to go beyond the classical Tripp (1998) relation, and were able to significantly improve the SED description with respect to the SALT2 model. The SNEMO model developed in Saunders et al. (2018) differs from the SUGAR model in that SNEMO attempts to find principal components purely based on spectral time series variability in the relative luminosities at each wavelength and as a function of time, whereas SUGAR uses spectral features to find a compact description of such spectral time series. However, both models share some technical details in the model training and were developed in common.
This paper is organized as follows: Section 2 presents how the spectrophotometric time series of the SNfactory were obtained; Section 3 focuses on the intermediary data used in building the SUGAR model (spectral indicators at maximum light, dimensionality reduction through factor analysis, derivation of extinction parameters, and time interpolation). Section 4 describes the SUGAR model: its formalism, training, and components. Section 5 presents the performance of the model for fitting spectral time series, compared with the performance achieved by SALT2 on the same data. Finally, Section 6 discusses adding additional components to the SUGAR model and some technical choices made in its training. The appendices describe details of the mathematical implementation of the SUGAR model. The SUGAR template is available online at http://supernovae.in2p3.fr/sugar, while the data used to train SUGAR are available at https://snfactory.lbl.gov/sugar.¹

Spectrophotometric time series from SNfactory
This analysis is based on 171 SNe Ia obtained by the SNfactory collaboration beginning in 2004 (Aldering et al. 2002) with the SuperNova Integral Field Spectrograph (SNIFS; Lantz et al. 2004) installed on the University of Hawaii 2.2-m telescope (Mauna Kea). SNIFS is a fully integrated instrument optimized for semi-automated observations of point sources on a structured background over an extended optical window at moderate spectral resolution. SNIFS has a fully-filled 6.4″ × 6.4″ spectroscopic field-of-view subdivided into a grid of 15 × 15 contiguous square spatial elements (spaxels). The dual-channel spectrograph simultaneously covers 3200-5200 Å (B-channel) and 5100-10000 Å (R-channel) with 2.8 and 3.2 Å resolution, respectively. The data reduction of the x, y, λ data cubes is summarized by Aldering et al. (2006) and updated in Sect. 2.1 of Scalzo et al. (2010). The flux calibration is developed in Sect. 2.2 of Pereira et al. (2013), based on the atmospheric extinction derived in Buton et al. (2013). In addition, observations are obtained at the SN Ia location at least one year after the explosion to serve as a final reference enabling subtraction of the underlying host galaxy, as described in Bongard et al. (2011). For every SN Ia followed, the SNfactory creates a spectrophotometric time series, typically composed of ∼14 epochs, with the first spectrum taken on average three days before maximum light in B-band (Bailey et al. 2009; Chotard et al. 2011). The sample of 171 SNe Ia contains the objects with good final references, objects that passed the quality cuts suggested by Guy et al. (2010), and is restricted to objects with at least one observation in a time window of ±2.5 days around maximum light in B-band. The size of this window is kept identical with respect to the study of Chotard et al. (2011) and is discussed in Léget (2016).

¹ The data link will become active upon journal publication.
After flux calibration, host-galaxy subtraction, and correction for Milky Way extinction, the flux of the 171 spectra is integrated in the synthetic top-hat filters defined in Pereira et al. (2013), and for reference a SALT2 fit is performed with the model from Betoule et al. (2014) in order to obtain the X_1, C, and m_B parameters. The spectra of the 171 SNe Ia are transformed to the rest frame with a fiducial cosmology and an arbitrary Hubble constant, which preserves the blinding of future cosmological analyses using this dataset. For the spectral analysis here, the spectra are rebinned at 1500 km s^-1 between 3254 and 8649 Å (197 bins per spectrum) for computational efficiency, while still resolving spectral features, and converted to the absolute AB magnitude system. The training is done on the 113 SNe Ia observed before 2010 and the validation on the 58 SNe Ia observed after 2010.
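The rebinning grid described above has constant width in velocity space, i.e. logarithmically spaced wavelength edges with d ln λ = v/c. A minimal sketch follows; the exact edge convention used by the pipeline may differ, so the bin count can differ by one or two from the 197 quoted:

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def velocity_bin_edges(lam_min=3254.0, lam_max=8649.0, v_kms=1500.0):
    """Wavelength bin edges with constant velocity width: adjacent
    edges are separated by d ln(lambda) = v / c, so each bin spans
    the same velocity interval at every wavelength."""
    dln = v_kms / C_KMS
    n = int(np.ceil(np.log(lam_max / lam_min) / dln))  # number of bins
    return lam_min * np.exp(dln * np.arange(n + 1))

edges = velocity_bin_edges()
```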

Derived data
Any empirical SN Ia modeling must solve three problems: choosing what features to model, accounting for color, and dealing with the data sampling.
For the features modeling of SUGAR, we used spectral features at maximum light to describe the intrinsic part of the SED. This is described successively in Section 3.1 and Section 3.2. Section 3.1 describes how the spectral features are selected and measured. Section 3.2 explains how the spectral features are projected onto a new basis that allows us to work in an orthogonal basis for the SUGAR training.
To estimate the average color curve of SNe Ia, we will generalize the method of Chotard et al. (2011). This is described in Section 3.3.
To deal with the data sampling, we project the observed spectra onto a common time grid so that the SED model can be calculated on the same time grid. This is done in Section 3.4 using the Gaussian Process method.
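As a sketch of such an interpolation, the following implements standard Gaussian-Process regression of a single light curve onto a fixed phase grid with a squared-exponential kernel. The kernel amplitude and correlation length are placeholder values, not the hyperparameters trained on the SNfactory data:

```python
import numpy as np

def gp_interpolate(t_obs, y_obs, y_err, t_grid, amp=0.5, ell=8.0):
    """Gaussian-Process interpolation of one light curve onto a fixed
    phase grid. `amp` (mag) and `ell` (days) are illustrative
    squared-exponential hyperparameters, not trained values."""
    def kern(a, b):
        return amp**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

    K = kern(t_obs, t_obs) + np.diag(np.asarray(y_err)**2)
    Ks = kern(t_grid, t_obs)
    mean = Ks @ np.linalg.solve(K, y_obs)                 # predictive mean
    cov = kern(t_grid, t_grid) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov
```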
The following sections describe the intermediate steps that determine the full SUGAR SED model in Section 4. We will use the spectral features and color curve parameters derived at maximum light, combined with the interpolated spectra, to infer the full SED at all epochs.

Spectral indicators at maximum light
Spectral indicators are metrics of empirical features of the input spectrum, such as equivalent widths or line velocities. They offer an efficient characterization of spectral variability by representing the underlying spectral complexity with a few key numbers, and play a role in non-linear dimensionality reduction of the original data. These two characteristics, interpretability and simplicity, make them ideal for describing the intrinsic part of the SNe Ia SED, and consequently for deriving the extrinsic part in the same spirit as Chotard et al. (2011). This decoupling restricts the set of spectral indicators to pseudo-equivalent widths and the wavelengths of P-Cygni profile minima. As the spectral indicators evolve with phase, we select the spectrum closest to maximum light in B-band, if it is within a time window of ±2.5 days. This window size was chosen to optimize the trade-off between the total number of SNe Ia in the sample and the potential loss in precision due to time evolution. Near maximum light and within the spectral range covered by the SNfactory, it is possible to systematically obtain the 13 spectral indicators represented in Figure 1. They are the 9 pseudo-equivalent widths of the Ca ii H&K, Si ii λ4131, Mg ii, Fe λ4800, S ii W, Si ii λ5972, Si ii λ6355, O i λ7773, and Ca ii IR features, as well as the 4 minima of P-Cygni profiles (Si ii λ4131, S ii λ5454, S ii λ5640, and Si ii λ6355). For the pseudo-equivalent widths, we rely only on well-defined troughs that are present in all SNe Ia in the sample, and consider line blends as a whole. Some of the possible minima were also discarded, either because the corresponding lines form a complex mixture or because the corresponding trough is too shallow to accurately define a minimum for some SNe Ia. This is why the Si ii λ5972 feature velocity is rejected. The spectral indicators and their uncertainties were automatically derived from the spectra at maximum light following the procedure of Chotard et al. (2011), which is described in detail in Appendix A.
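As an illustration of the kind of measurement involved, the following is a simplified pseudo-equivalent-width calculation: a straight pseudo-continuum is drawn between two bounding wavelengths and the normalized absorption is integrated between them. The real procedure of Appendix A also locates the bounding flux peaks automatically and propagates measurement errors:

```python
import numpy as np

def pseudo_equivalent_width(wave, flux, blue_bound, red_bound):
    """Pseudo-equivalent width (in Angstroms) of an absorption feature:
    integrate 1 - flux/continuum between the two bounds, where the
    pseudo-continuum is a straight line anchored on the bounding fluxes.
    Simplified sketch; bounds are given by hand here."""
    sel = (wave >= blue_bound) & (wave <= red_bound)
    w, f = wave[sel], flux[sel]
    f_b = np.interp(blue_bound, wave, flux)
    f_r = np.interp(red_bound, wave, flux)
    cont = f_b + (f_r - f_b) * (w - blue_bound) / (red_bound - blue_bound)
    g = 1.0 - f / cont
    return np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(w))   # trapezoid rule
```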

Factor analysis model
The space defined by the 13 selected spectral indicators has too high a dimensionality to efficiently train a model. Additionally, some of the spectral indicators are correlated, and therefore contain redundant information. Most of the model variation can be captured in a reduced number of dimensions. Principal Component Analysis (PCA) (Pearson 1901) is one of the methods to implement such a dimensionality reduction. It consists of diagonalizing the covariance matrix of the sample and projecting the data into the resulting eigenvector basis. In this new basis, the variables are uncorrelated, and an approximation of the input data is found by neglecting the dimensions corresponding to the smallest eigenvalues. This method has been employed in the case of SNe Ia by Guy et al. (2007), Kim et al. (2013), and Sasdelli et al. (2015). However, in the case considered here, some directions are dominated by noise, so their eigenvectors would align along the direction of measurement errors rather than the intrinsic sample variance. To solve this problem, we employ a variant of PCA, Factor Analysis (Spearman 1904, 1927), which has the advantage of taking into account the variance caused by measurement uncertainties. This technique decomposes the observables into two terms, one representing the explanatory factors and one representing the noise affecting each variable. This is expressed by the following relation (Ghahramani & Hinton 1997):

x_i = Λ q_i + u_i,

where x_i is the spectral feature vector with s components for the i-th SN Ia, q_i is the explanatory factor vector of dimension m ≤ s, linearly related to x_i by the matrix Λ of dimensions s × m, and u_i is the noise, with variance Ψ of dimensions s × s.
In order to fix the normalization of Λ and the q_i, we assume that the q_i are drawn from a centered normal distribution:

q_i ∼ N(0, I_m).

In the framework of PCA, and up to a normalization, the matrix Λ and the q_i are respectively equivalent to the eigenvector matrix and the projections onto the new basis. Factor analysis thus consists of determining the matrices Λ and Ψ that maximize the likelihood, under the assumption that the x_i are distributed according to a normal distribution with both intrinsic scatter and measurement noise:

x_i ∼ N(0, Λ Λ^T + Ψ).

To estimate Λ, Ψ, and the explanatory factors q_i, Ghahramani & Hinton (1997) propose a solution based on an expectation-maximization algorithm where the q_i and Λ are estimated iteratively. Here, unlike in conventional factor analysis, a reliable estimate of the spectral indicator measurement errors is available. For each SN Ia i, Ψ_i is the known diagonal matrix that contains the squared errors of the spectral indicators; hence, it is not necessary to fit for a global Ψ. We adapted the aforementioned expectation-maximization algorithm to take this change into account. Additional details can be found in Appendix B.
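The adapted algorithm can be sketched as follows: the E-step computes the posterior of each q_i given its own noise matrix Ψ_i, and the M-step updates Λ row by row with per-object weights. This is a minimal illustration of the idea, not the Appendix B implementation:

```python
import numpy as np

def em_factor_analysis(X, Psi, m, n_iter=100, seed=0):
    """EM for factor analysis with *known* per-object diagonal noise:
    only Lambda and the factors q_i are estimated (the adaptation of
    Ghahramani & Hinton 1997 described in Appendix B).
    X: (n, s) centered data; Psi: (n, s) squared measurement errors."""
    n, s = X.shape
    rng = np.random.default_rng(seed)
    Lam = 0.1 * rng.normal(size=(s, m))
    for _ in range(n_iter):
        Eq = np.empty((n, m))
        covs = np.empty((n, m, m))
        for i in range(n):                      # E-step: posterior of q_i
            Pinv = 1.0 / Psi[i]                 # diagonal Psi_i inverse
            covs[i] = np.linalg.inv(np.eye(m) + (Lam.T * Pinv) @ Lam)
            Eq[i] = covs[i] @ (Lam.T * Pinv) @ X[i]
        Eqq = covs + Eq[:, :, None] * Eq[:, None, :]   # 2nd posterior moments
        for j in range(s):                      # M-step: row j of Lambda
            w = 1.0 / Psi[:, j]
            A = np.einsum('i,imk->mk', w, Eqq)
            b = (w * X[:, j]) @ Eq
            Lam[j] = np.linalg.solve(A, b)
    return Lam, Eq
```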
Finally, a prescription is needed to normalize each individual variable. The amplitude of the spectral indicator variations, expressed in Å, is not a good indicator of their impact on the spectral shape. As an example, Branch et al. (2006) showed that the weak Si ii λ5972 line can play a significant role in subclassing SNe Ia. As a consequence, the input data are normalized to unit variance prior to the factor analysis, so that no spectral indicator is favored a priori. Within this framework, the eigenvalues of ΛΛ^T represent the variance explained by each factor, and s − Tr(ΛΛ^T) is the variance coming from the noise, where s = 13 is the number of spectral indicators and Tr is the trace operator.

Outlier rejection
Outliers affect the sample variance of any population. As we care most about a correct description of the bulk of SNe Ia, it is desirable to identify and remove these outliers. Amongst them, SNe Ia of type SN1991T (Filippenko et al. 1992b) or SN1991bg (Filippenko et al. 1992a) are known to have different spectral and photometric behavior from other SNe Ia. However, basing an outlier rejection on this empirical identification has two issues. The first is that within the spectral indicator space, some of those subtypes may not appear as distinct subclasses, and do not offer objective grounds for rejection. The second issue is that the attribution of a given SN Ia to one of these subclasses by SNID (Blondin & Tonry 2007) may provide inconsistent results depending on the epoch considered for the identification. In order to apply a self-contained criterion for defining an outlier, we thus resorted to a χ²-based definition: this quantity can be interpreted as the squared distance to the center of the distribution normalized by the natural dispersion. For SN Ia i, it is given by

χ²_i = x_i^T (Λ Λ^T + Ψ_i)^{-1} x_i.

In the case of a Gaussian distribution, χ²_i should follow a χ²-distribution with 13 degrees of freedom. An iterative cut at 3σ on the value of χ²_i rejects 8 SNe Ia from the sample. A visual inspection carried out on each of these 8 SNe Ia, as well as a sub-classification made using SNID, show that 4 of those objects exhibit shallow silicon features as defined by Branch et al. (2006), 2 of them have high-velocity silicon features as defined by Wang et al. (2009), one is within the broad line subcategory, and the remaining one has a failed estimation of one of the spectral indicators.

(Figure 2: fraction of the variance carried by each factor analysis component q_1 … q_13, ordered by decreasing variance; the total amounts to 100% once the contribution of the noise, indicated by a red line, is taken into account. The first three vectors dominate the variability of this space.)
We studied the influence of the 3 σ cut by carrying out the analysis successively with and without the cut. Even though this had no major effect on the direction of the vectors, the cut is applied to the rest of the analysis to avoid training the model on outliers.
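The iterative cut can be sketched as below. Here the χ²_i are recomputed from the sample mean and covariance at each iteration, and the Wilson-Hilferty approximation stands in for the exact χ² quantile at the 3σ probability, so the threshold is approximate:

```python
import numpy as np

def iterative_chi2_clip(Q, n_sigma=3.0, max_iter=20):
    """Iterative outlier rejection: chi2_i is the squared Mahalanobis
    distance of each object to the current sample center; objects above
    the (approximate) n_sigma-equivalent chi2 threshold are removed and
    the distribution is re-estimated until membership stabilizes."""
    dof = Q.shape[1]
    # Wilson-Hilferty approximation of the chi2 quantile
    cut = dof * (1.0 - 2.0 / (9 * dof)
                 + n_sigma * np.sqrt(2.0 / (9 * dof))) ** 3
    keep = np.ones(len(Q), dtype=bool)
    for _ in range(max_iter):
        mu = Q[keep].mean(axis=0)
        cov = np.cov(Q[keep].T)
        d = Q - mu
        chi2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(cov), d)
        new_keep = chi2 < cut
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return keep
```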

Factor analysis results on spectral features
The EM-FA algorithm described in the previous sections is applied to the 13 spectral features measured at maximum light. The relative weight of the eigenvalues in the total variance is shown in Figure 2, and the correlations between the eigenvectors and the spectral indicators are presented in Figure 3. The significance of each correlation is expressed in units of σ, meaning that the null hypothesis of zero correlation is rejected at that confidence level. Figure 2 shows that two vectors dominate the variability in the space of spectral indicators. The first vector is linked with coherent variations of the pseudo-equivalent widths, except for the widths of the Ca ii H&K and S ii W lines, as seen in Figure 3. The second vector is anticorrelated with these same two features, and describes a coherent variation of the velocities. Since the first vector is very similar to the pseudo-equivalent width of Si ii λ4131, which was shown by Arsenijevic et al. (2008) to be highly correlated with the stretch parameter, it is natural to assume that this vector represents the main source of intrinsic variability already known for SNe Ia. The second vector is mostly driven by the line velocities and the pEW(Ca ii H&K) and pEW(S ii W) features, with smaller dependencies on the other pseudo-equivalent widths. Interestingly, while investigating the spectral diversity of SNe Ia beyond stretch and color, Chotard et al. (2011) demonstrated the role of pEW(Ca ii H&K), while Wang et al. (2009) focused on the role of the velocity of Si ii λ6355. Our second vector unifies these two approaches. The interpretation of the subsequent vectors is less straightforward. As their rank increases, they describe less of the sample variance, and thus exhibit lower correlations with the original data. The third vector is the last of the eigenvectors to show a strong correlation with one of the spectral indicators, pEW(Si ii λ5972). The correlations with the SALT2 parameters are presented in Figure 4.
As expected from the correlations with the pseudo-equivalent widths, q_1 is strongly correlated with X_1. Also, q_3 exhibits a significant correlation with X_1, which may be an indication that the stretch is driven by two different parameters. The remaining factors are only weakly correlated with X_1. The correlations between the SALT2 color and any of the factors are close to zero. The SALT2 absolute magnitude cosmology residual, ∆µ_B, is mainly correlated with q_1 and q_3, presumably due to the correlation of both with X_1.
Noise, in this situation, represents 18% of the observed variability, so only the first two vectors clearly outweigh the noise, and it is legitimate to ask whether or not the other vectors are sensitive to statistical fluctuations. For the rest of the analysis, we keep the first three factors for the final training and discuss this choice in Section 6.2 by looking at the impact on the final model.

Extinction curve estimation
In this section, we estimate the average extinction curve by generalizing the procedure described in Chotard et al. (2011). To do so, a model of the SED at maximum light is derived. This model allows us to deduce the average extinction curve; the estimation procedure and the results are described in the following subsections.

Empirical description
The SED model at maximum light presented here is a generalization of the model presented in Chotard et al. (2011), in which the authors removed intrinsic variability tied to spectral features and measured the remaining spectral variation, which was found to be consistent with an extinction curve like that for the Galaxy. The major difference is that the intrinsic description is here motivated by the factors derived in Section 3.2 rather than being described only by pEW(Si ii λ4131) and pEW(Ca ii H&K) as was done in Chotard et al. (2011). To model the SED, we define for each SN Ia i its absolute magnitude as a function of wavelength, related to the factors by:

M_i(λ) = M_0(λ) + ∑_{j=1}^{3} h_{i,j} α_j(λ) + A_{λ0,i} γ(λ),    (5)

where M_0(λ) is the average spectrum in absolute magnitude, and α_j(λ) is the spectrum related to the factor h_{i,j}, with j the factor index running from 1 to 3 (the choice of the number of components is discussed in Section 6.2). A_{λ0,i} is a free parameter that can be interpreted as the absorption due to extinction at a reference wavelength λ_0 (set here at the median wavelength, 5305 Å, without loss of generality). γ(λ) is an arbitrary function that represents the effect of the extinction, and we have not set any prior on its shape. Once the function γ(λ) is fixed, it is possible to fit the total-to-selective extinction ratio of the Cardelli et al. (1989) law, R_V, and the absorption in the V-band, A_V, to it; this will be described in the next subsection. The model parameters here are M_0(λ), the α_j(λ), γ(λ), the h_{i,j}, and the A_{λ0,i}. It is understood that we are modeling the SED at the specific wavelengths where we have measurements.
For notational efficiency we rewrite Equation 5 as:

M_i = A h_i,

where the matrix A collects M_0, the α_j, and γ as columns, and h_i is the vector (1, h_{i,1}, h_{i,2}, h_{i,3}, A_{λ0,i}). In Chotard et al. (2011), the parameters α_j(λ) were computed in sequential order, and the reddening curve γ(λ) as well as the extinctions A_{λ0,i} were determined in a second step. We improve on this procedure by applying a global fit for all parameters at once. This is described below, and allows us to derive a global average R_V for all SNe Ia and an A_V for each SN Ia.

Fitting the model at maximum light and R_V
In this section we explain the framework for fitting the free parameters of Equation 5, and how we then determine the A_V and R_V parameters. Fitting the parameters of the absolute SED of Equation 5 is done using an orthogonal distance regression, which amounts to minimizing the following χ²:

χ² = ∑_i [ (M_i^obs − A h_i)^T W_{M_i} (M_i^obs − A h_i) + (q_i − ĥ_i)^T W_{q_i} (q_i − ĥ_i) ],    (9)

where M_i^obs is the observed spectrum of the SN Ia i in absolute AB magnitude, ĥ_i is the sub-vector of h_i containing the three factors, and W_{M_i} is the weight matrix of M_i^obs, defined as:

W_{M_i} = [ diag(σ²_{λ,i} + σ²_cal + σ²_z) + D ]^{-1},

where σ_{λ,i} is the uncertainty of the SN Ia i at the wavelength λ derived from the spectral error, σ_cal the per-spectrum calibration uncertainty, taken to be 0.03 mag, σ_z the combination of the redshift error and the uncertainty of 300 km s^{-1} due to peculiar velocities, and D is the dispersion matrix. The dispersion matrix is fitted to deal with the remaining variability that is not described by the three factors and the extinction; it includes an estimate of the remaining chromatic and achromatic (grey) dispersions. The estimation of D is described in Appendix D. W_{q_i} is the weight matrix of q_i, which comes from projecting the spectral feature uncertainties into the factor sub-space, and is defined as:

W_{q_i} = Λ^T Ψ_i^{-1} Λ,

where Λ and Ψ_i are the same as in Section 3.2. The unknown parameters of the model are the matrix A and the vector h_i. These are found by minimizing Equation 9 using an expectation-minimization algorithm described in Appendix C.
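One half of the alternating minimization can be illustrated as follows: with the spectral components held fixed, the parameters of a single SN Ia (its three factors and its extinction) solve a weighted linear system combining both terms of the χ². The shapes and names below are assumptions for illustration, not the Appendix C code:

```python
import numpy as np

def update_factors(M_obs, M0, B, Wm, q, Wq):
    """Solve for one SN's parameter vector u = (factors..., extinction),
    minimizing
      (M_obs - M0 - B u)^T Wm (M_obs - M0 - B u)
        + (q - u[:m])^T Wq (q - u[:m]),
    where B stacks the factor spectra and the color curve as columns.
    Setting the gradient to zero gives weighted normal equations."""
    m = q.size                          # number of factors
    k = B.shape[1]                      # factors + 1 extinction term
    P = np.zeros((k, k))
    P[:m, :m] = Wq                      # the q-term only constrains u[:m]
    rhs = B.T @ Wm @ (M_obs - M0)
    rhs[:m] += Wq @ q
    return np.linalg.solve(B.T @ Wm @ B + P, rhs)
```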
Once the parameters that minimize Equation 9 have been found, we can compare γ(λ) with an extinction curve to see if it is compatible. It is straightforward to directly estimate the global mean R_V from γ(λ) by minimizing

χ²_{R_V} = ∑_λ [ γ(λ) − a(λ) − b(λ)/R_V ]²,

where the a and b vectors are the extinction curve coefficients defined in Cardelli et al. (1989). Minimizing χ²_{R_V} as a function of R_V then gives:

R_V = ∑_λ b(λ)² / ∑_λ b(λ) [ γ(λ) − a(λ) ].

Note that the value we quote for R_V within the context of our model depends on the assumptions adopted in the evaluation of the dispersion matrix (discussed in detail in Section 6.1) and therefore does not necessarily correspond to the true mean properties of dust. This uncertainty associated with separating SN Ia color behavior from dust behavior is common to all SN Ia fitting methods. Once the value of the global mean R_V is fixed, we determine the absorption in the V-band, A_V, for each SN Ia. This amounts to minimizing

χ²_{A_V,i} = [ δM_i − A_{V,i} (a + b/R_V) ]^T W_{M_i} [ δM_i − A_{V,i} (a + b/R_V) ],

where A_{V,i} is the absorption in the V-band for the SN Ia i, and δM_i are the residuals corrected for the intrinsic variabilities:

δM_i = M_i^obs − M_0 − ∑_j h_{i,j} α_j.

Taking the derivative of χ²_{A_V,i} with respect to A_{V,i}, we find:

A_{V,i} = (a + b/R_V)^T W_{M_i} δM_i / [ (a + b/R_V)^T W_{M_i} (a + b/R_V) ].
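The R_V estimate described above follows from linear least squares in x = 1/R_V; a minimal sketch, with the normalization details of γ glossed over:

```python
import numpy as np

def fit_rv(gamma, a, b):
    """Closed-form R_V assuming gamma(lambda) ~ a(lambda) + b(lambda)/R_V
    (Cardelli et al. 1989 form). Minimizing sum (gamma - a - b x)^2 over
    x = 1/R_V gives x = sum b (gamma - a) / sum b^2."""
    x = np.sum(b * (gamma - a)) / np.sum(b * b)
    return 1.0 / x
```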

Results
The model above is trained on the 105 SNe Ia that remain in the training sample. The vectors α_j for j = 1, 2, 3, as well as their effect on the average spectrum M_0, are presented in Figure 5. To quantify the overall impact of each vector in units of magnitude, we compute the average RMS of h_j α_j, the deviation from the average spectrum M_0. The vector α_1, presented in Figure 5, has an average impact of 0.17 mag when multiplied by the standard deviation of q_1. This is the expected amplitude for a stretch effect. The structure of this vector is associated with variations of the line depths, consistent with the correlation of the factor q_1 with both the pseudo-equivalent widths and the stretch (cf. Figures 3 and 4).
The vector α_2, shown in Figure 5, which is correlated with velocities, has a much weaker impact of 0.05 mag on the average spectrum M_0. This is consistent with the fact that this effect has not yet been detected in purely photometric data. Moreover, the associated variability is concentrated in localized structures such as the regions of the Ca ii H&K, Si ii λ6355, and Ca ii IR lines. Closer scrutiny shows that the variability especially affects the bluer edge of the features, leading to an overall effect on the line velocities, as expected from the correlations of q_2 with the minima of the P-Cygni profiles shown in Figure 3.
The vector α_3, shown in Figure 5, has an impact of 0.05 mag on the average spectrum M_0. Like the first factor q_1, q_3 is correlated with the stretch (cf. Figure 4), but the corresponding vector α_3 is less structured than α_1 in the optical band. However, the expected correlations with the spectral features are consistent with the behavior of q_3 (cf. Figure 3). Moreover, the most prominent features affect the extreme UV part of the spectrum as well as the Ca ii H&K and Ca ii IR regions. For the latter, brighter SNe Ia exhibit a stronger trough in the higher velocity part of the blends, which could link this vector to the presence of high-velocity calcium structures in the ejecta.
In summary, the analysis of the SED components at maximum light confirms that the stretch has the dominant effect on magnitudes, as expected, but one also has to take into account other variabilities that are difficult to detect in photometric bands since they are linked to localized features such as velocities. The γ(λ) curve is shown in Figure 6 and is consistent with a dust extinction curve. This shows that three eigenvectors provide a sufficient description of the intrinsic variability for the purpose of deriving a color law. The fit of γ(λ) to a Cardelli et al. (1989) law gives a value of R_V = 2.6. The Cardelli et al. (1989) law coincides remarkably well with γ(λ), except in the UV, where it differs slightly. The R_V value is mainly driven by the reddest SNe Ia. Indeed, as can be seen in the lower panel of Figure 6, three SNe Ia dominate the final value of R_V, and removing any of these from the sample would significantly alter the result, hence the rather large uncertainty.
We made the choice of fitting an average extinction curve for the whole sample, but there are observational indications that extinction curves exhibit some diversity (Amanullah et al. 2015; Kim et al. 2018). Moreover, any such extinction variation, or intrinsic color variation (e.g. Foley & Kasen 2011; Polin et al. 2018), that the model at maximum is unable to capture goes into the dispersion matrix. However, in order to keep the model simple, we keep an average value of R_V and leave the analysis of R_V variations for future work. Figure 7 presents the set of spectra before any correction and the residuals resulting from the difference between the observed spectra and Equation 5. In the top panel, the effect of extinction is clearly visible for spectra before any correction. After correction, the residuals between the model and the data are mainly shifted in magnitude with respect to one another, and no color effect appears to remain. This is the signature expected for a mostly grey residual offset. The amplitude of the correction made by the three factors q_1, q_2, q_3 and the extinction curve, given by the wRMS as a function of wavelength, is presented in the lower panel of Figure 7. Before any correction, the dispersion in the B-band is around 0.4 mag, as expected. Spectra before any correction exhibit both localized structures coming from intrinsic variabilities and a slowly increasing dispersion in the bluer part of the spectrum, which is the signature of extinction. Once spectra are corrected by the three factors q_1, q_2, q_3 and the extinction curve, the wRMS of the residuals drops to a floor of 0.1 magnitude. However, localized structures remain, concentrated in the UV, Ca ii H&K, Si ii λ6355, O i λ7773, and Ca ii IR regions. The amplitude of the baseline is compatible with the value of the grey intrinsic dispersion added in cosmological fits for nearby supernovae (Betoule et al. 2014).
The fact that part of the residual is grey is also visible in the dispersion matrix, as can be seen in Figure 8. Indeed, the high correlation across all wavelengths is consistent with a grey fluctuation. However, some effects observed in the dispersion matrix are not due to this grey effect, but to variability not captured by SUGAR in certain spectral zones, which explains some of the features observed in the dispersion matrix and in Figure 7.

Time interpolation using Gaussian Processes
We now want to extend our model from maximum light to the full spectral time series. Spectra from the SNfactory are taken at different epochs in the development of each SN Ia, and handling these observations taken at different times is an important step before constructing the full SED model. In order to work with a fixed time grid, an interpolation method is necessary. One solution is to use Gaussian Processes, as was done for SNe Ia in Kim et al. (2013), Fakhouri et al. (2015), and Saunders et al. (2018). A complete review of Gaussian Processes can be found in Rasmussen & Williams (2006), on which the implementation presented here is based. However, we have developed a specific implementation in order to take wavelength dependencies into account and to accelerate the computation of the interpolation. These are described below.
A Gaussian Process is a generalization of the Gaussian probability distribution and provides a non-parametric way to interpolate data. The main assumption of Gaussian Processes is that the observed data are the realization of a Gaussian random field characterized by an average function and a correlation function. Therefore, the distribution of the SN Ia magnitude m at a given phase t and a given wavelength λ follows:

m(t, λ) ∼ N( m_0(t, λ), K_λ ),

where m_0(t, λ) is the average function of the SNe Ia SED, and K_λ is the correlation matrix (commonly called the kernel) that describes time correlations between different epochs. In the absence of an explosion model that would yield the analytical form of K_λ, we choose a squared exponential kernel, to which the measurement and calibration uncertainties are added:

K_λ(t_i, t_j) = φ_λ² exp( −(t_i − t_j)² / (2 l_λ²) ) + ( σ_λ²(t_i) + σ_grey² ) δ_{t_i t_j},

where φ_λ² represents the spectral variance around the average function for the wavelength λ, l_λ is the temporal correlation length for the wavelength λ, t_i and t_j are the phases of observations i and j, and σ_grey is the grey error, taken to be 0.03 mag. All wavelengths are treated independently of each other, i.e. the interpolation is performed on y_λn, the n-th SN Ia light curve at wavelength λ from which the average function has been subtracted. For a given set of global hyperparameters (φ_λ, l_λ), the interpolation y'_λn onto the new time grid and the covariance matrix of the interpolation uncertainties, cov y'_λn, are given by:

y'_λn = K_λ(t', t) K_λ(t, t)^{-1} y_λn,
cov y'_λn = K_λ(t', t') − K_λ(t', t) K_λ(t, t)^{-1} K_λ(t, t'),

where K_λ(t, t) includes on its diagonal the errors σ_λ² on the magnitude at wavelength λ for all observed phases, t' is the new time grid, and t the observed phases of the SN Ia. Gaussian Processes assume that the observed data are distributed around an average function. This function is a necessary input, because in a region with no data points the Gaussian Process interpolation converges to this average function (after ∼3 temporal correlation lengths). The calculation of this average function is performed as follows.
The training spectra are binned by phase relative to maximum light in B-band, over a range from −12 to +48 days. Each phase bin spans two days and contains at least 3 spectra. The weighted average spectrum is computed for each phase bin and taken as a first approximation of the average function. However, due to some lower signal-to-noise spectra, this version of the average function is not smooth. A Savitzky-Golay filter is therefore used to smooth the average function, which is then used for the Gaussian Process interpolation.
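The construction of the average function can be sketched for a single wavelength with mock magnitudes; the bin width matches the text, but the smoothing window and polynomial order below are illustrative choices, not necessarily those used in the paper:

```python
import numpy as np
from scipy.signal import savgol_filter

# Sketch of the GP average function: phase-binned weighted average, then
# Savitzky-Golay smoothing. All data below are mock values for one wavelength.
rng = np.random.default_rng(3)
phases = rng.uniform(-12.0, 48.0, 400)                     # phases of all training spectra
mags = 0.02 * phases + 0.001 * phases**2 + rng.normal(0.0, 0.1, 400)
errs = np.full(400, 0.1)                                   # per-point magnitude errors

bins = np.arange(-12.0, 50.0, 2.0)                         # 2-day phase bins
centers = 0.5 * (bins[:-1] + bins[1:])
mean_curve = np.empty(centers.size)
for k in range(centers.size):
    sel = (phases >= bins[k]) & (phases < bins[k + 1])
    w = 1.0 / errs[sel]**2
    mean_curve[k] = np.sum(w * mags[sel]) / np.sum(w)      # weighted average per bin

# Smooth the noisy binned average (7-bin window, cubic polynomial; illustrative).
smoothed = savgol_filter(mean_curve, window_length=7, polyorder=3)
```

The smoothed curve follows the underlying trend while suppressing the bin-to-bin noise.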
What remains is to determine the pair of hyperparameters θ_λ = (φ_λ, l_λ) that represent, respectively, the amplitude of the fluctuations around the average function and the temporal correlation length. To estimate the hyperparameters for each wavelength, we maximize the product of all individual likelihoods, i.e. the sum over all SNe Ia of the Gaussian Process log marginal likelihoods:

log L(θ_λ) = Σ_n [ −½ y_λn^T K_λ^{-1} y_λn − ½ log |K_λ| − (N_t/2) log 2π ].

This differs from previous work for two main reasons: the wavelength dependence of the hyperparameters, and their estimation using all the SNe Ia. These two new features are justified mainly by SN Ia physics and by computation time. Since SNe Ia are standardizable candles, we treat them as realizations of the same Gaussian random field, i.e. we assume that they share the same set of hyperparameters. In this case, we can use all available SNe Ia to estimate these hyperparameters. The wavelength dependence of the hyperparameters is motivated by, e.g., dust extinction, intrinsic variability, and the second maximum in the infrared: the standard deviation around the average function and the temporal correlation length should therefore vary across wavelength. Moreover, treating wavelengths independently makes the computation very efficient: it reduces the cost of the matrix inversion from the classical O((N_λ N_t)³) for a joint interpolation to N_λ inversions of cost O(N_t³), where N_λ is the number of wavelength bins (197 here) and N_t the number of observed phases (∼14 on average). On a MacBook Pro with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM, finding all hyperparameters (2 × 197) and computing the interpolation and the pull distribution of the residuals took under 6 minutes.
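A minimal sketch of this per-wavelength hyperparameter fit, maximizing the summed Gaussian Process log marginal likelihood over mock SNe that share the same (φ_λ, l_λ); the optimizer choice and all numbers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

# Sketch: fit (phi, ell) at one wavelength by maximizing the summed GP log
# marginal likelihood over many mock SNe sharing the same hyperparameters.
def neg_log_likelihood(params, curves):
    """curves: list of (t, y, sigma) per SN; params = (log phi, log ell)."""
    phi, ell = np.exp(params)
    nll = 0.0
    for t, y, sigma in curves:
        K = (phi**2 * np.exp(-0.5 * (t[:, None] - t[None, :])**2 / ell**2)
             + np.diag(sigma**2))
        _, logdet = np.linalg.slogdet(K)
        nll += 0.5 * (y @ np.linalg.solve(K, y) + logdet
                      + t.size * np.log(2 * np.pi))
    return nll

rng = np.random.default_rng(4)
true_phi, true_ell = 0.4, 7.0
curves = []
for _ in range(30):                          # 30 mock SNe, ~14 phases each
    t = np.sort(rng.uniform(-12.0, 48.0, 14))
    K = true_phi**2 * np.exp(-0.5 * (t[:, None] - t[None, :])**2 / true_ell**2)
    y = rng.multivariate_normal(np.zeros(14), K + 0.02**2 * np.eye(14))
    curves.append((t, y, np.full(14, 0.02)))

res = minimize(neg_log_likelihood, x0=np.log([0.2, 3.0]), args=(curves,),
               method="Nelder-Mead")
phi_fit, ell_fit = np.exp(res.x)
```

With 30 curves the recovered amplitude and correlation length land close to the injected values.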
Results of hyperparameter adjustment in terms of wavelength are shown in Figure 9. First, it can be seen that the structure of hyperparameters as a function of wavelength is not random. Indeed, for the parameter φ λ , its value varies from 0.3 magnitude to more than 0.6 magnitude: this number is difficult to interpret directly because it is the amplitude around an average value at all phases of the data. Nevertheless, the increase in the ultraviolet and blue wavelengths can be explained by the dispersion caused by dust extinction of the host-galaxies and a larger intrinsic dispersion in this wavelength range (Maguire et al. 2012;Nordin et al. 2018). There are also structures in the peaks of this intrinsic dispersion that correspond to the regions of Ca ii H&K, Si ii λ6355 and Ca ii IR lines. The correlation length, l λ , has structure in the silicon and calcium regions and remains globally stable with no features between 6000Å and 8000Å (average value of 7 days).
Once the hyperparameters have been estimated, the interpolations for each SN Ia are performed onto a new phase grid. The phase grid chosen covers a range between −12 and +48 days around maximum brightness in B-band, divided into 3 day bins.
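The conditioning step behind this interpolation can be sketched for one wavelength as follows; the light curve is mock data and the hyperparameter values are illustrative:

```python
import numpy as np

# Sketch of the per-wavelength GP interpolation: squared exponential kernel
# plus measurement and grey noise, conditioned on the observed phases.
def sq_exp_kernel(t1, t2, phi, ell):
    """phi: amplitude around the average function, ell: correlation length."""
    return phi**2 * np.exp(-0.5 * (t1[:, None] - t2[None, :])**2 / ell**2)

rng = np.random.default_rng(2)
t_obs = np.array([-10.0, -5.0, 0.0, 5.0, 12.0, 25.0])     # observed phases (days)
sigma = np.full(t_obs.size, 0.02)                          # magnitude errors
y_obs = 0.03 * t_obs + rng.normal(0.0, 0.02, t_obs.size)   # mean-subtracted mock light curve

phi, ell, sigma_grey = 0.4, 7.0, 0.03                      # illustrative hyperparameters
K = sq_exp_kernel(t_obs, t_obs, phi, ell) + np.diag(sigma**2 + sigma_grey**2)
t_new = np.linspace(-12.0, 48.0, 21)                       # 3-day output grid
K_star = sq_exp_kernel(t_new, t_obs, phi, ell)

y_pred = K_star @ np.linalg.solve(K, y_obs)                # interpolated light curve
cov_pred = (sq_exp_kernel(t_new, t_new, phi, ell)
            - K_star @ np.linalg.solve(K, K_star.T))       # interpolation covariance
```

Far beyond the last observation the prediction reverts to the (subtracted) average function and the predicted variance grows back toward φ², as described in the text.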
Finally, to check that both the predictions (interpolation) and the associated errors are in agreement with what is expected from a Gaussian Process, we compute the pull distribution of the residuals. Because we assume that our data are realizations of a Gaussian random field, the pull distribution should follow a centered unit normal distribution. For each wavelength we fit a Gaussian to the pull distribution and calculate its standard deviation. The results are shown in the bottom panel of Fig. 9. The pull standard deviation is on average lower than 1, with an average value of 0.8, indicating that the interpolation uncertainties are slightly overestimated.
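The pull check itself reduces to a few lines; the sketch below uses synthetic predictions with honest error bars, for which the pull standard deviation should be close to 1:

```python
import numpy as np

# Sketch of the pull diagnostic: residuals divided by their predicted errors
# should be ~ N(0, 1) when the error model is correct (mock data below).
rng = np.random.default_rng(5)
n = 5000
y_true = rng.normal(0.0, 1.0, n)                 # "truth" held out from the fit
sigma_pred = np.full(n, 0.1)                     # predicted interpolation errors
y_pred = y_true + rng.normal(0.0, 0.1, n)        # predictions with honest errors

pull = (y_pred - y_true) / sigma_pred
pull_std = np.std(pull)                          # ~1 for well-calibrated errors
```

A pull standard deviation below 1, as found for the SUGAR interpolation, would correspond to sigma_pred being larger than the true scatter.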
These interpolation results will be used to establish the SED model that is described in the following sections.

The model
The SUGAR model assumes that the SED variation at any epoch may be described by spectral features measured at maximum. The SUGAR SED model is based on a linear combination of the factors derived in Section 3.2.3 and an extinction curve taken as a Cardelli et al. (1989) law, derived in Section 3.3. To model the SUGAR SED, we propose a model similar to the one constructed for maximum light. It is based on the three factors for each SN Ia i, combined into the vector x_i ≡ {h_{i,1}, h_{i,2}, h_{i,3}}, which in the model is the true value of the measured factors q_i = {q_{i,1}, q_{i,2}, q_{i,3}}, and is related to the intrinsic magnitude by:

M_i(t, λ) = M_0(t, λ) + Σ_{j=1}^{3} h_{i,j} α_j(t, λ) + A_{V,i} f(λ, R_V) + ∆M grey_i,    (22)

where M_0(t, λ) is the magnitude of the average spectral time series and α_j(t, λ) is the intrinsic variation related to the factor h_{i,j}; as discussed in Section 3.2.3, the number of factors is set to three. As at maximum, h_{i,j} represents the true value of the measured factor q_{i,j} derived in Section 3.2. A_{V,i} is the extinction in the V-band, and f(λ, R_V) is the Cardelli et al. (1989) law, where R_V is the extinction ratio derived in Section 3.3. Finally, the term ∆M grey_i is a grey, time-independent term comparable to the parameter X_0 of SALT2, which allows us to work independently of the distance. Indeed, the effect of distance on the magnitude is equivalent to an additive constant independent of wavelength. Moreover, the parameter ∆M grey_i also contains the average of the spectral time series residuals with respect to the SUGAR model. The addition of ∆M grey_i differs from what was done when determining the extinction curve in Section 3.3, and this choice is discussed in Section 6.1. We model the SED at the same wavelengths and epochs where the Gaussian Process interpolation is done.
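Evaluating Equation 22 on a (phase, wavelength) grid is a simple linear combination; in the sketch below the model surfaces M_0, α_j and the extinction term f(λ, R_V) are random placeholders, not the trained SUGAR components:

```python
import numpy as np

# Sketch of evaluating the SUGAR SED (Equation 22) on a (phase, wavelength)
# grid; all model surfaces are mock placeholders.
rng = np.random.default_rng(6)
n_phase, n_wave = 21, 190
M0 = rng.normal(-19.0, 0.5, (n_phase, n_wave))        # mean spectral time series
alpha = rng.normal(0.0, 0.1, (3, n_phase, n_wave))    # intrinsic components alpha_j
f_ccm = np.linspace(1.8, 0.6, n_wave)                 # placeholder f(lambda, R_V)

def sugar_mag(h, a_v, dm_grey):
    """M(t, lam) = M0 + sum_j h_j alpha_j + A_V f(lam, R_V) + dM_grey."""
    return M0 + np.tensordot(h, alpha, axes=1) + a_v * f_ccm[None, :] + dm_grey

model = sugar_mag(h=np.array([1.0, -0.5, 0.2]), a_v=0.3, dm_grey=0.05)
```

Setting all parameters to zero recovers the average spectral time series, which makes the linear structure of the model explicit.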
For notational efficiency we rewrite Equation 22 in a compact matrix form, where a matrix A gathers the model components and a vector h_i the per-SN parameters. The unknown parameters of the SUGAR model are therefore the average spectrum M_0(t, λ), the intrinsic coefficients α_j(t, λ), and the grey offsets ∆M grey_i. Each SN Ia is parametrized during the training by the q_{i,j} determined in Section 3.2.3 and the A_{V,i} derived in Section 3.3. The average spectral time series M_0(t, λ), the intrinsic coefficients α_j(t, λ), and the grey offsets ∆M grey_i are determined in the same way as for the model at maximum light in Section 3.3. The main differences with Section 3.3 are that the model now depends explicitly on the phases of observation, that the extinction curve is fixed to its value at maximum light (same R_V), and that the model is trained on the Gaussian Process interpolations of the spectra derived in Section 3.4 instead of on the measured spectra. The procedure is described below.

Fitting the SUGAR model
We use an orthogonal distance regression method to estimate the parameters of the SUGAR model. This is a version of the method described in Section 3.3, modified to include the Gaussian Process interpolation, the known extinction parameters, and the grey offset. To estimate the SUGAR parameters, we minimize the following χ²:

χ² = Σ_i [ (M_i^gp − M_i)^T W_{M_i} (M_i^gp − M_i) + (x_i − h_i)^T W_{x_i} (x_i − h_i) ],    (26)

where M_i is the model prediction of Equation 22 and the vector M_i^gp contains the Gaussian Process interpolations of the observed spectra (Section 3.4), in the following vector format:

M_i^gp = ( y'_{λ_1 i}, …, y'_{λ_Nλ i} )^T,

where y'_{λ i} is the light curve at wavelength λ interpolated by Gaussian Processes onto the new time grid, as described in Section 3.4. As in Section 3.3, the uncertainties affecting the x_i are propagated through the weight matrix W_{x_i}. The h_i play a role analogous to the q_i of Section 3.2 and are estimated like the other free parameters of the model. Finally, the matrix W_{M_i} is the weight matrix of the SN Ia i and is block diagonal:

W_{M_i} = diag( (cov y'_{λ_1 i})^{-1}, …, (cov y'_{λ_Nλ i})^{-1} ),

where cov y'_{λ i} is the covariance of the interpolation uncertainties from the Gaussian Process interpolation determined in Section 3.4. The interpolation is performed only in terms of the phase relative to B-band maximum, which makes the weight matrix block diagonal for each SN Ia. In this analysis, we do not add any covariance in wavelength coming from the calibration uncertainties, both for reasons of computation time and because these terms are expected to be small. The block diagonal structure allowed us to accelerate the minimization algorithm: estimating the 16653 free parameters of the SUGAR model took just under 1 hour on a MacBook Pro with a 2.9 GHz Intel Core i5 processor and 8 GB of RAM, thanks to sparse linear algebra. Note that while the covariance terms in wavelength were neglected, the uncertainty at each wavelength was taken into account during the training through the Gaussian Processes (these are the blocks cov y'_{λ i}).
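The computational gain from the block diagonal structure can be illustrated directly: solving one small system per wavelength block gives the same result as solving the full block diagonal system at a fraction of the cost (mock blocks below, with illustrative sizes):

```python
import numpy as np
from scipy.linalg import block_diag

# Sketch: per-wavelength block solves versus one dense block-diagonal solve.
rng = np.random.default_rng(7)
n_wave, n_phase = 20, 14
blocks = []
for _ in range(n_wave):
    L = rng.normal(size=(n_phase, n_phase))
    blocks.append(L @ L.T + n_phase * np.eye(n_phase))   # one SPD covariance block per wavelength

rhs = rng.normal(size=(n_wave, n_phase))

# Block-wise: n_wave systems of size n_phase, i.e. O(n_wave * n_phase^3).
x_blockwise = np.stack([np.linalg.solve(B, r) for B, r in zip(blocks, rhs)])

# Equivalent dense solve, O((n_wave * n_phase)^3); same answer, for comparison only.
x_dense = np.linalg.solve(block_diag(*blocks), rhs.ravel()).reshape(n_wave, n_phase)
```

The two solutions agree to machine precision, while the block-wise path scales linearly in the number of wavelength bins.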
The unknown parameters of the model (the matrix A and the vectors h_i) are computed by minimizing Equation 26 using the expectation-minimization algorithm described in Appendix C, which avoids degeneracies between parameters.

The SUGAR components
The SUGAR model is described by a set of spectral time series components (M_0, α_1, α_2, α_3) at 3 day intervals. The SED of a given SN Ia is then obtained from a linear combination of those components employing the (q_1, q_2, q_3) factors describing the supernova. For illustrative purposes, we present the spectral time series at 6 day intervals, as well as their integration in a synthetic top-hat filter system comprised of five bands, Û, B̂, V̂, R̂, and Î. The upper and lower contours correspond to a variation of ±1σ of the associated q_j parameter. The α_j(t = 0) components obtained when training the color law with only data at maximum or with the full time series are almost identical: all conclusions from Section 3.3.3 remain valid, and we focus here on the temporal behavior.
Of the three factors, q_1 has the strongest impact: this can be seen from the time series presented in Figure 10 and from the broad band light curves presented in Figure 11. It is associated with a stretch effect, visible in the Û, B̂, and V̂ bands as a widening of the variation band as we move away from maximum. Between −12 days and +30 days, the q_1 factor has a strong influence on spectroscopic details. The brightest SNe Ia exhibit shallower troughs in their spectral features: this is especially visible for Si ii λ4131, Si ii λ5972, and Ca ii IR, but also holds for most of the lines. This effect is more pronounced for SNe Ia at earlier phases, and fades over time. After +30 days, the factor q_1 is less sensitive to localized features, and shows a relative enhancement of the optical B̂ to R̂ bands with respect to Û and Î. Interestingly, between +12 days and +24 days, the brightest SNe Ia are bluer than at maximum or at later phases. This is true in absolute value, but also relative to the average SN Ia. This shows that the SNe Ia color space is driven by a variable intrinsic component in addition to reddening by dust. This intrinsic color variation is linked with the position of the second peak in Î, which appears later for brighter SNe Ia.
As at maximum, the effect of the q_2 factor over the whole time series (Figure 12) is mainly localized around specific spectral features, with little impact on the broad band light curves (Figure 13), with the exception of the Î-band. This factor associates higher line velocities around maximum with deeper absorption troughs, strongly visible before maximum and up to +18 days in the Ca ii H&K region, and at all phases in the Ca ii IR region. The net effect described by q_2 is that higher velocities are associated with slightly dimmer SNe Ia. At later phases velocity effects are still observable, and at all phases after −6 days, high-velocity SNe Ia show localized differences in the 4700−5100 Å spectral region. In addition, they are bluer in V̂, R̂, and Î after maximum, with a maximal effect on the Î-band at around +12 days. A slight stretch effect is visible in Û and B̂, and also results in a later second maximum in Î for the SNe Ia brightest in B, as was mentioned for q_1. The shapes of the light-curve variations in Î are very different for each factor, indicating that this band can help reconstruct the q_2 factor when using photometric data only. Furthermore, while the q_2 factor has a small impact on the variations of individual light curves, it has a sizable influence on colors, especially R̂ − Î, Û − B̂, and, after +25 days, B̂ − V̂. In addition to the Î-band, the different color variation pattern in B̂ − V̂ at late phases offers a way of disentangling the q_1 and q_2 factors when dealing with photometric data.
As seen in Figure 14, the influence of q_3 is minimal around maximum light, similar to the results from Section 3.3. It grows at other phases and appears as a stretch effect in the light curves presented in Figure 15. Unlike q_1, the stretch described by q_3 shows little correlation with the magnitude at maximum. While q_3 has a larger impact on the light curves than q_2, the reverse is true for individual colors. At late phases, q_3 exhibits a brighter-redder correlation that might be employed to distinguish this vector from q_1 when using photometric data only. The influence of q_3 on spectral structures is mainly visible in the calcium regions, although variations can also be observed elsewhere. Around Ca ii IR, between −6 days and maximum, brighter SNe Ia exhibit a deeper absorption trough at high calcium velocity. This high-velocity calcium feature is visible for all q_3 values at −12 days, but fades faster for dimmer SNe Ia. Meanwhile, the lower velocity counterpart is only visible in q_3 for the dimmest supernovae at −12 days, and appears later for brighter SNe Ia.

Fig. 13: Average light curves and color evolution with phase in synthetic bands, and the predicted effect of a variation of q_2 by ±1σ.

Fitting spectra with SUGAR
The measurement of spectral indicators is very sensitive to the signal-to-noise ratio of the spectrum, and at high redshift, measuring the indicators in the observer frame is not possible. However, once the model has been trained, the parameters q_1, q_2, q_3, A_V, and ∆M grey for a given SN Ia can be estimated directly from its spectral time series, by minimizing the following χ²_i:

χ²_i = ( M_i − M(x_i) )^T W_{M_i} ( M_i − M(x_i) ),

where M_i is the vector containing the entire spectral time series of the SN Ia i, ordered in the same way as for the SUGAR training:

M_i = ( y_{λ_1 i}, …, y_{λ_Nλ i} )^T,

where y_{λ i} is the light curve at wavelength λ for the observed phases. The model is projected onto the observation phases using cubic spline interpolation. The vector x_i contains the parameters that describe the SN Ia i according to the SUGAR model:

x_i = ( q_{i,1}, q_{i,2}, q_{i,3}, A_{V,i}, ∆M grey_i )^T.

Finally, W_{M_i} is the weight matrix derived from the M_i errors. Since the trained model is linear in x_i, we can write M(x_i) = M_0 + A x_i, where the columns of A contain the α_j, the extinction curve f(λ, R_V), and a constant for the grey offset. Minimizing χ²_i with respect to x_i then gives the solution:

x̂_i = ( A^T W_{M_i} A )^{-1} A^T W_{M_i} ( M_i − M_0 ),

and therefore the covariance on the x_i is given by:

cov(x̂_i) = ( A^T W_{M_i} A )^{-1}.

The term x_i is computed for the 105 SNe Ia that were used to train SUGAR. The values obtained are compared with the factors and A_V measured at maximum light in Figure 16. Both quantities are highly correlated. However, this correlation decreases with increasing factor index, because the factor uncertainties grow with index: the noise is greater for the higher-order components, which reduces the correlation. The extinction term is quite compatible between its estimate at maximum light and its estimate with SUGAR.
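Because the trained model is linear in x_i, the per-SN fit reduces to a weighted least-squares solve; a minimal sketch on mock data, with diagonal weights for simplicity and illustrative sizes:

```python
import numpy as np

# Sketch of fitting one SN with a trained linear model: weighted least squares
# for x_i = (q1, q2, q3, A_V, dM_grey); all inputs below are mock.
rng = np.random.default_rng(8)
n_obs, n_par = 300, 5
A = rng.normal(size=(n_obs, n_par))              # columns: alpha_1..3, f(lambda, R_V), grey
A[:, 4] = 1.0                                    # constant column for the grey offset
x_true = np.array([1.2, -0.4, 0.3, 0.25, 0.05])  # injected (q1, q2, q3, A_V, dM_grey)
sigma = 0.05
m = A @ x_true + rng.normal(0.0, sigma, n_obs)   # mock mean-subtracted time series
W = np.eye(n_obs) / sigma**2                     # weight matrix (diagonal here)

AtW = A.T @ W
cov_x = np.linalg.inv(AtW @ A)                   # covariance of the fitted parameters
x_hat = cov_x @ (AtW @ m)                        # weighted least-squares solution
```

The solution and its covariance follow the closed forms quoted above; with 300 mock points the injected parameters are recovered to a few times 10⁻³.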
The term x i is also computed for the 58 SNe Ia kept for validation. The distribution of the x i parameters from the validation sample is compared to the x i parameters from the training in Figure 17. From this, one can see that the training and validation samples are compatible, as will be confirmed in following sections.

Difference between SUGAR and SALT2 models
Throughout this article, we use the SALT2 model as a benchmark to evaluate the properties of the SUGAR model, because it has become a reference for cosmological analyses. SALT2 describes the SED of SNe Ia with the following functional form (Guy et al. 2007):

F(SN, t, λ) = X_0 [ F_0(t, λ) + X_1 F_1(t, λ) ] exp( C · CL(λ) ),

where the observed flux F depends on the individual SN Ia parameters (X_0, X_1, C), on the model average flux F_0 (defined up to a multiplicative factor), on the model flux variation F_1, and on an empirical color law CL. In the regime where X_1 F_1 is small with respect to F_0, this model, once expressed in magnitudes, can be identified with the SUGAR model: (F_0, F_1, CL) and (X_1, C) take the roles of (M_0, α_1, f(R_V)) and (q_1, A_V), respectively. ∆M grey is the equivalent of a linear combination of (log X_0, X_1, C).
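This magnitude-space identification can be checked numerically: to first order in X_1 F_1/F_0, the SALT2 flux model becomes linear in magnitudes, as SUGAR is. A sketch with placeholder F_0, F_1, and CL surfaces (not the real SALT2 templates):

```python
import numpy as np

# Sketch: linearize the SALT2 flux model in magnitude space for small X1*F1/F0.
rng = np.random.default_rng(10)
n_wave = 100
F0 = np.abs(rng.normal(1.0, 0.1, n_wave)) + 0.5   # placeholder mean flux
F1 = 0.05 * rng.normal(size=n_wave)               # placeholder variation, small vs F0
CL = np.linspace(0.3, -0.2, n_wave)               # placeholder color law
X0, X1, C = 1.3, 0.8, 0.1                         # example SN parameters

flux = X0 * (F0 + X1 * F1) * np.exp(C * CL)
mag_exact = -2.5 * np.log10(flux)

# First order in X1*F1/F0: the magnitude is linear in (X1, C), as in SUGAR.
mag_lin = (-2.5 * np.log10(X0) - 2.5 * np.log10(F0)
           - 2.5 / np.log(10) * X1 * F1 / F0
           - 2.5 / np.log(10) * C * CL)
```

In this small-perturbation regime the exact and linearized magnitudes agree to a few millimagnitudes, which is what justifies mapping (X_1, C) onto (q_1, A_V).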
Although the major differences between the models come from the inclusion of two additional intrinsic components in SUGAR, its color law, and the data on which the models were trained, there are many other differing characteristics, which are summarized in Table 1. The inclusion of two additional intrinsic components is enabled by the level of detail available in our spectrophotometric time series. We therefore expect the SUGAR model to provide a more faithful representation of the spectral details than SALT2. Though analogous in spirit, X_1 and q_1 are derived within quite different paradigms: F_1 comes from a PCA analysis performed in flux space, while the α_j are trained in magnitude space and driven to reproduce the prominent spectral indicators. A direct generalization of the SALT2 approach with more components is presented in Saunders et al. (2018). While the empirical SALT2 CL function models the average color variation, our color curve was shown to be accurately described by a CCM extinction law. The SALT2 color parameter c thus mixes information about reddening by interstellar media (described by A_V) with intrinsic SNe Ia properties that were not directly matched with one of the q_i parameters. These differences in the treatment of colors would manifest as long-range spectral variations (because the extinction curve is smooth). In regions where the phase coverage is sufficient, the different phase interpolation methods used are not expected to contribute significantly to differences between models. Otherwise, and for early phases in particular, we expect the differences to be driven by the training samples rather than by the interpolation technique. The wavelength and phase coverage of SUGAR are restricted compared to SALT2. This is directly related to the training data: SUGAR is trained on low redshift spectrophotometric time series, while SALT2 is trained on a wide redshift range of photometric data, employing a handful of spectra to help refine the SED details.
As SALT2 uses low and high redshift data, it offers a better coverage of the UV domain. It also has a larger phase coverage thanks to the rolling cadence strategy, while SNfactory SNe Ia had to be specifically targeted. While SALT2 could easily be retrained with SNfactory data added to its sample, the reverse is not true: major adaptations would be needed to allow SUGAR to incorporate photometric data in its training. However, this does not prevent fitting photometric light curves with SUGAR in the same way as with SALT2, provided they fall within the phase and spectral coverage of SUGAR. In contrast, SALT2 was not designed to fit spectrophotometric data.

Table 1 (excerpt). Color law: third order polynomial (SALT2) versus Cardelli et al. (1989) law (SUGAR). Error on the model: yes (SALT2) versus no (SUGAR).

Spectral residuals
The SUGAR model is designed to improve the spectral description of SNe Ia by adding components beyond stretch and color. While an ideal model would perfectly match the data, the level of mismatch can be quantified by the residuals, i.e. the difference between the left and right sides of Equation 22. The improvement with respect to SALT2 is then studied by comparing the data to the spectral reconstruction predicted by both models. One example of these comparisons is presented in Figure 18, which shows the spectra and residuals for SN-training-4. For this supernova, SUGAR gives a better description than SALT2. The improvement is especially clear in the UV after +17 days and in the infrared at all phases, particularly in the O i λ7773 and Ca ii IR regions. This is not surprising insofar as SALT2 is essentially trained to reproduce the bluer part of the spectrum. The SUGAR description is also better in the regions around lines, even if it is not totally satisfactory in some areas: Ca ii H&K, Si ii λ6355, and the bluer part of the spectrum at late phases. In the B-band, the reconstruction of the spectral details is quite similar between SUGAR and SALT2, although SUGAR seems to follow the spectral details more faithfully, as can be seen in the Mg ii area.
SN-training-4 is only one example among the 105 SNe Ia on which SUGAR was trained. To obtain a statistical description of the precision of the models, we derive the dispersion of the residuals obtained for all SNe Ia as a function of wavelength and phase. In Figure 19, we present the wRMS of this dispersion: for each wavelength, we compute the wRMS across all phases, weighted by the spectral variance. To show how the accuracy of the model evolves when going from a classic Tripp (1998) relation (when only q_1 and A_V are included) to the full model, the wRMS for SUGAR is computed three times with progressive inclusion of q_1, q_2, and q_3. We compute the residual dispersion for both the training and the validation samples. What was observed for SN-training-4 is confirmed in this statistical analysis across all SNe Ia: at any wavelength, SUGAR gives a better description than SALT2. The strongest improvement is in the IR, but it is also significant in the UV. All the spectral details are significantly improved, such as Ca ii H&K, Si ii λ6355, O i λ7773, and Ca ii IR. The best performance of the SUGAR model is obtained in the region around 6500 Å, where the dispersion gets as low as 0.05 magnitude. It should be kept in mind that the spectral dispersion includes noise variance that may be high in some spectra, and is therefore not representative of the dispersion expected in photometry after integration over broad band filters. The phase evolution of the wRMS is presented in Figure 20: the dispersion in each phase bin is calculated across all wavelengths. This includes wavelengths for which SALT2 is not adapted, which explains the relatively high values of wRMS for this model compared to SUGAR.
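The wRMS statistic used here is a variance-weighted dispersion; a minimal sketch for residuals on a (spectrum, wavelength) grid with mock values:

```python
import numpy as np

# Sketch of the weighted RMS per wavelength: residuals across all spectra,
# each weighted by its spectral variance (all values below are mock).
rng = np.random.default_rng(9)
n_spectra, n_wave = 200, 50
residuals = rng.normal(0.0, 0.12, (n_spectra, n_wave))   # mock model residuals
variances = np.full((n_spectra, n_wave), 0.05**2)        # mock spectral variances

w = 1.0 / variances
wrms = np.sqrt(np.sum(w * residuals**2, axis=0) / np.sum(w, axis=0))  # per wavelength
```

With uniform weights this reduces to the ordinary RMS; with realistic heteroscedastic variances, noisy spectra are down-weighted, which is the property used in Figure 19.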
The successive inclusion of q_1, q_2, and q_3 can help us understand the origin of SUGAR's improvement over SALT2. As the training samples of the two models are different, one may wonder what SALT2 would have given if trained on SNfactory data. This situation can be approximated by including only q_1: SUGAR|q_1 is slightly worse than SALT2 between the Si ii λ4131 and Si ii λ6355 features, and between Si ii λ6355 and O i λ7773. These regions cover the wavelength range where SALT2 was trained and optimized, while SUGAR|q_1 is only a by-product of the three-component model and is not by itself optimal for a single component. Earlier phases are more difficult to reproduce for both models: this could be an effect of higher variability beyond stretch at those phases. At later phases, SUGAR|q_1 does not exhibit the same increase in wRMS as SALT2; this is also an effect of the lack of IR wavelength coverage for SALT2. One could therefore expect SALT2 to behave at least as well as SUGAR|q_1 in the UV and IR when trained on SNfactory data. As expected, the inclusion of additional components improves the accuracy of SUGAR: q_2 improves the description of the spectral features because it is correlated with the Ca ii H&K, Si ii λ6355, and Ca ii IR regions, while the full SUGAR precision is only reached with the inclusion of q_3.
The improvement brought by SUGAR is thus enabled by two factors: the spectral coverage and the spectral resolution of the training data, which allow the inclusion of the new components: q 2 improves the color description, while q 3 helps reduce the dispersion away from maximum light. Moreover, the improvements added by q 2 and q 3 are also seen in the validation sample, which confirms that these components are not due to overtraining.

Relation between SALT2 and SUGAR parameters
In Section 3.2 we presented the correlations between the SALT2 parameters and the first five factors found using factor analysis. After training the full SUGAR model and reconstructing the parameters of individual supernovae as described in Section 5.1, we can study the correlations between the SALT2 parameters (X 0 , X 1 , C) and the SUGAR parameters (∆M grey , q 1 , q 2 , q 3 , A V ). A direct comparison of X 0 and ∆M grey is not meaningful because of the different prescriptions used to lift the degeneracies of the two models. We instead recognize ∆M grey as the Hubble diagram residuals, a quantity that we can also compute from the SALT2 parameters (and will denote as ∆µ corr ). The results of this comparison between SALT2 and SUGAR parameters are presented in Figure 21. The lessons from the correlations of the SALT2 parameters with the original factors still hold: X 1 is strongly correlated with q 1 and q 3 ; q 2 has no significant correlation with any of the SALT2 parameters, except with ∆µ corr . The small correlation of q 2 and q 3 with ∆µ corr shows that the inclusion of these factors can help improve the standardization. The SALT2 color C is strongly correlated with A V , as expected since the purpose of these two parameters is to account for the reddening of SNe Ia. Contrary to the results from Section 3.2 with the factors alone, C now exhibits a 4.5σ correlation with q 3 . This correlation might come from the slight fluctuation in color with respect to phase visible in Figure 15 and from the inclusion of the validation sample. The slight correlation of C with ∆M grey comes from the correlation of ∆M grey with A V : indeed, ∆M grey and A V were determined separately during the training, without fixing the potential degeneracy between them, which explains why they are correlated. Finally, ∆µ corr has a strong correlation with ∆M grey .
This indicates that, even though SUGAR provides a much improved description of the spectral details, the additional factors do not capture the major source of magnitude variability of SNe Ia.
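The significance of such correlations, quoted above in units of σ, can be estimated for instance with a Fisher z-transform. The sketch below assumes this convention; the paper's exact estimator may differ.

```python
import numpy as np

def correlation_significance(x, y):
    """Pearson correlation and its significance in units of sigma.

    Uses the Fisher z-transform: under the null hypothesis of zero
    correlation, z = arctanh(rho) has standard deviation 1/sqrt(n-3).
    Illustrative convention only, not necessarily the paper's.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    rho = np.corrcoef(x, y)[0, 1]
    sigma = abs(np.arctanh(rho)) * np.sqrt(n - 3)
    return rho, sigma
```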

Grey dispersion and dispersion matrix fitting
For the model at maximum light, we did not fit a grey term, but we did include a dispersion matrix while estimating the extinction curve. However, for the full spectral time series, we fit a grey offset and used the extinction curve determined at maximum light. One might wonder why we did not perform a simultaneous fit of the full spectral time series, the grey offset, and the extinction curve. Our decision was motivated by the following considerations.
In order to perform a simultaneous fit, it is essential to have a good estimate of the dispersion matrix, which drives the estimation of the extinction curve, as discussed in Scolnic et al. (2014). In addition, this matrix would have the advantage of taking into account the temporal correlations of the residuals, on top of their color correlations, which would improve on the model developed at maximum light.
However, from a numerical point of view, this would break the sparse algebra approximations that significantly speed up the training. Indeed, the sparsity (due to Equation 28) reduces the complexity of the minimization of Equation 26 from O((N_λ N_t N_param)^3) to O(N_λ × (N_t N_param)^3), where N_λ is the number of wavelength bins, N_t the number of time bins, and N_param the number of parameters to fit for a given wavelength and epoch. In the SUGAR training, N_λ ∼ 190, N_t ∼ 30, and N_param = 5 (average spectrum + 3 factors + extinction curve); abandoning the sparse algebra would therefore slow down the training algorithm by a factor of order N_λ^2. Moreover, the estimation of the dispersion matrix would suffer from numerical stability problems due to its size: in the case of a simultaneous fit, it would be ∼ 20000 × 20000. Ensuring that this matrix is positive definite would require a Singular Value Decomposition, which would again slow down the training. In addition, the grey offset is degenerate with the dispersion matrix, since the latter can fully capture a grey dispersion; for this reason, we deliberately did not include the grey offset explicitly in the fit of the extinction curve in Section 3.3. There remains, however, a degeneracy between A λ0 and the grey offset, and part of the latter is captured by this parameter.
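The gain from the block structure can be illustrated as follows: each wavelength bin yields an independent least-squares problem of size N_t × N_param, which can be solved separately instead of inverting one global system. Toy sizes and random data below, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (the paper quotes N_lambda ~ 190, N_t ~ 30, N_param = 5).
n_lambda, n_t, n_param = 4, 6, 3
n = n_t * n_param  # unknowns per wavelength bin

# One independent least-squares problem per wavelength bin: exploiting
# this block-diagonal structure costs O(n_lambda * n**3) instead of
# O((n_lambda * n)**3) for one monolithic solve.
solutions = []
for _ in range(n_lambda):
    H = rng.normal(size=(2 * n, n))  # design matrix for this bin
    y = rng.normal(size=2 * n)       # data vector for this bin
    theta, *_ = np.linalg.lstsq(H, y, rcond=None)
    solutions.append(theta)
solutions = np.array(solutions)      # shape (n_lambda, n)
```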
The SUGAR model presented here is already a significant improvement over the SALT2 model and provides insights into SN Ia variability.

Test of adding an additional component
In Section 3.2, we discussed the number of factors needed to describe the final SUGAR SED, and we concluded that it cannot be determined from the factor description of spectral features at maximum light alone. Indeed, even though the two are strongly related, the main goal is to know the number of components needed to describe the full SED, not the number needed to describe the spectral-feature space at maximum light. One way to check whether the choice of three factors used here is optimal is to retrain the SUGAR model with more than three components and observe how the spectral residuals evolve with this change.
In the following, we ran the training of SUGAR twice, each time adding one more component, i.e., we reproduced Section 4.2 with components q 1 -q 4 and q 1 -q 5 . In both cases, the value of R V = 2.6 found with three factors is held fixed in order to focus only on the intrinsic parameters. In Figure 22, we compare the spectral residuals of the SUGAR model to those of SUGAR trained with the additional factors. As expected, the two new components from the factor analysis do not improve the description of SNe Ia as significantly as the addition of q 2 or q 3 did. The factor q 4 slightly improves the description within the Ca ii H&K area (0.05 mag) and the Si ii λ6355 area (0.03 mag), but does not improve the SED outside these areas. The factor q 5 does not significantly improve the description of the SED. This confirms that our choice of three factors provides a good description of the SED.

Conclusion
In this paper we have presented a new spectro-temporal empirical model of SNe Ia, named SUGAR. This model significantly improves the spectral description of SNe Ia compared to the current state of the art by going beyond the classical stretch and color parametrization.
In Section 2, we presented the SNfactory spectrophotometric dataset that was used to train the model. In Section 3, we presented the intermediary data that were used to train the full SUGAR model. In a first step, we selected a set of 13 spectral indicators near B-band maximum light, composed of pseudo-equivalent widths and minima of P-Cygni profiles. These spectral indicators were chosen to describe the intrinsic part of the SUGAR model because they are easy to define and are independent of host-galaxy dust extinction. Then, we defined a new basis in which the 13 spectral indicators are uncorrelated. For this we developed a factor analysis algorithm that is more robust in the presence of errors than PCA algorithms. Three factors prove sufficient to describe the spectral-indicator space at maximum light. The first factor describes the coherent variation of the pseudo-equivalent widths, mainly of the silicon and calcium lines; like those lines, this factor is strongly correlated with stretch. The second factor is mainly correlated with the velocities and shows a very weak link with the pseudo-equivalent widths, except for Ca ii H&K and S ii W. The third factor shows a slight correlation with the stretch parameter. Once the factors were defined, we established a model of the SED at maximum light based on these first three factors from spectral features, using the same underlying method as in Chotard et al. (2011). This allows us to separate the intrinsic behavior from the color variation due to dust. We find that the color curve obtained is compatible with a Cardelli et al. (1989) extinction curve with an R V of 2.6. We then developed an interpolation method using Gaussian Processes in order to train the SUGAR model on a uniform and fixed time grid. The correlation length obtained varies between 5 and 12 days depending on the spectral region, which justifies the use of a 3-day time step for the grid.
The uncertainty from the Gaussian Processes is generally underestimated by a factor of 1.3, and more investigation is needed to understand why this occurs. The training process for the SUGAR model is described in Section 4, and an interpretation of each new component is provided. Both factors q 1 and q 3 resemble a stretch effect, but q 3 has less impact around maximum light and in color space than q 1 . The effects of q 2 are strongest in the areas of spectral features and, in broad-band photometry, are mainly evident in the infrared.
After calculating the model, we showed that, instead of going through the calculation of spectral indicators, we can work directly with the spectral time series to recover the three factors and the extinction parameter. By studying model residuals as a function of wavelength, we showed that SUGAR improves the spectral description by 0.1 to 0.4 mag with respect to SALT2. This holds for both the training and validation data sets, which confirms that no overtraining resulted from the addition of q 2 and q 3 . It also shows that three parameters defined at a given phase (i.e., maximum light) have predictive power at other phases. The performance of the SUGAR model makes it an excellent candidate for use with surveys such as ZTF, LSST or WFIRST, and offers an alternative way of going beyond stretch and color to measure distances with SNe Ia.

Appendix A: Spectral-indicator measurements
As the spectral indicators are a key ingredient of our statistical analysis, we need a robust and automatic algorithm to derive them from the rest-frame spectra and to estimate the associated statistical and systematic uncertainties. The main issue to be addressed is the automatic detection of feature boundaries (usually local extrema), as they shift both in phase and from one supernova to another. Sometimes the extremum itself cannot be uniquely defined, e.g., when several local extrema blend at the same position, or when there is no significant extremum, as in the blue edge of the O i λ7773 feature. Photon noise also limits the accuracy of the determination of the wavelength of an extremum, and may induce a systematic shift due to the non-linear process involved in finding it. In this section, we give a brief overview of the general and automatic method developed and implemented by Chotard (2011) to measure these spectral indicators and evaluate their corresponding uncertainties. This method shares some similarities with previous analyses (Folatelli 2004; Garavini et al. 2007; Nordin et al. 2011; Blondin et al. 2011; Silverman et al. 2012): it is based on extrema searches in a fixed spectral domain after smoothing the data. Improving on previous studies, our smoothing procedure is based on an optimal determination of the regularization parameters and, more importantly, on a thorough determination of the uncertainties based on a Monte Carlo procedure taking as input the variance of our signal.

Appendix A.1: Method
Most of the spectral indicators presented here are defined by the position in wavelength and flux of at least one local minimum or maximum, i.e., peaks or troughs of the SN Ia spectral features. In order to compute a precise estimate of these local extrema, a Savitzky-Golay (Savitzky & Golay 1964, SG) smoothing is applied to the original spectra before the rebinning at 1500 km s −1 described in Section 2. Since the optimal SG window depends both on the underlying spectral shape and on the S/N ratio, which varies across wavelength, an independent smoothing is applied to each of the nine spectral zones of interest defined in Table A.1. Each extremum is then selected on the smoothed spectrum as the local extremum inside a given wavelength range, which can be found in Tables A.2 and A.3 for the equivalent widths and the feature velocities, respectively. These wavelength ranges were tuned on a set of ∼ 50 SNe Ia within a phase range of ±5 days around B-band maximum light, so that they match the observed diversity of our spectra. The positions in wavelength and flux of these extrema are then used in the spectral-indicator measurements, either directly, e.g., for the velocities, or indirectly, e.g., for the equivalent widths through the definition of their pseudo-continuum.

Table A.2: Definition of the wavelength regions (in Å) where the extrema are expected to be found. The b and r exponents respectively represent the left (blue) and right (red) peak.

  Feature        blue peak    red peak
  Si ii λ5972    5550-5681    5850-6015
  Si ii λ6355    5850-6015    6250-6365
  O i λ7773      7100-7270    7720-8000
  Ca ii IR       7720-8000    8300-8800
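A minimal sketch of this extremum search, assuming SciPy's `savgol_filter` and illustrative smoothing settings (not the tuned per-zone windows of the actual pipeline):

```python
import numpy as np
from scipy.signal import savgol_filter

def feature_minimum(wave, flux, wmin, wmax, window=15, polyorder=3):
    """Locate the minimum of an absorption feature on a smoothed
    spectrum, inside a fixed search window [wmin, wmax] (Angstrom).

    `window` and `polyorder` are illustrative Savitzky-Golay settings;
    the pipeline adapts them per spectral zone and S/N.
    """
    smoothed = savgol_filter(flux, window, polyorder)
    sel = (wave >= wmin) & (wave <= wmax)
    i = np.argmin(smoothed[sel])
    return wave[sel][i], smoothed[sel][i]
```

For example, on a spectrum with a Gaussian absorption trough near 6100 Å, searching in the 5850-6365 Å window (the Si ii λ6355 range of Table A.2) recovers a minimum at ∼6100 Å.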

Appendix A.2: Optimal smoothing
The purpose of smoothing, also known as regularization, is to transform the original noisy data in order to get closer, on average, to the unknown original spectrum, based on some regularity hypothesis. A parameter describing how smooth the final function is has to be introduced, and must be estimated either from physical considerations or from the data themselves.
Here we follow the latter approach and describe an optimal way of setting the smoothing parameter given a class of transformations.
Appendix A.2.1: Formalism

Smoothing a noisy spectrum Y consists of determining the smoothed spectrum \tilde{Y}, which is a function of the initial flux, \tilde{Y} = f(Y). \tilde{Y} is an estimator of the noise-free and unknown original flux, convolved with the instrumental resolution, \hat{Y}. The latter is related to the observed flux by

Y = \hat{Y} + N ,

where N is a realization of the spectral noise vector. The goodness of the smoothing is represented by the error function

\epsilon = (\tilde{Y} - \hat{Y})^T \, W \, (\tilde{Y} - \hat{Y}) ,

where W is the inverse of the noise covariance matrix; its minimization depends both on the noise properties and on the signal shape. As the true spectrum \hat{Y} is unknown, we need to build an estimator of \tilde{Y} - \hat{Y} which is independent of \hat{Y}. For a linear regularization, one can write \tilde{Y} = S \, Y for some smoothing matrix S.

The Monte Carlo validation of this procedure, taking as input the variance of our signal, yields a dispersion of 1.06, which indicates that our estimation of the uncertainty is valid up to a possible underestimation of the error by 6%. This number is small enough that we can trust our uncertainty estimation for the main analysis.

Appendix B: Expectation-Maximization Factor Analysis
Dimensionality reduction in the presence of noisy data is an often overlooked problem: standard methods like Principal Component Analysis tend to fail at capturing the intrinsic variability of the data, and the principal components align with the direction of the noise when the latter becomes important. Factor Analysis, on the other hand, is a statistical method designed to model the covariance structure of high-dimensional data using a small number of latent variables. It estimates both the natural variability of the sample and the noise arising from the measurements, under the assumption that the statistics are the same for all data records. Our case is slightly different: on one hand, the noise statistics differ for each measurement, but on the other hand, their variances are already known. We thus adapted the expectation-maximization algorithm presented in Ghahramani & Hinton (1997) to accommodate the specifics of our problem. The resulting method is also known as Probabilistic Principal Component Analysis. The formalism is the following: x_i is a vector of rank l representing the measurement i. It is linked to the factors q_i, a vector of rank k ≤ l, and to the noise η_i by

x_i = x_0 + \Lambda q_i + \eta_i , \quad (B.1)

where x_0 is a central value which can be neglected without loss of generality (x_0 = 0), q_i is assumed to follow a normal distribution of unit variance, and η_i follows a multivariate normal distribution of variance Ψ_i. In our case, Ψ_i is diagonal, a property that can be used to speed up computations. Λ is the matrix containing the k explicative vectors that we need to determine. While Λ is not uniquely defined, ΛΛ^T is, and represents the intrinsic covariance of the data, that is, the one we would observe in the absence of noise. The eigenvectors of ΛΛ^T thus correspond to the first k eigenvectors that principal component analysis would have found in the absence of noise.
To find Λ, instead of directly maximizing the likelihood of observing x_i, the expectation-maximization algorithm introduces the latent variable q_i and then maximizes the expected likelihood over q_i. The joint probability of x_i and q_i is the following multivariate normal distribution:

\begin{pmatrix} x_i \\ q_i \end{pmatrix} \sim \mathcal{N}\left( 0, \begin{pmatrix} \Lambda\Lambda^T + \Psi_i & \Lambda \\ \Lambda^T & I \end{pmatrix} \right) .

The block-diagonal elements of the covariance matrix represent, respectively, the covariances of x_i and q_i, and the off-block-diagonal elements represent the covariance arising from relation B.1. Expectation-maximization is an iterative procedure which ensures that the likelihood increases at each iteration, and its convergence has been shown to be faster than that of a gradient method (Dempster et al. 1977). Each iteration proceeds in two steps. The first step, called the E-step, consists of calculating the conditional first and second moments of q_i for a given Λ:

E-step: \beta_i = \Lambda^T (\Lambda\Lambda^T + \Psi_i)^{-1} ,
        E[q_i | x_i] = \beta_i x_i ,
        E[q_i q_i^T | x_i] = I - \beta_i \Lambda + \beta_i x_i x_i^T \beta_i^T .

Once these two quantities are computed, the second step, or M-step, consists of estimating the Λ matrix by maximizing the likelihood expectation. In the case where Ψ_i is diagonal, one can independently compute each row of Λ, noted Λ_j, according to the relation

M-step: \Lambda_j = \left[ \sum_i \frac{(x_i)_j \, E[q_i | x_i]^T}{(\Psi_i)_{jj}} \right] \left[ \sum_i \frac{E[q_i q_i^T | x_i]}{(\Psi_i)_{jj}} \right]^{-1} .

After the last iteration, the internal degeneracies of the description are lifted by transforming Λ into a matrix \tilde{\Lambda} which satisfies

\tilde{\Lambda} \tilde{\Lambda}^T = \Lambda\Lambda^T , \quad \text{with } \tilde{\Lambda}^T \tilde{\Lambda} \text{ diagonal.}

The columns of \tilde{\Lambda} are aligned with the eigenvectors of ΛΛ^T and the squares of their norms are the respective eigenvalues.
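The E- and M-steps above can be sketched as follows, assuming centered data and known diagonal noise variances. This is an illustrative implementation of the adapted Ghahramani & Hinton algorithm, not the SUGAR code itself.

```python
import numpy as np

def em_factor_analysis(X, Psi, k, n_iter=100):
    """EM for factor analysis with per-sample diagonal noise variances.

    X: (n, l) centered data; Psi: (n, l) known noise variances;
    k: number of factors. Returns Lambda of shape (l, k).
    Sketch of the algorithm described above, not the SUGAR pipeline.
    """
    n, l = X.shape
    rng = np.random.default_rng(0)
    Lam = rng.normal(size=(l, k))
    for _ in range(n_iter):
        # E-step: conditional first and second moments of q_i given x_i.
        Eq = np.empty((n, k))
        Eqq = np.empty((n, k, k))
        for i in range(n):
            C = Lam @ Lam.T + np.diag(Psi[i])   # covariance of x_i
            beta = Lam.T @ np.linalg.inv(C)     # (k, l)
            Eq[i] = beta @ X[i]
            Eqq[i] = np.eye(k) - beta @ Lam + np.outer(Eq[i], Eq[i])
        # M-step: row-wise update of Lambda (valid for diagonal Psi_i).
        for j in range(l):
            w = 1.0 / Psi[:, j]                 # per-sample weights
            A = np.einsum('i,ikm->km', w, Eqq)  # sum_i Eqq_i / Psi_ij
            b = (w * X[:, j]) @ Eq              # sum_i x_ij Eq_i / Psi_ij
            Lam[j] = np.linalg.solve(A, b)
    return Lam
```

The product `Lam @ Lam.T` estimates the intrinsic (noise-free) covariance of the data, up to the usual rotational degeneracy of Λ.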

Appendix C.1: Expectation and Maximization steps
Similarly to the factor analysis, h_i and A are estimated iteratively. The first step consists of minimizing the χ^2 with respect to h_i, which amounts to solving the equation

\frac{\partial \chi^2}{\partial h_i} = 0 , \quad (C.1)

and consequently gives the E-step:

h_i = (A^T W_i A)^{-1} A^T W_i x_i ,

where x_i is the data vector of supernova i and W_i its weight matrix. Once the E-step is performed, we can estimate the matrix A by minimizing the χ^2 with respect to A, which gives the M-step:

\tilde{A} = \left[ \sum_i (h_i h_i^T) \otimes W_i \right]^{-1} \left[ \sum_i h_i \otimes (W_i x_i) \right] ,

where \tilde{A} is the vector that contains all columns of A put end-to-end and the ⊗ operator is the Kronecker product. The estimation of the A matrix and of the orthogonal projections h_i is done by reiterating the E-step and M-step until the χ^2 converges.
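The alternating procedure can be sketched as below, assuming a model x_i ≈ A h_i with per-spectrum weight matrices W_i; all names are illustrative, not the SUGAR implementation.

```python
import numpy as np

def alternate_fit(X, W, A0, n_iter=30):
    """Alternating chi^2 minimization over projections h_i (E-step)
    and the component matrix A (M-step), as sketched above.

    X: (n, l) data vectors; W: (n, l, l) weight matrix per vector;
    A0: (l, k) initial components. Returns the fitted A and H.
    """
    A = A0.copy()
    n, l = X.shape
    k = A.shape[1]
    H = np.empty((n, k))
    for _ in range(n_iter):
        # E-step: h_i = (A^T W_i A)^-1 A^T W_i x_i
        for i in range(n):
            AtW = A.T @ W[i]
            H[i] = np.linalg.solve(AtW @ A, AtW @ X[i])
        # M-step: solve for vec(A) using the Kronecker structure
        M = np.zeros((l * k, l * k))
        b = np.zeros(l * k)
        for i in range(n):
            M += np.kron(np.outer(H[i], H[i]), W[i])
            b += np.kron(H[i], W[i] @ X[i])
        A = np.linalg.solve(M, b).reshape(k, l).T
    # final E-step so that H is consistent with the returned A
    for i in range(n):
        AtW = A.T @ W[i]
        H[i] = np.linalg.solve(AtW @ A, AtW @ X[i])
    return A, H
```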
where δM_i are the residuals of the model, for a given wavelength, once the minimum of the χ^2 is reached, and C_i is the covariance matrix that accounts for the full propagation of the residual errors. The latter involves the matrix α that contains the intrinsic vectors,

\alpha = (\alpha_1, \alpha_2, \alpha_3, \dots) . \quad (D.4)
In order to ensure that the matrix D is positive semi-definite, its negative eigenvalues are set to zero. Once the matrix D has been calculated, we add it to the expression of Equation 9 in order to recalculate the spectral energy distribution, and we iterate the calculation of D and the computation of the SED until the Restricted Maximum Likelihood (Guy et al. 2010) reaches its maximum.
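The projection described above, zeroing the negative eigenvalues of a symmetric matrix so that it becomes positive semi-definite, can be sketched as:

```python
import numpy as np

def nearest_psd(D):
    """Project a symmetric matrix onto the positive semi-definite cone
    by zeroing its negative eigenvalues, as done for the dispersion
    matrix D above."""
    vals, vecs = np.linalg.eigh(D)
    vals = np.clip(vals, 0.0, None)       # drop negative eigenvalues
    return (vecs * vals) @ vecs.T         # reassemble the matrix
```

For example, [[1, 2], [2, 1]] has eigenvalues 3 and -1; after clipping, the reconstructed matrix is [[1.5, 1.5], [1.5, 1.5]].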