Issue 
A&A
Volume 530, June 2011



Article Number  A50  
Number of page(s)  15  
Section  Extragalactic astronomy  
DOI  https://doi.org/10.1051/0004-6361/201016233  
Published online  06 May 2011 
A principal component analysis of quasar UV spectra at z ~ 3^{⋆}
^{1} UPMC Univ Paris06, Institut d’Astrophysique de Paris, UMR 7095-CNRS, 75014 Paris, France
email: paris@iap.fr
^{2} APC, 10 rue Alice Domon et Léonie Duquet, 75205 Paris Cedex 13, France
^{3} CEA, Centre de Saclay, Irfu/SPP, 91191 Gif-sur-Yvette, France
Received: 30 November 2010
Accepted: 1 April 2011
From a principal component analysis (PCA) of 78 z ~ 3 high-quality quasar spectra in the SDSS-DR7 we derive the principal components that characterize the QSO continuum over the full available wavelength range. The shape of the mean continuum is similar to that measured at low-z (z ~ 1), but the equivalent width of the emission lines is larger at low redshift. We calculate the correlation between fluxes at different wavelengths and find that the emission line fluxes in the red part of the spectrum are correlated with those in the blue part. We construct a projection matrix to predict the continuum in the Lyman-α forest from the red part of the spectrum. We apply this matrix to quasars in the SDSS-DR7 to derive the evolution with redshift of the mean flux in the Lyman-α forest caused by the absorption by the intergalactic neutral hydrogen. A change in the evolution of the mean flux is apparent around z ~ 3 as a steeper decrease of the mean flux at higher redshifts. The same evolution is found when the continuum is estimated from the extrapolation of a power-law continuum fitted in the red part of the quasar spectrum if a correction derived from simple simulations is applied. Our findings are consistent with previous determinations using high spectral resolution data. We provide the PCA eigenvectors over the wavelength range 1020−2000 Å and the distribution of their weights that can be used to simulate QSO mock spectra.
Key words: methods: numerical / intergalactic medium / quasars: absorption lines / quasars: emission lines
Eigenvectors and projection matrix are only available at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/530/A50
© ESO, 2011
1. Introduction
At high redshift, most of the baryons are located in the intergalactic medium (IGM; e.g. Petitjean et al. 1993) where they are highly ionized by the UV background produced by galaxies and QSOs (Gunn & Peterson 1965), at least from z ~ 6 (Fan et al. 2006; Becker et al. 2007). The large absorption cross section of the H i Lyman-α transition implies that the small fraction of neutral hydrogen in the IGM produces the so-called Lyman-α forest, which is composed of numerous absorption lines detected in the spectra of high-redshift quasars (see Lynds 1971; Rauch 1998, for a review).
Analytical models (Bi et al. 1992) and numerical N-body simulations (Cen et al. 1994; Petitjean et al. 1995; Zhang et al. 1995; Hernquist et al. 1996; Theuns et al. 1998; Riediger et al. 1998) have been very successful at reproducing the properties of the Lyman-α forest as measured from high spectral resolution, high SNR data obtained with the Ultraviolet and Visual Echelle Spectrograph (UVES) on the Very Large Telescope (VLT, e.g. Bergeron et al. 2004; Kim et al. 2007) and HIRES on the Keck telescope (e.g. Hu et al. 1995). The overall picture shows that lower column-density H i absorption lines trace the filaments of the “cosmic web”, and higher column-density absorption lines trace the surroundings of galaxies. Detailed studies of absorption line properties and of their clustering properties along one or several adjacent lines of sight give additional constraints on the ionization history, correlation length, matter power spectrum etc. (see e.g. Petitjean et al. 1998; Croft et al. 1998; McDonald et al. 2005; Theuns & Srianand 2006). The next generation of quasar surveys from BOSS (SDSS-III, Schlegel et al. 2007; Eisenstein et al. 2011) to BigBOSS (Schlegel et al. 2009) should provide the first detection of baryonic acoustic oscillations in the IGM at z ~ 2−3 (Slosar et al. 2009; White et al. 2010).
An important quantity to measure in a quasar spectrum is the mean amount of absorption in the Lymanα forest, D_{A}, defined as: D_{A} = 1 − ⟨F⟩ (Oke & Korycansky 1982), where F is the quasar normalized flux, F = F_{obs}/F_{cont}, F_{obs} is the observed flux and F_{cont} is the estimated unabsorbed continuum flux. The absorption can be defined by the mean effective optical depth as well, τ_{eff} = −ln⟨F⟩. These quantities are sensitive to the physical properties of the IGM and have been used to constrain Ω_{b} (Rauch 1998; Tytler et al. 2004), the ionization history (Rauch et al. 1997; Kirkman et al. 2005; Bolton et al. 2005; Bolton & Haehnelt 2007; Prochaska et al. 2009), and in particular the He ii reionization (Bernardi et al. 2003; Theuns et al. 2002). The latter could possibly induce a dip in the evolution with redshift of τ_{eff} at z ~ 3.2 (Schaye et al. 2000).
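The two definitions above can be illustrated with a minimal Python sketch. The function name `mean_absorption` and the toy flux values are hypothetical and serve only to show how D_{A} and τ_{eff} follow from the normalized flux F = F_{obs}/F_{cont}.

```python
import numpy as np

def mean_absorption(f_obs, f_cont):
    """Return (D_A, tau_eff) for the pixels of one forest region.

    f_obs  : observed flux per pixel
    f_cont : estimated unabsorbed continuum per pixel
    """
    f_norm = np.asarray(f_obs, dtype=float) / np.asarray(f_cont, dtype=float)
    mean_f = f_norm.mean()       # <F>
    d_a = 1.0 - mean_f           # D_A = 1 - <F>
    tau_eff = -np.log(mean_f)    # tau_eff = -ln <F>
    return d_a, tau_eff

# Toy pixels (made-up numbers): flat continuum of 2.0, observed flux below it.
d_a, tau_eff = mean_absorption([1.4, 1.6, 1.0, 1.2], [2.0, 2.0, 2.0, 2.0])
```

Both quantities depend directly on F_{cont}, which is why the continuum estimate discussed in the rest of the paper matters.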
Bernardi et al. (2003) first discovered such a dip in the evolution of the effective optical depth in the Lyman-α forest using SDSS spectra. The existence of a feature was later confirmed from high-resolution studies (Faucher-Giguère et al. 2008; Dall’Aglio et al. 2008) at a more modest statistical significance but at a coincident redshift.
Intermediate resolution data have been used as well (McDonald et al. 2005; Dall’Aglio et al. 2009) but the feature was not detected. However, the methods used may not be entirely appropriate and we come back to this point in the present paper. Note that Faucher-Giguère et al. (2008) cautioned that (i) the dip interpretation is only valid if one insists on fitting a single power-law to the background evolution, for which there is no clear physical motivation and (ii) this feature is not necessarily caused by He ii reionization, and other interpretations are also possible.
The definition of the unabsorbed quasar continuum over the Lyman-α forest is a critical point (e.g. Tytler et al. 2004; Kim et al. 2007). For low-resolution spectra, most analyses first define a continuum redwards of the QSO Lyman-α emission, where there are only few absorption lines, and extrapolate the shape of the continuum in the Lyman-α forest region (see Sect. 2.1 for more details). It is most commonly assumed that the QSO continuum in regions without an emission line is a power-law that can be easily extrapolated. However, this assumption usually neglects weak emission lines both in the red and, more importantly, within the Lyman-α forest region. Because of this, Suzuki et al. (2005, S05) have applied a principal component analysis (PCA) to HST spectra of quasars at z ≤ 1. Quasar continua are described with a limited set of eigenvectors and a controlled sample is used to define a projection matrix that allows one to recover the continuum in the Lyman-α forest from the shape of the continuum in the red part.
The shape of the quasar continuum can evolve from z ≤ 1 to z ~ 3. In this case, a PCA at z ≤ 1 would not give a fair representation of quasar continuum at z ~ 2−3. In this paper, after describing the procedures in Sect. 2, we take advantage of the large database provided by SDSS-DR7 to define a large enough sample of quasars at z ~ 3 to which we can apply the same procedure as in S05 (Sect. 3). New eigenvectors and a projection matrix are generated and then used to predict the continuum of all SDSS-DR7 spectra. We apply the method to determine the evolution with redshift of the mean flux in the Lyman-α forest (Sect. 4) and discuss the significance of the bump at z ~ 3.2−3.4 before drawing our conclusions in Sect. 5.
2. Procedures
2.1. Different methods to estimate the QSO continuum in low-resolution spectra
The methods used to estimate the continuum in low-resolution spectra can be broadly classified as follows:

A direct estimate of the continuum in the Lyman-α forest region:

Using a spline interpolation: a cubic spline is interpolated on adaptive intervals between observed data points in the forest to construct a local continuum. A correction is then applied to take into account that these data points can be affected by some absorption. Dall’Aglio et al. (2008, 2009) applied a systematic correction that accounts for resolution effects and line blending and is estimated from idealized Monte-Carlo simulated spectra. This approach, however, neglects the possibility of continuous absorption from the smooth IGM (rather than discrete absorbers) as well as correlations from large-scale structure. At high redshift, continuous absorption can be important and may cause the true continuum to be underestimated even after applying the Monte-Carlo method to high-resolution data. Faucher-Giguère et al. (2008) developed an alternative method to correct the continuum placement using cosmological simulations and showed that this effect is indeed important at the >10% level at z = 4.

Taking into account the difference between the continuum and absorption wavelength dependencies (e.g. Bernardi et al. 2003; Prochaska et al. 2009): the continuum is a property of the quasar and depends only to first order on the restframe wavelength (λ_{r} = λ/(1 + z_{em})); while the absorption depends on the redshift (z_{abs} = λ/1215.6701 − 1) only. Thus, if one separates the dependencies in the flux, F(λ_{r},z) = C(λ_{r})exp(−τ(z)), both quantities can be recovered in principle.


Using the red part of the QSO spectrum to predict the blue part:

A power-law is adjusted to the red part of the spectrum in regions free of emission and absorption lines (see Sect. 2.2). The power-law is then simply extrapolated over the Lyman-α forest wavelength range. This procedure does not account for weak emission lines in the Lyman-α forest region.

A PCA applied to a reference sample describes the continuum of a quasar spectrum as a linear combination of eigenvectors. A projection matrix is generated and used to translate the weights of the eigenvectors describing the red side of the spectrum into the weights of the eigenvectors for the whole spectrum.
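The two wavelength dependencies invoked in the list above (rest-frame wavelength for the continuum, absorption redshift for the forest) reduce to two simple conversions. The following Python lines are a minimal sketch; the function names and the example wavelengths are hypothetical, and only the two formulas and the Lyman-α rest wavelength are taken from the text.

```python
import numpy as np

LYA_REST = 1215.6701  # Lyman-alpha rest wavelength in Angstrom, as quoted in the text

def rest_wavelength(lam_obs, z_em):
    """lambda_r = lambda / (1 + z_em): the continuum depends on this quantity."""
    return np.asarray(lam_obs, dtype=float) / (1.0 + z_em)

def absorption_redshift(lam_obs):
    """z_abs = lambda / 1215.6701 - 1: the forest absorption depends on this quantity."""
    return np.asarray(lam_obs, dtype=float) / LYA_REST - 1.0

# Two observed forest pixels (made-up values) in a quasar at z_em = 2.9.
lam_obs = np.array([4200.0, 4500.0])
lam_r = rest_wavelength(lam_obs, z_em=2.9)
z_abs = absorption_redshift(lam_obs)
```

A given observed pixel therefore maps to a different rest-frame wavelength in each quasar but to the same absorption redshift, which is what makes the separation F(λ_{r},z) = C(λ_{r})exp(−τ(z)) possible in principle.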

2.2. Determination of the position of the emission lines and the power-law continuum
Fig. 1 Illustration of the method used to estimate the emission redshift and power-law continuum of quasars (the quasar shown is SDSS J012156.03+144823.9). Grey areas indicate the regions used to fit the power-law and the red line is the estimate of the continuum (power-law + C iv and C iii] emission lines). Vertical dashed lines indicate the position of emission lines. 
We derive a redshift of the quasar using C iv and C iii] emission lines to be able to estimate the position of these lines and to avoid them when fitting the power-law. This is therefore not an attempt to derive the exact systemic quasar redshift, which is known to be shifted compared to the C iv-C iii] redshift (e.g. Vanden Berk et al. 2001; Hennawi & Prochaska 2007).
Here we assume that the continuum redwards of the Lyman-α emission line can be described as the sum of a power-law component and a Gaussian function for each of C iv and C iii] emission lines. The redshift and the power-law component are estimated as follows (see Fig. 1 for a typical example at z_{em} = 2.862):

1. After convolving the whole spectrum with a Gaussian filter of fixed FWHM = 250 km s^{-1}, the position where the flux is maximum is associated with the QSO Lyman-α emission line. This gives a first rough estimate of the redshift, z_{1}.

2. The average of the positions of the maximum flux within a window of 100 Å in the rest-frame of the quasar (at z = z_{1}) around C iv and C iii] emission lines provides a second redshift estimate, z_{2}. This redshift should be more accurate than z_{1} because the peak of the Lyman-α emission is a poor estimate of the redshift owing to the Lyman-α forest and the blending with N v λ1240.

3. A power-law component of the continuum is fitted using windows devoid of emission lines between 1430−1500 Å, 1600−1830 Å and 2000−2500 Å in the rest-frame (see grey windows in Fig. 1). This component is subtracted from the spectrum.

4. Finally, C iv and C iii] emission lines are simultaneously fitted with Gaussian functions. The width and amplitude are independent parameters, but the two Gaussian functions are bound to have the same redshift, z_{3}.
The red line in Fig. 1 is the sum of the power-law and the two emission lines. The extrapolation of the power-law component bluewards of the Lyman-α emission provides a first estimate of the quasar continuum.
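Step 3 and the extrapolation can be sketched in a few lines of Python. The text does not specify the fitting algorithm, so the unweighted linear least-squares fit in log-log space used below is an assumption, as are the function name `fit_power_law` and the noise-free toy spectrum; only the three rest-frame windows come from the text.

```python
import numpy as np

# Rest-frame windows (Angstrom) devoid of emission lines, as quoted in step 3.
WINDOWS = [(1430.0, 1500.0), (1600.0, 1830.0), (2000.0, 2500.0)]

def fit_power_law(lam_rest, flux, windows=WINDOWS):
    """Fit flux = A * lam**alpha inside the windows (assumed log-log least squares)."""
    lam_rest = np.asarray(lam_rest, dtype=float)
    flux = np.asarray(flux, dtype=float)
    mask = np.zeros(lam_rest.shape, dtype=bool)
    for lo, hi in windows:
        mask |= (lam_rest >= lo) & (lam_rest <= hi)
    mask &= flux > 0  # the logarithm requires positive flux
    # np.polyfit returns [slope, intercept] for a degree-1 fit.
    alpha, log_a = np.polyfit(np.log(lam_rest[mask]), np.log(flux[mask]), 1)
    return np.exp(log_a), alpha

# Toy spectrum (hypothetical): a pure power law with index -1.5.
lam = np.linspace(1020.0, 2500.0, 1500)
true_flux = 3.0 * (lam / 1450.0) ** -1.5
amp, alpha = fit_power_law(lam, true_flux)
continuum = amp * lam ** alpha  # extrapolated bluewards, over the forest range too
```

On real spectra the fit would also need pixel errors and some rejection of absorption lines, which this sketch leaves out.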
2.3. Principal component analysis
We summarize here the main steps of the method as described in Francis et al. (1992) and Suzuki et al. (2005).
2.3.1. Reconstructed continuum
A representative sample of quasar spectra at the redshift of interest must be gathered for which it is possible to define a true continuum, q(λ), i.e. unspoiled by intervening absorption. Suzuki et al. (2005) used HST spectra at z ≤ 1 because the IGM is sparse at these redshifts and thus the continuum can be easily interpolated above absorption lines. The sample of SDSS-DR7 quasars we used has a mean emission redshift of z ~ 2.9 and is defined in Sect. 3. We derived the true continuum, q(λ), by eye and used these fitted quasar continua in the following.
A covariance matrix V is first calculated for the N QSOs in the sample as $V(\lambda_m, \lambda_n) = \frac{1}{N-1} \sum_{i=1}^{N} \left( q_i(\lambda_m) - \mu(\lambda_m) \right) \left( q_i(\lambda_n) - \mu(\lambda_n) \right),$ (1) where $\mu(\lambda) = \frac{1}{N} \sum_{i=1}^{N} q_i(\lambda)$ is the mean quasar continuum.
The principal components are found by decomposing the covariance matrix V into the product of the orthonormal matrix P, which is composed of eigenvectors, and the diagonal matrix Λ containing the eigenvalues: $V = P^{-1} \Lambda P.$ (2) We call the eigenvectors (i.e., the columns of the matrix P) the principal components, ξ_{j}. The principal components are ordered according to the amount of variance in the training set they can accommodate, such that the first principal component is the eigenvector that has the highest eigenvalue.
The distribution of the weights, c_{j}, of the jth principal component in Eq. (4) is found from the distribution of the c_{ij} for all i = 1..N QSOs of the sample: $c_{ij} = \int_{1020\,\AA}^{2000\,\AA} \left( q_i(\lambda) - \mu(\lambda) \right) \xi_j(\lambda)\, \mathrm{d}\lambda.$ (3) Note that the upper limit of the integration is higher here than that of S05. This is discussed further in Sect. 3. A mock continuum can now be constructed using Eq. (4) over the rest frame wavelength range of the spectra in the sample: $q(\lambda) \sim \mu(\lambda) + \sum_{j=1}^{m} c_j \xi_j(\lambda).$ (4)
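Equations (1) to (4) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the code used for the paper: the continua are assumed to be sampled on a common rest-frame pixel grid, the integral of Eq. (3) is replaced by a plain sum over pixels (so the eigenvectors have unit Euclidean norm), and the function name `pca_continua` and the toy data are hypothetical.

```python
import numpy as np

def pca_continua(q, m):
    """PCA of N continua q (shape N x n_pix); returns mean, components, weights."""
    q = np.asarray(q, dtype=float)
    mu = q.mean(axis=0)                      # mean continuum mu(lambda)
    resid = q - mu
    v = resid.T @ resid / (q.shape[0] - 1)   # covariance matrix, Eq. (1)
    eigval, eigvec = np.linalg.eigh(v)       # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1][:m]     # keep the m largest eigenvalues
    xi = eigvec[:, order].T                  # rows are the principal components xi_j
    c = resid @ xi.T                         # weights c_ij, Eq. (3) as a pixel sum
    return mu, xi, c

# Toy training set (hypothetical): 20 continua built from two smooth shapes.
rng = np.random.default_rng(0)
lam = np.linspace(1020.0, 2000.0, 50)
basis = np.vstack([np.sin(lam / 150.0), np.cos(lam / 90.0)])
q = 1.0 + rng.normal(size=(20, 2)) @ basis

mu, xi, c = pca_continua(q, m=2)
recon = mu + c @ xi                          # reconstructed continua, Eq. (4)
```

Because the toy continua vary along only two directions, two components reconstruct them exactly; real continua require more components, and the reconstruction is then approximate.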
2.3.2. Predicted continuum
The goal is to quantify the relationship between the red and blue sides of the spectra in the sample. The first m principal components ξ_{j}(λ) and their weights, c_{ij}, are derived as described above, using the whole rest wavelength range, 1020 to 2000 Å. Another set of m principal components, ζ_{j}(λ) and their weights, d_{ij}, are defined using only the red rest wavelength range, 1216 to 2000 Å. Finally, we solve linear equations to find a projection matrix relating c_{ij} and d_{ij}. Weights can be written in the N × m matrix form C = c_{ij} and similarly for D. We then use singular value decomposition techniques (Press et al. 1992) to derive the m × m projection matrix X = x_{ij} translating weights found with the red side only into the weights for the whole spectrum: $C = D \cdot X.$ (5) Once matrix X is known, we can estimate the continuum over the Lyman-α forest for any quasar spectrum from the red part of the spectrum. We proceed in three steps. The weights for the red spectrum are found, $b_j = \int_{1216\,\AA}^{2000\,\AA} \left( q(\lambda) - \mu(\lambda) \right) \zeta_j(\lambda)\, \mathrm{d}\lambda.$ (6) The weights from the red side b_{j} are translated to weights for the whole spectrum, using $a_j = \sum_{k=1}^{m} b_k x_{kj}.$ (7) Then the continuum for the whole spectrum is built as $p(\lambda) = \mu(\lambda) + \sum_{j=1}^{m} a_j \xi_j(\lambda).$ (8)
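A minimal sketch of Eqs. (5) to (8) is given below. It assumes that the weight matrices C and D and the two sets of components are already available as NumPy arrays, and it replaces the integral of Eq. (6) by a sum over pixels. The function names are hypothetical. `np.linalg.lstsq` is used here as one SVD-based least-squares solver; it is not necessarily the routine used by the authors.

```python
import numpy as np

def projection_matrix(c, d):
    """Least-squares solution X (m x m) of C = D . X, Eq. (5)."""
    x, *_ = np.linalg.lstsq(d, c, rcond=None)  # SVD-based solver
    return x

def predict_continuum(q_red, mu_red, zeta, x, mu_full, xi):
    """Predict the full continuum from the red side of one spectrum."""
    b = (q_red - mu_red) @ zeta.T   # red-side weights b_j, Eq. (6) as a pixel sum
    a = b @ x                       # full-range weights a_j, Eq. (7)
    return mu_full + a @ xi         # predicted continuum p(lambda), Eq. (8)

# Toy check (hypothetical data): recover a known projection matrix.
rng = np.random.default_rng(1)
d = rng.normal(size=(30, 3))        # red-side weights of 30 training spectra
x_true = rng.normal(size=(3, 3))
c = d @ x_true                      # full-range weights, exactly C = D . X here
x = projection_matrix(c, d)

# Toy prediction with trivial orthonormal components on 5 red / 8 total pixels.
zeta = np.eye(5)[:3]
xi = np.eye(8)[:3]
p = predict_continuum(np.array([1.0, 2.0, 3.0, 0.0, 0.0]), np.zeros(5),
                      zeta, x, np.zeros(8), xi)
```

With real training weights the relation C = D · X holds only approximately, and the least-squares solution is the matrix that minimizes the residual over the training set.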
3. New PCA continuum at z ~ 3
The eigenvectors and coefficients as derived by S05 at low redshift were used to generate mock spectra to test different analyses at high redshift (Dall’Aglio et al. 2008; Kirkman et al. 2005). To our knowledge, one attempt has also been made to derive a similar decomposition at high redshift using SDSS spectra by McDonald et al. (2005). These authors comment that this continuum determination is robust enough to infer the mean flux evolution, but unstable as far as the Lymanα power spectrum is concerned. However, the decomposition is not performed on a well controlled training sample as in S05 and in the present work (see below), therefore, they do not provide a projection matrix. The analysis performed by Yip et al. (2004) is closer to our purpose. They applied a PCA to the 16 707 Sloan Digital Sky Survey DR1 quasar spectra (0.08 < z < 5.41) and reported that the spectral classification depends on redshift and luminosity. No compact set of eigenspectra succeeds in describing the variations observed over the whole redshift range. Besides, it seems that there is a differential evolution with redshift of the coefficients. Since these authors are interested in the quasar continuum only, they do not try to recover the exact continuum over the Lymanα forest and consider only the observed flux (that is, continuum plus absorption).
Fig. 2 Result of the stacking of SDSS-DR7 spectra with a damped Lyman-α system at an absorption redshift higher than 3.7 and a column density N(H i) ≥ 10^{20.5} cm^{-2}. The spectra are normalized to 1 near 1280 Å (in the quasar rest-frame). Owing to the presence of the DLA, the flux is expected to be equal to zero at observed wavelengths below 4280 Å. Evidently this is not the case in the very blue of the spectrum (λ_{obs} ≤ 4000 Å) where the mean flux is increasing. Consequently, pixels at wavelengths below 4000 Å are not used in this analysis. 
Owing to the large difference of redshift between the quasars used in S05 and those involved in any Lymanα forest study using SDSS spectra, one may wonder if there is any evolution in the continuum of quasars or any change in the correlation between the shapes of the continuum over the Lymanα forest and redwards of the Lymanα emission line compared to what is found by S05. Deriving new components at redshift 3 should answer this question.
Table 1 List of SDSS-DR7 quasars used to define the correlation matrix and the eigenvectors at z ~ 3.
The other motivation for this work is to provide principal components and distributions of coefficients over a larger wavelength coverage than in previous studies: the S05 matrix allows one to generate continua from Lyman-β to C iv emission lines, while in the present work we extend the wavelength coverage beyond the C iii] emission line (up to 2000 Å in the rest-frame). This should in principle facilitate the extrapolation in the blue.
3.1. Deriving new principal components from a subsample of z ~ 3 SDSS-DR7 quasar spectra
The difficulty of this analysis is to estimate the continuum in the Lymanα forest, where absorption can be neither neglected nor easily removed because of the low resolution of the SDSS spectra. In particular it would be very difficult to define the true continuum automatically.
We first selected spectra with a signal-to-noise ratio per pixel greater than 14 redwards of the Lyman-α emission (the SNR is computed around 1280 Å in the rest-frame). We require the redshift of the quasars to be greater than 2.82 and lower than 3.00. The lower limit is chosen for two reasons: the Lyman-α forest has to be complete (z > 2.7) and we noticed that there are some problems with the flux calibration at the very blue end of the SDSS spectra, which is why we would like to avoid this part of the spectrum. To illustrate these problems and estimate the exact wavelength at which to start the study, we selected spectra where a damped Lyman-α system (DLA) is observed with absorption redshift greater than 3.7 and with a column density N(H i) ≥ 10^{20.5} cm^{-2} (the list is available in Noterdaeme et al. 2009). In these spectra, and because of the presence of the DLA, the flux is expected to be equal to zero for λ_{obs} ≤ 4280 Å. When stacking the selected lines of sight (Fig. 2), we note instead that the flux increases for wavelengths lower than 4000 Å. The difference with zero is as high as 0.05 for a normalized spectrum, meaning that this part of the spectrum should probably not be used for the analysis. Because the study of the forest is usually limited to beyond the O vi emission line (λ_{rest} > 1050 Å), the minimum emission redshift will be 2.82. The upper limit is a compromise between the number of spectra needed for the analysis and our ability to estimate confidently a continuum by eye: it has been set to z = 3. BAL quasars, lines of sight containing a DLA or any spectrum for which fitting a continuum is too risky because of missing pixels or reduction issues are removed from the analysis. The SDSS spectra are observed with two cameras, one for the blue and one for the red part of the spectrum and we were concerned about the presence of a possible discontinuity or break at the merging point. We avoided any spectra that were possibly affected by this. 
After we applied all those constraints, 78 spectra remained in the training set (Table 1).
Fig. 3 Spectrum and continuum of the quasar SDSS J134826.65+290623.0. This quasar belongs to the sample of SDSS z ~ 3 quasars that is used to derive the PCA eigenvectors (Sect. 3). Our estimate of the continuum is shown with the thick red line and a zoom of the Lymanα forest region is shown in the inset. 
Fig. 4 Mean flux evolution in the training set (with redshift bins of size Δz = 0.1; red triangles) is compared to FG08 measurements from high-resolution and high signal-to-noise spectra (black circles). The continua of the 78 SDSS spectra in the training set were fitted by hand using spline interpolation. Error bars from our measurements were computed by bootstrapping pixels of our sample. Both measurements are consistent, which makes us confident in our continuum estimate. 
Once the training set was defined, spectra were smoothed and the continuum was “hand-fitted”. Redwards of the Lyman-α emission line, we followed the different emission lines and ignored isolated absorption lines. In the Lyman-α forest, the continuum cannot be uniquely defined. We assumed that the continuum has a smooth shape, that it roughly follows the peaks of the spectrum, and that the blending of lines at the SDSS resolution is large. Points are placed around the peaks of the flux, and a spline interpolation is used to connect them. After a first try, we minimized the number of points used and checked that the continuum is indeed located above the blends of lines, but at about the level of “flat” regions. A typical example is given in Fig. 3. Note that with this procedure we de facto take into account that the chosen points can be affected by some absorption. To check that this is indeed the case, the spectra were then rebinned with 0.5 Å rest-frame pixels and the mean flux evolution was computed and compared to Faucher-Giguère et al. (2008) measurements from high and medium resolution and high signal-to-noise spectra. Both estimates agree, which gives us confidence in our hand-fitted continua (Fig. 4).
The procedure described in Sect. 2.3 was then performed and the correlation matrix was computed and is displayed in Fig. 5. In agreement with S05, a moderate correlation (0.3−0.6) is found between the shape of the continuum in the forest and the region between Lyman-α and C iv emission lines. Thanks to the larger rest-frame wavelength coverage of this study, a moderate anti-correlation (from −0.6 to −0.4) between the shape of the continuum in the forest and in the region between C iv and C iii] emission lines is found. Suzuki et al. (2005) noticed that a PCA continuum has the correct shape in the forest but that the amplitude of the power-law component is unstable. The anti-correlation in that extra part of the continuum may improve the stability of the prediction of the continuum amplitude in the Lyman-α forest. This is discussed in more detail in Sect. 3.2.1.
The mean continuum is shown in Fig. 6 together with the mean continuum of S05 (z ≤ 1) and the composite spectrum derived by Vanden Berk et al. (2001). In the wavelength range of interest here, quasars contributing to the Vanden Berk et al. (2001) composite cover a redshift range from 2.13 to 4.789 for the Lyman-α region and from 1.5 to 4.789 for C iv. Redwards of the Lyman-α emission line, our mean continuum agrees very well with the Vanden Berk et al. (2001) composite. The discrepancy in the blue is simply caused by the absorption in the Lyman-α forest that Vanden Berk et al. (2001) did not try to remove. When comparing to S05, one can see that the amplitudes of the C iv, Lyman-α and Lyman-β emission lines relative to the continuum are smaller in the SDSS spectra. While the determination of the C iv emission is relatively straightforward, the presence of absorption at the position of the Lyman-β emission line and in the blue wing of the Lyman-α emission line makes the continuum difficult to estimate. Thus, the main and robust difference between the mean continua in SDSS and HST spectra is the variation of the C iv equivalent width. This evolution of the QSO emission line equivalent widths has been noted for a long time (Baldwin 1977) and has also been reported by Zheng et al. (1997). The latter authors used 101 HST spectra to compute a low-z composite spectrum (90% of the quasars had a redshift lower than 1.5) and compared it to the Francis et al. (1991) composite (z ~ 3). This evolution is probably related to the quasar luminosity. Note that no evolution is found by Fan (2009) from z ~ 2 to z > 6.
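The text does not spell out how the correlation matrix is normalized. The sketch below assumes the usual definition, in which the covariance matrix of Eq. (1) is divided by the product of the per-wavelength standard deviations; the function name and the random toy continua are hypothetical.

```python
import numpy as np

def correlation_matrix(q):
    """Pearson correlation between the continuum values at each pair of wavelengths."""
    q = np.asarray(q, dtype=float)
    resid = q - q.mean(axis=0)
    v = resid.T @ resid / (q.shape[0] - 1)   # covariance matrix, Eq. (1)
    sigma = np.sqrt(np.diag(v))              # standard deviation per wavelength
    return v / np.outer(sigma, sigma)

# Toy input (hypothetical): 78 continua sampled at 6 wavelengths.
rng = np.random.default_rng(2)
q = rng.normal(size=(78, 6))
r = correlation_matrix(q)
```

Under this definition every element lies between −1 and 1, which is the scale of the values quoted above (0.3−0.6 and −0.6 to −0.4).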
Fig. 5 Correlation matrix computed with the training set. A moderate correlation (0.3−0.6) is found between the shape of the continuum in the Lymanα forest (1020 ≤ λ ≤ 1210 Å) and in the region between Lymanα and C iv emission lines (1216 ≤ λ ≤ 1600 Å), in agreement with S05. We also note a moderate anticorrelation (from −0.6 to −0.4) between the continuum in the Lymanα forest and the region further to the C iv emission line. 
Fig. 6 The solid black (dashed red) line is the mean continuum of quasars at z = 3 (z ≤ 1, S05). The wavelength coverage at z = 3 corresponds to SDSS spectra and is larger than at z ≤ 1. The main difference between the two mean spectra is visible in the amplitude of emission lines. The composite spectrum from Vanden Berk et al. (2001) computed from 2200 SDSS spectra (dash-dot blue line) agrees well with our mean continuum. The difference in the Lyman-α forest arises because Vanden Berk et al. (2001) did not try to avoid the absorption from the IGM. A small shift in the position of emission lines can be noticed: this is because Vanden Berk et al. (2001) have used the Mg ii line as a reference to compute the redshift, whereas we used the C iv and C iii] lines. 
Fig. 7 First ten principal components of a PCA applied to (i) z ~ 3 SDSS quasar spectra (black solid lines) over the range 1020−2000 Å and to (ii) z ≤ 1 HST quasar spectra (S05, red dashed lines) over the range 1020−1600 Å. The distributions of the coefficient associated to each component are shown in the right panel (grey histogram) together with their fit with a Gaussian (except for the first component for which the distribution is lognormal; thick black line). 
Fig. 8 First ten principal components obtained from a PCA of the QSO spectrum redwards of the Lymanα emission line, for (i) the z ~ 3 SDSS quasar spectra (black solid lines) and (ii) the z ≤ 1 HST quasar spectra (S05, red dashed lines). The distributions of the coefficient associated to each component are shown in the right panels (grey histogram) together with a Gaussian fit (except for the first component for which the distribution is lognormal; thick black line). 
The first ten eigenvectors are displayed in Fig. 7 when derived from the full wavelength coverage and in Fig. 8 when derived from the region redwards of the Lymanα emission together with the components provided by S05. The distributions of the coefficients, c_{i,j} and d_{i,j}, computed on our sample using Eq. (3), are also shown. The first component looks very similar in the two decompositions and is dominated by the amplitude of the Lymanα and C iv emission lines. The most significant difference with S05 lies in the shape of the distribution of the associated coefficients: in their study, the distribution is Gaussian whereas in ours this distribution is lognormal. This trend (concerning the eigenvectors and the distributions of coefficients) agrees with what has been found by Francis et al. (1992). The discrepancy between the shapes of the coefficient distributions could mean that the S05 sample is more homogeneous than ours in terms of the emission line amplitudes.
The second component is dominated by the continuum slope and it seems that there is a difference between what is found here and in S05, which seems to be somewhat compensated by the difference in the third component. Other components show small differences but they are less pronounced and the coefficient distributions are very similar.
To estimate more quantitatively the similarity of the low and high redshift sets of eigenspectra, we follow Yip et al. (2004) and compute the sum of the projection operators of each set of eigenvectors |ξ_{j}⟩: $\Xi = \sum_{j=1}^{m} |\xi_j\rangle \langle \xi_j|,$ (9) and then the trace of the products of the projection operators: $\mathrm{Tr}\left( \Xi_{z=3}\, \Xi_{\mathrm{S05}}\, \Xi_{z=3} \right) = D,$ (10) where D will be the common dimension of both sets. The two sets are disjoint if the trace is zero. If the bases are completely alike, D should be equal to their dimension, therefore D = 10 in our case. To compute this number, we cut our eigenspectra at 1600 Å (rest) and we find D = 7.6. This means that the two decompositions are similar but not exactly the same, confirming the slight evolution of the decomposition with redshift.
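Equations (9) and (10) translate directly into matrix products. The sketch below assumes that both sets of eigenvectors are stored as rows on a common pixel grid and are orthonormal on that grid; how the truncated eigenspectra are normalized after the cut at 1600 Å is not specified in the text. The function name and the toy bases are hypothetical.

```python
import numpy as np

def common_dimension(xi_a, xi_b):
    """Tr(Xi_a Xi_b Xi_a) for two sets of orthonormal row eigenvectors (m x n_pix)."""
    proj_a = xi_a.T @ xi_a   # Xi_a = sum_j |xi_j><xi_j|, Eq. (9)
    proj_b = xi_b.T @ xi_b
    return np.trace(proj_a @ proj_b @ proj_a)   # Eq. (10)

# Toy bases (hypothetical) built from unit vectors of a 6-pixel space.
basis = np.eye(6)
d_same = common_dimension(basis[:3], basis[:3])       # identical sets
d_disjoint = common_dimension(basis[:3], basis[3:])   # no common direction
d_partial = common_dimension(basis[:3], basis[1:4])   # two shared directions
```

Identical sets give the full dimension, disjoint sets give zero, and partially overlapping sets give an intermediate value, which is how D = 7.6 out of 10 should be read.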
We provide the first 10 eigenvectors of the PCA in an electronic form. The distributions of associated coefficients are very close to Gaussian functions, except for the first coefficient, the distribution of which is fitted with a lognormal distribution. Their characteristics are listed in Table 2.
Table 2 Fit parameters of distributions of weights, c_{i,j}, for the first ten principal components.
3.2. Quality of the predicted continuum
The decompositions of the quasar emission investigated at z ≤ 1 by S05 with HST spectra and at z ~ 3 in this paper with SDSS spectra yield two similar but not identical bases of eigenvectors, as shown in the previous section. One would like to know if the larger wavelength coverage of our eigenvectors provides any advantage and how far the new determination is required to reproduce the correct quasar continuum at z ~ 3. In other words, is the prediction of quasar continuum at z ~ 3 better if one uses the new PCA eigenvectors derived in this paper? To answer this question, we applied three tests to the predicted continua.
3.2.1. Error on the predicted PCA continuum in the Lymanα forest
In order to estimate the difference between the true and predicted continuum in the Lymanα forest, we derived a set of eigenvectors and a projection matrix using 77 spectra of the training set (out of 78) and estimated the continuum over the Lymanα forest for the remaining quasar using these parameters and following the method described in Sect. 2.3. This procedure was repeated on each of the 78 spectra in the training set.
Following S05, we estimate for each spectrum the absolute fractional flux error |δF|, defined as $|\delta F| = \int_{\lambda_{1}}^{\lambda_{2}} \left|\frac{p(\lambda) - q(\lambda)}{q(\lambda)}\right| \mathrm{d}\lambda \Big/ \int_{\lambda_{1}}^{\lambda_{2}} \mathrm{d}\lambda,$ (11) where p and q are the predicted and real continua respectively.
Fig. 9 Cumulative distributions of the absolute fractional flux error (Eq. (11)) redwards of the Lymanα emission line when predicting spectra with PCA decompositions over a wavelength range extending up to 2000 Å (black solid line) or up to 1600 Å (grey solid line). Median values are similar for the two estimates (black and grey dashed vertical lines respectively). The 90th percentile from the 1600 Å decomposition (dash-dot-dot grey vertical line) is lower than the one for the 2000 Å decomposition (dash-dot-dot black vertical line). 
Fig. 10 Same as Fig. 9 for the cumulative distributions of the absolute fractional flux error (Eq. (11)) in the Lymanα forest. 
Fig. 11 Example of a spectrum with (i) a large absolute fractional flux error in the red part of the spectrum and a small error in the forest (upper panel) and (ii) a large absolute fractional flux error in the forest and a small error in the red part of the spectrum (lower panel). The handfitted continuum is the red dashed line and the predicted one is the solid blue line. 
This was computed over the restframe wavelength ranges, 1050−1170 Å in the forest, and 1280−2000 Å (or 1280−1600 Å) in the red. The cumulative distributions of the absolute fractional flux errors are plotted in Fig. 9 (in the red) and Fig. 10 (in the forest) using two different wavelength coverages (black line for 2000 Å and grey line for 1600 Å).
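As an illustration of Eq. (11), the sketch below evaluates the absolute fractional flux error with a trapezoidal rule on a sampled wavelength grid. The function name, the toy grid and the 5% offset are hypothetical, and the normalization uses the wavelength span actually covered by the pixels, which is an assumption about how the integral is discretized.

```python
def abs_fractional_flux_error(wave, predicted, true, lam1, lam2):
    """Eq. (11) by trapezoidal integration over lam1 <= wave <= lam2.

    `wave` must be sorted in increasing order; `predicted` and `true`
    are the continua p and q sampled on that grid.
    """
    pts = [(w, abs((p - q) / q))
           for w, p, q in zip(wave, predicted, true) if lam1 <= w <= lam2]
    if len(pts) < 2:
        raise ValueError("need at least two pixels in the range")
    integral = sum(0.5 * (f0 + f1) * (w1 - w0)
                   for (w0, f0), (w1, f1) in zip(pts, pts[1:]))
    # Normalize by the span covered by the selected pixels (assumption).
    return integral / (pts[-1][0] - pts[0][0])


# Toy example over the forest range 1050-1170 Å: a predicted continuum
# that lies 5% above a flat true continuum.
wave = [1050.0 + 10.0 * i for i in range(13)]
true = [1.0] * len(wave)
predicted = [1.05] * len(wave)
err = abs_fractional_flux_error(wave, predicted, true, 1050.0, 1170.0)
```

In a leave-one-out test as described above, this quantity would be computed once per held-out spectrum, and the 78 values would then be summarized by their median and 90th percentile.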
In the red, the median errors are 5.8% and 5% using the 2000 Å and the 1600 Å decompositions respectively (Fig. 9, black and grey dashed vertical lines), and the 90th percentile error with the 2000 Å decomposition is larger than that obtained with the 1600 Å set of eigenspectra. This simply indicates that the range up to 1600 Å is easier to fit.
The important result is that the opposite trend is observed in the forest (Fig. 10): median values (around 5%) are very close for the two wavelength coverages (black and grey dashed vertical lines) whereas the 90th percentiles are different (black and grey dotdotdash vertical lines) with 9.5% and 12.4% errors for the 2000 and 1600 Å decompositions. This means that using the full coverage reduces the number of outliers with more than 12% error in the Lymanα forest. Errors in S05 are similar to those we find here but, by using the 2000 Å decomposition, outliers are less frequent than in the previous study. To illustrate what those numbers mean on the continuum level, examples of spectra with their predicted continua are displayed in Fig. 11.
3.2.2. Distribution of spectral indices
We can also compare the characteristics of mock continua generated from the set of computed eigenvectors and the distribution of weights c_{ij} given in Table 2 with those of real spectra from SDSSDR7. For this, we fitted a powerlaw to mock continua and to SDSSDR7 spectra in the same way (see Sect. 4.2 for more details). The normalized (sum equal 1) distributions of the derived powerlaw index are displayed in Fig. 12. The distribution from mock spectra is more peaked than the one from SDSS spectra but is centred around the same value. This behaviour is expected because the PCA gives us a mean description of the whole quasar population.
Fig. 12 Distribution of spectral indices derived by fitting a powerlaw to mock continua generated using the principal components and the distributions of coefficients c_{ij} derived from our SDSSDR7 subsample (grey histogram) compared to the spectral index distribution obtained from fitting SDSSDR7 spectra (black histogram). 
3.2.3. Prediction of the continuum: fitting coefficients versus using the projection matrix
Our goal is to estimate the quasar continuum over the Lymanα forest. For this, we first estimate the weights of the red part of the spectrum in the basis obtained from PCA of the red part of the spectra. We then multiply these weights by the projection matrix (see Eq. (5)) to compute the weights to be used in the basis obtained from the overall spectrum. This gives us the spectrum reconstructed in the whole wavelength range (method 1). One may be tempted to directly use the coefficients obtained from the red part of the spectrum as representative of the whole spectrum by just replacing the eigenvectors obtained from the red part with those obtained from the full wavelength coverage (method 2). To test how useful the projection is, both methods were used to predict the continuum and the distributions of the absolute fractional flux error (Eq. (11)) were computed and are displayed in Figs. 13 and 14.
When using method 2, not surprisingly, the red part of the spectrum is very well fitted with a median error of less than 2.5% (Fig. 13, dashed grey line) and method 1 (projection) leads to larger errors (dashed black line in Fig. 13, median error ~6%). In the forest, the trend is opposite with a median error of less than 5% when method 1 is applied (Fig. 14, dashed black line) and more than 7% when method 2 is applied (dashed grey line). In addition, with method 2, 10% of the spectra have an error larger than 15% (dotdotdash grey line), compared with only 1% with method 1. Clearly the projection matrix should be used.
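The difference between the two methods can be sketched as follows. The exact form of Eq. (5) is not reproduced in this section, so the code assumes that the projection acts as a row vector of red weights d multiplied by a matrix X to give the full-range weights c, and that a continuum is reconstructed as the mean spectrum plus the weighted sum of eigenvectors. All names and numbers are hypothetical toy values.

```python
def predict_continuum(mean, eigvecs, weights):
    """Return mean + sum_j weights[j] * eigvecs[j], pixel by pixel."""
    return [mu + sum(w * xi[k] for w, xi in zip(weights, eigvecs))
            for k, mu in enumerate(mean)]


def project_weights(red_weights, proj):
    """c_j = sum_i d_i * X[i][j] (assumed form of the projection)."""
    return [sum(d * row[j] for d, row in zip(red_weights, proj))
            for j in range(len(proj[0]))]


# Hypothetical toy setup: 2 components, 3 pixels.
mean = [1.0, 1.0, 1.0]
full_eigvecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
X = [[0.5, 0.0], [0.0, 2.0]]   # toy projection matrix
d = [0.2, 0.1]                 # weights fitted on the red part only

# Method 1: project the red weights before reconstructing.
method1 = predict_continuum(mean, full_eigvecs, project_weights(d, X))
# Method 2: reuse the red weights directly with the full-range eigenvectors.
method2 = predict_continuum(mean, full_eigvecs, d)
```

The two reconstructions differ whenever X is not the identity, which is why the red weights cannot simply be reused with the full-range eigenvectors.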
Fig. 13 Cumulative distributions of the absolute fractional flux error in the red part of the spectra in the training set: (i) when the weights used to reconstruct the spectrum are those obtained from the red part of the spectrum (method 2, grey line); (ii) when the projection matrix is used (method 1, black line). Median values are displayed (dashed vertical lines) together with 90th percentiles (dotdotdashed vertical lines). 
Fig. 14 Same as Fig. 13 in the Lymanα forest. Errors are smaller when the projection matrix is used (method 1, black line, see text). The number of outliers (spectra with large errors) is much smaller in that case. 
4. Evolution of the mean flux
An important application of the quasar continuum estimate over the Lymanα forest wavelength range is the determination of the redshift evolution of the mean flux in the IGM. Numerous authors have performed this measurement using high and/or intermediate spectral resolution data (e.g. Songaila 2004; Bernardi et al. 2003; Dall’Aglio et al. 2008, 2009; FaucherGiguère et al. 2008). The evolution is smooth except for a possible bump at z ~ 3.2, which could be related to the He ii reionization (Schaye et al. 2000), although this is not the only possible explanation (FaucherGiguère et al. 2008). In this section we reinvestigate this question applying the method developed in Sect. 3.
4.1. Comparison with B03
We first would like to check if we can recover the Bernardi et al. (2003) results. The sample used in the Bernardi et al. (2003) study is a subsample of SDSSDR7 that contains all spectra observed up to the end of 2001 (corresponding to a modified Julian day MJD = 52 274). Some selection was applied to remove the most prominent BALs and DLAs. These objects are not clearly defined in Bernardi et al. (2003) so that we had to apply our own selection. We avoided all BALs and DLAs as defined by Noterdaeme et al. (2009). Our final sample has 837 QSOs when B03 had 1041. The comparison to B03 is shown in Fig. 15 (left handside panel) for powerlaw (triangles) and PCA (squares) estimates of the continuum. We also show in the figure the FaucherGiguère et al. (2008) results. As expected (see next section), the powerlaw estimate is lower at z < 3 than other estimates. The evolutions found by us and B03 agree and a departure from a smooth evolution is visible at z ~ 3.2.
We randomly drew from SDSSDR7 a large number (500) of samples identical in size and redshift distribution to the B03 sample. For each sample, we derived the mean flux observed at each redshift and calculated the mean over the 500 samples. The result is shown in Fig. 15 (right handside panel). The feature at z ~ 3.2 is still visible and could be a “bump” or a break in the evolution. Note that, as emphasized by FaucherGiguère et al. (2008), a bump is visible only if one insists on fitting a single powerlaw.
Fig. 15 Left panel: mean flux redshift evolution inferred from powerlaw (red triangles) and PCA (blue squares) estimates of the continuum using a sample similar to Bernardi et al. (2003); the mean flux evolutions reported by Bernardi et al. (2003) and FaucherGiguère et al. (2008) are shown as grey points and black open circles. Our measurements are consistent with the B03 and FG08 results. In particular, a departure from a smooth evolution is visible at z ~ 3.2. Right panel: 500 samples were randomly drawn from the SDSSDR7 (similar in size and redshift distribution to the B03 sample) and the mean flux evolution for each sample was then computed. The average of these measurements is displayed (PCA continuum, blue squares) and agrees very well with the FaucherGiguère et al. (2008) measurement. 
4.2. Redshift evolution of the mean flux in the IGM using SDSSDR7
Continua of SDSSDR7 spectra were fitted using the two different methods described above. We derived a powerlaw continuum and two PCA continua (using our set of eigenspectra and that of S05). We restricted our study to spectra with a signaltonoise ratio higher than 8 around 1280 Å in the restframe to avoid instabilities in the fit of the powerlaw. We restricted the analysis to quasars with redshifts z > 2.45 to avoid the blue end of the spectra.
Lines of sight containing damped Lymanα systems (DLAs) were removed following the lists provided by Noterdaeme et al. (2009) and Prochaska et al. (2005). Broad absorption line quasars (BALs) flagged by Shen et al. (2010) were avoided as well. After this selection, we are left with 2576 quasars. Following Bernardi et al. (2003), we computed the mean flux from the Lymanα forest between 1080 and 1160 Å in the restframe to avoid the O vi-Lymanβ and Lymanα emission lines.
The mean flux in the Lymanα forest was then computed using three different continua: two PCA continua obtained from S05 principal components and the principal components derived in this work, and a powerlaw continuum. Figure 16 shows the redshift evolution of this quantity in redshift bins of size Δz = 0.1. Error bars were computed from a bootstrap resampling. There is apparently no difference in the results when using the two PCA decompositions derived at low (z ~ 1) and high (z ~ 3) redshifts. The mean flux derived from the powerlaw continuum is systematically lower at z < 3. This is expected because the powerlaw tends to overestimate the continuum in the forest (see Sect. 4.3).
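A minimal sketch of this kind of measurement is given below: transmitted-flux values (flux divided by continuum) are grouped by absorber redshift in bins of Δz = 0.1, and the uncertainty on each mean is estimated by bootstrap resampling. The resampling unit used in the paper is not specified in this section, so resampling individual values within a bin is an assumption, as are the function names, the bin origin and the toy numbers. The Lymanα rest wavelength of 1215.67 Å is used to convert observed wavelength to absorber redshift.

```python
import random
import statistics

LYA_REST = 1215.67  # Å


def absorber_redshift(lam_obs):
    """Redshift of Lyman-alpha absorption observed at wavelength lam_obs (Å)."""
    return lam_obs / LYA_REST - 1.0


def bootstrap_mean(values, n_boot=1000, seed=0):
    """Mean of `values` and the standard deviation of bootstrap means."""
    rng = random.Random(seed)
    n = len(values)
    boot = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)]
    return statistics.fmean(values), statistics.stdev(boot)


def mean_flux_by_bin(z_pixels, transmission, z_min=2.0, dz=0.1):
    """Group F = flux / continuum by absorber redshift in bins of width dz."""
    bins = {}
    for z, f in zip(z_pixels, transmission):
        bins.setdefault(int((z - z_min) // dz), []).append(f)
    return {round(z_min + (k + 0.5) * dz, 3): bootstrap_mean(v)
            for k, v in sorted(bins.items()) if len(v) > 1}


# Hypothetical toy input: six pixels falling in two redshift bins.
z_pix = [2.03, 2.05, 2.07, 2.13, 2.15, 2.17]
trans = [0.80, 0.82, 0.84, 0.70, 0.72, 0.74]
result = mean_flux_by_bin(z_pix, trans)   # {bin centre: (mean, error)}
```

Running the same function with continua from different methods (powerlaw, z ~ 3 PCA, S05 PCA) would give the three curves being compared.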
Evidently there is a change in the evolution of the mean flux at z ~ 3 with a steepening of the evolution at large redshift.
Fig. 16 Redshift evolution of the mean flux from SDSS DR7 quasar spectra. Three different estimates of the continuum are assumed (Sect. 2.1): (i) an extrapolation of a powerlaw fit (see Sect. 2.2; red triangles); or a prediction (Eq. (4)) using the output of a principal components analysis of (ii) SDSS spectra at z ~ 3 as described in Sect. 3 (blue squares) or of (iii) HST spectra at z ≤ 1 (S05; orange diamonds). Error bars are computed from bootstrapping and are at the 3σ level. A change in the evolution can be noticed at z ~ 3 in the sense of a steeper slope at high redshift. A featureless evolution (fitted from z ≤ 3 points using z ~ 3 PCA measurement) is shown for guidance. 
To estimate what kind of feature we are able to recover with our procedure, we construct in the following sections DR7like samples of simulated intermediateresolution quasar spectra. These spectra probe an IGM with a mean flux evolution as seen in the highresolution data, and we test whether we can recover this evolution by applying our procedures.
4.3. Detectability of the bump with DR7
In order to test the detectability of a feature equivalent to the bump seen at z ~ 3.2 by FaucherGiguère et al. (2008), we performed simulations of this effect and its measurement with mock spectra.
4.3.1. Mock spectra
We aim at simulating the Lymanα forest that will give a mean flux redshift evolution similar to what is seen in highresolution data. This evolution was fitted by FaucherGiguère et al. (2008) as $F(z) = \mathrm{e}^{-A(1+z)^{B} - C\,\mathrm{e}^{-\frac{[(1+z) - D]^{2}}{2E^{2}}}},$ (12) where A = 0.00153, B = 4.060, C = −0.0969, D = 4.267 and E = 0.0769.
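For reference, Eq. (12) with the quoted parameters can be evaluated directly. The sketch below only transcribes the formula; the function names are hypothetical, and the smooth term is split out so that the contribution of the Gaussian feature near 1 + z = D can be inspected separately.

```python
import math

# Parameters of Eq. (12) as quoted in the text.
A, B, C, D, E = 0.00153, 4.060, -0.0969, 4.267, 0.0769


def tau_smooth(z):
    """Power-law part of the effective optical depth, A (1+z)^B."""
    return A * (1.0 + z) ** B


def tau_feature(z):
    """Gaussian term of Eq. (12); negative because C < 0."""
    x = 1.0 + z
    return C * math.exp(-((x - D) ** 2) / (2.0 * E ** 2))


def mean_flux(z):
    """Mean transmitted flux F(z) = exp(-[tau_smooth + tau_feature])."""
    return math.exp(-(tau_smooth(z) + tau_feature(z)))
```

Because C is negative, the Gaussian term lowers the effective optical depth around z = D − 1 ≈ 3.27, so `mean_flux` lies above the purely smooth evolution `math.exp(-tau_smooth(z))` there.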
We assume that the Lymanα forest is made up of absorption lines with a column density distribution $f_{\mathrm{NHI}} = \alpha N_{\mathrm{HI}}^{-\beta},$ (13) where α = 4.9 × 10^{7} and β = 1.46 over the column density range 10^{13}−10^{17} cm^{-2} and a Doppler parameter distribution given by $\frac{\mathrm{d}n}{\mathrm{d}b} = K \frac{b_{\sigma}^{4}}{b^{5}} \mathrm{e}^{-\frac{b_{\sigma}^{4}}{b^{4}}},$ (14) with K = 6.82 and b_{σ} = 24.09 km s^{-1} (Kim et al. 2001). Therefore the evolution in F(z) is supposed to be caused by an evolution in the number of clouds per unit redshift. This is probably oversimplified because the feature at z ~ 3.2, if real, is claimed to be possibly caused by an ionization process, but it should be adequate for what we aim to estimate.
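Random draws from Eqs. (13) and (14) can be obtained by inverse-transform sampling, as in the sketch below. This is one possible way to realise the draws, not necessarily the one used for the simulations. The power law integrates analytically, and the cumulative distribution of Eq. (14) is proportional to exp(−b_σ^4/b^4), which inverts to b = b_σ (−ln u)^{−1/4} for a uniform deviate u. The normalization constants α and K do not enter the sampling. Function names are hypothetical.

```python
import math
import random

BETA = 1.46                  # slope of the column density distribution
N_MIN, N_MAX = 1e13, 1e17    # column density range in cm^-2
B_SIGMA = 24.09              # km/s


def draw_column_density(rng):
    """Draw N_HI from f(N) proportional to N^-BETA on [N_MIN, N_MAX]."""
    k = 1.0 - BETA
    lo, hi = N_MIN ** k, N_MAX ** k
    return (lo + rng.random() * (hi - lo)) ** (1.0 / k)


def draw_doppler(rng):
    """Draw b from Eq. (14) using its CDF, exp(-(B_SIGMA / b)^4)."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); random() never returns 1.0
        u = rng.random()
    return B_SIGMA * (-math.log(u)) ** -0.25


rng = random.Random(1)
columns = [draw_column_density(rng) for _ in range(2000)]
dopplers = sorted(draw_doppler(rng) for _ in range(2000))
median_b = dopplers[len(dopplers) // 2]
```

With this parametrization the median Doppler parameter is b_σ (ln 2)^{−1/4}, slightly above b_σ, and most column densities fall near the lower end of the range because of the steep power law.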
We first constructed the relation that gives the mean flux versus the number of clouds present in a redshift bin of Δz = 0.1 at a given redshift. This relation is parametrized by $F_{z}(n_{\mathrm{abs}}) = a_{z}\,\mathrm{e}^{-\epsilon_{z} n_{\mathrm{abs}}},$ (15) where n_{abs} is the number of clouds drawn at random from the above population of clouds and a_{z} and ϵ_{z} are determined by the simulation and depend on redshift.
Combining Eqs. (12) and (15), we then derived at each redshift the actual number of clouds that is needed to reproduce the relation given by FaucherGiguère et al. (2008). We finally fitted the redshift evolution of the number of clouds by a similar function: $\frac{\mathrm{d}n}{\mathrm{d}z} = n_{0}(1+z)^{\gamma} + C\,\mathrm{e}^{-\frac{[(1+z) - D]^{2}}{2E^{2}}}.$ (16) Our best fit gives the values n_{0} = 8.281 ± 0.080 and γ = 3.076 ± 0.006, C = −121.3 ± 4.5, D = 4.267 ± 0.003 and E = −0.081 ± 0.003.
Once the number of clouds per unit redshift was correctly calibrated, we generated 50 000 mock spectra with a uniform emission redshift distribution in the range 2.3−4.5. For a given emission redshift, the number of absorption lines was computed from the line number density and was modulated to introduce Poisson noise. For each absorption line, the column density N_{HI} and the Doppler parameter were randomly chosen following Eqs. (13) and (14). The spectrum was then degraded to the SDSS resolution (R ~ 1800) and a PCA continuum was added using 10 principal components and choosing the weights at random within the calculated distributions. The wavelength scale was binned as for SDSS spectra and noise was added following the SDSS g-magnitude distribution.
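The line-count step can be sketched as follows, using the best-fit values of Eq. (16). The expected number of lines in a redshift interval is approximated here as dn/dz times the interval width, and the Poisson modulation is drawn with Knuth's multiplication method. Both the approximation and the choice of sampler are assumptions made for this illustration, and the function names are hypothetical.

```python
import math
import random

# Best-fit values of Eq. (16) as quoted in the text.
N0, GAMMA = 8.281, 3.076
C_N, D_N, E_N = -121.3, 4.267, -0.081


def line_density(z):
    """dn/dz from Eq. (16): number of absorption lines per unit redshift."""
    x = 1.0 + z
    return N0 * x ** GAMMA + C_N * math.exp(-((x - D_N) ** 2) / (2.0 * E_N ** 2))


def expected_lines(z, dz=0.1):
    """Expected line count in a bin of width dz centred on z (assumption)."""
    return line_density(z) * dz


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for lam up to a few hundred."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(2)
lam = expected_lines(3.0)
counts = [draw_poisson(lam, rng) for _ in range(500)]
mean_count = sum(counts) / len(counts)
```

Each drawn count would then set how many (N_HI, b) pairs are sampled from Eqs. (13) and (14) for that redshift interval of a mock line of sight.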
To check the validity of our procedures, the mean flux evolution was computed from spectra with no noise and no continuum added (see Fig. 17). The mean flux evolution recovered by our procedure (grey diamonds) agrees very well with the theoretical input (black dashed line), assumed to follow the evolution derived by FaucherGiguère et al. (2008).
4.3.2. Should we detect any feature with DR7?
We computed 100 mock samples with the same number of quasars as the SDSSDR7 and the same distributions of emission redshift and signaltonoise ratio. For each sample, we computed the mean flux evolution fitting quasar continuum with a powerlaw as performed on real data. At each redshift, we computed the scatter in the mean flux derived from these samples. The recovered evolution is shown as black points in Fig. 17, to be compared with the dashed line showing the input assumed for the simulation. As already mentioned, the powerlaw fit of the QSO continuum underestimates the mean flux. However, the bump at z ~ 3.2, introduced in the input, is recovered although slightly smoothed out by the procedure. The errors derived from the simulations are shown as a grey area. They have to be compared with errors expected from the data. The latter were estimated using the errors obtained in SDSSDR7 from B03like samples (as in Fig. 15) but scaled by the square root of the ratio of the number of quasars in the B03 and SDSSDR7 samples. These errors are shown in Fig. 17 by vertical error bars. They are larger than the errors from the simulations, as expected. Mock spectra are indeed idealized: additional sources of uncertainty are present in the calibration of the data, and the somewhat odd shape of some quasar spectra affects the continuum fit.
Fig. 17 Redshift evolution of the mean flux as measured from powerlaw continuum fitting of mock quasar spectra (black points and grey area). The evolution of the mean flux assumed as an input of the simulation is taken from FaucherGiguère et al. (2008) (black dashed line). As already mentioned, the mean flux is underestimated by the powerlaw procedure but the shape of the evolution is recovered. Vertical grey bars are the errors expected from the data. They were computed from the errors derived in the B03 sample (see Fig. 15), scaled with respect to the different number of spectra in the SDSSDR7 and B03 samples. They are larger than the errors derived from the simulations (grey area), as expected. 
4.3.3. Feature at z ~ 3.2
We summarize in Fig. 18 the mean flux evolution measured from the SDSSDR7 data for a continuum estimated with a PCA (blue squares) or a powerlaw (red triangles) corrected for the systematic bias as seen in Fig. 17. Indeed, the powerlaw continuum systematically overestimates the amount of absorption in the Lymanα forest. The mean flux evolution derived with a powerlaw continuum is corrected by the expected difference seen in mocks between the measured mean flux and the input of the simulation.
The results of both methods agree excellently. Overplotted as black circles in Fig. 18 are the results by FaucherGiguère et al. (2008), which agree very well with the PCA estimate except, maybe, in the bin around z ~ 3.2. Note that the discrepancy is less than 2σ. In any case, obviously the smooth redshift evolution of the mean flux becomes steeper around redshift z ~ 3.
The slight difference between high and intermediate resolution data at z ~ 3.2 may be explained by a different selection of the quasars. Indeed, Worseck & Prochaska (2011) have argued that SDSS preferentially selects 3 < z_{em} < 3.5 quasars with intervening H i Lyman limit systems. However, this should have little influence on the mean flux in the overall forest and cannot explain the discrepancy by itself. This could also be because our procedure smoothes out a sharp feature. However, this would be surprising given the width of the bins and the results of our simulations (see Fig. 17).
Fig. 18 Mean flux redshift evolution inferred from SDSSDR7 quasars (red triangles: corrected powerlaw continuum, and blue squares: z ~ 3 PCA continuum) compared to the evolution from FaucherGiguère et al. (2008) (black open circles). The mean flux evolution derived with a powerlaw continuum is corrected from the bias predicted in simulation (see Fig. 17). All measurements agree with each other within errors. No “bump” is seen in any evolution inferred from SDSSDR7 but there is a definite change in the slope of the evolution at z ~ 3 in the sense of a steeper evolution beyond this redshift. 
5. Conclusion
The first goal of this paper was to provide a new PCA decomposition of the continuum of z ~ 3 quasars. This should be useful for studies of the Lymanα forest and to generate mock quasar continua that are to be implemented in simulations constructed to search for systematic effects in future analyses or surveys. We took the opportunity to enlarge to 1020−2000 Å the wavelength range over which the spectra are decomposed to compare them with the previous wavelength range 1020−1600 Å used by S05. The mean spectrum at z ~ 3 has a shape similar to that of the mean spectrum derived at z ~ 1 by S05, except that the strength of the Lymanα and C iv emission lines relative to the continuum is lower at high redshift. We concentrated on the estimate of the continuum in the Lymanα forest and provided all outputs of this analysis that are required to generate mock continua.
We used this decomposition to revisit the evolution with redshift of the mean flux in the Lymanα forest and compared two methods to estimate the quasar flux in the Lymanα forest: the extrapolation of a powerlaw fitted to the red part of the spectrum and an estimate of the flux from PCA coefficients.
We find that
(i): the powerlaw method systematically underestimates the mean flux by an amount decreasing with redshift. When correcting for this bias, as estimated with simulations, we find that the method gives similar results to the PCA method;
(ii): the PCA method yields results very similar to what is measured by Bernardi et al. (2003) and from high spectral resolution data (FaucherGiguère et al. 2008);
(iii): from our simulations a bump at z ~ 3.2, if present, should be marginally detected with the data set of SDSSDR7;
(iv): finally, from our analysis, we find that there is a definite break in the evolution of the mean flux at z ~ 3 in the sense of a steeper decrease of the mean flux at high redshift. We caution that this could be a consequence of a more prominent bump, which can be slightly smoothed out by our procedures.
The increase of the statistics but most importantly of the quality of the data that will soon be provided by the BOSS survey should definitely settle this point.
Acknowledgments
We thank an anonymous referee for important and very useful comments. This project was supported by the Agence Nationale de la Recherche under contract ANR08BLAN0222.
References
Baldwin, J. A. 1977, ApJ, 214, 679
Becker, G. D., Rauch, M., & Sargent, W. L. W. 2007, ApJ, 662, 72
Bergeron, J., Petitjean, P., Aracil, B., et al. 2004, The Messenger, 118, 40
Bernardi, M., Sheth, R. K., SubbaRao, M., et al. 2003, AJ, 125, 32
Bi, H. G., Boerner, G., & Chu, Y. 1992, A&A, 266, 1
Bolton, J. S., & Haehnelt, M. G. 2007, MNRAS, 382, 325
Bolton, J. S., Haehnelt, M. G., Viel, M., & Springel, V. 2005, MNRAS, 357, 1178
Cen, R., Miralda-Escudé, J., Ostriker, J. P., & Rauch, M. 1994, ApJ, 437, L9
Croft, R. A. C., Weinberg, D. H., Katz, N., & Hernquist, L. 1998, ApJ, 495, 44
Dall’Aglio, A., Wisotzki, L., & Worseck, G. 2008, A&A, 491, 465
Dall’Aglio, A., Wisotzki, L., & Worseck, G. 2009 [arXiv:0906.1484]
Eisenstein, D. J., Weinberg, D. H., Agol, E., et al. 2011 [arXiv:1101.1529]
Fan, X. 2009, in ASP Conf. Ser. 408, ed. W. Wang, Z. Yang, Z. Luo, & Z. Chen, 439
Fan, X., Carilli, C. L., & Keating, B. 2006, ARA&A, 44, 415
Faucher-Giguère, C.-A., Prochaska, J. X., Lidz, A., Hernquist, L., & Zaldarriaga, M. 2008, ApJ, 681, 831
Francis, P. J., Hewett, P. C., Foltz, C. B., et al. 1991, ApJ, 373, 465
Francis, P. J., Hewett, P. C., Foltz, C. B., & Chaffee, F. H. 1992, ApJ, 398, 476
Gunn, J. E., & Peterson, B. A. 1965, ApJ, 142, 1633
Hennawi, J. F., & Prochaska, J. X. 2007, ApJ, 655, 735
Hernquist, L., Katz, N., Weinberg, D. H., & Miralda-Escudé, J. 1996, ApJ, 457, L51
Hu, E. M., Kim, T.-S., Cowie, L. L., Songaila, A., & Rauch, M. 1995, AJ, 110, 1526
Kim, T., Cristiani, S., & D’Odorico, S. 2001, A&A, 373, 757
Kim, T.-S., Bolton, J. S., Viel, M., Haehnelt, M. G., & Carswell, R. F. 2007, MNRAS, 382, 1657
Kirkman, D., Tytler, D., Suzuki, N., et al. 2005, MNRAS, 360, 1373
Lynds, R. 1971, ApJ, 164, L73
McDonald, P., Seljak, U., Cen, R., et al. 2005, ApJ, 635, 761
Noterdaeme, P., Petitjean, P., Ledoux, C., & Srianand, R. 2009, A&A, 505, 1087
Oke, J. B., & Korycansky, D. G. 1982, ApJ, 255, 11
Petitjean, P., Webb, J. K., Rauch, M., Carswell, R. F., & Lanzetta, K. 1993, MNRAS, 262, 499
Petitjean, P., Mueket, J. P., & Kates, R. E. 1995, A&A, 295, L9
Petitjean, P., Surdej, J., Smette, A., et al. 1998, A&A, 334, L45
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical recipes in C. The art of scientific computing, ed. W. H. Press, S. A. Teukolsky, W. T. Vetterling, & B. P. Flannery (Cambridge: University Press)
Prochaska, J. X., Herbert-Fort, S., & Wolfe, A. M. 2005, ApJ, 635, 123
Prochaska, J. X., Worseck, G., & O’Meara, J. M. 2009, ApJ, 705, L113
Rauch, M. 1998, ARA&A, 36, 267
Rauch, M., Miralda-Escudé, J., Sargent, W. L. W., et al. 1997, ApJ, 489, 7
Riediger, R., Petitjean, P., & Mücket, J. P. 1998, A&A, 329, 30
Schaye, J., Theuns, T., Rauch, M., Efstathiou, G., & Sargent, W. L. W. 2000, MNRAS, 318, 817
Schlegel, D. J., Blanton, M., Eisenstein, D., et al. 2007, in BAAS, 38, 966
Schlegel, D. J., Bebek, C., Heetderks, H., et al. 2009 [arXiv:0904.0468]
Shen, Y., Hall, P. B., Richards, G. T., et al. 2010 [arXiv:1006.5178]
Slosar, A., Ho, S., White, M., & Louis, T. 2009, J. Cosmol. Astro-Part. Phys., 10, 19
Songaila, A. 2004, AJ, 127, 2598
Suzuki, N., Tytler, D., Kirkman, D., O’Meara, J. M., & Lubin, D. 2005, ApJ, 618, 592
Theuns, T., & Srianand, R. 2006, in The Scientific Requirements for Extremely Large Telescopes, ed. P. Whitelock, M. Dennefeld, & B. Leibundgut, IAU Symp., 232, 464
Theuns, T., Leonard, A., Efstathiou, G., Pearce, F. R., & Thomas, P. A. 1998, MNRAS, 301, 478
Theuns, T., Bernardi, M., Frieman, J., et al. 2002, ApJ, 574, L111
Tytler, D., Kirkman, D., O’Meara, J. M., et al. 2004, ApJ, 617, 1
Vanden Berk, D. E., Richards, G. T., Bauer, A., et al. 2001, AJ, 122, 549
White, M., Pope, A., Carlson, J., et al. 2010, ApJ, 713, 383
Worseck, G., & Prochaska, J. X. 2011, ApJ, 728, 23
Yip, C. W., Connolly, A. J., Vanden Berk, D. E., et al. 2004, AJ, 128, 2603
Zhang, Y., Anninos, P., & Norman, M. L. 1995, ApJ, 453, L57
Zheng, W., Kriss, G. A., Telfer, R. C., Grimes, J. P., & Davidsen, A. F. 1997, ApJ, 475, 469
All Tables
List of SDSSDR7 quasars used to define the correlation matrix and the eigenvectors at z ~ 3.
All Figures
Fig. 1 Illustration of the method used to estimate the emission redshift and powerlaw continuum of quasars (the quasar shown is SDSS J012156.03+144823.9). Grey areas indicate the regions used to fit the powerlaw and the red line is the estimate of the continuum (powerlaw + C iv and C iii] emission lines). Vertical dashed lines indicate the position of emission lines. 

Fig. 2 Result of the stacking of SDSSDR7 spectra with a damped Lymanα system at an absorption redshift higher than 3.7 and a column density N(H i) ≥ 10^{20.5} cm^{2}. The spectra are normalized to 1 near 1280 Å (in the quasar restframe). Owing to the presence of the DLA, the flux is expected to be equal to zero at observed wavelengths below 4280 Å. Evidently this is not the case in the very blue of the spectrum (λ_{obs} ≤ 4000 Å) where the mean flux is increasing. Consequently, pixels at wavelengths below 4000 Å are not used in this analysis. 

Fig. 3 Spectrum and continuum of the quasar SDSS J134826.65+290623.0. This quasar belongs to the sample of SDSS z ~ 3 quasars that is used to derive the PCA eigenvectors (Sect. 3). Our estimate of the continuum is shown with the thick red line and a zoom of the Lymanα forest region is shown in the inset. 

Fig. 4 Mean flux evolution in the training set (with redshift bins of size Δz = 0.1; red triangles) compared to FG08 measurements from high-resolution and high signal-to-noise spectra (black circles). The continua of the 78 SDSS spectra in the training set were fitted by hand using spline interpolation. Error bars from our measurements were computed by bootstrapping pixels of our sample. Both measurements are consistent, which makes us confident in our continuum estimate.
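
The pixel bootstrap mentioned in this caption can be illustrated as follows. It is a sketch under stated assumptions: the function name, the number of resamplings, and the synthetic transmission values (flux divided by continuum) are hypothetical, and pixels are treated as independent, which real forest pixels are not.

```python
import numpy as np

def bootstrap_mean_flux(transmission, n_boot=500, seed=0):
    """Return the mean transmission and its bootstrap standard error."""
    rng = np.random.default_rng(seed)
    n = transmission.size
    # Each row holds the indices of one resampling of the pixels with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = transmission[idx].mean(axis=1)
    return transmission.mean(), boot_means.std(ddof=1)

# Synthetic F/C values for one redshift bin (illustrative only).
rng = np.random.default_rng(1)
transmission = rng.normal(0.7, 0.2, size=2000)
mean_flux, err = bootstrap_mean_flux(transmission)
```

For independent pixels the bootstrap error should be close to the sample standard deviation divided by the square root of the number of pixels.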

Fig. 5 Correlation matrix computed with the training set. A moderate correlation (0.3−0.6) is found between the shape of the continuum in the Lyman-α forest (1020 ≤ λ ≤ 1210 Å) and in the region between the Lyman-α and C iv emission lines (1216 ≤ λ ≤ 1600 Å), in agreement with S05. We also note a moderate anticorrelation (from −0.6 to −0.4) between the continuum in the Lyman-α forest and the region beyond the C iv emission line.
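
A correlation matrix of this kind can be computed from a set of continua sampled on a common rest-frame grid. The sketch below uses synthetic power-law continua with a scattered spectral index, normalized at an assumed 1280 Å pivot; it only demonstrates the computation and does not reproduce the values quoted in the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
wave = np.linspace(1020.0, 2000.0, 50)          # rest-frame grid (Angstrom)

# Synthetic training set: 78 power laws with scattered index plus small noise.
alpha = rng.normal(-1.5, 0.5, size=78)
continua = (wave[None, :] / 1280.0) ** alpha[:, None]
continua += rng.normal(0.0, 0.01, size=continua.shape)

# Rows are quasars, columns are wavelength pixels, so rowvar=False
# yields the pixel-by-pixel correlation matrix of shape (n_pix, n_pix).
corr = np.corrcoef(continua, rowvar=False)
```

In this toy model the bluest and reddest pixels are anticorrelated simply because all continua pivot around the normalization wavelength.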

Fig. 6 The solid black (dashed red) line is the mean continuum of quasars at z = 3 (z ≤ 1, S05). The wavelength coverage at z = 3 corresponds to SDSS spectra and is larger than at z ≤ 1. The main difference between the two mean spectra is visible in the amplitude of emission lines. The composite spectrum from Vanden Berk et al. (2001) computed from 2200 SDSS spectra (dash-dot blue line) agrees well with our mean continuum. The difference in the Lyman-α forest arises because Vanden Berk et al. (2001) did not try to avoid the absorption from the IGM. A small shift in the position of emission lines can be noticed: this is because Vanden Berk et al. (2001) used the Mg ii line as a reference to compute the redshift, whereas we used the C iv and C iii] lines.

Fig. 7 First ten principal components of a PCA applied to (i) z ~ 3 SDSS quasar spectra (black solid lines) over the range 1020−2000 Å and to (ii) z ≤ 1 HST quasar spectra (S05, red dashed lines) over the range 1020−1600 Å. The distributions of the coefficients associated with each component are shown in the right panel (grey histogram) together with their fit with a Gaussian (except for the first component, for which the distribution is log-normal; thick black line).
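
The decomposition behind this figure can be sketched with a singular value decomposition of the mean-subtracted continua. This is a generic PCA illustration on random synthetic data, not the authors' pipeline; the function name and array shapes are assumptions.

```python
import numpy as np

def pca_continua(continua, n_comp=10):
    """PCA of continua with shape (n_quasars, n_pixels)."""
    mean = continua.mean(axis=0)
    resid = continua - mean
    # The rows of vt are orthonormal eigenvectors of the pixel covariance
    # matrix, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    comps = vt[:n_comp]
    weights = resid @ comps.T        # coefficient of each quasar on each component
    return mean, comps, weights

# Synthetic stand-in for 78 normalized continua on a 200-pixel grid.
rng = np.random.default_rng(0)
continua = rng.normal(1.0, 0.1, size=(78, 200))
mean, comps, weights = pca_continua(continua)
recon = mean + weights @ comps       # reconstruction from the first 10 components
```

Histograms of each column of `weights` would correspond to the coefficient distributions shown in the right panel.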

Fig. 8 First ten principal components obtained from a PCA of the QSO spectrum redwards of the Lyman-α emission line, for (i) the z ~ 3 SDSS quasar spectra (black solid lines) and (ii) the z ≤ 1 HST quasar spectra (S05, red dashed lines). The distributions of the coefficients associated with each component are shown in the right panels (grey histogram) together with a Gaussian fit (except for the first component, for which the distribution is log-normal; thick black line).

Fig. 9 Cumulative distributions of the absolute fractional flux error (Eq. (11)) redwards of the Lyman-α emission line when predicting spectra with PCA decompositions over a wavelength range extending up to 2000 Å (black solid line) or up to 1600 Å (grey solid line). Median values are similar for the two estimates (black and grey dashed vertical lines, respectively). The 90th percentile from the 1600 Å decomposition (dash-dot-dot grey vertical line) is lower than the one for the 2000 Å decomposition (dash-dot-dot black vertical line).
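
The statistics shown in this figure can be sketched as below. Eq. (11) is not reproduced in this caption list, so the code assumes it is the wavelength-averaged absolute relative difference between predicted and true continuum; the function name and the synthetic predictions are hypothetical.

```python
import numpy as np

def abs_fractional_flux_error(wave, predicted, true):
    """Wavelength-averaged |predicted - true| / true (assumed form of Eq. (11))."""
    integrand = np.abs((predicted - true) / true)
    # Trapezoidal integration written out explicitly.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(wave))
    return integral / (wave[-1] - wave[0])

wave = np.linspace(1216.0, 2000.0, 400)
true = (wave / 1280.0) ** -1.5

# Synthetic predictions: the true continuum rescaled by a few percent per spectrum.
rng = np.random.default_rng(0)
scales = 1.0 + rng.normal(0.0, 0.05, size=78)
errors = np.array([abs_fractional_flux_error(wave, s * true, true) for s in scales])

median, p90 = np.percentile(errors, [50, 90])   # the vertical lines in the figure
x = np.sort(errors)                              # abscissa of the cumulative curve
cdf = np.arange(1, x.size + 1) / x.size          # fraction of spectra with error <= x
```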

Fig. 10 Same as Fig. 9 for the cumulative distributions of the absolute fractional flux error (Eq. (11)) in the Lyman-α forest.

Fig. 11 Example of a spectrum with (i) a large absolute fractional flux error in the red part of the spectrum and a small error in the forest (upper panel) and (ii) a large absolute fractional flux error in the forest and a small error in the red part of the spectrum (lower panel). The hand-fitted continuum is the red dashed line and the predicted one is the solid blue line.

Fig. 12 Distribution of spectral indices derived by fitting a power law to mock continua generated using the principal components and the distributions of coefficients c_{ij} derived from our SDSS-DR7 subsample (grey histogram) compared to the spectral index distribution obtained from fitting SDSS-DR7 spectra (black histogram).

Fig. 13 Cumulative distributions of the absolute fractional flux error in the red part of the spectra in the training set: (i) when the weights used to reconstruct the spectrum are those obtained from the red part of the spectrum (method 1, grey line); (ii) when the projection matrix is used (method 2, black line). Median values are displayed (dashed vertical lines) together with 90th percentiles (dot-dot-dashed vertical lines).

Fig. 14 Same as Fig. 13 in the Lyman-α forest. Errors are smaller when the projection matrix is used (method 2, black line, see text). The number of outliers (spectra with large errors) is much smaller in that case.
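
One plausible way to build a projection matrix of the kind referred to here is a linear least-squares map from the weights measured on the red side to the weights of the full-range decomposition. The sketch below makes that assumption explicit; the symbols D, C and X, the array shapes, and the noiseless synthetic weights are all hypothetical and may differ from the definitions in the main text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_qso, n_comp = 78, 10

# Synthetic stand-ins: D holds red-side weights, C the full-range weights.
D = rng.normal(size=(n_qso, n_comp))
X_true = rng.normal(size=(n_comp, n_comp))
C = D @ X_true                       # noiseless, so X_true is recoverable exactly

# Least-squares solution of D @ X = C over the training set.
X, *_ = np.linalg.lstsq(D, C, rcond=None)

# For a new quasar, red-side weights d map to predicted full-range weights d @ X;
# the continuum (forest included) is then the mean plus these weights times
# the full-range principal components.
c_pred = D[0] @ X
```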

Fig. 15 Left panel: mean flux redshift evolution inferred from power-law (red triangles) and PCA (blue squares) estimates of the continuum using a sample similar to Bernardi et al. (2003); the mean flux evolutions reported by Bernardi et al. (2003) and Faucher-Giguère et al. (2008) are shown as grey points and black open circles. Our measurements are consistent with the B03 and FG08 results. In particular, a departure from a smooth evolution is visible at z ~ 3.2. Right panel: 500 samples were randomly drawn from the SDSS-DR7 (similar in size and redshift distribution to the B03 sample) and the mean flux evolution for each sample was then computed. The average of these measurements is displayed (PCA continuum, blue squares) and agrees very well with the Faucher-Giguère et al. (2008) measurement.

Fig. 16 Redshift evolution of the mean flux from SDSS-DR7 quasar spectra. Three different estimates of the continuum are assumed (Sect. 2.1): (i) an extrapolation of a power-law fit (see Sect. 2.2; red triangles); or a prediction (Eq. (4)) using the output of a principal component analysis of (ii) SDSS spectra at z ~ 3 as described in Sect. 3 (blue squares) or of (iii) HST spectra at z ≤ 1 (S05; orange diamonds). Error bars are computed from bootstrapping and are at the 3σ level. A change in the evolution can be noticed at z ~ 3 in the sense of a steeper slope at high redshift. A featureless evolution (fitted from z ≤ 3 points using the z ~ 3 PCA measurement) is shown for guidance.

Fig. 17 Redshift evolution of the mean flux as measured from power-law continuum fitting of mock quasar spectra (black points and grey area). The evolution of the mean flux assumed as an input of the simulation is taken from Faucher-Giguère et al. (2008) (black dashed line). As already mentioned, the mean flux is underestimated by the power-law procedure but the shape of the evolution is recovered. Vertical grey bars are the errors expected from the data. They were computed from the errors derived in the B03 sample (see Fig. 15), scaled with respect to the different number of spectra in the SDSS-DR7 and B03 samples. They are larger than the errors derived from the simulations (grey area), as expected.

Fig. 18 Mean flux redshift evolution inferred from SDSS-DR7 quasars (red triangles: corrected power-law continuum, and blue squares: z ~ 3 PCA continuum) compared to the evolution from Faucher-Giguère et al. (2008) (black open circles). The mean flux evolution derived with a power-law continuum is corrected for the bias predicted in simulation (see Fig. 17). All measurements agree with each other within errors. No “bump” is seen in any evolution inferred from SDSS-DR7 but there is a definite change in the slope of the evolution at z ~ 3 in the sense of a steeper evolution beyond this redshift.
