Online Material
Appendix A: The inverse problem
A.1 The model
As argued in the main text (Sect. 2.1), the formal equation relating the number counts of galaxies with flux $S_k$ (within $\mathrm{d}S$) at wavelength $\lambda_k$ (within $\mathrm{d}\lambda$) to the number of galaxies, $\mathrm{d}^2N/(\mathrm{d}z\,\mathrm{d}L_{\rm IR})$, at redshift $z$ (within $\mathrm{d}z$) and IR luminosity $L_{\rm IR}$ (within $\mathrm{d}L_{\rm IR}$) is given by
$$\frac{\mathrm{d}N}{\mathrm{d}S}(S_k) = A \int\!\!\int \frac{\mathrm{d}^2N}{\mathrm{d}z\,\mathrm{d}L_{\rm IR}}\,\delta\!\left(S_k - F(L_{\rm IR},z)\right)\,\mathrm{d}z\,\mathrm{d}L_{\rm IR}, \qquad (\mathrm{A.1})$$
where $\delta$ is the standard Dirac distribution and $F$ the flux observed in a photometric band centered on wavelength $\lambda_k$ for a galaxy at redshift $z$ with luminosity $L_{\rm IR}$:
$$F(L_{\rm IR},z) = \frac{(1+z)\,L_{\rm IR}\,K(L_{\rm IR},z)}{4\pi\,d_L^2(z)}. \qquad (\mathrm{A.2})$$
Here, $d_L(z)$ is the luminosity distance of an object at redshift $z$ for the standard cosmology used in this paper, $A$ is the solid angle corresponding to one square degree, and $K$ corresponds to the k-correction:
$$K(L_{\rm IR},z) = \frac{\int T_k(\lambda)\,S\!\left(\lambda/(1+z),\,L_{\rm IR}\right)\mathrm{d}\lambda}{\int T_k(\lambda)\,S(\lambda,\,L_{\rm IR})\,\mathrm{d}\lambda}, \qquad (\mathrm{A.3})$$
where $T_k(\lambda)$ is the transmission curve of the filter centered on $\lambda_k$, and $S(\lambda, L_{\rm IR})$ is the underlying library of SEDs (CE01), in which the SED of a galaxy depends only on its total IR luminosity $L_{\rm IR}$.
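The flux-luminosity relation of Eqs. (A.2)-(A.3) can be sketched numerically as follows. This is a minimal illustration, not the paper's implementation: `toy_sed` is a made-up placeholder for the CE01 library (it only mimics an SED shape that depends on the total luminosity $L_{\rm IR}$), and units are schematic.

```python
import numpy as np

# Hedged sketch of Eqs. (A.2)-(A.3). `toy_sed` is a hypothetical stand-in
# for the CE01 library: a rest-frame SED whose shape depends only on the
# total IR luminosity L_IR, as assumed in the text. Units are schematic.

def toy_sed(lam_um, L_ir):
    """Toy rest-frame SED S(lambda, L_IR); the peak shifts weakly with L_IR."""
    lam0 = 80.0 * (L_ir / 1e11) ** -0.1
    return (lam_um / lam0) ** 2 * np.exp(-lam_um / lam0)

def k_correction(lam_um, T, z, L_ir):
    """Eq. (A.3): redshifted over rest-frame flux through filter T (uniform grid)."""
    num = np.sum(T * toy_sed(lam_um / (1.0 + z), L_ir))
    den = np.sum(T * toy_sed(lam_um, L_ir))
    return num / den

def observed_flux(L_ir, z, lam_um, T, d_L):
    """Eq. (A.2): F = (1+z) L_IR K(L_IR, z) / (4 pi d_L(z)^2)."""
    return (1.0 + z) * L_ir * k_correction(lam_um, T, z, L_ir) / (4.0 * np.pi * d_L ** 2)
```

By construction $K(L_{\rm IR}, z{=}0) = 1$, so the k-correction only rescales the flux once the band samples a redshifted part of the SED.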
As mentioned in the main text, from the point of view of the conditioning of the inverse problem, it is preferable to reformulate Eq. (A.1) in terms of $\log S$, $\log L_{\rm IR}$, and $z$:
$$\Sigma(\log S_k) = \int\!\!\int \mathcal{K}(\log S_k, \log L_{\rm IR}, z)\,\Phi(\log L_{\rm IR}, z)\,\mathrm{d}z\,\mathrm{d}\log L_{\rm IR}, \qquad (\mathrm{A.4})$$
where the kernel of Eq. (A.4) reads
$$\mathcal{K}(\log S_k, \log L_{\rm IR}, z) = A\,S_k^{2.5}\,\delta\!\left(\log S_k - \log F(L_{\rm IR}, z)\right), \qquad (\mathrm{A.5})$$
with
$$\Sigma(\log S) \equiv S^{2.5}\,\frac{\mathrm{d}N}{\mathrm{d}\log S}, \qquad \Phi(\log L_{\rm IR}, z) \equiv \frac{\mathrm{d}^2N}{\mathrm{d}z\,\mathrm{d}\log L_{\rm IR}}. \qquad (\mathrm{A.6})$$
Here we have introduced the Euclidean-normalized number count, $\Sigma$, by multiplying the number count by the expected $S^{2.5}$ power law.
A.2 Discretization
Let us project $\Phi(\log L_{\rm IR}, z)$ onto a complete basis of functions of finite (asymptotically zero) support, chosen here to be piecewise-constant step functions $e_j(\log L_{\rm IR})\,e_l(z)$:
$$\Phi(\log L_{\rm IR}, z) = \sum_{j,l} n_{jl}\, e_j(\log L_{\rm IR})\, e_l(z). \qquad (\mathrm{A.7})$$
The parameters to fit are the weights $n_{jl}$. Calling $\mathbf{X} = \{n_{jl}\}$ (the parameters) and $\mathbf{Y} = \{\Sigma(\log S_k)\}$ (the measurements), Eq. (A.4) then becomes formally
$$\mathbf{Y} = \mathbf{M}\cdot\mathbf{X}, \qquad (\mathrm{A.8})$$
where $\mathbf{M}$ is a matrix with entries given by
$$M_{k,(jl)} = \int\!\!\int \mathcal{K}(\log S_k, \log L_{\rm IR}, z)\, e_j(\log L_{\rm IR})\, e_l(z)\,\mathrm{d}z\,\mathrm{d}\log L_{\rm IR}.$$
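In practice a matrix of this kind can be assembled by integrating the delta-function kernel numerically: sample each $(\log L_{\rm IR}, z)$ basis cell on a fine sub-grid, map each sample to an observed flux bin, and accumulate the weights. The sketch below illustrates the idea on a toy problem; `log_flux` is a hypothetical stand-in for $\log_{10} F(L_{\rm IR}, z)$ from Eq. (A.2), not the CE01-based flux of the paper.

```python
import numpy as np

# Sketch of the discretization of Eq. (A.4): the delta-function kernel is
# integrated by histogramming a fine (log L, z) sub-grid into the observed
# flux bins. `log_flux` is a toy placeholder for log10 F(L_IR, z).

def log_flux(logL, z):
    """Toy log10 of the observed flux (placeholder for Eq. (A.2))."""
    return logL - 2.0 * np.log10(1.0 + 10.0 * z) - 11.0

def build_M(logS_edges, logL_edges, z_edges, nsub=8):
    """M[k, j*nz + l]: response of flux bin k to a unit weight n_{jl}."""
    nS = len(logS_edges) - 1
    nL, nz = len(logL_edges) - 1, len(z_edges) - 1
    M = np.zeros((nS, nL * nz))
    for j in range(nL):
        for l in range(nz):
            # fine sub-grid inside basis cell (j, l)
            ll = np.linspace(logL_edges[j], logL_edges[j + 1], nsub)
            zz = np.linspace(z_edges[l], z_edges[l + 1], nsub)
            LL, ZZ = np.meshgrid(ll, zz)
            k = np.digitize(log_flux(LL, ZZ).ravel(), logS_edges) - 1
            w = 1.0 / (nsub * nsub)      # equal quadrature weights
            for ki in k[(k >= 0) & (k < nS)]:
                M[ki, j * nz + l] += w
    return M
```

Each column then records which flux bins a given $(\log L_{\rm IR}, z)$ cell contributes to, which is exactly the locality property exploited by the non-parametric inversion.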
A.3 Penalties
Assuming that the noise in $\mathbf{Y}$ can be approximated as Normal, we can estimate the error between the measured counts and the non-parametric model by
$$L(\mathbf{X}) = (\mathbf{Y} - \mathbf{M}\cdot\mathbf{X})^{\rm T}\cdot\mathbf{W}\cdot(\mathbf{Y} - \mathbf{M}\cdot\mathbf{X}), \qquad (\mathrm{A.9})$$
where the weight matrix $\mathbf{W}$ is the inverse of the covariance matrix of the data (diagonal for uncorrelated noise, with diagonal elements equal to one over the data variance). Since we are interested here in a non-parametric inversion, the decomposition in Eq. (A.7) typically involves many more parameters than constraints, so that each parameter controls the shape of the function, $\Phi$, only locally. As mentioned in the main text, a trade-off must therefore be found between the level of smoothness imposed on the solution, which deals with the artefacts induced by the ill-conditioning, and the level of fluctuations consistent with the amount of information in the counts. Between two solutions yielding equivalent likelihood, the smoothest is chosen on the basis of the quadratic penalty
$$R(\mathbf{X}) = \mathbf{X}^{\rm T}\cdot\mathbf{K}\cdot\mathbf{X}, \qquad (\mathrm{A.10})$$
where $\mathbf{K}$ is a positive-definite matrix, chosen so that $R(\mathbf{X})$ in Eq. (A.10) is non-zero when $\mathbf{X}$ varies strongly as a function of its indices. In practice, we use a square-Laplacian penalization (the D2 norm defined by Eq. (30) of Ocvirk et al. 2006b). Indeed, a Tikhonov penalization does not explicitly enforce smoothness of the solution, and a square-gradient penalization favors flat solutions that are unphysical in our problem.
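A square-Laplacian penalty of this kind can be built explicitly. The sketch below is a minimal one-dimensional illustration in the spirit of the D2 norm of Ocvirk et al. (2006b), not the paper's actual two-dimensional operator: $\mathbf{K} = \mathbf{D}_2^{\rm T}\mathbf{D}_2$, so that $\mathbf{X}^{\rm T}\mathbf{K}\mathbf{X}$ sums the squared second differences of $\mathbf{X}$.

```python
import numpy as np

# Minimal 1-D square-Laplacian (D2) penalty: K = D2^T D2, where D2 is the
# second-difference operator. X^T K X penalizes curvature, so flat and
# linear profiles cost nothing, unlike a Tikhonov (X^T X) penalty.

def laplacian_penalty(n):
    """Return K = D2^T D2 for n parameters."""
    D2 = np.zeros((n - 2, n))
    for i in range(n - 2):
        D2[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D2.T @ D2
```

A linear ramp gives $\mathbf{X}^{\rm T}\mathbf{K}\mathbf{X} = 0$ while an oscillating vector is strongly penalized, which is precisely why this norm enforces smoothness without driving the solution flat.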
As mentioned in the main text, for a range of redshifts a direct measurement, $\mathbf{X}_0$, which can be used as a prior for $\mathbf{X}$, is available. We may therefore add as a supplementary constraint that
$$P(\mathbf{X}) = (\mathbf{X} - \mathbf{X}_0)^{\rm T}\cdot\mathbf{W}_2\cdot(\mathbf{X} - \mathbf{X}_0) \qquad (\mathrm{A.11})$$
should remain small, where the weight matrix, $\mathbf{W}_2$, is the inverse of the covariance matrix of the prior, $\mathbf{X}_0$, and is non-zero only over the appropriate redshift range. In short, the penalized non-parametric solution of Eq. (A.8), accounting for both penalties, is found by minimizing the so-called objective function
$$Q(\mathbf{X}) = L(\mathbf{X}) + \lambda\,R(\mathbf{X}) + \mu\,P(\mathbf{X}), \qquad (\mathrm{A.12})$$
where $L(\mathbf{X})$, $R(\mathbf{X})$, and $P(\mathbf{X})$ are the likelihood and regularization terms given by Eqs. (A.9)-(A.11), respectively. The Lagrange multipliers $\lambda$ and $\mu$ allow us to tune the level of smoothness of the solution (their values are set in practice for the reasons given below) and the requirement that $\mathbf{X}$ should remain close to its prior over the range of redshifts for which data are available. The introduction of the Lagrange multipliers is formally justified by our wanting to minimize the objective function $Q(\mathbf{X})$ subject to the constraints that $L(\mathbf{X})$ and $P(\mathbf{X})$ each remain within a prescribed range.
The minimum of the objective function, $Q(\mathbf{X})$, given by Eq. (A.12) reads formally
$$\mathbf{X} = \left(\mathbf{M}^{\rm T}\mathbf{W}\mathbf{M} + \lambda\,\mathbf{K} + \mu\,\mathbf{W}_2\right)^{-1}\left(\mathbf{M}^{\rm T}\mathbf{W}\,\mathbf{Y} + \mu\,\mathbf{W}_2\,\mathbf{X}_0\right). \qquad (\mathrm{A.13})$$
This equation clearly shows that the solution tends towards $\mathbf{X}_0$ when $\mu \rightarrow \infty$, while the smoothing Lagrange multiplier, $\lambda$, damps the components of $\mathbf{Y}$ corresponding to the highest singular vectors of $\mathbf{M}$ (Ocvirk et al. 2006b). When dealing with noisy datasets, the non-parametric inversion technique may produce negative coefficients for the reconstructed luminosity function. To avoid such effects, positivity is imposed on the coefficients $n_{jl}$ of Eq. (A.7); see for instance Ocvirk et al. (2006b) or Pichon et al. (2002). In practice, the minimum of the objective function is found iteratively, using OptimPack (Thiébaut 2005). The relative weight of the likelihood and the two penalties is chosen so that the three quantities contribute comparably to the total objective function after convergence. This corresponds to a reasonably smooth variation in the LF in both redshift and $\log L_{\rm IR}$, and imposes a solution that is always within 1$\sigma$ of the observed low-redshift LF, $\mathbf{X}_0$, when $\mu$ is not set to zero in Eq. (A.12).
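The unconstrained minimum of Eq. (A.13) can be sketched in a few lines. This is a toy illustration only, assuming the positivity constraint is dropped (the paper imposes it iteratively with OptimPack); it shows the two limiting behaviors described above.

```python
import numpy as np

# Sketch of Eq. (A.13): closed-form minimum of Q(X) = L + lam*R + mu*P,
# without the positivity constraint used in the actual inversion.

def penalized_solution(M, W, Y, K, lam, W2, X0, mu):
    """X = (M^T W M + lam K + mu W2)^{-1} (M^T W Y + mu W2 X0)."""
    A = M.T @ W @ M + lam * K + mu * W2
    b = M.T @ W @ Y + mu * W2 @ X0
    return np.linalg.solve(A, b)
```

As expected, a very large $\mu$ pins the solution to the prior $\mathbf{X}_0$, while a very large $\lambda$ damps the data-driven components toward the null space of the penalty.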
Appendix B: Test of robustness
To quantify the confidence level of the inversion technique, we test its robustness. Starting from an arbitrary LF, we produce IR counts in the bands and flux ranges corresponding to the observations. We then add random Gaussian noise to the simulated counts, using the real uncertainty on the observations as the standard deviation of the error distribution in each flux bin. Finally, we apply the inversion technique described in Sect. 2.1 to these noisy counts and obtain an output LF. The comparison of the input and output LFs is shown in Fig. B.1. The absolute value of the difference is represented in gray levels and contours. The difference is generally less than 0.4 dex (a factor of 2.5) in the range where the LF can be constrained from the observed counts (the region of the $z$-$L_{\rm IR}$ plane encompassed by the dashed lines). A noticeable exception is the very low-redshift range ($z<0.1$), which corresponds to bright sources. For such large fluxes, the considerable noise in the observed counts produces large errors in the recovered LF. At high redshift, recall that the ultraluminous population of galaxies appears as rare and very bright objects, in a flux range where the number counts are poorly known.
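This Monte Carlo procedure can be illustrated on a toy linear problem. The sketch below is a schematic stand-in, not the paper's pipeline: a simple ridge-regularized least squares replaces the full penalized inversion of Appendix A, and all dimensions and noise levels are invented for illustration.

```python
import numpy as np

# Toy version of the Appendix B robustness test: generate mock "counts"
# from a known solution, perturb them with Gaussian noise at the level of
# the quoted uncertainties, re-invert each realization, and measure the
# scatter of the recovered parameters.

rng = np.random.default_rng(1)
M = rng.uniform(0.0, 1.0, size=(40, 10))   # toy response matrix
X_true = np.full(10, 100.0)                # "input LF"
Y_clean = M @ X_true                       # noise-free mock counts
sigma = 5.0                                # per-bin Gaussian uncertainty

def invert(Y, lam=1e-3):
    """Ridge-regularized least squares (stand-in for the full inversion)."""
    A = M.T @ M / sigma ** 2 + lam * np.eye(M.shape[1])
    return np.linalg.solve(A, M.T @ Y / sigma ** 2)

recovered = np.array([invert(Y_clean + rng.normal(0.0, sigma, size=40))
                      for _ in range(200)])
rel_scatter = np.abs(recovered / X_true - 1.0).mean(axis=0)
```

The map of `rel_scatter` over the parameter grid is the toy analogue of the gray-scale difference map of Fig. B.1: parameters poorly constrained by the mock counts show the largest input-output differences.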
Figure B.1: Estimated robustness of the LF inversion used in this paper. The relative difference between the input LF and the recovered LF (when realistic noise is added to the corresponding input counts) is larger in the darker parts of the diagram. This difference is relatively small (<0.4 dex) in the region of the $z$-$L_{\rm IR}$ space effectively constrained by observations: the dotted and dashed lines correspond to the extreme fluxes considered at 24 μm and 850 μm, respectively, for this study. See main text for details.

Appendix C: Model predictions for Herschel
In Sect. 4, we have inverted the known IR counts to obtain constraints on the evolving total IR LF. We have seen that the LF obtained through this inversion is realistic and matches most of the recent observations (counts, CIRB, mid-IR LF at low redshift). Then, in Sect. 5.2, we have shown how we can directly measure a part of this LF with good confidence, and that the LF resulting from the inversion is in good agreement with this solid measurement, validating the LF obtained by this empirical modeling approach. In this section, we use the median LF obtained from the inversions to predict counts that should be observed by future far-IR observations with Herschel or SCUBA-2. At the time of publication, several new facilities are in preparation to observe the Universe in the far-IR to submm regimes. The differential counts (normalized to Euclidean) at wavelengths ranging from 16 to 850 μm, which we derived from the inversion technique, are presented in Fig. C.3. The separation of the contributions of local, intermediate, and distant galaxies in different colors illustrates the expected trend that longer wavelengths are sensitive to higher redshifts, hence the relative complementarity of all IR wavelengths. There will be a bias towards more luminous and distant objects with increasing wavelength, illustrated here for the Herschel passbands (see Fig. C.4), but this may be used to preselect the most distant candidates, expected to be detected only at the longest wavelengths. In the following, we discuss the predictions of the inversion technique for those instruments, as well as their respective confusion limits, confusion being the main limitation of far-IR extragalactic surveys.
The ESA satellite Herschel is scheduled for launch within the next year, while SPICA, the next-generation IR astronomical satellite of the Japanese space agency, is scheduled for 2010, with a contribution by ESA under discussion, including a mid-IR imager named SAFARI. Both telescopes share the same diameter of 3.5 m, but the lower telescope temperature of SPICA, combined with projected competitive sensitivities, will make it possible to reach the confusion limit around 70 μm (where Herschel is instead limited by integration time). The 5σ-1 h limits of the instruments SAFARI onboard SPICA (50 μJy, 33-210 μm, dashed line), and PACS (3 mJy, 55-210 μm, light blue line) and SPIRE (2 mJy, 200-670 μm, blue line) onboard Herschel are compared in Fig. C.1 to the confusion limits that we derive from the best-fit model of the inversion, at all wavelengths between 30 and 850 μm, assuming the confusion limit definition given below.
Figure C.1: Confusion limit for a 3.5 m telescope. The 5σ-1 h limits of SPICA-SAFARI (50 μJy, 33-210 μm, dashed line), Herschel PACS (3 mJy, 55-210 μm, light blue line) and SPIRE (2 mJy, 200-670 μm, blue line) are shown together with their wavelength ranges. The blue part of the curve is determined by the source-density criterion (i.e. the requirement that less than 30% of the sources be closer than 0.8 FWHM); the red part is defined by the photometric criterion, i.e. sources must be brighter than 5 times the rms fluctuations due to very faint sources below the detection limit.

The definition of the confusion limit is not trivial, in particular because it depends on the level of clustering of galaxies; the optimum way to define it would be to perform simulations to compute the photometric error as a function of flux density, and then decide that the confusion limit is e.g. the depth above which 68% of the detected sources are measured with a photometric accuracy better than 20%. In the following, we only consider a simpler approach that involves computing the two sources of confusion that were discussed in Dole et al. (2003):
- the photometric confusion noise: the noise produced by sources fainter than the detection threshold. The photometric criterion corresponds to the requirement that sources be detected with a photometric S/N > 5;
- the fraction of blended sources: a requirement for the quality of the source catalog is that less than N% of the sources be closer than 0.8 FWHM, i.e. close enough not to be separated.
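The source-density criterion above can be sketched under the simplifying assumption of unclustered, Poisson-distributed positions: the probability that a source has a neighbour within $r = 0.8\,$FWHM is $1 - \exp(-\rho\pi r^2)$, and requiring this to stay below 30% fixes a maximum source density, hence a confusion flux once a counts model is specified. The power-law cumulative counts below are a toy assumption, not the counts derived in this paper.

```python
import math

# Sketch of the source-density confusion criterion, assuming random
# (unclustered) source positions. N(>S) = N0 * (S/S0)^-alpha is a toy
# power law standing in for the cumulative counts of the paper.

def max_density(fwhm_arcsec, frac=0.30, k=0.8):
    """Max density (deg^-2) so that P(neighbour closer than k*FWHM) < frac."""
    r_deg = k * fwhm_arcsec / 3600.0
    return -math.log(1.0 - frac) / (math.pi * r_deg ** 2)

def confusion_flux(fwhm_arcsec, N0=1e4, S0=1.0, alpha=1.5):
    """Flux (in units of S0) where the toy counts reach the max density."""
    rho = max_density(fwhm_arcsec)
    return S0 * (rho / N0) ** (-1.0 / alpha)
```

A larger beam allows fewer sources per square degree before blending, so the confusion flux rises steeply with FWHM, which is why the long-wavelength (large-beam) bands are the most confusion-limited.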
Figure C.2: Detection limits for confusion-limited surveys from 70 to 850 μm. The curves show the minimum IR luminosity (8-1000 μm), or equivalently SFR, that can be detected for a star-forming galaxy, assuming an SED similar to the Chary & Elbaz (2001) ones. The 70, 100, 160, 250, 350 and 500 μm limits correspond to a 3.5 m telescope diameter, such as Herschel or SPICA, while the 400 μm limit is for a 12 m-class telescope such as APEX (e.g. ARTEMIS; we show the average between the two bands at 350 and 450 μm to avoid confusion with Herschel) and the 850 μm limit is for a 15 m telescope such as the JCMT (SCUBA).

Table C.1: Fraction of the CIRB resolved by confusionlimited Herschel surveys.
We note that the confusion limit for a 3.5 m-class telescope, such as Herschel, is ten times higher than the depth it can reach in one hour (5σ). With a source density of 12.8 sources per square degree at the 500 μm confusion limit (25 mJy), or equivalently one source in a field of 17 arcmin on a side, the best strategy at this wavelength is therefore to carry out very large, moderately shallow surveys, in order to identify the rare and very luminous distant galaxies.
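The quoted survey geometry follows from simple arithmetic, checked below: at 12.8 sources per square degree, the area per source corresponds to a square of about 17 arcmin on a side.

```python
import math

# Check of the quoted geometry: 12.8 sources per square degree is one
# source per square field of side sqrt(1/12.8) deg, converted to arcmin.
area_per_source_deg2 = 1.0 / 12.8
side_arcmin = math.sqrt(area_per_source_deg2) * 60.0   # about 16.8 arcmin
```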
Figure C.3: Counts predicted from the inversion in the far-infrared and submm (solid line). The counts are decomposed into redshift bins (blue: z<0.5; green: 0.5<z<1.5; orange: 1.5<z<2.5; red: z>2.5). The oblique dashed line corresponds to the statistical limit set by the small size of a field like GOODS North+South, i.e. 0.07 square degrees: fewer than 2 galaxies are expected per flux bin below this limit.

Figure C.4: Differential counts predicted from the non-parametric inversion for future Herschel observations: PACS 100 μm (solid black), SPIRE 250 μm (dotted blue), SPIRE 350 μm (dashed green), SPIRE 500 μm (dot-dashed red), decomposed simultaneously in redshift and $L_{\rm IR}$.

For comparison, we also illustrate the ground-based capabilities of ARTEMIS, built by CEA-Saclay, which will operate at 200, 350 and 450 μm on APEX (Atacama Pathfinder EXperiment), the ESO 12 m telescope facility, and of SCUBA-2, which will operate at 450 and 850 μm on the 15 m telescope JCMT. To avoid confusion between all instruments, we only show the average wavelength of 400 μm for a 12 m-class telescope and 850 μm for a 15 m-class telescope (Fig. C.1). Although the confusion limit in the 850 μm passband is ten times below that of Herschel at the longest wavelengths, this band is not competitive with the 400 μm one, which should be a priority for ARTEMIS and SCUBA-2 for the study of distant galaxies, or with the 70 and 100 μm ones for a 3.5 m space experiment such as SPICA and Herschel at lower redshifts. We also note that only in these two passbands will the cosmic IR background be resolved by these future experiments (see Table C.1), which suggests that a larger telescope size should be considered for a future experiment observing the far-IR Universe above 100 μm and below the wavelength domain of ALMA. We have not discussed ALMA here since it will not be affected by these confusion issues: thanks to its very good spatial resolution, it will be limited either to small ultra-deep surveys, hence missing rare objects, or to follow-ups of fields observed with single-dish instruments, e.g. ARTEMIS. Finally, the JWST, which will operate in the mid-IR, will be a very powerful instrument for probing the faintest star-forming galaxies in the distant Universe, but predictions are difficult to produce at the present stage, since extrapolations from the mid- to far-IR have already been found to become less robust at high redshift (e.g. Papovich et al. 2007; Pope et al. 2008; Daddi et al. 2007b).