A&A 487, 775-780 (2008)
DOI: 10.1051/0004-6361:200809345

A statistical analysis of the ``internal linear combination'' method in problems of signal separation as in cosmic microwave background observations

R. Vio1 - P. Andreani2,3


1 - Chip Computers Consulting s.r.l., Viale Don L. Sturzo 82, S.Liberale di Marcon, 30020 Venice, Italy
2 - ESO, Karl Schwarzschild strasse 2, 85748 Garching, Germany
3 - INAF - Osservatorio Astronomico di Trieste, via Tiepolo 11, 34143 Trieste, Italy

Received 2 January 2008 / Accepted 26 May 2008

Abstract
Aims. The separation of foreground contamination from cosmic microwave background (CMB) observations is one of the most challenging and important problems of digital signal processing in Cosmology. In the literature, various techniques have been presented but no general consensus about their real performance and properties has been reached. This is due to the characteristics of these techniques that have been studied essentially through numerical simulations based on semi-empirical models of the CMB and the Galactic foreground. Such models often have different levels of sophistication and/or are based on different physical assumptions (e.g. the number of Galactic components and level of the noise). Hence, a reliable comparison is difficult. What is missing is a statistical analysis of the properties of the proposed methodologies. Here, we consider the Internal Linear Combination method (ILC) which, among the separation techniques, requires the smallest number of a priori assumptions. This feature is of particular interest in the context of CMB polarization measurements at small angular scales where the lack of knowledge of the polarized backgrounds represents a serious limit.
Methods. The statistical characteristics of ILC are examined through an analytical approach and the basic conditions are fixed so as to work satisfactorily.
Results. ILC provides satisfactory results only under rather restrictive conditions. This is a critical fact to take into consideration in planning future ground-based observations (e.g., with ALMA) where, contrary to satellite experiments, there is the possibility of having a certain control over the experimental conditions.

Key words: methods: data analysis - methods: statistical - cosmology: cosmic microwave background

1 Introduction

The experimental progresses in the detection of cosmological and astrophysical emissions require a parallel development of data analysis techniques in order to extract the maximum physical information from the data. An example of interest is represented by signals that are mixture of the emission of distinct physical mechanisms. The study of the underlying physical processes needs the separation of the different components that contribute to the observed signals. Such is the case of the Cosmic Microwave Background (CMB) observations where there is the need to separate the CMB from diffuse foreground originated by our own Galaxy. In this context, an extensive literature is available and many approaches have been proposed (for a review, see Delabrouille & Cardoso 2007). However, no general consensus about their real performances and properties has been reached. This is due to the approach followed to determine the characteristics of these techniques that is based on numerical simulations that make use of semi-empirical models of the CMB and the Galactic foreground. Such models often have different level of sophistication and/or are based on different physical assumptions (e.g., the number of Galactic components and level of the noise). Hence, a reliable comparison is difficult. A trustworthy assessment of the real capabilities of such methodologies would require a rigorous analysis of their theoretical statistical characteristics independently of the specific context where they are applied. However, to our knowledge, in the literature nothing has never been presented in this sense (however, see Saha et al. 2007, for a discussion concerning the estimation of the CMB power spectrum). This is even more true for CMB polarization measurements where the available a priori information is quite limited and the use of blind separation techniques an unavoidable choice (Stivoli et al. 2006).

A situation such as this one is clearly unsatisfactory. This also because in the near future some innovative ground-based experiments are planned for polarization observations at very high spatial resolution, such as with the Atacama Large submmillimetre/millimetre array (ALMA). One important advantage of these experiments is that, contrary to the satellite observations, they will allow a certain control of the experimental conditions. Hence, a full exploitation of the capabilities of this facility (and other instruments) requires a careful planning of the observations. The recording of good quality data will allow an effective application of the chosen separation methodology. For this reason, in this work we start exploring the capabilities of algorithms aimed at a careful subtraction of the foreground sources when the amount of available a priori information is limited (an expected situation for polarization observations). In particular, we consider one of the most widely used approaches for the separation of different emissions, the internal linear combination method (ILC), which was adopted, for instance, in the reduction of the data from the Wilkinson Microwave Anisotropy Probe (WMAP) satellite for CMB observations (Bennett et al. 2003). Among the separation techniques, ILC requires the smallest number of a priori assumptions. For the reason presented above, we do not perform an application to any astrophysical dataset here, but rather we study the general properties of this technique in order to fix the conditions under which it can be expected to produce reliable results.

In the following, the available data are assumed in the form of $N_{\rm o}$ maps, taken at different frequencies, containing $N_{\rm p}$ pixels each. More precisely, if S(i)(p) provides the value of the pth pixel for a map obtained at channel ``i''[*], our starting model is:

 \begin{displaymath}S^{(i)}(p)= \mathfrak{S}^{(i)}_{\rm c}(p)+ S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p)
\end{displaymath} (1)

where $\mathfrak{S}^{(i)}_{\rm c}(p)$, $S^{(i)}_{\rm f}(p)$ and ${\mathcal N}^{(i)}(p)$ are the contributions due to the CMB, the diffuse Galactic foreground and the experimental noise, respectively. Although not necessary for later arguments, it is assumed that all of these contributions are representable by means of stationary random fields. At least locally, in many experimental situations this is an acceptable approximation. If not, it is often made anyway since it permits a statistical treatment of the problem of interest and the results can be used as a benchmark in the analysis of more complex scenarios. In the present context, this assumption holds on small patches of the sky. Finally, without loss of generality, for ease of notation the random fields are supposed as the realization of zero-mean spatial processes. In the present work the contribution of non-diffuse components (e.g., due to SZ cluster, point-sources, ...) are not considered and they are assumed to have been removed through other methodologies.

The paper is organized as follows: in Sect. 2 the relevant analytical formulas are introduced, in Sect. 3 the statistical framework of ILC in noiseless observations is presented, in Sect. 4 the case of noisy data is considered and the conclusions are drawn in Sect. 5.

   
2 Formalization of the problem

The idea behind ILC is simple. The main assumption is that model (1) can be written as

 \begin{displaymath}S^{(i)}(p)= \mathfrak{S}_{\rm c}(p)+ S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p),
\end{displaymath} (2)

i.e. the template of the CMB component is independent of the observing channel. A way to exploit this fact is to average $N_{\rm o}$ images $\{ S^{(i)}(p)\}_{i=1}^{N_{\rm o}}$ giving a specific weight wi to each of them so as to minimize the impact of the foreground and noise (Bennett et al. 2003). This means to look for a solution of type

 \begin{displaymath}\widehat{\mathfrak{S}}_{\rm c}(p)= \sum_{i=1}^{N_{\rm o}} w_{i} S^{(i)}(p).
\end{displaymath} (3)

If the constraint $\sum_{i=1}^{N_{\rm o}} w_{i} = 1$ is imposed, Eq. (3) becomes

 \begin{displaymath}\widehat{\mathfrak{S}}_{\rm c}(p)= \mathfrak{S}_{\rm c}(p)+ \...
..._{\rm o}} w_{i} [ S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p) ].
\end{displaymath} (4)

Now, from this equation it is clear that, for a given pixel ``p'', the only variable terms are in the summation. Hence, under the assumption of independence of $\mathfrak{S}_{\rm c}(p)$ from $S^{(i)}_{\rm f}(p)$ and ${\mathcal N}^{(i)}(p)$, the weights  $\{ w_{i} \}$ have to minimize the variance of $\widehat{\mathfrak{S}}_{\rm c}(p)$, i.e.


    $\displaystyle \{ w_{i} \} = \arg\min_{\kern-6mm\raise-0.75mm\hbox{\scriptsize$\{ w_{i} \}$ }}$  
    $\displaystyle {\rm VAR} \left[\mathfrak{S}_{\rm c}(p)\right] + {\rm VAR}\left[\sum_{i=1}^{N_{\rm o}} w_{i} (S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p)) \right],$ (5)

where ${\rm VAR}[s(p)]$ is the expected variance of s(p). If ${\vec{S}}^{(i)}$ denotes a row vector such as ${\vec{S}}^{(i)}= [S^{(i)}(1), S^{(i)}(2), \ldots, S^{(i)}(N_{\rm p})]$ and the $N_{\rm o} \times N_{\rm p}$ matrix ${\bf {S}}$ is defined as

\begin{displaymath}{\bf {S}}=
\left( \begin{array}{c}
{\vec{S}}^{(1)} \\
{\vec...
...)} \\
\vdots \\
{\vec{S}}^{(N_{\rm o})}
\end{array} \right),
\end{displaymath} (6)

then Eq. (2) becomes

\begin{displaymath}{\bf {S}}= {\vec{\mathfrak{S}}_{\rm c}}+ {\vec{S}}_{\rm f}+ {\vec{\mathcal N}}.
\end{displaymath} (7)

In this case, the weights are given by (Eriksen et al. 2004)

 \begin{displaymath}{\vec{w}}= \frac{\Cb_{{\bf {S}}}^{-1} {\vec{1}}}{{\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{-1} {\vec{1}}},
\end{displaymath} (8)

where $\Cb_{{\bf {S}}}$ is the $N_{\rm o} \times N_{\rm o}$ cross-covariance matrix of the random processes that generate ${\bf {S}}$, i.e.

\begin{displaymath}\Cb_{{\bf {S}}} = {\rm E}[{\bf {S}}{\bf {S}}^{\rm T}],
\end{displaymath} (9)

and ${\vec{1}}= (1, 1, \ldots, 1)^{\rm T}$ is a column vector of all ones. Here, ${\rm E}[.]$ denotes the expectation operator. Hence, the ILC estimator takes the form
  
       $\displaystyle \widehat{{\vec{\mathfrak{S}}}}_{\rm c}$ = $\displaystyle {\vec{w}}^{\rm T} {\bf {S}},$ (10)
  = $\displaystyle \alpha {\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{-1} {\bf {S}},$ (11)

with ${\vec{1}}^{\rm T} {\vec{w}}= 1$ and the scalar quantity $\alpha$ given by

 \begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{-1} {\vec{1}}]^{-1}.
\end{displaymath} (12)

In practical applications, matrix $\Cb_{{\bf {S}}}$ is unknown and has to be estimated from the data. Typically, this is done by means of the estimator

 \begin{displaymath}\widehat{{\bf {C}}}_{{\bf {S}}} = \frac{1}{N_{\rm p}} {\bf {S}}{\bf {S}}^{\rm T}.
\end{displaymath} (13)

In this case, the ILC estimator is given by Eqs. (10)-(12) with $\Cb_{{\bf {S}}}$ and ${\vec{w}}$ replaced, respectively, by $\widehat{{\bf {C}}}_{{\bf {S}}}$ and

 \begin{displaymath}\widehat{{\vec{w}}} = \frac{\widehat{{\bf {C}}}_{{\bf {S}}}^{...
...}^{\rm T} \widehat{{\bf {C}}}_{{\bf {S}}}^{-1} {\vec{1}}}\cdot
\end{displaymath} (14)

If the observed maps are not zero-mean, they have to be centered before the computation of $\widehat{{\bf {C}}}_{{\bf {S}}}$. After that, the resulting weights can be applied directly to the original (i.e., non-centered) ${\bf {S}}$. The computation of $\widehat{{\bf {C}}}_{{\bf {S}}}$ is the only point where the fact of working with non-zero mean maps has to be taken into account.

Although in the literature it might appear that the estimator (11) has optimal properties, this is not true. In Eq. (2) the term  $S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p)$ is considered as a single noise component (e.g., see Hinshaw et al. 2007; Eriksen et al. 2004). In this way the problem is apparently simplified since it is reduced to the separation of two components only. No a priori information on this ``global'' noise is required. However, this approach can lead to wrong conclusions. For example, since all the components in the mixtures ${\bf {S}}$ are assumed to be zero-mean, from Eq. (4) one could conclude that

\begin{displaymath}{\rm E}[\widehat{{\vec{\mathfrak{S}}}}_{\rm c}\vert {\vec{\ma...
...}^T{\rm E}[{\vec{\mathcal N}}] = {\vec{\mathfrak{S}}_{\rm c}},
\end{displaymath} (15)

i.e. the ILC estimator is unbiased[*]. This is not correct: the claim that $\widehat{{\vec{\mathfrak{S}}}}_{\rm c}$ is unbiased requires ones to prove that

\begin{displaymath}{\rm E}[\widehat{{\vec{\mathfrak{S}}}}_{\rm c}\vert {\vec{\ma...
...T} {\rm E}[{\vec{\mathcal N}}] = {\vec{\mathfrak{S}}_{\rm c}}.
\end{displaymath} (16)

The reason is that ${\vec{S}}_{\rm f}$ is a fixed realization of a random process. There is no way to obtain another one. Even if observed many times (under the same experimental conditions) the foreground components (for instance the Galaxy) will always appear the same. Only the noise component  ${\vec{\mathcal N}}$ will change. This has important consequences. In order to discuss this issue, it is useful to start with the case of noiseless observations that can be thought to reproduce a situation of very high signal-to-noise ratio. In this case, model (2) becomes

 \begin{displaymath}{\vec{S}}^{(i)}= {\vec{\mathfrak{S}}_{\rm c}}+ {\vec{S}}^{(i)}_{\rm f}.
\end{displaymath} (17)

   
3 Statistical characteristics of ILC: noiseless observations

A common assumption in CMB observations is that ${\vec{S}}^{(i)}_{\rm f}$ is given by the linear mixture of the contribution of $N_{\rm c}$ physical processes $\{ {\vec{\mathfrak{S}}_j}\}_{j=1}^{N_{\rm c}}$ (e.g., free-free, dust re-radiation, ...)

 \begin{displaymath}{\vec{S}}^{(i)}_{\rm f}= \sum_{j=1}^{N_{\rm c}} a_{ij} {\vec{\mathfrak{S}}_j},
\end{displaymath} (18)

with aij constant coefficients. In practice, it is assumed that for the jth physical process a template ${\vec{\mathfrak{S}}_j}$ exists independent of the specific channel ``i''. Although rather strong, it is not unrealistic to assume that this condition is satisfied when small enough patches of the sky are considered. Inserting Eq. (18) into Eq. (17) one obtains

 \begin{displaymath}{\bf {S}}= {\bf {A}}{\vec{\mathfrak S}},
\end{displaymath} (19)

with

\begin{displaymath}{\vec{\mathfrak S}}= \left( \begin{array}{l}
{\vec{\mathfrak{...
...
\vdots \\
{\vec{\mathfrak{S}}_{Nc}}\\
\end{array} \right),
\end{displaymath} (20)

and

 \begin{displaymath}{\bf {A}}=
\left( \begin{array}{cccccc}
1 & \vline & a_{11} ...
... o} 2} & \ldots & a_{N_{\rm o} N_{\rm c}}
\end{array} \right).
\end{displaymath} (21)

Here, matrix ${\bf {A}}$, assumed to be of full rank, is shown partitioned in a way that will be useful for later calculations.

In the following, three cases are considered that correspond to possible experimental situations.

   
3.1 Case No = Nc + 1

If $N_{\rm o} = N_{\rm c} + 1$, i.e. when number of observations is equal to the number of the components (CMB included), then ${\bf {A}}$ is a square $N_{\rm o} \times N_{\rm o}$ matrix. In this case,

 \begin{displaymath}\Cb_{{\bf {S}}} = {\bf {A}}\Cb_{{\vec{\mathfrak S}}} {\bf {A}}^{\rm T}
\end{displaymath} (22)

with $\Cb_{{\vec{\mathfrak S}}}$ the $N_{\rm o} \times N_{\rm o}$ cross-covariance matrix of the random processes that generate the templates ${\vec{\mathfrak S}}$, i.e.

\begin{displaymath}\Cb_{{\vec{\mathfrak S}}} = {\rm E}[{\vec{\mathfrak S}}{\vec{\mathfrak S}}^{\rm T}].
\end{displaymath} (23)

If this equation is inserted in Eq. (11), one obtains

 \begin{displaymath}\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= \alpha {\vec{1}}^{\rm...
...}^{\rm -T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\vec{\mathfrak S}},
\end{displaymath} (24)

with the scalar $\alpha$ given by

\begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} {\bf {A}}^{\rm -T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\bf {A}}^{-1} {\vec{1}}]^{-1},
\end{displaymath} (25)

and ${\bf {A}}^{\rm -T} \equiv ({\bf {A}}^{-1})^{\rm T}$. Now, since it is trivially verified that

\begin{displaymath}{\vec{1}}^{\rm T} = {\vec{e}}_1^{\rm T} {\bf {A}}^{\rm T}
\end{displaymath} (26)

with

 \begin{displaymath}{\vec{e}}_1 \equiv (1, 0, \ldots, 0)^{\rm T},
\end{displaymath} (27)

it is

 \begin{displaymath}{\vec{1}}^{\rm T} {\bf {A}}^{\rm -T} = {\vec{e}}_1^{\rm T}.
\end{displaymath} (28)

Hence, $\alpha = (\Cb_{{\vec{\mathfrak S}}}^{-1})_{1 1} = ({\rm E}[{\vec{\mathfrak{S}}_{\rm c}}{\vec{\mathfrak{S}}_{\rm c}}^{\rm T}])^{-1} = \sigma_{\rm cc}^{-1}$ is the inverse of the expected variance of the CMB template. As a consequence, if the random process that generates the template  ${\vec{\mathfrak{S}}_{\rm c}}$ is uncorrelated with those that generate the templates  $\{ {\vec{\mathfrak{S}}_j}\}$ (in general, this is a reasonable assumption), i.e. if

 \begin{displaymath}\Cb_{{\vec{\mathfrak S}}} =
\left( \begin{array}{cccccc}
\si...
...} & \ldots & \sigma_{N_{\rm o} N_{\rm o}}
\end{array} \right),
\end{displaymath} (29)

from Eq. (24) and the fact that $\Cb_{{\vec{\mathfrak S}}}^{-1}$ has the form

\begin{displaymath}{\bf {C}}^{-1}_{{\vec{\mathfrak S}}} =
\left( \begin{array}{...
...ec{\Omega}}^{-1} & \\
0 & \vline & & & &
\end{array} \right),
\end{displaymath} (30)

with ${\vec{\Omega}}$ the bottom-right block of the matrix in the rhs of Eq. (29), one obtains that

\begin{displaymath}\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= {\vec{\mathfrak{S}}_{\rm c}}.
\end{displaymath} (31)

Here, no use is made of the operator ${\rm E}[.\vert.]$ since we are dealing with fixed realizations of the random processes that generate the CMB as well the Galactic components. Therefore, if matrix  $\Cb_{{\bf {S}}}$ is known, the ILC solution is exact.

This condition changes if matrix $\Cb_{{\bf {S}}}$ has to be estimated through Eq. (13) and a general treatment of this problem is quite difficult. For this reason, we consider the case of observations that span a sky area much wider than the correlation lengths of the maps  $\{ {\vec{S}}^{(i)}\}$. Under this condition, $\widehat{{\bf {C}}}_{{\vec{\mathfrak S}}} = {\vec{\mathfrak S}}{\vec{\mathfrak S}}^{\rm T}/ N_{\rm p}$ can be written in the form

\begin{displaymath}\widehat{{\bf {C}}}_{{\vec{\mathfrak S}}} = \Cb_{{\vec{\mathfrak S}}} + {\vec{\Delta}}\Cb_{{\vec{\mathfrak S}}}
\end{displaymath} (32)

with

 \begin{displaymath}{\vec{\Delta}}\Cb_{{\vec{\mathfrak S}}} =
\left( \begin{array...
...}} & \ldots & \delta_{N_{\rm o} N_{\rm o}}
\end{array} \right)
\end{displaymath} (33)

a perturbing matrix with small zero-mean entries. Because of this, it can be expanded in a Taylor series around $\Cb_{{\vec{\mathfrak S}}}^{-1}$ up to the linear term obtaining

 \begin{displaymath}\widehat{{\bf {C}}}_{{\vec{\mathfrak S}}}^{-1} \approx \Cb_{{...
...lta}}\Cb_{{\vec{\mathfrak S}}} \Cb_{{\vec{\mathfrak S}}}^{-1},
\end{displaymath} (34)

or

\begin{displaymath}\widehat{{\bf {C}}}_{{\vec{\mathfrak S}}}^{-1}
\approx \left(...
...}}^{-1} {\vec{\Theta}}{\vec{\Omega}}^{-1}
\end{array} \right),
\end{displaymath} (35)

with ${\vec{\delta}}= (\delta_{\rm c1}, \delta_{\rm c2}, \ldots, \delta_{{\rm c} N_{\rm o}})^{\rm T}$ and ${\vec{\Theta}}$ the bottom-right block of the matrix on the rhs of Eq. (33)[*]. From this result and from Eqs. (24), (27) and (28), one obtains that

 \begin{displaymath}\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= (1, {\vec{\delta}}_{{\vec{\mathfrak{S}}}}) {\vec{\mathfrak S}},
\end{displaymath} (36)

or

\begin{displaymath}\Delta {\vec{\mathfrak{S}}_{\rm c}}= \widehat{{\vec{\mathfrak...
...0, {\vec{\delta}}_{{\vec{\mathfrak{S}}}}) {\vec{\mathfrak S}},
\end{displaymath} (37)

with ${\vec{\delta}}_{{\vec{\mathfrak{S}}}}$ a row vector given by

 \begin{displaymath}{\vec{\delta}}_{{\vec{\mathfrak{S}}}} = - \frac{\sigma_{\rm c...
...a}}^{-1} \approx
- {\vec{\delta}}^{\rm T} {\vec{\Omega}}^{-1}.
\end{displaymath} (38)

Hence, not unexpectedly, the ILC solution differs from the true one by an amount that depends on the sample correlation between ${\vec{\mathfrak{S}}_{\rm c}}$ (the CMB) and the Galactic templates.

   
3.2 Case No < Nc + 1

If $N_{\rm o} < N_{\rm c} + 1$, the number of the components (CMB included) is larger than the number of channels. Since in this case ${\bf {A}}$ is a rectangular $N_{\rm o} \times (N_{\rm c}+1)$ matrix, the inverse  ${\bf {A}}^{-1}$ is not defined. Hence, the ILC solution $\widehat{{\vec{\mathfrak{S}}}}_{\rm c}$ as given by Eq. (11) cannot be written in the form (24). The only possibility to still have $\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= {\vec{\mathfrak{S}}_{\rm c}}$ is that

\begin{displaymath}\alpha {\vec{1}}^{\rm T} ( {\bf {A}}\Cb_{{\vec{\mathfrak S}}}^{-1} {\bf {A}}^{\rm T} )^{-1} {\bf {A}}= {\vec{e}}_1^{\rm T}.
\end{displaymath} (39)

In the present case, there is no reason to expect that this condition will be satisfied. As a consequence, the ILC solution differs from the true one by an amount given by

 \begin{displaymath}\Delta {\vec{\mathfrak{S}}_{\rm c}}= [\alpha {\vec{1}}^{\rm T...
... T})^{-1} {\bf {A}}- {\vec{e}}_1^{\rm T}] {\vec{\mathfrak S}}.
\end{displaymath} (40)

This result implies that, similarly to other techniques (e.g., Generalized Least Squares), it is risky to use the ILC technique in situations where the number of observing channels is smaller than the number of components. Unfortunately, the importance of $\Delta {\vec{\mathfrak{S}}_{\rm c}}$ depends on the specific characteristics of both ${\bf {A}}$ and $\Cb_{{\vec{\mathfrak S}}}$ and, hence, a general treatment is not possible. However, when the Galactic component is the dominant one, it can even happen that $\Delta {\vec{\mathfrak{S}}_{\rm c}}> {\vec{\mathfrak{S}}_{\rm c}}$. For example, in the hypothesis that $\Cb_{{\vec{\mathfrak S}}} = \sigma^2 {\vec{I}}$ and

\begin{displaymath}{\bf {A}}=
\left( \begin{array}{ccc}
1.0 & 2.0 & 3.0 \\
1.0 & 2.5 & 3.5
\end{array} \right),
\end{displaymath} (41)

Eq. (40) provides

\begin{displaymath}\Delta {\vec{\mathfrak{S}}_{\rm c}}= (-0.33, -0.33, 0.33) {\vec{\mathfrak S}}.
\end{displaymath} (42)

This example does not illustrate a situation excessively unfavorable for the separation. In fact, close to the Galactic plane the CMB component is expected to be largely dominated by the Galactic emission.

   
3.3 Case No > Nc + 1

If $N_{\rm o} > N_{\rm c} + 1$, the number of components (CMB included) is smaller than the number of channels. As in the previous section, ${\bf {A}}$ is a $N_{\rm o} \times (N_{\rm c}+1)$ rectangular matrix but now we face a more difficult situation. The ILC solution $\widehat{{\vec{\mathfrak{S}}}}_{\rm c}$ cannot be written in the form (11) since matrix  $\Cb_{{\bf {S}}}$ is singular. The only way out is to resort to the pseudo-inverse $\Cb_{{\bf {S}}}^{\dag }$

\begin{displaymath}\Cb_{{\bf {S}}}^{\dag } = {\bf {V}}{\bf {D}}^{\dag } {\bf {V}}^{\rm T}.
\end{displaymath} (43)

Here, ${\bf {V}}$ is the orthogonal matrix obtained from the singular value decomposition of $\Cb_{{\bf {S}}}$,

 \begin{displaymath}\Cb_{{\bf {S}}} = {\bf {V}}{\bf {D}}{\bf {V}}^{\rm T},
\end{displaymath} (44)

${\bf {D}}^{\dag }$ is a diagonal matrix whose entries are given by $d^{\dag }_{ii} = d_{ii}^{-1}$ if $d_{ii} \neq 0$, 0 otherwise, and $\{d_{ii} \}$ are the elements of the diagonal matrix ${\bf {D}}$. In this case, the ILC solution is given by

 \begin{displaymath}\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= \alpha {\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{\dag } {\bf {S}},
\end{displaymath} (45)

where

 \begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{\dag } {\vec{1}}]^{-1}.
\end{displaymath} (46)

If ${\bf {A}}$ is a full column-rank matrix, then Eq. (45) can be rewritten in the form

 \begin{displaymath}\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= \alpha {\vec{1}}^{\rm...
... )^{\rm T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\vec{\mathfrak S}},
\end{displaymath} (47)

where

\begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} ({\bf {A}}^{\dag })^{\rm T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\bf {A}}^\dag {\vec{1}}]^{-1},
\end{displaymath} (48)

and

\begin{displaymath}{\bf {A}}^{\dag } = ( {\bf {A}}^{\rm T} {\bf {A}})^{-1} {\bf {A}}^{\rm T},
\end{displaymath} (49)

and

\begin{displaymath}({\bf {A}}^{\dag } )^{\rm T} = {\bf {A}}( {\bf {A}}^{\rm T} {\bf {A}})^{-1}.
\end{displaymath} (50)

Now, since it is trivially verified that ${\vec{1}}^{\rm T} = {\vec{e}}_1^{\rm T} {\bf {A}}^{\rm T}$, one obtains that

\begin{displaymath}{\vec{1}}^{\rm T} ({\bf {A}}^{\dag })^{\rm T} = {\vec{e}}_1^{\rm T},
\end{displaymath} (51)

and then, from Eq. (45), $\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= {\vec{\mathfrak{S}}_{\rm c}}$. In other words, if matrix  $\Cb_{{\bf {S}}}$ is known, in this case the ILC method also provides an exact separation. When $\widehat{{\bf {C}}}_{{\bf {S}}}$ has to be used, results similar to those presented in Sect. 3.1 are obtained.

   
4 Statistical characteristics of ILC: noisy observations

In situations of maps contaminated by measurement noise ${\vec{\mathcal N}}$, assumed to be the realization of zero-mean stationary stochastic processes, model (19) becomes

 \begin{displaymath}{\bf {S}}= {\bf {A}}{\vec{\mathfrak S}}+ {\vec{\mathcal N}}.
\end{displaymath} (52)

In turn, under the condition of ${\vec{\mathcal N}}$ uncorrelated with ${\bf {S}}$, $\Cb_{{\bf {S}}}$ becomes

 \begin{displaymath}\Cb_{{\bf {S}}} = {\bf {A}}\Cb_{{\vec{\mathfrak S}}} {\bf {A}}^{\rm T} + {\vec{\Omega}}_{{\vec{\mathcal N}}},
\end{displaymath} (53)

with

\begin{displaymath}{\vec{\Omega}}_{{\vec{\mathcal N}}} = {\rm E}[ {\vec{\mathcal N}}{\vec{\mathcal N}}^{\rm T}].
\end{displaymath} (54)

If noise is small and $N_{\rm o} = N_{\rm c} + 1$, inserting Eq. (53) in Eq. (11) and expanding the resulting expression in a Taylor series up to the linear term, it is possible to see that

 \begin{displaymath}\widehat{{\vec{\mathfrak{S}}}}_{\rm c}= \alpha^* [{\vec{1}}^{...
...}^{\rm -T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\vec{\mathfrak S}},
\end{displaymath} (55)

with ${\vec{\gamma}}= \sigma_{\rm cc}^{-1} {\vec{e}}_1^{\rm T} {\bf {A}}^{-1} {\vec{\Omega}}_{{\vec{\mathcal N}}}$ and $\alpha^* = 1/ [\sigma_{\rm cc}^{-1} - \sigma_{\rm cc}^{-2} {\vec{e}}_1^{\rm T} {\bf {A}}^{-1} {\vec{\Omega}}_{{\vec{\mathcal N}}} {\bf {A}}^{\rm -T} {\vec{e}}_1]$. A similar result holds when $N_{\rm o} > N_{\rm c} + 1$ with $({\bf {A}}^{\dag })^{\rm T}$ substituting ${\bf {A}}^{\rm -T}$. From Eq. (55) it is clear that ${\rm E}[\widehat{{\vec{\mathfrak{S}}}}_{\rm c}\vert {\bf {A}}{\vec{\mathfrak{S}}}] \neq {\vec{\mathfrak{S}}_{\rm c}}$, i.e. the ILC estimator is biased. Since both ${\vec{\gamma}}$ and $\alpha^*$ depend on the specific characteristics of ${\bf {A}}$ and ${\vec{\Omega}}_{{\vec{\mathcal N}}}$, this does not allow a general treatment of the question. However, especially in the case of a high level of noise contamination, the bias affecting $\widehat{{\vec{\mathfrak{S}}}}_{\rm c}$ is expected to be severe.

It is possible to obtain an unbiased ILC estimator computing the weights ${\vec{w}}$ by means of Eq. (11) with the cross-covariance matrix  $\Cb_{{\bf {S}}}$ substituted by

\begin{displaymath}\Cb_{{\bf {S}}}^* = \Cb_{{\bf {S}}} - {\vec{\Omega}}_{{\vec{\mathcal N}}}.
\end{displaymath} (56)

This operation, however, has a cost: the weights ${\vec{w}}$ no longer minimize the variance of  $\{ \widehat{\mathfrak{S}}_{\rm c}(p)\}$. As a consequence, the influence of noise can be dramatically amplified. The reason for this result lies in the condition number, $\kappa(\Cb_{{\bf {S}}}^*)$, of $\Cb_{{\bf {S}}}^*$ that tends to be larger than $\kappa(\Cb_{{\bf {S}}})$. Because of this, the entries of $(\Cb_{{\bf {S}}}^*)^{-1}$ can take values in a range much wider than those of $\Cb_{{\bf {S}}}^{-1}$ and the same holds for the corresponding weights ${\vec{w}}$. According to a theorem of Weyl (e.g., see Bhatia 1997, Theorem III.2.2), it is


    $\displaystyle \lambda_1(\Cb_{{\bf {S}}}) - \lambda_1({\vec{\Omega}}_{{\vec{\mat...
..._1(\Cb_{{\bf {S}}}) - \lambda_{N_{\rm o}}({\vec{\Omega}}_{{\vec{\mathcal N}}}),$ (57)
    $\displaystyle \lambda_{N_{\rm o}}(\Cb_{{\bf {S}}}) - \lambda_1({\vec{\Omega}}_{...
...}}(\Cb_{{\bf {S}}}) - \lambda_{N_{\rm o}}({\vec{\Omega}}_{{\vec{\mathcal N}}}),$ (58)

where $\lambda_{i}({\bf {H}})$ denotes the ith eigenvalues of a $N \times N$ symmetric, positive definite matrix ${\bf {H}}$ with $ \lambda_1 \ge \lambda_2 \ldots
\ge \lambda_{N} > 0 $. These inequalities indicate a high probability that $\lambda_{N_{\rm o}}(\Cb_{{\bf {S}}}^*) / \lambda_{N_{\rm o}}(\Cb_{{\bf {S}}}) <
\lambda_1(\Cb_{{\bf {S}}}^*) / \lambda_1(\Cb_{{\bf {S}}})$. In this way,

\begin{displaymath}\kappa(\Cb_{{\bf {S}}}^*) = \frac{\lambda_1(\Cb_{{\bf {S}}}^*...
...ambda_{N_{\rm o}}(\Cb_{{\bf {S}}})} = \kappa(\Cb_{{\bf {S}}}).
\end{displaymath} (59)

For example, this is strictly true if ${\vec{\Omega}}_{{\vec{\mathcal N}}}$ is a diagonal matrix with non-zero entries equal to a constant value $\varrho$, since

\begin{displaymath}\kappa(\Cb_{{\bf {S}}}^*) = \frac{\lambda_1(\Cb_{{\bf {S}}}) - \varrho}{\lambda_{N_{\rm o}}(\Cb_{{\bf {S}}}) - \varrho}\cdot
\end{displaymath} (60)

From these consideration, in the case of a low ${\rm SNR}$ and large $\kappa(\Cb_{{\bf {S}}}^*)$, one has to expect significant amplification of the noise for the unbiased ILC estimator. However, matrix  $\Cb_{{\bf {S}}}^*$ is very ill-conditioned only in situations where the images  $\{ {\vec{S}}^{(i)}\}$ are very similar. For example, in CMB applications this could happen if the maps were obtained at very close frequencies, and this should be considered when planning the experiment. As a consequence, in practical applications it is advisable to check whether the resulting $\kappa(\Cb_{{\bf {S}}}^*)$ permits the use of the unbiased ILC estimator. The benefit is that noise filtering is typically an easier operation than bias removal. Even if $\Cb_{{\bf {S}}}^*$ is ill-conditioned, as it happens when $N_{\rm o} > N_{\rm c} + 1$, its $\kappa$ can be reduced by means of the SVD technique presented in Sect. 3.3.

These conclusions hold only in situations of isotropic noise (i.e. noise with identical characteristics everywhere). In the contrary case, poor results are to be expected since noise behaves as an additional channel-dependent component.

   
5 Conclusions

The arguments presented in the previous sections show that for a reliable use of the ILC estimator some conditions have to be satisfied. In particular:

1.
The observations have to cover a sky area much wider than the spatial scale (correlation length) of the observed maps in such a way as to allow an accurate estimate of the cross-covariance matrix  $\Cb_{{\bf {S}}}$.

2.
Model (19)-(21) has to hold. In particular, ILC cannot be expected to produce satisfactory results if some of the Galactic templates depend on the observing channels. This point can be realized if a Galactic channel-dependent template  ${\vec{\mathfrak{S}}_j}$ is thought of as the sum of Nj channel-independent templates. In this case, model (19)-(21) still holds but with an effective number of Galactic components that now is $N_{\rm c}^* = \sum_{j=1}^{N_{\rm c}} N_j$. Since the number of observing channel is typically rather limited, in practical applications it has to be expected that $N_{\rm o} < N_{\rm c}^* + 1$ and then the point below applies.

3.
The condition $N_{\rm o} \ge N_{\rm c}^* + 1$ has to hold, i.e. the number of observing channels has to be equal to or larger than the number of the physical components (CMB included) that contribute to form the observed maps. In the contrary case, the solution can suffer severe distortion.

4.
In the observed maps the level of the noise has to be rather low otherwise a severe bias can be introduced in the ILC estimator. This can be removed but at the price of an amplification of the noise level in the final result. However, especially in situations of high ${\rm SNR}$, the use of the unbiased ILC estimator can offer some advantages.
Thus, using the ILC estimator is not trivial. In particular, the first two points are in conflict with each other. In the case of wide maps, model (19)-(21) is not applicable since on large spatial scales the Galactic templates are expected to depend on the observing channel. In this respect, a simple tests can be of help that is based on the analysis of the eigenvalues of $\widehat{{\bf {C}}}_{{\bf {S}}}$. If $N_{\rm o} > N_{\rm c}^* + 1$, matrix $\widehat{{\bf {C}}}_{{\bf {S}}}$ has to be (almost) singular with a number of (almost) zero elements in the diagonal matrix $\widehat{{\bf {D}}}$ of Eq. (44) equal to $N_{\rm o} - N_{\rm c}^* - 1$. Unfortunately, this test can be influenced by the noise; because of the statistical fluctuations, the entries of $\widehat{{\bf {D}}}$ that should be close to zero can take larger values.

The obvious conclusion is that, in order to obtain reliable results, the use of ILC requires a careful planning of the observations. The area of the sky to observe as well as the acceptable level of the noise are factors that have to be fixed in advance. An ``a posteriori'' correction of the distortions introduced in the ILC solution by the violation of the above conditions is unreliable as all of these distortions critically depend on the true solution that one tries to estimate. Therefore, they cannot be obtained from the data only. The alternative represented by the numerical simulations that make use of semi-empirical templates is also unreliable. In particular, there is the possibility of introducing spurious features in the final results. This holds also when these results are used as ``priors'' in more sophisticated separation techniques.

As the use of ILC in the Fourier domain provides the same results as those obtainable in the spatial one, it could seem that this approach does not offer particular benefit. This can be false if the maps have different spatial resolutions (i.e. the observing channels have different point spread functions) and/or whether a frequency-dependent separation is desired (Tegmark et al. 2003). Although in this way it is possible to improve some properties of the separated maps (e.g., the spatial resolution), this has a cost in the amplification of the noise level.

Acknowledgements
We thank Dr. Carlo Baccigalupi for careful reading of this manuscript.

References

 

Copyright ESO 2008