A&A 487, 775-780 (2008)
DOI: 10.1051/0004-6361:200809345
A statistical analysis of the ``internal linear combination'' method in problems of signal separation as in cosmic microwave background observations
R. Vio1 - P. Andreani2,3
1 - Chip Computers Consulting s.r.l., Viale Don L. Sturzo 82,
S.Liberale di Marcon, 30020 Venice, Italy
2 -
ESO, Karl Schwarzschild strasse 2, 85748 Garching, Germany
3 - INAF - Osservatorio Astronomico di Trieste, via Tiepolo 11, 34143 Trieste, Italy
Received 2 January 2008 / Accepted 26 May 2008
Abstract
Aims. The separation of foreground contamination from cosmic microwave background (CMB) observations is one of the most challenging and important problems of digital signal processing in Cosmology. In the literature, various techniques have been presented but no general consensus about their real performance and properties has been reached. This is due to the characteristics of these techniques that have been studied essentially through numerical simulations based on semi-empirical models of the CMB and the Galactic foreground. Such models often have different levels of sophistication and/or are based on different physical assumptions (e.g. the number of Galactic components and level of the noise). Hence, a reliable comparison is difficult. What is missing is a statistical analysis of the properties of the proposed methodologies. Here, we consider the Internal Linear Combination method (ILC) which, among the separation techniques, requires the smallest number of a priori assumptions. This feature is of particular interest in the context of CMB polarization measurements at small angular scales where the lack of knowledge of the polarized backgrounds represents a serious limit.
Methods. The statistical characteristics of ILC are examined through an analytical approach and the basic conditions are fixed so as to work satisfactorily.
Results. ILC provides satisfactory results only under rather restrictive conditions. This is a critical fact to take into consideration in planning future ground-based observations (e.g., with ALMA) where, contrary to satellite experiments, there is the possibility of having a certain control over the experimental conditions.
Key words: methods: data analysis - methods: statistical - cosmology: cosmic microwave background
The experimental progresses in the detection of cosmological and astrophysical emissions require a
parallel development of data analysis techniques in order to extract the maximum physical information from the data. An example of interest is represented by signals that are mixture of the emission of distinct physical mechanisms. The study of the underlying physical processes needs the separation of the different components that contribute to the observed signals. Such is the case of the Cosmic Microwave Background (CMB) observations where
there is the need to separate the CMB from diffuse foreground originated by our own Galaxy. In this context, an extensive literature is available
and many approaches have been proposed (for a review, see Delabrouille & Cardoso 2007). However, no general consensus about their real performances and properties has been reached. This is due to the approach followed to determine the characteristics of these techniques
that is based on numerical simulations that make use of semi-empirical models of the CMB and the Galactic foreground. Such models often have different level of sophistication and/or are based on different physical assumptions (e.g., the number of Galactic components and
level of the noise). Hence, a reliable comparison is difficult.
A trustworthy assessment of the real capabilities of such methodologies would require a rigorous analysis of their theoretical statistical characteristics independently of the specific context where they are applied. However, to our knowledge, in the literature nothing has never been presented in this sense (however, see Saha et al. 2007, for a discussion concerning
the estimation of the CMB power spectrum). This is even more true for CMB polarization
measurements where the available a priori information is quite limited and the use of blind separation techniques an unavoidable choice (Stivoli et al. 2006).
A situation such as this one is clearly unsatisfactory. This also because in the near future some innovative ground-based experiments are planned for polarization
observations at very high spatial resolution, such as with the Atacama Large submmillimetre/millimetre array (ALMA). One important
advantage of these experiments is that, contrary to the satellite observations, they will allow a certain control of the experimental conditions.
Hence, a full exploitation of the capabilities of this facility (and other instruments) requires a careful planning of the observations. The recording of good quality data will allow an effective application of the chosen separation methodology. For this reason, in this work we start exploring
the capabilities of algorithms aimed at a careful subtraction of the foreground sources when the
amount of available a priori information is limited (an expected situation for polarization observations). In particular, we consider one of the most widely used approaches for the separation of different emissions, the internal linear combination method (ILC),
which was adopted, for instance, in the reduction of the data from the Wilkinson Microwave Anisotropy Probe (WMAP) satellite for CMB observations
(Bennett et al. 2003). Among the separation techniques, ILC requires the smallest number of a priori assumptions. For the reason presented above, we do not perform an application to any astrophysical dataset here, but rather we study the general properties of
this technique in order to fix the conditions under which it can be expected to produce reliable results.
In the following, the available data are assumed in the form of
maps, taken at different frequencies, containing
pixels each. More precisely, if
S(i)(p) provides the value of the pth pixel for a map obtained at channel ``i''
, our starting model is:
 |
(1) |
where
,
and
are the contributions due to the CMB, the diffuse Galactic foreground and the experimental noise,
respectively. Although not necessary for later arguments, it is assumed that all of these contributions are representable by means of stationary
random fields. At least locally, in many experimental situations this is an acceptable approximation. If not, it is often made anyway since it permits a statistical treatment of the problem of interest and the results can be used as a benchmark in the analysis of more complex scenarios. In the present context, this assumption holds on small
patches of the sky. Finally, without loss of generality, for ease of notation the random fields are supposed as the realization of zero-mean spatial processes. In the present work the contribution of non-diffuse components (e.g., due to SZ cluster, point-sources, ...) are not considered and they are assumed to have been removed through other methodologies.
The paper is organized as follows: in Sect. 2 the relevant analytical formulas are introduced, in Sect. 3 the statistical framework of ILC in noiseless observations is presented, in Sect. 4 the case of noisy data is considered and the conclusions are drawn in Sect. 5.
2 Formalization of the problem
The idea behind ILC is simple. The main assumption is that model (1) can be written as
 |
(2) |
i.e. the template of the CMB component is independent of the observing channel. A way to exploit this fact is to average
images
giving a specific weight wi to each of them so as to minimize the impact of the foreground and noise
(Bennett et al. 2003). This means to look for a solution of type
 |
(3) |
If the constraint
is imposed, Eq. (3) becomes
![\begin{displaymath}\widehat{\mathfrak{S}}_{\rm c}(p)= \mathfrak{S}_{\rm c}(p)+ \...
..._{\rm o}} w_{i} [ S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p) ].
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img11.gif) |
(4) |
Now, from this equation it is clear that, for a given pixel ``p'', the only variable terms are in the summation. Hence, under the assumption of independence of
from
and
,
the weights
have to minimize the variance of
,
i.e.
|
|
 |
|
|
|
![$\displaystyle {\rm VAR} \left[\mathfrak{S}_{\rm c}(p)\right] + {\rm VAR}\left[\sum_{i=1}^{N_{\rm o}} w_{i} (S^{(i)}_{\rm f}(p)+ {\mathcal N}^{(i)}(p)) \right],$](/articles/aa/full/2008/32/aa09345-08/img16.gif) |
(5) |
where
is the expected variance of s(p).
If
denotes a row vector such as
and the
matrix
is defined as
 |
(6) |
then Eq. (2) becomes
 |
(7) |
In this case, the weights are given by (Eriksen et al. 2004)
 |
(8) |
where
is the
cross-covariance matrix of the random processes that generate
,
i.e.
![\begin{displaymath}\Cb_{{\bf {S}}} = {\rm E}[{\bf {S}}{\bf {S}}^{\rm T}],
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img27.gif) |
(9) |
and
is a column vector of all ones. Here,
denotes the expectation operator. Hence, the ILC estimator takes the form
with
and the scalar quantity
given by
![\begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{-1} {\vec{1}}]^{-1}.
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img35.gif) |
(12) |
In practical applications, matrix
is unknown and has to be estimated from the data. Typically, this is done by means of the estimator
 |
(13) |
In this case, the ILC estimator is given by Eqs. (10)-(12) with
and
replaced, respectively, by
and
 |
(14) |
If the observed maps are not zero-mean, they have to be centered before the computation of
.
After that, the resulting weights can be applied directly to the original (i.e., non-centered)
.
The computation of
is
the only point where the fact of working with non-zero mean maps has to be taken into account.
Although in the literature it might appear that the estimator (11) has optimal
properties, this is not true. In Eq. (2) the term
is considered as a single noise component (e.g., see Hinshaw et al. 2007; Eriksen et al. 2004).
In this way the problem is apparently simplified since it is reduced to the separation of two components only. No a priori information on this ``global'' noise is required. However, this approach can lead to wrong conclusions. For example, since all the components in the mixtures
are assumed to be zero-mean, from Eq. (4) one could conclude that
![\begin{displaymath}{\rm E}[\widehat{{\vec{\mathfrak{S}}}}_{\rm c}\vert {\vec{\ma...
...}^T{\rm E}[{\vec{\mathcal N}}] = {\vec{\mathfrak{S}}_{\rm c}},
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img41.gif) |
(15) |
i.e. the ILC estimator is unbiased
. This is not correct: the claim that
is unbiased requires ones to prove that
![\begin{displaymath}{\rm E}[\widehat{{\vec{\mathfrak{S}}}}_{\rm c}\vert {\vec{\ma...
...T} {\rm E}[{\vec{\mathcal N}}] = {\vec{\mathfrak{S}}_{\rm c}}.
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img44.gif) |
(16) |
The reason is that
is a fixed realization of a random process. There is no way to obtain another one. Even if observed many times (under the same experimental conditions) the foreground components (for instance the Galaxy) will always appear the same.
Only the noise component
will change. This has important consequences. In order to discuss this issue, it is useful to start with the case of noiseless observations that can be thought to reproduce a situation of very high signal-to-noise ratio. In this case,
model (2) becomes
 |
(17) |
3 Statistical characteristics of ILC: noiseless observations
A common assumption in CMB observations is that
is given by the linear mixture of the contribution of
physical processes
(e.g., free-free, dust re-radiation, ...)
 |
(18) |
with aij constant coefficients. In practice, it is assumed that for the jth physical process a template
exists independent of the specific channel ``i''. Although rather strong, it is not unrealistic to assume that this condition is satisfied
when small enough patches of the sky are considered. Inserting Eq. (18) into Eq. (17) one obtains
 |
(19) |
with
 |
(20) |
and
 |
(21) |
Here, matrix
,
assumed to be of full rank, is shown partitioned in a way that will be useful for later calculations.
In the following, three cases are considered that correspond to possible experimental situations.
3.1 Case No = Nc + 1
If
,
i.e. when number of observations is equal to the number of the components (CMB included), then
is a square
matrix. In this case,
 |
(22) |
with
the
cross-covariance matrix of the random processes that generate the templates
,
i.e.
![\begin{displaymath}\Cb_{{\vec{\mathfrak S}}} = {\rm E}[{\vec{\mathfrak S}}{\vec{\mathfrak S}}^{\rm T}].
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img61.gif) |
(23) |
If this equation is inserted in Eq. (11), one obtains
 |
(24) |
with the scalar
given by
![\begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} {\bf {A}}^{\rm -T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\bf {A}}^{-1} {\vec{1}}]^{-1},
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img63.gif) |
(25) |
and
.
Now, since it is trivially verified that
 |
(26) |
with
 |
(27) |
it is
 |
(28) |
Hence,
is the inverse of the expected variance of the CMB template.
As a consequence, if the random process that generates the template
is uncorrelated with those that generate the templates
(in general, this is a reasonable assumption), i.e. if
 |
(29) |
from Eq. (24) and the fact that
has the form
 |
(30) |
with
the bottom-right block of the matrix in the rhs of Eq. (29), one obtains that
 |
(31) |
Here, no use is made of the operator
since we are dealing with fixed realizations of the random processes that generate the CMB as well the Galactic components. Therefore, if matrix
is known, the ILC solution is exact.
This condition changes if matrix
has to be estimated through Eq. (13) and a general treatment of this problem is quite difficult.
For this reason, we consider the case of observations that span a sky area much wider than the correlation lengths of the maps
.
Under this
condition,
can be written in the form
 |
(32) |
with
 |
(33) |
a perturbing matrix with small zero-mean entries. Because of this, it can be expanded in a Taylor
series around
up to the linear term obtaining
 |
(34) |
or
 |
(35) |
with
and
the bottom-right block of the matrix on the rhs
of Eq. (33)
. From this result and from Eqs. (24), (27) and (28), one obtains that
 |
(36) |
or
 |
(37) |
with
a row vector given by
 |
(38) |
Hence, not unexpectedly, the ILC solution differs from the true one by an amount that depends on the sample correlation between
(the CMB) and the Galactic templates.
3.2 Case No < Nc + 1
If
,
the number of the components (CMB included) is larger than the number of channels. Since in this case
is a rectangular
matrix, the inverse
is not defined. Hence, the ILC solution
as given by Eq. (11) cannot be
written in the form (24). The only possibility to still have
is that
 |
(39) |
In the present case, there is no reason to expect that this condition will be satisfied. As a consequence, the ILC solution differs from the true one by an amount given by
![\begin{displaymath}\Delta {\vec{\mathfrak{S}}_{\rm c}}= [\alpha {\vec{1}}^{\rm T...
... T})^{-1} {\bf {A}}- {\vec{e}}_1^{\rm T}] {\vec{\mathfrak S}}.
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img104.gif) |
(40) |
This result implies that, similarly to other techniques (e.g., Generalized Least Squares), it is risky to use the ILC technique in situations where the number of observing channels is smaller than the number of components. Unfortunately, the importance of
depends on the specific
characteristics of both
and
and, hence, a general treatment is not possible. However, when the Galactic component is the dominant one, it can even happen that
.
For example, in the hypothesis that
and
 |
(41) |
Eq. (40) provides
 |
(42) |
This example does not illustrate a situation excessively unfavorable for the separation. In fact, close to the Galactic plane the CMB component is expected to be largely dominated by the Galactic emission.
3.3 Case No > Nc + 1
If
,
the number of components (CMB included) is smaller than the number of channels. As in the previous section,
is a
rectangular matrix but now we face a more difficult situation. The ILC solution
cannot be written in the form (11) since matrix
is singular. The only way out is to resort to the pseudo-inverse

 |
(43) |
Here,
is the orthogonal matrix obtained from the singular value decomposition of
,
 |
(44) |
is a diagonal matrix whose entries are given by
if
,
0 otherwise, and
are the elements of the diagonal matrix
.
In this case, the ILC solution is given by
 |
(45) |
where
![\begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} \Cb_{{\bf {S}}}^{\dag } {\vec{1}}]^{-1}.
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img121.gif) |
(46) |
If
is a full column-rank matrix, then Eq. (45) can be rewritten in the form
 |
(47) |
where
![\begin{displaymath}\alpha = [ {\vec{1}}^{\rm T} ({\bf {A}}^{\dag })^{\rm T} \Cb_{{\vec{\mathfrak S}}}^{-1} {\bf {A}}^\dag {\vec{1}}]^{-1},
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img123.gif) |
(48) |
and
 |
(49) |
and
 |
(50) |
Now, since it is trivially verified that
,
one obtains that
 |
(51) |
and then, from Eq. (45),
.
In other words, if matrix
is
known, in this case the ILC method also provides an exact separation. When
has to be used, results similar to those presented in Sect. 3.1 are obtained.
4 Statistical characteristics of ILC: noisy observations
In situations of maps contaminated by measurement noise
,
assumed to be the realization of zero-mean stationary stochastic processes, model (19) becomes
 |
(52) |
In turn, under the condition of
uncorrelated with
,
becomes
 |
(53) |
with
![\begin{displaymath}{\vec{\Omega}}_{{\vec{\mathcal N}}} = {\rm E}[ {\vec{\mathcal N}}{\vec{\mathcal N}}^{\rm T}].
\end{displaymath}](/articles/aa/full/2008/32/aa09345-08/img130.gif) |
(54) |
If noise is small and
,
inserting Eq. (53) in Eq. (11) and expanding the resulting expression
in a Taylor series up to the linear term, it is possible to see that
 |
(55) |
with
and
.
A similar result holds when
with
substituting
.
From Eq. (55) it is clear that
,
i.e. the ILC estimator is biased. Since both
and
depend on the specific characteristics of
and
,
this does not allow a general
treatment of the question. However, especially in the case of a high level of noise contamination, the bias affecting
is expected to be severe.
It is possible to obtain an unbiased ILC estimator computing the weights
by means of Eq. (11) with the cross-covariance matrix
substituted by
 |
(56) |
This operation, however, has a cost: the weights
no longer minimize
the variance of
.
As a consequence, the influence of noise can be dramatically amplified. The reason for this result lies in the condition number,
,
of
that tends to be larger than
.
Because of this, the entries of
can take values in a range much wider than those of
and
the same holds for the corresponding weights
.
According to a theorem of Weyl (e.g., see Bhatia 1997, Theorem III.2.2), it is
|
|
 |
(57) |
|
|
 |
(58) |
where
denotes the ith eigenvalues of a
symmetric, positive definite matrix
with
.
These inequalities indicate a high probability that
.
In this way,
 |
(59) |
For example, this is strictly true if
is a diagonal matrix with non-zero entries equal to a constant value
,
since
 |
(60) |
From these consideration, in the case of a low
and large
,
one has to expect significant amplification of the noise for the unbiased ILC estimator. However, matrix
is very ill-conditioned only
in situations where the images
are very similar. For example, in CMB applications this could happen if the maps
were obtained at very close frequencies, and this should be considered when planning the experiment. As a consequence, in practical applications it is advisable to check whether
the resulting
permits the use of the unbiased ILC estimator. The benefit is that noise filtering is typically an easier operation than bias removal. Even if
is ill-conditioned, as it happens when
,
its
can be reduced by means of the SVD technique presented in Sect. 3.3.
These conclusions hold only in situations of isotropic noise
(i.e. noise with identical characteristics everywhere). In the contrary case, poor results are to be expected since noise behaves as an additional channel-dependent component.
5 Conclusions
The arguments presented in the previous sections show that for a reliable use of the ILC estimator some conditions have to be satisfied. In particular:
- 1.
- The observations have to cover a sky area much wider than the spatial scale (correlation length) of the observed maps in such a way as to
allow an accurate estimate of the cross-covariance matrix
.
- 2.
- Model (19)-(21) has to hold. In particular, ILC cannot be expected to produce satisfactory results if some of the Galactic templates depend on the observing channels. This point can be realized if a Galactic channel-dependent template
is thought of as the sum of Nj channel-independent templates.
In this case, model (19)-(21) still holds but with an effective number of Galactic components that now is
.
Since the number of observing channel is typically rather limited, in practical applications it has to be expected that
and then the point below applies.
- 3.
- The condition
has to hold, i.e. the number of observing channels has to be equal to or larger than the number of the
physical components (CMB included) that contribute to form the observed maps. In the contrary case, the solution can suffer severe distortion.
- 4.
- In the observed maps the level of the noise has to be rather low otherwise a severe bias can be introduced in the ILC estimator. This can be removed but at the price of an amplification of the noise level in the final result. However, especially in situations of
high
,
the use of the unbiased ILC estimator can offer some advantages.
Thus, using the ILC estimator is not trivial. In particular,
the first two points are in conflict with each other. In the case of wide maps,
model (19)-(21) is not applicable since on large spatial scales the Galactic templates are expected to depend on the
observing channel. In this respect, a simple tests can be of help that is based on the analysis of the eigenvalues of
.
If
,
matrix
has to be (almost) singular with a
number of (almost) zero elements in the diagonal matrix
of Eq. (44) equal to
.
Unfortunately, this
test can be influenced by the noise; because of
the statistical fluctuations, the entries of
that should be close to zero can take larger values.
The obvious conclusion is that, in order to obtain reliable results, the use of ILC requires a careful planning of the observations. The area of the sky to observe as well as the acceptable level of the noise are factors that have to be fixed in advance. An
``a posteriori'' correction of the distortions introduced in the ILC solution by the violation of the above conditions is unreliable as
all of these distortions critically depend on the true solution that one tries to estimate. Therefore, they cannot be obtained
from the data only. The alternative represented by the numerical simulations that make use of semi-empirical templates is also unreliable. In particular,
there is the possibility of introducing spurious features in the final results. This holds also when these results are used as ``priors'' in more sophisticated separation techniques.
As the use of ILC in the Fourier domain provides the same results as those obtainable in the spatial one, it could seem that this approach does not offer particular benefit. This can be false if the maps have different spatial resolutions
(i.e. the observing channels have different point spread functions) and/or whether a frequency-dependent separation is desired (Tegmark et al. 2003).
Although in this way it is possible to improve some properties of the separated maps (e.g., the spatial resolution), this has a cost in the amplification of the noise level.
Acknowledgements
We thank Dr. Carlo Baccigalupi for careful reading of this manuscript.
- Baccigalupi, C.,
Perrotta, F., de Zotti, G., et al. 2004, MNRAS, 354, 55 [NASA ADS] [CrossRef]
- Bhatia, R. 1997, Matrix
Analysis (Berlin: Springer)
(In the text)
- Bennett, C. L., Hill, R.
S., Hinshaw, G., et al. 2003, ApJS, 148, 97 [NASA ADS] [CrossRef]
(In the text)
- Cardoso, J. F. 1999,
Neural Computation, 11, 157 [CrossRef]
- Delabrouille, J., &
Cardoso, J. F. 2007 [arXiv:astro-ph/0702198v2]
(In the text)
- Eriksen, H. K., Banday,
A. J., Górski, K. M., & Lilje, P. B. 2004, ApJ, 612,
633 [NASA ADS] [CrossRef]
(In the text)
- Hinshaw, G., Nolta, M.
R., Bennett, C. L., et al. 2007, ApJS, 170, 288 [NASA ADS] [CrossRef]
- Hyvärinnen, A.,
Karhunen, J., & Oja, E. 2001, Independent Component Analysis
(New York: John Wiley & Sons)
- Maino, D., Farusi, A.,
Baccigalupi, C., et al. 2002, MNRAS, 334, 53 [NASA ADS] [CrossRef]
- Saha, R., et al.
2007 [arXiv:0706.3567v1]
(In the text)
- Stivoli, F., Baccigalupi,
C., Maino, D., & Stompor, R. 2006, MNRAS, 372, 615 [NASA ADS] [CrossRef]
(In the text)
- Tegmark, M., de
Oliveira-Costa, A., & Hamilton, A. J. 2003, Phys. Rev. D.,
68, 123523 [NASA ADS] [CrossRef]
(In the text)
Copyright ESO 2008