Radio astronomical image formation using constrained least squares and Krylov subspaces

Ahmad Mouri Sardarabadi; Amir Leshem; Alle-Jan van der Veen

doi:10.1051/0004-6361/201526214

Home

All issues

Volume 588 (April 2016)

A&A, 588 (2016) A95

Full HTML

Free Access

Issue		A&A Volume 588, April 2016


Article Number		A95
Number of page(s)		19
Section		Astronomical instrumentation
DOI		https://doi.org/10.1051/0004-6361/201526214
Published online		23 March 2016

A&A 588, A95 (2016)

Radio astronomical image formation using constrained least squares and Krylov subspaces^⋆

Ahmad Mouri Sardarabadi¹, Amir Leshem² and Alle-Jan van der Veen¹

¹ Department of electrical engineering, Delft University of Technology, PO Box 5031, 2600 GA Delft, The Netherlands
e-mail: a.mourisardarabadi@tudelft.nl
² Faculty of Engineering, Bar-Ilan University, Aharon and Rachel Dahan Electronic Technology Buildilng, Ramat Gan, Israël

Received: 27 March 2015
Accepted: 27 October 2015

Abstract

Aims. Image formation for radio astronomy can be defined as estimating the spatial intensity distribution of celestial sources throughout the sky, given an array of antennas. One of the challenges with image formation is that the problem becomes ill-posed as the number of pixels becomes large. The introduction of constraints that incorporate a priori knowledge is crucial.

Methods. In this paper we show that in addition to non-negativity, the magnitude of each pixel in an image is also bounded from above. Indeed, the classical “dirty image” is an upper bound, but a much tighter upper bound can be formed from the data using array processing techniques. This formulates image formation as a least squares optimization problem with inequality constraints. We propose to solve this constrained least squares problem using active set techniques, and the steps needed to implement it are described. It is shown that the least squares part of the problem can be efficiently implemented with Krylov-subspace-based techniques. We also propose a method for correcting for the possible mismatch between source positions and the pixel grid. This correction improves both the detection of sources and their estimated intensities. The performance of these algorithms is evaluated using simulations.

Results. Based on parametric modeling of the astronomical data, a new imaging algorithm based on convex optimization, active sets, and Krylov-subspace-based solvers is presented. The relation between the proposed algorithm and sequential source removing techniques is explained, and it gives a better mathematical framework for analyzing existing algorithms. We show that by using the structure of the algorithm, an efficient implementation that allows massive parallelism and storage reduction is feasible. Simulations are used to compare the new algorithm to classical CLEAN. Results illustrate that for a discrete point model, the proposed algorithm is capable of detecting the correct number of sources and producing highly accurate intensity estimates.

Key words: techniques: image processing / instrumentation: interferometers / methods: numerical

^⋆

This research was supported by NWO-TOP 2010, 614.00.005. The research of A. Leshem was supported by the Israeli Science foundation, grant 1240−2009.

© ESO, 2016

1. Introduction

Image formation for radio astronomy can be defined as estimating the spatial intensity distribution of celestial sources over the sky. The measurement equation (“data model”) is linear in the source intensities, and the resulting least squares problem has classically been implemented in two steps: formation of a “dirty image”, followed by a deconvolution step. In this process, an implicit model assumption is made that the number of sources is discrete, and subsequently the number of sources has been replaced by the number of image pixels (assuming each pixel may contain a source).

The deconvolution step becomes ill-conditioned if the number of pixels is large (Wijnholds & van der Veen 2008). Alternatively, the directions of sources may be estimated along with their intensities, but this is a complex nonlinear problem. Classically, this has been implemented as an iterative subtraction technique, wherein source directions are estimated from the dirty image, and their contribution is subtracted from the data. This mixed approach is the essence of the CLEAN method proposed by Högbom (Högbom 1974), which was subsequently refined and extended in several ways, leading to the widely used approaches described in Cornwell (2008), Rau et al. (2009), Bhatnager & Cornwell (2004).

The conditioning of the image deconvolution step can be improved by incorporating side information such as non-negativity of the image (Briggs 1995), source model structure beyond simple point sources (e.g., shapelets and wavelets, Reid 2006), sparsity or ℓ₁ constraints on the image (Levanda & Leshem 2008; Wiaux et al. 2009), or a combination of both wavelets and sparsity (Carrillo et al. 2012, 2014). Beyond these, some fundamental approaches based on parameter estimation techniques have been proposed, such as the least squares minimum variance imaging (LS-MVI) (Ben-David & Leshem 2008), maximum-likelihood-based techniques (Leshem & van der Veen 2000), and Bayesian-based techniques (Junklewitz et al. 2016; Lochner et al. 2015). Computational complexity is a concern that has not been addressed in these approaches.

New radio telescopes such as the Low Frequency Array (LOFAR), the Allen Telescope Array (ATA), Murchison Widefield Array (MWA), and the Long Wavelength Array (LWA) are composed of many stations (each station made up of multiple antennas that are combined using adaptive beamforming), and the increase in number of antennas and stations continues in the design of the square kilometer array (SKA). These instruments have or will have a significantly increased sensitivity and a larger field of view compared to traditional telescopes, leading to many more sources that need to be considered. They also need to process larger bandwidths to reach this sensitivity. Besides the increased requirements on the performance of imaging, the improved spatial resolution leads to an increasing number of pixels in the image, and the development of computationally efficient techniques is critical.

To benefit from the vast literature related to solving least squares problems, but also to gain from the nonlinear processing offered by standard deconvolution techniques, we propose to reformulate the imaging problem as a parameter-estimation problem described by a weighted least squares optimization problem with several constraints. The first is a non-negativity constraint, which would lead to the non-negative least squares algorithm (NNLS) proposed in Briggs (1995). But we show that the pixel values are also bounded from above. A coarse upper bound is provided by the classical dirty image, and a much tighter bound is the “minimum variance distortionless response” (MVDR) dirty image that was proposed in the context of radio astronomy in Leshem & van der Veen (2000).

We propose to solve the resulting constrained least squares problems using an active set approach. This results in a computationally efficient imaging algorithm that is closely related to existing nonlinear sequential source estimation techniques such as CLEAN with the benefit of accelerated convergence thanks to tighter upper bounds on the intensity over the complete image. Because the constraints are enforced over the entire image, this eliminates the inclusion of negative flux sources and other anomalies that appear in some existing sequential techniques.

To reduce the computational complexity further, we show that the data model has a Khatri-Rao structure. This can be exploited to significantly improve the data management and parallelism compared to general implementations of least squares algorithms.

The structure of the paper is as follows. In Sect. 2 we describe the basic data model and the image formation problem in Sect. 3. A constrained least squares problem is formulated, using various intensity constraints that take the form of dirty images. The solution of this problem using active set techniques in Sect. 4 generalizes the classical CLEAN algorithm. In Sect. 5 we discuss the efficient implementation of a key step in the active set solution using Krylov subspaces. We end up with some simulated experiments that demonstrate the advantages of the proposed technique and conclusions regarding future implementation.

Notation

A boldface letter such as a denotes a column vector, a boldface capital letter such as A denotes a matrix. Thena_ij = [A] _ij corresponds to the entry of A in the ith row and jth column, a_i = [A] _i is the ith column of A, a_i is the ith element of the vector a, I is an identity matrix of appropriate size, and I_p is a p × p identity matrix.

The symbol (·)^T is the transpose operator, (·)^∗ is the complex conjugate operator, (·)^H the Hermitian transpose, ∥ ·∥ _F the Frobenius norm of a matrix, ∥ . ∥ the two norm of a vector, ℰ {· } the expectation operator, and $\hbox{$\MCN(\bmu,\bSigma)$}$ represents the multivariate complex normal distribution with expected value μ and covariance matrix Σ.

A tilde, $\begin{matrix} ˜ \\ . \end{matrix}$ $\hbox{$\tilde{.}$}$ , denotes parameters and related matrices that depend on the “true” direction of the sources. However, in most of the paper, we work with parameters that are discretized on a grid, in which case we drop the tilde. The grid points correspond to the image pixels and do not necessarily coincide with the actual positions of the sources.

A calligraphic capital letter such as $\hbox{$\MCX$}$ represents a set of indices, and $\hbox{$\ba_{\MCX}$}$ is a column vector constructed by stacking the elements of a that belong to $\hbox{$\MCX$}$ . The corresponding indices are stored with the vector as well (similar to the storage of matlab “sparse” vectors).

The operator vect(·) stacks the columns of the argument matrix to form a vector, vectdiag(·) stacks the diagonal elements of the argument matrix to form a vector, and diag(·) is a diagonal matrix with its diagonal entries from the argument vector (if the argument is a matrix diag(·) = diag(vectdiag(·))).

Let ⊗ denote the Kronecker product, i.e., $\begin{matrix} A \otimes B : = [\begin{matrix} \end{matrix}] . \end{matrix}$ $\begin{eqnarray*} \bA \otimes \bB := \left[\begin{array}{ccc} a_{11} \bB & a_{12} \bB & \cdots\\ a_{21} \bB & a_{22} \bB & \cdots\\ \vdots & \vdots & \ddots \end{array} \right]. \end{eqnarray*}$ Furthermore, ° denotes the Khatri-Rao product (column-wise Kronecker product), i.e., $\begin{matrix} A ◦ B : = [a_{1} \otimes b_{1}, a_{2} \otimes b_{2}, \cdot \cdot \cdot], \end{matrix}$ $\begin{eqnarray*} \bA \circ \bB := [\ba_1\otimes \bb_1, \ba_2 \otimes \bb_2 , \cdots], \end{eqnarray*}$ and ⊙ denotes the Schur-Hadamard (elementwise) product. The following properties are used throughout the paper (for matrices and vectors with compatible dimensions): $\begin{matrix} \end{matrix}$ $\begin{eqnarray} &&\label{kron:1} (\bB^T \otimes \bA)\vect(\bX) =\vect(\bA\bX\bB) \\ &&\label{kron:2} (\bB \otimes \bA)^H =(\bB^H \otimes \bA^H)\\ &&\label{kron:3} (\bB \otimes \bA)^{-1} =(\bB^{-1} \otimes \bA^{-1})\\ &&\label{kron:4} (\bB^T \circ \bA)\bx =\vect(\bA\diag(\bx)\bB)\\ &&\label{kron:5} (\bB\bC \otimes \bA\bD) =(\bB \otimes \bA)(\bC \otimes \bD)\\ &&\label{kron:6} (\bB\bC \circ \bA\bD) =(\bB \otimes \bA)(\bC \circ \bD)\\ &&\label{kron:7} (\bB^H\bC \odot \bA^H\bD) =(\bB \circ \bA)^H(\bC \circ \bD)\\ &&\label{kron:8} \vectdiag(\bA^H\bX\bA) =(\bA^* \circ \bA)^H\vect(\bX). \end{eqnarray}$

2. Data model

We consider an instrument where P receivers (stations or antennas) observe the sky. Assuming a discrete point source model, we let Q denote the number of visible sources. The received signals at the antennas are sampled and subsequently split into narrow sub-bands. For simplicity, we consider only a single sub-band in the rest of the paper. Although the sources are considered stationary, the apparent position of the celestial sources will change with time because of the earth’s rotation. For this reason the data is split into short blocks or “snapshots” of N samples, where the exact value of N depends on the resolution of the instrument.

We stack the output of the P antennas at a single sub-band into a vector y_k [n], where n = 1,··· ,N denotes the sample index, and k = 1,··· ,K denotes the snapshot index. The signals of the qth source arrive at the array with slight delays for each antenna that depend on the source direction and the Earth’s rotation (the geometric delays), and for sufficiently narrow sub-bands these delays become phase shifts,i.e., multiplications by complex coefficients. The coefficients are later stacked into the so-called array response vector. To describe this vector, we first need to define a coordinate system. Assume a fixed coordinate system based on the right ascension (α) and declination (δ) of a source, and define the corresponding direction vector $\begin{matrix} β = [\begin{matrix} \end{matrix}] . \end{matrix}$ $\begin{eqnarray*} \bbeta=\begin{bmatrix} \cos(\delta)\cos(\alpha) \\ \cos(\delta)\sin(\alpha) \\ \sin(\delta) \end{bmatrix}. \end{eqnarray*}$ The related earth-bound direction vector s with coordinates (l,m,n) (taking earth rotation into account) is given by $\begin{matrix} s = Q_{k} (L,B) β, \end{matrix}$ $\begin{eqnarray*} \bs=\bQ_k(L,B)\bbeta, \end{eqnarray*}$ where Q_k(L,B) is a 3 × 3 rotation matrix that accounts for the earth rotation and depends on the time k and the observer’s longitude L and latitude B. Because s has a unit norm, we only need two coordinates (l,m), while the third coordinate can be calculated using $n = \sqrt{1 - l^{2} - m^{2}}$ $\hbox{$n=\sqrt{1-l^2-m^2}$}$ .

For the qth source with coordinates (l_q,m_q) at the kth snapshot, the direction vector is s_q. Let the vector ξ_i = [x_i,y_i,z_i] ^T denote the position of the ith receiving element in earth-bound coordinates. At this position, the phase delay (geometric delay) experienced by the q source is given by the inner product of these vectors, and the effect of this delay on the signal is multiplication by a complex coefficient $a_{kqi} : = \exp (j \frac{2 π}{λ} ξ_{i}^{T} s_{q})$ $\hbox{$a_{kqi} := \exp(j\frac{2\pi}{\lambda} \bxi_i^T \bs_q)$}$ , where λ is the wavelength. Stacking the coefficients for i = 1,··· ,P into a vector a_k,q = a_k(s_q), we obtain the array response vector, which thus has model $a_{k,q} = a_{k} (s_{q}) = \frac{1}{\sqrt{P}} e^{\frac{j 2 π}{λ} Ξ^{T} s_{q}} = \frac{1}{\sqrt{P}} e^{\frac{j 2 π}{λ} Ξ^{T} Q_{k} (L,B) β_{q}}$ $\begin{equation} \ba_{k,q} = \ba_k(\bs_q) =\frac{1}{\sqrt{P}}{\rm e}^{\frac{j 2\pi}{\lambda}\bXi^T\bs_q} =\frac{1}{\sqrt{P}}{\rm e}^{\frac{j 2\pi}{\lambda}\bXi^T\bQ_{k}(L,B)\bbeta_q} \end{equation}$ (9)where Ξ is a 3 × P matrix containing the positions of the P receiving elements. We introduced a scaling by $1 / \sqrt{P}$ $\hbox{$1/\sqrt{P}$}$ as a normalization constant such that ∥ a_k(s_q) ∥ = 1. The entries of the array response vector are connected to the Fourier transform coefficients that are familiar in radio astronomy models.

Assuming an array that is otherwise calibrated, the received antenna signals y_k [n] can be modeled as $\begin{matrix} y_{k} [n] = A_{k} x [n] + n_{k} [n], n = 1, \cdot \cdot \cdot,N \end{matrix}$ $\begin{eqnarray} \label{eq:samplemodel} \by_k[n]=\bA_k\bx[n]+\bn_k[n], \qquad n = 1, \cdots, N \end{eqnarray}$ (10)where A_k is a P × Q matrix whose columns are the array response vectors [A_k] _q = a_k,q, x [n] is a Q × 1 vector representing the signals from the sky, and n_k [n] is a P × 1 vector modeling the noise.

From the data, the system estimates covariance matrices of the input vector at each snapshot k = 1,··· ,K, as ${R̂}_{k} = \frac{1}{N} \sum_{n = 1}^{N} y_{k} [n] y_{k} [n]^{H}, k = 1, \cdot \cdot \cdot,K .$ $\begin{equation} \bRh_{k} = \frac{1}{N} \sum_{n=1}^{N} \by_k[n]\by_k[n]^H, \qquad k = 1, \cdots, K \,. \end{equation}$ (11)Since the received signals and noise are Gaussian, these covariance matrix estimates form sufficient statistics for the imaging problem (Leshem & van der Veen 2000). The covariance matrices are given by $R_{k} = ℰ {y_{k} y_{k}^{H}},$ $\begin{equation} \bR_k=\EExp\{\by_k \by_k^H\} , \end{equation}$ (12)for which the model is $R_{k} = A_{k} Σ A_{k}^{H} + R_{n,k},$ $\begin{equation} \label{eq:cov_def} \bR_k = \bA_k \bSigma\bA_k^H+\bR_{\bn,k}, \end{equation}$ (13)where Σ = ℰ { xx^H } and $R_{n,k} = ℰ {n_{k} n_{k}^{H}}$ $\hbox{$\bR_{\bn,k}=\EExp\{\bn_k\bn_k^H\}$}$ are the source and noise covariance matrices, respectively. We have assumed that sky sources are stationary, and if we also assume that they are independent, we can model Σ = diag(σ) where $σ = [\begin{matrix} σ_{1} & , ..., & σ_{Q} \end{matrix}] T$ $\begin{equation} \bsigma= \begin{bmatrix}\sigma_1 &,\dots, &\sigma_Q\end{bmatrix}^T \end{equation}$ (14)represents the intensity of the sources. To connect the covariance data model (13)to language more familiar to radio astronomers, let us take a closer look at the elements of the matrix R_k. Temporarily ignoring the noise covariance matrix R_n,k, we note that $\begin{matrix} [R_{k}]_{ij} & = & \frac{1}{P} \sum_{q = 1}^{Q} σ_{q} a_{kqi} a_{kqi}^{*} \\ = & \frac{1}{P} \sum_{q = 1}^{Q} σ_{q} e^{j \frac{2 π}{λ} (ξ_{i} - ξ_{j})^{T} s_{q}} \\ = & \frac{1}{P} \sum_{q = 1}^{Q} σ_{q} e^{j \frac{2 π}{λ} [(x_{i} - x_{j}) l_{q} + (y_{i} - y_{j}) m_{q} + (z_{i} - z_{j}) \sqrt{1 - l_{q}^{2} - m_{q}^{2}}] .} \end{matrix}$ $\begin{eqnarray} [\bR_k]_{ij}&=& \frac{1}{P} \sum_{q=1}^Q \sigma_q a_{kqi} a_{kqi}^* \notag \\ &=&\frac{1}{P}\sum_{q=1}^Q \sigma_q {\rm e}^{j\frac{2 \pi}{\lambda} (\bxi_i-\bxi_j)^T\bs_q}\notag \\ &=&\frac{1}{P}\sum_{q=1}^Q \sigma_q {\rm e}^{j\frac{2 \pi}{\lambda} \left [(x_i-x_j)l_q + (y_i-y_j)m_q+(z_i-z_j)\sqrt{1-l^2_q-m^2_q} \right].} \end{eqnarray}$ (15)If we define $\frac{1}{λ} [x_{i} - x_{j}, y_{i} - y_{j}, z_{i} - z_{j}]^{T} = [u_{ij}, v_{ij}, w_{ij}]^{T}$ $\hbox{$\frac{1}{\lambda}[x_i-x_j,y_i-y_j,z_i-z_j]^T = [u_{ij},v_{ij},w_{ij}]^T$}$ , then we can write [R_k] _ij ≡ V(u_ij,v_ij,w_ij), where V(u,v,w) is the visibility function, and (u,v,w) are the spatial frequencies (Leshem et al. 2000). In other words, the entries of the covariance matrix R_k are samples of the visibility function at a given frequency and time arranged in a matrix, and Eq. (13)represents the measurement equation in matrix form.

We can write this equation in several other ways. By vectorizing both sides of (13)and using the properties of Kronecker products (4), we obtain $r_{k} = (A_{k}^{*} ◦ A_{k}) σ + r_{n,k}$ $\begin{equation} \br_k = (\bA_k^* \circ \bA_k)\bsigma + \br_{\bn,k} \end{equation}$ (16)where \begin{lxirformule}$\br_k=\vect(\bR_k)$\end{lxirformule} and r_n,k = vect(R_n,k). After stacking the vectorized covariances for all of the snapshots, we obtain $r = Ψ σ + r n$ $\begin{equation} \label{eq:vectRtotal} \br = \bPsi\bsigma + \br_{n} \end{equation}$ (17)where $r = [\begin{matrix} r_{1} \\ \begin{matrix} . \\ . \\ . \end{matrix} \\ r_{K}, \end{matrix}] Ψ = [\begin{matrix} A_{1}^{*} ◦ A_{1} \\ \begin{matrix} . \\ . \\ . \end{matrix} \\ A_{K}^{*} ◦ A_{K}, \end{matrix}] r_{n} = [\begin{matrix} r_{n, 1} \\ \begin{matrix} . \\ . \\ . \end{matrix} \\ r_{n,K} \end{matrix}] .$ $\begin{equation} \label{eq:vectorized} \br = \begin{bmatrix}\br_{1}\\ \vdots\\\br_{K},\end{bmatrix} \quad \bPsi=\begin{bmatrix} \bA_{1}^* \circ \bA_{1} \\ \vdots \\ \bA_{K}^* \circ \bA_{K},\end{bmatrix} \quad \br_{n}= \begin{bmatrix}\br_{\bn,1}\\ \vdots\\\br_{\bn,K}\end{bmatrix} . \end{equation}$ (18)Similarly, we vectorize and stack the sample covariance matrices as ${r̂}_{k} = vect ({R̂}_{k}), r̂ = [\begin{matrix} {r̂}_{1} \\ \begin{matrix} . \\ . \\ . \end{matrix} \\ {r̂}_{K} \end{matrix}] .$ $\begin{equation} \label{def:brh} \brh_{k}=\vect(\bRh_{k}) ,\qquad \brh = \begin{bmatrix}\brh_{1}\\ \vdots\\\brh_{K}\end{bmatrix} . \end{equation}$ (19)This collects all the available covariance data into a single vector.

Alternatively, we can use the independence between the time samples to write the aggregate data model as $R = [\begin{matrix} R_{1} & ... & 0 \\ \begin{matrix} . \\ . \\ . \end{matrix} & {\begin{matrix} . \end{matrix}}^{.} . & 0 \\ 0 & ... & R_{K} \end{matrix}] = \sum_{q = 1}^{Q} σ_{q} (I_{K} ◦ A^{q}) (I_{K} ◦ A^{q})^{H} + R_{n},$ $\begin{equation} \bR=\begin{bmatrix} \bR_1 &\dots &\zeros \\ \vdots &\ddots &\zeros \\ \zeros &\dots & \bR_K \end{bmatrix} = \sum_{q=1}^Q \sigma_q(\bI_K \circ \bA^q) (\bI_K \circ \bA^q)^H + \bR_{n} \,, \label{eq:blockdiagtilde} \end{equation}$ (20)where $\begin{matrix} R_{n} = [\begin{matrix} \end{matrix}], \\ A^{q} = [\begin{matrix} \end{matrix}], q = 1, \cdot \cdot \cdot,Q . \end{matrix}$ $\begin{eqnarray} && \bR_{n}=\begin{bmatrix} \bR_{\bn,1} &\dots &\zeros \\ \vdots &\ddots &\zeros \\ \zeros &\dots & \bR_{\bn,K} \end{bmatrix}, \\ && \bA^q=\begin{bmatrix} \ba_{1,q} & \dots & \ba_{K,q} \end{bmatrix} , \quad q=1,\cdots,Q. \end{eqnarray}$

3. The imaging problem

Using the data model (17), the imaging problem is to find the intensity, σ, of the sources, along with their directions represented by the matrices A_k, from given sample covariance matrices $\hbox{$\bRh_k, \, k=1,\cdots,K$}$ . As the source locations are generally unknown, this is a complicated (nonlinear) direction-of-arrival estimation problem.

The usual approach in radio astronomy is to define a grid for the image and to assume that each pixel (grid location) contains a source. In this case the source locations are known, and estimating the source intensities is a linear problem, but for high-resolution images the number of sources may be very large. The resulting linear estimation problem is often ill-conditioned unless additional constraints are posed.

3.1. Gridded imaging model

After defining a grid for the image and assuming that a source exists for each pixel location, let I (rather than Q) denote the total number of sources (pixels), σ an I × 1 vector containing the source intensities, and A_k (k = 1,··· ,K) the P × I array response matrices for these sources. The A_k are known, and σ can be interpreted as a vectorized version of the image to be computed.(To distinguish the gridded source locations and source powers from the “true” sources, we later denote parameters and variables that depend on the Q true sources by a tilde.)

For a given observation $\hbox{$\brh$}$ , image formation amounts to the estimation of σ. For a sufficiently fine grid, σ approximates the solution of the discrete source model. However, as we discuss later, working in the image domain leads to a gridding-related noise floor. This is solved by fine adaptation of the location of the sources and estimation of the true locations in the visibility domain.

A consequence of using a discrete source model in combination with a sequential source-removing technique such as CLEAN is the modeling of extended structures in the image by many point sources. As we discuss in Sect. 6, this also holds for the algorithms proposed in this paper.

3.2. Unconstrained least squares image

If we ignore the term r_n, then Eq. (17)directly leads to least squares (LS) and weighted least squares (WLS) estimates of σ (Wijnholds & van der Veen 2008). In particular, solving the imaging problem with LS leads to the minimization problem $\min_{σ} \frac{1}{2 K} ∥ r̂ - Ψ σ ∥^{2},$ $\begin{equation} \label{eq:UCLS} \min_{{\bsigma}} \; \frac{1}{2K}\| \brh - \bAt\bsigma\|^2 \,, \end{equation}$ (23)where the normalization factor 2K is introduced to simplify the expression for the gradient and does not affect the solution. It is straightforward to show that the solution to this problem is given by any σ that satisfies $H_{LS} σ = {σ̂}_{MF}$ $\begin{equation} \label{eq:imagingeq} \HLS\bsigma=\SMFDI \end{equation}$ (24)where we define the “matched filter” (MF, also known as the classical “direct Fourier transform dirty image”) as ${σ̂}_{MF} = \frac{1}{K} Ψ^{H} r̂ = \frac{1}{K} \sum_{k} vectdiag (A_{k}^{H} {R̂}_{k} A_{k}),$ $\begin{equation} \label{eq:mfdimg} \SMFDI=\frac{1}{K}\bAt^H\brh=\frac{1}{K}\sum_k \vectdiag(\bA_k^H\bRh_k\bA_k), \end{equation}$ (25)and the deconvolution matrix H_LS as $H_{LS} = \frac{1}{K} Ψ^{H} Ψ = \frac{1}{K} \sum_{k} (A_{k}^{T} A_{k}^{*}) ⊙ (A_{k}^{H} A_{k}),$ $\begin{equation} \label{eq:convMat} \HLS=\frac{1}{K}\bAt^H\bAt=\frac{1}{K}\sum_k (\bA^T_k\bA^*_k)\odot (\bA^H_k\bA_k), \end{equation}$ (26)where we have used the definition of Ψ from Eq. (18)(with tilde removed) and properties of the Kronecker and Khatri-Rao products. Similarly we can define the WLS minimization as $\min_{σ} \frac{1}{2 K} ∥ ({R̂}^{- T / 2} \otimes {R̂}^{- 1 / 2}) (r̂ - Ψ σ) ∥^{2},$ $\begin{equation} \label{eq:UWCLS} \min_{{\bsigma}} \frac{1}{2K} \| (\bRh^{-T/2}\otimes \bRh^{-1/2})(\brh - \bAt\bsigma) \|^2 \,, \end{equation}$ (27)where the weighting assumes Gaussian distributed observations. The weighting improves the statistical properties of the estimates, and $\hbox{$\bRh$}$ is used instead of R because it is available and asymptotically gives the same optimal results, i.e., convergence to maximum likelihood estimates (Ottersten et al. 1998). The solution to this optimization is similar to the solution to the LS problem and is given by any σ that satisfies $H_{WLS} σ = {σ̂}_{WLS},$ $\begin{equation} \label{eq:imagingeqwls0} \HWLS\bsigma=\SWLSDI , \end{equation}$ (28)where ${σ̂}_{WLS} = \frac{1}{K} Ψ^{H} ({R̂}^{- T} \otimes {R̂}^{-1}) r̂$ $\begin{equation} \label{eq:imagingeqwls} \SWLSDI=\frac{1}{K}\bAt^H(\bRh^{-T}\otimes \bRh^{-1})\brh \end{equation}$ (29)is the “WLS dirty image” and $H_{WLS} = \frac{1}{K} Ψ^{H} ({R̂}^{- T} \otimes {R̂}^{-1}) Ψ,$ $\begin{equation} \label{eq:imagingeqwls2} \HWLS=\frac{1}{K}\bAt^H(\bRh^{-T}\otimes \bRh^{-1})\bAt , \end{equation}$ (30)is the associated deconvolution operator.

A connection to beamforming is obtained as follows. The ith pixel of the “Matched Filter” dirty image in Eq. (25) can be written as $\begin{matrix} σ̂ MF,i = \frac{1}{K} \sum_{k} a_{k,i}^{H} {R̂}_{k} a_{k,i}, \end{matrix}$ $\begin{eqnarray*} \SMFDIind{i} = \frac{1}{K} \sum_k \ba_{k,i}^H \bRh_k \ba_{k,i}, \end{eqnarray*}$ and if we replace $a_{k,i} / \sqrt{K}$ $\hbox{$\ba_{k,i}/\sqrt{K}$}$ by a more general “beamformer” w_k,i, this can be generalized to a more general dirty image $σ_{w,i} = \sum_{k} w_{k,i}^{H} {R̂}_{k} w_{k,i} .$ $\begin{equation} \label{eq:beamimg} \WTDIind{i} = \sum_k \bw_{k,i}^H \bRh_k \bw_{k,i} . \end{equation}$ (31)Here, w_k,i is called a beamformer because we can consider that it acts on the antenna vectors y_k [n] as $z_{k,i} [n] = w_{k,i}^{H} y_{k} [n]$ $\hbox{$z_{k,i}[n] = \bw_{k,i}^H \by_k[n]$}$ , where z_k,i [n] is the output of the (direction-dependent) beamformer, and σ_w,i = ∑ _kℰ { | z_k,i | ² } is interpreted as the total output power of the beamformer, summed over all snapshots. We encounter several such beamformers in the rest of the paper. Most of the beamformers discussed in this paper include the weighted visibility vector (R^{− T} ⊗ R^-1)r. The relation between this weighting and more traditional weighting techniques, such as natural and robust weighting, is discussed in Appendix A.

3.3. Preconditioned weighted least squares image

If Ψ has full column rank, then H_LS and H_WLS are non-singular and a unique solution to LS and WLS exists; for example, the solution to Eq. (24)becomes $σ = H_{LS}^{-1} {σ̂}_{MF} .$ $\begin{equation} \bsigma = \HLS^{-1}\SMFDI . \end{equation}$ (32)Unfortunately, if the number of pixels is large, then H_LS and H_WLS become ill-conditioned or even singular, so that Eqs. (24)and (28)have an infinite number of solutions (Wijnholds & van der Veen 2008). Generally, we need to improve the conditioning of the deconvolution matrices and to find appropriate regularizations.

One way to improve the conditioning of a matrix is to apply a preconditioner. The most widely used and simplest one is the Jacobi preconditioner (Barrett et al. 1994), which for any matrix M, is given by [diag(M)] ^-1. Let D_WLS = diag(H_WLS), then by applying this preconditioner to H_WLS we obtain $[D_{WLS}^{-1} H_{WLS}] σ = D_{WLS}^{-1} {σ̂}_{WLS} .$ $\begin{equation} \label{eq:imagingeqwlspre} [\bD_{\text{WLS}}^{-1}\HWLS]\bsigma=\bD_{\text{WLS}}^{-1}\SWLSDI . \end{equation}$ (33)We take a closer look at $D_{WLS}^{-1} {σ̂}_{WLS}$ $\hbox{$\bD_{\text{WLS}}^{-1}\SWLSDI$}$ for the case where K = 1. In this case, $\begin{matrix} H_{WLS} & = & (A_{1}^{*} ◦ A_{1})^{H} ({R̂}_{1}^{- T} \otimes {R̂}_{1}^{-1}) (A_{1}^{*} ◦ A_{1}) \\ = & (A_{1}^{T} {R̂}_{1}^{- T} A_{1}^{*}) ⊙ (A_{1}^{H} {R̂}_{1}^{-1} A_{1}) \end{matrix}$ $\begin{eqnarray*} \HWLS&=&(\bA_1^*\circ \bA_1)^H(\bRh_1^{-T} \otimes \bRh^{-1}_1)(\bA_1^*\circ \bA_1)\\ &=&(\bA_1^T\bRh_1^{-T}\bA_1^*)\odot(\bA_1^H\bRh_1^{-1}\bA_1) \end{eqnarray*}$ and $\begin{matrix} D_{WLS}^{-1} = [\begin{matrix} \end{matrix}] . \end{matrix}$ $\begin{eqnarray*} \bD_{\text{WLS}}^{-1}=\begin{bmatrix} \frac{1}{(\ba_{1,1}^H\bRh_1^{-1}\ba_{1,1})^2} & & \\ & \ddots & \\ & & \frac{1}{(\ba_{1,I}^H\bRh_1^{-1}\ba_{1,I})^2} \end{bmatrix}. \end{eqnarray*}$ This means that $\begin{matrix} D_{WLS}^{-1} {σ̂}_{WLS} & = & D_{WLS}^{-1} ({R̂}_{1}^{- T} \otimes {R̂}_{1}^{-1}) (A_{1}^{*} ◦ A_{1})^{H} {r̂}_{1} \\ = & ({R̂}_{1}^{- T} A_{1}^{*} D_{WLS}^{- 1 / 2} ◦ {R̂}_{1}^{-1} A_{1} D_{WLS}^{- 1 / 2})^{H} {r̂}_{1}, \end{matrix}$ $\begin{eqnarray*} \bD_{\text{WLS}}^{-1}\SWLSDI&=&\bD_{\text{WLS}}^{-1}(\bRh_1^{-T} \otimes \bRh^{-1}_1)(\bA_1^*\circ \bA_1)^H\brh_1 \\ &=&(\bRh_1^{-T}\bA_1^*\bD_{\text{WLS}}^{-1/2} \circ \bRh_1^{-1}\bA_1\bD_{\text{WLS}}^{-1/2})^H\brh_1, \end{eqnarray*}$ which is equivalent to a dirty image that is obtained by applying a beamformer of the form $w_{i} = \frac{1}{a_{1,i}^{H} {R̂}_{1}^{-1} a_{1,i}} {R̂}_{1}^{-1} a_{1,i}$ $\begin{equation} \bw_i = \frac{1}{\ba_{1,i}^H\bRh_1^{-1}\ba_{1,i}}\bRh_1^{-1}\ba_{1,i} \end{equation}$ (34)to both sides of $\hbox{$\bRh_{1}$}$ and stacking the results, $σ̂_{i} = w_{i}^{H} {R̂}_{1} w_{i}$ $\hbox{$\hat{\sigma}_i = \bw_i^H\bRh_{1}\bw_i$}$ , of each pixel into a vector. This beamformer is known in array processing as the minimum variance distortionless response (MVDR) beamformer (Capon 1969), and the corresponding dirty image is called the MVDR dirty image and was introduced in the radio astronomy context in Leshem & van der Veen (2000). This shows that the preconditioned WLS image (motivated by its connection to the maximum likelihood) is expected to exhibit the features of high-resolution beamforming associated with the MVDR. Examples of such images are shown in Sect. 6.

3.4. Bounds on the image

Another approach to improving the conditioning of a problem is to introduce appropriate constraints on the solution. Typically, image formation algorithms exploit external information regarding the image in order to regularize the ill-posed problem. For example, maximum entropy techniques (Frieden 1972; Gull & Daniell 1978) impose a smoothness condition on the image, while the CLEAN algorithm (Högbom 1974) exploits a point source model wherein most of the image is empty, and this has recently been connected to sparse optimization techniques (Wiaux et al. 2009).

A lower bound on the image is almost trivial: each pixel in the image represents the intensity at a certain direction, so is non-negative. This leads to a lower bound σ ≥ 0. Such a non-negativity constraint has been studied, for example, in (Briggs 1995), resulting in a non-negative LS (NNLS) problem $\begin{matrix} \min_{σ} \frac{1}{2 K} ∥ r̂ - Ψ σ ∥^{2} \\ subjectto 0 \leq σ \end{matrix},$ $\begin{equation} \begin{array}{l} \label{eq:NNLS} \displaystyle\min_{{\bsigma}}\, \frac{1}{2K}\| \brh - \bAt\bsigma\|^2 \\ \hbox{subject to } \vzero \le \bsigma \ena , \end{equation}$ (35)A second constraint follows if we also know an upper bound γ such that σ ≤ γ, which will bound the pixel intensities from above. We propose several choices for γ.

By closer inspection of the ith pixel of the MF dirty image $\hbox{$\SMFDI$}$ , we note that its expected value is given by $\begin{matrix} σ_{MF,i} = \frac{1}{K} \sum_{k} a_{k,i}^{H} R_{k} a_{k,i} . \end{matrix}$ $\begin{eqnarray*} \MFDIind{i} =\frac{1}{K}\sum_k\ba^H_{k,i} \bR_k \ba_{k,i} \,. \end{eqnarray*}$ Using $a_{i} = vect (A^{i}) = {[\begin{matrix} a_{1,i}^{T} & ... & a_{i,K}^{T} \end{matrix}]}^{T},$ $\begin{equation} \label{eq:longa} \ba_i = \vect(\bA^i) = \begin{bmatrix}\ba_{1,i}^T & \dots & \ba_{i,K}^T \end{bmatrix}^T, \end{equation}$ (36)and the normalization $a_{k,i}^{H} a_{k,i} = 1$ $\hbox{$\ba_{k,i}^H \ba_{k,i} = 1$}$ , we obtain $σ_{MF,i} = \frac{1}{K} a_{i}^{H} R a_{i} = σ_{i} + \frac{1}{K} a_{i}^{H} R_{r} a_{i},$ $\begin{equation} \label{eq:MFupper} \MFDIind{i} =\frac{1}{K}\ba_{i}^H\bR\ba_i=\sigma_{i}+\frac{1}{K}\ba_{i}^H\bR_r\ba_i, \end{equation}$ (37)where $R_{r} = \sum_{j \neq i} σ_{j} (I_{K} ◦ A^{j}) (I_{K} ◦ A^{j})^{H} + R n$ $\begin{equation} \bR_{r}=\sum_{j \neq i} \sigma_j (\bI_K \circ \bA^j)(\bI_K \circ \bA^j)^H + \bR_{n} \end{equation}$ (38)is the contribution of all other sources and the noise. We note that R_r is positive-(semi)definite. Thus, Eq. (37) implies σ_MF,i ≥ σ_i which means that the expected value of the MF dirty image forms an upper bound for the desired image, or $σ \leq σ_{MF} .$ $\begin{equation} \label{eq:mfupper} \bsigma \leq \MFDI . \end{equation}$ (39)Using the relation between the MF dirty image and beamformers as discussed in Sect. 3.2, we answer the following question: What is the tightest upper bound for σ_i that we can construct using linear beamforming?

This question can be translated into an optimization problem and a closed form solution (Appendix B) exists: $σ_{opt,i} = \min_{k} (\frac{1}{a_{k,i}^{H} R_{k}^{-1} a_{k,i}}) \cdot$ $\begin{equation} \label{eq:ASSC0} \sigma_{\text{opt},i}=\min_k \left(\frac{1}{\ba_{k,i}^H\bR_k^{-1}\ba_{k,i}}\right)\cdot \end{equation}$ (40)Here σ_opt,i is the tightest upper bound, and the beamformer that achieves this bound is called the adaptive selective sidelobe canceller (ASSC; Levanda & Leshem 2013).

One problem with using this result in practice is that σ_opt,i depends on a single snapshot. Actual dirty images are based on the sample covariance matrix $\hbox{$\bRh$}$ , so they are random variables. If we use a sample covariance matrix $\hbox{$\bRh$}$ instead of the true covariance matrix R in Eq. (40), the variance of the result can be unacceptably large. An analysis of this problem and various solutions for it are discussed in Levanda & Leshem (2013).

To reduce the variance we tolerate an increase of the bound with respect to the tightest upper bound, however, we would like our result to be tighter than the MF dirty image. It can be shown that the MVDR dirty image defined as $σ_{MVDR,i} = \frac{1}{\frac{1}{K} \sum_{k} a_{k,i}^{H} R_{k}^{-1} a_{k,i}},$ $\begin{equation} \MVDRDIind{i} = \frac{1}{\frac{1}{K}\sum_k\ba_{k,i}^H\bR_{k}^{-1}\ba_{k,i}} , \end{equation}$ (41)satisfies σ_i ≤ σ_MVDR,i ≤ σ_MF,i and produces a very tight bound (see Appendix B for the proof). This leads to the following constraint $σ \leq σ_{MVDR} .$ $\begin{equation} \label{eq:mvdr_upper} \bsigma \le \MVDRDI . \end{equation}$ (42)Interestingly, for K = 1 the MVDR dirty image is the same image as we obtained earlier by applying a Jacobi preconditioner to the WLS problem.

3.5. Estimation of the upper bound from noisy data

The upper bounds (39)and (42) assume that we know the true covariance matrix R. However, in practice we only measure $\hbox{$\bRh$}$ , which is subject to statistical fluctuations. Choosing a confidence level of six times the standard deviation of the dirty images ensures that the upper bound will hold with a probability of 99.9%.

This leads to an increase in the upper bound by a factor 1 + α where α> 0 is chosen such that $σ \leq (1 + α) {σ̂}_{MF} .$ $\begin{equation} \label{eq:noisymfup} \bsigma \leq (1+\alpha) ~ \SMFDI. \end{equation}$ (43)Similarly, for the MVDR dirty image, the constraint based on $\hbox{$\bRh$}$ is $σ \leq (1 + α) {σ̂}_{MVDR},$ $\begin{equation} \label{eq:noisymvdrup} \bsigma \leq (1+\alpha) ~ \UMVDRDI , \end{equation}$ (44)where $σ̂ MVDR,i = \frac{C}{\frac{1}{K} \sum_{k} a_{k,i}^{H} {R̂}_{k}^{-1} a_{k,i}}$ $\begin{equation} \UMVDRDIind{i} = \frac{C}{\frac{1}{K}\sum_k\ba_{k,i}^H\bRh_{k}^{-1}\ba_{k,i}} \end{equation}$ (45)is an unbiased estimate of the MVDR dirty image, and $C = \frac{N}{N - P}$ $\begin{equation} C=\frac{N}{N-P} \end{equation}$ (46)is a bias correction constant. With some algebra the unbiased estimate can be written in vector form as ${σ̂}_{MVDR} = D^{-1} Ψ^{H} ({R̂}^{- T} \otimes {R̂}^{-1}) r̂,$ $\begin{equation} \label{eq:UBMVDRV} \UMVDRDI=\bD^{-1}\bAt^H(\bRh^{-T} \otimes \bRh^{-1})\brh, \end{equation}$ (47)where $D = \frac{1}{KC} {diag}^{2} (A^{H} {R̂}^{-1} A^{)},$ $\begin{equation} \label{eq:D} \bD =\frac{1}{KC}\diag^2\left( \bA^H\bRh^{-1}\bA\right), \end{equation}$ (48)and $\begin{matrix} A & = & {[\begin{matrix} \end{matrix}]}^{T} \\ = & [\begin{matrix} \end{matrix}] . \end{matrix}$ $\begin{eqnarray} \bA&=&\begin{bmatrix}\bA_1^T & \dots &\bA_K^T\end{bmatrix}^T \notag \\ &=&\begin{bmatrix} \ba_1 & \dots & \ba_I \end{bmatrix}. \end{eqnarray}$ (49)The exact choice of α and C are discussed in Appendix C.

3.6. Constrained least squares imaging

Now that we have lower and upper bounds on the image, we can use these as constraints in the LS imaging problem to provide a regularization. The resulting constrained LS (CLS) imaging problem is $\begin{matrix} \min_{σ} \frac{1}{2 K} ∥ r̂ - Ψ σ ∥^{2} \\ s . t . 0 \leq σ \leq γ \end{matrix},$ $\begin{equation} \label{eq:CLS} \begin{array}{l} \displaystyle\min_{{\bsigma}} \frac{1}{2K}\| \brh - \bAt\bsigma\|^2 \\ \hbox{s.t. } \vzero \le \bsigma \le \bgamma \end{array} , \end{equation}$ (50)where γ can be chosen either as γ = σ_MF for the MF dirty image or γ = σ_MVDR for the MVDR dirty image (or their sample covariance based estimates given by formulae (43)and (44)).

The improvements to the unconstrained LS problem that were discussed in Sect. 3.2 are still applicable. The extension to WLS leads to the cost function

$f_{WLS} (σ) = \frac{1}{2} ∥ ({R̂}^{- T / 2} \otimes {R̂}^{- 1 / 2}) (r̂ - Ψ σ^{)} ∥^{2} .$ $\begin{equation} \label{def:vecfWLS} f_{\text{WLS}}(\bsigma)=\frac{1}{2}\| (\bRh^{-T/2} \otimes \bRh^{-1/2}) \left(\brh-\bAt\bsigma\right)\|^2 . \end{equation}$ (51)The constrained WLS problem is then given by $\begin{matrix} \min_{σ} f_{WLS} (σ) \\ s . t . 0 \leq σ \leq γ . \end{matrix}$ $\begin{equation} \label{eq:WLS} \begin{array}{ll} \displaystyle\min_{{\bsigma}} f_{\text{WLS}}(\bsigma) \\ \hbox{s.t. } \vzero \le \bsigma \le \bgamma . \end{array} \end{equation}$ (52)We also recommend including a preconditioner that, as shown in Sect. 3.3, relates the WLS to the MVDR dirty image. However, because of the inequality constraints, (52)does not have a closed form solution, and it is solved by an iterative algorithm. To have the relation between WLS and MVDR dirty image during the iterations, we introduce a change of variable of the form $\begin{matrix} _{ˇ} \\ σ \end{matrix} = D σ$ $\hbox{$\PWLSsigma=\bD \bsigma$}$ , where $\begin{matrix} ˇ \\ σ \end{matrix}$ $\hbox{$\PWLSsigma$}$ is the new variable for the preconditioned problem and the diagonal matrix D is given in (48). The resulting constrained preconditioned WLS (CPWLS) optimization problem is $\begin{matrix} \begin{matrix} ˇ \\ σ \end{matrix} = & \arg \min_{\begin{matrix} ˇ \\ σ \end{matrix}} \frac{1}{2} ∥ ({R̂}^{- T / 2} \otimes {R̂}^{- 1 / 2}) (r̂ - Ψ D^{-1} {\begin{matrix} ˇ \\ σ \end{matrix}}^{)} ∥^{2} \\ s . t . 0 \leq \begin{matrix} ˇ \\ σ \end{matrix} \leq D γ \end{matrix},$ $\begin{equation} \begin{array}{rl} \label{eq:preWLS} \PWLSsigma=& \displaystyle\arg \min_{\PWLSsigma} \frac{1}{2}\| (\bRh^{-T/2} \otimes \bRh^{-1/2}) \left(\brh-\bAt\bD^{-1}\PWLSsigma\right)\|^2 \\ & \hbox{s.t. } \vzero \le\PWLSsigma\le \bD \bgamma \end{array} , \end{equation}$ (53)and the final image is found by setting $σ = D^{-1} \begin{matrix} _{ˇ} \\ σ \end{matrix}$ $\hbox{$\bsigma=\bD^{-1} \PWLSsigma$}$ . (Here we use D as a positive diagonal matrix so that the transformation to an upper bound for $\begin{matrix} ˇ \\ σ \end{matrix}$ $\hbox{$\PWLSsigma$}$ is correct.) Interestingly, the dirty image that follows from the (unconstrained) WLS part of the problem is given by the MVDR image $\hbox{$\UMVDRDI$}$ in (47).

4. Constrained optimization using an active set method

The constrained imaging formulated in the previous section requires the numerical solution of the optimization problems (50) or (53). The problem is classified as a positive definite quadratic program with simple bounds, this is a special case of a convex optimization problem with linear inequality constraints, and we can follow standard approaches to find a solution (Gill et al. 1981; Boyd & Vandenberghe 2004).

For an unconstrained optimization problem, the gradient of the cost function calculated at the solution must vanish. If we are not yet at the optimum in an iterative process, the gradient is used to update the current solution. For constrained optimization, the constraints are usually added to the cost function using (unknown) Lagrange multipliers that need to be estimated along with the solution. At the solution, part of the gradient of the cost function is not zero but related to the nonzero Lagrange multipliers. For inequality constraints, the sign of the Lagrange multipliers plays an important role.

As we show here, these characteristics of the solution (based on the gradient and the Lagrange multipliers) can be used to develop an algorithm called the active set method, which is closely related to the sequential source removing techniques such as CLEAN.

In this section, we use the active set method to solve the constrained optimization problem.

4.1. Characterization of the optimum

Let $\bar{σ}$ $\hbox{$\bar{\bsigma}$}$ be the solution to the optimization problem (50) or (53). An image is called feasible if it satisfies the bounds σ ≥ 0 and −σ ≥ −γ. At the optimum, some pixels may satisfy a bound with equality, and these are called the “active” pixels.

We use the following notation. For any feasible image σ, let $\begin{matrix} ℒ (σ) & = & {i | σ_{i} = 0} \\ 𝒰 (σ) & = & {i | σ_{i} = γ_{i}} \\ 𝒜 (σ) & = & ℒ (σ) \cup 𝒰 (σ) \\ ℱ (σ) & = & ℐ \ 𝒜 (σ) . \end{matrix}$ $\begin{eqnarray} \MCL(\bsigma)&=&\{i\,|\,\sigma_{i} = 0\}\\ \MCU(\bsigma)&=&\{i\,|\,\sigma_{i} = \gamma_i\}\\ \MCA(\bsigma)&=&\MCL(\bsigma) \cup \MCU(\bsigma)\\ \label{eq:freesetF} \MCF(\bsigma)&=&\MCI \setminus \MCA(\bsigma) . \end{eqnarray}$ Here, ℐ = { 1,··· ,I } is the set of all pixel indices; ℒ(σ) is the set where the lower bound is active, i.e., the pixel value is 0; $\hbox{$\MCU(\bsigma)$}$ is the set of pixels that attain the upper bound; $\hbox{$\MCA(\bsigma)$}$ is the set of all pixels where one of the constraints is active. These are the active pixels. The free set ℱ(σ) is the set of pixels i, which have values strictly between 0 and γ_i. Furthermore, for any vector v = [v_i], let v_ℱ correspond to the subvector with indices i ∈ ℱ, and similarly define v_ℒ and $\hbox{$\bv_\MCU$}$ . We write $\hbox{$\bv = \bv_\MCF \oplus \bv_\MCL \oplus \bv_\MCU$}$ .

Let $\bar{σ}$ $\hbox{$\bar{\bsigma}$}$ be the optimum, and let $\bar{g} = g (\bar{σ})$ $\hbox{$\bar{\bg} = \bg(\bar{\bsigma})$}$ be the gradient of the cost function at this point. Define the free sets and active sets $\hbox{$\MCF, \MCL, \MCU$}$ at $\bar{σ}$ $\hbox{$\bar{\bsigma}$}$ . We can write $\bar{g} = \bar{g}_{ℱ} \oplus \bar{g}_{ℒ} \oplus \bar{g}_{𝒰}$ $\hbox{$\bar{\bg} = \bar{\bg}_\MCF \oplus \bar{\bg}_\MCL \oplus \bar{\bg}_\MCU$}$ . Associated with the active pixels of $\bar{σ}$ $\hbox{$\bar{\bsigma}$}$ is a vector $\bar{λ} = \bar{λ}_{ℒ} \oplus \bar{λ}_{𝒰}$ $\hbox{$\bar{\blambda} = \bar{\blambda}_\MCL \oplus \bar{\blambda}_\MCU$}$ of Lagrange multipliers. Optimization theory (Gill et al. 1981) tells us that the optimum $\bar{σ}$ $\hbox{$\bar{\bsigma}$}$ is characterized by the following conditions: $\begin{matrix} g_{ℱ} (\bar{σ}) = 0 \\ \bar{λ} ℒ = \bar{g} ℒ \geq 0 \\ \bar{λ} 𝒰 = - \bar{g} 𝒰 \geq 0. \end{matrix}$ $\begin{eqnarray} \label{eq:gradFcond} & & \bg_{\MCF}(\bar{\bsigma}) = \zeros \\ \label{eq:etaL} & & \bar{\blambda}_\MCL = \bar{\bg}_{\MCL} \ge \zeros \\ \label{eq:etaU} && \bar{\blambda}_\MCU = -\bar{\bg}_{\MCU} \ge \zeros . \end{eqnarray}$ Thus, the part of the gradient corresponding to the free set is zero, but the part of the gradient corresponding to the active pixels is not necessarily zero. Since we have simple bounds, this part becomes equal to the Lagrange multipliers $\bar{λ}_{ℒ}$ $\hbox{$\bar{\blambda}_\MCL$}$ and $- \bar{λ}_{𝒰}$ $\hbox{$-\bar{\blambda}_\MCU$}$ (the negative sign is caused by the condition $\hbox{$-\bsigma_\MCU \ge -\bgamma_\MCU$}$ ). The condition λ ≥ 0 is crucial: a negative Lagrange multiplier would indicate that there is a feasible direction of descent p for which a small step in that direction, $\bar{σ} + μ p$ $\hbox{$\bar{\bsigma} + \mu \bp$}$ , has a lower cost and still satisfies the constraints, thus contradicting optimality of $\bar{σ}$ $\hbox{$\bar{\bsigma}$}$ (Gill et al. 1981).

“Active set” algorithms consider that if the true active set at the solution is known, the optimization problem with inequality constraints reduces to an optimization with equality constraints, $\begin{matrix} z & = & \arg \min_{σ} f (σ) \\ s . t . & σ_{ℒ} = 0, σ_{𝒰} = γ_{𝒰} . \end{matrix}$ $\begin{eqnarray} \label{eq:subproblem} \bz &= & \arg\min_{\bsigma}f(\bsigma)\\ \text{s.t.} ~&& \bsigma_{{\MCL}}=\zeros ,\; \bsigma_{{\MCU}}=\bgamma_{\MCU} . \notag \end{eqnarray}$ (61)Since we can substitute the values of the active pixels into σ, the problem becomes a standard unconstrained LS problem with a reduced dimension: only $\bar{σ}_{ℱ}$ $\hbox{$\bar{\bsigma}_\MCF$}$ needs to be estimated. Specifically, for CLS the unconstrained subproblem is formulated as $f (σ) = \frac{1}{2 K} ∥ b_{LS} - Ψ_{ℱ} σ_{ℱ} ∥ 2$ $\begin{equation} \label{eq:subproblemLS} f(\bsigma) = \frac{1}{2K}\|\bb_{\text{LS}}-\bAt_{\MCF}\bsigma_{\MCF} \|^2 \end{equation}$ (62)where $b_{LS} = r̂ - Ψ_{𝒰} σ_{𝒰} .$ $\begin{equation} \label{eq:bsubproblemLS} \bb_{\text{LS}}=\brh-\bAt_{\MCU}\bsigma_{\MCU}. \end{equation}$ (63)Similarly, for CPWLS we have $f (\begin{matrix} ˇ \\ σ \end{matrix}) = \frac{1}{2} {\begin{matrix} ∥ \\ ∥ \\ ∥ \\ ∥ \end{matrix} b_{PWLS} - ({R̂}^{- T / 2} \otimes {R̂}^{- 1 / 2}) (Ψ D^{-1})_{ℱ} {\begin{matrix} ˇ \\ σ \end{matrix}}_{ℱ} \begin{matrix} ∥ \\ ∥ \\ ∥ \\ ∥ \end{matrix}}^{2},$ $\begin{equation} \label{eq:subproblemMVDR} f(\PWLSsigma) = \frac{1}{2}\left \|\bb_{\text{PWLS}}- \left (\bRh^{-T/2}\otimes \bRh^{-1/2} \right) (\bAt\bD^{-1})_{\MCF}\PWLSsigma_{\MCF}\right \|^2 , \end{equation}$ (64)where $b_{PWLS} = ({R̂}^{- T / 2} \otimes {R̂}^{- 1 / 2}) (r̂ - (Ψ D^{-1})_{𝒰} {\begin{matrix} ˇ \\ σ \end{matrix}}_{𝒰}) .$ $\begin{equation} \label{eq:bsubproblemMVDR} \bb_{\text{PWLS}}=\left (\bRh^{-T/2}\otimes \bRh^{-1/2} \right) (\brh-(\bAt\bD^{-1})_{\MCU}\PWLSsigma_{\MCU}) . \end{equation}$ (65)In both cases, closed form solutions can be found, and we discuss a suitable Krylov-based algorithm for this in Sect. 5.

As a result, the essence of the constrained optimization problem is to find ℒ, $\hbox{$\MCU$}$ , and ℱ. In the literature, algorithms for this are called “active set methods”, and we propose a suitable algorithm in Sect. 4.3.

4.2. Gradients

We first derive expressions for the gradients required for each of the unconstrained subproblems (62) and (64). Generically, a WLS cost function (as function of a real-valued parameter vector θ) has the form $f (θ)_{WLS} = β ∥ G^{1 / 2} c (θ) ∥^{2} = β c (θ)^{H} G c (θ)$ $\begin{equation} f(\btheta)_{\text{WLS}} = \beta \|\bG^{1/2}\bc(\btheta)\|^2 = \beta\bc(\btheta)^H\bG\bc(\btheta) \end{equation}$ (66)where G is a Hermitian weighting matrix and β is a scalar. The gradient of this function is $g (θ) = 2 β {(\frac{\partial c}{\partial θ^{T}})}^{H} G c .$ $\begin{equation} \label{eq:gWLS} \bg(\btheta) = 2\beta\left(\frac{\partial \bc}{\partial \btheta^T}\right)^H\bG\bc \,. \end{equation}$ (67)For LS we have θ = σ, $\hbox{$\bc=\brh-\bAt\bsigma$}$ , $β = \frac{1}{2 K}$ $\hbox{$\beta=\frac{1}{2K}$}$ , and G = I. This leads to $\begin{matrix} g_{LS} (σ) & = & - \frac{1}{K} Ψ^{H} (r̂ - Ψ σ) \\ = & H_{LS} σ - {σ̂}_{MF} . \end{matrix}$ $\begin{eqnarray} \label{eq:gls} \bg_{\text{LS}}(\bsigma) & =&-\frac{1}{K}\bAt^H(\brh-\bAt\bsigma) \notag \\ &=&\bH_{\text{LS}}\bsigma-\SMFDI. \end{eqnarray}$ (68)For PWLS, $θ = \begin{matrix} _{ˇ} \\ σ \end{matrix}$ $\hbox{$\btheta=\PWLSsigma$}$ , $c = r̂ - Ψ D^{-1} \begin{matrix} _{ˇ} \\ σ \end{matrix}$ $\hbox{$\bc=\brh-\bAt\bD^{-1}\PWLSsigma$}$ , $β = \frac{1}{2}$ $\hbox{$\beta=\frac{1}{2}$}$ , and $\hbox{$\bG=\bRh^{-T} \otimes \bRh^{-1}$}$ . Substituting these into Eq. (67), we obtain $\begin{matrix} g_{PWLS} (\begin{matrix} ˇ \\ σ \end{matrix}) & = & - D^{-1} Ψ^{H} ({R̂}^{- T} \otimes {R̂}^{-1}) (r̂ - Ψ D^{-1} \begin{matrix} ˇ \\ σ \end{matrix}) \\ = & H_{PWLS} \begin{matrix} ˇ \\ σ \end{matrix} - {σ̂}_{MVDR} \end{matrix}$ $\begin{eqnarray} \label{eq:gmvdr} \bg_{\text{PWLS}}(\PWLSsigma) &=&-\bD^{-1}\bAt^H(\bRh^{-T} \otimes \bRh^{-1})(\brh-\bAt\bD^{-1}\PWLSsigma) \notag \\ &=&\bH_{\text{PWLS}}\PWLSsigma-\UMVDRDI \end{eqnarray}$ (69)where $H_{PWLS} = D^{-1} Ψ^{H} ({R̂}^{- T} \otimes {R̂}^{-1}) Ψ D^{-1},$ $\begin{equation} \bH_{\text{PWLS}} = \bD^{-1}\bAt^H(\bRh^{-T} \otimes \bRh^{-1})\bAt\bD^{-1}, \end{equation}$ (70)and we used Eq. (47).

An interesting observation is that the gradients can be interpreted as residual images obtained by subtracting the dirty image from a convolved model image. At a later point, this will allow us to relate the active set method to sequential source removing techniques.

4.3. Active set methods

In this section, we describe the steps needed to find the sets ℒ, $\hbox{$\MCU$}$ and, ℱ, and the solution. We follow the template algorithm proposed in Gill et al. (1981). The algorithm is an iterative technique where we gradually improve on an image. Let the image at iteration j be denoted by σ^(j) where j = 1,2,··· , and we always ensure this is a feasible solution (satisfies 0 ≤ σ^(j) ≤ γ). The corresponding gradient is the vector g = g(σ^(j)), and the current estimate of the Lagrange multipliers λ is obtained from g using (59) and (60). The sets ℒ, $\hbox{$\MCU$}$ , and ℱ are current estimates that are not yet necessarily equal to the true sets.

If this image is not yet the true solution, it means that one of the conditions in Eqs. (58)−(60) is violated. If the gradient corresponding to the free set is not yet zero (g_ℱ ≠ 0), then this is remedied by recomputing the image from the essentially unconstrained subproblem (61). It may also happen that some entries of λ are negative. This implies that we do not yet have the correct sets ℒ, $\hbox{$\MCU$}$ , and ℱ. Suppose λ_i< 0. The connection of λ_i to the gradient indicates that the cost function can be reduced in that dimension without violating any constraints (Gill et al. 1981), at the same time making the pixel no longer active. Thus we remove the ith pixel from the active set, add it to the free set, and recompute the image with the new equality constraints using Eq. (61). As discussed later, a threshold ϵ is needed in the test for the negativity of λ_i, therefore this step is called the “detection problem”.

Table 1 summarizes the resulting active set algorithm and describes how the solution z to the subproblem is used at each iteration. Some efficiency is obtained by not computing the complete gradient g at every iteration, but only the parts corresponding to $\hbox{$\MCL, \MCU$}$ , when they are needed. For the part corresponding to ℱ, we use a flag that indicates whether g_ℱ is zero or not.

Table 1

Constrained LS imaging using active sets.

The iterative process is initialized in line 1. This can be done in many ways. As long as the initial image lies within the feasible region (0 ≤ σ⁽⁰⁾ ≤ γ), the algorithm will converge to a constrained solution. We can simply initialize by σ⁽⁰⁾ = 0.

Line 3 is a test for convergence, corresponding to the conditions (58)−(60). The loop is followed while any of the constraints is violated.

If g_ℱ is not zero, then the unconstrained subproblem (61) is solved in line 5. If this solution z satisfies the feasibility constraints, then it is kept, the image is updated accordingly, and the gradient is estimated at the new solution (only λ_min = min(λ) is needed, along with the corresponding pixel index).

If z is not feasible, then in lines 12−16 we try to move in the direction of z as far as possible. The direction of descent is $p = z - σ_{ℱ}^{(j)}$ $\hbox{$\bp = \bz - \bsigma_\MCF^{(j)}$}$ , and the update will be $σ_{ℱ}^{(j + 1)} = σ_{ℱ}^{(j)} + μ p$ $\hbox{$\bsigma_{\MCF}^{(j+1)}= \bsigma_{\MCF}^{(j)}+\mu\bp$}$ , where μ is a non-negative step size. The ith pixel will hit a bound if either $σ_{i}^{(j)} + μ p_{i} = 0$ $\hbox{$\sigma_i^{(j)} + \mu p_i = 0$}$ or $σ_{i}^{(j)} + μ p_{i} = γ_{i}$ $\hbox{$\sigma_i^{(j)} + \mu p_i = \gamma_i$}$ ; i.e., if $μ_{i} = \begin{matrix} \max \end{matrix} (- \frac{σ_{i}^{(j)}}{p_{i}}, \frac{γ_{i} - σ_{i}^{(j)}}{p_{i}})$ $\begin{equation} \label{eq:mumax} \mu_i=\max \left(-\frac{\sigma_i^{(j)}}{p_i} , \frac{\gamma_i-\sigma_i^{(j)}}{p_i}\right) \end{equation}$ (71)(μ_i is non-negative). Then the maximal feasible step size towards a constraint is given by μ_max = min(μ_i) for i ∈ ℱ. The corresponding pixel index is removed from ℱ and added to ℒ or $\hbox{$\MCU$}$ .

If in line 3 the gradient satisfied g_ℱ = 0 but a Lagrange multiplier is negative, we delete the corresponding constraint and add this pixel index to the free set (line 20). After this, the loop is entered again with the new constraint sets.

If we initialize the algorithm with σ⁽⁰⁾ = 0, then all pixel indices will be in the set ℒ, and the free set is empty. During the first iteration, σ_ℱ remains empty but the gradient is computed (line 9). Equations (68)and (69)show that it will be equal to the negated dirty image. Thus the minimum of the Lagrange multipliers λ_min will be the current strongest source in the dirty image, and it will be added to the free set when the loop is entered again. This shows that the method as described above will lead to a sequential source removal technique similar to CLEAN. In particular, the PWLS cost function (69)relates to LS-MVI (Ben-David & Leshem 2008), which applies CLEAN-like steps to the MVDR dirty image.

In line 3, we try to detect whether a pixel should be added to the free set (λ_min< 0). We note that λ follows from the gradient, (68)or (69), which is a random variable. We should avoid the occurrence of a “false alarm”, because it will lead to overfitting the noise. Therefore, the test should be replaced by λ_min< −ϵ, where ϵ> 0 is a suitable detection threshold. Because the gradients are estimated using dirty images, they share the same statistics (the variance of the other component in Eqs. (68)and (69)is much smaller). To reach a desired false alarm rate, we propose to choose ϵ proportional to the standard deviation of the ith pixel on the corresponding dirty image for the given cost function. (How to estimate the standard deviation of the dirty images and the threshold is discussed in Appendix C.) Choosing ϵ to be six times the standard deviation ensures a false alarm of < 0.1% over the complete image.

The use of this statistic improves the detection and thus the estimates greatly, however the correct detection also depends on the quality of the estimates in the previous iterations. If a strong source is off-grid, the source is usually underestimated, and this leads to a biased estimation of the gradient and the Lagrange multipliers, which in turn leads to including pixels that are not real sources. In the next section we describe one possible solution for this case.

4.4. Strong off-grid sources

In this section, we use a tilde to indicate “true” source parameters (as distinguished from the gridded source model); for example, $\begin{matrix} ˜ \\ σ \end{matrix}$ $\hbox{$\bsigmat$}$ indicates the vector with the true source intensities, and $\begin{matrix} ˜ \\ Σ \end{matrix}$ $\hbox{$\boldsymbol{\tilde{\Sigma}}$}$ the corresponding diagonal matrix, ${\begin{matrix} ˜ \\ a \end{matrix}}_{k,q}$ $\hbox{$\bat_{k,q}$}$ indicates their array response vectors and ${\begin{matrix} ˜ \\ A \end{matrix}}_{k}$ $\hbox{$\bAtt_k$}$ the corresponding matrix. The versions without tilde refers to the I gridded sources.

The mismatch between Ψ and the unknown $\begin{matrix} ˜ \\ Ψ \end{matrix}$ $\hbox{$\bPsit$}$ results in underestimating source intensities, which means that the remaining contribution of that source produces bias and possible artifacts in the image. To achieve high dynamic ranges, we suggest finding a grid correction for the pixels in the free set ℱ.

Let a_k,i have the same model as ${\begin{matrix} ˜ \\ a \end{matrix}}_{k,q}$ $\hbox{$\bat_{k,q}$}$ with β_i pointing toward the center of the ith pixel. When a source is within a pixel but not exactly in the center, we can model this mismatch as $\begin{matrix} {\begin{matrix} ˜ \\ a \end{matrix}}_{k,q} & = & \frac{1}{\sqrt{P}} e^{\frac{j 2 π}{λ} Ξ^{T} Q_{k} (β_{i} + δ_{i})} \\ = & a_{k,i} ⊙ e^{\frac{j 2 π}{λ} Ξ^{T} Q_{k} δ_{i}} \end{matrix}$ $\begin{eqnarray} \bat_{k,q}&=&\frac{1}{\sqrt{P}}{\rm e}^{\frac{j2\pi}{\lambda}\bXi^T\bQ_{k}(\bbeta_i+\bdelta_i)}\nonumber\\ &=& \ba_{k,i} \odot {\rm e}^{\frac{j2\pi}{\lambda}\bXi^T\bQ_{k}\bdelta_i}\nonumber \end{eqnarray}$ where $δ_{i} = {\begin{matrix} _{˜} \\ β \end{matrix}}_{q} - β_{i}$ $\hbox{$\bdelta_i=\tilde{\bbeta}_q-\bbeta_i$}$ and i ∈ ℱ. Because both β_i and $\begin{matrix} ˜ \\ β_{q} \end{matrix}$ $\hbox{$\tilde{\bbeta_q}$}$ are 3 × 1 unit vectors, each only has two degrees of freedom. This means that we can parameterize the unknowns for the grid-correcting problem using coefficients δ_1,i and δ_i,2. We assume that when a source is added to the free set, its actual position is very close to the center of the pixel on which it was detected. This means that δ_1,i and δ_i,2 are within the pixel’s width, denoted by W, and height, denoted by H. In this case we can replace Eq. (61)by a nonlinear constrained optimization, $\begin{matrix} \min_{δ, σ} & \frac{1}{2} ∥ b - Ψ (δ)_{ℱ} σ_{ℱ} ∥_{2}^{2} \\ s . t . & - W / 2 < δ_{1,i} < W / 2 \\ - H / 2 < δ_{i, 2} < H / 2 \end{matrix}$ $\begin{eqnarray} \label{eq:doals} \min_{\bdelta,\bsigma} &&\frac{1}{2}\| \bb - \bAt(\bdelta)_{\MCF}\bsigma_{{\MCF}}\|_2^2 \notag \\ \text{s.t.}&&-W/2<\delta_{1,i}<W/2\notag\\ &&-H/2<\delta_{i,2}<H/2 \end{eqnarray}$ (72)where Ψ(δ)_ℱ contains only the columns corresponding to the set ℱ, δ_j is a vector obtained by stacking δ_i,j for j = 1,2, and $b = r̂ - Ψ_{𝒰} σ_{𝒰} .$ $\begin{equation} \bb=\brh-\bAt_{\MCU}\bsigma_{\MCU}. \end{equation}$ (73)This problem can also be seen as a direction of arrival (DOA) estimation that is an active research area and beyond the scope of this paper. A good review of DOA mismatch correction for MVDR beamformers can be found in Chen & Vaidyanathan (2007), and Gu & Leshem (2012) proposed a correction method that is specifically applicable to the radio astronomical context.

Besides solving formula (72)instead of Eq. (61)in line 5 of the active set method, we also need to update the upper bounds and the standard deviations of the dirty images at the new pixel positions that are used in the other steps (e.g., lines 3, 6, and 13); the rest of the steps remain the same. Because we have a good initial guess to where each source in the free set is, we propose a Newton-based algorithm to do the correction.

4.5. Boxed imaging

A common practice in image deconvolution techniques like CLEAN is to use a priori knowledge and to narrow the search area for the sources to a certain region of the image, called CLEAN boxes. Because the contribution of the sources (if any) outside these boxes is assumed to be known, we can subtract them from the data such that we can assume that the intensity outside the boxes is zero.

To include these boxes in the optimization process of the active set algorithm, it is sufficient to make sure that the value of the pixels not belonging to these boxes do not change and remain zero. This is equivalent to replacing Ψ with Ψ_ℬ, where ℬ is the set of indices belonging to the boxes, before we start the optimization process. However, as we explain in the next section, we avoid storing the matrix Ψ in memory by exploiting its Khatri-Rao structure. We address this implementation issue by replacing Eq. (57)with $ℱ (σ) = (ℐ \ 𝒜 (σ)) \cap ℬ,$ $\begin{equation} \MCF(\bsigma)=(\MCI \setminus \MCA(\bsigma)) \cap \MCB , \end{equation}$ (74)which makes certain that the values of the elements outside of the boxes do not change. This has the same effect as removing the columns not belonging to ℬ from Ψ. Of course we have to make sure that these values are initialized to zero. By choosing σ⁽⁰⁾ = 0, this is automatically the case. The only problem with this approach is that the values outside the box remain in the set ℒ that is used for estimating the Lagrange variables, resulting in expensive calculations that are not needed. This problem is easily solved by calculating the gradient only for the pixels belonging to ℬ. The a priori non-zero values of the pixels (that were not in the boxes and were removed from the data) are added to the solution when the optimization process is finished.

5. Implementation using Krylov subspace-based methods

From the active-set methods described in the previous section, we know that we need to solve Eqs. (62)or (64)at each iteration. In this section we describe how to achieve this efficiently without the need to store the whole convolution matrix in memory.

During the active-set updates, we need to solve linear equations of the form Mx = b. However, there are cases where we do not have direct access to the elements of the matrix M. This can happen, for example, when M is too large to fit in memory. There are also cases where M (or M^H) are implemented as subroutines that produce the result of the matrix vector multiplication Mv for some input vector v. For example, for M = Ψ the operation Ψ^Hv generates a dirty image. An equivalent (and maybe optimized) implementation of such imaging subroutine might already be available to the user. In these scenarios it is necessary or beneficial to be able to solve the linear systems, using only the available matrix vector multiplication or the equivalent operator. A class of iterative solvers that can solve a linear system by only having access to the result of the multiplications with the matrix M are the Krylov subspace-based methods.

To illustrate the idea behind Krylov subspace-based methods, we assume that M is a square and non-singular matrix. In this case there exists a unique solution for x that is given by x = M^-1b. Using the minimum polynomial of a matrix we can write $\begin{matrix} M^{-1} = \frac{1}{γ_{0}} \sum_{j = 0}^{m - 1} γ_{j + 1} M^{j}, \end{matrix}$ $\begin{eqnarray*} \bM^{-1}=\frac{1}{\gamma_0}\sum_{j=0}^{m-1}\gamma_{j+1}\bM^j, \end{eqnarray*}$ where for a diagonalizable matrix M, m is the number of distinct eigenvalues (Ipsen & Meyer 1998). Using this polynomial expansion we have for our solution $\begin{matrix} x & = & \frac{1}{γ_{0}} \sum_{j = 0}^{m - 1} γ_{j + 1} M^{j} b \\ = & [\begin{matrix} \end{matrix}] γ \end{matrix}$ $\begin{eqnarray*} \bx&=&\frac{1}{\gamma_0}\sum_{j=0}^{m-1}\gamma_{j+1}\bM^j\bb \\ &=&\begin{bmatrix}\bb &\bM\bb &\dots &\bM^{m-1}\bb\end{bmatrix}\bgamma \end{eqnarray*}$ where $\begin{matrix} γ = \frac{1}{γ_{0}} [γ_{1}, ..., γ_{m}]^{T}, \end{matrix}$ $\begin{eqnarray*} \bgamma=\frac{1}{\gamma_0}[\gamma_1, \dots, \gamma_m]^T, \end{eqnarray*}$ and $\hbox{$\MCK_m(\bM,\bb)=[\bb, \bM\bb, \dots, \bM^{m-1}\bb]$}$ is called the Krylov subspace of M and b. Krylov subspace-based methods compute $\hbox{$\MCK_n(\bM,\bb)$}$ iteratively, for n = 1,2,.. and find an approximate for x by means of a projection on this subspace. Updating the subspace only involves a matrix-vector multiplication of the form Mv.

In cases where M is singular or where it is not a square matrix, another class of Krylov-based algorithms can be used that is related to bidiagonalization of the matrix M. The rest of this section describes the idea behind the Krylov-based technique LSQR and the way this helps more efficient implementation of a linear solver for our imaging algorithm.

5.1. Lanczos algorithm and LSQR

When we are solving CLS or PWLS, we need to solve a problem of the form $∥ b - M x ∥_{2}^{2}$ $\hbox{$\|\bb-\bM\bx\|_2^2$}$ as the first step in the active-set iterations; for example, in Eq. (62)M = Ψ_ℱ. It does not have to be a square matrix, and usually it is ill-conditioned, especially if the number of pixels is large. In general we can find a solution for this problem by first computing the singular value decomposition (SVD) of M as $M = US V^{H},$ $\begin{equation} \label{eq:svd} \bM=\bU\bS\bV^H, \end{equation}$ (75)where U and V are unitary matrices, and S is a diagonal matrix with positive singular values. Then the solution x to min ∥ b−Mx ∥ ² is found by solving for y in $\begin{matrix} S y = U^{H} b, \end{matrix}$ $\begin{eqnarray} \bS\by=\bU^H\bb, \end{eqnarray}$ (76)followed by setting $\begin{matrix} x = V y . \end{matrix}$ $\begin{eqnarray} \bx=\bV\by. \end{eqnarray}$ (77)Solving the LS problem with this method is expensive in both number of operations and memory usage, especially if the matrices U and V are not needed after finding the solution. As we see below, looking at another matrix decomposition helps us to reduce these costs. For the rest of this section we use the notation given by (Paige & Saunders 1982).

The first step in this approach for solving LS problem is to reduce M to a lower bidiagonal form as follows $M = UB V^{H},$ $\begin{equation} \bM=\bU\bB\bV^H, \end{equation}$ (78)where B is a bidiagonal matrix of the form $B = [\begin{matrix} α_{1} \\ β_{2} & α_{2} \\ {\begin{matrix} . \end{matrix}}^{.} . & {\begin{matrix} . \end{matrix}}^{.} . \\ β_{r} & α_{r} \\ 0 \end{matrix}],$ $\begin{equation} \bB=\left[\begin{array}{cccc|c} \alpha_1 & & & & \\ \beta_2&\alpha_2 & & & \\ &\ddots&\ddots & & \\ & & \beta_r&\alpha_r &\\ \hline & & & & \zeros\\ \end{array}\right],\\ \end{equation}$ (79)where r = rank(M) = rank(B) and U,V are unitary matrices (different than in Eq. (75)). This representation is not unique, and without loss of generality we could choose U to satisfy $U^{H} b = β_{1} e 1$ $\begin{equation} \bU^H\bb=\beta_1\be_1 \end{equation}$ (80)where β₁ = ∥ b ∥ ₂ and e₁ is a unit norm vector with its first element equal to one.

Using B, forward substitution gives the LS solution efficiently by solving y in $B y = U^{H} b = β_{1} e 1$ $\begin{equation} \bB\by=\bU^H\bb=\beta_1\be_1 \end{equation}$ (81)followed by $x = V y .$ $\begin{equation} \bx=\bV\by\notag. \end{equation}$ (82)Using forward substitution we have $\begin{matrix} y_{1} & = & \frac{β_{1}}{α_{1}} \\ x_{1} & = & v_{1} y_{1}, \end{matrix}$ $\begin{eqnarray} y_1&=&\frac{\beta_1}{\alpha_1}\\ \bx_1&=&\bv_1 y_1, \end{eqnarray}$ followed by the recursion, $\begin{matrix} y_{n + 1} & = & - \frac{β_{n + 1}}{α_{n + 1}} y_{n} \\ x_{n + 1} & = & x_{n} + v_{n + 1} y_{n + 1} \end{matrix}$ $\begin{eqnarray} y_{n+1}&=&-\frac{\beta_{n+1}}{\alpha_{n+1}} y_{n}\\ \bx_{n+1}&=&\bx_{n}+\bv_{n+1} y_{n+1} \end{eqnarray}$ for n = 1,...,M where M<r is the iteration at which ∥ M^H(Mx_n−b) ∥ ² vanishes within the desired precision. We can combine the bidiagonalization and solving for x and avoid extra storage needed for saving B, U, and V. One such algorithm is based on a Krylov subspace method called the Lanczos algorithm (Golub & Kahan 1965). We first initialize with $\begin{matrix} β_{1} & = & ∥ b ∥_{2} \\ u_{1} & = & \frac{b}{β_{1}} \\ α_{1} & = & ∥ M^{H} u_{1} ∥_{2} \\ v_{1} & = & \frac{M^{H} u_{1}}{α_{1}} \cdot \end{matrix}$ $\begin{eqnarray} \label{eq:initbeta} \beta_1&=&\|\bb\|_2\\[3mm] \label{eq:initu} \bu_1&=&\frac{\bb}{\beta_1}\\[3mm] \alpha_1&=&\|\bM^H\bu_1\|_2\\[3mm] \bv_1&=& \frac{\bM^H\bu_1}{\alpha_1}\cdot \end{eqnarray}$ The iterations are then given by $\begin{matrix} \begin{matrix} \end{matrix} \end{matrix}$ $\begin{eqnarray} \label{eq:lanczos} \begin{array}{l l} \beta_{n+1}&= \|\bM\bv_{n}-\alpha_{n}\bu_{n}\|_2\\[3mm] \bu_{n+1}&=\frac{1}{\beta_{n+1}} (\bM\bv_n-\alpha_n\bu_n) \\[3mm] \alpha_{n+1}&=\|\bM^H\bu_{n+1}-\beta_{n+1}\bv_{n}\|_2 \\[3mm] \bv_{n+1}&=\frac{1}{\alpha_{n+1}} (\bM^H\bu_{n+1}-\beta_{n+1}\bv_n) \end{array} \end{eqnarray}$ (91)for n = 1,2,...,M, where $u_{n}^{H} u_{n} = v_{n}^{H} v_{n} = 1$ $\hbox{$\bu_n^H\bu_n=\bv_n^H\bv_n=1$}$ . This provides us with all the parameters needed to solve the problem.

However, because of finite precision errors, the columns of U and V found in this way lose their orthogonality as we proceed. To prevent this error propagation into the final solution x, different algorithms, such as conjugate gradient (CG), MINRES, and LSQR, have been proposed. The exact updates for x_n and stopping criteria to find M depend on the choice of algorithm used and therefore are not included in the iterations above.

An overview of Krylov subspace-based methods is given by (Choi 2006, p.91). This study shows that LSQR is a good candidate for solving LS problems when we are dealing with an ill-conditioned and non-square matrix. For this reason we use LSQR to solve Eqs. (62)or (64). Because the remaining steps during the LSQR updates are a few scalar operations and do not have a large impact on the computational complexity of the algorithm, we do not go into the details(see Paige & Saunders 1982).

In the next section we discuss how to use the structure in M to avoid storing the entire matrix in memory and how to parallelize the computations.

5.2. Implementation

During the active set iteration we need to solve Eqs. (62)and (64)where the matrix M in LSQR is replaced by Ψ_ℱ and (R^{− T/ 2} ⊗ R^{− 1/2})(ΨD^-1)_ℱ, respectively. Because Ψ has a Khatri-Rao structure and selecting and scaling a subset of columns does not change this, Ψ_ℱ and (ΨD^-1)_ℱ also have a Khatri-Rao structure. Here we show how to use this structure to implement Eqs. (91)in parallel and with less memory usage.

The only time the matrix M enters the algorithm is via the matrix-vector multiplications Mv_n and M^Hu_{n + 1}. As an example we use M = Ψ_ℱ for solving Eq. (62). Let k_n = Ψ_ℱv_n. We partition k_n as Ψ into $k_{n} = {[\begin{matrix} k_{1,n}^{T} & ... & k_{K,n}^{T} \end{matrix}]}^{T} .$ $\begin{equation} \bk_n=\begin{bmatrix} \bk_{1,n}^T &\dots & \bk_{K,n}^T \end{bmatrix}^T. \end{equation}$ (92)Using the definition of Ψ in Eq. (18), the operation k_n = Ψ_ℱv_n could also be performed using $K_{k,n} = \sum_{i \in ℱ} v_{i,n} a_{k,i} a_{k,i}^{H},$ $\begin{equation} \bK_{k,n}=\sum_{i \in \MCF} v_{i,n} \ba_{k,i}\ba_{k,i}^H, \end{equation}$ (93)and subsequently we set $k_{k,n} = vect (K_{k,n}) .$ $\begin{equation} \bk_{k,n}=\vect(\bK_{k,n}). \end{equation}$ (94)This process can be highly parallelized because of the independence between the correlation matrices of each time snapshot. The matrix K_k,n can then be used to find the updates in Eqs. (91).

The operation M^Hu in Eq. (91)is implemented in a similar way. Using the beamforming approach (similar to Sect. 3.4), this operation can also be done in parallel for each pixel and each snapshot. In both cases the calculations can be formulated as correlations and beamforming of parallel data paths, which means that efficient hardware implementations are feasible. Also we can consider traditional LS or WLS solutions as a special case when all the pixels belong to the free set, which means that those algorithms can also be implemented efficiently in hardware in the same way. During the calculations we work with a single beamformer at the time, and the matrix Ψ need not to be precalculated and stored in memory. This makes it possible to apply image formation algorithms for large images when there is a memory shortage.

The computational complexity of the algorithm is dominated by the transformation between the visibility domain and image domain (correlation and beamforming). The dirty image formation and correlation have a complexity of O(KP²I). This means that the worst-case complexity of the active set algorithm is O(TMKP²I) where T is the number of active set iterations and M the maximum number of Krylov iterations. A direct implementation of CLEAN for solving the imaging problem presented in Sect. 3 in a similar way would have a complexity of O(TKP²I). The proposed algorithm is therefore order M times more complex, essentially because it recalculates the flux for all the pixels in the free set, while CLEAN only estimates the flux of a newly added pixel. Considering that (for a well-posed problem) solving Mx = b using LSQR is algebraically equivalent to solving M^HMx = M^Hb using CG (Fong 2011), we can use the convergence properties of CG (Demmel 1997) to obtain an indication of the required number of Krylov iterations M. It is found that M is on the order $O (\sqrt{card (ℱ)})$ $\hbox{$O(\!\sqrt{\text{card}(\MCF)})$}$ where card(ℱ) is the cardinality of the free set, which is equal to the number of pixels in the free set.

In practice, many implementations of CLEAN use the FFT instead of a DFT (matched filter) for calculating the dirty image. Extending the proposed method to use similar techniques is possible and will be presented in future works.

Fig. 1

Contoured true source on dB scale.

Fig. 2

Contoured dirty images on dB scale. a) MF dirty image. b) MVDR dirty image.

Fig. 3

Extended source simulations. Units for the first and third columns are in dB. Linear scale is used for residual images (second column). a) CLEAN Image. b) CLEAN Residual image. c) CLEAN cross-section. d) Solution of CLS + MF. e) Residual image for CLS + MF. f) Cross-section for CLS + MF. g) Solution of CLS + MVDR. h) Residual image for CLS + MVDR. i) Cross-section for CLS image + MVDR. j) CPWLS image. k) CPWLS residual image. l) MVDR dirty image and CPWLS cross-section.

6. Simulations

The performance of the proposed methods were evaluated using simulations. Because the active-set algorithm adds a single pixel to the free set at each step, it is important to investigate the effect of this procedure on extended sources and noise. For this purpose, in our first simulation set-up we used a high dynamic range simulated image with a strong point source and two weaker extended sources in the first part of the simulations. In a second set up, we made a full sky image using sources from the 3C catalog.

Following the discussion in Sect. 4.2, we defined the residual image for CLS and CLEAN as $\begin{matrix} σ_{res} = Ψ^{H} (r̂ - Ψ σ - r_{n}), \end{matrix}$ $\begin{eqnarray*} \bsigma_{\rm res}=\bAt^H(\brh-\bAt\bsigma-\br_{n}), \end{eqnarray*}$ and for CPWLS, we used $\begin{matrix} σ_{res} = D^{-1} Ψ^{H} ({R̂}^{- T} \otimes {R̂}^{-1}) (r̂ - Ψ D^{-1} \begin{matrix} ˇ \\ σ \end{matrix} - r_{n}), \end{matrix}$ $\begin{eqnarray*} \bsigma_{\rm res}=\bD^{-1}\bAt^H(\bRh^{-T}\otimes \bRh^{-1})(\brh-\bAt\bD^{-1}\PWLSsigma-\br_{n}), \end{eqnarray*}$ where we assumed we know the noise covariance matrix R_n.We also defined the reconstruction S/N on the image in dB scale as $\begin{matrix} {S / N}_{r} = 20 \log_{10} (\frac{∥ σ_{true} ∥}{∥ σ_{ture} - σ̂ ∥}), \end{matrix}$ $\begin{eqnarray*} \text{S/N}_r = 20 \log_{10} \left (\frac{\| \bsigma_{\text{true}} \|}{\| \bsigma_{\text{ture}}-\bsigmah \|} \right), \end{eqnarray*}$ where σ_true is the true image and $\hbox{$\bsigmah$}$ is the reconstructed image.

6.1. Extended sources

An array of 100 dipoles (P = 100) with random distribution was used with the frequency range of 58−90 MHz from which we simulated three equally spaced channels. Each channel has a bandwidth of 195 kHz and was sampled at Nyquist rate. These specifications are consistent with the LOFAR telescope in LBA mode (van Haarlem et al. 2013). LOFAR uses one-second snapshots, and we did the simulation using only two snapshots, i.e., K = 2. We used spectrally white sources for the simulated frequency channels, which allowed us to extend the data model to one containing all frequency data by simply stacking the individual $\hbox{$\brh$}$ for each frequency into a single vector. Likewise, we stacked the individual Ψ into a single matrix. Since the source intensity vector σ is common for all frequencies, the augmented data model has the same structure as before.

The simulated source is a combination of a strong point source with intensity 40 dB and two extended structures with intensities of 0 dB. The extended structures are composed of from seven nearby Gaussian-shaped sources, one in the middle and six on a hexagon around it. (This configuration was selected to generate an easily reproducible example.) Figure 1 shows the simulated image on dB scale. The background noise level that was added is at −10 dB, which is also 10 dB below the extended sources. This is equivalent to a dynamic range of 40 dB and a minimum S/N of 10 dB.

Figures 2a and b show the matched filter and MVDR dirty images, respectively. The first column of Fig. 3 shows the final result of the CLEAN, CLS with the MF dirty image as upper bound, CLS with the MVDR dirty image as upper bound, and CPWLS with the MVDR dirty image as upper bound without the residual images. For each image, the extracted point sources were convolved with a Gaussian beam to smoothen the image. We used a Gaussian beam that has the same main beamwidth as the MF dirty image. The second column of Fig. 3 shows the corresponding residual images as defined before, and the last column shows a cross section parallel to the β₂ axis going through the sources at the center of the image.

Remarks are:

As expected the MVDR dirty image has a much better dynamicrange (≈ 40 dB) and lower side-lobes compared to the MF dirty image (≈ 15 dB dynamic range).
Due to a better initial dirty image and upper bound, the CPWLS deconvolution gives a better reconstruction of the image.
The cross sections show the accuracy of the estimated intensities. This shows that not only the shape but also the magnitude of the sources are better estimated using CPWLS.
Using the MVDR upper bound for CLS improves the estimate, illustrating the positive effect of using a proper upper bound.
All algorithms manage to recover the intensity of the strong point source with high quality. The S/N_r for CLS and CLS with MVDR is highest at 62.8 dB then CLEAN and CPWLS with 62.6 and 58.4 dB, respectively. (Only the pixel corresponding to the strong source is used to calculate these S/N_r.)
CPWLS has the best performance in recovering the extended sources with S/N_r of 16.5 dB compared to 11.9 and 11.7 dB for CLEAN and CLS respectively. (The pixel corresponding to the strong source was removed for calculating these S/N_r.)
The residual image for CPWLS is almost two orders of magnitude lower than the residual images for CLEAN and CLS.
While the residual image of the CLS algorithm appears very similar to the CLEAN reconstruction, CLS can guarantee that these values are inside the chosen confidence interval of six standard deviations of each pixel, while CLEAN does not provide this guarantee.

6.2. Full sky with 3C sources

In a second simuation set-up, we constructed an all-sky image with sources from the 3C catalog. The array configuration is the same as before with the same number of channels and snapshots. A background noise level of 0 dB (with respect to 1 Jansky) is added to the sky.

We first checked which sources from the 3C catalog are visible at the simulated date and time. From these we chose 20 sources that represent the magnitude distribution on the sky and produce the highest dynamic range available in this catalog. Table 2 shows the simulated sources with corresponding parameters. The coordinates are the (l,m) coordinates at the first snapshot. Because the sources are not necessarily on the grid points, we combined the active set deconvolution with the grid corrections on the free set as described in Sect. 4.4.

Figure 4a shows the true and estimated positions for the detected sources. Because the detection mechanism was able to detect the correct number of sources, we have included the estimated fluxes in Table 2 for easier comparison. Figure 4b shows the full-sky MF dirty image. Figure 5a shows the final reconstructed image with the residual added to it (with grid corrections applied), and Fig. 5c shows the same result for CLEAN.

Remarks:

The active set algorithm with grid corrections automaticallystops after adding the correct number of sources based on thedetection mechanism we have incorporated in the active setmethod.
Because of the grid correction, no additional sources are added to compensate for incorrect intensity estimates on the grids.
All 20 sources are visible in the final reconstructed image, and no visible artifacts are added to the image.
CLEAN also produces a reasonable image with all the sources visible. However, a few hundred point sources have been detected during the CLEAN iteration, most of which are the result of the strong sources that are not on the grid. Some clear artifacts are introduced (as seen in the residual image) that are also the result of the incorrect subtraction of off-grid sources.
Figure 5b shows that the residual image using active set and grid corrections contains a “halo” around the position of the strong source–the residual image is not flat. In fact, the detection mechanism in the active set algorithm (with a threshold of 6 times the standard deviation) has correctly not considered this halo as a source. The halo is a statistical artifact due to finite samples and will be reduced in magnitude by longer observations with a rate proportional to $1 / \sqrt{N K^{2}}$ $\hbox{$1/\sqrt{N K^2}$}$ .
The CLEAN algorithm requires more than 100 sources to model the image. This is mainly because of the the strong off–grid source (Cassiopeia A). This illustrates that while CLEAN is less complex than the proposed method when the number of detected sources are equal, in practice CLEAN might need many more sources to model the same image.

Table 2

Simulated sources from 3C catalog.

Fig. 4

Point source simulations. a) Position estimates. b) Full sky MF dirty image in dB (with respect to 1 Jy).

Fig. 5

Reconstructed images in dB (with respect to 1 Jy) scale and residual images on linear scale. a) Reconstructed image with grid correction plus residual image. b) Residual image using active set and grid correction. c) Reconstructed image with CLEAN plus residual image. d) Residual image using CLEAN.

7. Conclusions

Based on a parametric model and constraints on the intensities, we have formulated image deconvolution as a weighted least squares optimization problem with inequality constraints. We first showed that the classical (matched filter) dirty image is an upper bound, but a much tighter upper bound is provided by the “MVDR dirty image”. The conditioning of the problem can be improved by a preconditioning step, which is also related to the MVDR dirty image.

Second, the constrained least squares problem was solved using an active-set-based method. The relation between the resulting method and sequential source removing techniques such as CLEAN was explained. The theoretical background of the active set methods can be used for deeper insight into how the sequential techniques work.In particular, the active set algorithm uses a detection threshold with a known false alarm, which can be set such that no false sources appear in the image, and we showed that by introducing a grid correcting step into the active set iterations, we can improve both the detection of the sources and the estimation of their intensities.

Third, the Khatri-Rao structure of the data model was used in combination with Krylov based techniques to solve the linear systems involved in the deconvolution process with less storage and complexity.The complexity of the algorithm is higher than that of classical sequential source removing techniques (by a factor proportional to the square root of the detected number of sources), because the detected source intensities are re-estimated by the Krylov subspace technique after each step of the active set iteration. However, the proposed algorithm has a better detection mechanism compared to classical CLEAN, which leads to a lower number of sources to model the image. As a result, the overall complexity is expected to be comparable. We also expect that the performance of the algorithm can be readily improved because the updates by the active set iterations are one-dimensional (one source is added or removed), and this can be exploited to update the Krylov subspaces accordingly, rather than computing them each time from scratch. This is left for future work.

The simulations show that the proposed CPWLS algorithm provides improved spatial structure and improved intensity estimates compared to CLEAN-based deconvolution of the classical dirty image.A particularly attractive aspect is the demonstrated capability of the algorithm to perform automated source detection, which will be of interest for upcoming large surveys.

References

Barrett, R., Berry, M., Chan, T. F., et al. 1994, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. (Philadelphia, PA: SIAM) [Google Scholar]
Ben-David, C., & Leshem, A. 2008, IEEE J. Selected Topics in Signal Processing, 2, 670 [NASA ADS] [CrossRef] [Google Scholar]
Bhatnager, S., & Cornwell, T. 2004, A&A, 426, 747 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Boyd, S., & Vandenberghe, L. 2004, Convex Optimization (Cambridge University Press) [Google Scholar]
Briggs, D. S. 1995, Ph.D. Thesis, The New Mexico Institute of Mining and Technology, Socorro, New Mexico [Google Scholar]
Capon, J. 1969, Proc. IEEE, 57, 1408 [Google Scholar]
Carrillo, R., McEwen, J., & Wiaux, Y. 2012, MNRAS, 426, 1223 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]
Carrillo, R. E., McEwen, J. D., & Wiaux, Y. 2014, MNRAS, 439, 3591 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]
Chen, C.-Y., & Vaidyanathan, P. 2007, IEEE Trans. Signal Processing, 55, 4139 [NASA ADS] [CrossRef] [Google Scholar]
Choi, S.-C. T. 2006, Ph.D. Thesis, Stanford University [Google Scholar]
Cornwell, T. 2008, IEEE J. Selected Topics in Signal Processing, 2, 793 [Google Scholar]
Demmel, J. W. 1997, Applied Numerical Linear Algebra (SIAM) [Google Scholar]
Fong, D. C.-l. 2011, Ph.D. Thesis, Stanford University [Google Scholar]
Frieden, B. 1972, J. Opt. Soc. Am., 62, 511 [Google Scholar]
Gill, P. E., Murray, W., & Wright, M. H. 1981, Practical optimization (London: Academic Press Inc., Harcourt Brace Jovanovich Publishers) [Google Scholar]
Golub, G., & Kahan, W. 1965, J. Soc. Industr. Appl. Math., Ser. B: Numerical Analysis, 2, 205 [Google Scholar]
Gu, Y., & Leshem, A. 2012, IEEE Trans. Signal Processing, 60, 3881 [NASA ADS] [CrossRef] [Google Scholar]
Gull, S., & Daniell, G. 1978, Nature, 272, 686 [NASA ADS] [CrossRef] [Google Scholar]
Högbom, J. A. 1974, A&AS, 15, 417 [NASA ADS] [Google Scholar]
Ipsen, I. C. F., & Meyer, C. D. 1998, Am. Math. Monthly, 105, 889 [CrossRef] [MathSciNet] [Google Scholar]
Junklewitz, H., Bell, M. R., Selig, M., & Enßlin, T. 2016, A&A, 586, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Leshem, A., & van der Veen, A. 2000, IEEE Trans. Information Theory, Special issue on information theoretic imaging, 46, 1730 [CrossRef] [Google Scholar]
Leshem, A., van der Veen, A., & Boonstra, A. J. 2000, ApJS, 131, 355 [NASA ADS] [CrossRef] [Google Scholar]
Levanda, R., & Leshem, A. 2008, Electrical and Electronics Engineers in Israel, IEEEI 2008. IEEE 25th Convention, 716 [Google Scholar]
Levanda, R., & Leshem, A. 2013, IEEE Trans. Signal Processing, 61, 5063 [NASA ADS] [CrossRef] [Google Scholar]
Lochner, M., Natarajan, I., Zwart, J. T., et al. 2015, MNRAS, 450, 1308 [NASA ADS] [CrossRef] [Google Scholar]
Ottersten, B., Stoica, P., & Roy, R. 1998, Digital Signal Processing, 8, 185 [CrossRef] [Google Scholar]
Paige, C. C., & Saunders, M. A. 1982, ACM Trans. Math. Softw., 8, 43 [CrossRef] [MathSciNet] [Google Scholar]
Rau, U., Bhatnagar, S., Voronkov, M., & Cornwell, T. 2009, Proc. IEEE, 97, 1472 [NASA ADS] [CrossRef] [Google Scholar]
Reid, R. 2006, MNRAS, 367, 1766 [NASA ADS] [CrossRef] [Google Scholar]
Shaman, P. 1980, J. Multivariate Analysis, 10, 51 [CrossRef] [Google Scholar]
van Haarlem, M. P., Wise, M. W., Gunst, A. W., et al. 2013, A&A, 556, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Wiaux, Y., Jacques, L., Puy, G., Scaife, A., & Vandergheynst, P. 2009, MNRAS, 395, 1733 [NASA ADS] [CrossRef] [Google Scholar]
Wijnholds, S., & van der Veen, A.-J. 2008, IEEE J. Selected Topics in Signal Processing, 2, 613 [NASA ADS] [CrossRef] [Google Scholar]

Appendix A: Relation between WLS, natural, and robust weighting

Natural weighting is a technique to improve the detection of weak sources by promoting the visibility values that have a better signal-to-noise-ratio (Briggs 1995). This is done by dividing each visibility sample by the variance of noise on that sample (while assuming that the noise on each sample is independent). Considering that the visibility samples are the elements of the covariance matrix $\hbox{$\bRh_k$}$ we can model the sample visibilities as ${r̂}_{k} = r_{k} + ϵ,$ $\appendix \setcounter{section}{1} \begin{equation} \brh_k=\br_k+\bepsilon, \end{equation}$ (A.1)where ϵ is the complex noise on the samples. As discussed in Appendix C, $\hbox{$\bRh$}$ has a Wishart distribution and for a large number of samples N we have ${r̂}_{k} ~ 𝒩 (r_{k}, (R_{k}^{T} \otimes R_{k}) / N)$ $\hbox{$\brh_k \sim \MCN(\br_k,(\bR_k^T\otimes \bR_k)/N)$}$ . This means that $\hbox{$\bepsilon = \brh_k - \br_k$}$ has a complex Gaussian distribution $𝒩 (0, (R_{k}^{T} \otimes R_{k}) / N)$ $\hbox{$\MCN(\zeros,(\bR_k^T\otimes \bR_k)/N)$}$ . Because astronomical sources are usually much weaker than the system noise, it is common to use the approximation R_k ≈ R_n,k. With this approximation and using the independence of system noise on each receiving element (antenna or station), we can assume that R_n,k is diagonal and that $(R_{k}^{T} \otimes R_{k}) / N \approx (R_{n,k}^{T} \otimes R_{n,k}) / N$ $\hbox{$(\bR_k^T\otimes \bR_k)/N \approx (\bR_{\bn,k}^T\otimes \bR_{\bn,k})/N$}$ is also a diagonal approximation of the noise covariance matrix on the visibility samples. With this framework we can write the natural weighting as ${r̂}_{natural} = N (R_{n,k}^{- T} \otimes R_{n,k}^{-1}) {r̂}_{k} .$ $\appendix \setcounter{section}{1} \begin{equation} \brh_{\text{natural}}= N (\bR_{\bn,k}^{-T}\otimes \bR^{-1}_{\bn,k})\brh_k. \end{equation}$ (A.2)This shows that natural weighting is a very reasonable approximation of the weighting used when solving (28)for WLS (except for a factor N that drops out from both sides).

Next, we relate WLS to Robust Weighting (Briggs 1995) by assuming slightly different simplifications. Let us assume that $R_{n,k} = σ_{n}^{2} I$ $\hbox{$\bR_{\bn,k}=\sigma_{n}^2 \bI$}$ and let us consider a single source with intensity σ then we have for $\begin{matrix} R_{k}^{-1} & = & (R_{n,k} + σ {\begin{matrix} ˜ \\ a \end{matrix}}_{k} {\begin{matrix} ˜ \\ a \end{matrix}}_{k}^{H})^{-1} \\ = & R_{n,k}^{-1} - \frac{σ R_{n,k}^{-1} {\begin{matrix} ˜ \\ a \end{matrix}}_{k} {\begin{matrix} ˜ \\ a \end{matrix}}_{k}^{H} R_{n,k}^{-1}}{1 + σ {\begin{matrix} ˜ \\ a \end{matrix}}_{k}^{H} R_{n,k}^{-1} {\begin{matrix} ˜ \\ a \end{matrix}}_{k}} \\ = & \frac{1}{σ_{n}^{2}} (I - \frac{{\begin{matrix} ˜ \\ a \end{matrix}}_{k} {\begin{matrix} ˜ \\ a \end{matrix}}_{k}^{H}}{1 + \frac{σ_{n}^{2}}{σ}}) \cdot \end{matrix}$ $\appendix \setcounter{section}{1} \begin{eqnarray} \bR_k^{-1}&=&(\bR_{\bn,k}+ \sigma \bat_k\bat_k^H)^{-1} \notag \\ &=& \bR_{\bn,k}^{-1}-\frac{\sigma \bR_{\bn,k}^{-1}\bat_k\bat_k^H\bR_{\bn,k}^{-1}}{1+\sigma \bat^H_k\bR_{\bn,k}^{-1}\bat_k} \notag \\ &=&\frac{1}{\sigma_{n}^2}\left ( \bI-\frac{\bat_k\bat_k^H}{1+\frac{\sigma_{n}^2}{\sigma}} \right )\cdot \end{eqnarray}$ (A.3)Compared to natural weighting, now not only the noise power but also the available signal power is taken into account for the weighting. The term $1 / (1 + σ_{n}^{2} / σ)$ $\hbox{$1/(1+\sigma_{n}^2/\sigma)$}$ is the same as the parametric Wiener filter in the Fourier domain as given by (Briggs 1995) which relates Robust Weighting to standard signal processing concepts. However Robust Weighting also takes the visibility sampling of the gridded uv-plane into account when calculating the weights, which is not explained in the derivation above. Hence the exact relation between Robust Weighting and WLS is still missing. This relation is interesting and should be addressed in future works.

Appendix B: Using beamformers to find upper bounds

In this section we show how to use linear beamformers to define an upper bound for the image. We also show that ASSC gives the tightest bound if we use true covariance matrix R. We also show that the MVDR dirty image is an upper bound but tighter than MF dirty image.

Let us define a beamformer $w_{MF,i} = \frac{1}{\sqrt{K}} a_{i}$ $\hbox{$\bw_{\text{MF},i}=\frac{1}{\sqrt{K}}\ba_i$}$ , then we observe that each pixel in the MF dirty image is the output of this beamformer: $σ_{MF,i} = w_{MF,i}^{H} R w_{MF,i} .$ $\appendix \setcounter{section}{2} \begin{equation} \MFDIind{i}=\bw_{\text{MF},i}^H\bR\bw_{\text{MF},i}. \end{equation}$ (B.1)As indicated in Sect. 3.2, we can extend this concept to a more general beamformer w_i. The output power of this beamformer, in the direction of the ith pixel, becomes $σ_{w,i} = w_{i}^{H} R w_{i} = σ_{i} w_{i}^{H} (I_{K} ◦ A^{i}) (I_{K} ◦ A^{i})^{H} w_{i} + w_{i}^{H} R_{r} w_{i} .$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:beamformer} \WTDIind{i}= \bw_{i}^H\bR\bw_{i}=\sigma_{i}\bw_i^H(\bI_K \circ \bA^i)(\bI_K \circ \bA^i)^H\bw_i+ \bw_{i}^H\bR_r\bw_{i} . \end{equation}$ (B.2)If we require that $w_{i}^{H} (I_{K} ◦ A^{i}) (I_{K} ◦ A^{i})^{H} w_{i} = 1$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:mvdrconst_mulit} \bw_i^H(\bI_K \circ \bA^i)(\bI_K \circ \bA^i)^H\bw_i=1 \end{equation}$ (B.3)we have $σ_{w,i} = σ_{i} + w_{i}^{H} R_{r} w_{i} .$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:MVDRmin} \WTDIind{i}=\sigma_{i}+\bw_{i}^H\bR_{r}\bw_{i} \,. \end{equation}$ (B.4)As before, the fact that R_r is positive definite implies that $σ_{i} \leq σ_{w,i} .$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:wupper} \sigma_{i} \le \WTDIind{i} \,. \end{equation}$ (B.5)We can easily verify that w_MF,i satisfies Eq. (B.3)and hence σ_MF,i is a specific upper bound. We can translate the problem of finding the tightest upper bound to the following optimization question: $\begin{matrix} σ_{opt,i} & = & \min_{w_{i}} w_{i}^{H} R w_{i} \\ s . t . w_{i}^{H} (I_{K} ◦ A^{i}) (I_{K} ◦ A^{i})^{H} w_{i} = 1 \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \sigma_{\text{opt},i}&=&\min_{\bw_i} \bw_{i}^H\bR\bw_{i} \\ && \text{s.t.~} \bw_i^H(\bI_K \circ \bA^i)(\bI_K \circ \bA^i)^H\bw_i=1 \notag \end{eqnarray}$ (B.6)where σ_opt,i would be this tightest upper bound.

To solve this optimization problem we follow standard optimization techniques and define the Lagrangian and take derivatives with respect to w and the Lagrange multiplier μ. This leads to the following system $\begin{matrix} w & = & μ R^{-1} (I_{K} ◦ A_{i}) (I_{K} ◦ A_{i})^{H} w \\ 1 & = & w^{H} (I_{K} ◦ A_{i}) (I_{K} ◦ A_{i})^{H} w . \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \label{ew:grad} \bw&=&\mu \bR^{-1}(\bI_K \circ \bA_i)(\bI_K \circ \bA_i)^H\bw \\ \label{ew:equality} 1&=&\bw^H(\bI_K \circ \bA_i)(\bI_K \circ \bA_i)^H\bw. \end{eqnarray}$ Because R is full–rank and Eq. (B.8)we can model w as $w = μ R^{-1} (I_{K} ◦ A_{i}) x .$ $\appendix \setcounter{section}{2} \begin{equation} \bw=\mu \bR^{-1}(\bI_K \circ \bA_i) \bx. \end{equation}$ (B.9)Filling back into Eq. (B.7)we have $\begin{matrix} μ R^{-1} (I_{K} ◦ A_{i}) x = \\ μ^{2} R^{-1} (I_{K} ◦ A_{i}) (I_{K} ◦ A_{i})^{H} R^{-1} (I_{K} ◦ A_{i}) x \end{matrix}$ $\appendix \setcounter{section}{2} \begin{equation} \begin{array}{l} \mu \bR^{-1}(\bI_K \circ \bA_i) \bx =\\\quad\quad\mu^2 \bR^{-1}(\bI_K \circ \bA_i)(\bI_K \circ \bA_i)^H\bR^{-1}(\bI_K \circ \bA_i) \bx \end{array} \end{equation}$ (B.10)and $\begin{matrix} (I_{K} ◦ A_{i}) x = \\ μ (I_{K} ◦ A_{i}) (I_{K} ◦ A_{i})^{H} R^{-1} (I_{K} ◦ A_{i}) x . \end{matrix}$ $\appendix \setcounter{section}{2} \begin{equation} \begin{array}{l} (\bI_K \circ \bA_i) \bx =\\\quad\quad\mu (\bI_K \circ \bA_i)(\bI_K \circ \bA_i)^H\bR^{-1}(\bI_K \circ \bA_i) \bx. \end{array} \end{equation}$ (B.11)Multiplying both sides of this equation by (I_K°A_i)^H we get $x = μ (I_{K} ◦ A_{i})^{H} R^{-1} (I_{K} ◦ A_{i}) x .$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:xgrad} \bx=\mu (\bI_K \circ \bA_i)^H\bR^{-1}(\bI_K \circ \bA_i) \bx. \end{equation}$ (B.12)Doing the same for Eq. (B.8)we have $\begin{matrix} μ^{2} x^{H} (I_{K} ◦ A_{i})^{H} R^{-1} (I_{K} ◦ A_{i}) (I_{K} ◦ A_{i})^{H} R^{-1} (I_{K} ◦ A_{i}) x = 1. \end{matrix}$ $\appendix \setcounter{section}{2} \begin{equation} \begin{array}{l} \mu^2 \bx^H(\bI_K \circ \bA_i)^H \bR^{-1}(\bI_K \circ \bA_i)(\bI_K \circ \bA_i)^H \bR^{-1}(\bI_K \circ \bA_i) \bx =1. \end{array} \end{equation}$ (B.13)Now we use Eq. (B.12)and we find $x^{H} x = 1,$ $\appendix \setcounter{section}{2} \begin{equation} \bx^H\bx=1, \end{equation}$ (B.14)which makes finding x an eigenvalue problem. By taking a closer look at the matrix (I_K°A_i)^HR^-1(I_K°A_i) we find that this matrix is diagonal $\begin{matrix} (I_{K} ◦ A_{i})^{H} R^{-1} (I_{K} ◦ A_{i}) = \\ [\begin{matrix} a_{1,i}^{H} R_{1}^{-1} a_{1,i} 0 ... 0 \\ 0 a_{i, 2}^{H} R_{2}^{-1} a_{i, 2} \begin{matrix} . \\ . \\ . \end{matrix} \\ \begin{matrix} . \\ . \\ . \end{matrix} {\begin{matrix} . \end{matrix}}^{.} . 0 \\ 0 ... 0 a_{i,K}^{H} R_{K}^{-1} a_{i,K} \end{matrix}] \end{matrix}$ $\appendix \setcounter{section}{2} \begin{equation} \begin{array}{l} (\bI_K \circ \bA_i)^H\bR^{-1}(\bI_K \circ \bA_i)=\\[3mm] ~~~~~~~~~\begin{bmatrix} \ba_{1,i}^H\bR_1^{-1}\ba_{1,i} & \zeros & \dots & \zeros \\ \zeros & \ba_{i,2}^H\bR_2^{-1}\ba_{i,2} &&\vdots \\ \vdots & &\ddots & \zeros\\ \zeros &\dots &\zeros & \ba_{i,K}^H\bR_K^{-1}\ba_{i,K} \end{bmatrix} \end{array} \end{equation}$ (B.15)and hence x = e_m is an elementary vector with all entries equal to zero except for mth entry which equals unity. m is the index corresponding to largest eigenvalue, λ_max, and from Eq. (B.12)we have μ = 1 /λ_max. Filling back for w we find $w_{i, opt} = \frac{1}{a_{i,m} R_{m}^{-1} a_{i,m}} R^{-1} (e_{m} \otimes a_{i,m})$ $\appendix \setcounter{section}{2} \begin{equation} \bw_{i,\text{opt}}=\frac{1}{\ba_{i,m}\bR_m^{-1}\ba_{i,m}}\bR^{-1}(\be_m \otimes \ba_{i,m}) \end{equation}$ (B.16)and the output of the beamformer $\begin{matrix} σ_{opt} & = & w_{i, opt}^{H} R w_{i, opt} \\ = & \frac{a_{i,m}^{H} R_{m}^{-1} a_{i,m}}{(a_{i,m}^{H} R_{m}^{-1} a_{i,m})^{2}} \\ = & \frac{1}{a_{i,m}^{H} R_{m}^{-1} a_{i,m}} \\ = & \min_{k} (\frac{1}{a_{k,i}^{H} R_{k}^{-1} a_{k,i}}) \cdot \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \sigma_{\rm opt}&=& \bw_{i,\text{opt}}^H\bR\bw_{i,\text{opt}} \notag \\[3mm] &=&\frac{\ba_{i,m}^H\bR^{-1}_m\ba_{i,m}}{(\ba_{i,m}^H\bR^{- 1}_m\ba_{i,m})^2} \notag \\[3mm] &=&\frac{1}{\ba_{i,m}^H\bR_m^{-1}\ba_{i,m}} \notag \\[3mm] &=&\min_k\left(\frac{1}{\ba_{k,i}^H\bR_k^{-1}\ba_{k,i}}\right)\cdot \end{eqnarray}$ (B.17)In order to reduce the variance of this solution we suggest to find a beamformer that instead of Eq. (B.3) satisfies the slightly different normalization constraint $w_{i}^{H} a_{i} = \sqrt{K} .$ $\appendix \setcounter{section}{2} \begin{equation} \bw_i^H\ba_{i}=\sqrt{K} . \end{equation}$ (B.18)We show that the expected value of the resulting dirty image constitutes a larger upper bound than the ASSC (40), but because the output power of this beamformer depends on more than one snapshot it has a lower variance than ASSC, so that it is more robust in practice.

With this constraint, the beamforming problem is $\begin{matrix} w_{i} & = & \arg \min_{w_{i}} w_{i}^{H} R w_{i} \\ s . t . w_{i}^{H} a_{i} = \sqrt{K} \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray} \label{eq:mvdrproblem0} \bw_i&=&\arg\min_{\bw_i} \bw_i^H\bR\bw_i \\ && \text{s.t.~} \bw_i^H\ba_i=\sqrt{K} \notag \end{eqnarray}$ (B.19)which is recognized as the classical minimum variance distortionless response (MVDR) beamforming problem (Capon 1969). Thus, the solution is given in closed form as $w_{MVDR,i} = \frac{\sqrt{K}}{a_{i}^{H} R^{-1} a_{i}} R^{-1} a_{i} .$ $\appendix \setcounter{section}{2} \begin{equation} \label{eq:wmvdr} \bw_{\text{MVDR},i} = \frac{\sqrt{K}}{\ba_{i}^H\bR^{-1}\ba_{i}} \bR^{-1}\ba_{i}. \end{equation}$ (B.20)To demonstrate that this image is still an upper bound we show that $α : = w_{i}^{H} (I_{K} ◦ A^{i}) (I_{K} ◦ A^{i})^{H} w_{i} \geq 1 .$ $\appendix \setcounter{section}{2} \begin{equation} \alpha := \bw_i^H(\bI_K \circ \bA^i)(\bI_K \circ \bA^i)^H\bw_i \geq 1 \,. \end{equation}$ (B.21)Indeed, inserting (B.20)into this inequality gives $\begin{matrix} K \frac{a_{i}^{H} R^{-1} (I_{K} ◦ A^{i}) (I_{K} ◦ A^{i})^{H} R^{-1} a_{i}}{(a_{i}^{H} R^{-1} a_{i})^{2}} = K \frac{\sum_{k} (a_{k,i}^{H} R_{k}^{-1} a_{k,i})^{2}}{{(\sum_{k} a_{k,i}^{H} R_{k}^{-1} a_{k,i})}^{2}} \\ = K \frac{h^{T} h}{h^{T} 1_{K} 1_{K}^{T} h} \geq K \frac{1}{λ_{\max} (1_{K} 1_{K}^{T})} = 1, \end{matrix}$ $\appendix \setcounter{section}{2} \begin{equation} \begin{array}{l} \label{eq:MVDRUB2} K\frac{\ba_i^H\bR^{-1}(\bI_K \circ \bA^i)(\bI_K \circ \bA^i)^H\bR^{-1}\ba_i}{(\ba_i^H\bR^{-1}\ba_i)^2} = K \frac{\sum_k (\ba_{k,i}^H\bR_k^{-1}\ba_{k,i})^2}{\left(\sum_k \ba_{k,i}^H\bR_k^{-1}\ba_{k,i}\right)^2}\\[3mm] ~~~~~~~~~~ = K\frac{\bh^T\bh}{\bh^T\ones_K\ones_K^T\bh} \geq K\frac{1}{\lambda_{\text{max}}(\ones_K\ones_K^T)}=1, \end{array} \end{equation}$ (B.22)where h = (I_K°Aⁱ)^HR^-1a_i is a K × 1 vector with entries $h_{k} = a_{k,i}^{H} R_{k}^{-1} a_{k,i}$ $\hbox{$h_k=\ba_{k,i}^H\bR_k^{-1}\ba_{k,i}$}$ and λ_max(·) is the largest eigenvalue of of the argument matrix. Hence, a similar reasoning as in Eq. (B.2) gives $\begin{matrix} σ_{MVDR,i} = α σ_{i} + w_{MVDR,i}^{H} R_{r} w_{MVDR,i} \geq σ_{i} \end{matrix}$ $\appendix \setcounter{section}{2} \begin{eqnarray*} \MVDRDIind{i} = \alpha \sigma_i + \bw_{\text{MVDR},i}^H \bR_r \bw_{\text{MVDR},i} \ge \sigma_i \end{eqnarray*}$ which leads to formula (42).

Note that w_MF,i also satisfies the constraint in Eq. (B.19), i.e. $w_{MF,i}^{H} a_{i} = \sqrt{K}$ $\hbox{$\bw_{\text{MF},i}^H \ba_i= \sqrt{K}$}$ , but does not necessary minimize the output power $w_{i}^{H} R w_{i}$ $\hbox{$\bw_i^H \bR \bw_i$}$ , therefore the MVDR dirty image is smaller than the MF dirty image: σ_MVDR ≤ σ_MF. Thus it is a tighter upper bound. This relation also holds if R is replaced by the sample covariance $\hbox{$\bRh$}$ .

Appendix C: Variance of the dirty image

To find the confidence intervals for the dirty images we need to find estimates for the variance of both matched filter and MVDR dirty images. In our problem the sample covariance matrix is obtained by squaring samples from a Gaussian process. This means that $\hbox{$N\bRh \sim \MCW_p(\bR,N)$}$ where $\hbox{$\MCW_p(\bR,N)$}$ is the Wishart distribution function of order p with expected value equal to R and N degrees of freedom. For any deterministic vector ζ, $N ζ^{H} R̂ ζ ~ ζ^{H} R ζ χ^{2} (N) .$ $\appendix \setcounter{section}{3} \begin{equation} N\bzeta^H\bRh\bzeta \sim \bzeta^H\bR\bzeta~ \chi^2(N). \end{equation}$ (C.1)where χ²(N) is the standard χ² distribution with N degrees of freedom. In radio astronomical applications N is usually very large and we can approximate this χ² distribution with a Gaussian such that $\hbox{$\bzeta^H\bRh\bzeta \sim \MCN(\bzeta^H\bR\bzeta,(\bzeta^H\bR\bzeta)^2/N)$}$ . The variance of the matched filter dirty image is given by $See Formula in PDF$ $\begin{equation*} \Var(\MFDIind{i})=\frac{1}{NK^2}\sum_k(\ba_{k,i}^H\bR\ba_{k,i})^2. \end{equation*}$ Using this result we can find the x% confidence interval which results in an increase of the upper bound such that $σ \leq {σ̂}_{MF} + α \sqrt{Var ({σ̂}_{MF})},$ $\appendix \setcounter{section}{3} \begin{equation} \bsigma \leq \SMFDI + \alpha \sqrt{\Var(\SMFDI)}, \end{equation}$ (C.2)where α is chosen depending on x. Requiring at most a single false detection on the entire image translate into α ≈ 6.

When we estimate the MVDR dirty image from sample covariance matrices we need to be more careful, mainly because the result is biased and we need to correct for that bias. For each pixel of the MVDR dirty image obtained from sample covariance matrices we have $\begin{matrix} σ̂ MVDR,i = Kg (Z) = \frac{K}{\sum_{k} a_{k,i}^{H} {R̂}_{k}^{-1} a_{k,i}} \end{matrix}$ $\appendix \setcounter{section}{3} \begin{eqnarray} \UMVDRDIind{i}=K g(Z)=\frac{K}{\sum_k \ba_{k,i}^H\bRh_{k}^{-1}\ba_{k,i}} \end{eqnarray}$ (C.3)where g(Z) = 1 /Z and $Z = \sum_{k} a_{k,i}^{H} {R̂}_{k}^{-1} a_{k,i}$ $\hbox{$Z=\sum_k \ba_{k,i}^H\bRh_{k}^{-1}\ba_{k,i}$}$ . Using a perturbation model Z = Z₀ + ΔZ and a Taylor approximation we find $\begin{matrix} g (Z) & \approx & \frac{1}{Z_{0}} - \frac{1}{Z_{0}^{2}} Δ Z \\ \approx & \frac{1}{Z_{0}^{2}} (Z_{0} - Δ Z) . \end{matrix}$ $\appendix \setcounter{section}{3} \begin{eqnarray} \label{eq:gz} g(Z)&\approx&\frac{1}{Z_0}-\frac{1}{Z_0^2}\Delta Z\notag \\ &\approx& \frac{1}{Z_0^2}(Z_0-\Delta Z). \end{eqnarray}$ (C.4)Let Z₀ = ℰ { Z } then ℰ { ΔZ } = 0 and ℰ { g(Z) } ≈ 1 /Z₀. We would like this estimate to be unbiased which means that we want $ℰ {g (Z)} \approx \frac{1}{\sum_{k} a_{k,i}^{H} R_{k}^{-1} a_{k,i}}$ $\appendix \setcounter{section}{3} \begin{equation} \MCE\{g(Z)\}\approx\frac{1}{\sum_k \ba_{k,i}^H\bR_{k}^{-1}\ba_{k,i}} \end{equation}$ (C.5)however we have, $\begin{matrix} Z_{0} & = & \sum_{k} a_{k,i} ℰ {{R̂}_{k}^{-1}} a_{k,i} \\ = & \sum_{k} a_{k,i}^{H} \frac{N R_{k}^{-1}}{N - p} a_{k,i} \\ = & \frac{N}{N - P} \sum_{k} a_{k,i}^{H} R_{k}^{-1} a_{k,i} \end{matrix}$ $\appendix \setcounter{section}{3} \begin{eqnarray} Z_0&=&\sum_k \ba_{k,i}\MCE\{\bRh_{k}^{-1}\}\ba_{k,i} \notag \\ &=&\sum_k \ba_{k,i}^H\frac{N\bR_{k}^{-1}}{N-p}\ba_{k,i} \notag \\ &=&\frac{N}{N-P}\sum_k \ba_{k,i}^H\bR_{k}^{-1}\ba_{k,i} \end{eqnarray}$ (C.6)where we have used $ℰ {{R̂}^{-1}} = \frac{N}{N - P} R^{-1}$ $\hbox{$\MCE\{\bRh^{-1}\}=\frac{N}{N-P}\bR^{-1}$}$ (Shaman 1980). So in order to remove this bias we need to scale it by a correction factor $C = \frac{N}{N - P}$ $\appendix \setcounter{section}{3} \begin{equation} \label{eq:C} C=\frac{N}{N-P} \end{equation}$ (C.7)and $σ̂ MVDR,i = CKg (Z) .$ $\appendix \setcounter{section}{3} \begin{equation} \UMVDRDIind{i}=CK g(Z). \end{equation}$ (C.8)Now we need to find an estimate for the variance of the MVDR dirty image. Using Eq. (C.4)we see that the first order approximation of $Var (g (Z)) \approx Var (Z) / Z_{0}^{4}$ $\hbox{$\Var(g(Z))\approx\Var(Z)/Z_0^4$}$ . We find Var(Z) using the independence of each snapshot so we can write $Var (Z) = \sum_{k} Var (a_{k,i}^{H} {R̂}_{k}^{-1} a_{k,i}) .$ $\appendix \setcounter{section}{3} \begin{equation} \Var(Z)=\sum_k \Var(\ba_{k,i}^H\bRh_{k}^{-1}\ba_{k,i}). \end{equation}$ (C.9)In order to find $Var (a_{k,i}^{H} {R̂}_{k}^{-1} a_{k,i})$ $\hbox{$\Var(\ba_{k,i}^H\bRh_{k}^{-1}\ba_{k,i})$}$ we need to use some properties of the complex inverse Wishart distribution. A matrix has

complex inverse Wishart distribution if it’s inverse has a complex Wishart distribution (Shaman 1980). Let us define an invertible matrix B as $B = [\begin{matrix} a_{k,i} & B_{1} \end{matrix}],$ $\appendix \setcounter{section}{3} \begin{equation} \bB=\begin{bmatrix}\ba_{k,i} & \bB_1 \end{bmatrix}, \end{equation}$ (C.10)then $\hbox{$\bX=(\bB\bRh^{-1}\bB^H)/N$}$ has an inverse Wishart distribution because $\hbox{$\bX^{-1}=N(\bB^{-H}\bRh\bB^{-1})$}$ has a Wishart distribution. In this case $X_{11} = (a_{k,i}^{H} {R̂}^{-1} a_{k,i}) / N$ $\hbox{$\bX_{11}=(\ba_{k,i}^H\bRh^{-1}\ba_{k,i})/N$}$ also has an inverse Wishart distribution with less degrees of freedom. The covariance of an inverse Wishart matrix is derived in (Shaman 1980), however because we are dealing only with one element, this results simplifies to $Var (N X_{11}) = \frac{N^{2}}{(N - P)^{2} (N - P - 1)} (a_{k,i}^{H} R^{-1} a_{k,i})^{2} .$ $\appendix \setcounter{section}{3} \begin{equation} \Var(N\bX_{11})=\frac{N^2}{(N-P)^2(N-P-1)}(\ba_{k,i}^H\bR^{-1}\ba_{k,i})^2. \end{equation}$ (C.11)The variance of the unbiased MVDR dirty image is thus given by $\begin{matrix} Var (σ̂ MVDR,i) & = & Var (CKg (Z)) \\ \approx & \frac{K^{2}}{(N - P - 1)} \frac{\sum_{k} (a_{k,i}^{H} R_{k}^{-1} a_{k,i})^{2}}{{(\sum_{k} a_{k,i} R_{k}^{-1} a_{k,i})}^{4}} \cdot \end{matrix}$ $\appendix \setcounter{section}{3} \begin{eqnarray} \Var(\UMVDRDIind{i})&=&\Var(C K g(Z))\notag \\[3mm] &\approx& \frac{K^2}{(N-P-1)}\frac{\sum_k (\ba_{k,i}^H\bR_{k}^{-1}\ba_{k,i})^2}{\left(\sum_k \ba_{k,i}\bR_{k}^{-1}\ba_{k,i}\right)^4}\cdot\notag \end{eqnarray}$ Now that we have the variance we can use the same method that we used for MF dirty image to find α and $σ \leq {σ̂}_{MVDR} + α \sqrt{Var ({σ̂}_{MVDR})} .$ $\appendix \setcounter{section}{3} \begin{equation} \bsigma \leq \UMVDRDI + \alpha \sqrt{\Var(\UMVDRDI)}. \end{equation}$ (C.12)

All Tables

Table 1

Constrained LS imaging using active sets.

In the text

Table 2

Simulated sources from 3C catalog.

In the text

All Figures

	Fig. 1 Contoured true source on dB scale.
In the text

	Fig. 2 Contoured dirty images on dB scale. a) MF dirty image. b) MVDR dirty image.
In the text

Fig. 3

Extended source simulations. Units for the first and third columns are in dB. Linear scale is used for residual images (second column). a) CLEAN Image. b) CLEAN Residual image. c) CLEAN cross-section. d) Solution of CLS + MF. e) Residual image for CLS + MF. f) Cross-section for CLS + MF. g) Solution of CLS + MVDR. h) Residual image for CLS + MVDR. i) Cross-section for CLS image + MVDR. j) CPWLS image. k) CPWLS residual image. l) MVDR dirty image and CPWLS cross-section.

In the text

	Fig. 4 Point source simulations. a) Position estimates. b) Full sky MF dirty image in dB (with respect to 1 Jy).
In the text

	Fig. 5 Reconstructed images in dB (with respect to 1 Jy) scale and residual images on linear scale. a) Reconstructed image with grid correction plus residual image. b) Residual image using active set and grid correction. c) Reconstructed image with CLEAN plus residual image. d) Residual image using CLEAN.
In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.

[1] Barrett, R., Berry, M., Chan, T. F., et al. 1994, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd edn. (Philadelphia, PA: SIAM) [Google Scholar]

[2] Ben-David, C., & Leshem, A. 2008, IEEE J. Selected Topics in Signal Processing, 2, 670 [NASA ADS] [CrossRef] [Google Scholar]

[3] Bhatnager, S., & Cornwell, T. 2004, A&A, 426, 747 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[4] Boyd, S., & Vandenberghe, L. 2004, Convex Optimization (Cambridge University Press) [Google Scholar]

[5] Briggs, D. S. 1995, Ph.D. Thesis, The New Mexico Institute of Mining and Technology, Socorro, New Mexico [Google Scholar]

[6] Capon, J. 1969, Proc. IEEE, 57, 1408 [Google Scholar]

[7] Carrillo, R., McEwen, J., & Wiaux, Y. 2012, MNRAS, 426, 1223 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]

[8] Carrillo, R. E., McEwen, J. D., & Wiaux, Y. 2014, MNRAS, 439, 3591 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]

[9] Chen, C.-Y., & Vaidyanathan, P. 2007, IEEE Trans. Signal Processing, 55, 4139 [NASA ADS] [CrossRef] [Google Scholar]

[10] Choi, S.-C. T. 2006, Ph.D. Thesis, Stanford University [Google Scholar]

[11] Cornwell, T. 2008, IEEE J. Selected Topics in Signal Processing, 2, 793 [Google Scholar]

[12] Demmel, J. W. 1997, Applied Numerical Linear Algebra (SIAM) [Google Scholar]

[13] Fong, D. C.-l. 2011, Ph.D. Thesis, Stanford University [Google Scholar]

[14] Frieden, B. 1972, J. Opt. Soc. Am., 62, 511 [Google Scholar]

[15] Gill, P. E., Murray, W., & Wright, M. H. 1981, Practical optimization (London: Academic Press Inc., Harcourt Brace Jovanovich Publishers) [Google Scholar]

[16] Golub, G., & Kahan, W. 1965, J. Soc. Industr. Appl. Math., Ser. B: Numerical Analysis, 2, 205 [Google Scholar]

[17] Gu, Y., & Leshem, A. 2012, IEEE Trans. Signal Processing, 60, 3881 [NASA ADS] [CrossRef] [Google Scholar]

[18] Gull, S., & Daniell, G. 1978, Nature, 272, 686 [NASA ADS] [CrossRef] [Google Scholar]

[19] Högbom, J. A. 1974, A&AS, 15, 417 [NASA ADS] [Google Scholar]

[20] Ipsen, I. C. F., & Meyer, C. D. 1998, Am. Math. Monthly, 105, 889 [CrossRef] [MathSciNet] [Google Scholar]

[21] Junklewitz, H., Bell, M. R., Selig, M., & Enßlin, T. 2016, A&A, 586, A76 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[22] Leshem, A., & van der Veen, A. 2000, IEEE Trans. Information Theory, Special issue on information theoretic imaging, 46, 1730 [CrossRef] [Google Scholar]

[23] Leshem, A., van der Veen, A., & Boonstra, A. J. 2000, ApJS, 131, 355 [NASA ADS] [CrossRef] [Google Scholar]

[24] Levanda, R., & Leshem, A. 2008, Electrical and Electronics Engineers in Israel, IEEEI 2008. IEEE 25th Convention, 716 [Google Scholar]

[25] Levanda, R., & Leshem, A. 2013, IEEE Trans. Signal Processing, 61, 5063 [NASA ADS] [CrossRef] [Google Scholar]

[26] Lochner, M., Natarajan, I., Zwart, J. T., et al. 2015, MNRAS, 450, 1308 [NASA ADS] [CrossRef] [Google Scholar]

[27] Ottersten, B., Stoica, P., & Roy, R. 1998, Digital Signal Processing, 8, 185 [CrossRef] [Google Scholar]

[28] Paige, C. C., & Saunders, M. A. 1982, ACM Trans. Math. Softw., 8, 43 [CrossRef] [MathSciNet] [Google Scholar]

[29] Rau, U., Bhatnagar, S., Voronkov, M., & Cornwell, T. 2009, Proc. IEEE, 97, 1472 [NASA ADS] [CrossRef] [Google Scholar]

[30] Reid, R. 2006, MNRAS, 367, 1766 [NASA ADS] [CrossRef] [Google Scholar]

[31] Shaman, P. 1980, J. Multivariate Analysis, 10, 51 [CrossRef] [Google Scholar]

[32] van Haarlem, M. P., Wise, M. W., Gunst, A. W., et al. 2013, A&A, 556, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]

[33] Wiaux, Y., Jacques, L., Puy, G., Scaife, A., & Vandergheynst, P. 2009, MNRAS, 395, 1733 [NASA ADS] [CrossRef] [Google Scholar]

[34] Wijnholds, S., & van der Veen, A.-J. 2008, IEEE J. Selected Topics in Signal Processing, 2, 613 [NASA ADS] [CrossRef] [Google Scholar]

Radio astronomical image formation using constrained least squares and Krylov subspaces⋆

1. Introduction

2. Data model

3. The imaging problem

3.1. Gridded imaging model

3.2. Unconstrained least squares image

3.3. Preconditioned weighted least squares image

3.4. Bounds on the image

3.5. Estimation of the upper bound from noisy data

3.6. Constrained least squares imaging

4. Constrained optimization using an active set method

4.1. Characterization of the optimum

4.2. Gradients

4.3. Active set methods

4.4. Strong off-grid sources

4.5. Boxed imaging

5. Implementation using Krylov subspace-based methods

5.1. Lanczos algorithm and LSQR

5.2. Implementation

6. Simulations

6.1. Extended sources

6.2. Full sky with 3C sources

7. Conclusions

References

Appendix A: Relation between WLS, natural, and robust weighting

Appendix B: Using beamformers to find upper bounds

Appendix C: Variance of the dirty image

All Tables

All Figures

Radio astronomical image formation using constrained least squares and Krylov subspaces^⋆