Free Access
Volume 566, June 2014
Article Number A127
Number of page(s) 11
Section Astronomical instrumentation
Published online 24 June 2014

© ESO, 2014

1. Introduction

The new generation of interferometers is characterized by very wide fields of view, large fractional bandwidth, high sensitivity, and high resolution. At low frequency, such as with the Low-Frequency Array (LOFAR), or the Murchison Widefield Array (MWA), the cross-correlation between voltages from pairs of antenna (the visibilities) are affected by severe complex baseline-time-frequency direction dependent effects (DDE) such as the complex phased array beams, the ionosphere and its associated Faraday rotation, the station’s clock drifts, and the sky structure. At higher frequency, the interferometers using dishes are less affected by ionosphere, whereas troposphere, pointing errors, and dish deformation play an important role.

1.1. Direction dependent effects and calibration problems

A wide variety of solvers has been developed to tackle the direction-dependent calibration problems of radio interferometry. In this paper, for the clarity of our discourse, we classify them in two categories. The first and most widely used family of algorithms (later referred as the Jones-based solvers) aim at estimating the apparent net product of various effects discussed above. The output solution is a Jones matrix per time-frequency bin per antenna per direction (see Yatawatta et al. 2008; Noordam & Smirnov 2010, and references therein). Sometimes the solutions are used to derive physical parameters (e.g., Intema et al. 2009; Yatawatta 2013 in the cases of ionosphere and beam shape, respectively). The second family of solvers estimate directly from the data the physical terms mentioned above that give rise to a set of visibility (later referred as the continuous or physics-based solvers). These solvers are used for direction-independent calibration in the context of fringe-fitting for VLBI (Cotton 1995; Walker 1999, and references therein) to constrain the clock states and drifts (also referred as delays and fringe rates). Bhatnagar et al. (2004) and Smirnov (2011b) have presented solutions to the direction-dependent calibration problem of pointing error. It is important to note that deconvolution algorithms are also physics-based solvers estimating the sky brightness, potentially taking the DDE calibration solution into account (Bhatnagar et al. 2008, 2013; Tasse et al. 2013). The latest imaging solvers can also estimate spectral energy distribution parameters (Rau & Cornwell 2011; Junklewitz et al. 2014). Most of these imaging algorithms are now well understood in the framework of compressed sensing theory (see McEwen & Wiaux 2011, for a review). Their goals, constrains, and methods are, however, very different from purely calibration-related algorithms, and we will not discuss them further in this paper.

Jones-based and physics-based solvers have both advantages and disadvantages. The main problem with using physics-based solvers is that the system needs to be modeled accurately, while analytically complex physics can intervene before measuring a given visibility. Jones-based algorithms solving for the effective Jones matrices are not subject to this problem, because no assumptions have to be made about the physics underlying the building of a visibility (apart from the sky model that is assumed).

However, one important disadvantage of Jones-based solvers over physics-based solvers for DDE calibration is that they lack robustness when solving for a large number of parameters and can be subject to ill-conditioning. This can have dramatic effects in the image plane, such as source suppression. In the most extreme case these algorithms can artificially decrease noise in the calibrated residual maps by over-fitting the data. This easily drives artificially high dynamic range estimates. Hundreds of parameters (i.e., of directions) per antenna, polarization, can correspond to tens of thousands of free parameters per time and frequency bin. The measurement operator being highly nonlinear, for each given data set and process space, it is often hard to know whether ill-conditioning is an issue. Simulations can give an answer in individual cases, and a minimum time and frequency interval for the solution estimate can be estimated. However, this time interval can be large, and the true underlying process can vary significantly within that interval.

1.2. Tracking versus solving

Another important consideration is the statistical method used by the algorithm to estimate the parameters. Most existing Jones-based and physics-based solvers minimize a chi-square. This is done by using the Gauss-Newton, gradient descent, or Levenberg-Marquardt algorithm. More recently, in order to solve for larger systems in the context of calibration of DDEs, this has been extended using expectation maximization, and SAGE algorithms (Yatawatta et al. 2008; Kazemi et al. 2011). One well-known problem is that conventional least-squares minimization and maximum likelihood solvers lack robustness in the presence of low signal-to-noise ratios (S/N; the estimated minimum chi-square jumps in between adjacent data chunks, while this behaviour is non-physical). In most cases, a filtering of the estimated solutions (Box car, median, etc) or an interpolation might be necessary (see e.g., Cotton 1995). In practice, situations of low S/N combined with the need to perform DDE calibration are not rare (in the case of LOFAR, the ionosphere varies on scale of 30 s while not much flux is available in the field).

In this paper we present a new calibration algorithm whose structure is that of a Kalman filter. Our main aim is to address the stability and ill-conditioning issues discussed above by using a physics-based approach, which (i) decreases the number of free parameters in the model and (ii) increases the amount of usable data; the algorithm structure allows us to (iii) use additional physical priors (time/frequency process smoothness for example), while (iv) keeping the algorithm computationally cheap. We note that we do not do any quantitative comparison between least-squares solvers and our approach. Instead, we focus on describing an implementation of a nonlinear Kalman filter for radio interferometry and we study its robustness.

While nonlinear least-squares solvers are iterative, our algorithm uses a nonlinear Kalman filter, which is a recursive sequence (see Sect. 2). Kalman filters are referred in the literature as minimum mean square error estimators, and instead of fitting the data at best (least-squares solver), they minimize the error on the estimate given information on previous states. In other words, they can be viewed as tracking the process rather than solving for it. An estimated process state vector1 built from previous recursions, together with a covariance matrix prior are specified. This way, the filter allows us to constrain the expected location of the true process state along the recursion. Even when the location of the minimum chi-square jumps between data chunks, the posterior estimate stays compatible with the prior estimate and with the data (under the measurement and evolutionary models). As more data goes though the filter, the process state and its covariance are updated (and the trace of the covariance matrix decreases in general).

An interesting aspect of our approach is that we use alternative data domains (Sect. 3), which amounts to conducing the calibration in the image domain. We show that this approach provides higher robustness. We discuss the details of the implementation and algorithmic costs in Sect. 4. An important step for the feasibility of the approach is to re-factor the filtering steps using the Woodbury matrix identity (Sect. 4.1). We demonstrate the efficiency of our algorithms in Sect. 5, based on simulations of the clock/ionosphere (Sect. 5.1) and pointing error (Sect. 5.2) problems. An extended discussion on the differences between our algorithm and other existing techniques is given in Sect. 6. An overview of the mathematical notation is given in Table 1.

Table 1

Overview of the mathematical notations used throughout this paper.

1.3. Radio interferometry measurement equation

To model the complex DDE (station beams, ionosphere, Faraday rotation, etc.), we make extensive use of the radio interferometry measurement equation (RIME) formalism, which provides a model of a generic interferometer (for extensive discussions on the validity and limitations of the measurement equation see Hamaker et al. 1996; Smirnov 2011a). Each of the physical phenomena that transforms or converts the electric field before the correlation is modeled by linear transformations (2×2 matrices). If is a sky direction, and MH stands for the Hermitian transpose operator of matrix M, then the 2 × 2 correlation matrix V(pq) between antennas p and q at time t and frequency ν can be written as where x is a vector containing the parameters of the given system (ionosphere state, electronics, clocks, etc.), Dps is the product of direction-dependent Jones matrices corresponding to antenna p (e.g., beam, ionosphere phase screen, and Faraday rotation), Gptν is the product of direction-independent Jones matrices for antenna p (like electronic gain and clock errors), and Xs is referred as the sky term2 in the direction s, and is the true underlying source coherency matrix [[,], [, ]]. The scalar term describes the effect of the array geometry and correlator on the observed phase shift of a coherent plane wave between antennas p and q. We have with [u,v,w] T as the baseline vector between antennas p and q in wavelength units and φ(u,v,w,s) = ul + vm + w(n − 1).

Although the detailed structure of Eq. (1) is of fundamental importance, throughout this paper it is reduced to a nonlinear operator h:RN → RM, where N is the number of free parameters and M is the number of data points. The operator h therefore maps a vector xk parameterizing the Jones matrices and/or sky terms appearing on the right-hand side of Eq. (1) (the states of the beam, the ionosphere, the clocks, the sky, etc.), and maps it to a vector of visibilities yk such that yk = h(xk). In the following, yk is the set of visibilities at the time step k for all frequencies, all baselines, and all polarizations. The choice of mapping the state space to the measurement space for all frequencies for a limited amount of time (time step k) is motivated by the fact that regularity is much stronger on the frequency axis. For example, the Jones matrices associated with the ionosphere or clocks, although they greatly vary in time, have a very stable frequency behaviour at any given time.

2. Kalman filter for nonlinear systems

Nonlinear least-squares algorithms only consider the χ2 value for the given data chunk. As mentioned above, this is a problem in (i) low S/R and (ii) ill-conditioned regimes. For example for (i), if one considers a noisy χ2 valley, the least-squares solution will jump between each time-frequency bin because of noise – while this behaviour is obviously non-physical. Effect (ii) will bring instability as a result of the χ2 valley having multiple local minima, or a flat minima. As explained in Sect. 1.2, Kalman filters provide a number of advantages allowing us in principle to significantly improve robustness, and to minimize the impact of ill-conditioning.

In the following, we assume an evolution operator xk = f(xk − 1) + wk describing the evolution of the physical quantities underlying the RIME, and a measurements operator yk = h(xk) + vk generating the set of measurement for a given process vector xk (examples for both f and h are given in Sect. 5). The random variables vk and wk model the noise and are assumed to follow normal distributions and , where Rk and Qk are the data and process covariance matrix respectively. In the following, we name the predicted-process and data domains the codomains of f and h, respectively.

2.1. Kalman filter

The traditional Kalman filter (Kalman 1960) assumes (a) f and h to be linear operators (written F and H below for f and h respectively). If the process vector xk − 1 for the time-step k − 1 has estimated mean and Pk − 1 | k − 1 estimated covariance from the data at step k − 1, assuming (b) Gaussian noise in the process and data domains, xk − 1 | k − 1 is distributed following .

Under the conditions (a) and (b), operators F and H yield Gaussian distributions in the predicted-process and data domains respectively. Given and Pk − 1 | k − 1 the Kalman filter (i) predicts and Pk | k − 1 through F; and (ii) updates those to and Pk | k through H given the data yk.

It can be shown that the mean and covariance of xk − 1 can be evolved through F giving and Pk | k − 1 as follows:

Taking into account the data vector yk at step k, the updated mean and covariance and Pk | k of xk are estimated through the calculation of the Kalman gain Kk, and are given by

The estimate is optimal in the sense that Pk | k is minimized. This approach is extremely powerful for linear-systems, but the radio interferometry measurement equation is highly nonlinear (the operator h in Eq. (1)). This makes the traditional Kalman filters unpractical for the radio interferometry calibration problem.

2.2. Unscented Kalman filter

The Kalman filters fail at properly estimating the statistics of x essentially because f and h are nonlinear, and lead to non-Gaussian distributions in the predicted-process and data domains described above.

The unscented Kalman filter (UKF, Julier & Uhlmann 1997; Wan & van der Merwe 2000) aims at properly estimating the mean and covariance in both these domains by directly applying the nonlinear operators f and h to deform the initial Gaussian distribution of x. In practice, instead of selecting a large number of process vectors built at random as is done in Monte Carlo particle filters for example, the unscented transform (UT) scheme selects a much smaller set of 2N + 1 sigma-points in the process domain in a deterministic manner. Each point is characterized by a location in the process space and by a corresponding weight. The set is built in such a way that its mean and covariance match the statistics of the random variable x. The points are propagated through the nonlinear operators f and h in the predicted-process and data domains, respectively, and the corresponding mean and covariance are estimated based on their evolved positions and associated weights. Using Taylor expansions of f and h, it can be shown that the mean and covariance of the evolved random variable are correct up to the third order of the expansion (Julier & Uhlmann 1997). Errors are introduced by higher order terms, but partial knowledge about these can be introduced using proper weighting schemes. It is important to note, however, that even though the mean and covariance can be properly estimated after applying nonlinear operators through the UT, the Kalman filter still assumes Gaussian statistics of all random variables to estimate the statistics of x.

2.2.1. Sigma-points and associated weights

Given a multivariate distribution with covariance P of size N × N, the set of sigma-points are generated in the following way (10)Here N is the number of parameters, P is the process covariance matrix, and [M] i is the ith column of the matrix M. The real-valued scalar w0 controls the distance of the sigma-points to the origin. As N increases, the radius of the sphere that contains the sigma-points increases as well. As shown in Julier & Uhlmann (1997), for the errors to be minimized on the mean and covariance estimate, the sigma-points should stay in the neighbourhood of the origin. The sigma-point locations are scaled by a parameter α giving: (11)When estimating the mean of the evolved distribution, the weights associated with the sigma-points are (12)where μ is a normalizing constant appearing while computing the Taylor expansion of the nonlinear operator. When computing the covariance of the sigma-points, the weights are given by (13)where β is an extra parameter that can be used to incorporate additional knowledge on the fourth-order term of the Taylor expansion of the covariance.

2.2.2. Filtering steps

A set of sigma-points is generated assuming , following the scheme outlined above (Sect. 2.2.1). The sigma-points are then propagated through the nonlinear evolution operator f (14)and the mean and covariance are estimated as follows: If μ = α2 the expressions of and xk | k − 1 agree up to the third order of the Taylor expansion. We note that Eqs. (15) and (16) are the UKF versions of Eqs. (3) and (4). Once and Pk | k − 1 are estimated, we assume , and a new set of sigma-points is generated following the scheme outlined in Eqs. (10) and (11). The new set of sigma-points are propagated onto the measurement domain through the nonlinear observation function h (17)where is the measurement vector corresponding to each process vector . As in Eqs. (15) and (16) the measurement mean ŷk, measurement covariance Pykyk, and state-measurement cross-covariance Pxkyk are then estimated:

We note again that Eqs. (18) and (19) mirror the behaviour of Eq. (5), while the term Pxkyk (Eq. (20)) is similar to in Eq. (6); ŷk has size M, Pykyk has size M × M and Pxkyk has size N × M, where N and M are the dimensions of the process and measurement spaces respectively. The Kalman gain is then (21)and the updated estimates and Pk | k can be computed:

thumbnail Fig. 1

This figure shows how the sigma-point statistics compares to the true statistics. Specifically, we consider the case of a frequency-dependent phase gradient (clock offset, ionospheric effect, or source position offset). The true covariance corresponds to the gray area, while the sigma points appear as dotted lines, together with their associated estimated covariance (thick dashed line). The noise in the data is represented by the dash-dotted line. Qualitatively, the goodness of the description depends on the type of the data representations operator.

3. Data representation

In this section we describe how we can modify the measurement operator together with the raw data to improve robustness. Using the operator discussed in Sect. 3.2 in combination with the Kalman filter discussed above, this is equivalent to an image-plane calibration.

3.1. Robustness with large process covariance

As explained in Sects. 2.2 and 4, the unscented transform correctly approximates the evolved covariance up to the third order. When the radius of the sphere containing σ-points in the process domain increase, and depending on the strength of nonlinearities of the evolution and measurement operators f and h, the estimated mean and covariance can be affected by large errors. This is the case when the multivariate ellipsoid is too deformed, and the statistics of the σ-points in the evolved domain no longer capture the true statistics.

Here, we introduce another layer of nonlinear operators that transform the sets of visibilities into another measurement domain. We define various simple representation operators as where is the operator described in Sect. 3.2 transforming a set of visibilities into another set of 1D images (the Fourier transform along the frequency axis, also referred to later as the pseudo-image domain). The operators |.|, φ(.), and Re(.) take the complex norm, the phase, and the real part, respectively; M is the number of visibility data, and Mi the number of pixels in the image domain. Our goal is to obtain a system that has less nonlinearity, so that the σ-points statistics still properly match the true statistics, even when the volume within the multivariate ellipsoid is large in the process domain.

In order to illustrate this idea, we compare the evolved covariance estimated using Eq. (19) to the true evolved mean and covariance as estimated by running Monte Carlo simulations. The system is made of one point source at the phase center, the bandpass goes from 30 to 70 MHz, and our interferometer consists of one baseline. We consider the ellipsoid of the clock-offset parameter (therefore corresponding to a large 10 × 10-9 s estimated error), and inspect how it is reflected in the data domains after applying (the symbol ° is used here for the function composition). For this system, Eq. (1) becomes , where Δt01 is our random variable. Figure 1 shows how the statistics of the σ-points compare to the true statistics when using different data representations . When picks the real part Re or the phase φ of ( and ), the σ-points statistics are obviously wrong.

3.2. One-dimensional image domain for calibration

Although a single transformation separate the uv-plane from the image domain, it seems the latter is sometimes more suited for calibration. Intuitively, in the uv domain clock shifts, source position, and ionospheric disturbance will wrap the phases of the complex-valued visibilities everywhere, and the strong nonlinearities sometimes present in h make the distribution strongly non-Gaussian. On the contrary, in the image domain, the same type of perturbations only affect the data locally, and will move the flux from one pixel to its neighborhood.

We cannot use the 2D image domain, as it is built from the superposition of all baselines, which would lead to an effective loss of information. Instead, similarly to what is done for VLBI delays and fringe rates calibration (Cotton 1995), we build a 1D image per baseline (see Appendix A for details). As shown in Fig. 1, when going into the pseudo-image domain, the power is concentrated in a few pixels. Using Monte Carlo simulations Fig. 1 shows the ability of the unscented transform to properly describe the data statistics after the nonlinear measurement operator and the different representation operators (Sect. 3.1) have been applied to the process domain. We have subtracted the true mean MC from each quantity plotted in this figure. The real part of individual pixels gives a better match, but is still biased. Taking the norm of the image-plane complex pixel seems to behave well in all conditions. This is discussed in detail in Appendix A.

3.3. The augmented measurement state for regularization

As explained above, one of the aims of the work presented in this paper is to address the ill-conditioning issues related to the large inverse problem underlying the use of modern interferometers. This is done by analytically specifying the physics underlying the RIME (using a physics-based approach, see Sect. 1.1), and by using the Kalman filter mechanism, able to constrain the location of the true process state through the transmission of previous estimated state and associated covariance. Yet, in some situations, and particularly when two variables are analytically degenerate (such as the clock shifts and the ionosphere when the fractional bandwidth is small), the robustness of the scheme presented is not strong enough to guarantee the regularity of solutions, and the estimated process can drift to a domain where solutions are non-physical.

In order to take into account external constrains while still properly evolving the process covariance Pk | k, we introduce an augmented measurement model (see e.g., Henar 2011; Hiltunen et al. 2011). Using the idea underlying Tikhonov regularization, if x0 is the expected value of x, and the covariance of x0, our cost function becomes where xC = xTC-1x is the norm of vector x for the metric C, with C the covariance matrix of x (or the Mahalanobis distance). The operator is the augmented version of h and is the block diagonal covariance matrix of the augmented process vector. The parameter γ allows us to control the strength of the Tikhonov regularization, and is such that .

4. Implementation for radio-interferometry

As explained above, radio-interferometry deals with large inverse problems, made of millions or billions of nonlinear equations. This poses a few deep problems including (i) numerical cost and (ii) numerical stability. In this section, we describe our UKF implementation.

4.1. Woodbury matrix identity

The first matter is the size of the matrices involved in the UKF recursion steps presented in Sect. 2.2. Specifically, in the case of LOFAR, we have nbl ~ 1500 baselines and nν ~ 250 frequencies. This gives a number of dimensions M for the measurement space of M ~ 1.5 × 106 per recursion (taking into account the four-polarization visibilities). The predicted measurement covariance matrix Pykyk has size M × M, and in practice becomes impossible to store and invert directly (~8 petabytes of memory). Fortunately we can re-factor Eq. (21) so that we do not have to explicitly calculate each cell of Pykyk. We can see that Eq. (19) can be rewritten as (27)where Sk is a matrix of size M × (2N + 1), [Sk] i is the ith column of Sk, and W is a diagonal matrix of size (2N + 1) × (2N + 1) containing the weights on its diagonal. Using the Woodbury matrix identity3 (Hager 1989), we can express the Kalman gain Kk as (28)This relation (Eq. (28)) is quite remarkable, as it allows us to apply the Kalman gain without explicitly calculating it, and without estimating either Pykyk or its inverse. Instead, the inverse W-1 of the diagonal weight matrix of size (2N + 1) × (2N + 1), and the inverse of the data covariance matrix of size M × M have to be estimated. Even though Rk has large dimensions, if the noise is uncorrelated only the diagonal has to be stored and the inverse can be computed element-by-element. Similarly, the inner product of matrices with Rk are computationally cheap. At each recursion k we have to explicitly estimate the σ-points evolved through the measurement operator h and contained in Sk.

4.2. Adaptive step

While Pk | k characterizes the posterior process covariance, the matrix Q (Eqs. (4) and (16)) characterizes the intrinsic process covariance through time. It can, for example, describe the natural time-variability of the ionosphere, the speed of the clock drift, or the beam stability. In addition, in a strong nonlinear regime, it is well known that the Kalman filters can underestimate Pk | k, and thereby drive biases in the estimate of xk. This would typically happen when is too far from x, or when the process covariance is changing (for example a changing and increasing ionospheric disturbance for a given time-period). Although the Kalman filter does not produce an update Qk of Q, based on the residual data we can externally update it and write Qk = κQ. The scaling factor κ is useful for estimating whether the model properly fits the data at any time step k. Following Ding et al. (2007), we can write where tr{A} is the operator computing the trace of a matrix A. The weights are designed to take into account past residual values, and τQk is a time-type constant. Here, estimating κ is computationally cheap as tr only accesses the diagonal of the input matrix.

thumbnail Fig. 2

Snapshot residual image estimated at different recursion times (the color scale is identical in each panel). As recursion time evolves, more data cross the Kalman filter, the ionospheric and clock parameters estimates become more accurate, and the residual noise level decreases (see also Fig. 4).

4.3. Computational cost

In this section, we discuss the computational cost of the proposed algorithm. Our concern is to show that the approach is feasible, and we do not intend to show that it is faster than other existing approaches. However, we discuss the issues of the scaling relations and parallelizability of various parts of the algorithm. We argue that using the refactorization described in Sect. 4.1, our algorithm should be compatible with existing hardware, even for the data sets produced by the most modern radiotelescopes.

As explained above, we adopt a physics-based approach to reduce the number of degrees of freedom by orders of magnitudes while using more data at a time (see Sect. 1.1 for a discussion on Jones-based versus physics-based approach). The Kalman filter is fundamentally recursive (i.e., it has only one iteration), while tens to hundreds of iterations are needed to reach local convergence with Levenberg-Marquardt for example. This means that the equivalent gain by the proposed approach on the model estimation side is the number of iteration. This gain might be balanced in some cases by the larger data chunks processed at a time by the Kalman filter itself. We give here the scaling relations for our implementation of the filter scheme.

The predict step (controlled by operator f in Sect. 2.2) always represents a relatively small number of steps, as it scales with the number of parameters N in the process vector. The update step, however, is the costly part of the computation (Sect. 2.2). It consists of (i) estimating the data corresponding to different points in the process domain (applying the operator h, see Eq. (1)) and (ii) computing the updated estimate of the process vector and associated covariance. Step (i) is common to all calibration algorithms, and in most cases this is the expensive part of the calculation as we map N parameters to M data points, M having relatively high values of ~104 − 106. Within our framework, most of the computation is spent in the estimate of Sk of size M × (2N + 1) (Eq. (17)), which, compared to the Jacobian equivalent that would have size M × N, represents a cost of a factor of ~2 in computational time. It is worth noting that this step is heavily parallelizable. The operations in (ii) to apply the Kalman gain (Eq. (28)) are negligible in terms of computing time for the example described in Sect. 5.1. From the scaling associated with the use of the Woodbury matrix identity, following the work presented in Mandel (2006), we estimate this operation should scale as .

We believe the algorithm should be practical with large data sets. For the few test cases we have been working on (with a 4-core CPU) with ~20 to ~100 parameters, and moderate to large data set (such as the LOFAR HBA data set containing ~1500 baselines, 30 sub-bands, and a corresponding M ~ 1.8 × 105 points per recursion, see Sect. 5.1.2), the algorithm was always sveral times faster than real time. On the LOFAR CEP1 cluster, a distributed software would work with up to M ~ 106 points per recursion and N ~ 100. Preliminary tests showed that the Kalman filter’s most consuming steps appearing in Eq. (28) are computed within a few seconds (computing inner products with numpy/ATLAS, on an 8-core CPU).

5. Simulations

In this section, we present simulations for (i) the pointing error calibration problem (also addressed using physics-based algorithms in Bhatnagar et al. 2004; Smirnov 2011b) and (ii) the clock/ionosphere problem.

5.1. Clock drifts and ionosphere

LOFAR raw data sets are characterized by a few dominating direction-independent and DDEs, including clocks and the ionosphere. While direction dependent calibration is known to be difficult, clock errors and ionosphere effects combined with a limited – even though large – bandwidth make the problem partially ill-conditioned.

5.1.1. Evolution and measurement operators

The error due to the clock offset of antenna p produces a linearly frequency-dependent phase . The time delay introduced by the ionosphere is frequency dependent , where Tp,d is the total electron content (TEC), given in TEC-units (TECU), and seen by station p in direction d. This gives a phase , with k = 8.44 × 109 m3 s-2.

The measurement operator h (Eq. (1)) is built from the direction independent Gptν and direction dependent terms as where I is the 2 × 2 unity matrix and δtp is a simple linear operator unpacking the clock value of antenna p from the process vector x. For this test, we choose to model the ionosphere using a Legendre polynomial function (see e.g., Yavuz 2007). Assuming a single ionospheric screen at a height of 100 km, the nonlinear operator extracts the Legendre coefficients, and returns the TEC value seen by antenna p in direction d. For this simulation, we are using the representation presented in Sect. 3.

thumbnail Fig. 3

Estimated clock errors (full line) and posterior covariance (dotted line) as a function of time for different LOFAR stations in the simulation presented in Sect. 5.1. The dashed line shows the true underlying clock offset. To improve convergence speed, before t = 6 min, the evolutionary model is stochastic.

The operator f that describes the dynamics of the system typically contains a lot of physics. Clock offsets drift linearly with time, while the ionosphere has a non-trivial behaviour (defined in Sect. 5.1.2). We configure the filter to consecutively use two types of evolution operator f. The first is used for the first ~6 min, and f is the identity function, which corresponds to a stochastic evolution. This appears useful when the initial process estimate starts far from the true process state. This way, the filter’s state estimate gets closer to the true state without assuming any physical evolutionary dynamics. The convergence speed and accuracy are then both controlled by the covariance matrices Pk and Qk described above. We set the second evolutionary operator f to be an extrapolating operator. It computes an estimated process vector value from the past estimates. At time step k, for the ith component of x, solutions are estimated as follows: where is the operator computing the polynomial interpolation, is the degree of the polynomial used for the interpolation, wi are the weights associated with each point xkm at tkm, and τi gives a timescale beyond which past information is tapered.

5.1.2. Simulation for LOFAR

An important consideration is that at any given time our algorithm needs to access all frequencies simultaneously. With 250 sub-bands (16-bit mode), 1 channel per sub-band, 1500 baselines, 4 polarization, this gives a number of visibilities per recursion of 1.5 × 106. The data is currently distributed and stored per sub-band, so our software needs to deal with a number of technical problems for a realistic full size simulation. For this simulation, we scale down the problem by a factor ~8 in terms of number of frequency points per recursion. Assuming NVSS source counts (Condon et al. 1998), a spectral index of −0.8, and a field of view of 8 degrees in diameter, we estimate a total of ~20 Jy of signal per pointing at ~150 MHz. Inspecting the cross polarization visibilities of a LOFAR calibrated data set with Δν = 0.2 MHz and Δt = 6 s gives an estimated noise of ~2 Jy per visibility bin. We work on 30 sub-bands only, with frequencies linearly distributed between 100 MHz and 150 MHz, and scale the signal to match S/N ~ 10 per visibility per sub-band. We distribute the corresponding flux density on a 3 × 3 rectangular grid of sources with a step of 3 degrees in RA and 3 degrees in Dec.

For the dynamics of the underlying physical effects, we apply a linearly drifting clock offset taken at random with ns. As mentioned above, we model the ionosphere with a 2D-Legendre polynomial basis function. For this simulation, each Legendre lij coefficient varies following lij = sin(t/τij + dτij), with τij and dτij taken at random. Along the lines discussed above, we set and for the clock and for the ionosphere, respectively. These orders for the extrapolating polynomials are in agreement with the linear clock drift and the nonlinear behaviour of the ionosphere.

5.1.3. Results

The filter and simulation configuration are described in Sects. 5.1.1 and 5.1.2, respectively. At each recursion step, ~180.000 complex visibilities cross the filter and the process state (clock and ionosphere) as well as its covariance are estimated.

First, in order to inspect the match to the data we derive the frequency-baseline-direction dependent Jones matrices at the discrete locations of the sources in our sky-model, and subtract the visibilities corresponding to the given discrete directions. We then grid the residual visibilities and compute the snapshot images (see Fig. 2). Figure 4 shows the minimum and maximum residual as well as the standard deviation as a function of time. Very quickly the visibilities are correctly matched down to the thermal noise.

thumbnail Fig. 4

Top panel: maximum and minimum residual values in the snapshot image as a function of recursion time (full line). Bottom panel: standard deviation in the residual snapshot maps. The expected thermal noise is shown in the bottom figure as the dotted line.

thumbnail Fig. 5

Ionosphere state described using a Legendre polynomial basis function. The estimated and true state of the ionosphere (left and right panels) are shown at the beginning and at the end of the filter’s recursion. The open circles show the location of the pierce points in the ionosphere. The spatial coordinates are given in kilometers from the array center projected on the ionosphere screen.

In Fig. 3 we show the clock offsets estimates as a function of time compared to the true clock errors. The clock offset estimates seem to converge asymptotically to the true underlying states. The ionospheric parameter estimates are more subject to ill-conditioning, as some parts of the TEC-screen are not pierced by any projected station. However, plotting the TEC-screen corresponding to the individual Legendre coefficients gives a good qualitative match to the true TEC values (Fig. 5).

5.2. Pointing errors

One of the dominating calibration errors for interferometers using dishes are the individual antenna pointing errors. They start to have a significant effect even at moderate dynamic range, and can be severe for non-symmetric primary beams with azimuthal dish mounts. Bhatnagar et al. (2004) and Smirnov (2011b) have presented a physics-based calibration scheme to specifically solve for pointing errors, using a least-squares minimization technique combined with a beam model.

Here, we simulate a Westerbork Synthesis Radio Telescope (WSRT) data set. As in Sect. 5.1, we first define a measurement equation (operator h). We only consider the direction dependent primary beam effect, using the WSRT cos3 beam model

where δlpt and δmpt are the operators unpacking the pointing error values δl and δm for antenna p at time t. The f evolution operator is the same as in Sect. 5.1.1, with τi = 5 min (Eq. (33)). We simulate a data-set containing 64 channels centered at ~1.3 GHz and channel width δν = 0.3 MHz. The sky model has 20 sources with a total flux density of ~10 Jy, with a noise of 0.2 Jy per visibility. The simulated pointing errors have an initial global offset distributed as arcmin, and the pointing offsets evolve periodically as δl(t) = l0 + alcos(2πt/τl + φl), with arcmin, min, and φl uniformly distributed between 0 and 2π. The same scheme is used to generate the evolution law for δm.

Figure 6 shows the comparison between the estimated pointing errors and the true pointing errors for a few antennas. The filter’s estimate rapidly converges to the true pointing offset, and properly tracks its state within the estimated uncertainty.

thumbnail Fig. 6

True pointing errors for a WSRT simulation (dashed line) together with the estimated state as a function of time (full line). Our algorithm properly tracks the time dependent pointing errors within the estimated covariance (dotted line).

6. Discussion and conclusion

6.1. Overview: pros, cons, and potential

As discussed throughout this paper, it is important to obtain robust algorithms that do not affect the scientific signal. In this paper, we have presented a method that aims at improving robustness along the following lines:

  • 1.

    The Kalman filter presented here is fundamentally recursive, and information from the past is transferred along the recursion, thereby constraining the expected location of the underlying true state. This is fundamentally different from minimizing a chi-square, and then smoothing or interpolating this solution, especially since we can assume a physical measurement and evolutionary model.

  • 2.

    Contrary to the Jones-based algorithms that have to deal with hundreds of degrees of freedom, our algorithm follow a physics-based approach (see Sect. 1.1 for other physics-based methods). It aims at estimating the true underlying physical term, potentially describing the Jones matrices of individual effects everywhere in the baseline-direction-frequency space. The very inner structure of the RIME can be used to constrain the solutions. This feature allows us to take into account much bigger data chunks. Typically, most effects have a very stable frequency behaviour, and the data in the full instrumental bandwidth can be simultaneously used at each recursion. This improves conditioning.

  • 3.

    The measurement operator is nonlinear, and combining (1) with (2) is made possible by using a modern nonlinear version of the Kalman filter, together with the representation operator presented in Sect. 3.

  • 4.

    Ill-conditioning can still be significant if effects are analytically degenerate to some degree. We can modify the measurement operator to take external prior information into account (see Sect. 3.3), and reject solutions that are considered to be non-physical. For example, this can allow the user to provide the filter with an expected ionospheric power spectrum of the Legendre coefficients.

  • 5.

    One of the benefits of using filters is that they produce a posterior covariance matrix on the estimated process state. The covariance estimate should be reliable assuming the nonlinearities are not too severe.

Given the large size of our inverse problem, and in order to make any algorithm practical, optimizing the computational cost is of prime importance. Using the Woodbury matrix identity (Sect. 4.1), we re-factor the unscented Kalman filter steps to make the algorithm practical. Even for the moderately large simulations described in Sect. 5.1, a 4-core CPU is able to constrain solutions faster than real time. The need to access the data of all frequencies simultaneously represents some technical problems, as these are distributed over different cluster nodes.

An important potential problem with physics-based approaches is that the system needs to be described analytically, while algorithms solving for the effective Jones matrices do not use any assumptions about the physics underlying the building of a visibility (apart from the sky model that is assumed). This would cause problems in particular if the model encapsulated in the operator h misses physical ingredients that are present in reality, and would probably drive biases in the estimates. Furthermore, the unscented Kalman Filter used and adapted to the context of radio-interferometry in this paper deals with nonlinearities only up to a certain level. This means in practice that the process a priori covariance has a certain maximum size for a given type of nonlinearities.

6.2. Conclusion

The use of filters and similar methods can potentially improve radio interferometric calibration. As with least-squares minimization techniques, our approach is guaranteed to work only if nonlinearities are not too severe in the neighborhood of the true process state. Other algorithms dealing with nonlinearities are known to provide higher robustness, such as more general particle filters, or Markov chains Monte Carlo algorithms. The latter is guaranteed to provide a correct estimated posterior distribution. However, most of these methods are computationally expensive because of the many predict steps that have to be computed, and this fact could make them impractical, given the large size of our problem. Recursive algorithms are well adapted to streaming pre-calibration, and based on preliminary simulations, our algorithm seems to be robust enough to solve for the sky term (positions, flux densities, spectral indices, etc.) in a streaming way.

We have not yet demonstrated the efficiency of our algorithm with real data sets essentially because of its complexity and novelty. Indeed, our software needs to deal with a number of technical issues as well as more fundamental problems. Specifically in the case of the newest interferometers such as LOFAR, (i) we have to deal with large quantities of distributed data, and the algorithm has to access all frequencies simultaneously. Beyond these technical aspects, because we solve for the underlying physical effects; (ii) we need to build pertinent physical models for the various effects we are solving for, such as the ionosphere or phased array beams. An application of this algorithm to real data sets will be presented in a future paper.


The process state vector encodes the states of the instrument, ionosphere, beams, etc. It is written as x throughout this paper.


For convenience, in this section and throughout the paper, we do not show the sky term that usually divides the sky to account for the projection of the celestial sphere onto the plane, as this has no influence on the results.


The Woodbury matrix identity has sometimes been used in the context of the ensemble Kalman filters, and is given by:


I thank Ludwig Schwardt for helping me understand some important aspects of Kalman filters. These open-ended discussions were very helpful to developing the framework presented in this paper. Thanks go to Trienko Grobler and Oleg Smirnov for giving useful comments on the paper draft.


  1. Bhatnagar, S., Cornwell, T. J., & Golap, K. 2004, EVLA Memo 84. Solving for the antenna based pointing errors, Tech. Rep., NRAO [Google Scholar]
  2. Bhatnagar, S., Cornwell, T. J., Golap, K., & Uson, J. M. 2008, A&A, 487, 419 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  3. Bhatnagar, S., Rau, U., & Golap, K. 2013, ApJ, 770, 91 [NASA ADS] [CrossRef] [Google Scholar]
  4. Condon, J. J., Cotton, W. D., Greisen, E. W., et al. 1998, AJ, 115, 1693 [NASA ADS] [CrossRef] [Google Scholar]
  5. Cotton, W. D. 1995, in Very Long Baseline Interferometry and the VLBA, eds. J. A. Zensus, P. J. Diamond, & P. J. Napier, ASP Conf. Ser., 82, 189 [Google Scholar]
  6. Ding, W., Wang, J., Rizos, C., & Kinlyside, D. 2007, J. Navigation, 60, 517 [NASA ADS] [CrossRef] [Google Scholar]
  7. Hager, W. W. 1989, SIAM Rev., 31, 221 [CrossRef] [Google Scholar]
  8. Hamaker, J. P., Bregman, J. D., & Sault, R. J. 1996, A&AS, 117, 137 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  9. Henar, F. E. 2011, Master thesis, Institut für Biomedizinische Technik [Google Scholar]
  10. Hiltunen, P., Särkkä, S., Nissilä, I., Lajunen, A., & Lampinen, J. 2011, Inverse Problems, 27, 025009 [NASA ADS] [CrossRef] [Google Scholar]
  11. Intema, H. T., van der Tol, S., Cotton, W. D., et al. 2009, A&A, 501, 1185 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  12. Julier, S. J., & Uhlmann, J. K. 1997, in Signal Processing, Sensor Fusion, and Target Recognition VI, ed. I. Kadar, SPIE Conf. Ser., 3068, 182 [NASA ADS] [Google Scholar]
  13. Junklewitz, H., Bell, M. A., & Enßlin, T. 2014, A&A, submitted [arXiv:1401.4711] [Google Scholar]
  14. Kalman, R. E. 1960, J. Fluids Eng., 82, 35 [Google Scholar]
  15. Kazemi, S., Yatawatta, S., Zaroubi, S., et al. 2011, MNRAS, 414, 1656 [NASA ADS] [CrossRef] [Google Scholar]
  16. Mandel, J. 2006, in CCM Report 231, University of Colorado at Denver and Health Sciences Center [Google Scholar]
  17. McEwen, J. D., & Wiaux, Y. 2011 [arXiv:1110.6137] [Google Scholar]
  18. Noordam, J. E., & Smirnov, O. M. 2010, A&A, 524, A61 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  19. Rau, U., & Cornwell, T. J. 2011, A&A, 532, A71 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  20. Smirnov, O. M. 2011a, A&A, 527, A106 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Smirnov, O. M. 2011b, presentation at CALIM2011 Conf. [Google Scholar]
  22. Tasse, C., van der Tol, S., van Zwieten, J., van Diepen, G., & Bhatnagar, S. 2013, A&A, 553, A105 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  23. Walker, R. C. 1999, in Synthesis Imaging in Radio Astronomy II, eds. G. B. Taylor, C. L. Carilli, & R. A. Perley, ASP Conf. Ser., 180, 433 [Google Scholar]
  24. Wan, E. A., & van der Merwe, R. 2000, The Unscented Kalman Filter for Nonlinear Estimation [Google Scholar]
  25. Yatawatta, S. 2013, Exp. Astron., 35, 469 [NASA ADS] [CrossRef] [Google Scholar]
  26. Yatawatta, S., Zaroubi, S., de Bruyn, G., Koopmans, L., & Noordam, J. 2008 [arXiv:0810.5751] [Google Scholar]
  27. Yavuz, E., Arıkan, F., Arıkan, O., Erol, C. B. 2007, in 13th European Signal Processing Conference, EUSIPCO 2005, Antalya, Turkey, 2394 [Google Scholar]

Appendix A: One-dimensional images properties

In this section we discuss in more detail the 1D image representations introduced in Sect. 3. Aiming at being as conservative as possible, but still working in the image domain, we define g1D to be the operator that builds one 1D image per baseline (pq), and polarization i as

where c is the speed of light, rect is the rectangular function, νm = (ν0 + ν1) / 2, and Δν = ν1ν0 with ν0 and ν1 the minimum and maximum available frequencies. In order to align the u-coordinate with the frequency extent of the baseline, we rotate bpq and s and sd with 3 × 3 rotation matrix Uθφ such that where (l,m,n) are the image plane coordinates, (upq,vpq) are the uv-coordinates of baseline (pq), (ν0,ν1) are the lower and higher frequencies values of the interferometer’s bandpass. It is unitary so . We can write the complex 1D image as (A.5)where is the convolution product and Sd,i is the apparent flux of the source in direction d for polarization i.

More intuitively, this means that is obtained by projecting the sky on the baseline, and convolving it with the 1D point spread function (PSF) of the given baseline. The term PSF1D is obtained by computing the inverse Fourier transform of the uv-domain sampling function, where sinc is the cardinal sine function. We can see in Eq. (A.7) that PSF1D contains both low and high spatial frequency terms (u′Δν/c and uνm/c, respectively, whose ratio equals the fractional bandwidth).

Therefore, while still contains the high frequency fringe, taking the complex norm using eliminates the high spatial frequency term, and PSF1D under is smoother ( extracts the envelope of PSF1D). This intuitively explains why the representation seems to provide a better match between the true and the σ-points statistics. As the clock and ionospheric displacements mostly amount to apparent shifts in source locations, smoothness of provides stability in the σ-points statistics in the data domain.

Applying any of the representation operators presented above modifies the properties of the input data covariance matrix Rk (Eq. (5), (19), and (27)). Assuming the noise in the visibilities is uncorrelated, Rk is diagonal matrix. Assuming the noise is not baseline or frequency dependent we have Rk = σ2, for and , and Rk = (σ2/nν) for . The statistics of are non-Gaussian since it is the norm of a complex number. In that case the random variable is which follows a non-central χ-distribution with two degrees of freedom, and mean and covariance (A.8)\newpage

where is a generalized Laguerre polynomial.

All Tables

Table 1

Overview of the mathematical notations used throughout this paper.

All Figures

thumbnail Fig. 1

This figure shows how the sigma-point statistics compares to the true statistics. Specifically, we consider the case of a frequency-dependent phase gradient (clock offset, ionospheric effect, or source position offset). The true covariance corresponds to the gray area, while the sigma points appear as dotted lines, together with their associated estimated covariance (thick dashed line). The noise in the data is represented by the dash-dotted line. Qualitatively, the goodness of the description depends on the type of the data representations operator.

In the text
thumbnail Fig. 2

Snapshot residual image estimated at different recursion times (the color scale is identical in each panel). As recursion time evolves, more data cross the Kalman filter, the ionospheric and clock parameters estimates become more accurate, and the residual noise level decreases (see also Fig. 4).

In the text
thumbnail Fig. 3

Estimated clock errors (full line) and posterior covariance (dotted line) as a function of time for different LOFAR stations in the simulation presented in Sect. 5.1. The dashed line shows the true underlying clock offset. To improve convergence speed, before t = 6 min, the evolutionary model is stochastic.

In the text
thumbnail Fig. 4

Top panel: maximum and minimum residual values in the snapshot image as a function of recursion time (full line). Bottom panel: standard deviation in the residual snapshot maps. The expected thermal noise is shown in the bottom figure as the dotted line.

In the text
thumbnail Fig. 5

Ionosphere state described using a Legendre polynomial basis function. The estimated and true state of the ionosphere (left and right panels) are shown at the beginning and at the end of the filter’s recursion. The open circles show the location of the pierce points in the ionosphere. The spatial coordinates are given in kilometers from the array center projected on the ionosphere screen.

In the text
thumbnail Fig. 6

True pointing errors for a WSRT simulation (dashed line) together with the estimated state as a function of time (full line). Our algorithm properly tracks the time dependent pointing errors within the estimated covariance (dotted line).

In the text

Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.

Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.

Initial download of the metrics may take a while.