Issue 
A&A
Volume 532, August 2011



Article Number  A71  
Number of page(s)  17  
Section  Astronomical instrumentation  
DOI  https://doi.org/10.1051/00046361/201117104  
Published online  25 July 2011 
A multiscale multifrequency deconvolution algorithm for synthesis imaging in radio interferometry
^{1} National Radio Astronomy Observatory, Socorro, NM, USA
email: rurvashi@aoc.nrao.edu
^{2} Australia Telescope National Facility, CSIRO, Sydney, Australia
email: tim.cornwell@atnf.csiro.au
Received: 19 April 2011
Accepted: 4 June 2011
Aims. We describe MSMFS, a multiscale multifrequency deconvolution algorithm for wideband synthesisimaging, and present imaging results that illustrate the capabilities of the algorithm and the conditions under which it is feasible and gives accurate results.
Methods. The MSMFS algorithm models the wideband skybrightness distribution as a linear combination of spatial and spectral basis functions, and performs imagereconstruction by combining a linearleastsquares approach with iterative χ^{2} minimization. This method extends and combines the ideas used in the MSCLEAN and MFCLEAN algorithms for multiscale and multifrequency deconvolution respectively, and can be used in conjunction with existing widefield imaging algorithms. We also discuss a simpler hybrid of spectralline and continuum imaging methods and point out situations where it may suffice.
Results. We show via simulations and application to multifrequency VLA data and wideband EVLA data, that it is possible to reconstruct both spatial and spectral structure of compact and extended emission at the continuum sensitivity level and at the angular resolution allowed by the highest sampled frequency.
Key words: techniques: interferometric / techniques: image processing / methods: numerical / radio continuum: general
© ESO, 2011
1. Introduction
Instruments such as the EVLA (Perley et al. 2009), ASKAP (Deboer et al. 2009) and LOFAR (de Vos et al. 2009) are among a new generation of broadband radio interferometers, currently being designed and built to provide highdynamicrange imaging capabilities. The large instantaneous bandwidths offered by new frontend systems increase the raw continuum sensitivity of these instruments and allow us to measure the spectral structure of the incident radiation across large continuous frequency ranges. Until recently, the primary goal of wideband imaging has been to obtain a continuum image that makes use of the increased sensitivity and spatialfrequency coverage offered by combining multifrequency measurements. So far, effects due to the spectral structure of the skybrightness distribution have been considered mainly in the context of reducing errors in the continuum image, without paying attention to the accuracy of a spectral reconstruction. But now, the new bandwidths (100%) are large enough to allow the spectral structure to also be reconstructed to produce a meaningful astrophysical measurement. To do so, we need imaging algorithms that model and reconstruct both spatial and spectral structure simultaneously, and that are also sensitive to various effects of combining measurements from a large range of frequencies (namely varying ranges of sampled spatial scales and varying arrayelement response functions).
The simplest method of wideband image reconstruction is to image each frequency channel separately and combine the results at the end. However, singlechannel imaging is restricted to the narrowband uvcoverage and sensitivity of the instrument, and source spectra can be studied only at the angular resolution allowed by the lowest frequency in the sampled range. Also, for complicated extended emission, the singlefrequency uvcoverage may not be sufficient to produce a consistent solution across frequency. While such imaging may suffice for many science goals, it does not take full advantage of what an instantaneously wideband instrument provides, namely the sensitivity and spatialfrequency coverage obtained by combining measurements from multiple receiver frequencies during image reconstruction.
Multifrequencysynthesis (MFS) (Conway et al. 1990) is the technique of combining measurements at multiple discrete receiver frequencies during synthesis imaging. MFS was initially done to increase the apertureplane coverage of sparse arrays by using narrowband receivers and switching frequencies during the observations. Wide bandwidth systems (~10%) later presented the problem of bandwidthsmearing, which was eliminated by splitting the wide band into narrowband channels and mapping them onto their correct spatialfrequencies during imaging. It was assumed that at the receiver sensitivities of the time, the measured sky brightness was constant across the observed bandwidth. The next step was to consider a frequencydependent sky brightness distribution. Conway et al. (1990) describe a doubledeconvolution algorithm based on the instrument’s responses to a series of spectral basis functions, in particular, the first two terms of a Taylor series. A map of the average spectral index is derived from the coefficient maps. Sault & Wieringa (1994) describe the MFCLEAN algorithm which uses a formulation similar to doubledeconvolution but calculates Taylorcoefficients via a leastsquares solution. More recently, Likhachev (2005) rederives the leastsquares method used in MFCLEAN using more than two series coefficients.
So far, these CLEANbased multifrequency deconvolution algorithms used pointsource (zeroscale) flux components to model the sky emission, a choice not well suited for extended emission. Cornwell (2008) describes the MSCLEAN algorithm which does matchedfiltering using templates constructed from the instrument response to various large scale flux components. Greisen et al. (2009) describe a method simular to MSCLEAN, and Bhatnagar & Cornwell (2004) describe the ASPCLEAN algorithm that explicitly fits for the parameters of Gaussian flux components and uses scale size to aid the separation of signal from noise. We show in this paper that with the MFCLEAN approach, deconvolution errors that occur with a pointsource model are enhanced in the spectralindex and spectralcurvature images because of error propagation effects, and that the use of a multiscale technique can minimize this.
For high dynamic range imaging across wide fieldsofview, directiondependent instrumental effects need to be accounted for. Bhatnagar et al. (2008) describe an algorithm for the correction of timevariable and widefield instrumental effects for narrowband interferometric imaging. For widefield wideband imaging, these algorithms must be extended to include the frequency dependence of the instrument.
In this paper, we describe MSMFS (multiscale multifrequency synthesis) as an algorithm that combines variants of the MFCLEAN and MSCLEAN approaches to simultaneously reconstruct both spatial and spectral structure of the skybrightness distribution. Frequencydependent primarybeam correction is considered as a postdeconvolution correction step^{1}. In Sect. 5, we show imaging examples using simulations, multifrequency VLA data and wideband EVLA data, to illustrate the capabilities and limits of the MSMFS algorithm.
1.1. Wideband imaging
We begin with a discussion of how well we can reconstruct both spatial and spectral information from an incomplete set of visibilities sampled at multiple observing frequencies. An interferometer samples the visibility function of the sky brightness distribution at a discrete set of spatialfrequencies (called the uvcoverage). The spatialfrequencies sampled at each observing frequency ν are between and , where u is used here as a generic label for the uvdistance^{2} and b represents the length of the baseline vector (in units of meters) projected onto the plane perpendicular to the direction of the source. The maximum spatialfrequency measured at each frequency defines the angular resolution of the instrument at the frequency (θ_{ν} = 1/u_{max}(ν)). The range of spatialfrequencies between u_{min} at ν_{max} and u_{max} at ν_{min} represents the region where the visibility function is sampled at all frequencies in the band, and there is sufficient information to reconstruct both spatial and spectral structure. The spatialfrequencies outside this region are sampled only by a fraction of the band and the accuracy of a broadband reconstruction depends on how well the spectral and spatial structure are constrained by an appropriate choice of flux model.
Radio interferometric measurements of wideband continuum emission can be described as one or more of the following situations.

1.
Flat spectrum sources: for a flatspectrum source, measurements at multiple frequencies sample the same spatial structure, increasing the signaltonoise of the measurements in regions of overlapping spatialfrequencies, and providing better overall uvplane filling. The angular resolution of the instrument is given by u_{max} at ν_{max}. Standard deconvolution algorithms applied to measurements combined via MFS will suffice to reconstruct source structure across the full range of spatial scales measured across the band.

2.
Unresolved sources with spectral structure: consider a compact, unresolved source (with spectral structure) that is measured as a point source at all frequencies. The visibility function of a point source is flat across the entire spatialfrequency plane. Therefore, even if u_{max} changes with frequency, the spectrum of the source is adequately sampled by the multifrequency measurements. Using a flux model in which each source is a δfunction with a specific spectral model (for example, a smooth polynomial), it is possible to reconstruct the spectral structure of the source at the maximum possible angular resolution (given by u_{max} at ν_{max}).

3.
Resolved sources with spectral structure: for resolved sources with spectral structure, the accuracy of the reconstruction across all spatial scales between u_{min} at ν_{min} and u_{max} at ν_{max} depends on an appropriate choice of flux model, and the constraints that it provides. For example, a source emitting broadband synchrotron radiation can be described by a fixed brightness distribution at one frequency with a powerlaw spectrum associated with each location. Images can be made at the maximum angular resolution (given by u_{max} at ν_{max}) with the assumption that different observing frequencies probe the same spatial structure but measure different amplitudes (usually a valid assumption). This constraint is strong enough to correctly reconstruct even moderately resolved sources that are completely unresolved at the low end of the band but resolved at the higher end. On the other hand, a source whose structure itself changes by 100% in amplitude across the band would break the above assumption (bandlimited signals).In this case, a complete reconstruction would be possible only in the region of overlapping spatialfrequencies (between u_{min} at ν_{max} and u_{max} at ν_{min}), unless the flux model includes constraints that bias the solution towards one appropriate for such sources.

4.
Spectral structure at large spatial scales: the lower end of the spatialfrequency range presents a different problem. The size of the central hole in the uvcoverage of a typical interferometer increases with frequency, and spectra are not measured adequately at spatial scales corresponding to spatialfrequencies below u_{min} at ν_{max}. In the extreme case where most of the visibility function lies within this uvhole, a flatspectrum largescale source (for example) can be indistinguishable from a relatively smaller source with a steep spectrum. Additional constraints in the form of shortspacing spectra may be required for an accurate reconstruction.

5.
Frequency dependence of the instrument: arrayelement responses usually vary with frequency, direction and time. Standard calibration accounts for the frequency and time dependence for the direction in which the instrument is pointing. Away from the pointingdirection, the frequencydependent shape of the primarybeam is the dominant remaining instrumental effect, and this results in artificial spectral structure in the images. To recover both spatial and spectral structure of the sky brightness across a large field of view, the frequency dependence of the primary beam must be modeled and removed before or during multifrequency synthesis imaging.
To summarize, just as standard interferometric image reconstruction uses a priori information about the spatial structure of the sky to estimate the visibility function in unmeasured regions of the uvplane, multifrequency image reconstruction algorithms need to use a priori information about the spectral structure of the sky brightness. By combining a suitable model with the known frequencydependence of the spatialfrequency coverage and element response function, it is possible to reconstruct the broadband sky brightness distribution from incomplete spectral and spatialfrequency sampling.
2. Multiscale multifrequency deconvolution
The MSMFS algorithm described here is based on the iterative imagereconstruction framework described in Rau et al. (2009) and summarized in Appendix A. Sections 2.1 to 2.7 formulate the algorithm and summarize its implementation in the CASA package. Differences between the multiscale and multifrequency parts of MSMFS with the original MFCLEAN and MSCLEAN approaches are highlighted in Sects. 3.1 and 3.2.
2.1. Parameterization of spatial structure
An image with multiscale structure is written as a linear combination of images at different spatial scales (Cornwell 2008). (1)where I^{m} is a multiscale model image^{3}, and is a collection of δfunctions that describe the locations and integrated amplitudes of flux components of scale s in the image. N_{s} is the number of discrete spatial scales used to represent the image and is a tapered truncated parabola of width proportional to s. The symbol ⋆ denotes convolution.
2.2. Parameterization of spectral structure
The spectrum of each flux component is modeled by a polynomial in frequency (a Taylor series expansion about ν_{0}). (2)where represents a multiscale Taylor coefficient image, and N_{t} is the order of the Taylor series expansion.
These Taylor coefficients are interpreted by choosing an astrophysically appropriate spectral model and performing a Taylor expansion to derive an expression that each coefficient maps to. One practical choice is a power law with a varying index, represented by a secondorder polynomial in space. (3)Here, represents an average spectralindex, and represents spectralcurvature. The motivation behind this choice of interpretation is the fact that continuum synchrotron emission is usually modeled (and observed) as a power law distribution with frequency. Across the wide frequency ranges that new receivers are now sensitive to, spectral breaks, steepening and turnovers need to be factored into models, and the simplest way to include them and ensure smoothness, is spectral curvature^{4}.
A Taylor expansion of Eq. (3) yields the following expressions for the first three coefficients from which the spectral index and curvature images can be computed algebraically. (4)Note that with this choice of parameterization, we are using a polynomial to model a powerlaw, and N_{t} rapidly increases with bandwidth. A powerseries expansion about and will yield a logarithmic expansion (i.e. I vs. log ν) which requires fewer coefficients to represent the same spectrum^{5}.
2.3. Multiscale multifrequency model
A wideband model of the sky brightness distribution is constructed from Eqs. (1) and (2). A wideband flux component is a spatial basis function (, Gaussian or parabola) whose integrated amplitude follows a Taylor polynomial in frequency. A region of emission in which the spectrum varies with position will be modeled as a sum of these wideband flux components. The imagereconstruction process simultaneously solves for spatial and spectral coefficients of these flux components.
The image at each frequency can be modeled as a linear combination of Taylorcoefficient images at different spatial scales. (5)Here, N_{s} is the number of discrete spatial scales used to represent the image and N_{t} is the order of the series expansion of the spectrum. represents a collection of δfunctions that describe the locations and integrated amplitudes of flux components of scale s in the image of the tth series coefficient.
2.4. Measurement equations
The measurement equations^{6} for a sky brightness distribution parameterized by Eq. (5) are (6) is a vector of n × 1 visibilities measured at frequency ν. w_{ν} are Taylorweights (shown in Eq. (5)). [S_{ν}] is an n × m projection operator that represents the spatialfrequency sampling function for frequency ν. The imagedomain convolution of model with is written as a spatialfrequencydomain multiplication. where is a spatialfrequency taper function. All images are vectors of shape m × 1.
Equation (6) can be rewritten to include all frequencies by stacking [S_{ν}] for multiple frequencies. Let N_{c} be the number of frequencies. (7)where V^{obs} is a vector of nN_{c} × 1 visibilities, [S] is an nN_{c} × m sampling matrix representing the multifrequency uvcoverage of the synthesis array. is a diagonal nN_{c} × nN_{c} matrix of weights, and consists of N_{c} diagonal blocks each of size n × n and containing .
If the summations over t and s are written as a blockmatrix dotproduct, the full measurement matrix has the shape nN_{c} × mN_{s}N_{t}. When multiplied by the set of N_{s}N_{t} model sky vectors each of shape m × 1, it produces nN_{c} visibilities.
For N_{t} = 3,N_{s} = 2 the measurement equations can be written as follows, in block matrix form. The subscript p denotes the pth spatial scale and the subscript q denotes the qth Taylor coefficient of the spectrum polynomial. (8)
2.5. Normal equations
The normal equations for the system described in Eq. (6) can be written in block matrix form, with each blockrow (for scale size s, and Taylor term t) given by (9)Here, each is an m × m block of the Hessian matrix, and is one of N_{s}N_{t} dirty images. [W^{im}]is a diagonal matrix of dataweights (and imagingweights) and is a diagonal matrix containing Taylorweights . is the Hessian matrix formed using only one frequency channel, and is a convolution operator containing a shifted version of the singlefrequency pointspreadfunction in each row (see Appendix A.2 for details). [F^{†}T_{s}F] and [F^{†}T_{p}F] are also convolution operators with and as their kernels. The process of convolution is associative and commutative, and therefore, is also a convolution operator whose kernel is given by (15)The dirty images on the RHS of Eq. (9) can be written as follows. where is the dirty image formed by direct Fourier inversion of weighted visibilities from one frequency channel.
When all scales and Taylor terms are combined, the full Hessian matrix contains N_{t}N_{s} × N_{t}N_{s} blocks each of size m × m, and N_{t} Taylor coefficient images each of size m × 1, for all N_{s} spatial scales.
The normal equations in block matrix form for the example in Eq. (8) for N_{t} = 3,N_{s} = 2 is shown in Eq. (21). The Hessian matrix consists of N_{s} × N_{s} = 2 × 2 blocks (the four quandrants of the matrix), each for one pair of spatial scale s,p (the upper indices). Within each quadrant, the N_{t} × N_{t} = 3 × 3 matrices correspond to various pairs of t,q (Taylor coefficient indices; the lower indices). This layout shows how the multiscale and multifrequency aspects of this imaging problem are combined and illustrates the dependencies between the spatial and spectral basis functions. (21)This is the system of equations to be solved. The spatialfrequency sampling of a real interferometer is always incomplete ( [S] is rankdeficient). Therefore, each Hessian block, and the entire Hessian matrix is singular, and an exact inverse does not exist^{7}. An accurate reconstruction can be obtained only via successive approximation (iterative numerical optimization).
2.6. Principal solution
The principal solution^{8} of the normal equations is an approximate solution that can be computed via diagonal approximations of all Hessian blocks. This solution is then used to pick flux components within the minorcycle of iterative deconvolution.
Each Hessian block is a convolution operator with a shifted version of a pointspreadfunction in each row (their centers are aligned on the diagonal). A diagonal approximation represents the assumption that the PSFs are δfunctions, or that the amplitudes in the dirty image at the location of a source reflect the true flux of the source. Also, with the assumption of spatially invariant PSFs, all elements on the diagonal within each Hessian block are the same. Therefore, we can reduce each to one number. The full Hessian reduces to an N_{t}N_{s} × N_{t}N_{s} element matrix [H^{peak}], The principal solution can be obtained by inverting [H^{peak}] once, and applying it to the dirty image vectors, one pixel at a time. Such a solution will be correct only at the locations of the centers of isolated fluxcomponents and must be augmented with an iterative optimization approach to ensure accuracy. In the case of perfect sampling (where the Hessian blocks are truly diagonal and PSFs are δfunctions), the principal solution will directly give correct images of Taylorseries coefficients.
2.6.1. Properties of [H^{peak}]
Some properties of [H^{peak}] are worth noting, to understand the numerical stability of this approach and its dependence on the choice of spectral and spatial basis functions (sky model), and spectral and spatialfrequency sampling functions (data and instrument).

1.
There are N_{s}N_{t} elements on the diagonal of [H^{peak}]. Each is a measure of theinstrument’s sensitivity to a flux component of unit total fluxwhose shape and spectrum are described by one pair of spatial andspectral basis functions ( and ). Each diagonal element is given by(22)Note that the instrument’s sampling function and dataweights are included in this expression.

2.
The offdiagonal elements measure the orthogonality^{9} between the various basis functions, for the given uvcoverage and weighting scheme. They measure the amount of overlap between basis functions in the measurement domain. Smaller values indicate a more orthogonal set of basis functions that the instrument is better able to distinguish between.

3.
The condition number of this matrix (or of blocks within this matrix) will indicate if the chosen set of basis functions and spatialfrequency coverage provide enough constraints to provide a stable solution, and can be used as a metric to choose a suitable basis set. For a simple example, if a 3term solution is attempted with data from only two distinct frequencies, [H^{peak}] will be singular. Or, for some choice of multifrequency uvcoverage, the visibilities measured by the instrument for two different spatial scales may become hard to distinguish. Then, the crossterm element of [H^{peak}] corresponding to this combination could have a higher value, indicating that the two parameters are highly coupled, and there is insufficient information in the data and sampling pattern to distinguish between the scales. A similar situation can arise to create ambiguity between spatial and spectral structure (an extreme example is multifrequency measurements from only one baseline).

4.
In general, [H^{peak}] will be a positivedefinite symmetric matrix whose inverse can be easily computed via a Cholesky decomposition^{10}. The value of N_{s} is usually < 10, making the inversion of [H^{peak}] tractable as a onetime operation.

5.
Some further approximations can be made about the structure of [H^{peak}] to simplify its inversion, and it is important to understand the numerical implications of these tradeoffs. One is a blockdiagonal approximation of [H^{peak}] (i.e. using only those blocks of the Hessian in Eq. (21) for which s = p; topleft and bottomright quadrants). This approximation treats each spatial scale separately and assumes that the scale basis functions are orthogonal. For each scale, crossterms between Taylor functions are preserved, and a multifrequency principal solution can be obtained separately for each spatial scale. Note that a set of tapered truncated paraboloids is never orthogonal, but this separatescale approximation works because of the iterative χ^{2}minimization process. This approximation makes the Hessian inversion easier, but to preserve accuracy, the update step of the iterative deconvolution still needs to evaluate the full LHS of the normal equations while subtracting out a flux component.
2.7. MSMFS algorithm
This section describes an iterative process that solves the normal equations (Eq. (9)) and produces a set of N_{t} Taylorseries coefficient images at N_{s} different spatial scales. Appendix B lists the algorithmsteps in pseudocode format, reflecting the implementation of MSMFS in the CASA^{11} software package.
Precompute Hessian:
each block of the Hessian is a convolution operator, consisting of a shifted version of the same convolution kernel in each row. Therefore, it suffices to compute and store one kernel per Hessian block. Convolution kernels for all distinct blocks in the N_{s}N_{t} × N_{s}N_{t} Hessian are evaluated via Eq. (15). All kernels are normalized by the sumofweights such that the peak of is unity, and the relative weights between Hessian blocks is preserved. A blockdiagonal approximation of [H^{peak}] is done, and a set of N_{s} matrices each of shape N_{t} × N_{t} and denoted as are constructed. Their inverses are computed and stored in .
Initialization:
all N_{s}N_{t} model images are initialized to zero (or an a priori model to start from).
Major and minor cycles:
iterative image reconstruction in radio interferometry is usually split into major and minor cycles (see Appendix A.3). The major cycles compute the RHS of the normal equations, and the minor cycle inverts the Hessian and gets an estimate of the model. This model is used in the next major cycle to compute new RHS vectors from the residuals, and the process repeats itself until convergence is achieved. Steps 1 and 5 (of the list of steps given below) form one major cycle, and repetitions of Steps through 4 form the minor cycle.

1.
Compute residual images: the RHS vectors (residual or dirtyimages) 1 } of the normal equations are computed viaEq. (20) by first computing the multifrequency dirtyimages and then convolving them by the scale basis functions.

Find a flux component: the principal solution is computed for all pixels, one scale at a time. (23)Here, is a list of N_{t} Taylorcoefficients for the pixellocation pix and scale s. is the sth block (of size N_{t} × N_{t}) in the list of diagonalblocks of [H^{peak}], and is the N_{t} × 1 vector constructed from 1 } for one pixel pix. This step is performed on all pixels, separately for all scales s, resulting in N_{s} sets of N_{t} Taylorcoefficient images. The most appropriate set of Taylor coefficients must now be chosen. The search is performed across pixels and scales. Many heuristics can be used here. For example, in iteration i, choose the N_{t} element solutionset with the dominant q = 0 component across all scales and pixel locations. Or, pick the set of components that makes the largest impact on the value of χ^{2}. Or, choose the location from the peak in the t = 0 residual image, and compute the principal solution only for that pixel. The result of this step is a set of N_{t} model images, each containing one δfunction that marks the location of the center of a flux component of shape (p represents the scale of the chosen component, out of all possible values of s). The amplitudes of these N_{t}δfunctions are the Taylor coefficients that model the spectrum of the integrated flux of this component. Let these N_{t} model images from iteration i be denoted as .

3.
Update model images: a set of N_{t} multiscale model images are accumulated. (24)where g is a loopgain that takes on values between 0 and 1 and controls the step size for each iteration in the χ^{2}minimization process.

4.
Update RHS: the RHS residual images^{12} are updated by evaluating and subtracting out the entire LHS of the normal equations. However, since the chosen flux component corresponds to just one scale, this becomes a summation over only Taylor terms (and not scale; there is only one scale p in the current model). (25)Repeat from Step 2 until the minorcycle flux limit is reached.

Predict: model visibilities are computed from the set of Taylorcoefficient images via Eq. (6). Residual visibilities are computed as . Repeat from Step 1 until a global convergence criterion is satisfied. (After the first iteration, is used as the new in Step 1.)
Restoration:
after convergence, the model Taylorcoefficient images can be interpreted in different ways.

1.
The most obvious data products are the Taylorcoefficientimages themselves, which are directly smoothed by the restoringbeam. Residual images are added back in after computing theprincipal solution from the residuals obtained in the last instanceof Step 1, to ensure that any undeconvolved flux has the correctflux values^{13}.

2.
For the study of broadband radio emission, the spectral coefficients can be interpreted in terms of a power law in frequency with varying index (as described in Sect. 2.2). The data products are images of the referencefrequency flux , the spectralindex and the spectral curvature . Spectral index and curvature images are calculated only in regions where the values in are above a chosen threshold.

3.
An image cube can be constructed by evaluating the spectral polynomial via Eq. (2) for each frequency. This data product is useful for sources whose emission is not well modeled by a power law, but is a smooth polynomial in frequency. Bandlimited signals that taper off smoothly in frequency are one example.

4.
When multiple sources along a given lineofsight have different spectra, the Taylorcoefficients will represent the combined spectrum. To compute spectral index and curvature maps for foreground sources, a polynomial backgroundsubtraction must be done on the Taylorcoefficient images before Eqs. (26) to (28) are evaluated.
Primarybeam correction:
for widefield imaging, the spatial and spectral structure of the primary beam (arrayelement response function) contributes to the measured signal. If this instrumental effect is not accounted for, the output Taylorcoefficient images approximately represent the product of the sky and the primarybeam. (29) P_{υ0} is the primary beam at the reference frequency, and P_{α} and Pβ are spectral index and curvature due to the frequency dependence of the primary beam.
A correction for the average primarybeam and its frequency dependence can be done as a postdeconvolution step. The primarybeams are first evaluated or measured as a function of frequency, and the frequencydependence per pixel modeled by a powerlaw or a polynomial (perferably the same spectral polynomial used for the image reconstruction). Primarybeam correction can then be done as follows. Note that if a polynomial is fit for the frequencydependence of the primary beam, and P_{υ0},Pα,Pβ computed from it, the above operation is numerically identical to doing a polynomial division in terms of two sets of coefficients (for N_{t} ≤ 3). A bruteforce polynomialdivision using more series coefficients will yield a more accurate solution. Note however, that such a correction will not be accurate if there are timedependent variations in the primarybeam, and will require integration with the AWProjection algorithm discussed in Bhatnagar et al. (2008).
3. Relation to MFCLEAN and MSCLEAN
The MSMFS algorithm is a combination of the general ideas used in MSCLEAN and MFCLEAN, but there are some subtle differences. The next two sections briefly discuss these differences and their numerical implications.
3.1. Relation to MFCLEAN
The MFCLEAN algorithm (Sault & Wieringa 1994) models the sky as a collection of point sources with a Taylor polynomial spectrum. A pointsource version of MSMFS can be derived by setting N_{s} = 1 and using function in the derivations in Sects. 2.4 and 2.5. The normal equations can be written in block matrix form (for example, for N_{t} = 3). (33)Each block [H_{t,q}] is a convolution operator with as its kernel. On the other hand, the MFCLEAN algorithm described in Sault & Wieringa (1994) follows a matchedfiltering approach using functions called spectralPSFs, which are equivalent to the convolution kernels from the first row of Hessian blocks (q = 0) in Eq. (34). In MFCLEAN, the Hessian elements and RHS vectors are calculated by convolving spectralPSFs with themselves and the residual images. Formally, this matched filtering approach is exactly equal to the calculations shown in Eqs. (34) and (35) only under the conditions that there is no overlap on the spatialfrequency plane between measurements from different observing frequencies, and all measurements are weighted equally across the spatialfrequency plane (uniform weighting). In practice, MFCLEAN incurs errors for arrays with dense spatialfrequency coverage where tracks from different baselines and frequencychannels intersect. When applied to simulated EVLA data, numerical instabilities limited the fidelity of the final image, especially with extended emission, and this instability was eliminated by changing the computations from Eqs. (36) and (37) to Eqs. (34) and (35). Also, the MFCLEAN formulation uses a twoterm Taylorpolynomial, which can be shown to result in a dynamicrange limit of 10^{4} for a bandwidth ratio of 2:1 and source spectral index of –1.0.
3.2. Relation to MSCLEAN
The MSCLEAN algorithm (Cornwell 2008; Greisen et al. 2009) models the sky as a combination of multiscale flux components with no spectral structure.
A narrowband (or flatspectrum) version of MSMFS can be derived by setting N_{t} = 1 in the derivations in Sects. 2.4 and 2.5. The normal equations can be written in block matrix form (for example, for N_{s} = 2). The peaks of the convolution kernels from the diagonal blocks of the Hessian are a measure of the sensitivity of the instrument to a particular spatial scale. (38)In MSMFS, a diagonal approximation of this Hessian is used to compute the principal solution. This is equivalent to normalizing the residual images (RHS vectors) by the sum of weights for each spatial scale, before searching for peaks.
In both existing forms of MSCLEAN, this normalization is replaced by a scale bias, an empirical term that deemphasises large spatial scales. The scale bias b_{s} = 1−0.6 s/s_{max} used by Cornwell (2008) (where s_{max} is the width of the largest scale basis function) is a linear approximation of how the inverse of the area under each scale function changes with scale size^{14}. The algorithm described by Greisen et al. (2009) uses b_{s} ≈ 1.0/s^{2x} where x ∈ {0.2,0.7} , to approximate a normalization by the area under a Gaussian, for the case when images are smoothed by applying a uvtaper that tends to unity for the zero spatialfrequency. Both these normalization schemes are approximations of using a diagonal approximation of [H^{peak}].
Once we have this understanding, we can see that the full Hessian [H^{peak}] (and not just a diagonal approximation) can be inverted to get the normalization exactly right, giving an accurate estimate of the total flux at each scale. This becomes useful for sources that contain overlapping flux components of different spatial scales. However, this solution gives correct values only at the locations of the centers of the fluxcomponents, and introduces large errors in the PSF sidelobes. Therefore, for reasons of stability, a diagonal approximation is still a more appropriate choice (demonstrated on simulated EVLA data).
Another difference lies in the minorcycle updates. The update steps in MSMFS and the Cornwell MSCLEAN evaluate the full LHS of the normal equations to update the smoothed residual images and subtract out flux components within the image domain. This allows each minor cycle iteration to search for flux components across all spatial scales. The update step in the Greisen MSCLEAN ignores the crossterms, and performs a full set of minor cycle iterations on one scale at a time. A choice between all these methods will depend on tradeoffs between the accuracy within each minor cycle iteration, the computational cost per step, and optimized global convergence patterns to control the total number of iterations.
4. Hybrids of narrowband and continuum techniques
The preceding sections discussed multifrequency solutions that used data from all measured frequencies together, to take advantage of the combined spatialfrequency coverage. However, there are some situations where singlechannel methods used in combination with multifrequencysynthesis (and no builtin spectral model) will be able to deliver scientifically useful wideband reconstructions at the continuum sensitivity.
The basic idea of a hybrid wideband method is to combine the advantages of singlechannel imaging (simplicity and nondependence on any spectral model) with those of continuum imaging (deconvolution with full continuum sensitivity).

1.
Deconvolve each channel separately upto the singlechannelsensitivity limit σ_{chan}. Only sources brighter than σ_{chan} will be detected anddeconvolved.

2.
Remove the contribution of bright (spectrally varying) sources by subtracting out visibilities predicted from the model image cube. At this stage, the peak residual brightness is at the level of the singlechannel noise limit σ_{chan}.

3.
Perform MFS imaging (flatspectrum assumption) on the continuum residuals to extract flux that lies between σ_{chan} and σ_{cont}. According to Conway et al. (1990) (and as discussed later in Sect. 6), errors due to this flatspectrum assumption become visible only above a dynamic range of ~1000 (for α = −0.7 and a 2:1 bandwidth ratio). Therefore, as long as the sensitivity improvement between a singlechannel and the full band is less than ~1000, this second step will incur no errors even if the remaining flux has spectral structure. This requirement translates to N_{chan} < 10^{6}, which is usually satisfied^{15}.

4.
Add model images from both steps, and restore the results. For unresolved sources, it may be appropriate to use a cleanbeam fitted for the highest frequency, but in general, to not bias spectral information, all channels should be restored using a clean beam fitted to the PSF at the lowest frequency in the range.
The advantages of this approach are its simplicity, and that it can handle wideband reconstructions with bandlimited signals and spectrallines. The disadvantages are that the angular resolution of the images and spectral information will be restricted to that given by the lowest frequency (a factor of two worse than what is possible for the 2:1 bandwidth available with the EVLA Lband). Also, the singlefrequency spatialfrequency coverage may not be sufficient to unambiguously reconstruct all the spatial structure of interest at all frequencies, which in turn will affect spectra measured from these singlechannel images. In some cases, additional constraints can be used. Bong et al. (2006) describe spatiospectral MEM, an entropy based method in which singlechannel imaging is done along with a smoothness constraint applied across frequency.
In general, singlechannel methods can be used for wideband imaging mainly to construct an image of the continuum flux. Only if there is sufficient singlefrequency uvcoverage to reconstruct an accurate model of the source structure (for example, fields of isolated point sources), may reliable spectral information also be derived from such an approach.
5. MSMFS imaging results
This section contains three imaging examples that demonstrate the MSMFS algorithm’s basic capabilities. Section 6 discusses various sources of error and how they manifest themselves, and Sect. 7 demonstrates the limits to which the algorithm can be reasonably pushed.
5.1. EVLA simulation
Data:
wideband EVLA observations were simulated for a sky brightness distribution consisting of one point source with spectral index of − 2.0 and two overlapping Gaussians with spectral indices of − 1.0 and + 1.0. The spectral index across the resulting extended source varies smoothly between − 1.0 and + 1.0, with a spectral turnover in the region of overlap of the two Gaussians, corresponding to a spectral curvature of approximately +0.5. Data were simulated for the EVLA in Cconfiguration between 1–2 GHz, for an 8h observation with measurements 5 min apart. The goal of this test was to assess the ability of the MSMFS algorithm to reconstruct both spatial and spectral information accurately enough for astrophysical use.
Fig. 1 MSMFS imaging results using simulated EVLA data: these images compare truth images (left column) with the results of two wideband imaging runs; multiscale (middle column) and pointsource (right column). The top three rows represent the first three Taylorcoefficient images, and the fourth and fifth rows show spectral index and spectral curvature respectively. 
Imaging results:
two MSMFS imaging runs were done and the results compared. Figure 1 illustrates the algorithm’s performance by comparing several truth images describing the true sky brightness (left column) and reconstructions from a multiscale (middlecolumn) and a pointsource (rightcolumn) version of MSMFS. The multiscale version used a flux model in which N_{t} = 3 and N_{s} = 4 with scale sizes defined by widths of 0,6,18,24 pixels, and the pointsource version used N_{t} = 3 and N_{s} = 1 with one scale function given by the δfunction (to emulate the MFCLEAN algorithm). From top to bottom, the rows correspond to the intensity image at the reference frequency I_{0} = I_{υ0}, the firstorder Taylorcoefficient I_{1} = I_{α}I_{υ0}, the secondorder Taylorcoefficient I_{2} = (I _{α}(Iα − 1)/2 + I_{β})I_{υ0}, the spectral index I_{α} = I_{1}/I_{0} , and the spectral curvature I_{β} = (I_{2}/I_{0}) − I_{α}(I_{α} − 1)/2.

1.
With a multiscale multifrequency flux model (MSMFS) thespectral index across the extended source was reconstructed to anaccuracy of δα < 0.02 with the errors rising at the edges of the source wherethe signaltonoise ratio decreases. The spectral curvature acrossthe extended source was estimated to an accuracy of δβ < 0.05 in the centralregion with the maximum error of δβ ≈ 0.2 in the regions where thecurvature signal goes to zero and the source surface brightness isalso minimum (the outer edges of the source).

2.
With a multifrequency pointsource model (MFCLEAN) the accuracy of the spectral index and curvature maps was limited to δα ≈ 0.2,δβ ≈ 0.6. This is because the use of a point source model will break any extended emission into components the size of the resolution element and this leads to deconvolution errors well above the offsource noise level. Error propagation during the computation of spectral index and curvature as ratios of these noisy Taylorcoefficient images leads to high error levels in the result (see Sect. 6.3). This clearly shows the importance of using a multiscale flux model when there is extended emission.

3.
The Gaussian on the left has α = −1.0 and β = 0.0, and N_{t} = 3 is not sufficient to model the powerlaw accurately, leading to a value of β ≈ + 0.1 in the truth image as well as in the reconstruction. However, the Gaussian on the right has α = + 1.0 and β = 0.0 (a straight line), and N_{t} = 3 is more than sufficient to model it, leading to an accurate value of β ≈ 0 in the truth image and MSMFS reconstruction. The pointsource has α = −2.0, and the error on β is proportionally higher. The use of N_{t} > 3 reduces these errors.
5.2. Multifrequency VLA observations of CygnusA
Objective:
multifrequency VLA observations of the bright radio galaxy Cygnus A were used to test the MSMFS algorithm on real data and to test standard calibration methods on wideband data. Cygnus A is an extremely bright (1000 Jy) radio galaxy with a pair of bright compact hotspots (α = −0.5) located about 1 arcmin away from each other on either side of a very compact flatspectrum core, and extended radio lobes associated with the hotspots with broadband synchrotron emission at multiple spatial scales (α = −0.6 ~ − 1.0) (Carilli & Barthel 1996). The best existing images of CygnusA and its spectral structure have been from large amounts of multiconfiguration narrowband VLA data (Carilli et al. 1991) designed to measure the spatial structure as completely as possible at two widely separated frequencies (1.4 and 4.8 GHz).
The goal of this test was to use multifrequency snapshot observations of Cygnus A to evaluate how well the MSMFS algorithm is able to reconstruct both spatial and spectral structure from measurements in which the singlefrequency uvcoverage is insufficient to accurately reconstruct all the spatial structure at that frequency.
Data were taken in April 2009 when the VLA was transitioning to the EVLA. 15 out of 27 antennas had new wideband EVLA feeds, but the correlator was that of the VLA (narrowband). Wideband data were taken as a series of narrowband snapshots spread across 8 h and 9 distinct frequencies across the LBand (30 min per frequency tuning). Flux calibration at each frequency was done via observations of 3C 286. Phaseonly calibration was done using an existing narrowband image of Cygnus A at 1.4 GHz (Carilli et al. 1991) as a model, and further selfcalibration was done with the wideband flux model derived from the MSMFS algorithm. The final dataset used for imaging consisted of 9 spectral windows each of a width of about 4 MHz and separated by about 100 MHz.
Fig. 2 Cygnus A – total intensity and spectralindex: this figure shows the MSMFS total intensity map (top), and spectralindex maps obtained via three methods – MSMFS (second row), a hybrid singleband method (third row), and from highresolution fullsynthesis narrowband images at 1.4 and 4.8 GHz (bottom). The MSMFS spectral index map has the angularresolution of the total intensity map, and agrees with the values in the highresolution comparison map (α = −0.5 at the hotspots increasing to α ≈ − 1.0 in the halo). However, the hybrid method resulted in a map with a wider range of values (positive and negative) and is at a much lower angular resolution. 
These data were imaged using two methods, the MSMFS algorithm with N_{t} = 3 and N_{s} = 10, and a hybrid of singlechannel imaging followed by MFS on the residuals. Note that these observations do not have dense singlefrequency uvcoverage, and the purpose of applying the hybrid method was to emphasize the errors that can occur if this method is used inappropriately. Due to the small angular size of CygnusA, the effect of the Lband primarybeam was ignored in both runs.
Results:
Figure 2 shows the reconstructed totalintensity images (top), the spectralindex map obtained via MSMFS (second row) and the spectralindex map constructed from singlesubband maps (third row). For comparison, the image at the bottom is a spectralindex map constructed from existing narrowband images at 1.4 and 4.8 GHz, each constructed from a combination of VLA A, B, C and D configuration data (Carilli et al. 1991).

1.
The total intensity image (top row) has a peak brightness of77 mJy/beam at the hotspots and a peak brightnessof about 400 mJy/beam for the fainter extendedparts of the halo. Both the methods (MSMFS and hybrid) gavevery similar totalintensity images and residual images. Theonsource and offsource residuals for the MSMFS algorithmare 30 mJy and 25 mJy and with thehybrid algorithm are 50 mJy and30 mJy respectively.

2.
The spatial structure seen in the MSMFS spectral index image (second row) is very similar to that seen in the twopoint (1.4–4.8 GHz) spectral index image (bottom). This shows that despite having a comparatively small amount of data (20 VLA Bconfiguration snapshots at 9 frequencies) the use of an algorithm that models the sky brightness distribution appropriately is able to extract the same astrophysical information as traditional methods applied to large amounts of data (full synthesis runs in multiple VLA configurations at two frequencies). The estimated errors on the spectral index map are < 0.1 for the brighter regions of the source (near the hotspots) and ≥ 0.2 for the fainter parts of the lobes and the core. In contrast, the Hybrid spectral index map (third row) clearly shows errors arising due to nonunique solutions at each separate frequency (due to insufficient narrowband spatialfrequency coverage) as well as smoothing to the angular resolution at the lowest frequency.

3.
The MSMFS spectral curvature map (not shown here) contains values corresponding to △ α ≈ 0.4 for the brightest regions on the source. Such a change in α appears high, and we did not have sufficient singlefrequency data at other bands to verify these values (the next section contains an example where we could verify the curvature maps with other data). Also, the values of curvature changed between imagingruns with N_{t} = 3 and N_{t} = 4, suggesting overfitting errors due to the low signaltonoise ratio of the curvature measurement (which could arise from inaccurate bandpass calibration).
5.3. Multifrequency observations of M 87
A similar observation of M 87 was done with the goal of measuring the 1–2 GHz spectral index in different parts of the inner corejetlobe system and outer halo filaments.
M 87 is a bright (200 Jy) radio galaxy located at the center of the Virgo cluster. The spatial distribution of broadband synchrotron emission from this source consists of a bright central region (spanning a few arcmin) containing a flatspectrum core, a jet (with known spectral index of −0.55) and two radio lobes with steeper spectra (−0.5 > α > −0.8) (Rottmann et al. 1996; Owen et al. 2000). This central region is surrounded by a large diffuse radio halo (7 to 14 arcmin) with many bright narrow filaments (≈10′′ × 3′).
Multifrequency VLA data were taken similar to the observations of CygnusA, with a series of 10 snapshots at 16 different frequencies within the sensitivity range of the EVLA Lband receivers. These data were imaged using MSMFS with N_{t} = 3 and N_{2} = 12. The top row of images in Fig. 3 shows the intensity, spectral index and spectral curvature maps of the bright central region at an angular resolution of 3 arcsec (C+Bconfiguration).
Fig. 3 M 87 core/jet/lobe – intensity, spectral index and curvature: these images show 3arcsec resolution maps of the central bright region of M 87 (core+jet and inner lobes). The quantities displayed are the intensity at 1.5 GHz (top left), the spectral index (top middle) and the spectral curvature (top right). The spectral index is near zero at the core, varies between − 0.36 and − 0.6 along the jet and out into the lobes. The spectral curvature is on average 0.5 which translates to △ α = 0.2 across Lband. The plot at the bottom compares the resulting average spectrum (solid line) with that formed by imaging each spectralwindow separately (dots). The dashed and dotted lines correspond to fixed values of spectral index (–0.43, –0.53, –0.63). 

1.
The peak brightness at the center of the final restored intensity image was 15 Jy with an offsource rms of1.8 mJy and an onsource rms of between 3 and10 mJy. The spectral index map^{16} of the bright central region(at 3 arcsec resolution) shows a near flatspectrumcore with α_{LL} = −0.25, a jet with α_{LL} = −0.52 and lobes with −0.6 > α_{LL} > −0.7. This bright central region hadsufficient (>100) signaltonoise to be able to detect spectral curvature, and no obvious deconvolution errors. However the error barson the spectral curvature are at the same level as the measurementitself, and a reliable estimate can only be obtained as an averageover this entire bright region. The average curvature is measuredto be β_{LL} = −0.5 which corresponds to a change in α across Lband by .

2.
To verify consistency of this spectralcurvature value with the data, each of the 16 spectralwindows was imaged separately, and restored with the same cleanbeam. The plot at the bottom of Fig. 3 shows this integrated flux spectrum (log (I) vs. log (ν/ν_{0})) as round dots, along with the average spectrum calculated by MSMFS (curved line), and straight dashed and dotted lines that correspond to constant spectral indices of − 0.43, − 0.53 and − 0.63. These plots show that a change in α of about 0.2 is consistent with the data. Note that the scatter seen in the singlespectralwindow data points is at the 1% level of the source flux. This illustrates the accuracy at which bandpass calibration must be done in order to measure a physically plausible spectralcurvature signal across a 2:1 bandwidth.

3.
These numbers were further compared with twopoint spectral indices computed between 327 MHz (Pband), 1.4 GHz (Lband), and 4.8 GHz (Cband) from existing images (Owen et al. 2000; Owen F., priv. comm.). Across the bright central region, twopoint spectral indices are −0.36 > α_{PL} > −0.45 and −0.5 > α_{LC} > −0.7. The measured values from our experiment (−0.4 > α_{LL} > −0.7 and △ α ≈ 0.2 across Lband) are consistent with these independent calculations.
6. Sources of error
There are various sources of error that can affect the imaging process, and leave artifacts both onsource and offsource. As with any imagereconstruction algorithm, signs of these errors must be lookedfor in the output images before astrophysical interpretation.
6.1. Onsource polynomialfit errors
The errors on the polynomial coefficients and quantities derived from them will depend on the number of measurements of the spectrum, their distribution across a frequency range, the signaltonoise ratio of the pattern being fitted, and the order of the polynomial used in the fit. These errors will affect regions in the image both onsource and offsource, and the resulting error patterns and their magnitudes will depend on the available uvcoverage, and the choice of reference frequency. Although the physical parameters I_{υ0},I_{α} and I_{β} can be obtained from the first three coefficients of a Taylor expansion of a powerlaw with varying index (Eqs. (26) to (28)), a higher order polynomial may be required during the fitting process to improve the accuracy of the first three coefficients^{17}.
Fig. 4 Peak residuals and Errors for MFS with different values of N_{t}: These plots show the measured peak residuals (top left) and the errors on I_{υ0} (top right), I_{α} (bottom left), and I_{β} (bottom right) when a pointsource of flux 1.0 Jy and α = −1.0 was imaged using Taylor polynomials of different orders (N_{t} = 1−7) and a linear spectral basis. The xaxis of all these plots show the value of N_{t} used for the simulation and plots for α and β begin from N_{t} = 2 and N_{t} = 3 respectively. An example of how to read these plots: for a 2:1 bandwidth ratio, a source with spectral index = –1.0 and N_{t} = 4, the achievable dynamic range (measured as the ratio of the peak flux to the peak residual near the source) is about 10^{5}, the error on the peak flux at the reference frequency is 1 part in 10^{3}, and the absolute errors on α anre β are 10^{2} and 10^{1} respectively. 
Figure 4 summarizes the errors obtained when the order of the polynomial chosen for imaging is not sufficient to model the powerlaw spectrum of the source. Data for 8 h synthesis runa with EVLA uvcoverages were simulated for 5 different frequency ranges around 2.0 GHz. The sky brightness distribution used for the simulation was one point source whose flux is 1.0 Jy and spectral index is –1.0 with no spectral curvature. The bandwidth ratios^{18} for these 5 datasets were 100%(3:1), 66%(2:1), 50%(1.67:1), 25%(1.28:1), 10%(1.1:1).
MSMFS was repeated on all these datasets with N_{t} = 1 to N_{t} = 7, and one scale N_{s} = 1, a δfunction. All these datasets were imaged using a maximum of 10 iterations, a loopgain of 1.0, natural weighting and a flux threshold of 1.0 µJy. No noise was added to these simulations (in order to isolate and measure numerical errors due to the spectral fits). The top left panel in Fig. 4 shows the peak residuals, measured over the entire 0th order residual image. The other three panels show onsource errors for I_{υ0} (top right), I_{α} (bottom left) and I_{α} (bottom right). Errors on I_{υ0},I_{α},I_{β} were computed at the location of the point source by taking differences with the ideal values of I_{υ0} = 1.0,α = −1.0,β = 0.0.
One noticeable trend from these plots is that with sufficient signaltonoise in the measurements, all errors decrease exponentially (linearly in logspace) as a function of increasing order of the polynomial, and as a function of decreasing total bandwidth. As expected, for very narrow bandwidths, the use of highorder polynomials increases the error. Also, when the order of the polynomial used is too low, the peak residuals are much smaller than the onsource error incurred on I_{υ0}, I_{α} and I_{β}. These trends are based on one simple example, and represent the bestcase scenario in which all sources can be described as point sources. For extended emission, there can be additional errors due to deconvolution artifacts. In the case of very noisy spectra, or at a late stage of image reconstruction when the signaltonoise ratio in the residual images is low, errors can arise from attempting to use too many terms in the polynomial fit.
6.2. Offsource errors and dynamicrange limits
Consider the errors on the continuum image when spectral structure is ignored during MFS imaging (N_{t} = 0; flatspectrum assumption). Spectral structure will masquerade as spurious spatial structure, leading to error patterns that resemble the instrument’s response to the firstorder term in the Taylorseries expansion. A rough rule of thumb for EVLA uvcoverages is that for a point source with spectral index α = −1.0 measured between 1 and 2 GHz, the peak error obtained if the spectrum is ignored is at a dynamic range of < 10^{3}. In other words, if the dynamicrange allowed by the data (peak brightness/thermal noise) is 800 (for example), there will be no visible artifacts if the spectral structure upto α = −1.0 is ignored. These numbers are consistent with error estimates derived in Conway et al. (1990) that predict errors at the level of Iα/X where X = O(500) is a factor that depends on the uvcoverage of the instrument and the choice of reference frequency. Therefore, if the only goal is to obtain a continuum image over a narrow fieldofview, it may be possible to achieve the maximumpossible dynamic range by dividing out an average spectral index (one single number over the entire sky) from the visibilities before imaging them via MFS with a flatspectrum assumption (Conway et al. 1990).
Fig. 5 Stokes I images of the 3C 286 field, using MSMFS with N_{t} = 1 (top left), N_{t} = 2 (top right), N_{t} = 3 (bottom left), N_{t} = 4 (bottom left). The peak flux of 3C 286 is 14 Jy/beam. The axes labels are in units of arcminutes from the pointing center and all images are shown with the same greyscales. The residual rms near 3C 286 for these four images are 9 mJy, 1 mJy, 200 µJy and 140 µJy, and the rms measured 1 degree away from 3C 286 are 1 mJy, 200 µJy , 85 µJy and 80 µJy. The thermalnoise limit for this dataset was 70 µJy. 
At high dynamicranges, offsource errors trace the spectral response functions for higherorder Taylor terms. As long as they are visible above the noise, they can be eliminated with a higherorder polynomial fit. Figure 5 shows a set of four images of the 3C 286 field made from wideband EVLA data taken in October 2010. These data consist of four EVLA snapshots spread across 90 min, and cover a frequency range of 1.02 GHz to 2.1 GHz (800 MHz usable bandwidth after accounting for radiofrequencyinterference). 3C 286 is 14.4 Jy/beam at 1.5 GHz, with a spectral index of –0.47. The four panels correspond to MSMFS imaging runs with N_{t} = 1 (top left), N_{t} = 2 (top right), N_{t} = 3 (bottom left), N_{t} = 4 (bottom left), all with N_{s} = 1 (a δfunction). All images are displayed with the same greyscale levels, and show the levels at which the error patterns appear. The dynamic range (calculated as the peak brightness to the peak residual measured near the brightest peak) ranges from 1.6 × 10^{3} when spectral structure is ignored (N_{t} = 1), to 1.1 × 10^{5} with N_{t} = 4.
6.3. Propagation of multiscale errors
Deconvolution errors contribute to the onsource error in the Taylor coefficient images, and these errors propagate to the spectral index map which is computed as a ratio of two coefficient images (Iα = I_{1}/I_{0}). Errors that result when a pointsource flux model is used for extended emission can increase the error bars on the spectral index and curvature by an order of magnitude (as demonstrated by the example shown in Fig. 1). These errors are approximately given by (39)where △ I_{0} and △ I_{1} are the absolute errors measured in the first two Taylorcoefficient images. For the example shown in Fig. 1, the errors on the coefficient images were measured as the rms of the deviation from the truth images within the region of high signaltonoise. These errors result in a prediction of △ Iα ≈ 0.05 for the multiscale version, and △ Iα ≈ 0.7 for the pointsource version, which approximately matches the rms of the deviation of the output α image from the truth α image.
Fig. 6 Moderately resolved sources – singlechannel images: these figures show the 6 singlechannel images generated from simulated EVLA data between 1 and 4 GHz in the EVLA Dconfiguration. The sky brightness consists of two point sources, each of flux 1.0 Jy at a reference frequency of 2.5 GHz and separated by 18 arcsec. Their spectral indices are +1.0 (top source) and −1.0 (bottom source). The angular resolution at 1 GHz is 60 arcsec, and at 4 GHz is 15 arcsec and the circles in the lower left corner of each image shows the resolution element decreasing in size as frequency increases. 
6.4. Frequency dependence of the primary beam
When imaging wide fieldsofview, sources away from the pointing center will be attenuated by the value of the primary beam at each frequency. The EVLA primarybeams across a 2:1 bandwidth contribute an extra spectralindex of –1.4 at the halfpower point (measured from simulated beams, as well as a Gaussian approximation of the main lobe of the primary beam (Sault & Wieringa 1994)). Therefore, even if a source has a flat spectrum, this artificial spectral index can result in imaging artifacts at the levels described in Sect. 6 (i.e. a dynamic range limit of ~10^{3} for a flat spectrum source at the HPBW, due only to the spectral variation of the primary beam). This effect can be corrected as a postdeconvolution step, or by using widefield imaging algorithms along with MSMFS. So far, accurate postdeconvolution corrections have been demonstrated out to the 10% level of the highestfrequency primarybeam.
Fig. 7 Moderately resolved sources – MSMFS Images: these images show the intensity at 2.5 GHz (left), the spectral index showing a gradient between − 1 and +1 (middle) and the spectral curvature which peaks between the two sources and falls off on either side (right). The angular resolution of these images is 15 arcsec, corresponding to the highest frequency in the data (and Fig. 6). 
7. Imaging performance (nonstandard conditions)
This section describes a set of simulations that test the limits of MSMFS for different types of source structure. Three cases are studied; sources unresolved at the lowest sampled frequency and resolved at the upper end, sources whose visibility functions lie mostly within the central unsampled region of the uvplane near the origin, and bandlimited signals.
7.1. Moderately resolved sources
Consider a source with broadband continuum emission and spatial structure that is unresolved at the lowfrequency end of the band and resolved at the highfrequency end. Traditionally, spectral information would be available only at the angular resolution offered by the lowest observed frequency. With MSMFS, the intensity distribution as well as the spectral index of such emission can be imaged at the angular resolution allowed by the highest frequency in the band. This is because compact emission has a signature all across the spatialfrequency plane and its spectrum is well sampled by the measurements. The highest frequencies constrain the spatial structure, and the flux model (in which a spectrum is associated with each flux component) naturally fits a spectrum at the angular resolution at which the spatial structure is modeled. Such a reconstruction is modeldependent and will be accurate only if the spectrum at the smallest measured scales can really be represented as a polynomial.
EVLA simulation:
wideband EVLA data were simulated for the Dconfiguration across a frequency range of 3.0 GHz with 6 frequency channels between 1 and 4 GHz (600 MHz apart). This wide frequency range was chosen to emphasize the difference in angular resolution at the two ends of the band (60 arcsec at 1 GHz, and 15 arcsec at 4.0 GHz). The sky brightness chosen for this test consists of a pair of point sources separated by a distance of 18 arcsec (about one resolution element at the highest frequency), making this a moderately resolved source. These point sources were given different spectral indices ( + 1.0 for the top source and − 1.0 for the bottom one). Figure 6 shows the six singlechannel images of this source. At the low frequency end, the source is almost indistinguishable from a single flux component centered at the location of the bottom source whose flux peaks at the lowfrequency end. The source structure becomes apparent only in the higher frequencies where the top source (with a positive spectral index) is brighter.
MSMFS Imaging results:
these data were imaged using the MSMFS algorithm with N_{t} = 3 and N_{s} = 1 with only one spatial scale (a δfunction), and Fig. 7 shows the results. The intensity distribution, spectral index and curvature of this source were recovered at the angular resolution allowed by the 4.0 GHz samples (15 arcsec). This demonstrates that an appropriate flux model can constrain the solution accurately at the angular resolution given by the highest sampled frequency.
7.2. Emission at large spatial scales
Consider a very large (extended) flatspectrum source whose visibility function falls mainly within the central hole in the uvcoverage. The size of this central hole increases with observing frequency. The minimum spatialfrequency sampled per channel will measure a decreasing peak flux level as frequency increases. Since the reconstruction below the minimum spatialfrequency involves an extrapolation of the measurements and is unconstrained by the data, these decreasing peak visibility levels can be mistakenly interpreted as a source whose amplitude itself is decreasing with frequency (a lessextended source with a steep spectrum). Usually, a physicallyrealistic flux model suffices as a constraint, but with the MSMFS model (polynomial spectra associated with 2D Gaussianlike components), a large flatspectrum source and a smaller steepspectrum source are both allowed and considered equally probable. This creates an ambiguity between the reconstructed scale and spectrum that cannot always be resolved directly from the data, and requires additional information (a lowfrequency narrowband image at the reference frequency to constrain the spatial structure, or lowresolution spectral information).
Fig. 8 Very large spatial scales – intensity, spectral index, residuals: these images show the intensity distribution (left), spectral index (middle) and the residuals (right) for two imaging runs. The top row shows results without shortspacing information, and shows a false negative α ≈ −0.8 for the extended emission. The bottom row shows results with shortspacing constraints, in which the extended source has been reconstructed correctly. 
EVLA simulation:
wideband EVLA data were simulated for the Dconfiguration across a frequency range of 3.0 GHz centred at 2.5 GHz. (6 frequency channels located 600 MHz apart between 1.0 and 4.0 GHz). The size of the central hole in the uvcoverage was increased by flagging all baselines shorter than 100 m and the wide frequency range was chosen to emphasize the difference between the largest spatial scale measured at each frequency (0.3 kλ or 10.3 arcmin at 1.0 GHz, and 1.3 kλ or 2.5 arcmin at 4.0 GHz). The sky brightness chosen for this test consists of one large flatspectrum (α = 0.0) 2D Gaussian whose FWHM is 2.0 arcmin (corresponding to 1.6 kλ, at the reference frequency of 2.5 GHz), and one steep spectrum pointsource (α = −1.0) located on top of this extended source at 30 arcsec away from its peak.
Fig. 9 Very large spatial scales – visibility plots: these plots show the observed (left) and reconstructed (right) visibility functions for a simulation in which a large extended flatspectrum source is observed with an interferometer with a large central hole in its uvcoverage. The colours/shades in these plots represent 6 frequency channels spread between 1 and 4 GHz. The top row of plots shows that when no shortspacing information was used, these data can be mistakenly fit using a lessextended source with a steep spectrum. The bottom row shows that the inclusion of shortspacing information is sufficient to reconstruct the sky brightness distribution correctly. 
MSMFS imaging results:
these data were imaged using the MSMFS algorithm with N_{t} = 3 and N_{s} = 3 with scale sizes given by [0, 10, 30] pixels. Two imaging runs were performed with these parameters, without and with shortspacing information. Figure 8 shows images of the intensity (left), spectral index (middle) and residuals (right) for these two runs, and Fig. 9 shows the visibility amplitudes present in the simulated data (left column) as well as in the reconstructed model (right column) at each of the 6 frequencies. In both figures, the top and bottom rows correspond to runs without and with shortspacing information respectively and each pair (in Fig. 8) are displayed at the same intensity scale.

1.
The first imaging run used only baselines longer than100 m to simulate a large central hole in theuvcoverage. The spectrum of the pointsource was correctlyestimated as − 1.0, but the extended source acquired a falsesteepspectrum (α ≈ −0.8). The algorithm was able to reconstruct thecorrect flux and spectrum of the extended source only if themultiscale basis functions were carefully chosen to match theknown scale size (i.e. a stronger a prioriconstraint). This shows that without additional constraints, it isnot always possible to distinguish large flatspectrum sourcefrom a slightly lessextended steep spectrum source.

2.
A second imaging run was performed including additional information in the form of shortspacing constraints, added in by retaining a small number of very shortbaseline measurements at each frequency (baselines shorter than 25 m). The visibility plots and imaging results with this dataset show that the shortspacing flux estimates were sufficient to bias the solution towards the correct solution (flatspectrum extended source).

3.
The image residuals are at the same level in both runs, demonstrating that in the absence of additional shortspacing information, both flux models are equally poorly constrained by the data themselves. One way to avoid this problem altogether (but lose some information) is to flag all spatialfrequencies smaller than u_{min} at ν_{max} and not attempt to reconstruct any spatial scales larger than what ν_{max} allows.
7.3. Bandlimited signals
Consider a source of radiation where frequency traces different physical structures in the source (as opposed to a fixed structure with each point emitting broadband radiation). For such sources, emission may be detected in only part of the sampled frequency range and is for all practical purposes a bandlimited signal.
Since the MSMFS algorithm uses a polynomial to model the spectrum of the source (and is not restricted to a powerlaw spectrum) it should be able to reconstruct such structure as long as it varies smoothly. However, for a bandlimited signal, the angular resolution at which the structure can be mapped will be limited to the resolution of the highest frequency at which the signal is detected (and not the highest resolution allowed by the measurements).
EVLA Dconfiguration data were simulated for the bandlimited radiation observed with synchrotron emission from solar prominences where different frequencies probe different depths in the solar atmosphere. The structures are generally archlike with lower frequencies sampling the top of the loop and higher frequencies sampling the legs. The MSMFS algorithm was run on these simulated data, using N_{t} = 5 to fit a 4thorder polynomial to the source spectrum (to accomodate its nearly bandlimited nature) and N_{s} = 3 with scales given by [0, 10, 30] pixels. Iterations were terminated after 200 iterations.
Figure 10 shows a comparison of the true and reconstructed structure at three different frequencies (1.2 GHz (left), 1.8 GHz (middle) and 2.6 GHz (right) ). The top row shows the true structure, and the bottom row is the reconstruction. The 3D structure is mostly recovered, with the largest errors being in the central region where the signal spans the shortest bandwidth. The point source on the right edge of the loop was reconstructed at an angular resolution slightly lower than that of the highest sampled frequency and corresponds to the highest frequency at which this spot is brighter than the background emission. Figure 11 compares the true and reconstructed spectra for three positions in the source (left leg (left), arch of the loop (middle), and right leg (right)). These spectra show the accuracy to which a 4thorder polynomial with MSMFS was able to reconstruct the structure.
Spectrallines are an extreme case of a bandlimited signal, and the use of MSMFS imaging is restricted to obtaining a wideband model of the continuum flux from linefree channels, to be subtracted out of the data before spectralline strengths are studied.
Fig. 10 Bandlimited signals – multifrequency images: these images show a comparison between the true sky brightness (top row) and the brightness reconstructed using the MSMFS algorithm (bottom row) at a set of three frequencies (1.0, 1.8, and 2.6 GHz from left to right). 
Fig. 11 Bandlimited signals – spectra across the source: these plots show the true (downarrows) and reconstructed (uparrows) spectra at different locations for the example discussed in this section (shown in Fig. 10). The left column corresponds to the left end of the loop at the location of the leg and shows smooth structure stretching almost all across the band. The middle column corresponds to the middle of the source where the only structure in the lineofsight is the upper part of the loop (emission at a small fraction of the band). The right column shows spectra for the brightest point on the right end of the loop. 
8. Discussion
The introduction of broadband receivers into radio interferometry has opened up new opportunities for the study of wideband continuum emission from a vast range of astrophysical objects. With imaging algorithms that account for the frequency dependence of the incoming radiation as well as of the instrument, we can minimize imaging artifacts, achieve continuum sensitivity and reconstruct spatial and spectral structure at the angularresolution allowed by the highest observed frequency.
The MSMFS algorithm models the wideband sky brightness distribution as a collection of multiscale flux components whose amplitudes follow a Taylorpolynomial in frequency. The data products are a set of Taylorcoefficient images, which can either be interpreted in terms of continuum intensity, spectral index and curvature, or used to evaluate a spectral cube, or serve as a wideband model for selfcalibration or continuumsubtraction. For widefield imaging, multiple pointingcenters (mosaicing) and the wterm are accounted for during imageformation, and the effect of the primary beam and its frequencydependence is approximately corrected in a postdeconvolution step.
For point sources for which standard imaging has a dynamicrange limit of a few thousand, this algorithm achieves dynamic ranges >10^{5} on test observations with the EVLA, and >10^{6} on noisefree simulations. For sources with smooth continuum spectra, it is able to reconstruct spectral information at the angular resolution allowed by the combined multifrequency uvcoverage. However, if the visibility functions of very largescale emission lie mostly within the central unsampled region of the uvplane, the algorithm requires a priori knowledge about the spectral structure at those scales. For high SNR data, errorbars on the spectralindex images range from <0.02 when multiscale basis functions are chosen appropriately, up to >0.2 in the extreme case of modeling smooth extended emission with a set of δfunctions. In practice, onsource errors of △ α ≈ 0.1 have been achieved. For low SNR extended structure (<10), errors due to overfitting can dominate when a highorder polynomial is used.
There are several directions in which wideband imaging techniques need to be extended and improved. The imaging errors in MSMFS are currently dominated by the multiscale aspect of the algorithm, and methods that do adaptive fits (Bhatnagar & Cornwell 2004) and use more a priori information about largescale spectra may be more appropriate. Methods that adaptively find the optimal number of parameters to operate with would help in errorcontrol. Implementations of algorithms such as MSCLEAN and MSMFS are inefficient in memoryuse, and other approaches may be required for large image sizes. For widefield imaging, the time variation of the antenna primary beams must be taken into account during imaging, and wideband methods combined with algorithms for directiondependent corrections (Bhatnagar et al. 2008, or Smirnov 2011). For fullStokes wideband imaging, where a Taylorpolynomial in frequency is not the most appropriate basis function to model Stokes Q,U,V emission, wideband imaging with other flux models must be tried.
In this paper, superscripts for vectors and matrices indicate type (model, sky, observed, dirty, residual, etc.), and subscripts in italics indicate enumeration indices (t,q for Taylorterm, s,p for spatial scale, ν for frequency channel). Nonitalic subscripts indicate specific values of the enumerated indices (for example, I_{υ0}, I_{0} or I_{α}).
Wideband imaging algorithms described in Conway et al. (1990) and Sault & Wieringa (1994) use a fixed spectral index across the band, and handle slight curvature by performing multiple rounds of imaging after removing the dominant or average α at each stage. They also suggest using higher order polynomials to handle spectral curvature.
Conway et al. (1990) state that the logarithmic expansion has better convergence properties than the linear expansion when α ≪ 1. An even more compact representation is a polynomial in log I vs. log ν, but it becomes numerically unstable to operate on logarithms and exponentials of pixel amplitudes, especially in the presence of noise.
Appendix A contains an explanation of the matrix notation used here, and briefly describes standard radiointerferometric imagereconstruction within a leastsquares modelfitting framework (measurement equations, normal equations, and iterative χ^{2} minimization).
The principal solution (as defined in Bracewell & Roberts (1954) and used in Cornwell et al. (1999)) is a term specific to radio interferometry and represents the dirty image normalized by the sum of weights. It is the image formed purely from the measured data, with no contribution from the invisible distribution of images (unmeasured spatialfrequencies). It is also an approximate solution of the normal equations [H]I^{sky} = I^{dirty} (see Appendix A.2), calculated using a diagonal approximation of the Hessian. Each element on the diagonal is the peak of the PSF, which is also the sum of weights. For isolated sources, the values measured at the peaks of the principal solution (dirty) images are the true sky values. The CLEAN minorcycle algorithm uses this fact to estimate source fluxes from the peaks of the normalized dirty image.
The following definition of orthogonality is used here. Two vectors are orthogonal if their inner product is zero. The orthogonality of a pair of scale functions is measured by the integral of the product of their uvtaper functions. To account for uvcoverage, this integral is weighted by the sampling function.
A Cholesky decomposition factors a symmetric positivedefinite matrix into a lower triangular matrix and its conjugate transpose. It is used in the solution of system of equations [A]x = b where [A] is symmetric positivedefinite. The normal equations of a linear leastsquares problem in which the signal is modeled as a linear combination of basis functions, are usually in this form (Press et al. 1988).
As pointed out in Sect. 2.6, this will be accurate only for isolated point sources that were left out of the minor cycle.
The spectral index between two frequency bands A and B will be denoted as α_{AB}. For example, the symbol α_{PL} corresponds to the frequency range between Pband (327 MHz) and Lband (1.4 GHz), and α_{LL} corresponds to two frequencies within Lband (here, 1.1 and 1.8 GHz). A similar convention will be used for spectral curvature β.
Conway et al. (1990) comment on a bias that occurs with a 2term Taylor expansion, due to the use of a polynomial of insufficient order to model an exponential.
There are two definitions of bandwidth ratio that are used in radio interferometry. One is the ratio of the highest to the lowest frequency in the band, and is denoted as ν_{high}:ν_{low}. Another definition is the ratio of the total bandwidth to the central frequency (ν_{high} − ν_{low})/ν_{mid} expressed as a percentage. For example, the bandwidth ratio for ν_{low} = 1.0 GHz, ν_{high} = 2.0 GHz is 2:1 and 66%.
The convolution of two vectors a ⋆ b is equivalent to the multiplication of their Fourier transforms. A 1–D convolution operator is constructed from a and applied to b as follows. Let [A] = diag(a). Then, a ⋆ b = [F^{†}diag( [F]a)F]b = [C]b. Here, [F] is the Discrete Fourier Transform (DFT) operator. [C] is a Toeplitz matrix, with each row containing a shifted version of a. Multiplication of [C] with b implements the shiftmultiplyadd sequence required for the process of convolution.
Acknowledgments
The authors would like to thank the National Radio Astronomy Observatory, the New Mexico Institute of Mining and Technology, and the Australia Telescope National Facility for support during the Ph.D. Thesis project that resulted in this algorithm and its implementation within CASA. The authors would like to thank J.A. Eilek in particular, for extremely helpful comments on the presentation of the thesis material that formed the basis of this paper. The authors would also like to thank S.Bhatnagar, K. Golap, R. Nityananda, F. N. Owen, R. J. Sault, and M. A. Voronkov, among others, for useful discussions pertaining to this work and its software implementation. This project used data from the (E)VLA telescope (test observations and project AR664) operated by the National Radio Astronomy Observatory, a facility of the National Science Foundation operated under cooperative agreement by Associated Universities, Inc.
References
 Bhatnagar, S., & Cornwell, T. J. 2004, A&A, 426, 747 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Bhatnagar, S., Cornwell, T. J., Golap, K., & Uson, J. M. 2008, A&A, 487, 419 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Bong, S.C., Lee, J., Gary, D. E., & Yun, H. S. 2006, ApJ, 636, 1159 [NASA ADS] [CrossRef] [Google Scholar]
 Bracewell, R. N., & Roberts, J. A. 1954, Aust. J. Phys., 7, 615 [NASA ADS] [CrossRef] [Google Scholar]
 Carilli, C. L., & Barthel, P. D. 1996, A&ARv, 7, 1 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Carilli, C. L., Perley, R. A., Dreher, J. W., & Leahy, J. P. 1991, ApJ, 383, 554 [NASA ADS] [CrossRef] [Google Scholar]
 Conway, J. E., Cornwell, T. J., & Wilkinson, P. N. 1990, MNRAS, 246, 490 [NASA ADS] [Google Scholar]
 Cornwell, T. J. 2008, IEEE Journal of Selected Topics in Sig. Proc., 2, 793 [Google Scholar]
 Cornwell, T., Braun, R., & Briggs, D. S. 1999, in Synthesis Imaging in Radio Astronomy II, ed. G. B. Taylor, C. L. Carilli, & R. A. Perley, ASP Conf. Ser., 180, 151 [Google Scholar]
 de Vos, M., Gunst, A. W., & Nijboer, R. 2009, IEEE Proc., 97, 1431 [Google Scholar]
 Deboer, D. R., Gough, R. G., Bunton, J. D., et al. 2009, IEEE Proc., 97, 1507 [NASA ADS] [CrossRef] [Google Scholar]
 Greisen, E. W., Spekkens, K., & van Moorsel, G. A. 2009, Astron. J., 137, 4718 [Google Scholar]
 Likhachev, S. 2005, in Future Directions in High Resolution Astronomy, ed. J. Romney, & M. Reid, ASP Conf. Ser., 340, 608 [Google Scholar]
 Owen, F. N., Eilek, J. A., & Kassim, N. E. 2000, ApJ, 543, 611 [NASA ADS] [CrossRef] [Google Scholar]
 Perley, R., Napier, P., Jackson, J., et al. 2009, IEEE Proc., 97, 1448 [Google Scholar]
 Press, W., Flannery, B., Teukolsky, S., & Vetterling, W. 1988, Numerical Recipes in C. (Cambridge: Press Syndicate of the University of Cambridge) [Google Scholar]
 Rau, U. 2010, Ph.D. Thesis, New Mexico Institute of Mining and Technology, Socorro, NM, USA [Google Scholar]
 Rau, U., Bhatnagar, S., Voronkov, M. A., & Cornwell, T. J. 2009, IEEE Proc., 97, 1472 [Google Scholar]
 Rottmann, H., Mack, K., Klein, U., & Wielebinski, R. 1996, A&A, 309, L19 [NASA ADS] [Google Scholar]
 Sault, R. J., & Wieringa, M. H. 1994, A&AS, 108, 585 [NASA ADS] [Google Scholar]
 Smirnov, O. M. 2011, A&A, 527, A106 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Appendix A: Matrix notation and framework
The matrix notation used in this paper is explained here, in the context of an iterative χ^{2} minimization process used to solve a system of linear equations. The basic idea is as follows. Let Ax = b be the system of equations to be solved (measurement equations). The goal is to find a set of parameters x that minimizes χ^{2} = (Ax − b)^{†}W(Ax − b). Setting grad(χ^{2}) = 0 to minimize χ^{2} leads to a new system of equations called the normal equations [A^{†}WA]x = [A^{†}]Wb. The matrix on the LHS is called the Hessian [H] = [A^{†}WA]. Iterations begin with an initial guess for the parameters x. These parameters are updated in iteration i as x_{i + 1} ← x_{i} + g [H^{1}]A^{†}W(b − Ax_{i}), where g controls the stepsize. Iterations continue until a convergence criterion is satisfied. The basic iterative framework used in most imaging and deconvolution algorithms in radio interferometry can be described using this matrix notation (Rau et al. 2009; Rau 2010).
A.1. Measurement equations
The measurement equation of an imaging instrument describes its transfer function (the effect of the measurement process on the input signal). For an ideal interferometer (a perfect spatialfrequency filter, with no instrumental gains), it can be written as follows in matrix notation as follows. Let be a pixelated image of the sky and let be a vector of n visibilities. Let S_{n × m} be a projection operator that describes the instrument’s sampling function (uvcoverage) as a mapping of m discrete spatial frequencies (pixels on a grid) to n visibility samples (usually n > m). Let F_{m × m} be the Fourier transform operator. Then, the measurement equations become .
A.2. Normal equations
The normal equations are the linear system of equations whose solution gives a weighted leastsquares estimate of a set of parameters in a model (χ^{2} minimization). For an ideal interferometer, it is given by where W_{n × n} is a diagonal matrix of weights and S^{†} denotes the mapping of measured visibilities onto a spatialfrequency grid. We can write the Normal equations as where the Hessian (matrix on the LHS) is by construction a convolution operator^{19} with a shifted version of the pointspread function in each row. The dirty image on the RHS is produced by direct Fourier inversion of weighted visibilities. The normal equations therefore state that the dirty image is the result of the convolution of with . The solution of these normal equations represents a deconvolution.
A.3. Iterative solution
Most existing iterative image reconstruction algorithms in radio interferometry consist of major and minor cycles. Major cycles compute the RHS of the normal equations, and minor cycles perform approximate (implicit) Hessian inversions to calculate updates for the sky model parameters I^{m}. The first major cycle starts by transforming the observed visibilities into an image as I^{dirty} = [F^{†}S^{†}W]V^{obs}, and initializing the skymodel I^{m}. Minor cycle steps do a deconvolution to calculate updates to the model I^{m}. After several iterations, this updated model is fed into the next major cycle. This and all subsequent major cycles calculate model visibilities from the current model V^{m} = [SF]I^{m}, calculate residual visibilities V^{res} = V^{obs} − V^{m}, and transform these residuals into images I^{res} = [F^{†}S^{†}W]V^{res}. These residual images replace the initial dirty image, and a new set of minorcycle iterations are done. This process continues until a convergence criterion is reached. Usually, convergence is defined as I^{res} being noiselike with no signal left to be extracted in the minorcycle.
Appendix B: MSMFS as implemented in CASA
The MSMFS algorithm described in Sect. 2.7 has been implemented and released via the CASA software package (version 3.1 onwards). More recently, it has been implemented in the ASKAPsoft^{20} package. Algorithm 1 contains a pseudocode listing.
The main parameters that control the algorithm are

1.
ν_{0}: a reference frequency chosen near the middle of the sampled frequency range, about which the Taylor expansion is performed,

2.
N_{t}: the number of coefficients of the Taylor polynomial to solve for, and

3.
N_{s} and : a set of scale sizes in units of image pixels to use for the multiscale representation of the image. In order to always allow for the modeling of unresolved sources, the first scale function is forced to be a δfunction.
The data products are N_{t} Taylorcoefficient images, a spectral index image, and a curvature image (if N_{t} > 2). This wideband image model can be used within a standard selfcalibration loop.
All Figures
Fig. 1 MSMFS imaging results using simulated EVLA data: these images compare truth images (left column) with the results of two wideband imaging runs; multiscale (middle column) and pointsource (right column). The top three rows represent the first three Taylorcoefficient images, and the fourth and fifth rows show spectral index and spectral curvature respectively. 

In the text 
Fig. 2 Cygnus A – total intensity and spectralindex: this figure shows the MSMFS total intensity map (top), and spectralindex maps obtained via three methods – MSMFS (second row), a hybrid singleband method (third row), and from highresolution fullsynthesis narrowband images at 1.4 and 4.8 GHz (bottom). The MSMFS spectral index map has the angularresolution of the total intensity map, and agrees with the values in the highresolution comparison map (α = −0.5 at the hotspots increasing to α ≈ − 1.0 in the halo). However, the hybrid method resulted in a map with a wider range of values (positive and negative) and is at a much lower angular resolution. 

In the text 
Fig. 3 M 87 core/jet/lobe – intensity, spectral index and curvature: these images show 3arcsec resolution maps of the central bright region of M 87 (core+jet and inner lobes). The quantities displayed are the intensity at 1.5 GHz (top left), the spectral index (top middle) and the spectral curvature (top right). The spectral index is near zero at the core, varies between − 0.36 and − 0.6 along the jet and out into the lobes. The spectral curvature is on average 0.5 which translates to △ α = 0.2 across Lband. The plot at the bottom compares the resulting average spectrum (solid line) with that formed by imaging each spectralwindow separately (dots). The dashed and dotted lines correspond to fixed values of spectral index (–0.43, –0.53, –0.63). 

In the text 
Fig. 4 Peak residuals and Errors for MFS with different values of N_{t}: These plots show the measured peak residuals (top left) and the errors on I_{υ0} (top right), I_{α} (bottom left), and I_{β} (bottom right) when a pointsource of flux 1.0 Jy and α = −1.0 was imaged using Taylor polynomials of different orders (N_{t} = 1−7) and a linear spectral basis. The xaxis of all these plots show the value of N_{t} used for the simulation and plots for α and β begin from N_{t} = 2 and N_{t} = 3 respectively. An example of how to read these plots: for a 2:1 bandwidth ratio, a source with spectral index = –1.0 and N_{t} = 4, the achievable dynamic range (measured as the ratio of the peak flux to the peak residual near the source) is about 10^{5}, the error on the peak flux at the reference frequency is 1 part in 10^{3}, and the absolute errors on α anre β are 10^{2} and 10^{1} respectively. 

In the text 
Fig. 5 Stokes I images of the 3C 286 field, using MSMFS with N_{t} = 1 (top left), N_{t} = 2 (top right), N_{t} = 3 (bottom left), N_{t} = 4 (bottom left). The peak flux of 3C 286 is 14 Jy/beam. The axes labels are in units of arcminutes from the pointing center and all images are shown with the same greyscales. The residual rms near 3C 286 for these four images are 9 mJy, 1 mJy, 200 µJy and 140 µJy, and the rms measured 1 degree away from 3C 286 are 1 mJy, 200 µJy , 85 µJy and 80 µJy. The thermalnoise limit for this dataset was 70 µJy. 

In the text 
Fig. 6 Moderately resolved sources – singlechannel images: these figures show the 6 singlechannel images generated from simulated EVLA data between 1 and 4 GHz in the EVLA Dconfiguration. The sky brightness consists of two point sources, each of flux 1.0 Jy at a reference frequency of 2.5 GHz and separated by 18 arcsec. Their spectral indices are +1.0 (top source) and −1.0 (bottom source). The angular resolution at 1 GHz is 60 arcsec, and at 4 GHz is 15 arcsec and the circles in the lower left corner of each image shows the resolution element decreasing in size as frequency increases. 

In the text 
Fig. 7 Moderately resolved sources – MSMFS Images: these images show the intensity at 2.5 GHz (left), the spectral index showing a gradient between − 1 and +1 (middle) and the spectral curvature which peaks between the two sources and falls off on either side (right). The angular resolution of these images is 15 arcsec, corresponding to the highest frequency in the data (and Fig. 6). 

In the text 
Fig. 8 Very large spatial scales – intensity, spectral index, residuals: these images show the intensity distribution (left), spectral index (middle) and the residuals (right) for two imaging runs. The top row shows results without shortspacing information, and shows a false negative α ≈ −0.8 for the extended emission. The bottom row shows results with shortspacing constraints, in which the extended source has been reconstructed correctly. 

In the text 
Fig. 9 Very large spatial scales – visibility plots: these plots show the observed (left) and reconstructed (right) visibility functions for a simulation in which a large extended flatspectrum source is observed with an interferometer with a large central hole in its uvcoverage. The colours/shades in these plots represent 6 frequency channels spread between 1 and 4 GHz. The top row of plots shows that when no shortspacing information was used, these data can be mistakenly fit using a lessextended source with a steep spectrum. The bottom row shows that the inclusion of shortspacing information is sufficient to reconstruct the sky brightness distribution correctly. 

In the text 
Fig. 10 Bandlimited signals – multifrequency images: these images show a comparison between the true sky brightness (top row) and the brightness reconstructed using the MSMFS algorithm (bottom row) at a set of three frequencies (1.0, 1.8, and 2.6 GHz from left to right). 

In the text 
Fig. 11 Bandlimited signals – spectra across the source: these plots show the true (downarrows) and reconstructed (uparrows) spectra at different locations for the example discussed in this section (shown in Fig. 10). The left column corresponds to the left end of the loop at the location of the leg and shows smooth structure stretching almost all across the band. The middle column corresponds to the middle of the source where the only structure in the lineofsight is the upper part of the loop (emission at a small fraction of the band). The right column shows spectra for the brightest point on the right end of the loop. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.