Issue 
A&A
Volume 675, July 2023



Article Number  A116  
Number of page(s)  14  
Section  Astronomical instrumentation  
DOI  https://doi.org/10.1051/0004-6361/202245013  
Published online  07 July 2023 
Deep learning-based deconvolution for interferometric radio transient reconstruction^{★}
^{1}
Université Paris-Saclay, Université Paris Cité, CEA, CNRS, AIM,
Orme des Merisiers,
91191
Gif-sur-Yvette, France
e-mail: benjamin.chiche@cea.fr
^{2}
Safran Electronics & Defense,
100 avenue de Paris,
91344
Massy, France
^{3}
LESIA, Université Paris Cité, Observatoire de Paris, Université PSL, CNRS, Sorbonne Université,
5 place Jules Janssen,
92195
Meudon, France
Received:
19
September
2022
Accepted:
10
April
2023
Context. Radio astronomy is currently thriving, with new large ground-based radio telescopes coming online in preparation for the upcoming Square Kilometre Array (SKA). Facilities like LOFAR, MeerKAT/SKA, ASKAP/SKA, and the future SKA-LOW bring tremendous sensitivity in time and frequency, improved angular resolution, and also high-rate data streams that need to be processed. They enable advanced studies of radio transients, volatile by nature, that can be detected or missed in the data. These transients are markers of high-energy accelerations of electrons and manifest over a wide range of temporal scales (e.g., from milliseconds for pulsars or fast radio bursts to several hours or days for accreting systems). While such sources are usually studied with dynamic spectroscopy or time series analysis, there is a motivation to search for them directly in large interferometric datasets. This requires efficient and robust signal reconstruction algorithms.
Aims. To correctly account for the temporal dependency of the data, we extend the classical image deconvolution inverse problem by adding the temporal dependency to the reconstruction, and we propose a solution based on deep learning.
Methods. We introduce two novel neural network architectures that can do both spatial and temporal modeling of the data and the instrumental response. Then, we simulate representative time-dependent image cubes of point source distributions and realistic telescope pointings of MeerKAT to generate toy models to build the training, validation, and test datasets. Finally, based on the test data, we evaluate the source profile reconstruction performance of the proposed methods and of the classical image deconvolution algorithm CLEAN applied frame by frame.
Results. In the presence of increasing noise levels in the data frames, the proposed methods display a high level of robustness compared to frame-by-frame imaging with CLEAN. The deconvolved image cubes bring a factor of 3 improvement in the fidelity of the recovered temporal profiles and a factor of 2 improvement in background denoising.
Conclusions. The proposed neural networks are not iterative and can benefit from efficient GPU-based architectures. Consequently, they could unlock the development of real-time data processing at the initial cost of learning the behavior of the telescope. Radio observatories are performing sky surveys to produce deep images in ever larger fields of view, increasing the transient source access window both spatially and temporally. Our method can effectively capture the temporal structures that are present in such survey data.
Key words: techniques: image processing / instrumentation: interferometers / methods: data analysis
Visualizations of our results are available on the following page: https://github.com/bjmch/DLRadioTransient
© The Authors 2023
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.
1 Introduction
Next-generation radio facilities like LOFAR (van Haarlem et al. 2013), MeerKAT/SKA (Booth & Jonas 2012), ASKAP/SKA (Dewdney et al. 2009), and SKA-LOW (Scaife 2020) allow for high spectral resolution, high data rates, improved angular resolution, and high instantaneous sensitivity. This is a notable improvement for studying transient radio sources via aperture synthesis using radio interferometers. These sources appear and disappear over time and can be randomly distributed in the sky. They are associated with high-energy physical phenomena (e.g., pulsars, rotating radio transients (RRATs), Solar System magnetized objects, and Lorimer-type bursts; Lorimer et al. 2007) and, more generally, fast radio bursts (FRBs). Searching for such sources in the large datasets produced by these instruments is a new challenge that requires competitive and efficient signal reconstruction algorithms.
Radio interferometers enable imaging via aperture synthesis, based on processing the correlations between each pair of antenna signals. In the first approximation, a radio interferometer samples noisy Fourier components of the sky associated with its spatial frequencies (i.e., the visibilities; Wilson et al. 2009) inside the main field of view of the instrument. Under the small-field approximation, the sky can be approximated by computing the inverse Fourier transform of those Fourier samples. The number of baselines is limited, so the Fourier map is incomplete. Therefore, it is necessary to solve an “inpainting” inverse problem (Garsden et al. 2015), that is, to estimate the missing information in the Fourier plane. Another option is to switch from the inpainting problem in the visibility space to its equivalent deconvolution problem in the image space. Deconvolution of radio images of a static sky in the context of radio interferometry has been the subject of studies for several decades. Notably, Högbom (1974) designed the original CLEAN algorithm, which is still the most widespread basis for newer deconvolution algorithms in the community. Several variants of this algorithm have been subsequently proposed (Schwab 1984; Clark 1980). Improvements taking into account source morphology (Abrantes et al. 2009), spectral dependencies (Rau & Cornwell 2011), and sparse representations (Garsden et al. 2015; Nammour et al. 2022; Girard et al. 2015; Dabbech et al. 2015; Carin 2009; Carrillo et al. 2012; Wiaux et al. 2009) have also been investigated in recent years. When the sky contains transient sources, classical detection methods rely on frame-by-frame image analysis (e.g., with the LOFAR Transient Pipeline; Swinbank et al. 2015).
However, frame-by-frame transient detection is subject to two observing biases: (i) a detection issue when the frames are derived from too-short time integrations displaying a high noise level and, conversely, (ii) a “dilution” problem when the time integration is too long to resolve the transient in time, resulting in a time smearing of the transient. Therefore, to account for these biases and for sources that possess coherent structure in time, methods should be designed that directly account for the time-coherent structure (i.e., the source light curve) hidden in the signal. This also occurs in large radio surveys with interferometers. Mapping the visible radio sky requires an optimal use of the observing time and pointing location to reach a target angular resolution and sensitivity. While surveys mainly address the distribution of static sources (astrometry and flux density), transient radio sources might also occur during the short exposure (e.g., the ~15 min pointing of MeerKAT) and be missed due to observing biases (detection and dilution). Therefore, finding a robust deconvolution method is key to optimizing both telescope time and transient detectability.
Recently, deep learning (DL) has contributed to achieving far better performance in solving inverse problems, such as image and video super-resolution, deblurring, and denoising (Zhang et al. 2017a, 2018; Xu et al. 2014; Nah et al. 2019) and computed tomography (CT) and magnetic resonance imaging (MRI) reconstruction (Adler & Öktem 2018). In astrophysics, DL has recently been used for tasks such as parametric deconvolution (Tuccillo et al. 2018), optical image deconvolution (Sureau et al. 2020), and radio image deconvolution (Nammour et al. 2022; Terris et al. 2023), as well as the equivalent inpainting in the Fourier domain (Schmidt et al. 2022). The success of neural networks in these tasks relies on their excellent capability to learn the image prior from data. Based on these premises, in this study we design novel DL architectures that improve radio image time-series deconvolution in the context of radio transient source reconstruction. The structure of the article is as follows. In Sect. 2 we review studies related to ours. The inverse problem formulation is then presented in Sect. 3. We introduce new neural network architectures to solve the problem in Sect. 4. The performance of the proposed networks is then evaluated through simulations: Sect. 5 details the experimental setups and Sect. 6 presents the results. Section 7 discusses hypotheses and limitations. Section 8 concludes our work.
2 Related work
2.1 Deconvolution in radio interferometry
Two approaches have been classically used in radio interferometry deconvolution: the variational maximum entropy method (Frieden 1972; Narayan & Nityananda 1986; Wernecke 1977) and the iterative CLEAN algorithm (Högbom 1974). Both methods generally perform well when dealing solely with point sources, but CLEAN is the most widespread technique in the radio astronomy community. This algorithm supposes a finite number of point sources. It restores them based on matching pursuit (Bergeaud & Mallat 1995) using a single basis vector and the impulse response (hereafter point spread function, PSF) of the telescope that made the observation. Clark (1980) proposed a variant of CLEAN by optimizing the algorithm with fast Fourier transform (FFT) and structuring the algorithm computations between major and minor cycles. Minor cycles are carried out in (gridded) image space, whereas major cycles fall in the ungridded visibility space. Going back and forth between these two spaces led to improvements in both fidelity and accuracy. This strategy was further developed in Schwab (1984). Abrantes et al. (2009) and Rau & Cornwell (2011) brought in further improvements by respectively taking into account the morphological and spectral behavior of the sources. More recently, several teams have addressed the deconvolution problem within the compressed sensing framework (Garsden et al. 2015; Girard et al. 2015; Dabbech et al. 2015; Carin 2009; Carrillo et al. 2012; Wiaux et al. 2009; Jiang et al. 2015), and then DL (Terris et al. 2023). Schmidt et al. (2022) also used DL, but solved the equivalent inpainting problem in the Fourier space. However, most of these methods do not take into account the temporal structures of time-evolving sources in the context of image time-series deconvolution. The method presented in Jiang et al. (2015) is the only existing method that considered a temporal prior.
However, this method is extremely slow in practice, and struggles when dealing with interferometric images with realistic dimensions. Moreover, it introduces many false detections in empty regions of the reconstructed skies. For these reasons, we do not consider it in our empirical study that is presented later in this paper.
2.2 Deep learning and inverse problems
DL has shown remarkable results in solving inverse problems in image and video restoration. In inverse problems, the aim is to reconstruct the original data x from some observed data y. They are related through a degradation model of the form y = D_{α}(Ax) (Pustelnik et al. 2016), where the matrix A models linear transformations (such as blurring) and D_{α} models other degradations; this function is parameterized by α. First-generation networks solving inverse problems (Xu et al. 2014; Dong et al. 2015; Kim et al. 2016) approximate x by learning, in a supervised manner, an inverse mapping x̂ = f_{θ}(y) from the data y, where θ are the parameters of the network. These approaches perform well when the degradation model has fixed parameters (i.e., A and α are unique across the train and test data). Their performance deteriorates in a multiple-degradation scenario (i.e., when A and α vary across the train and test data). In the case of radio interferometric observations, the blurring operator A depends on the time of each image and on the considered location in the sky, and thus changes between samples. Zhang et al. (2018) improved upon the first-generation networks by proposing a single convolutional neural network (CNN) that can handle multiple degradations. This network learns the mapping to x from both the data and the degradation parameters. More recently, Gu et al. (2019) proposed another way to handle multiple degradations based on the spatial feature transform (SFT) layer introduced in Wang et al. (2018). This layer provides an affine transformation of its input feature maps conditioned on the knowledge about A and α.
Other strategies that can handle multiple degradations use iterative approaches, often derived from convex optimization algorithms, coupled with neural networks. We broadly distinguish two methods. One of them is deep unrolling, which consists in unfolding the iterative loop of a classical iterative algorithm for a given number of iterations and representing all operations as layers of a neural network (Zhang et al. 2020; Adler & Öktem 2018; Monga et al. 2021; Diamond et al. 2017; Chiche et al. 2020). The network can be trained and optimized based on data, while keeping the knowledge of the degradation in its internal structure. The network is notably factorized into a prior step that encodes the learned image prior and a data step that leverages knowledge about the degradations. The other one is Plug & Play, which uses an iterative proximal algorithm, for instance in the framework of alternating direction method of multipliers plug & play (Venkatakrishnan et al. 2013; Sreehari et al. 2016; Chan et al. 2017) or regularization by denoising (Romano et al. 2017; Reehorst & Schniter 2018), and replaces the operator related to the prior by a denoising CNN (Meinhardt et al. 2017; Arjomand Bigdeli et al. 2017; Gupta et al. 2018; Terris et al. 2023) or a series of denoising CNNs trained in different settings (Zhang et al. 2017b).
However, the former can encounter memory issues at training time if we want to unroll a large number of iterations. Moreover, its data step generally applies the linear operator A and its adjoint operator A^{T} to the input signal, diluting the signal too much if the operator is strongly ill-conditioned. The latter approach shares a drawback with classical iterative algorithms like CLEAN: it is slow, because it requires a large number of iterations to deconvolve an image.
3 Problem
3.1 Interferometry imaging problem
This study deals with imaging by aperture synthesis from interferometric data. The limited numbers of antennas, observing baselines, observing times, and frequencies restrict the number of accessible samples of the sky visibility function. In addition, these visibilities are subject to noise that can be modeled as additive Gaussian noise in the first approximation. In a limited field of view, and ignoring direction-independent or direction-dependent effects and calibration issues, this ill-posed inverse problem can be expressed in the Fourier space (i.e., the measurement space) as

V_{y} = M ⊙ (V + ϵ), (1)

where ⊙ denotes the element-wise product, V_{y} is the collection of observed visibilities, V is the true visibility function, ϵ is approximated as an additive white Gaussian noise, and M is a sampling mask representing the limited access of an interferometer to the measurements (depending on antenna configuration and observational parameters).
Equation (1) can be rewritten as a deconvolution problem formulated in the image (or direct) space. This problem links the observed degraded sky image to the corresponding true sky image:

y = h * x + η, (2)

with ℱ^{−1}(V_{y}) = y, ℱ^{−1}(M) = h, ℱ^{−1}(V) = x, and ℱ^{−1}(M) * ℱ^{−1}(ϵ) = η, where * denotes the convolution operation; y is the observed image, called the dirty image; h is the PSF, also called the dirty beam, which represents the sampling operation performed in Fourier space by the interferometer; and x is the ground truth (GT) image. We note that σ_{ϵ} is the noise level of ϵ. The corresponding noise level for η can thus be obtained as σ = ‖M‖_{2} • σ_{ϵ} = ‖ℱ(h)‖_{2} • σ_{ϵ}.
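The equivalence between the Fourier-domain formulation (1) and the image-domain formulation (2) rests on the convolution theorem and can be checked numerically. Below is a minimal NumPy sketch with a toy 8 × 8 sky and mask; all variable names are illustrative and the scale is far smaller than the real problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = np.zeros((n, n))
x[2, 3] = 1.0                                  # toy GT sky: a single point source
M = (rng.random((n, n)) < 0.4).astype(float)   # toy sampling mask (uv coverage)
eps = rng.normal(0.0, 0.1, (n, n))             # visibility noise

# Fourier-domain route, Eq. (1): V_y = M ⊙ (V + eps)
V = np.fft.fft2(x)
V_y = M * (V + eps)
y_fourier = np.fft.ifft2(V_y).real             # dirty image

# Image-domain route, Eq. (2): y = h * x + eta
h = np.fft.ifft2(M)                            # dirty beam, h = F^{-1}(M)
eta = np.fft.ifft2(M * eps).real               # effective image-plane noise
# circular convolution h * x evaluated through the FFT
y_image = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(x)).real + eta
```

Both routes produce the same dirty image up to floating-point rounding, which is precisely why the inpainting and deconvolution views of the problem are interchangeable.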
3.2 Extension to transient imaging
To enable robust imaging of transient sources, instead of solving frame by frame, which can be subject to observation biases, we extend the problem in Eq. (2) to a deconvolution problem accounting for the temporal dependency of the different terms (i.e., sky, noise, and instrument sampling):

y_{t} = h_{t} * x_{t} + η_{t}, t ∈ I. (3)

Here ℱ^{−1}(V_{y,t}) = y_{t}, ℱ^{−1}(M_{t}) = h_{t}, ℱ^{−1}(V_{t}) = x_{t}, and ℱ^{−1}(M_{t}) * ℱ^{−1}(ϵ_{t}) = η_{t}. The noise level of η_{t} is σ_{t} = ‖M_{t}‖_{2} • σ_{ϵ} = ‖ℱ(h_{t})‖_{2} • σ_{ϵ}. By stacking {y_{t}}_{t∈I}, {x_{t}}_{t∈I}, and {h_{t}}_{t∈I} in the temporal dimension, we respectively obtain a dirty cube, a GT cube, and a PSF cube. These cubes are three-dimensional data structures denoted Y, X, and H, respectively; I is the set of T time steps ordered following the observation intervals. In the single-image problem (2), h depends on the total time and frequency integration of the observation, while the sky x is supposed to be static. As the sky rotates above the instrument during the observation, the morphology of h depends on the interferometer location on Earth, the declination of the source, and the observing dates. In the dynamic imaging problem extension (3), the h_{t} operator has a time dependency (i.e., the instrumental response varies between consecutive observing time intervals). As a result, both the sky and the interferometer response vary over time. The associated mask M_{t} samples the Fourier transform of the sky at different times, enabling the possibility of capturing the temporal evolution of the observed sky.
For simplicity, we assume that the data cube X only contains point-like sources: each source has an angular scale that is much smaller than the angular resolution set by the PSF. In addition, we assume that the cube contains mixtures of sources with constant and varying flux densities over time. Their locations on the sky are random but constant during the observation.
Because a radio interferometric dataset provides exact information on the baseline length and orientation for all samples, the morphology and time dependency of h_{t} are known for all t. Finally, we control the noise level σ_{ϵ} to mimic the various quality levels of the observations. Following the discussion in Sect. 2, we are thus in a multiple degradation scenario, as in Zhang et al. (2018).
In this study, for the sake of simplification, we focus on the monochromatic case where the observing frequency is fixed. We therefore do not deal with the dependency in frequency of the imaging problem.
4 Method
In the context of the time-series image deconvolution problem (3), a single-image deconvolution method, such as CLEAN, can be applied independently in a frame-by-frame approach. However, this does not capture the temporal structure of the sky. A method capable of dealing with the temporal evolution of both the telescope and the sky is required. Given the recent successes of neural networks in various restoration tasks, we propose to solve problem (3) based on DL. In our setting, a network realizes the following mapping: X̂ = N_{θ}(Y, H, σ_{ϵ}). Its input contains a degraded cube and information about the degradation (H, σ_{ϵ}) in a nonblind manner. The symbol θ denotes the parameters of the network that are learned from training data {Y_{i}, X_{i}, H_{i}, σ_{ϵ,i}}. We propose two implementations of the network N, called 2D-1D Net and Deflation Net, respectively.
4.1 Multiple degradations
We adopt the following scheme to incorporate knowledge about the image formation model and handle multiple degradations.
Each h_{t} in H is originally of size l × l (with l = 256 in our study; see Sect. 5), but is first center-cropped with crop size r × r (r = 96 in our study) and projected onto a b-dimensional linear space by a PCA projection matrix P. The matrix P is learned from all the PSFs that constitute the training PSF cubes. In our empirical study (detailed in Sects. 5 and 6), the value b = 50 explains 90% of the total variance. The higher and the closer to 100% this percentage is, the better (for example, Zhang et al. 2018 preserved about 99.8% of the energy of their training blurring kernels with PCA), but a value of b higher than 50 makes our DL models (detailed in the following) too large in terms of the number of parameters and provokes a memory error at training time. We denote this projected PSF as h̃_{t}; h̃_{t} is then concatenated with σ_{t} = ‖M_{t}‖_{2} • σ_{ϵ} = ‖ℱ(h_{t})‖_{2} • σ_{ϵ}. We denote this (b + 1)-dimensional degradation vector v_{t}.
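This construction of the degradation vector can be sketched as follows. It is a hedged NumPy illustration, not the paper's implementation: the PCA is fit via an SVD on flattened cropped PSFs, the Frobenius norm is one plausible reading of ‖·‖_{2}, and the sizes (l = 8, r = 4, b = 3) are toy stand-ins for the real l = 256, r = 96, b = 50:

```python
import numpy as np

def center_crop(img, r):
    """C_r: center-crop a square 2D array to size r x r."""
    s = (img.shape[0] - r) // 2
    return img[s:s + r, s:s + r]

def fit_pca(cropped_psfs, b):
    """Learn the b-dimensional PCA projection matrix P from training PSFs."""
    X = cropped_psfs.reshape(len(cropped_psfs), -1)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:b]                                    # P, of shape (b, r*r)

def degradation_vector(P, h_t, sigma_eps, r):
    """Concatenate the PCA-projected PSF with the frame noise level sigma_t."""
    h_proj = P @ center_crop(h_t, r).ravel()
    # one plausible reading of ||F(h_t)||_2 (Frobenius norm of the 2D FFT)
    sigma_t = np.linalg.norm(np.fft.fft2(h_t)) * sigma_eps
    return np.concatenate([h_proj, [sigma_t]])       # length b + 1

rng = np.random.default_rng(0)
train_psfs = rng.normal(size=(20, 8, 8))             # stand-in PSFs (l = 8 here)
cropped = np.stack([center_crop(p, 4) for p in train_psfs])
P = fit_pca(cropped, b=3)
v = degradation_vector(P, train_psfs[0], sigma_eps=0.5, r=4)
```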
2D-1D Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
4.2 2D-1D Net
We decouple N into two modules that sequentially process the input data. The first module is a network with 2D convolutional layers that successively and independently encode each image in the degraded cube into feature maps. Each of these encodings considers the PSF and the noise level used in the degradation. This transformation performs a deconvolution of the degraded image. This module is referred to as 2D Net. After these independent deconvolutions, the produced feature maps are stacked in an additional temporal dimension and given to the second module. This module captures temporal structures within the stacked maps and estimates the GT cube. The temporal profile of a source is continuous in time, which justifies this architectural choice. Because point sources do not move spatially in our study, the temporal structure is extracted based on 1D convolution layers along the time dimension. We call this module 1D Net. By unifying the two modules, we build the entire network, 2D-1D Net.
Algorithm 1 summarizes the 2D-1D Net. First, each pair of degraded image and degradation vector is given to 2D Net to produce an intermediate feature map z_{t} (lines 2-8). Figure 1 describes the structure of this subnetwork. This network is composed of an encoder that extracts input features and an SFT Net that can manage multiple degradations. In this step, 2D Net deconvolves y_{t} using the degradation vector. The SFT layer uses this vector to modulate feature maps. This layer was introduced in Wang et al. (2018) and used in Gu et al. (2019) for the first time to handle multiple degradations in inverse problems. It applies an affine transformation to the feature maps, conditioned on the degradation maps, which are obtained by stretching the degradation vector into size (b + 1) × height × width, where all the elements of the ith map equal the ith element of the vector. Here, height and width respectively indicate the height and width of the input images and feature maps; in our experiments, height = width = 256. The affine transformation involves scaling and shifting operations,

SFT(F) = γ ⊙ F + β, (4)
where γ and β are estimated by additional small convolutional layers and ⊙ is the Hadamard product.
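The stretching and modulation steps can be sketched in NumPy. This is an illustrative sketch only: the small convolutional layers that predict γ and β from the degradation maps are replaced here by fixed identity parameters so the snippet stays self-contained, and all sizes are toy values:

```python
import numpy as np

def stretch(v, height, width):
    """Stretch the (b+1)-vector into (b+1) constant degradation maps."""
    return np.broadcast_to(v[:, None, None], (v.size, height, width)).copy()

def sft(features, gamma, beta):
    """Eq. (4): SFT(F) = gamma ⊙ F + beta (element-wise scale and shift)."""
    return gamma * features + beta

b, f, height, width = 3, 4, 16, 16
v = np.arange(b + 1, dtype=float)        # degradation vector (projected PSF + sigma_t)
maps = stretch(v, height, width)         # conditioning input of the SFT layer
feats = np.random.default_rng(1).normal(size=(f, height, width))
# in the network, gamma and beta are predicted from `maps` by small conv layers;
# identity parameters are used here so the sketch stays runnable
gamma = np.ones((f, height, width))
beta = np.zeros((f, height, width))
out = sft(feats, gamma, beta)
```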
Next, {z_{t}}_{t∈I} are stacked in the temporal dimension. This gives Z, a tensor of dimensions f × T × height × width (line 10). The output is passed to 1D Net, composed of three 1D convolutional layers working along the temporal dimension, interlaced with ReLU activations (line 11). The kernel size of each 1D convolutional layer is set to 5. With three such layers, the temporal receptive field is 5 + 4 + 4 = 13, which is enough considering the temporal extension of the experimental transient events in this work (see Sect. 5.1.2). Figure 2 details the 1D Net architecture.
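The quoted receptive field follows from the standard rule for stacked stride-1 convolutions: each additional layer of kernel size k widens the field by k − 1. A one-line check:

```python
def receptive_field(kernel_sizes):
    """Temporal receptive field of stacked stride-1 convolutions:
    1 plus (k - 1) for each layer of kernel size k."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# three 1D layers of kernel size 5, as in 1D Net: 5 + 4 + 4 = 13
print(receptive_field([5, 5, 5]))  # → 13
```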
Fig. 1 2D Net. Each convolutional layer outputs f feature maps, with f = 32 in our study. The kernel size of each convolutional layer is set to 3 × 3. 
Fig. 2 1D Net. {z_{t}}_{t∈I} has dimensions f × T × height × width. Each convolutional layer outputs f feature maps, except the last, which outputs images with one channel each. Each 1D convolutional layer has kernel size 5. 
4.3 Deflation Net
Algorithm 2 summarizes the Deflation Net. This network is based on the same subnetworks as 2D-1D Net but involves a different computation flow. Specifically, it decouples the reconstruction of constant and transient sources. First, both the input images {y_{t}}_{t∈I} and the PSFs {h_{t}}_{t∈I} are averaged over the temporal dimension (lines 1 and 2). The average input image presents a reduced noise level (computed in line 4) compared to each image in the input cube, and the average PSF is better conditioned than each PSF of the PSF cube. Then, the average image is deconvolved in the feature space based on the average PSF to give the average sky features (line 7). Next, this average sky is reconvolved with the corresponding PSF (line 14) and subtracted from the individual degraded sky in the feature space at each time step. Each resulting image only contains transient sources. This sky is then deconvolved based on the individual PSF to reconstruct the transient sources (line 15). The individual deconvolved image and the average deconvolved sky are finally summed at the feature level via skip connections (line 16). This processing is done for each time step, and all outputs are then sent to the final 1D Net (lines 19 and 20).
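The deflation idea can be illustrated directly in image space, leaving aside the learned feature encodings. In the sketch below, the deconvolved average sky is replaced by an oracle (the true constant sky), so the snippet only demonstrates the reconvolve-and-subtract step that isolates the transient contribution in each frame; all names and sizes are illustrative:

```python
import numpy as np

def conv2d_fft(a, b):
    """Circular 2D convolution via the FFT."""
    return np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)).real

T, n = 4, 16
rng = np.random.default_rng(2)
x_const = np.zeros((n, n)); x_const[5, 5] = 1.0         # constant source
x_trans = np.zeros((T, n, n)); x_trans[2, 10, 3] = 0.8  # transient, only at t = 2
H = rng.normal(size=(T, n, n))                          # stand-in time-varying PSFs

# noiseless dirty cube, Eq. (3): y_t = h_t * (x_const + x_trans_t)
Y = np.stack([conv2d_fft(H[t], x_const + x_trans[t]) for t in range(T)])

# deflation step: reconvolve the (here: oracle) average sky with each h_t
# and subtract, so only the transient contribution remains in each frame
residual = np.stack([Y[t] - conv2d_fft(H[t], x_const) for t in range(T)])
```

Frames without a transient are deflated to (numerically) zero, while the transient frame retains exactly the convolution of its PSF with the transient source.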
Deflation Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
5 Experiment
5.1 Datasets
We generate disjoint training, validation, and test sets of GT and PSF cubes at various noise levels. The corresponding dirty cubes are generated based on Eq. (3).
5.1.1 PSF cubes
We simulate the interferometric response of MeerKAT in the L band, using its current 64-antenna distribution. The observing frequency is fixed to 1420 MHz. The location of MeerKAT on Earth is Long = 21.33°, Lat = −30.83°, h = 1195 m. The minimum pointing elevation is set to 2 degrees. Based on these parameters, we randomly pick PSF cubes by sampling the local visible sky for a given observing time and observing rate. Each cube accounts for the typical eight-hour tracking observation of a randomly selected source in the J2000 reference frame that is visible in the local sky during the eight hours. This period is divided into successive intervals of 15-min scans. They produce a 15-min integrated PSF associated with the (u,v) coverage computed toward the source's current location. Each PSF is projected onto a spatial grid of 256 × 256 pixels. Hence, the cube has dimensions 32 × 256 × 256. Adopting the notation from Sect. 3.2, we have T = 32 and t_{i+1} − t_{i} = 15 min. We generated 435 training, 50 validation, and 50 test PSF cube samples. For all of them the source elevations at the beginning of the observation are above 20 degrees as seen from MeerKAT. Figure 3 depicts the 32 (15 min) frames of a PSF cube. As the source appears to rotate above the interferometer during the eight-hour observation, the projected baselines vary, leading to a continuous rotation and warping of the PSF.
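The rotation and warping of the PSF follow from the standard aperture-synthesis geometry: as the hour angle H advances, a baseline with equatorial components (L_x, L_y, L_z) (in wavelengths) sweeps an elliptical track in the (u, v) plane that also depends on the source declination δ. A hedged NumPy sketch of this textbook relation (the baseline values are illustrative, not MeerKAT's):

```python
import numpy as np

def uv_track(L, dec, hour_angles):
    """(u, v) samples swept by one baseline L = (Lx, Ly, Lz), in wavelengths,
    as the Earth rotates, for a source at declination dec (radians)."""
    Lx, Ly, Lz = L
    H = np.asarray(hour_angles)
    u = np.sin(H) * Lx + np.cos(H) * Ly
    v = (-np.sin(dec) * np.cos(H) * Lx
         + np.sin(dec) * np.sin(H) * Ly
         + np.cos(dec) * Lz)
    return u, v

# eight-hour track (120 deg of hour angle) sampled in 32 steps, as in the cubes
H = np.deg2rad(np.linspace(-60.0, 60.0, 32))
u, v = uv_track((1000.0, 500.0, 200.0), np.deg2rad(-45.0), H)
```

For a source at the celestial pole the track degenerates into a circle of radius sqrt(Lx² + Ly²), which is the limiting case of the always-accessible circumpolar directions mentioned below.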
Figure 4 depicts the effective distribution of elevations in the training PSF set. This distribution is skewed toward an elevation of 30° in the case of the telescope at hand, MeerKAT. This is expected because the PSF simulation accounts for the visibility of sources, in the southern sky, as seen from MeerKAT. For any given observation duration (e.g., 8 h max) and integration (e.g., 15 min/image), we computed the telescope source tracking and its aperture projection, both required to derive the effective (u,v) coverage and therefore the PSF. An astrophysical source is associated with a unique pair of coordinates (declination δ and right ascension α) on the celestial sphere (see Fig. A.1 for an illustration). Depending on the source declination (δ, in the equatorial frame), hour angle (associated with right ascension α and time of observation), and observation duration, not all (α, δ) directions are accessible. Therefore, a uniform random distribution of directions, drawn from the accessible direction window in the sky, appears skewed around the local elevation of the south celestial pole (SCP), located around 30° for MeerKAT. Circumpolar sources close to the SCP are always accessible.
The PCA projection matrix P that encodes knowledge about the PSF is learned from all the PSFs of the training PSF set (i.e., from the 435 × 32 = 13 920 PSFs). A number of PCA components b = 50 explains 90% of the total variance. Figure 5 shows the ten main PCA eigenvectors computed over the whole training PSF set.
5.1.2 GT cubes
We suppose the sources to be unresolved point sources in the sky image. The pixel size on the sky is fixed to 1.5″. We assume that within a sky image of size 30′ × 30′ (i.e., 1200 × 1200 pixels), at most 30 constant and 2 transient point sources can be placed. This distribution of sources can be considered compatible with shallow imaging like that of MeerKAT. Following this distribution, we generate 39 000 training sky cubes with dimensions 32 × 256 × 256. These data are divided into three equal parts, containing zero, one, or two transient sources per field. Each validation and test set contains 66 cubes following the same distribution of sources, with half of them containing a transient source and the other half containing two. The source peak flux density (i.e., the value of the pixel of a constant source in the ground truth) is randomly sampled between 1 and 100. Regarding a transient source, its amplitude is a discrete function A(t_{i}) sampled from a continuous function A(t). For each source, this profile is randomly chosen among the following models: gate, Gaussian, and skew-normal. Their parameters are randomly chosen, and the transient source's maximum amplitude is randomly chosen in [50, 100]. Examples of temporal profiles are shown in Sect. 6, Fig. 6. Our neural networks consider normalized pixel values (i.e., the GT cube's pixel values are divided by 100) so that they lie between 0 and 1.
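The three profile families can be sketched as simple amplitude functions A(t) sampled on the 32 time steps. The parameter values below are illustrative (in the datasets they are drawn at random), and the skew-normal uses the standard 2φ(t)Φ(αt) form rescaled to its peak amplitude:

```python
import math
import numpy as np

T = 32
t = np.arange(T, dtype=float)

def gate(t, t0, t1, amp):
    """Boxcar light curve: 'on' between time steps t0 and t1."""
    return amp * ((t >= t0) & (t <= t1))

def gaussian(t, mu, sig, amp):
    """Gaussian light curve peaking at time step mu."""
    return amp * np.exp(-0.5 * ((t - mu) / sig) ** 2)

def skew_normal(t, mu, sig, alpha, amp):
    """Skew-normal light curve (2*phi*Phi form), rescaled to peak at amp."""
    z = (t - mu) / sig
    pdf = np.exp(-0.5 * z ** 2) * (1.0 + np.vectorize(math.erf)(alpha * z / math.sqrt(2.0)))
    return amp * pdf / pdf.max()

profiles = [gate(t, 10, 14, 80.0),
            gaussian(t, 16.0, 2.0, 60.0),
            skew_normal(t, 12.0, 3.0, 4.0, 95.0)]
```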
Fig. 3 PSF test cube chosen for evaluation. The text in each panel reports: time step, elevation in degrees, azimuth in degrees. Each PSF is normalized to 1 (black). The gray color scale was reversed for clarity. The apparent rotation of the PSF with time is the result of the projection of the interferometer aperture toward the source direction, which depends on the interferometer location, declination of the source, and observing time (see Fig. A.1 for an illustration). 
Fig. 4 Histogram of distribution of elevations in the training PSF set. The distribution is skewed toward the south celestial pole where circumpolar sources can be observed from MeerKAT. 
Fig. 5 PCA eigenvectors for the first ten largest eigenvalues computed over the whole PSF set. 
5.2 Training procedure
From each GT cube in the training, validation, and test sets, the corresponding dirty cube is generated based on Eq. (3). Mini-batch gradient descent with a batch size of 4 is used for training. For each example in the batch, a training PSF cube is randomly picked, and the noise level σ_{ϵ} is randomly sampled from [0, 6]. Data augmentation with random flipping and/or transposition is performed. The learning rate is set to 10^{−4}, and we train our models for 100 epochs each. The loss function is the pixel-wise mean-squared error (MSE) between the GT and estimated cubes. Vafaei Sadr et al. (2019) claimed that if the GT images are skies with point sources, then the MSE loss function is suboptimal, and instead proposed to smooth the point sources. However, our study obtained satisfactory empirical results with the pixel-wise MSE between GT and estimated cubes.
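The flipping/transposition augmentation acts on the spatial axes of the (T, height, width) cubes; one plausible implementation applies the same random transform to every cube of a sample so the pairing between dirty, GT, and PSF cubes stays consistent. A minimal NumPy sketch (names and sizes illustrative):

```python
import numpy as np

def augment(cubes, rng):
    """Randomly flip and/or transpose the spatial axes (1, 2) of every cube,
    applying the same transform to each cube in the sample."""
    if rng.random() < 0.5:
        cubes = [np.flip(c, axis=1) for c in cubes]
    if rng.random() < 0.5:
        cubes = [np.flip(c, axis=2) for c in cubes]
    if rng.random() < 0.5:
        cubes = [np.swapaxes(c, 1, 2) for c in cubes]
    return [np.ascontiguousarray(c) for c in cubes]

rng = np.random.default_rng(3)
Y = rng.normal(size=(32, 8, 8))    # toy dirty cube
X = rng.normal(size=(32, 8, 8))    # matching toy GT cube
Y_aug, X_aug = augment([Y, X], rng)
```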
5.3 Evaluation metrics
Given a constant or transient source at a certain location in a GT cube X, we first define a subcube obtained by locally cropping the cube around the source with a patch size ρ × ρ (ρ = 3 in our study, defining the region 𝒟_{s}). We extract a subcube at the same location in the estimated cube. We then compute the root MSE (RMSE) between the two subcubes. This error quantifies the fidelity of the restored temporal profile, and is normalized by the norm of the GT subcube. We denote this error NRMSE_{s} for the source s.
To measure the input signal quality, for each source s in the dirty cube {y_{t}}_{t∈I} with location (i_{s}, j_{s}), we evaluate its signal-to-noise ratio S/N_{s}. We adopt the following definition:

S/N_{s} = Σ_{(i,j)∈𝒟_{s}} y_{τ}(i, j) / σ_{b}. (5)
Here, τ refers to the temporal localization of the transient: the time step at which the amplitude of the transient source is maximal in the GT cube. For a constant source, we set τ = 15. 𝒟_{s} is the local patch of size 3 × 3 centered on the true source location (i_{s}, j_{s}). The parameter σ_{b} denotes the background noise estimated on y_{τ} after excluding 𝒟_{s}. In radio images, the peak S/N is the ratio of the peak flux density of the source (usually a single pixel) to the local noise root mean square. Here, to absorb pixel gridding bias, the source flux is integrated over the region 𝒟_{s}. This ensures that all the recovered flux density of the source is properly accounted for in the presence of noise; otherwise, gridding errors could lead to detrimental and unfair comparisons of light curve reconstructions between methods.
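The following sketch implements one plausible reading of Eq. (5): the source flux summed over the 3 × 3 patch 𝒟_{s} of frame y_{τ}, divided by the background rms σ_{b} estimated on the same frame after excluding 𝒟_{s}. The exact published formula may differ in normalization.

```python
import numpy as np

def snr_source(dirty_cube, i_s, j_s, tau, rho=3):
    """Assumed form of Eq. (5): flux summed over the rho x rho patch D_s
    of frame y_tau, divided by the background rms estimated on y_tau
    after excluding D_s."""
    r = rho // 2
    frame = dirty_cube[tau]
    mask = np.zeros(frame.shape, dtype=bool)
    mask[i_s - r:i_s + r + 1, j_s - r:j_s + r + 1] = True
    flux = frame[mask].sum()
    sigma_b = frame[~mask].std()
    return flux / sigma_b

# a bright source of amplitude 100 on unit-variance noise
rng = np.random.default_rng(3)
frame = rng.normal(0.0, 1.0, (64, 64))
frame[32, 32] += 100.0
snr = snr_source(frame[None], 32, 32, tau=0)
```

For this toy frame the returned value is close to 100, the source amplitude in units of the background rms.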
Moreover, to measure the performance of background denoising (i.e., the restoration of the empty regions of the sky), we exclude all 𝒟_{s} subcubes from a GT cube. We apply the same procedure to the corresponding estimated cube and compute the RMSE between the two. We denote this metric RMSE_{noise}.
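RMSE_{noise} can be sketched as follows, masking out every 𝒟_{s} subcube before computing the RMSE; the function name and array layout are our assumptions.

```python
import numpy as np

def rmse_background(gt_cube, est_cube, source_locs, rho=3):
    """Background-denoising metric: RMSE computed after excluding the
    rho x rho subcube around every known source location."""
    r = rho // 2
    mask = np.ones(gt_cube.shape[1:], dtype=bool)
    for (i_s, j_s) in source_locs:
        mask[i_s - r:i_s + r + 1, j_s - r:j_s + r + 1] = False
    diff = (gt_cube - est_cube)[:, mask]
    return np.sqrt(np.mean(diff ** 2))

# empty GT sky vs. a uniform residual of 0.2, with one source patch excluded
val = rmse_background(np.zeros((8, 16, 16)), np.full((8, 16, 16), 0.2), [(8, 8)])
```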
We compare the proposed algorithms for cube restoration against frame-by-frame CLEAN deconvolution of the dirty cubes. To avoid biasing the final result during the final CLEAN restoring-beam step, we considered only the detected CLEAN components associated with the detected sources. The PSF cube and the noise level are provided to CLEAN in a non-blind manner; they are used to define the threshold at which the algorithm stops iterating. Furthermore, to prevent biases against CLEAN due to the high-noise environment of the dirty cube, we stopped CLEAN after a maximum number of iterations and, as for the other reconstruction methods, only counted the flux density around the source location in 𝒟_{s}.
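For reference, frame-by-frame CLEAN in its simplest (Högbom) form can be sketched as below. This toy version is not the exact imager used in the paper, but it illustrates the stopping criteria mentioned above: a noise-based threshold and a maximum number of iterations.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, threshold=0.0, max_iter=200):
    """Minimal Hogbom CLEAN on a single frame (a sketch, not the paper's
    imager). The PSF is assumed peak-normalized and centered. Iteration
    stops at max_iter or when the residual peak falls below threshold."""
    residual = dirty.copy()
    model = np.zeros_like(dirty)
    n, m = dirty.shape
    pc = (psf.shape[0] // 2, psf.shape[1] // 2)   # PSF peak position
    for _ in range(max_iter):
        i, j = np.unravel_index(np.argmax(residual), residual.shape)
        peak = residual[i, j]
        if peak <= threshold:
            break
        model[i, j] += gain * peak
        # subtract the shifted, scaled PSF from the residual
        for di in range(n):
            for dj in range(m):
                pi, pj = di - i + pc[0], dj - j + pc[1]
                if 0 <= pi < psf.shape[0] and 0 <= pj < psf.shape[1]:
                    residual[di, dj] -= gain * peak * psf[pi, pj]
    return model, residual

# toy frame: one unit point source, delta-function PSF
dirty = np.zeros((16, 16)); dirty[8, 8] = 1.0
psf = np.zeros((16, 16)); psf[8, 8] = 1.0
model, residual = hogbom_clean(dirty, psf)
```

With a delta PSF the model converges geometrically to the true flux; with a realistic sidelobe-rich PSF and noise, the threshold and iteration cap are what keep the loop from cleaning the noise.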
Fig. 6 Reconstructions of temporal profiles related to some transient sources in the test cubes. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. The figure compares methods at different noise levels σ_{ϵ}. The names of the sources are, from left to right: source 1, source 2, source 3, source 4, and source 5. 
6 Results
6.1 Fixed test PSF cube and varying noise
We picked a PSF test cube and evaluated the methods on the test sky cubes with the following noise levels: σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}. The injected noise levels range from low-noise cases, where the constant and transient sources are readily detectable in the dirty cubes, to high-noise cases, where no sources can be seen (such a high noise level is displayed in the first row of Fig. 7). Figure 3 shows the selected PSF test cube; the PSF rotates with time.
Figure 6 compares the reconstructions of temporal profiles of several transient sources in the test cubes at different noise levels σ_{ϵ}. We observe that the DL-based methods produce better reconstructions than CLEAN. As the noise level σ_{ϵ} increases, the performance of CLEAN degrades rapidly, whereas that of the DL-based methods remains good; the plots related to source 1 illustrate this. Moreover, even in the presence of noise, the DL-based methods better restore and preserve the start and end dates of the transient signal, thanks to their temporal modeling capability. Conversely, in the case of CLEAN, the restored start and end dates of the transient are distorted by the noise in single frames. The plots related to source 5 at noise levels σ_{ϵ} = 3 and σ_{ϵ} = 4 illustrate these statements regarding the end of the transient signal. Among the DL-based methods, Deflation Net seems better at restoring transient signals in the high-noise regime, as the plots of source 3 show. This suggests that the architectural choice of Deflation Net, which decouples constant and transient source reconstructions, is appropriate for the inverse problem. Even at the highest noise level, σ_{ϵ} = 6, Deflation Net can restore the end of the transient signal. Figure 7 shows source 3 reconstructed by the different methods at noise level σ_{ϵ} = 4. The DL-based methods reconstruct the transient source more efficiently than CLEAN, and Deflation Net performs best in reconstructing the end of the transient source. These results are particularly important whenever a telescope must react quickly upon detecting a potential transient: a transient could be detected by a near-real-time imaging pipeline that integrates our trained network as soon as its light curve rises. Alerts could therefore be issued earlier and with more confidence regarding false detections.
For comparison and as a robustness check, Fig. 8 compares reconstructions of temporal profiles for constant sources. As expected, the DL-based methods systematically produce reconstructions with higher fidelity. Moreover, they produce light curves with more stable amplitudes than the CLEAN-based method. This matters because the trained network must reconstruct the light curves of constant sources correctly, without making them look like flickering transient sources. Deflation Net also reconstructs constant sources with less variation than 2D-1D Net. We understand Deflation Net to provide a better reconstruction of constant sources for two reasons. First, the noise level is reduced in the average input sky, and the average PSF is better conditioned than the individual PSF frames, so deconvolving the average sky by the average PSF is easier for constant sources. Figures 9 and B.1 illustrate this by displaying, respectively, the average input sky for various noise levels and the average PSF. Second, the skip connections of Deflation Net (line 16 of Algorithm 2) keep the reconstructions at different time steps distributed near the average deconvolved sky. Figure 10 compares the average variance of reconstructed constant sources over the test cubes at different values of σ_{ϵ}. The figure confirms that the DL-based methods reconstruct constant sources with better fidelity than CLEAN, and that Deflation Net produces the lowest variances for constant sources. On average, constant sources reconstructed by 2D-1D Net present a variance ~3.0 times smaller than those reconstructed by CLEAN; Deflation Net's variance is ~8.6 times smaller than CLEAN's.
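The first reason given above, noise reduction in the average input sky, follows the usual 1/√T scaling of independent Gaussian noise, which a quick numerical check illustrates:

```python
import numpy as np

rng = np.random.default_rng(42)
T, H, W = 32, 64, 64
noise_cube = rng.normal(0.0, 1.0, size=(T, H, W))   # sigma_eps = 1 per frame
avg_frame = noise_cube.mean(axis=0)                  # noise-only average input sky

per_frame_std = noise_cube.std()   # close to 1
avg_std = avg_frame.std()          # close to 1 / sqrt(T), i.e. ~0.177 for T = 32
```

Averaging the T = 32 input frames thus lowers the noise standard deviation by a factor of √32 ≈ 5.7, which is why the average-sky branch of Deflation Net sees a much cleaner constant-source signal.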
After aggregating the results of all inferences over all noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}, the upper subpanel of Fig. 11 shows NRMSE_{S} averaged over sources belonging to the same S/N interval delimited by deciles in S/N_{s}. We observe that the DL-based methods generally perform better than CLEAN, except when S/N_{s} is very high (S/N_{s} > 197.5). This case corresponds to σ_{ϵ} = 0, for which CLEAN is optimal. In all other cases, the DL-based methods perform better than CLEAN on average. By comparing the positions of the error bars indicating standard deviations, we observe that in many cases (S/N_{s} between 40.5 and 128.6) the DL-based methods present inhomogeneous performance over different sources, but their worst performance is statistically still better than the best performance of CLEAN. In the high-noise regime (S/N_{s} below 40.5), the DL-based methods perform better than the CLEAN-based method on average. Among the DL-based methods, Deflation Net performs better than 2D-1D Net on average in the noisiest cases (S/N_{s} between 0 and 86). On average, CLEAN presents NRMSE_{S} values that are ~2.4 times higher than those of 2D-1D Net and ~2.7 times higher than those of Deflation Net.
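The aggregation used for Fig. 11, averaging per-source errors within S/N intervals delimited by deciles, can be sketched as follows (the function name is ours):

```python
import numpy as np

def bin_by_deciles(snr_values, errors):
    """Average per-source NRMSE within S/N intervals delimited by the
    deciles of the S/N distribution (as in Fig. 11)."""
    edges = np.percentile(snr_values, np.arange(0, 101, 10))
    idx = np.clip(np.searchsorted(edges, snr_values, side="right") - 1, 0, 9)
    means = np.array([errors[idx == k].mean() for k in range(10)])
    stds = np.array([errors[idx == k].std() for k in range(10)])
    return edges, means, stds

# toy data: S/N and error both ranging over 0..99
snr = np.arange(100.0)
err = np.arange(100.0)
edges, means, stds = bin_by_deciles(snr, err)
```

Each decile bin then holds roughly a tenth of the sources, so the per-bin mean and standard deviation are comparable across the S/N range.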
Figure 7 illustrates the high performance of the DL-based methods in background denoising: they effectively restore the empty sky around the sources. This is less the case for the frame-by-frame CLEAN method, which captures noise and generates residual noisy pixel distributions around the true source. Figure 12 compares RMSE_{noise} averaged over the test cubes at different values of σ_{ϵ}. For all values of σ_{ϵ}, the DL-based methods suppress background noise better than CLEAN. Among the DL-based methods, Deflation Net restores the background sky better because the noise level is reduced and the PSF is better conditioned in the averaged input sky. On average, CLEAN presents RMSE_{noise} values that are ~1.8 times higher than those of 2D-1D Net and ~2 times higher than those of Deflation Net.
Fig. 7 Visualization of source 3 in Fig. 6 reconstructed from different methods, at noise level σ_{ϵ} = 4. 
Fig. 8 Reconstructions of temporal profiles of constant sources. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. 
Fig. 9 Constant source in the input degraded sky between time steps 0 and 6, and average of the input skies over all the time steps. First row: noise level σ_{ϵ} = 2. Second row: σ_{ϵ} = 3. The noise level is reduced in the averaged sky. 
Fig. 10 Average variance of reconstructed light curves of constant sources over the test cubes at different values of the noise level σ_{ϵ}. The variances are computed on cubes normalized between 0 and 1. The vertical bars show the standard deviation. 
6.2 Varying test PSF cube and fixed noise
In this section we set σ_{ϵ} = 3 and aggregate results over all of the PSF test cubes.
Figure 13 shows the average NRMSE_{S} for sources in the test set in ten equally spaced bins of PSF elevation. The envelope for each method delimits the first and last ten percentiles. For all bins, our DL-based methods outperform CLEAN on average by large margins. This is especially true for Deflation Net, whose last ten percentiles lie near the first ten percentiles of CLEAN in most cases (elevations > 41 degrees). Among the DL-based methods, here again Deflation Net performs better on average than 2D-1D Net in most cases (elevations between 27 and 82 degrees). This confirms that the Deflation Net architecture efficiently tackles the inverse problem. On average, CLEAN presents NRMSE_{S} values that are ~2.6 times higher than those of 2D-1D Net and ~3.2 times higher than those of Deflation Net. Furthermore, in most cases (elevations > 34 degrees), the envelope of Deflation Net is smaller than those of 2D-1D Net and CLEAN; Deflation Net therefore yields less error dispersion in source light curve reconstruction than the other methods. This can be explained by its skip connections, which keep the reconstructions at different time steps near the average deconvolved sky. Our methods perform worst at elevations between 20 and 27 degrees and best at elevations between 82 and 89 degrees; between these bounds, their performance increases with elevation. This is consistent with expectation: the closer the elevation is to 90 degrees, the better conditioned the PSF and the easier the deconvolution. Conversely, at lower elevations the PSF is worse conditioned and undergoes a strong projection effect, making the deconvolution task harder.
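The binning behind Fig. 13, a mean error per equally spaced elevation bin with a 10th-90th percentile envelope, can be sketched as follows (the function name is ours):

```python
import numpy as np

def bin_by_elevation(elevations, errors, n_bins=10):
    """Mean NRMSE per equally spaced elevation bin, with the 10th and 90th
    percentiles of the per-source errors as the envelope (as in Fig. 13)."""
    edges = np.linspace(elevations.min(), elevations.max(), n_bins + 1)
    idx = np.clip(np.digitize(elevations, edges) - 1, 0, n_bins - 1)
    means = np.array([errors[idx == k].mean() for k in range(n_bins)])
    lo = np.array([np.percentile(errors[idx == k], 10) for k in range(n_bins)])
    hi = np.array([np.percentile(errors[idx == k], 90) for k in range(n_bins)])
    return edges, means, lo, hi

# toy data: elevations over the 20-89 degree range seen in the test set
elev = np.linspace(20.0, 89.0, 100)
err = np.ones(100)
edges, means, lo, hi = bin_by_elevation(elev, err)
```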
Next, after aggregating the results of all inferences with all the test PSF cubes over the test sky cubes, the lower subpanel of Fig. 11 shows NRMSE_{S} averaged over sources belonging to the same S/N interval delimited by deciles in S/N_{s}. The DL-based methods perform better on average than CLEAN for all intervals of S/N_{s}. Observing the black whiskers indicating standard deviations around the mean values, we note that in most cases the DL-based methods present inhomogeneous performance, but their worst performance is statistically still better than the best performance of CLEAN. Among the DL-based methods, Deflation Net performs slightly better than 2D-1D Net when S/N_{s} is above 44 and clearly outperforms it when S/N_{s} is below 44. This again confirms that the Deflation Net architecture is appropriate for the inverse problem. On average, 2D-1D Net presents NRMSE_{S} values 3.10 times smaller than those of CLEAN, and Deflation Net presents NRMSE_{S} values 3.50 times smaller than those of CLEAN.
Fig. 11 Average source reconstruction performance of 2D-1D Net, Deflation Net, and CLEAN over the test sky cubes with (upper subpanel) a fixed test PSF cube and noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, the sources in the test sky cubes have different input S/N_{s}; these values can be divided into ten deciles. The figure shows NRMSE_{S} averaged over the sources in each decile. The black whiskers indicate the standard deviations. 
Fig. 12 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation appears larger when RMSE_{noise} is small, as is the case for the DL-based methods. 
Fig. 13 Mean NRMSE_{S} for sources in the test set for ten equally spaced bins of PSF pointing elevations. The envelope for each method delimits the first and the last ten percentiles. 
Fig. 14 Reconstructions of temporal profiles of source 3 and source 5 (also shown in Fig. 6) at different noise levels σ_{ϵ}, from the normalized test cubes. 
6.3 Importance of temporal modeling
In this subsection we evaluate the benefit brought by 1D Net within 2D-1D Net, that is, the extent to which capturing the temporal structure of the signal improves reconstruction performance. To do so, we compare the performance of CLEAN, 2D-1D Net, and 2D Net. The last is a variant of 2D-1D Net from which the 1D Net part is excluded, with the last 2D convolutional layer of 2D Net outputting a one-channel image at each time step. This reduces our method to a classic imager that forms a single image from time-integrated data. Figure 14 compares the reconstructions of temporal profiles of some transient sources in the test cubes with the fixed PSF used in Sect. 6.1 and illustrated in Fig. 3. As expected, with increasing noise levels, 2D Net rapidly loses the ability to reconstruct the ends of the light curve, which sink to the noise level; this drawback is shared with CLEAN. On the contrary, 2D-1D Net succeeds in restoring them properly. Figure 15 shows source 5 reconstructed by the different methods at noise level σ_{ϵ} = 3. 2D-1D Net reconstructs the transient profile more efficiently, down to its disappearance, whereas 2D Net fails to reconstruct it, similarly to CLEAN. These figures illustrate that, despite the noise, 1D Net enables the beginning and end of a transient event to be correctly reconstructed through temporal modeling.
The upper subpanel of Fig. 16 aggregates inference results on the test cubes with the fixed PSF cube illustrated in Fig. 3 and the noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}. The lower subpanel of Fig. 16 aggregates inference results on the test cubes with a fixed noise level σ_{ϵ} = 3 and the 45 PSF test cubes used in Sect. 6.2. Both compare NRMSE_{S} averaged over sources of the same interval delimited by deciles in S/N_{s}. On average, 2D Net performs slightly better than CLEAN in terms of NRMSE_{S}. 2D-1D Net significantly outperforms both methods, presenting on average ~2.3 times smaller NRMSE_{S} than 2D Net and CLEAN. This exemplifies the benefit brought by the temporal modeling capability of 1D Net.
Even if 2D Net performs similarly to CLEAN in terms of NRMSE_{S}, it performs better in background denoising. This is shown in Fig. 17, which compares RMSE_{noise} averaged over the test cubes at different values of σ_{ϵ} with the PSF illustrated in Fig. 3. For all values of σ_{ϵ}, 2D Net suppresses background noise better than CLEAN by large margins. This shows that even though 2D Net does not extract temporal features, its convolutional layers enable highly efficient denoising. 2D-1D Net performs even better than 2D Net at this background denoising task, showing that the temporal modeling capability of 1D Net also contributes to the background denoising performance. On average, 2D Net presents RMSE_{noise} values ~1.5 times smaller than those of CLEAN, and 2D-1D Net presents RMSE_{noise} values ~1.2 times smaller than those of 2D Net. Figure 15 illustrates these observations. Especially at the end of the transient event, frame-by-frame CLEAN captures noise and generates residual noisy pixels around the true source. 2D Net generates fewer noisy pixels (only a single noisy pixel at time step t = 10), and 2D-1D Net does not generate any obvious noisy pixel.
Fig. 15 Visualization of source 5 in Fig. 14 reconstructed from different methods, at noise level σ_{ϵ} = 2. 
7 Discussion and limitations
In this section we discuss several relevant points regarding our methods. The first concerns the support size of the frames. In our inverse problem, we approximate aperture synthesis as a linear degradation, embodied by a multiplication of the true data by a mask in the Fourier domain. Moreover, we carried out our study in the gridded space, where all quantities are discretized with a constant frame support size. The mask therefore has the same support size as the sky, and the support sizes of the sky, mask, and PSF cubes are linked. Consequently, if one wants to change the support size, the networks must be retrained. This is contrary to some other DL-based image or video restoration problems, where the linear degradation is defined by a convolution rather than by a mask; in that case, a trained neural network can deconvolve an image of any size. Regarding our networks, since only the 2D Net module performs deconvolution, only this module needs retraining, and 1D Net can remain frozen. Given a dataset of ungridded visibilities, it is up to the scientist to decide which support size fits the scientific objective of the data; the networks should be trained on a support size that corresponds to the final products used (e.g., the image catalog of a survey or a transient detection pipeline).
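The degradation model invoked above, multiplication of the gridded true data by a mask in the Fourier domain, is equivalent to a circular convolution with the PSF, which is the inverse Fourier transform of the mask. A small numerical check, assuming a toy Hermitian-symmetric uv mask so that the PSF is real:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
sky = rng.random((N, N))

# toy uv-sampling mask, symmetrized so that mask[k] == mask[-k mod N]
# (conjugate-symmetric uv coverage makes the PSF real)
m = (rng.random((N, N)) < 0.3).astype(float)
mask = np.maximum(m, np.roll(m[::-1, ::-1], (1, 1), axis=(0, 1)))

# degradation in the gridded Fourier domain ...
dirty_fourier = np.real(np.fft.ifft2(mask * np.fft.fft2(sky)))

# ... equals circular convolution with the PSF, the inverse FFT of the mask
psf = np.real(np.fft.ifft2(mask))
dirty_conv = np.real(np.fft.ifft2(np.fft.fft2(sky) * np.fft.fft2(psf)))
```

Because the mask and the sky share the same grid, the two forms agree to numerical precision; this is exactly why the frame support sizes of the sky, mask, and PSF cubes are tied together.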
Secondly, the total duration of transient events is worth considering. The total observation period and the number of temporal frames (which define the integration time of each PSF slice in the PSF cube) determine the upper limit on the total duration, and the shortest timescale, of the transient events we can deal with. As an illustration, for a fixed number of PSF frames over the total observation period, lengthening the latter increases the integration time of each PSF frame. Thus, if the total observation period is longer, with a fixed number of PSFs, each PSF appears smoother and presents fewer secondary lobes. Therefore, each time the total observation period changes, our networks should in theory be retrained with new PSF cubes. If we suppose that the profiles of transient events in the training sky cubes are diverse enough to be representative of the real transients that can be encountered, we can keep the same training sky set.
Conversely, if the length of the observation increases for a fixed PSF integration time, the third dimension of the cube increases, but each PSF retains the same characteristics as before. In that case, we should only retrain 1D Net, and 2D Net can remain frozen since its role is to deconvolve PSFs as before.
Computational complexity is another discussion point. Once trained, our networks reconstruct faster than frame-by-frame CLEAN because, contrary to CLEAN, they are not iterative algorithms and can benefit from efficient GPU-based architectures. This makes them attractive to the radio astronomy community. On our Intel i9-10940X CPU and NVIDIA TITAN RTX GPU, 2D-1D Net reconstructs a sky cube of 32 frames in ~65 ms; Deflation Net takes ~75 ms. Both contain 0.1 M parameters.
Regarding theoretical guarantees, one drawback of our DL-based methods is that they lack a mathematical framework from which theoretical error bounds or uncertainties can be derived, beyond empirical statistics on the results.
It is also possible to extend our proposed methods to the polychromatic case, where the inverse problem also has a frequency dependency. In this case, the dirty, GT, and PSF cubes are four-dimensional data structures, with the added frequency dimension. A simple way to realize this extension is (i) to stack several skies at different frequencies in the channel dimension at the input of the 2D Net module and (ii) to stack several PSFs at different frequencies in the channel dimension at the input of the SFT layer. The computation of the average sky and PSF and the subtraction step in Deflation Net should then be done for each frequency.
Our neural networks operate on normalized pixel values: the pixel values of the ground-truth images were divided by 100 so that they lie between 0 and 1. In the same spirit, when facing the different pixel intensity ranges of other types of datasets (e.g., real observations), a min-max normalization brings our methods into conditions similar to those tackled in this paper. The neural networks can be retrained if necessary.
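The min-max normalization mentioned here is the standard one; a trivial sketch:

```python
import numpy as np

def minmax_normalize(cube):
    """Bring the pixel values of an arbitrary cube into [0, 1], mimicking
    the division by 100 applied to the simulated ground-truth cubes."""
    lo, hi = cube.min(), cube.max()
    return (cube - lo) / (hi - lo)

# toy frame with values outside [0, 1]
c = np.array([[-5.0, 5.0], [0.0, 15.0]])
out = minmax_normalize(c)
```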
Finally, our proposed methods can easily be plugged into modern imagers, such as those based on wide-field imaging using projection (e.g., WSCLEAN; Offringa et al. 2014) or those based on faceting (e.g., DDFacet; Tasse et al. 2018). These imagers address direction-dependent effect corrections (i.e., the correction of wide-field projection effects or non-coplanar arrays), and the counterpart of DDFacet (killMS) takes care of the calibration regarding these effects. While this is beyond the scope of our paper, building a calibration tool based on CNNs is also an interesting topic to consider. It would require a full recasting of the problem in the framework of the Radio Interferometer Measurement Equation (RIME; Smirnov 2011), including the relevant Jones matrices and using ungridded visibilities directly. We leave these considerations for future work.
Fig. 16 Average source reconstruction performance of 2D-1D Net, 2D Net, and CLEAN, demonstrating the importance of temporal modeling. The methods are evaluated on the test sky cubes with (upper subpanel) a fixed test PSF cube and noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, the sources in the test sky cubes have different input S/N_{s}; these values can be divided into ten deciles. The figure shows NRMSE_{S} averaged over the sources in each decile. The black whiskers indicate the standard deviations. 
Fig. 17 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation appears larger when RMSE_{noise} is small, as is the case for the DL-based methods. 
8 Conclusion
In this study we dealt with transient source reconstruction in the context of radio interferometric imaging. We formulated the problem as a deconvolution of an image time series and proposed two neural networks to address it: 2D-1D Net and Deflation Net. Thanks to the SFT layer, they can handle multiple PSFs that vary depending on the observed sky positions. Both involve the same submodules, 2D Net and 1D Net: the former deconvolves individual frames and the latter enables temporal sky modeling. 2D-1D Net is based on a simple feedforward inference, whereas Deflation Net involves different computational flows: it restores the average sky and uses it to isolate transient sources in individual frames. Experiments based on simulated data, with metrics measuring temporal profile reconstruction and background denoising, demonstrated the superior performance of these DL-based methods over CLEAN in the presence of noise. Deflation Net performs best, excelling in reconstructing constant sources and in background denoising. The ablation study confirms that the temporal modeling enabled by 1D Net significantly increases the sky cube reconstruction performance.
Learning methods adapted to deconvolution, to the temporal structure of transient sources, and to the specificities of the instrumental response are key to efficiently analyzing the images obtained with radio interferometers in the SKA era. For instance, the raw sensitivity of MeerKAT enables deep imaging (typically σ ~ 10 µJy in 15 min) as well as high-cadence imaging over a large field of view. Being able to deconvolve the data efficiently in high-noise regimes will maximize the chance of discovering new transients and overcome the limitations imposed by current deconvolution methods. Transients can be missed in the image plane because of a lack of sensitivity (a detection problem) or a lack of sufficient temporal sampling (a dilution problem) that averages out short-timescale transients. The robust 2D and 1D image reconstruction brought by the trained networks introduced in this study significantly improves the 3D estimation of the sky. Replacing classical frame-by-frame deconvolution methods with DL-based reconstruction methods can lead to better use of telescope time: it can drastically reduce the required exposure time while enabling faster temporal sampling of the sky.
Acknowledgements
This work is supported by the European Community through the grant ARGOS (contract no. 101094354) and by Safran Electronics & Defense.
Appendix A Coordinates of an astrophysical source
Fig. A.1 Astrophysical source associated with a unique pair of coordinates (declination δ and right ascension α) on the celestial sphere. Figure adapted from Astronomy (2009). 
Appendix B Average PSF over the total observation duration
References
Abrantes, F., Lopes, C., Rodrigues, T., et al. 2009, Geochem. Geophys. Geosyst., 10, Q09U07
Adler, J., & Öktem, O. 2018, IEEE Trans. Med. Imaging, 37, 1322
Arjomand Bigdeli, S., Zwicker, M., Favaro, P., & Jin, M. 2017, in Advances in Neural Information Processing Systems, eds. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (USA: Curran Associates, Inc.), 30
Astronomy, U. 2009, Celestial-Equatorial (RA/Dec) Demonstrator (USA: NASA)
Bergeaud, F., & Mallat, S. 1995, Proc. Int. Conf. Image Process., 1, 53
Booth, R., & Jonas, J. 2012, African Skies, 16, 101
Carin, L. 2009, IEEE Antennas Propag. Magazine, 51, 72
Carrillo, R. E., McEwen, J. D., & Wiaux, Y. 2012, MNRAS, 426, 1223
Chan, S. H., Wang, X., & Elgendy, O. A. 2017, IEEE Trans. Comput. Imaging, 3, 84
Chiche, B. N., Frontera-Pons, J., Woiselle, A., & Starck, J.-L. 2020, in 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), IEEE, 1
Clark, B. 1980, A&A, 89, 377
Dabbech, A., Ferrari, C., Mary, D., et al. 2015, A&A, 576, A7
Dewdney, P. E., Hall, P. J., Schilizzi, R. T., & Lazio, T. J. L. W. 2009, IEEE Proc., 97, 1482
Diamond, S., Sitzmann, V., Heide, F., & Wetzstein, G. 2017, ArXiv e-prints [arXiv:1705.08041]
Dong, C., Loy, C. C., He, K., & Tang, X. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 295
Frieden, B. R. 1972, J. Opt. Soc. Am., 62, 511
Garsden, H., Girard, J., Starck, J.-L., et al. 2015, A&A, 575, A90
Girard, J. N., Garsden, H., Starck, J.-L., et al. 2015, J. Instrum., 10, C08013
Gu, J., Lu, H., Zuo, W., & Dong, C. 2019, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1604
Gupta, H., Jin, K. H., Nguyen, H. Q., McCann, M. T., & Unser, M. 2018, IEEE Trans. Medical Imaging, 37, 1440
Högbom, J. 1974, A&AS, 15, 417
Jiang, M., Girard, J. N., Starck, J.-L., Corbel, S., & Tasse, C. 2015, Eur. Signal Process. Conf., 23, 1646
Kim, J., Kwon Lee, J., & Mu Lee, K. 2016, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1646
Lorimer, D. R., Bailes, M., McLaughlin, M. A., Narkevic, D. J., & Crawford, F. 2007, Science, 318, 777
Meinhardt, T., Moller, M., Hazirbas, C., & Cremers, D. 2017, in Proceedings of the IEEE International Conference on Computer Vision, 1781
Monga, V., Li, Y., & Eldar, Y. C. 2021, IEEE Signal Process. Magazine, 38, 18
Nah, S., Timofte, R., Gu, S., et al. 2019, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1985
Nammour, F., Akhaury, U., Girard, J. N., et al. 2022, A&A, 663, A69
Narayan, R., & Nityananda, R. 1986, ARA&A, 24, 127
Offringa, A., McKinley, B., Hurley-Walker, N., et al. 2014, MNRAS, 444, 606
Pustelnik, N., Benazza-Benhayia, A., Zheng, Y., & Pesquet, J.-C. 2016, Wiley Encyclopedia of Electrical and Electronics Engineering (Hoboken: Wiley)
Rau, U., & Cornwell, T. J. 2011, A&A, 532, A71
Reehorst, E. T., & Schniter, P. 2018, IEEE Trans. Comput. Imaging, 5, 52
Romano, Y., Elad, M., & Milanfar, P. 2017, SIAM J. Imaging Sci., 10, 1804
Scaife, A. 2020, Phil. Trans. R. Soc. A, 378, 20190060
Schmidt, K., Geyer, F., Fröse, S., et al. 2022, A&A, 664, A134
Schwab, F. 1984, AJ, 89, 1076
Smirnov, O. M. 2011, A&A, 527, A106
Sreehari, S., Venkatakrishnan, S. V., Wohlberg, B., et al. 2016, IEEE Trans. Comput. Imaging, 2, 408
Sureau, F., Lechat, A., & Starck, J.-L. 2020, A&A, 641, A67
Swinbank, J. D., Staley, T. D., Molenaar, G. J., et al. 2015, Astron. Comput., 11, 25
Tasse, C., Hugo, B., Mirmont, M., et al. 2018, A&A, 611, A87
Terris, M., Dabbech, A., Tang, C., & Wiaux, Y. 2023, MNRAS, 518, 604
Tuccillo, D., Huertas-Company, M., Decencière, E., et al. 2018, MNRAS, 475, 894
Vafaei Sadr, A., Vos, E. E., Bassett, B. A., et al. 2019, MNRAS, 484, 2793
van Haarlem, M. P., Wise, M. W., Gunst, A., et al. 2013, A&A, 556, A2
Venkatakrishnan, S. V., Bouman, C. A., & Wohlberg, B. 2013, in 2013 IEEE Global Conference on Signal and Information Processing, IEEE, 945
Wang, X., Yu, K., Dong, C., & Change Loy, C. 2018, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 606
Wernecke, S. J. 1977, Rad. Sci., 12, 831
Wiaux, Y., Jacques, L., Puy, G., Scaife, A. M., & Vandergheynst, P. 2009, MNRAS, 395, 1733
Wilson, T. L., Rohlfs, K., & Hüttemeister, S. 2009, Tools of Radio Astronomy, 5th edn. (Berlin: Springer)
Xu, L., Ren, J. S., Liu, C., & Jia, J. 2014, in Advances in Neural Information Processing Systems, eds. Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (USA: Curran Associates, Inc.), 27
Zhang, K., Zuo, W., Chen, Y., Meng, D., & Zhang, L. 2017a, IEEE Trans. Image Process., 26, 3142
Zhang, K., Zuo, W., Gu, S., & Zhang, L. 2017b, in IEEE Conference on Computer Vision and Pattern Recognition, 3929
Zhang, K., Zuo, W., & Zhang, L. 2018, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3262
Zhang, K., Gool, L. V., & Timofte, R. 2020, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3217
All Tables
Table 1 2D-1D Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
Table 2 Deflation Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
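The tables do not define C_{r} beyond the captions. As a minimal sketch of such a center-crop operation (the function name, array layout, and usage values are our assumptions, not the paper's), C_{r} keeps the central r × r window of the spatial axes:

```python
import numpy as np

def center_crop(x, r):
    """C_r: center-crop the last two (spatial) axes of x to r x r."""
    h, w = x.shape[-2:]
    top, left = (h - r) // 2, (w - r) // 2
    return x[..., top:top + r, left:left + r]

# Cropping a stack of 32 feature maps of size 10 x 10 down to 6 x 6.
patch = center_crop(np.zeros((32, 10, 10)), 6)  # -> shape (32, 6, 6)
```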
All Figures
Fig. 1 2D Net. Each convolutional layer outputs f feature maps, f = 32 in our study. The kernel size of each convolutional layer is set to 3 × 3.

Fig. 2 1D Net. {z_{t}}_{t∈I} has dimensions f × T × height × width. Each convolutional layer outputs f feature maps, except the last, which outputs images with one channel each. Each 1D convolutional layer has kernel size 5.

Fig. 3 PSF test cube chosen for evaluation. The text in each panel reports: time step, elevation in degrees, azimuth in degrees. Each PSF is normalized to 1 (black). The gray color scale was reversed for clarity. The apparent rotation of the PSF with time is the result of the projection of the interferometer aperture toward the source direction, which depends on the interferometer location, declination of the source, and observing time (see Fig. A.1 for an illustration). 

Fig. 4 Histogram of the distribution of elevations in the training PSF set. The distribution is skewed toward the south celestial pole, where circumpolar sources can be observed from MeerKAT.

Fig. 5 PCA eigenvectors associated with the ten largest eigenvalues, computed over the whole PSF set.

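The caption does not spell out how this PCA is performed. As a minimal sketch under our own assumptions (flattening each PSF image into a vector and taking the covariance eigenvectors from an SVD of the centered stack; function and variable names are ours, not the paper's):

```python
import numpy as np

def psf_pca_components(psfs, n_components=10):
    """Eigenvectors of the n largest eigenvalues of the PSF-set covariance.

    psfs : array of shape (n_psf, H, W)
    Returns components of shape (n_components, H, W) and their eigenvalues.
    """
    n, h, w = psfs.shape
    x = psfs.reshape(n, h * w)
    x = x - x.mean(axis=0)  # center each pixel across the PSF set
    # SVD of the centered data matrix yields the covariance eigenvectors
    # (rows of vt) without forming the (H*W) x (H*W) covariance explicitly.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    eigvals = s**2 / (n - 1)  # singular values come out sorted descending
    return vt[:n_components].reshape(n_components, h, w), eigvals[:n_components]

# Toy usage on a random "PSF" stack of 50 images of 16 x 16 pixels.
rng = np.random.default_rng(0)
comps, vals = psf_pca_components(rng.normal(size=(50, 16, 16)), n_components=10)
```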
Fig. 6 Reconstructions of temporal profiles related to some transient sources in the test cubes. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. The figure compares methods at different noise levels σ_{ϵ}. The names of the sources are, from left to right: source 1, source 2, source 3, source 4, and source 5. 

Fig. 7 Visualization of source 3 in Fig. 6 reconstructed from different methods, at noise level σ_{ϵ} = 4. 

Fig. 8 Reconstructions of temporal profiles of constant sources. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. 

Fig. 9 Constant source in the input degraded sky between time steps 0 and 6, and average of the input skies over all the time steps. First row: noise level σ_{ϵ} = 2. Second row: σ_{ϵ} = 3. The noise level is reduced in the averaged sky. 

Fig. 10 Average variance of reconstructed light curves of constant sources over the test cubes at different values of the noise level σ_{ϵ}. The variances are computed on cubes normalized between 0 and 1. The vertical bars show the standard deviation. 

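A hedged sketch of the metric behind this figure (the exact normalization and averaging are our reading of the caption, not code from the paper): a perfectly reconstructed constant source has a flat light curve, so its temporal variance should vanish.

```python
import numpy as np

def lightcurve_variance(cube, source_pixels):
    """Mean temporal variance of light curves at given pixel positions,
    computed on a cube normalized between 0 and 1 (as in the figure)."""
    lo, hi = cube.min(), cube.max()
    norm = (cube - lo) / (hi - lo)
    curves = np.stack([norm[:, y, x] for y, x in source_pixels])  # (S, T)
    return curves.var(axis=1).mean()  # lower = flatter reconstructed curves

# A cube that is constant in time yields zero variance.
frame = np.arange(64.0).reshape(8, 8)
cube = np.repeat(frame[None], 7, axis=0)         # shape (7, 8, 8)
v = lightcurve_variance(cube, [(2, 3), (5, 5)])  # -> 0.0
```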
Fig. 11 Average source reconstruction performance of 2D-1D Net, Deflation Net, and CLEAN over the test sky cubes with (upper subpanel) a fixed test PSF cube and the noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, sources in the test sky cubes have different input S/N_{s}. These S/N_{s} values can be divided into ten deciles. This figure visualizes NRMSE_{S} averaged over sources in each decile. The black whiskers indicate the standard deviations.

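The decile averaging described in the caption can be sketched as follows (a toy implementation under our assumptions; the per-source NRMSE_{S} and input S/N values are taken as precomputed arrays, and the function name is ours):

```python
import numpy as np

def nrmse_by_snr_decile(errors, snr):
    """Average a per-source error metric within ten input-S/N deciles.

    errors, snr : 1D arrays with one entry per source.
    Returns (decile edges, per-decile mean, per-decile std).
    """
    edges = np.quantile(snr, np.linspace(0.0, 1.0, 11))
    idx = np.clip(np.digitize(snr, edges[1:-1]), 0, 9)  # decile index per source
    means = np.array([errors[idx == d].mean() for d in range(10)])
    stds = np.array([errors[idx == d].std() for d in range(10)])
    return edges, means, stds

# Toy usage with constant errors: every decile averages to 1.
snr = np.arange(100.0)
edges, means, stds = nrmse_by_snr_decile(np.ones(100), snr)
```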
Fig. 12 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation is higher when RMSE_{noise} is small, which is the case for the DL-based methods.

Fig. 13 Mean NRMSE_{S} for sources in the test set in ten equally spaced bins of PSF pointing elevations. The envelope for each method delimits the 10th and 90th percentiles.

Fig. 14 Reconstructions of temporal profiles of source 3 and source 5 (also shown in Fig. 6) at different noise levels σ_{ϵ}, from the normalized test cubes. 

Fig. 15 Visualization of source 5 in Fig. 14 reconstructed from different methods, at noise level σ_{ϵ} = 2. 

Fig. 16 Average source reconstruction performance of 2D-1D Net, 2D Net, and CLEAN, demonstrating the importance of temporal modeling. These methods are evaluated on the test sky cubes with (upper subpanel) a fixed test PSF cube and the noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, sources in the test sky cubes have different input S/N_{s}. These S/N_{s} values can be divided into ten deciles. This figure visualizes NRMSE_{S} averaged over sources in each decile. The black whiskers indicate standard deviations.

Fig. 17 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation is higher when RMSE_{noise} is small, which is the case for the DL-based methods.

Fig. A.1 Astrophysical source associated with a unique pair of coordinates (declination δ and right ascension α) on the celestial sphere. Figure adapted from Astronomy (2009). 

Fig. B.1 Average PSF over the PSF cube shown in Fig. 3.
