Issue 
A&A
Volume 675, July 2023



Article Number  A116  
Number of page(s)  14  
Section  Astronomical instrumentation  
DOI  https://doi.org/10.1051/0004-6361/202245013  
Published online  07 July 2023 
Deep learning-based deconvolution for interferometric radio transient reconstruction^{★}
^{1}
Université Paris-Saclay, Université Paris Cité, CEA, CNRS, AIM,
Orme des Merisiers,
91191
Gif-sur-Yvette, France
e-mail: benjamin.chiche@cea.fr
^{2}
Safran Electronics & Defense,
100 avenue de Paris,
91344
Massy, France
^{3}
LESIA, Université Paris Cité, Observatoire de Paris, Université PSL, CNRS, Sorbonne Université,
5 place Jules Janssen,
92195
Meudon, France
Received:
19
September
2022
Accepted:
10
April
2023
Context. Radio astronomy is currently thriving, with new large ground-based radio telescopes coming online in preparation for the upcoming Square Kilometre Array (SKA). Facilities like LOFAR, MeerKAT/SKA, ASKAP/SKA, and the future SKA-LOW bring tremendous sensitivity in time and frequency, improved angular resolution, and also high-rate data streams that need to be processed. They enable advanced studies of radio transients, volatile by nature, that can be detected or missed in the data. These transients are markers of high-energy accelerations of electrons and manifest over a wide range of temporal scales (e.g., from milliseconds for pulsars or fast radio bursts to several hours or days for accreting systems). While such sources are usually studied with dynamic spectroscopy or time series analysis, there is a motivation to search for them directly in large interferometric datasets. This requires efficient and robust signal reconstruction algorithms.
Aims. To correctly account for the temporal dependency of the data, we extend the classical image deconvolution inverse problem by adding the temporal dependency to the reconstruction, and we propose a solution based on deep learning.
Methods. We introduce two novel neural network architectures that can do both spatial and temporal modeling of the data and the instrumental response. Then, we simulate representative time-dependent image cubes of point source distributions and realistic telescope pointings of MeerKAT to generate toy models to build the training, validation, and test datasets. Finally, based on the test data, we evaluate the source profile reconstruction performance of the proposed methods and of the classical image deconvolution algorithm CLEAN applied frame by frame.
Results. In the presence of increasing noise levels in the data frames, the proposed methods display a high level of robustness compared to frame-by-frame imaging with CLEAN. The deconvolved image cubes bring a factor of 3 improvement in the fidelity of the recovered temporal profiles and a factor of 2 improvement in background denoising.
Conclusions. The proposed neural networks are not iterative and can benefit from efficient GPU-based architectures. Consequently, they could unlock the development of real-time data processing at the initial cost of learning the behavior of the telescope. Radio observatories are performing sky surveys to produce deep images in ever larger fields of view, increasing the transient source access window both spatially and temporally. Our method can effectively capture the temporal structures that are present in such survey data.
Key words: techniques: image processing / instrumentation: interferometers / methods: data analysis
Visualizations of our results are available on the following page: https://github.com/bjmch/DLRadioTransient
© The Authors 2023
Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article is published in open access under the Subscribe to Open model. Subscribe to A&A to support open access publication.
1 Introduction
Next-generation radio facilities like LOFAR (van Haarlem et al. 2013), MeerKAT/SKA (Booth & Jonas 2012), ASKAP/SKA (Dewdney et al. 2009), and SKA-LOW (Scaife 2020) allow for high spectral resolution, high data rates, improved angular resolution, and high instantaneous sensitivity. This is a notable improvement for studying transient radio sources via aperture synthesis using radio interferometers. These sources appear and disappear over time and can be randomly distributed in the sky. They are associated with high-energy physical phenomena (e.g., pulsars, rotating radio transients (RRATs), Solar System magnetized objects, and Lorimer-type bursts; Lorimer et al. 2007) and, more generally, fast radio bursts (FRBs). Searching for such sources in the large datasets produced by these instruments is a new challenge that requires competitive and efficient signal reconstruction algorithms.
Radio interferometers enable imaging via aperture synthesis, based on processing the correlations between each pair of antenna signals. In the first approximation, a radio interferometer samples noisy Fourier components of the sky associated with its spatial frequencies (i.e., the visibilities; Wilson et al. 2009) inside the main field of view of the instrument. Under the small-field approximation, the sky can be approximated by computing the inverse Fourier transform of those Fourier samples. The number of baselines is limited, so the Fourier map is incomplete. Therefore, it is necessary to solve an “inpainting” inverse problem (Garsden et al. 2015), that is, to estimate the missing information in the Fourier plane. Another option is to switch from the inpainting problem in the visibility space to its equivalent deconvolution problem in the image space. Deconvolution of radio images of a static sky in the context of radio interferometry has been the subject of studies for several decades. Notably, Högbom (1974) designed the original CLEAN algorithm, which is still the most widespread basis for newer deconvolution algorithms in the community. Several variants of this algorithm have been subsequently proposed (Schwab 1984; Clark 1980). Improvements taking into account source morphology (Abrantes et al. 2009), spectral dependencies (Rau & Cornwell 2011), and sparse representations (Garsden et al. 2015; Nammour et al. 2022; Girard et al. 2015; Dabbech et al. 2015; Carin 2009; Carrillo et al. 2012; Wiaux et al. 2009) have also been investigated in recent years. When the sky contains transient sources, classical detection methods rely on frame-by-frame image analysis (e.g., with the LOFAR Transient Pipeline; Swinbank et al. 2015).
However, frame-by-frame transient detection is subject to two observing biases: (i) a detection issue when the frames are derived from too-short time integrations displaying a high noise level and, conversely, (ii) a “dilution” problem when the time integration is too long to resolve the transient in time, resulting in a time smearing of the transient. Therefore, to account for these biases and for sources that possess coherent structure in time, methods should be designed that directly account for the time-coherent structure (i.e., the source light curve) hidden in the signal. This also occurs in large radio surveys with interferometers. Mapping the visible radio sky requires an optimal use of the observing time and pointing location to reach a target angular resolution and sensitivity. While surveys mainly address the distribution of static sources (astrometry and flux density), transient radio sources might also occur during the short exposure (e.g., the ~15 min pointing of MeerKAT) and be missed due to observing biases (detection and dilution). Therefore, finding a robust deconvolution method is key to optimizing both telescope time and transient detectability.
Recently, deep learning (DL) has contributed to achieving far better performance in solving inverse problems, such as image and video super-resolution, deblurring, and denoising (Zhang et al. 2017a, 2018; Xu et al. 2014; Nah et al. 2019) and computed tomography (CT) and magnetic resonance imaging (MRI) reconstruction (Adler & Öktem 2018). In astrophysics, DL has recently been used for tasks such as parametric deconvolution (Tuccillo et al. 2018), optical image deconvolution (Sureau et al. 2020), and radio image deconvolution (Nammour et al. 2022; Terris et al. 2023), as well as the equivalent inpainting in the Fourier domain (Schmidt et al. 2022). The success of neural networks in these tasks relies on their excellent capability to learn the image prior from data. Based on these premises, in this study we design novel DL architectures that improve radio image time-series deconvolution in the context of radio transient source reconstruction. The structure of the article is as follows. In Sect. 2 we review studies related to ours. The inverse problem formulation is then presented in Sect. 3. We introduce new neural network architectures to solve the problem in Sect. 4. The performance of the proposed networks is then evaluated through simulations: Sect. 5 details the experimental setups and Sect. 6 presents the results. Section 7 discusses hypotheses and limitations. Section 8 concludes our work.
2 Related work
2.1 Deconvolution in radio interferometry
Two approaches have been classically used in radio interferometry deconvolution: the variational maximum entropy method (Frieden 1972; Narayan & Nityananda 1986; Wernecke 1977) and the iterative CLEAN algorithm (Högbom 1974). Both methods generally perform well when dealing solely with point sources, but CLEAN is the most widespread technique in the radio astronomy community. This algorithm supposes a finite number of point sources. It restores them based on matching pursuit (Bergeaud & Mallat 1995) using a single basis vector and the impulse response (hereafter point spread function, PSF) of the telescope that made the observation. Clark (1980) proposed a variant of CLEAN by optimizing the algorithm with fast Fourier transform (FFT) and structuring the algorithm computations between major and minor cycles. Minor cycles are carried out in (gridded) image space, whereas major cycles fall in the ungridded visibility space. Going back and forth between these two spaces led to improvements in both fidelity and accuracy. This strategy was further developed in Schwab (1984). Abrantes et al. (2009) and Rau & Cornwell (2011) brought in further improvements by respectively taking into account the morphological and spectral behavior of the sources. More recently, several teams have addressed the deconvolution problem within the compressed sensing framework (Garsden et al. 2015; Girard et al. 2015; Dabbech et al. 2015; Carin 2009; Carrillo et al. 2012; Wiaux et al. 2009; Jiang et al. 2015), and then DL (Terris et al. 2023). Schmidt et al. (2022) also used DL, but solved the equivalent inpainting problem in the Fourier space. However, most of these methods do not take into account the temporal structures of time-evolving sources in the context of image time-series deconvolution. The method presented in Jiang et al. (2015) is the only existing method that considered a temporal prior.
However, this method is extremely slow in practice, and struggles when dealing with interferometric images with realistic dimensions. Moreover, it introduces many false detections in empty regions of the reconstructed skies. For these reasons, we do not consider it in our empirical study that is presented later in this paper.
2.2 Deep learning and inverse problems
DL has shown remarkable results in solving inverse problems in image and video restoration. In inverse problems, the aim is to reconstruct the original data x from some observed data y. They are related through a degradation model of the form y = D_{α}(Ax) (Pustelnik et al. 2016), where the matrix A models linear transformations (such as blurring) and D_{α} models other degradations; this function is parameterized by α. First-generation networks solving inverse problems (Xu et al. 2014; Dong et al. 2015; Kim et al. 2016) approximate x by learning, in a supervised manner, an inverse mapping x̂ = f_{θ}(y) from the data y, where θ are the parameters of the network. These approaches perform well when the degradation model has fixed parameters (i.e., A and α are unique across the train and test data). Their performance deteriorates in a multiple-degradation scenario (i.e., when A and α vary across the train and test data). In the case of radio interferometric observations, the blurring operator A depends on the time of each image and on the considered location in the sky, and thus changes between samples. Zhang et al. (2018) improved upon the first-generation networks by proposing a single convolutional neural network (CNN) that can handle multiple degradations. This network learns the mapping to x from both the data and the degradation parameters. More recently, Gu et al. (2019) proposed another way to handle multiple degradations based on the spatial feature transform (SFT) layer introduced in Wang et al. (2018). This layer provides an affine transformation of its input feature maps conditioned on the knowledge about A and α.
Other strategies that can handle multiple degradations use iterative approaches, often derived from convex optimization algorithms, coupled with neural networks. We broadly distinguish two methods. One of them is deep unrolling, which consists in unfolding the iterative loop of a classical iterative algorithm for a given number of iterations and representing all operations as layers of a neural network (Zhang et al. 2020; Adler & Öktem 2018; Monga et al. 2021; Diamond et al. 2017; Chiche et al. 2020). The network can be trained and optimized based on data, while keeping the knowledge of the degradation in its internal structure. The network is notably factorized into a prior step that encodes the learned image prior and a data step that leverages knowledge about the degradations. The other one is Plug & Play, which uses an iterative proximal algorithm, for instance in the framework of alternating direction method of multipliers plug & play (Venkatakrishnan et al. 2013; Sreehari et al. 2016; Chan et al. 2017) or regularization by denoising (Romano et al. 2017; Reehorst & Schniter 2018), and replaces the operator related to the prior by a denoising CNN (Meinhardt et al. 2017; Arjomand Bigdeli et al. 2017; Gupta et al. 2018; Terris et al. 2023) or a series of denoising CNNs trained in different settings (Zhang et al. 2017b).
However, the former can encounter memory issues at training time if we want to unroll a large number of iterations. Moreover, its data step generally applies the linear operator A and its adjoint operator A^{T} to the input signal, diluting the signal too much if the operator is strongly ill-conditioned. The latter approach shares a drawback with classical iterative algorithms like CLEAN: it is slow, because it requires a large number of iterations to deconvolve an image.
3 Problem
3.1 Interferometry imaging problem
This study deals with imaging by aperture synthesis from interferometric data. The limited numbers of antennas, observing baselines, observing times, and frequencies restrict the number of accessible samples of the sky visibility function. In addition, these visibilities are subject to noise that can be modeled as additive Gaussian noise in the first approximation. In a limited field of view, and ignoring direction-independent or direction-dependent effects and calibration issues, this ill-posed inverse problem can be expressed in the Fourier space (i.e., the measurement space) as

V_{y} = M ⊙ (V + ϵ), (1)

where ⊙ denotes the element-wise product, V_{y} is the collection of observed visibilities, V is the true visibility function, ϵ is approximated as an additive white Gaussian noise, and M is a sampling mask representing the limited access of an interferometer to the measurements (depending on antenna configuration and observational parameters).
Equation (1) can be rewritten as a deconvolution problem formulated in the image (or direct) space. This problem links the observed degraded sky image to the corresponding true sky image:

y = h * x + η, (2)

with ℱ^{−1}(V_{y}) = y, ℱ^{−1}(M) = h, ℱ^{−1}(V) = x, and ℱ^{−1}(M) * ℱ^{−1}(ϵ) = η, where * denotes the convolution operation; y is the observed image, called the dirty image; h is the PSF, also called the dirty beam, which represents the sampling operation performed in Fourier space by the interferometer; and x is the ground truth (GT) image. We note that σ_{ϵ} is the noise level of ϵ. The corresponding noise level for η can thus be obtained as σ = ‖M‖_{2} • σ_{ϵ} = ‖ℱ(h)‖_{2} • σ_{ϵ}.
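The equivalence between the Fourier-domain formulation (1) and the image-domain formulation (2) rests on the convolution theorem and can be checked numerically. Below is a minimal NumPy sketch with a toy 8 × 8 sky and mask; all variable names are illustrative and the scale is far smaller than the real problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
x = np.zeros((n, n))
x[2, 3] = 1.0                                  # toy GT sky: a single point source
M = (rng.random((n, n)) < 0.4).astype(float)   # toy sampling mask (uv coverage)
eps = rng.normal(0.0, 0.1, (n, n))             # visibility noise

# Fourier-domain route, Eq. (1): V_y = M ⊙ (V + eps)
V = np.fft.fft2(x)
V_y = M * (V + eps)
y_fourier = np.fft.ifft2(V_y).real             # dirty image

# Image-domain route, Eq. (2): y = h * x + eta
h = np.fft.ifft2(M)                            # dirty beam, h = F^{-1}(M)
eta = np.fft.ifft2(M * eps).real               # effective image-plane noise
# circular convolution h * x evaluated through the FFT
y_image = np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(x)).real + eta
```

Both routes produce the same dirty image up to floating-point rounding, which is precisely why the inpainting and deconvolution views of the problem are interchangeable.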
3.2 Extension to transient imaging
To enable robust imaging of transient sources, instead of solving frame by frame, which can be subject to observation biases, we extend the problem in Eq. (2) to a deconvolution problem accounting for the temporal dependency of the different terms (i.e., sky, noise, and instrument sampling):

y_{t} = h_{t} * x_{t} + η_{t}, t ∈ I. (3)

Here ℱ^{−1}(V_{y,t}) = y_{t}, ℱ^{−1}(M_{t}) = h_{t}, ℱ^{−1}(V_{t}) = x_{t}, and ℱ^{−1}(M_{t}) * ℱ^{−1}(ϵ_{t}) = η_{t}. The noise level of η_{t} is σ_{t} = ‖M_{t}‖_{2} • σ_{ϵ} = ‖ℱ(h_{t})‖_{2} • σ_{ϵ}. By stacking {y_{t}}_{t∈I}, {x_{t}}_{t∈I}, and {h_{t}}_{t∈I} in the temporal dimension, we respectively obtain a dirty cube, a GT cube, and a PSF cube. These cubes are three-dimensional data structures denoted Y, X, and H, respectively; I is the set of T time steps ordered following the observation intervals. In the single-image problem (2), h depends on the total time and frequency integration of the observation, while the sky x is supposed to be static. As the sky rotates above the instrument during the observation, the morphology of h depends on the interferometer location on Earth, the declination of the source, and the observing dates. In the dynamic imaging problem extension (3), the h_{t} operator has a time dependency (i.e., the instrumental response varies between consecutive observing time intervals). As a result, both the sky and the interferometer response vary over time. The associated mask M_{t} samples the Fourier transform of the sky at different times, enabling the possibility of capturing the temporal evolution of the observed sky.
For simplicity, we assume that the data cube X only contains point-like sources: each source has an angular scale that is much smaller than the angular resolution set by the PSF. In addition, we assume that the cube contains mixtures of sources with constant and varying flux densities over time. Their locations on the sky are random but constant during the observation.
Because a radio interferometric dataset provides exact information on the baseline length and orientation for all samples, the morphology and time dependency of h_{t} are known for all t. Finally, we control the noise level σ_{ϵ} to mimic the various quality levels of the observations. Following the discussion in Sect. 2, we are thus in a multiple degradation scenario, as in Zhang et al. (2018).
In this study, for the sake of simplification, we focus on the monochromatic case where the observing frequency is fixed. We therefore do not deal with the dependency in frequency of the imaging problem.
4 Method
In the context of the time-series image deconvolution problem (3), a single-image deconvolution method, such as CLEAN, can be applied independently in a frame-by-frame approach. However, this does not capture the temporal structure of the sky. A method capable of dealing with the temporal evolution of both the telescope and the sky is required. Given the recent successes of neural networks in various restoration tasks, we propose to solve problem (3) based on DL. In our setting, a network realizes the following mapping: X̂ = N_{θ}(Y, H, σ_{ϵ}). Its input contains a degraded cube and information about the degradation (H, σ_{ϵ}) in a nonblind manner. The symbol θ denotes the parameters of the network that are learned from training data {Y_{i}, X_{i}, H_{i}, σ_{ϵ,i}}. We propose two implementations of the network N, called 2D-1D Net and Deflation Net, respectively.
4.1 Multiple degradations
We adopt the following scheme to incorporate knowledge about the image formation model and handle multiple degradations.
Each h_{t} in H is originally of size l × l (with l = 256 in our study; see Sect. 5), but is first center-cropped with crop size r × r (r = 96 in our study) and projected onto a b-dimensional linear space by a PCA projection matrix P. The matrix P is learned from all the PSFs that constitute the training PSF cubes. In our empirical study (detailed in Sects. 5 and 6), the value b = 50 explains 90% of the total variance. The higher and the closer to 100% this percentage is, the better (for example, Zhang et al. 2018 preserved about 99.8% of the energy of their training blurring kernels with PCA), but a value of b higher than 50 makes our DL models (detailed in the following) too large in terms of the number of parameters and provokes a memory error at training time. We denote this projected PSF as h̃_{t}; h̃_{t} is then concatenated with σ_{t} = ‖M_{t}‖_{2} • σ_{ϵ} = ‖ℱ(h_{t})‖_{2} • σ_{ϵ}. We denote this (b + 1)-dimensional degradation vector v_{t}.
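This construction of the degradation vector can be sketched as follows. It is a hedged NumPy illustration, not the paper's implementation: the PCA is fit via an SVD on flattened cropped PSFs, the Frobenius norm is one plausible reading of ‖·‖_{2}, and the sizes (l = 8, r = 4, b = 3) are toy stand-ins for the real l = 256, r = 96, b = 50:

```python
import numpy as np

def center_crop(img, r):
    """C_r: center-crop a square 2D array to size r x r."""
    s = (img.shape[0] - r) // 2
    return img[s:s + r, s:s + r]

def fit_pca(cropped_psfs, b):
    """Learn the b-dimensional PCA projection matrix P from training PSFs."""
    X = cropped_psfs.reshape(len(cropped_psfs), -1)
    X = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:b]                                    # P, of shape (b, r*r)

def degradation_vector(P, h_t, sigma_eps, r):
    """Concatenate the PCA-projected PSF with the frame noise level sigma_t."""
    h_proj = P @ center_crop(h_t, r).ravel()
    # one plausible reading of ||F(h_t)||_2 (Frobenius norm of the 2D FFT)
    sigma_t = np.linalg.norm(np.fft.fft2(h_t)) * sigma_eps
    return np.concatenate([h_proj, [sigma_t]])       # length b + 1

rng = np.random.default_rng(0)
train_psfs = rng.normal(size=(20, 8, 8))             # stand-in PSFs (l = 8 here)
cropped = np.stack([center_crop(p, 4) for p in train_psfs])
P = fit_pca(cropped, b=3)
v = degradation_vector(P, train_psfs[0], sigma_eps=0.5, r=4)
```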
2D-1D Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
4.2 2D-1D Net
We decouple N into two modules that sequentially process the input data. The first module is a network with 2D convolutional layers that successively and independently encode each image in the degraded cube into feature maps. Each of these encodings considers the PSF and the noise level used in the degradation. This transformation performs a deconvolution of the degraded image. This module is referred to as 2D Net. After these independent deconvolutions, the produced feature maps are stacked in an additional temporal dimension and given to the second module. This module captures temporal structures within the stacked maps and estimates the GT cube. The temporal profile of a source is continuous in time, which justifies this architectural choice. Because point sources do not move spatially in our study, the temporal structure is extracted based on 1D convolution layers along the time dimension. We call this module 1D Net. By unifying the two modules, we build the entire network, 2D-1D Net.
Algorithm 1 summarizes the 2D-1D Net. First, each pair of degraded image and degradation vector is given to 2D Net to produce an intermediate feature map z_{t} (lines 2-8). Figure 1 describes the structure of this subnetwork. This network is composed of an encoder that extracts input features and an SFT Net that can manage multiple degradations. In this step, 2D Net deconvolves y_{t} using the degradation vector. The SFT layer uses this vector to modulate feature maps. This layer was introduced in Wang et al. (2018) and used in Gu et al. (2019) for the first time to handle multiple degradations in inverse problems. It applies an affine transformation to the feature maps, conditioned on the degradation maps, which are obtained by stretching the degradation vector into size (b + 1) × height × width, where all the elements of the ith map equal the ith element of the vector. Here, height and width respectively indicate the height and width of the input images and feature maps; in our experiments, height = width = 256. The affine transformation involves scaling and shifting operations,

SFT(F) = γ ⊙ F + β, (4)
where γ and β are estimated by additional small convolutional layers and ⊙ is the Hadamard product.
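The stretching and modulation steps can be sketched in NumPy. This is an illustrative sketch only: the small convolutional layers that predict γ and β from the degradation maps are replaced here by fixed identity parameters so the snippet stays self-contained, and all sizes are toy values:

```python
import numpy as np

def stretch(v, height, width):
    """Stretch the (b+1)-vector into (b+1) constant degradation maps."""
    return np.broadcast_to(v[:, None, None], (v.size, height, width)).copy()

def sft(features, gamma, beta):
    """Eq. (4): SFT(F) = gamma ⊙ F + beta (element-wise scale and shift)."""
    return gamma * features + beta

b, f, height, width = 3, 4, 16, 16
v = np.arange(b + 1, dtype=float)        # degradation vector (projected PSF + sigma_t)
maps = stretch(v, height, width)         # conditioning input of the SFT layer
feats = np.random.default_rng(1).normal(size=(f, height, width))
# in the network, gamma and beta are predicted from `maps` by small conv layers;
# identity parameters are used here so the sketch stays runnable
gamma = np.ones((f, height, width))
beta = np.zeros((f, height, width))
out = sft(feats, gamma, beta)
```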
Next, {z_{t}}_{t∈I} are stacked in the temporal dimension. This gives Z, a tensor of dimensions f × T × height × width (line 10). The output is passed to 1D Net, composed of three 1D convolutional layers working along the temporal dimension, interlaced with ReLU activations (line 11). The kernel size of each 1D convolutional layer is set to 5. With three such layers, the temporal receptive field is 5 + 4 + 4 = 13, which is enough considering the temporal extension of the experimental transient events in this work (see Sect. 5.1.2). Figure 2 details the 1D Net architecture.
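The quoted receptive field follows from the standard rule for stacked stride-1 convolutions: each additional layer of kernel size k widens the field by k − 1. A one-line check:

```python
def receptive_field(kernel_sizes):
    """Temporal receptive field of stacked stride-1 convolutions:
    1 plus (k - 1) for each layer of kernel size k."""
    return 1 + sum(k - 1 for k in kernel_sizes)

# three 1D layers of kernel size 5, as in 1D Net: 5 + 4 + 4 = 13
print(receptive_field([5, 5, 5]))  # → 13
```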
Fig. 1 2D Net. Each convolutional layer outputs f feature maps, with f = 32 in our study. The kernel size of each convolutional layer is set to 3 × 3. 
Fig. 2 1D Net. {z_{t}}_{t∈I} has dimensions f × T × height × width. Each convolutional layer outputs f feature maps, except the last, which outputs images with one channel each. Each 1D convolutional layer has kernel size 5. 
4.3 Deflation Net
Algorithm 2 summarizes the Deflation Net. This network is based on the same subnetworks as 2D-1D Net but involves a different computation flow. Specifically, it decouples the reconstruction of constant and transient sources. First, both the input images {y_{t}}_{t∈I} and the PSFs {h_{t}}_{t∈I} are averaged over the temporal dimension (lines 1 and 2). The average input image presents a reduced noise level (computed in line 4) compared to each image in the input cube, and the average PSF is better conditioned than each PSF of the PSF cube. Then, the average image is deconvolved in the feature space based on the average PSF to give the average sky features (line 7). Next, this average sky is reconvolved with the corresponding PSF (line 14) and subtracted from the individual degraded sky in the feature space at each time step. Each resulting image only contains transient sources. This sky is then deconvolved based on the individual PSF to reconstruct the transient sources (line 15). The individual deconvolved image and the average deconvolved sky are finally summed at the feature level via skip connections (line 16). This processing is done for each time step, and all outputs are then sent to the final 1D Net (lines 19 and 20).
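The deflation idea can be illustrated directly in image space, leaving aside the learned feature encodings. In the sketch below, the deconvolved average sky is replaced by an oracle (the true constant sky), so the snippet only demonstrates the reconvolve-and-subtract step that isolates the transient contribution in each frame; all names and sizes are illustrative:

```python
import numpy as np

def conv2d_fft(a, b):
    """Circular 2D convolution via the FFT."""
    return np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)).real

T, n = 4, 16
rng = np.random.default_rng(2)
x_const = np.zeros((n, n)); x_const[5, 5] = 1.0         # constant source
x_trans = np.zeros((T, n, n)); x_trans[2, 10, 3] = 0.8  # transient, only at t = 2
H = rng.normal(size=(T, n, n))                          # stand-in time-varying PSFs

# noiseless dirty cube, Eq. (3): y_t = h_t * (x_const + x_trans_t)
Y = np.stack([conv2d_fft(H[t], x_const + x_trans[t]) for t in range(T)])

# deflation step: reconvolve the (here: oracle) average sky with each h_t
# and subtract, so only the transient contribution remains in each frame
residual = np.stack([Y[t] - conv2d_fft(H[t], x_const) for t in range(T)])
```

Frames without a transient are deflated to (numerically) zero, while the transient frame retains exactly the convolution of its PSF with the transient source.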
Deflation Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
5 Experiment
5.1 Datasets
We generate disjoint training, validation, and test sets of GT and PSF cubes at various noise levels. The corresponding dirty cubes are generated based on Eq. (3).
5.1.1 PSF cubes
We simulate the interferometric response of MeerKAT in the L band, using its current 64-antenna distribution. The observing frequency is fixed to 1420 MHz. The location of MeerKAT on Earth is Long = 21.33°, Lat = −30.83°, h = 1195 m. The minimum pointing elevation is set to 2 degrees. Based on these parameters, we randomly pick PSF cubes by sampling the local visible sky for a given observing time and observing rate. Each cube accounts for the typical eight-hour tracking observation of a randomly selected source in the J2000 reference frame that is visible in the local sky during the eight hours. This period is divided into successive intervals of 15-min scans. They produce a 15-min integrated PSF associated with the (u,v) coverage computed toward the source's current location. Each PSF is projected onto a spatial grid of 256 × 256 pixels. Hence, the cube has dimensions 32 × 256 × 256. Adopting the notation from Sect. 3.2, we have T = 32 and t_{i+1} − t_{i} = 15 min. We generated 435 training, 50 validation, and 50 test PSF cube samples. For all of them the source elevations at the beginning of the observation are above 20 degrees as seen from MeerKAT. Figure 3 depicts the 32 (15 min) frames of a PSF cube. As the source appears to rotate above the interferometer during the eight-hour observation, the projected baselines vary, leading to a continuous rotation and warping of the PSF.
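The rotation and warping of the PSF follow from the standard aperture-synthesis geometry: as the hour angle H advances, a baseline with equatorial components (L_x, L_y, L_z) (in wavelengths) sweeps an elliptical track in the (u, v) plane that also depends on the source declination δ. A hedged NumPy sketch of this textbook relation (the baseline values are illustrative, not MeerKAT's):

```python
import numpy as np

def uv_track(L, dec, hour_angles):
    """(u, v) samples swept by one baseline L = (Lx, Ly, Lz), in wavelengths,
    as the Earth rotates, for a source at declination dec (radians)."""
    Lx, Ly, Lz = L
    H = np.asarray(hour_angles)
    u = np.sin(H) * Lx + np.cos(H) * Ly
    v = (-np.sin(dec) * np.cos(H) * Lx
         + np.sin(dec) * np.sin(H) * Ly
         + np.cos(dec) * Lz)
    return u, v

# eight-hour track (120 deg of hour angle) sampled in 32 steps, as in the cubes
H = np.deg2rad(np.linspace(-60.0, 60.0, 32))
u, v = uv_track((1000.0, 500.0, 200.0), np.deg2rad(-45.0), H)
```

For a source at the celestial pole the track degenerates into a circle of radius sqrt(Lx² + Ly²), which is the limiting case of the always-accessible circumpolar directions mentioned below.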
Figure 4 depicts the effective distribution of elevations in the training PSF set. This distribution is skewed toward an elevation of 30° in the case of the telescope at hand, MeerKAT. This is expected because the PSF simulation accounts for the visibility of sources, in the southern sky, as seen from MeerKAT. For any given observation duration (e.g., 8 h max) and integration (e.g., 15 min/image), we computed the telescope source tracking and its aperture projection, both required to derive the effective (u,v) coverage and therefore the PSF. An astrophysical source is associated with a unique pair of coordinates (declination δ and right ascension α) on the celestial sphere (see Fig. A.1 for an illustration). Depending on the source declination (δ, in the equatorial frame), hour angle (associated with right ascension α and time of observation), and observation duration, not all (α, δ) directions are accessible. Therefore, a uniform random distribution of directions, drawn from the accessible direction window in the sky, appears skewed around the local elevation of the south celestial pole (SCP), located around 30° for MeerKAT. Circumpolar sources close to the SCP are always accessible.
The PCA projection matrix P that encodes knowledge about the PSF is learned from all the PSFs of the training PSF set (i.e., from the 435 × 32 = 13 920 PSFs). A number of PCA components b = 50 explains 90% of the total variance. Figure 5 shows the ten main PCA eigenvectors computed over the whole training PSF set.
5.1.2 GT cubes
We suppose the sources to be unresolved point sources in the sky image. The pixel size on the sky is fixed to 1.5″. We assume that within a sky image of size 30′ × 30′ (i.e., 1200 × 1200 pixels), at most 30 constant and 2 transient point sources can be placed. This distribution of sources can be considered compatible with shallow imaging like that of MeerKAT. Following this distribution, we generate 39 000 training sky cubes with dimensions 32 × 256 × 256. These data are divided into three equal parts, containing zero, one, or two transient sources per field. Each validation and test set contains 66 cubes following the same distribution of sources, with half of them containing a transient source and the other half containing two. The source peak flux density (i.e., the value of the pixel of a constant source in the ground truth) is randomly sampled between 1 and 100. Regarding a transient source, its amplitude is a discrete function A(t_{i}) sampled from a continuous function A(t). For each source, this profile is randomly chosen among the following models: gate, Gaussian, and skew-normal. Their parameters are randomly chosen, and the transient source's maximum amplitude is randomly chosen in [50, 100]. Examples of temporal profiles are shown in Sect. 6, Fig. 6. Our neural networks consider normalized pixel values (i.e., the GT cube's pixel values are divided by 100) so that they lie between 0 and 1.
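The three profile families can be sketched as simple amplitude functions A(t) sampled on the 32 time steps. The parameter values below are illustrative (in the datasets they are drawn at random), and the skew-normal uses the standard 2φ(t)Φ(αt) form rescaled to its peak amplitude:

```python
import math
import numpy as np

T = 32
t = np.arange(T, dtype=float)

def gate(t, t0, t1, amp):
    """Boxcar light curve: 'on' between time steps t0 and t1."""
    return amp * ((t >= t0) & (t <= t1))

def gaussian(t, mu, sig, amp):
    """Gaussian light curve peaking at time step mu."""
    return amp * np.exp(-0.5 * ((t - mu) / sig) ** 2)

def skew_normal(t, mu, sig, alpha, amp):
    """Skew-normal light curve (2*phi*Phi form), rescaled to peak at amp."""
    z = (t - mu) / sig
    pdf = np.exp(-0.5 * z ** 2) * (1.0 + np.vectorize(math.erf)(alpha * z / math.sqrt(2.0)))
    return amp * pdf / pdf.max()

profiles = [gate(t, 10, 14, 80.0),
            gaussian(t, 16.0, 2.0, 60.0),
            skew_normal(t, 12.0, 3.0, 4.0, 95.0)]
```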
Fig. 3 PSF test cube chosen for evaluation. The text in each panel reports: time step, elevation in degrees, azimuth in degrees. Each PSF is normalized to 1 (black). The gray color scale was reversed for clarity. The apparent rotation of the PSF with time is the result of the projection of the interferometer aperture toward the source direction, which depends on the interferometer location, declination of the source, and observing time (see Fig. A.1 for an illustration). 
Fig. 4 Histogram of distribution of elevations in the training PSF set. The distribution is skewed toward the south celestial pole where circumpolar sources can be observed from MeerKAT. 
Fig. 5 PCA eigenvectors for the first ten largest eigenvalues computed over the whole PSF set. 
5.2 Training procedure
From each GT cube in the training, validation, and test sets, the corresponding dirty cube is generated based on Eq. (3). Mini-batch gradient descent with a batch size of 4 is used for training. For each example in the batch, a training PSF cube is randomly picked, and the noise level σ_{ϵ} is randomly sampled from [0, 6]. Data augmentation with random flipping and/or transposition is performed. The learning rate is set to 10^{−4}, and we train our models for 100 epochs each. The loss function is the pixel-wise mean-squared error (MSE) between the GT and estimated cubes. Vafaei Sadr et al. (2019) claimed that if the GT images are skies with point sources, then the MSE loss function is suboptimal, and instead proposed to smooth the point sources. However, our study obtained satisfactory empirical results with the pixel-wise MSE between GT and estimated cubes.
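The flipping/transposition augmentation acts on the spatial axes of the (T, height, width) cubes; one plausible implementation applies the same random transform to every cube of a sample so the pairing between dirty, GT, and PSF cubes stays consistent. A minimal NumPy sketch (names and sizes illustrative):

```python
import numpy as np

def augment(cubes, rng):
    """Randomly flip and/or transpose the spatial axes (1, 2) of every cube,
    applying the same transform to each cube in the sample."""
    if rng.random() < 0.5:
        cubes = [np.flip(c, axis=1) for c in cubes]
    if rng.random() < 0.5:
        cubes = [np.flip(c, axis=2) for c in cubes]
    if rng.random() < 0.5:
        cubes = [np.swapaxes(c, 1, 2) for c in cubes]
    return [np.ascontiguousarray(c) for c in cubes]

rng = np.random.default_rng(3)
Y = rng.normal(size=(32, 8, 8))    # toy dirty cube
X = rng.normal(size=(32, 8, 8))    # matching toy GT cube
Y_aug, X_aug = augment([Y, X], rng)
```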
5.3 Evaluation metrics
Given a constant or transient source at a certain location in a GT cube X, we first define a subcube obtained by locally cropping the cube around the source with a patch size ρ × ρ (ρ = 3 in our study, defining the region 𝒟_{s}). We extract a subcube at the same location in the estimated cube. We then compute the root MSE (RMSE) between the two subcubes. This error quantifies the fidelity of the restored temporal profile, and is normalized by the norm of the GT subcube. We denote this error NRMSE_{s} for the source s.
To measure the input signal quality, for each source s in the dirty cube {y_{t}}_{t∈I} with location (i_{s}, j_{s}), we evaluate its signal-to-noise ratio S/N_{s}. We adopt the following definition:

S/N_{s} = Σ_{(i,j)∈𝒟_{s}} y_{τ}(i, j) / σ_{b}. (5)
Here, τ refers to the temporal localization of the transient: the time step at which the amplitude of the transient source is maximal in the GT cube. For a constant source, we set τ = 15. 𝒟_{s} is the local patch of size 3 × 3 centered on the true source location (i_{s}, j_{s}). The parameter σ_{b} denotes the background noise estimated on y_{τ} after excluding 𝒟_{s}. In radio images, the peak S/N is the ratio of the peak flux density of the source (usually a single pixel) to the local noise root mean square. Here, to absorb pixel gridding bias, the source flux is integrated over the region 𝒟_{s}. This ensures that all the recovered flux density of the source is properly accounted for in the presence of noise; otherwise, gridding errors could lead to detrimental and unfair comparisons of light curve reconstructions between methods.
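The following sketch implements one plausible reading of Eq. (5): the source flux summed over the 3 × 3 patch 𝒟_{s} of frame y_{τ}, divided by the background rms σ_{b} estimated on the same frame after excluding 𝒟_{s}. The exact published formula may differ in normalization.

```python
import numpy as np

def snr_source(dirty_cube, i_s, j_s, tau, rho=3):
    """Assumed form of Eq. (5): flux summed over the rho x rho patch D_s
    of frame y_tau, divided by the background rms estimated on y_tau
    after excluding D_s."""
    r = rho // 2
    frame = dirty_cube[tau]
    mask = np.zeros(frame.shape, dtype=bool)
    mask[i_s - r:i_s + r + 1, j_s - r:j_s + r + 1] = True
    flux = frame[mask].sum()
    sigma_b = frame[~mask].std()
    return flux / sigma_b

# a bright source of amplitude 100 on unit-variance noise
rng = np.random.default_rng(3)
frame = rng.normal(0.0, 1.0, (64, 64))
frame[32, 32] += 100.0
snr = snr_source(frame[None], 32, 32, tau=0)
```

For this toy frame the returned value is close to 100, the source amplitude in units of the background rms.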
Moreover, to measure the performance of background denoising (i.e., the restoration of the empty regions of the sky), we exclude all 𝒟_{s} subcubes from a GT cube. We apply the same procedure to the corresponding estimated cube and compute the RMSE between the two. We denote this metric RMSE_{noise}.
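RMSE_{noise} can be sketched as follows, masking out every 𝒟_{s} subcube before computing the RMSE; the function name and array layout are our assumptions.

```python
import numpy as np

def rmse_background(gt_cube, est_cube, source_locs, rho=3):
    """Background-denoising metric: RMSE computed after excluding the
    rho x rho subcube around every known source location."""
    r = rho // 2
    mask = np.ones(gt_cube.shape[1:], dtype=bool)
    for (i_s, j_s) in source_locs:
        mask[i_s - r:i_s + r + 1, j_s - r:j_s + r + 1] = False
    diff = (gt_cube - est_cube)[:, mask]
    return np.sqrt(np.mean(diff ** 2))

# empty GT sky vs. a uniform residual of 0.2, with one source patch excluded
val = rmse_background(np.zeros((8, 16, 16)), np.full((8, 16, 16), 0.2), [(8, 8)])
```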
We compare the proposed algorithms for cube restoration against frame-by-frame CLEAN deconvolution of the dirty cubes. To avoid biasing the final result during the final CLEAN restoring-beam step, we considered only the detected CLEAN components associated with the detected sources. The PSF cube and the noise level are provided to CLEAN in a non-blind manner; they are used to define the threshold at which the algorithm stops iterating. Furthermore, to prevent biases against CLEAN due to the high-noise environment of the dirty cube, we stopped CLEAN after a maximum number of iterations and, as for the other reconstruction methods, only counted the flux density around the source location in 𝒟_{s}.
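For reference, frame-by-frame CLEAN in its simplest (Högbom) form can be sketched as below. This toy version is not the exact imager used in the paper, but it illustrates the stopping criteria mentioned above: a noise-based threshold and a maximum number of iterations.

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, threshold=0.0, max_iter=200):
    """Minimal Hogbom CLEAN on a single frame (a sketch, not the paper's
    imager). The PSF is assumed peak-normalized and centered. Iteration
    stops at max_iter or when the residual peak falls below threshold."""
    residual = dirty.copy()
    model = np.zeros_like(dirty)
    n, m = dirty.shape
    pc = (psf.shape[0] // 2, psf.shape[1] // 2)   # PSF peak position
    for _ in range(max_iter):
        i, j = np.unravel_index(np.argmax(residual), residual.shape)
        peak = residual[i, j]
        if peak <= threshold:
            break
        model[i, j] += gain * peak
        # subtract the shifted, scaled PSF from the residual
        for di in range(n):
            for dj in range(m):
                pi, pj = di - i + pc[0], dj - j + pc[1]
                if 0 <= pi < psf.shape[0] and 0 <= pj < psf.shape[1]:
                    residual[di, dj] -= gain * peak * psf[pi, pj]
    return model, residual

# toy frame: one unit point source, delta-function PSF
dirty = np.zeros((16, 16)); dirty[8, 8] = 1.0
psf = np.zeros((16, 16)); psf[8, 8] = 1.0
model, residual = hogbom_clean(dirty, psf)
```

With a delta PSF the model converges geometrically to the true flux; with a realistic sidelobe-rich PSF and noise, the threshold and iteration cap are what keep the loop from cleaning the noise.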
Fig. 6 Reconstructions of temporal profiles related to some transient sources in the test cubes. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. The figure compares methods at different noise levels σ_{ϵ}. The names of the sources are, from left to right: source 1, source 2, source 3, source 4, and source 5. 
6 Results
6.1 Fixed test PSF cube and varying noise
We picked a PSF test cube and evaluated the methods on the test sky cubes with the following noise levels: σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}. The injected noise levels range from low-noise cases, where the constant and transient sources are readily detectable in the dirty cubes, to high-noise cases, where no sources can be seen (such a high noise level is displayed in the first row of Fig. 7). Figure 3 shows the selected PSF test cube; the PSF rotates with time.
Figure 6 compares the reconstructions of temporal profiles of several transient sources in the test cubes at different noise levels σ_{ϵ}. We observe that the DL-based methods produce better reconstructions than CLEAN. As the noise level σ_{ϵ} increases, the performance of CLEAN degrades rapidly, whereas that of the DL-based methods remains good; the plots related to source 1 illustrate this. Moreover, even in the presence of noise, the DL-based methods better restore and preserve the start and end dates of the transient signal, thanks to their temporal modeling capability. Conversely, in the case of CLEAN, the restored start and end dates of the transient are distorted by the noise in single frames. The plots related to source 5 at noise levels σ_{ϵ} = 3 and σ_{ϵ} = 4 illustrate these statements regarding the end of the transient signal. Among the DL-based methods, Deflation Net seems better at restoring transient signals in the high-noise regime, as the plots of source 3 show. This suggests that the architectural choice of Deflation Net, which decouples constant and transient source reconstructions, is appropriate for the inverse problem. Even at the highest noise level, σ_{ϵ} = 6, Deflation Net can restore the end of the transient signal. Figure 7 shows source 3 reconstructed by the different methods at noise level σ_{ϵ} = 4. The DL-based methods reconstruct the transient source more efficiently than CLEAN, and Deflation Net performs best in reconstructing the end of the transient source. These results are particularly important whenever a telescope must react quickly upon detecting a potential transient: a transient could be detected by a near-real-time imaging pipeline that integrates our trained network as soon as its light curve rises. Alerts could therefore be issued earlier and with more confidence regarding false detections.
For comparison and as a robustness check, Fig. 8 compares reconstructions of temporal profiles for constant sources. As expected, the DL-based methods systematically produce reconstructions with higher fidelity. Moreover, they produce light curves with more stable amplitudes than the CLEAN-based method. This matters because the trained network must reconstruct the light curves of constant sources correctly, without making them look like flickering transient sources. Deflation Net also reconstructs constant sources with less variation than 2D-1D Net. We understand Deflation Net to provide a better reconstruction of constant sources for two reasons. First, the noise level is reduced in the average input sky, and the average PSF is better conditioned than the individual PSF frames, so deconvolving the average sky by the average PSF is easier for constant sources. Figures 9 and B.1 illustrate this by displaying, respectively, the average input sky for various noise levels and the average PSF. Second, the skip connections of Deflation Net (line 16 of Algorithm 2) keep the reconstructions at different time steps distributed near the average deconvolved sky. Figure 10 compares the average variance of reconstructed constant sources over the test cubes at different values of σ_{ϵ}. The figure confirms that the DL-based methods reconstruct constant sources with better fidelity than CLEAN, and that Deflation Net produces the lowest variances for constant sources. On average, constant sources reconstructed by 2D-1D Net present a variance ~3.0 times smaller than those reconstructed by CLEAN; Deflation Net's variance is ~8.6 times smaller than CLEAN's.
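The first reason given above, noise reduction in the average input sky, follows the usual 1/√T scaling of independent Gaussian noise, which a quick numerical check illustrates:

```python
import numpy as np

rng = np.random.default_rng(42)
T, H, W = 32, 64, 64
noise_cube = rng.normal(0.0, 1.0, size=(T, H, W))   # sigma_eps = 1 per frame
avg_frame = noise_cube.mean(axis=0)                  # noise-only average input sky

per_frame_std = noise_cube.std()   # close to 1
avg_std = avg_frame.std()          # close to 1 / sqrt(T), i.e. ~0.177 for T = 32
```

Averaging the T = 32 input frames thus lowers the noise standard deviation by a factor of √32 ≈ 5.7, which is why the average-sky branch of Deflation Net sees a much cleaner constant-source signal.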
After aggregating the results of all inferences over all noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}, the upper subpanel of Fig. 11 shows NRMSE_{S} averaged over sources belonging to the same S/N interval delimited by deciles in S/N_{s}. We observe that the DL-based methods generally perform better than CLEAN, except when S/N_{s} is very high (S/N_{s} > 197.5). This case corresponds to σ_{ϵ} = 0, for which CLEAN is optimal. In all other cases, the DL-based methods perform better than CLEAN on average. By comparing the positions of the error bars indicating standard deviations, we observe that in many cases (S/N_{s} between 40.5 and 128.6) the DL-based methods present inhomogeneous performance over different sources, but their worst performance is statistically still better than the best performance of CLEAN. In the high-noise regime (S/N_{s} below 40.5), the DL-based methods perform better than the CLEAN-based method on average. Among the DL-based methods, Deflation Net performs better than 2D-1D Net on average in the noisiest cases (S/N_{s} between 0 and 86). On average, CLEAN presents NRMSE_{S} values that are ~2.4 times higher than those of 2D-1D Net and ~2.7 times higher than those of Deflation Net.
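The aggregation used for Fig. 11, averaging per-source errors within S/N intervals delimited by deciles, can be sketched as follows (the function name is ours):

```python
import numpy as np

def bin_by_deciles(snr_values, errors):
    """Average per-source NRMSE within S/N intervals delimited by the
    deciles of the S/N distribution (as in Fig. 11)."""
    edges = np.percentile(snr_values, np.arange(0, 101, 10))
    idx = np.clip(np.searchsorted(edges, snr_values, side="right") - 1, 0, 9)
    means = np.array([errors[idx == k].mean() for k in range(10)])
    stds = np.array([errors[idx == k].std() for k in range(10)])
    return edges, means, stds

# toy data: S/N and error both ranging over 0..99
snr = np.arange(100.0)
err = np.arange(100.0)
edges, means, stds = bin_by_deciles(snr, err)
```

Each decile bin then holds roughly a tenth of the sources, so the per-bin mean and standard deviation are comparable across the S/N range.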
Figure 7 illustrates the high performance of the DL-based methods in background denoising: they effectively restore the empty sky around the sources. This is less the case for the frame-by-frame CLEAN method, which captures noise and generates residual noisy pixel distributions around the true source. Figure 12 compares RMSE_{noise} averaged over the test cubes at different values of σ_{ϵ}. For all values of σ_{ϵ}, the DL-based methods suppress background noise better than CLEAN. Among the DL-based methods, Deflation Net restores the background sky better because the noise level is reduced and the PSF is better conditioned in the averaged input sky. On average, CLEAN presents RMSE_{noise} values that are ~1.8 times higher than those of 2D-1D Net and ~2 times higher than those of Deflation Net.
Fig. 7 Visualization of source 3 in Fig. 6 reconstructed from different methods, at noise level σ_{ϵ} = 4. 
Fig. 8 Reconstructions of temporal profiles of constant sources. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. 
Fig. 9 Constant source in the input degraded sky between time steps 0 and 6, and average of the input skies over all the time steps. First row: noise level σ_{ϵ} = 2. Second row: σ_{ϵ} = 3. The noise level is reduced in the averaged sky. 
Fig. 10 Average variance of reconstructed light curves of constant sources over the test cubes at different values of the noise level σ_{ϵ}. The variances are computed on cubes normalized between 0 and 1. The vertical bars show the standard deviation. 
6.2 Varying test PSF cube and fixed noise
In this section we set σ_{ϵ} = 3 and aggregate results over all of the PSF test cubes.
Figure 13 shows the average NRMSE_{S} for sources in the test set in ten equally spaced bins of PSF elevation. The envelope for each method delimits the first and last ten percentiles. For all bins, our DL-based methods outperform CLEAN on average by large margins. This is especially true for Deflation Net, whose last ten percentiles lie near the first ten percentiles of CLEAN in most cases (elevations > 41 degrees). Among the DL-based methods, here again Deflation Net performs better on average than 2D-1D Net in most cases (elevations between 27 and 82 degrees). This confirms that the Deflation Net architecture efficiently tackles the inverse problem. On average, CLEAN presents NRMSE_{S} values that are ~2.6 times higher than those of 2D-1D Net and ~3.2 times higher than those of Deflation Net. Furthermore, in most cases (elevations > 34 degrees), the envelope of Deflation Net is smaller than those of 2D-1D Net and CLEAN; Deflation Net therefore yields less error dispersion in source light curve reconstruction than the other methods. This can be explained by its skip connections, which keep the reconstructions at different time steps near the average deconvolved sky. Our methods perform worst at elevations between 20 and 27 degrees and best at elevations between 82 and 89 degrees; between these bounds, their performance increases with elevation. This is consistent with expectation: the closer the elevation is to 90 degrees, the better conditioned the PSF and the easier the deconvolution. Conversely, at lower elevations the PSF is worse conditioned and undergoes a strong projection effect, making the deconvolution task harder.
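The binning behind Fig. 13, a mean error per equally spaced elevation bin with a 10th-90th percentile envelope, can be sketched as follows (the function name is ours):

```python
import numpy as np

def bin_by_elevation(elevations, errors, n_bins=10):
    """Mean NRMSE per equally spaced elevation bin, with the 10th and 90th
    percentiles of the per-source errors as the envelope (as in Fig. 13)."""
    edges = np.linspace(elevations.min(), elevations.max(), n_bins + 1)
    idx = np.clip(np.digitize(elevations, edges) - 1, 0, n_bins - 1)
    means = np.array([errors[idx == k].mean() for k in range(n_bins)])
    lo = np.array([np.percentile(errors[idx == k], 10) for k in range(n_bins)])
    hi = np.array([np.percentile(errors[idx == k], 90) for k in range(n_bins)])
    return edges, means, lo, hi

# toy data: elevations over the 20-89 degree range seen in the test set
elev = np.linspace(20.0, 89.0, 100)
err = np.ones(100)
edges, means, lo, hi = bin_by_elevation(elev, err)
```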
Next, after aggregating the results of all inferences with all the test PSF cubes over the test sky cubes, the lower subpanel of Fig. 11 shows NRMSE_{S} averaged over sources belonging to the same S/N interval delimited by deciles in S/N_{s}. The DL-based methods perform better on average than CLEAN for all intervals of S/N_{s}. Observing the black whiskers indicating standard deviations around the mean values, we note that in most cases the DL-based methods present inhomogeneous performance, but their worst performance is statistically still better than the best performance of CLEAN. Among the DL-based methods, Deflation Net performs slightly better than 2D-1D Net when S/N_{s} is above 44 and clearly outperforms it when S/N_{s} is below 44. This again confirms that the Deflation Net architecture is appropriate for the inverse problem. On average, 2D-1D Net presents NRMSE_{S} values 3.10 times smaller than those of CLEAN, and Deflation Net presents NRMSE_{S} values 3.50 times smaller than those of CLEAN.
Fig. 11 Average source reconstruction performance of 2D-1D Net, Deflation Net, and CLEAN over the test sky cubes with (upper subpanel) a fixed test PSF cube and noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, the sources in the test sky cubes have different input S/N_{s}; these values can be divided into ten deciles. The figure shows NRMSE_{S} averaged over the sources in each decile. The black whiskers indicate the standard deviations. 
Fig. 12 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation appears larger when RMSE_{noise} is small, as is the case for the DL-based methods. 
Fig. 13 Mean NRMSE_{S} for sources in the test set for ten equally spaced bins of PSF pointing elevations. The envelope for each method delimits the first and the last ten percentiles. 
Fig. 14 Reconstructions of temporal profiles of source 3 and source 5 (also shown in Fig. 6) at different noise levels σ_{ϵ}, from the normalized test cubes. 
6.3 Importance of temporal modeling
In this subsection we evaluate the benefit brought by 1D Net within 2D-1D Net, that is, the extent to which capturing the temporal structure of the signal improves reconstruction performance. To do so, we compare the performance of CLEAN, 2D-1D Net, and 2D Net. The last is a variant of 2D-1D Net from which the 1D Net part is excluded, with the last 2D convolutional layer of 2D Net outputting a one-channel image at each time step. This reduces our method to a classic imager that forms a single image from time-integrated data. Figure 14 compares the reconstructions of temporal profiles of some transient sources in the test cubes with the fixed PSF used in Sect. 6.1 and illustrated in Fig. 3. As expected, with increasing noise levels, 2D Net rapidly loses the ability to reconstruct the ends of the light curve, which sink to the noise level; this drawback is shared with CLEAN. On the contrary, 2D-1D Net succeeds in restoring them properly. Figure 15 shows source 5 reconstructed by the different methods at noise level σ_{ϵ} = 3. 2D-1D Net reconstructs the transient profile more efficiently, down to its disappearance, whereas 2D Net fails to reconstruct it, similarly to CLEAN. These figures illustrate that, despite the noise, 1D Net enables the beginning and end of a transient event to be correctly reconstructed through temporal modeling.
The upper subpanel of Fig. 16 aggregates inference results on the test cubes with the fixed PSF cube illustrated in Fig. 3 and the noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}. The lower subpanel of Fig. 16 aggregates inference results on the test cubes with a fixed noise level σ_{ϵ} = 3 and the 45 PSF test cubes used in Sect. 6.2. Both compare NRMSE_{S} averaged over sources of the same interval delimited by deciles in S/N_{s}. On average, 2D Net performs slightly better than CLEAN in terms of NRMSE_{S}. 2D-1D Net significantly outperforms both methods, presenting on average ~2.3 times smaller NRMSE_{S} than 2D Net and CLEAN. This exemplifies the benefit brought by the temporal modeling capability of 1D Net.
Even if 2D Net performs similarly to CLEAN in terms of NRMSE_{S}, it performs better in background denoising. This is shown in Fig. 17, which compares RMSE_{noise} averaged over the test cubes at different values of σ_{ϵ} with the PSF illustrated in Fig. 3. For all values of σ_{ϵ}, 2D Net suppresses background noise better than CLEAN by large margins. This shows that even though 2D Net does not extract temporal features, its convolutional layers enable highly efficient denoising. 2D-1D Net performs even better than 2D Net at this background denoising task, showing that the temporal modeling capability of 1D Net also contributes to the background denoising performance. On average, 2D Net presents RMSE_{noise} values ~1.5 times smaller than those of CLEAN, and 2D-1D Net presents RMSE_{noise} values ~1.2 times smaller than those of 2D Net. Figure 15 illustrates these observations. Especially at the end of the transient event, frame-by-frame CLEAN captures noise and generates residual noisy pixels around the true source. 2D Net generates fewer noisy pixels (only a single noisy pixel at time step t = 10), and 2D-1D Net does not generate any obvious noisy pixel.
Fig. 15 Visualization of source 5 in Fig. 14 reconstructed from different methods, at noise level σ_{ϵ} = 2. 
7 Discussion and limitations
In this section we discuss several relevant points regarding our methods. The first concerns the support size of the frames. In our inverse problem, we approximate aperture synthesis as a linear degradation, embodied by a multiplication of the true data by a mask in the Fourier domain. Moreover, we carried out our study in the gridded space, where all quantities are discretized with a constant frame support size. The mask therefore has the same support size as the sky, and the support sizes of the sky, mask, and PSF cubes are linked. Consequently, if one wants to change the support size, the networks must be retrained. This is contrary to some other DL-based image or video restoration problems, where the linear degradation is defined by a convolution rather than by a mask; in that case, a trained neural network can deconvolve an image of any size. Regarding our networks, since only the 2D Net module performs deconvolution, only this module needs retraining, and 1D Net can remain frozen. Given a dataset of ungridded visibilities, it is up to the scientist to decide which support size fits the scientific objective of the data; the networks should be trained on a support size that corresponds to the final products used (e.g., the image catalog of a survey or a transient detection pipeline).
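The degradation model invoked above, multiplication of the gridded true data by a mask in the Fourier domain, is equivalent to a circular convolution with the PSF, which is the inverse Fourier transform of the mask. A small numerical check, assuming a toy Hermitian-symmetric uv mask so that the PSF is real:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
sky = rng.random((N, N))

# toy uv-sampling mask, symmetrized so that mask[k] == mask[-k mod N]
# (conjugate-symmetric uv coverage makes the PSF real)
m = (rng.random((N, N)) < 0.3).astype(float)
mask = np.maximum(m, np.roll(m[::-1, ::-1], (1, 1), axis=(0, 1)))

# degradation in the gridded Fourier domain ...
dirty_fourier = np.real(np.fft.ifft2(mask * np.fft.fft2(sky)))

# ... equals circular convolution with the PSF, the inverse FFT of the mask
psf = np.real(np.fft.ifft2(mask))
dirty_conv = np.real(np.fft.ifft2(np.fft.fft2(sky) * np.fft.fft2(psf)))
```

Because the mask and the sky share the same grid, the two forms agree to numerical precision; this is exactly why the frame support sizes of the sky, mask, and PSF cubes are tied together.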
Secondly, the total duration of transient events is worth considering. The total observation period and the number of temporal frames (which define the integration time of each PSF slice in the PSF cube) determine the upper limit on the total duration, and the shortest timescale, of the transient events we can deal with. As an illustration, for a fixed number of PSF frames over the total observation period, lengthening the latter increases the integration time of each PSF frame. Thus, if the total observation period is longer, with a fixed number of PSFs, each PSF appears smoother and presents fewer secondary lobes. Therefore, each time the total observation period changes, our networks should in theory be retrained with new PSF cubes. If we suppose that the profiles of transient events in the training sky cubes are diverse enough to be representative of the real transients that can be encountered, we can keep the same training sky set.
Conversely, if the length of the observation increases for a fixed PSF integration time, the third dimension of the cube increases, but each PSF retains the same characteristics as before. In that case, we should only retrain 1D Net, and 2D Net can remain frozen since its role is to deconvolve PSFs as before.
Computational complexity is another discussion point. Once trained, our networks reconstruct faster than frame-by-frame CLEAN because, contrary to CLEAN, they are not iterative algorithms and can benefit from efficient GPU-based architectures. This makes them attractive to the radio astronomy community. On our Intel i9-10940X CPU and NVIDIA TITAN RTX GPU, 2D-1D Net reconstructs a sky cube of 32 frames in ~65 ms; Deflation Net takes ~75 ms. Both contain 0.1 M parameters.
Regarding theoretical guarantees, one drawback of our DL-based methods is that they lack a mathematical framework from which theoretical error bounds or uncertainties can be derived, beyond empirical statistics on the results.
It is also possible to extend our proposed methods to the polychromatic case, where the inverse problem also has a frequency dependency. In this case, the dirty, GT, and PSF cubes are four-dimensional data structures, with the added frequency dimension. A simple way to realize this extension is (i) to stack several skies at different frequencies in the channel dimension at the input of the 2D Net module and (ii) to stack several PSFs at different frequencies in the channel dimension at the input of the SFT layer. The computation of the average sky and PSF and the subtraction step in Deflation Net should then be done for each frequency.
Our neural networks operate on normalized pixel values: the pixel values of the ground-truth images were divided by 100 so that they lie between 0 and 1. In the same spirit, when facing the different pixel intensity ranges of other types of datasets (e.g., real observations), a min-max normalization brings our methods into conditions similar to those tackled in this paper. The neural networks can be retrained if necessary.
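The min-max normalization mentioned here is the standard one; a trivial sketch:

```python
import numpy as np

def minmax_normalize(cube):
    """Bring the pixel values of an arbitrary cube into [0, 1], mimicking
    the division by 100 applied to the simulated ground-truth cubes."""
    lo, hi = cube.min(), cube.max()
    return (cube - lo) / (hi - lo)

# toy frame with values outside [0, 1]
c = np.array([[-5.0, 5.0], [0.0, 15.0]])
out = minmax_normalize(c)
```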
Finally, our proposed methods can easily be plugged into modern imagers, such as those based on wide-field imaging using projection (e.g., WSCLEAN; Offringa et al. 2014) or those based on faceting (e.g., DDFacet; Tasse et al. 2018). These imagers address direction-dependent effect corrections (i.e., the correction of wide-field projection effects or non-coplanar arrays), and the counterpart of DDFacet (killMS) takes care of the calibration regarding these effects. While this is beyond the scope of our paper, building a calibration tool based on CNNs is also an interesting topic to consider. It would require a full recasting of the problem in the framework of the Radio Interferometer Measurement Equation (RIME; Smirnov 2011), including the relevant Jones matrices and using ungridded visibilities directly. We leave these considerations for future work.
Fig. 16 Average source reconstruction performance of 2D-1D Net, 2D Net, and CLEAN, demonstrating the importance of temporal modeling. The methods are evaluated on the test sky cubes with (upper subpanel) a fixed test PSF cube and noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, the sources in the test sky cubes have different input S/N_{s}; these values can be divided into ten deciles. The figure shows NRMSE_{S} averaged over the sources in each decile. The black whiskers indicate the standard deviations. 
Fig. 17 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation appears larger when RMSE_{noise} is small, as is the case for the DL-based methods. 
8 Conclusion
In this study we dealt with transient source reconstruction in the context of radio interferometric imaging. We formulated the problem as a deconvolution of an image time series and proposed two neural networks to address it: 2D-1D Net and Deflation Net. Thanks to the SFT layer, they can handle multiple PSFs that vary depending on the observed sky positions. Both involve the same submodules, 2D Net and 1D Net: the former deconvolves individual frames and the latter enables temporal sky modeling. 2D-1D Net is based on a simple feedforward inference, whereas Deflation Net involves different computational flows: it restores the average sky and uses it to isolate transient sources in individual frames. Experiments based on simulated data, with metrics measuring temporal profile reconstruction and background denoising, demonstrated the superior performance of these DL-based methods over CLEAN in the presence of noise. Deflation Net performs best, excelling in reconstructing constant sources and in background denoising. The ablation study confirms that the temporal modeling enabled by 1D Net significantly increases the sky cube reconstruction performance.
Learning methods adapted to deconvolution, to the temporal structure of transient sources, and to the specificities of the instrumental response are key to efficiently analyzing the images obtained with radio interferometers in the SKA era. For instance, the raw sensitivity of MeerKAT enables deep imaging (typically σ ~ 10 µJy in 15 min) as well as high-cadence imaging over a large field of view. Being able to deconvolve the data efficiently in high-noise regimes will maximize the chance of discovering new transients and overcome the limitations imposed by current deconvolution methods. Transients can be missed in the image plane because of a lack of sensitivity (a detection problem) or a lack of sufficient temporal sampling (a dilution problem) that averages out short-timescale transients. The robust 2D and 1D image reconstruction brought by the trained networks introduced in this study significantly improves the 3D estimation of the sky. Replacing classical frame-by-frame deconvolution methods with DL-based reconstruction methods can lead to better use of telescope time: it can drastically reduce the required exposure time while enabling faster temporal sampling of the sky.
Acknowledgements
This work is supported by the European Community through the grant ARGOS (contract no. 101094354) and by Safran Electronics & Defense.
Appendix A Coordinates of an astrophysical source
Fig. A.1 Astrophysical source associated with a unique pair of coordinates (declination δ and right ascension α) on the celestial sphere. Figure adapted from Astronomy (2009). 
Appendix B Average PSF over the total observation duration
References
Abrantes, F., Lopes, C., Rodrigues, T., et al. 2009, Geochem. Geophys. Geosyst., 10, Q09U07
Adler, J., & Öktem, O. 2018, IEEE Trans. Med. Imaging, 37, 1322
Arjomand Bigdeli, S., Zwicker, M., Favaro, P., & Jin, M. 2017, in Advances in Neural Information Processing Systems, eds. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (USA: Curran Associates, Inc.), 30
Astronomy, U. 2009, Celestial-Equatorial (RA/Dec) Demonstrator (USA: NASA)
Bergeaud, F., & Mallat, S. 1995, Proc. Int. Conf. Image Process., 1, 53
Booth, R., & Jonas, J. 2012, African Skies, 16, 101
Carin, L. 2009, IEEE Antennas Propag. Magazine, 51, 72
Carrillo, R. E., McEwen, J. D., & Wiaux, Y. 2012, MNRAS, 426, 1223
Chan, S. H., Wang, X., & Elgendy, O. A. 2017, IEEE Trans. Comput. Imaging, 3, 84
Chiche, B. N., Frontera-Pons, J., Woiselle, A., & Starck, J.-L. 2020, in 2020 Tenth International Conference on Image Processing Theory, Tools and Applications (IPTA), IEEE, 1
Clark, B. 1980, A&A, 89, 377
Dabbech, A., Ferrari, C., Mary, D., et al. 2015, A&A, 576, A7
Dewdney, P. E., Hall, P. J., Schilizzi, R. T., & Lazio, T. J. L. W. 2009, IEEE Proc., 97, 1482
Diamond, S., Sitzmann, V., Heide, F., & Wetzstein, G. 2017, ArXiv e-prints [arXiv:1705.08041]
Dong, C., Loy, C. C., He, K., & Tang, X. 2015, IEEE Trans. Pattern Anal. Mach. Intell., 38, 295
Frieden, B. R. 1972, J. Opt. Soc. Am., 62, 511
Garsden, H., Girard, J., Starck, J.-L., et al. 2015, A&A, 575, A90
Girard, J. N., Garsden, H., Starck, J.-L., et al. 2015, J. Instrum., 10, C08013
Gu, J., Lu, H., Zuo, W., & Dong, C. 2019, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1604
Gupta, H., Jin, K. H., Nguyen, H. Q., McCann, M. T., & Unser, M. 2018, IEEE Trans. Medical Imaging, 37, 1440
Högbom, J. 1974, A&AS, 15, 417
Jiang, M., Girard, J. N., Starck, J.-L., Corbel, S., & Tasse, C. 2015, Eur. Signal Process. Conf., 23, 1646
Kim, J., Kwon Lee, J., & Mu Lee, K. 2016, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1646
Lorimer, D. R., Bailes, M., McLaughlin, M. A., Narkevic, D. J., & Crawford, F. 2007, Science, 318, 777
Meinhardt, T., Moller, M., Hazirbas, C., & Cremers, D. 2017, in Proceedings of the IEEE International Conference on Computer Vision, 1781
Monga, V., Li, Y., & Eldar, Y. C. 2021, IEEE Signal Process. Magazine, 38, 18
Nah, S., Timofte, R., Gu, S., et al. 2019, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1985
Nammour, F., Akhaury, U., Girard, J. N., et al. 2022, A&A, 663, A69
Narayan, R., & Nityananda, R. 1986, ARA&A, 24, 127
Offringa, A., McKinley, B., Hurley-Walker, N., et al. 2014, MNRAS, 444, 606
Pustelnik, N., Benazza-Benhayia, A., Zheng, Y., & Pesquet, J.-C. 2016, Wiley Encyclopedia of Electrical and Electronics Engineering (Hoboken: Wiley)
Rau, U., & Cornwell, T. J. 2011, A&A, 532, A71
Reehorst, E. T., & Schniter, P. 2018, IEEE Trans. Comput. Imaging, 5, 52
Romano, Y., Elad, M., & Milanfar, P. 2017, SIAM J. Imaging Sci., 10, 1804
Scaife, A. 2020, Phil. Trans. R. Soc. A, 378, 20190060
Schmidt, K., Geyer, F., Fröse, S., et al. 2022, A&A, 664, A134
Schwab, F. 1984, AJ, 89, 1076
Smirnov, O. M. 2011, A&A, 527, A106
Sreehari, S., Venkatakrishnan, S. V., Wohlberg, B., et al. 2016, IEEE Trans. Comput. Imaging, 2, 408
Sureau, F., Lechat, A., & Starck, J.-L. 2020, A&A, 641, A67
Swinbank, J. D., Staley, T. D., Molenaar, G. J., et al. 2015, Astron. Comput., 11, 25
Tasse, C., Hugo, B., Mirmont, M., et al. 2018, A&A, 611, A87
Terris, M., Dabbech, A., Tang, C., & Wiaux, Y. 2023, MNRAS, 518, 604
Tuccillo, D., Huertas-Company, M., Decencière, E., et al. 2018, MNRAS, 475, 894
Vafaei Sadr, A., Vos, E. E., Bassett, B. A., et al. 2019, MNRAS, 484, 2793
van Haarlem, M. P., Wise, M. W., Gunst, A., et al. 2013, A&A, 556, A2
Venkatakrishnan, S. V., Bouman, C. A., & Wohlberg, B. 2013, in 2013 IEEE Global Conference on Signal and Information Processing, IEEE, 945
Wang, X., Yu, K., Dong, C., & Change Loy, C. 2018, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 606
Wernecke, S. J. 1977, Rad. Sci., 12, 831
Wiaux, Y., Jacques, L., Puy, G., Scaife, A. M., & Vandergheynst, P. 2009, MNRAS, 395, 1733
Wilson, T. L., Rohlfs, K., & Hüttemeister, S. 2009, Tools of Radio Astronomy, 5th edn. (Berlin: Springer)
Xu, L., Ren, J. S., Liu, C., & Jia, J. 2014, in Advances in Neural Information Processing Systems, eds. Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (USA: Curran Associates, Inc.), 27
Zhang, K., Zuo, W., Chen, Y., Meng, D., & Zhang, L. 2017a, IEEE Trans. Image Process., 26, 3142
Zhang, K., Zuo, W., Gu, S., & Zhang, L. 2017b, in IEEE Conference on Computer Vision and Pattern Recognition, 3929
Zhang, K., Zuo, W., & Zhang, L. 2018, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3262
Zhang, K., Gool, L. V., & Timofte, R. 2020, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3217
All Tables
Table 1 2D-1D Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
Table 2 Deflation Net. C_{r} denotes the operation that center-crops a 2D structure with crop size r.
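The tables do not define C_{r} beyond the captions. As a minimal sketch of such a center-crop operation (the function name, array layout, and usage values are our assumptions, not the paper's), C_{r} keeps the central r × r window of the spatial axes:

```python
import numpy as np

def center_crop(x, r):
    """C_r: center-crop the last two (spatial) axes of x to r x r."""
    h, w = x.shape[-2:]
    top, left = (h - r) // 2, (w - r) // 2
    return x[..., top:top + r, left:left + r]

# Cropping a stack of 32 feature maps of size 10 x 10 down to 6 x 6.
patch = center_crop(np.zeros((32, 10, 10)), 6)  # -> shape (32, 6, 6)
```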
All Figures
Fig. 1 2D Net. Each convolutional layer outputs f feature maps, f = 32 in our study. The kernel size of each convolutional layer is set to 3 × 3.

Fig. 2 1D Net. {z_{t}}_{t∈I} has dimensions f × T × height × width. Each convolutional layer outputs f feature maps, except the last, which outputs images with one channel each. Each 1D convolutional layer has kernel size 5.

Fig. 3 PSF test cube chosen for evaluation. The text in each panel reports: time step, elevation in degrees, azimuth in degrees. Each PSF is normalized to 1 (black). The gray color scale was reversed for clarity. The apparent rotation of the PSF with time is the result of the projection of the interferometer aperture toward the source direction, which depends on the interferometer location, declination of the source, and observing time (see Fig. A.1 for an illustration). 

Fig. 4 Histogram of the distribution of elevations in the training PSF set. The distribution is skewed toward the south celestial pole, where circumpolar sources can be observed from MeerKAT.

Fig. 5 PCA eigenvectors associated with the ten largest eigenvalues, computed over the whole PSF set.

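The caption does not spell out how this PCA is performed. As a minimal sketch under our own assumptions (flattening each PSF image into a vector and taking the covariance eigenvectors from an SVD of the centered stack; function and variable names are ours, not the paper's):

```python
import numpy as np

def psf_pca_components(psfs, n_components=10):
    """Eigenvectors of the n largest eigenvalues of the PSF-set covariance.

    psfs : array of shape (n_psf, H, W)
    Returns components of shape (n_components, H, W) and their eigenvalues.
    """
    n, h, w = psfs.shape
    x = psfs.reshape(n, h * w)
    x = x - x.mean(axis=0)  # center each pixel across the PSF set
    # SVD of the centered data matrix yields the covariance eigenvectors
    # (rows of vt) without forming the (H*W) x (H*W) covariance explicitly.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    eigvals = s**2 / (n - 1)  # singular values come out sorted descending
    return vt[:n_components].reshape(n_components, h, w), eigvals[:n_components]

# Toy usage on a random "PSF" stack of 50 images of 16 x 16 pixels.
rng = np.random.default_rng(0)
comps, vals = psf_pca_components(rng.normal(size=(50, 16, 16)), n_components=10)
```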
Fig. 6 Reconstructions of temporal profiles related to some transient sources in the test cubes. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. The figure compares methods at different noise levels σ_{ϵ}. The names of the sources are, from left to right: source 1, source 2, source 3, source 4, and source 5. 

Fig. 7 Visualization of source 3 in Fig. 6 reconstructed from different methods, at noise level σ_{ϵ} = 4. 

Fig. 8 Reconstructions of temporal profiles of constant sources. The horizontal and vertical axes indicate the time step and normalized amplitude for each subpanel. 

Fig. 9 Constant source in the input degraded sky between time steps 0 and 6, and average of the input skies over all the time steps. First row: noise level σ_{ϵ} = 2. Second row: σ_{ϵ} = 3. The noise level is reduced in the averaged sky. 

Fig. 10 Average variance of reconstructed light curves of constant sources over the test cubes at different values of the noise level σ_{ϵ}. The variances are computed on cubes normalized between 0 and 1. The vertical bars show the standard deviation. 

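A hedged sketch of the metric behind this figure (the exact normalization and averaging are our reading of the caption, not code from the paper): a perfectly reconstructed constant source has a flat light curve, so its temporal variance should vanish.

```python
import numpy as np

def lightcurve_variance(cube, source_pixels):
    """Mean temporal variance of light curves at given pixel positions,
    computed on a cube normalized between 0 and 1 (as in the figure)."""
    lo, hi = cube.min(), cube.max()
    norm = (cube - lo) / (hi - lo)
    curves = np.stack([norm[:, y, x] for y, x in source_pixels])  # (S, T)
    return curves.var(axis=1).mean()  # lower = flatter reconstructed curves

# A cube that is constant in time yields zero variance.
frame = np.arange(64.0).reshape(8, 8)
cube = np.repeat(frame[None], 7, axis=0)         # shape (7, 8, 8)
v = lightcurve_variance(cube, [(2, 3), (5, 5)])  # -> 0.0
```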
Fig. 11 Average source reconstruction performance of 2D-1D Net, Deflation Net, and CLEAN over the test sky cubes with (upper subpanel) a fixed test PSF cube and the noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, sources in the test sky cubes have different input S/N_{s}. These S/N_{s} values can be divided into ten deciles. This figure visualizes NRMSE_{S} averaged over sources in each decile. The black whiskers indicate the standard deviations.

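The decile averaging described in the caption can be sketched as follows (a toy implementation under our assumptions; the per-source NRMSE_{S} and input S/N values are taken as precomputed arrays, and the function name is ours):

```python
import numpy as np

def nrmse_by_snr_decile(errors, snr):
    """Average a per-source error metric within ten input-S/N deciles.

    errors, snr : 1D arrays with one entry per source.
    Returns (decile edges, per-decile mean, per-decile std).
    """
    edges = np.quantile(snr, np.linspace(0.0, 1.0, 11))
    idx = np.clip(np.digitize(snr, edges[1:-1]), 0, 9)  # decile index per source
    means = np.array([errors[idx == d].mean() for d in range(10)])
    stds = np.array([errors[idx == d].std() for d in range(10)])
    return edges, means, stds

# Toy usage with constant errors: every decile averages to 1.
snr = np.arange(100.0)
edges, means, stds = nrmse_by_snr_decile(np.ones(100), snr)
```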
Fig. 12 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation is higher when RMSE_{noise} is small, which is the case for the DL-based methods.

Fig. 13 Mean NRMSE_{S} for sources in the test set in ten equally spaced bins of PSF pointing elevations. The envelope for each method delimits the 10th and 90th percentiles.

Fig. 14 Reconstructions of temporal profiles of source 3 and source 5 (also shown in Fig. 6) at different noise levels σ_{ϵ}, from the normalized test cubes. 

Fig. 15 Visualization of source 5 in Fig. 14 reconstructed from different methods, at noise level σ_{ϵ} = 2. 

Fig. 16 Average source reconstruction performance of 2D-1D Net, 2D Net, and CLEAN, demonstrating the importance of temporal modeling. These methods are evaluated on the test sky cubes with (upper subpanel) a fixed test PSF cube and the noise levels σ_{ϵ} ∈ {0, 1, 2, 3, 4, 5, 6}; (lower subpanel) a fixed noise level σ_{ϵ} = 3 and the 50 test PSF cubes. Across all of these inferences, sources in the test sky cubes have different input S/N_{s}. These S/N_{s} values can be divided into ten deciles. This figure visualizes NRMSE_{S} averaged over sources in each decile. The black whiskers indicate standard deviations.

Fig. 17 log(RMSE_{noise}) averaged over the test cubes at different values of σ_{ϵ}. The vertical bars show the standard deviation. Because of the log scale, the standard deviation is higher when RMSE_{noise} is small, which is the case for the DL-based methods.

Fig. A.1 Astrophysical source associated with a unique pair of coordinates (declination δ and right ascension α) on the celestial sphere. Figure adapted from Astronomy (2009). 

Fig. B.1 Average PSF over the PSF cube shown in Fig. 3.
