The ROAD to discovery: Machine-learning-driven anomaly detection in radio astronomy spectrograms

M. Mesarcik; A. J. Boonstra; M. Iacobelli; E. Ranguelova; C. T. A. M. de Laat; R. V. van Nieuwpoort

doi:10.1051/0004-6361/202347182

Volume 680 (December 2023)

A&A, 680 (2023) A74

Issue: A&A Volume 680, December 2023
Article Number: A74
Number of page(s): 12
Section: Astronomical instrumentation
DOI: https://doi.org/10.1051/0004-6361/202347182
Published online: 08 December 2023

M. Mesarcik¹, A. J. Boonstra³, M. Iacobelli³, E. Ranguelova², C. T. A. M. de Laat¹ and R. V. van Nieuwpoort¹,²

¹ Informatics Institute, University of Amsterdam, Science Park 900, 1098 XH Amsterdam, The Netherlands
e-mail: m.mesarcik@uva.nl
² Netherlands eScience Center, Science Park 402, 1098 XH Amsterdam, The Netherlands
³ ASTRON, the Netherlands Institute for Radio Astronomy, Oude Hoogeveensedijk 4, 7991 PD Dwingeloo, The Netherlands

Received: 14 June 2023
Accepted: 29 August 2023

Abstract

Context. As radio telescopes increase in sensitivity and flexibility, so do their complexity and data rates. For this reason, automated system health management approaches are becoming increasingly critical to ensure nominal telescope operations.

Aims. We propose a new machine-learning anomaly detection framework that both classifies commonly occurring anomalies in radio telescopes and detects unknown rare anomalies that the system has potentially not yet seen. To evaluate our method, we present a dataset consisting of 6708 autocorrelation-based spectrograms from the Low Frequency Array (LOFAR) telescope and assign ten different labels relating to the system-wide anomalies from the perspective of telescope operators. These include electronic failures, miscalibration, solar storms, and network and compute hardware errors, among many others.

Methods. We demonstrate how a novel self-supervised learning (SSL) paradigm that utilises both context prediction and reconstruction losses is effective in learning normal behaviour of the LOFAR telescope. We present the Radio Observatory Anomaly Detector (ROAD), a framework that combines SSL-based anomaly detection with supervised classification, thereby enabling both classification of commonly occurring anomalies and detection of unseen anomalies.

Results. We demonstrate that our system works in real time in the context of the LOFAR data processing pipeline, requiring <1 ms to process a single spectrogram. Furthermore, ROAD obtains an anomaly detection F-2 score of 0.92 while maintaining a false positive rate of 2%, as well as a mean per-class classification F-2 score of 0.89, outperforming other related works.

Key words: telescopes / instrumentation: interferometers / methods: data analysis

© The Authors 2023

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article is published in open access under the Subscribe to Open model.

1 Introduction

Radio telescopes are getting bigger and generating increasing amounts of data to improve their sensitivity and resolution (Norris 2010; van Haarlem et al. 2013; Foley et al. 2016; Nan et al. 2011). The growing system size and resulting complexity increase the likelihood of unexpected events occurring, thereby resulting in datasets that contain anomalies. These anomalies include failures in instrument electronics, miscalibrated observations, environmental events such as lightning, astronomical effects including solar storms, and problems in data processing systems, among many others. We consider radio frequency interference (RFI) unavoidable and therefore do not consider it an anomaly in this context. Currently, efforts to detect and mitigate these anomalies are performed by human operators, who manually inspect intermediate data products to determine the success or failure of a given observation. The accelerating data rates, coupled with the lack of automation, make operator-based data quality inspection increasingly infeasible (Mesarcik et al. 2020).

In the context of low-frequency radio astronomy, scientific data processing has been successfully automated by running complex workflows that perform calibration and imaging of interferometric data (de Gasperin et al. 2019; Weeren et al. 2016; Tasse et al. 2018; Wijnholds et al. 2010), radio frequency interference (RFI) mitigation (Offringa et al. 2010), and de-dispersion (Barsdell et al. 2012; Bassa et al. 2022) of time-domain data among many more. Additionally, continuous effort is being made to create high-performance real-time algorithms to improve the quality and reliability of the scientific data (Sclocco et al. 2019, 2016; van Nieuwpoort & Romein 2011; La Plante et al. 2021; Broekema et al. 2018). However, as of yet, there have been no attempts to fully automate the system health management (SHM) pipeline, and by virtue of the lack of work on this topic, no real-time implementations exist. This is in part due to the complexity of the challenge as well as the unavailability of SHM-specific datasets. Furthermore, the successes of SHM-based anomaly detection systems have been extremely impactful in fields ranging from industrial manufacturing (Bergmann et al. 2019a) to spacecraft system health (Baireddy et al. 2021; Spirkovska et al. 2010), thereby motivating this study.

The exponential growth of data production from modern instruments has made data-driven techniques and machine learning appealing to astronomers and telescope operators. However, efforts in machine-learning-based anomaly detection are concentrated in scientific discovery rather than SHM, with approaches ranging from detecting unusual galaxy morphologies (Margalef-Bentabol et al. 2020; Storey-Fisher et al. 2021) to identifying new transients (Villar et al. 2021; Lochner & Bassett 2021; Ma et al. 2023). Unfortunately, these techniques are not directly applicable to the multi-station autocorrelation-based spectrographic data obtained from radio observatories, due to increased data complexity, a high dynamic range caused by RFI, varying observation durations and frequency ranges, as well as the feature compounding problem (Mesarcik et al. 2020). It must be noted that this work makes use of up-stream data products in the form of spectrograms, which are produced by all radio telescopes, thereby enabling its applicability to other instruments.

The SHM anomaly detection problem differs from existing work for several reasons. Firstly, the data inspection performed by telescope operators involves analysing both known and unknown anomalies; where known anomalies should be classified into their respective classes and unknown anomalies should be differentiated from all other existing classes. This is in contrast with typical anomaly detection, which is normally posed as a one-class-classification problem. Furthermore, we find that class imbalance not only exists between the normal and anomalous classes (which is common for anomaly detection), but there is also strong imbalance between the anomalous classes. For these reasons, we propose a new framework for detecting and classifying SHM-based anomalies, that is capable of distinguishing both regularly occurring and rare events.

We find the multi-class classification approach more appropriate as it gives more flexibility to telescope operators. This is because the anomalousness of particular events entirely depends on the context of the science goals of an observation. For example, in observations relating to the epoch of reionisation (EoR; Yatawatta et al. 2013), the signal-to-noise ratio (S/N) is a huge concern, and as such any high-power anomalies such as solar storms should be identified and removed from the data. In contrast, in solar-physics-based observations by Vocks et al. (2018), the high-power solar events should implicitly be kept within the data and should not be flagged as anomalous. Therefore, by including a classification step within the anomaly detector system we offer greater flexibility to telescope operators in data quality inspection. Furthermore, we consider classification to be the first step in anomaly mitigation; for example, given the detection of a data loss event, a telescope operator may need to reestablish a network connection.

Fundamentally, all anomaly detection approaches rely on learning representations of normal data and then measuring some difference between the learnt representations of normal and anomalous data (Chandola et al. 2009). Recent developments in machine-learning leverage pre-trained networks by fine-tuning them on specific classes of anomaly detection datasets (Roth et al. 2021; Reiss & Hoshen 2021; Tack et al. 2020). However, we show that it is not possible to directly apply these pretrained networks to astronomical data due to large differences compared to the natural images used for pretraining since these spectrograms are in the time-frequency domain. This being said, efforts have been made in pretraining paradigms for astronomical data (Hayat et al. 2021; Walmsley et al. 2022); however, similarly to anomaly detection applied to astronomy, these methods are implemented with imaged galaxy data and not the dynamic spectra necessary for SHM. For this reason, we propose a new self-supervised learning paradigm that combines context prediction and reconstruction error (Doersch et al. 2015) as a learning objective and show that it is effective in learning robust representations of non-anomalous time-frequency data.

With this work, we make the following contributions: (1) a new dataset consisting of 6708 manually labelled autocorrelation-based spectrograms covering ten different feature classes; (2) a generic self-supervised learning (SSL) framework that is effective in learning representations of time-frequency data with a high dynamic range; (3) a generic anomaly detection framework that can classify both commonly occurring known anomalies and detect unknown anomalies with a high precision; and (4) real-time performance for LOFAR with our implementation. This paper continues with an analysis of existing literature concerning anomaly detection in astronomy in Sect. 2, and Sect. 3 documents our data selection strategy and outlines the labelling process used for evaluation of this work. In Sect. 4, we show the proposed SSL and anomaly detection frameworks. Finally, our results and conclusions are documented in Sects. 5 and 6.

2 Related work

Recent works that apply machine-learning-based anomaly detection to astronomy have so far focused only on scientific discovery, using galaxy images, transient signals or light curves. In this work, we apply machine learning-based anomaly detection to autocorrelation-based spectrograms obtained from the LOFAR telescope. This section unpacks the current landscape of machine-learning-based anomaly detection and the recent developments in applying it to astronomy-related fields.

2.1 Machine-learning-based anomaly detection

Machine-learning-based anomaly detection relies on modelling normal data and then classifying abnormality by using a discriminative distance measure between the normal training data and anomalous samples (Chandola et al. 2009). Autoencoding models are a popular approach for learning latent distributions of normal data (Bergmann et al. 2019a,b; Pidhorskyi et al. 2018; An & Sungzoon 2015). Anomaly detection using autoencoders can be performed either in the latent space using techniques such as one-class support vector machines (OC-SVM; Schölkopf et al. 1999), k-nearest-neighbours (KNN; Bergman et al. 2020), isolation forest (IF; Liu et al. 2008), or the reconstruction error (Mesarcik et al. 2022b). The use of pretrained networks to obtain latent representations of normal data has also been successful in anomaly detection (Bergman et al. 2020; Reiss & Hoshen 2021; Roth et al. 2021). By first training these models on an objective such as ImageNet classification (Fei-Fei et al. 2010), they are able to generalise to other tasks such as anomaly detection. Additionally, self-supervised learning (SSL) has been shown to be invaluable for finding meaningful representations of normal data (Yi & Yoon 2021; Li et al. 2021; Tack et al. 2020). Here, pretext tasks, which allow the model to learn useful feature representations or model weights that can then be used for other (downstream) tasks, are defined as learning objectives on the normal data such that the model can be fine-tuned for the downstream task of anomaly detection. In both the SSL and pretrained cases, KNN-based measures can be used to distinguish anomalous samples from the normal training data (Bergman et al. 2020; Yi & Yoon 2021).
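To make the KNN-based measure concrete, the following is a minimal sketch of latent-space scoring, assuming embeddings have already been produced by some encoder. The function name `knn_anomaly_score`, the choice of `k = 5`, the Euclidean metric, and the synthetic Gaussian embeddings are all illustrative assumptions and are not taken from the cited works.

```python
import numpy as np

def knn_anomaly_score(train_latents, test_latents, k=5):
    """Mean Euclidean distance from each test embedding to its k nearest
    embeddings in the (normal-only) training set. Larger = more anomalous."""
    # Pairwise differences, shape (n_test, n_train, latent_dim)
    diffs = test_latents[:, None, :] - train_latents[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep the k smallest distances for every test sample
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Synthetic stand-ins for encoder outputs (assumption: 8-dimensional latents)
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(200, 8))
normal_test = rng.normal(0.0, 1.0, size=(5, 8))
anomalous_test = rng.normal(6.0, 1.0, size=(5, 8))  # shifted cluster

scores_normal = knn_anomaly_score(normal_train, normal_test)
scores_anomalous = knn_anomaly_score(normal_train, anomalous_test)
```

A threshold on the score then separates normal from anomalous samples; how that threshold is chosen is a separate design decision that this sketch does not address.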

In most machine-learning-based anomaly detection, performance is evaluated according to the single-inlier-multiple-outlier (SIMO) or multiple-inlier-single-outlier (MISO; Burlina et al. 2019) settings on natural image datasets such as MVTecAD (Bergmann et al. 2019a). With this paradigm in mind, we find that anomaly detection in the radio astronomical context is a multiple-inlier-multiple-outlier (MIMO) problem. In effect, anomaly detection formulations that make a strong assumption about the number of inliers or outliers are not directly applicable to the radio observatory setting due to the increased problem complexity. Furthermore, we find methods that rely on pretraining with natural images to be ill-suited to the spectrograms used in this work, due to differences in dynamic range and S/N, as shown by Mesarcik et al. (2022a).

Efforts have been made to detect anomalies in light curves and spectra in works such as Astronomaly (Lochner & Bassett 2021) and in transients in Malanchev et al. (2021). Astronomaly is an active learning framework developed for the classification of unusual events in imaged data or light curves at observatories to aid scientific discovery. This being said, it closely follows generic anomaly detection methods, where normal data are first projected to a latent representation and metrics such as IF are used to distinguish normal training samples from anomalous testing samples at inference time. Although Astronomaly assumes a MIMO context, it is still only able to detect unknown anomalies (or at least says all anomalies belong to the same class). This is in contrast with our work, where ROAD is capable of both distinguishing all known anomaly classes with a high level of precision and detecting unknown or rare anomalies.

Deep generative neural networks are also used for anomaly detection. Works by Villar et al. (2021), Mesarcik et al. (2020), and Ma et al. (2023) have shown that variational autoencoders (VAEs) can be used for anomaly detection with astronomical data, whereas Margalef-Bentabol et al. (2020) and Storey-Fisher et al. (2021) show that generative adversarial networks (GANs) are effective in learning representations of normal images of galaxies, thereby enabling reconstruction-error-based anomaly detection. In work by Zhang et al. (2018), GANs have also been shown to be effective in the Search for Extraterrestrial Intelligence (SETI) anomaly detection context. However, we find that our SSL method is more stable during training and better suited to anomaly detection using time-frequency data that have a high dynamic range, ∈ [1, 100] dB, and a low S/N for cross-polarised features in the ‘xy’ and ‘yx’ Stokes parameters such as the Galactic plane.

2.2 Representation learning in astronomy

As already mentioned, learning representations of high-dimensional data is essential to the anomaly detection problem. For this reason, among many others, tremendous effort has been made to find methods that learn robust projections of high-dimensional data (He et al. 2022; Chen et al. 2020; Grill et al. 2020; Doersch et al. 2015). These successes have materialised in the astronomical community with results mostly in the galaxy classification domain. Walmsley et al. (2022) showed that pretraining on the Galaxy Zoo DECaLS (Walmsley et al. 2021) dramatically improves model performance for several downstream tasks. Furthermore, Hayat et al. (2021) showed how contrastive learning can be applied to galaxy photometry from the Sloan Digital Sky Survey (SDSS; Gunn et al. 1998). The authors show that with novel data augmentations, they can achieve state-of-the-art results on several downstream tasks. In addition, several additions and modifications have been made to the reconstruction-error-based loss functions of autoencoders. Mesarcik et al. (2020) showed how using both magnitude and phase information in VAEs improves performance of finding representations of astronomical data, whereas Villar et al. (2021) used a recurrent adaptation of a VAE to make training more suitable to light-curve data. Similarly, Melchior et al. (2023) showed how the inclusion of self-attention mechanisms and redshift priors into the latent projection of autoencoders can improve the learnt representations of galaxy spectra.

In this work, we demonstrate that by using a simple adaption of a context-prediction self-supervised loss (Doersch et al. 2015) we effectively learn robust representations of spectrograms from the LOFAR telescope. Our Radio Observatory Anomaly Detector (ROAD) outperforms existing autoencoding models by a large margin on anomaly detection benchmarks.

2.3 Real-time scientific data processing

To cope with the increasing data rates from modern scientific instruments (Norris 2010; van Haarlem et al. 2013; Nan et al. 2011; La Plante et al. 2021), real-time algorithms have been developed for scientific data pipelines. Real-time methods for RFI detection (Sclocco et al. 2016, 2019; Morello et al. 2021), calibration (Prasad et al. 2014), fast radio burst (FRB) detection (Connor & van Leeuwen 2018), and correlation (van Nieuwpoort & Romein 2011; Romein et al. 2010) have been essential to modern radio telescope operations. However, very few machine learning techniques have been shown to be effective in real time. In a seminal work by George & Huerta (2018), machine learning gravitational wave detection algorithms were implemented in real time. Furthermore, Muthukrishna et al. (2022) showed that temporal convolutional networks (TCNs) can be implemented to detect transient anomalies in real time. To demonstrate the effectiveness of our work in the context of radio observatories, we investigated the computational performance and throughput of the proposed system. We show that our system operates in real time in the context of the LOFAR telescope data processing pipeline.

3 Dataset

We created a new dataset for anomaly detection in radio observatories and document the data selection, preprocessing and labelling strategy used in this section. Applying machine learning to radio astronomical datasets poses a significant challenge, particularly when using time-frequency data. Methods for preprocessing and data selection need to be carefully considered, due to issues such as high-dynamic range (due to RFI among other events), combining thousands of stations for a single observation with complex-valued data and multiple polarisations, feature compounding, and many more. An additional challenge with applying machine learning to radio astronomy is the lack of labelled time-frequency datasets from radio telescopes as well as the availability of expert knowledge and the cost associated with creating a dataset.

3.1 Observation selection and preprocessing

The ROAD dataset is made up of observations from the Low Frequency Array (LOFAR) telescope (van Haarlem et al. 2013). LOFAR consists of 52 stations across Europe, where each station is an array of 96 dual-polarisation low-band antennas (LBA) in the 10–90 MHz range and 48 or 96 dual-polarisation high-band antennas (HBA) in the 110–250 MHz range. The signals received by each antenna are coherently added in the station beamformer, resulting in each sub-band being approximately 200 kHz wide. These signals are then transported to the central processor to be correlated with a minimum channel width of about 0.7 kHz. This data product is referred to as a visibility and is the data representation used in this work. In contrast, other radio astronomical use-cases where machine-learning-based anomaly detection has been applied (such as detecting unusual galaxy morphologies) use an additional calibration step as well as a 2D Fourier transform and gridding to obtain sky maps.

The visibility data are four dimensional, with the dimensions corresponding to time, frequency, polarisation, and station. Different science cases result in different observing setups, which dictate the array configuration (i.e. the number of stations used), the number of frequency channels (N_f), the time sampling, as well as the overall integration time (N_t) of the observing session. Furthermore, the dual-polarisation of the antennas results in a correlation product (N_pol) of size 4. In this work, we only made use of the autocorrelations produced by LOFAR. We did this to minimise the labelling overhead and data size, as well as to simplify the potential feature compounding problem (Mesarcik et al. 2020).

As already mentioned, the required resolution of modern instruments causes the data products to be relatively large. The data size of an observation that consists only of autocorrelations is given by N_auto = N_t N_f N_st N_pol N_bits, where N_st is the number of stations. This means that a 10-h observation with a 1-s integration time, a 1 kHz channel resolution with a 50 MHz bandwidth, and a 32-bit resolution can result in observation sizes of the order of several terabytes. As this is orders of magnitude larger than the amount of memory available on modern GPUs that are used for training machine-learning algorithms, the data are sub-sampled in time and frequency according to Mesarcik et al. (2020) to result in observations of the order of one gigabyte.
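The size estimate can be checked with a short calculation. The sketch below evaluates N_auto for the observing setup quoted in the text; the station count of 52 (all LOFAR stations participating) is an assumption made here for illustration, as is treating each sample as a single 32-bit value, so the result should be read as an order-of-magnitude figure only.

```python
def autocorrelation_size_bytes(n_t, n_f, n_st, n_pol, n_bits):
    """N_auto = N_t * N_f * N_st * N_pol * N_bits, converted from bits to bytes."""
    return n_t * n_f * n_st * n_pol * n_bits // 8

n_t = 10 * 3600             # 10 h observation at 1 s integration time
n_f = 50_000_000 // 1_000   # 50 MHz bandwidth at 1 kHz channel resolution
n_st = 52                   # assumption: every LOFAR station takes part
n_pol = 4                   # correlation products from dual polarisation
n_bits = 32                 # bit resolution per sample

size_bytes = autocorrelation_size_bytes(n_t, n_f, n_st, n_pol, n_bits)
size_terabytes = size_bytes / 1e12
print(f"{size_terabytes:.2f} TB")
```

Under these assumptions the observation lands at the terabyte scale, which is roughly three orders of magnitude above the one-gigabyte target reached after sub-sampling.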

Deep learning architectures typically require equally sized inputs; however, LOFAR observations can have a varying number of time samples and/or frequency bands. Therefore, additional resizing of the intermediate visibilities is done by resizing all observations to (256, 256) bins in time and frequency. This means that observations with fewer than 256 time samples are interpolated and those with more are down-sampled. Furthermore, as the autocorrelations contain no phase information, we only used the magnitude component of each spectrogram.
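As an illustration of this resizing step, the sketch below maps magnitude spectrograms of arbitrary shape onto a (256, 256) grid using separable linear interpolation. The interpolation scheme, the helper names `resize_axis` and `resize_spectrogram`, and the example input shapes are assumptions; the text does not specify which resampling method was used.

```python
import numpy as np

def resize_axis(data, new_len, axis):
    """Linearly resample `data` along one axis to `new_len` samples."""
    old_pos = np.linspace(0.0, 1.0, data.shape[axis])
    new_pos = np.linspace(0.0, 1.0, new_len)
    return np.apply_along_axis(
        lambda x: np.interp(new_pos, old_pos, x), axis, data
    )

def resize_spectrogram(magnitude, size=(256, 256)):
    """Resample a (time, frequency) magnitude spectrogram to a fixed size."""
    out = resize_axis(magnitude, size[0], axis=0)  # time axis
    return resize_axis(out, size[1], axis=1)       # frequency axis

rng = np.random.default_rng(0)
# Hypothetical observations: one shorter and one longer than 256 time samples
short_obs = rng.random((100, 64)) + 1j * rng.random((100, 64))
long_obs = rng.random((1000, 512)) + 1j * rng.random((1000, 512))

# Only the magnitude is kept before resizing
resized_short = resize_spectrogram(np.abs(short_obs))
resized_long = resize_spectrogram(np.abs(long_obs))
```

Plain linear interpolation does not average neighbouring bins when down-sampling, so narrow features in long observations can be attenuated or missed; this is one way the morphology changes discussed next can arise.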

It must be noted that this processing does modify the morphologies of certain features, particularly those present with a low time resolution. However, as this preprocessing step is consistent across all spectrograms, the overall effects on the anomaly detector and classifier are negligible. In future work, we plan to associate the labels with the full resolution LOFAR data from the Long Term Archive (LTA)¹ and apply it to (256, 256) crops of the full resolution spectrograms.

We selected 110 observations from the LOFAR LTA comprised of a broad set of science use cases and the corresponding observing setups. Of the selected observations, we used the autocorrelations from 2431 LBA stations and 4277 HBA stations from an observation period between 2019 and 2022.

Table 1

Categorisation of data processing, electronic, astronomical, and environmental anomalies in the ROAD dataset.

3.2 Labelling methodology

The ROAD dataset contains ten classes that describe various system-wide phenomena and anomalies from data obtained by the LOFAR telescope. These classes are categorised into four groups: data processing system failures, electronic anomalies, environmental effects, and unwanted astronomical events. Table 1 shows the classes used as well as the description of events and the band and polarisation in which they occur. We note that the term ‘anomaly’ is used liberally in this context, since low-power effects (that are only present in the cross polarisations) such as the Galactic plane passing through an observation are somewhat unavoidable. Nonetheless, for observations with extremely low S/N, such as The Epoch of Re-ionisation of the Universe (EoR; Yatawatta et al. 2013), the Galactic foreground signals need to be identified and removed. For this reason, we include such events in the ROAD dataset. Furthermore, we do not consider classes that track the systematic corruptions caused by ionospheric disturbances. This is because the ROAD dataset was created using data from the period of the minimum of the past solar cycle. Thus, the statistics for corruption effects such as scintillation are poorly represented in high-band and low-band data (although low-band data tracks these events better due to the frequency dependence of the signals). In future work, we plan to extend the dataset to include classes relating to more ionospheric disturbances.

Our labelling approach took into consideration anomalies that occurred at both the station and observation levels. For example, events such as lightning storms and high-noise events can look fairly similar, especially in the down-sampled context. However, lightning storms are geographically bound to affect all stations in a certain region and therefore occur only at the station level. Additionally, lightning is highly correlated across stations in time, with minimal delay between the recorded events in each station. On the other hand, high-noise events usually affect only a single antenna at a time with no time dependency between antennas and stations. By this logic, all stations bound to the same geographic location with broad-band high-power events across all polarisations that are correlated in time were considered to be corrupted by lightning storms, whereas individually affected stations were labelled as high-noise events.

We make a distinction between first- and second-order events; for example, the first-order data-loss event corresponds to dropped information from consecutive time samples and/or frequency bands, and second order is for a single time sample or frequency band. We find this a useful distinction as the root cause of these events is different. In the case of first-order data-loss events, the problem can be traced to the correlator pipeline, whereas the second-order events are most likely from conversion overflows due to strong RFI. Additionally, we note some overlap between class labels; for example, it is common for a high-power noise event to trigger instability in an amplifier causing it to oscillate. However, the precise point of transition is often hard to find to distinguish these events from each other.

We labelled the dataset using LOFAR observations that were down-sampled and preprocessed as described in Sect. 3.1. We made multiple train-test splits during our experimentation to ensure consistent performance across models. Furthermore, the ROAD dataset is publicly available². The file is in the hdf5 format and consists of fields corresponding to the raw data, labels, frequency band information, station name, and source observation. Figure 1 illustrates all classes labelled in the available dataset.

3.3 Class imbalance

Due to the nature of anomaly detection, the number of normal samples greatly outnumbers anomalous ones. In the case of the ROAD dataset, and the LOFAR telescope more generally, we find there is a class imbalance not only between normal and anomalous classes, but also among the anomalous classes. For example, commonly occurring astronomical signals, such as the Galactic plane, are far better represented in the observations than unlikely events such as the amplifiers oscillating. Practically, this means that when we separate the samples into testing and training sets we also need to maintain the same occurrence rates with respect to the rates in the original dataset. We effectively down-sampled the testing data such that the occurrence rate (shown in the second-to-last column of Table 1) is maintained for evaluation. This means that each model needs to be tested multiple times with new samples taken from the testing pool of anomalous samples to effectively evaluate its performance. We evaluated each model ten times with different random seeds to ensure accurate reporting.
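The repeated sub-sampling described above can be sketched as follows. The class names, pool sizes, and occurrence rates in this example are hypothetical placeholders and are not the values from Table 1; the helper `subsample_anomalies` is likewise an illustrative assumption about how such a draw could be implemented.

```python
import random

def subsample_anomalies(n_normal, pool_by_class, rates, seed):
    """Draw anomalous test samples so that each class makes up its target
    fraction (`rates`) of the final test set, given `n_normal` normal samples."""
    rng = random.Random(seed)
    # Total test-set size implied by the normal count and the anomalous rates
    total = n_normal / (1.0 - sum(rates.values()))
    sample = {}
    for name, pool in pool_by_class.items():
        n_class = min(len(pool), round(rates[name] * total))
        sample[name] = rng.sample(pool, n_class)
    return sample

# Hypothetical testing pool of sample indices per anomalous class
pool = {
    "galactic_plane": list(range(0, 300)),
    "oscillating_tile": list(range(300, 340)),
}
# Hypothetical occurrence rates (fractions of the whole test set)
rates = {"galactic_plane": 0.08, "oscillating_tile": 0.02}

# Ten evaluation rounds, each with a different random seed
draws = [subsample_anomalies(900, pool, rates, seed) for seed in range(10)]
```

Each round keeps the class proportions fixed while changing which anomalous samples are evaluated, so the spread of a metric across the ten rounds gives a sense of its sensitivity to the particular draw.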

3.4 Radio frequency interference considerations

As previously mentioned, we consider RFI to be unavoidable, and thus we deem it a normal class. A key problem with using the RFI masks associated with the spectrograms from LOFAR (as done in Mesarcik et al. 2022a) is that they are generated using AOFlagger. AOFlagger is a SumThreshold-based algorithm that indiscriminately flags all high-power events as interference. In the context of this work, such an approach would result in many of the high-power anomalies such as lightning, solar storms, high-noise events, and oscillating tiles being flagged as RFI. In effect, if we were to blank the RFI before training our models, we would likely remove many of these high-power features, thereby decreasing the efficacy of the model. This would in turn make our end goal of mitigating the anomalies more difficult to achieve, since different classes should lead to different actions by the telescope operators. For this reason, we did not use the RFI masks associated with the spectrograms.

4 The radio observatory anomaly detector

As outlined in preceding sections, ROAD is designed to detect previously unseen system behaviours and classify known anomalies observed by the LOFAR telescope. To accommodate these requirements, we find it necessary to combine two approaches: supervised classification and self-supervised anomaly detection. This section outlines the motivations and design decisions made for the implementation of ROAD.

4.1 Problem formulation

Given the ith spectrogram V_i(v, τ, b, p) from the dataset and model m with parameters θ_m, we would like to predict whether an anomaly is present and which class it belongs to, if it is a known event, such that,

$m_{\theta_m}\left(V_i(v, \tau, b, p)\right) = \begin{cases} 0, & \text{if normal} \\ [1, N], & \text{if known anomaly} \\ N + 1, & \text{if unknown anomaly} \end{cases}$ (1)

where v, τ, b, and p are the indexes corresponding to frequency band, time sample, baseline, and polarisation, respectively, and N is the number of known anomaly classes. Supervised approaches assume that each class is represented in the training set and try to minimise the following loss function:

$\mathcal{L}_{\mathrm{sup}} = \min_{\theta_m} \sum_i \mathcal{H}\left(m_{\theta_m}\left(V_i(v, \tau, b, p)\right), l_i\right),$ (2)

where ℋ is an entropy-based measure of similarity and l is the encoded vector of labels corresponding to the contents of V. During inference, the supervised classifier produces an estimate of which classes are most probable in a given spectrogram, and the ‘argmax’ function selects the most likely classification. However, as illustrated by the results in Sect. 5, the performance of such a supervised classifier severely deteriorates when exposed to unseen or out-of-distribution (OOD) classes during testing. To remedy this, we disentangled the two model objectives; namely, we used a supervised classifier to identify the known classes present in the training set and a self-supervised anomaly detector to detect unseen anomalies.
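To illustrate how two disentangled models could jointly produce the output of Eq. (1), the sketch below applies one possible decision rule. The rule itself, the function name `combine_predictions`, and the scalar anomaly score with a fixed threshold are assumptions made for illustration; this section does not specify how the two outputs are merged.

```python
def combine_predictions(class_probs, anomaly_score, threshold):
    """Map classifier and detector outputs onto the labels of Eq. (1).

    class_probs   -- probabilities for [normal, known class 1, ..., known class N]
    anomaly_score -- scalar score from a separate anomaly detector
    threshold     -- scores above this value are treated as anomalous
    Returns 0 (normal), 1..N (known anomaly), or N + 1 (unknown anomaly).
    """
    n_known = len(class_probs) - 1
    # argmax over the supervised classifier output
    predicted = max(range(len(class_probs)), key=class_probs.__getitem__)
    if predicted != 0:
        return predicted          # classifier recognises a known anomaly
    if anomaly_score > threshold:
        return n_known + 1        # looks normal to the classifier, but is atypical
    return 0                      # both models agree the sample is normal

label_normal = combine_predictions([0.90, 0.05, 0.05], anomaly_score=0.1, threshold=0.5)
label_known = combine_predictions([0.10, 0.70, 0.20], anomaly_score=0.9, threshold=0.5)
label_unknown = combine_predictions([0.80, 0.10, 0.10], anomaly_score=0.9, threshold=0.5)
```

The unknown label only appears when the classifier falls back to the normal class while the detector disagrees, which is the situation in which a purely supervised model would fail silently.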

4.2 Self-supervised representation learning

Self-supervised-learning (SSL) methods learn useful feature representations by training on secondary objectives called ‘pretext tasks’, so that once trained, the model weights can be utilised for downstream applications. We define two pretext tasks that allow the model to learn useful representations for anomaly detection in astronomical data: context prediction and reconstruction error. Context prediction is a pretext task that makes a model classify the positional relationship between two patches taken from the same image. The two patches are projected to some latent representations, z₀ and z₁, using a backbone network, f, while keeping track of their position label, c, on a 3 × 3 grid as proposed by Doersch et al. (2015). Then, using g, a two-layer multi-layer perceptron (MLP), we classify the positional relationship from the latent representations, as given by

$\mathcal{L}_{\mathrm{con}} = \sum_i \sum_j \mathcal{H}\left(g\left(z_{i,j,0}, z_{i,j,1}\right), c_j\right),$ (3)

where i corresponds to the index of each spectrogram, j is the index of each context-prediction pair in a single spectrogram, and c_j is the positional label. Additionally, to ensure the model does not learn positional relationships based purely on the bordering values of each patch, we augment each neighbour in the training process. In the implementation, we randomly crop the patches to between 75% and 100% of their original size and then resize them to their original dimensions. We illustrate the context prediction loss and patch selection in Fig. 2.
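The patch-pair selection and crop-and-resize augmentation described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the published implementation: the function names, the square single-polarisation input, the nearest-neighbour resizing, and the choice to augment only the neighbouring patch are all hypothetical.

```python
import numpy as np

# (row, col) offsets of the 8 neighbours around the centre of a 3 x 3 grid;
# the index into this list serves as the positional label c (assumed encoding).
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                     (0, 1), (1, -1), (1, 0), (1, 1)]

def crop_and_resize(patch, rng, min_frac=0.75):
    """Randomly crop a square patch to 75-100% of its size, then resize it back."""
    size = patch.shape[0]
    crop = max(1, int(round(rng.uniform(min_frac, 1.0) * size)))
    top = rng.integers(0, size - crop + 1)
    left = rng.integers(0, size - crop + 1)
    cropped = patch[top:top + crop, left:left + crop]
    # Nearest-neighbour resize back to (size, size); an assumed interpolation.
    idx = np.arange(size) * crop // size
    return cropped[np.ix_(idx, idx)]

def sample_pair(spectrogram, patch_size, rng):
    """Return (centre patch, augmented neighbour patch, positional label)."""
    rows = spectrogram.shape[0] // patch_size
    cols = spectrogram.shape[1] // patch_size
    # The centre must have all 8 neighbours, so edge rows and columns are skipped.
    r = rng.integers(1, rows - 1)
    c = rng.integers(1, cols - 1)
    label = int(rng.integers(0, len(NEIGHBOUR_OFFSETS)))
    dr, dc = NEIGHBOUR_OFFSETS[label]

    def grab(rr, cc):
        return spectrogram[rr * patch_size:(rr + 1) * patch_size,
                           cc * patch_size:(cc + 1) * patch_size]

    return grab(r, c), crop_and_resize(grab(r + dr, c + dc), rng), label
```

In a training loop, the two patches would be passed through the backbone f and the label used as the target of the position classifier g.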

Furthermore, to enforce consistency across the representations of similar-looking patches, we use reconstruction error. Reconstruction error maintains consistency by ensuring that two patches with common features in visibility space should occupy nearby locations in the latent space and therefore should be reconstructed similarly. The reconstruction loss is given by

$\mathcal{L}_{\mathrm{recon}} = \sum_i \sum_j \left|V_{i,j,0} - d\left(z_{i,j,0}\right)\right| + \left|V_{i,j,1} - d\left(z_{i,j,1}\right)\right|,$ (4)

where d is a de-convolutional decoder that should have significantly fewer parameters than the backbone network f. We do this to ensure that the model has more capacity to learn suitable representations instead of prioritising reconstruction. For completeness, we represent the full SSL learning objective as

$\mathcal{L}_{\mathrm{SSL}} = \lambda \mathcal{L}_{\mathrm{con}} + (1 - \lambda) \mathcal{L}_{\mathrm{recon}} + \lambda_{\mathrm{reg}} \sum_i \sum_j \left(z_{i,j,0}^2 + z_{i,j,1}^2\right),$ (5)

where λ is a hyper-parameter that changes the influence of each component of the loss. Additionally, we use regularisation in the form of minimising the squared magnitude of the latent projections z. Regularisation is used in order to enforce the most compact representations in z. We experimentally select λ = 0.5 and λ_reg = 1 × 10⁻⁶ and illustrate λ’s impact in Sect. 5.
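As a numerical illustration of how the three terms of Eq. (5) combine, the sketch below evaluates the objective for one batch of patch pairs. It assumes that ℋ is the categorical cross-entropy (the text only states that it is entropy-based); the function and argument names are hypothetical, and the arrays stand in for the outputs of f, g, and d.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Summed categorical cross-entropy; the assumed form of H in Eq. (3)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].sum())

def ssl_loss(logits, labels, v0, v0_hat, v1, v1_hat, z0, z1,
             lam=0.5, lam_reg=1e-6):
    """Eq. (5): weighted context and reconstruction terms plus a latent penalty."""
    l_con = cross_entropy(logits, labels)                              # Eq. (3)
    l_recon = np.abs(v0 - v0_hat).sum() + np.abs(v1 - v1_hat).sum()    # Eq. (4)
    l_reg = (z0 ** 2).sum() + (z1 ** 2).sum()
    return float(lam * l_con + (1.0 - lam) * l_recon + lam_reg * l_reg)
```

With λ = 0 only the reconstruction and regularisation terms remain, and with λ = 1 only the context-prediction and regularisation terms remain, which is the range explored in the ablation over λ.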

Fig. 1

Illustration of 10 examples from the ROAD dataset.

Fig. 2

Illustration of self-supervised training procedure used in ROAD; we used random cropping for augmentation.

4.3 Distinguishing normal from anomalous samples

Although we have described a method for learning representations of normal data, the model on its own is incapable of accurately distinguishing between normal and anomalous samples. Several options exist for anomaly detection when utilising the learnt representations of normal training data. The simplest involves measuring the distance between a given sample and the normal training data (Bergman et al. 2020) using a K-nearest-neighbour (KNN) lookup. This assumes that larger distances correspond to more anomalous samples. However, as we already made use of some of the labelled data for the supervised classifier, we find it beneficial to fine-tune a shallow MLP on top of SSL representations to perform anomaly detection. As the SSL backbone learns representations on the patch-level and ROAD dataset labels are on the spectrogram-level, we first need to concatenate the latent representations of each patch to return to the correct dimensionality before training the MLP. Notably, we propagate the gradients during the fine-tuning step through both the MLP and the backbone network, f, such that the distance between normal and anomalous representations at the spectrogram level is consolidated. We show, in Sect. 5, that fine-tuning dramatically outperforms random initialisation and KNN-based anomaly detection. Furthermore, we find that using the fine-tuning approach dramatically improves the time-complexity of the system.
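To make the KNN baseline concrete, the following sketch scores each query representation by its mean Euclidean distance to its k nearest neighbours among the normal training representations. The brute-force search, the mean-of-k aggregation, the value of k, and the function name are assumptions made for illustration; the text does not specify these details.

```python
import numpy as np

def knn_anomaly_score(queries, normal_bank, k=5):
    """Mean distance from each query to its k nearest normal representations.

    queries:     (n_queries, latent_dim) representations to score
    normal_bank: (n_normal, latent_dim) representations of normal training data
    """
    diffs = queries[:, None, :] - normal_bank[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)        # (n_queries, n_normal)
    nearest = np.sort(dists, axis=1)[:, :k]       # k smallest distances per query
    return nearest.mean(axis=1)                   # larger means more anomalous
```

Because every query is compared against the whole bank of normal samples, the cost of this lookup grows with the size of the bank, whereas a fine-tuned MLP needs a single forward pass.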

Additionally, we need to determine how to threshold the anomaly scores produced by either the fine tuned models or the KNN-distance-based approach. Here, we select the threshold along the precision-recall curve, from which the area under the precision-recall curve (AUPRC) is computed, that results in the maximum F-β score. A discussion on the evaluation metrics used can be found in Sect. 5, and the results pertaining to a change of this threshold can be found in Fig. 9.
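The threshold selection can be illustrated with a small sketch that sweeps every observed score as a candidate threshold and keeps the one with the highest F-β. It is written in plain Python for clarity; the function name, the convention that scores at or above the threshold are flagged as anomalous, and the toy data in the usage note are assumptions.

```python
def best_threshold(scores, labels, beta=2.0):
    """Return (threshold, F-beta) maximising F-beta over candidate thresholds.

    scores: anomaly scores, higher meaning more anomalous
    labels: 1 for anomalous, 0 for normal
    """
    b2 = beta ** 2
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        fn = sum(1 for s, l in zip(scores, labels) if s < t and l == 1)
        denom = (1 + b2) * tp + b2 * fn + fp
        f = (1 + b2) * tp / denom if denom else 0.0
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

For example, with scores `[0.1, 0.2, 0.35, 0.4, 0.8, 0.9]` and labels `[0, 0, 1, 0, 1, 1]`, the threshold 0.35 recovers all three anomalies at the cost of one false positive and gives the highest F-2 score of the six candidates.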

4.4 Combining classification with anomaly detection

The final consideration when constructing ROAD is how to effectively combine the fully supervised classifier y_sup ∈ [0, N] and the fine tuned anomaly detector y_ssl ∈ [0,1]. Simply put, we consider normal predictions from the detector more likely to be correct, and if there is a disagreement between the two models then we flag the sample as an unknown class of anomalies that the classifier may have not seen. The overall method is shown in Fig. 3 and is summarised by

$y = \begin{cases} 0, & \text{if } y_{\mathrm{ssl}} = 0 \\ y_{\mathrm{sup}}, & \text{if } y_{\mathrm{ssl}} = 1 \text{ and } y_{\mathrm{sup}} \neq 0 \\ N + 1, & \text{if } y_{\mathrm{ssl}} = 1 \text{ and } y_{\mathrm{sup}} = 0. \end{cases}$ (6)

We validate this approach in Sect. 5 by showing that it is optimal when assuming that normality is better defined by the SSL output.
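Equation (6) translates directly into a small decision function. The sketch below is illustrative only; the function and argument names are hypothetical, with `y_sup` the class index from the supervised classifier, `y_ssl` the binary output of the anomaly detector, and `n_known` the number of known anomaly classes N.

```python
def combine(y_sup: int, y_ssl: int, n_known: int) -> int:
    """Combine classifier and detector outputs following Eq. (6)."""
    if y_ssl == 0:
        return 0              # the detector defines normality
    if y_sup != 0:
        return y_sup          # both agree on an anomaly: keep the known class
    return n_known + 1        # detector flags an anomaly the classifier missed
```

With three known classes, `combine(2, 0, 3)` returns 0 because the detector overrides the classifier, while `combine(0, 1, 3)` returns 4, the unknown-anomaly label.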

5 Experiments

We evaluate the performance of ROAD using the dataset described in Sect. 3. The evaluation considers both the computational and model performance, using both the binary anomaly detection and the multi-class classification results. In all cases, we use the F-β score to evaluate the model performance. The F-β score is the weighted harmonic mean of precision and recall. In the context of this work, precision is the anomaly detection performance that is sensitive to the number of false positives, and recall is the detection performance relative to the number of false negatives. Moreover, in the context of telescope operations it is necessary to minimise the number of false negatives. In other words, it is more acceptable to classify some normal samples as anomalous than to classify anomalous samples as normal. Following this logic and work by Kerrigan et al. (2019), we consider β = 2 to be the most appropriate as it weighs recall more heavily than precision. For all evaluations we use the threshold along the precision-recall curve, from which the area under the precision-recall curve (AUPRC) is computed, that maximises the F-2 score.
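For reference, the standard definition of the F-β score in terms of true positives, false positives, and false negatives can be written as a short function. The function name and the example counts are hypothetical and are not results from this work.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts; beta > 1 weighs recall more heavily."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

As an example, a detector with 8 true positives, 4 false positives, and 2 false negatives has a precision of 0.67 and a recall of 0.80; its F-1 score is about 0.73, whereas its F-2 score is about 0.77, reflecting the greater weight placed on recall.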

We do not quantify the benefits of ROAD with regard to imaging. The purpose of ROAD is to provide an efficient preview of large interferometric data products to telescope operators, thereby informing scientists of how best to post-process the data in the presence of instrumental and environmental anomalies. It is anticipated that leveraging the outputs of the model would facilitate the elimination of samples containing anomalies, thus enhancing the overall image fidelity. However, for the sake of brevity and focus of this paper, we leave the actual quantification of the improvements to the imaging to future work.

Fig. 3

Illustration of inference pipeline of ROAD; we combine both supervised and self-supervised learning to effectively detect radio-observatory-based anomalies.

5.1 Model parameters and training

To validate our approach, we experimented with several modern machine-learning architectures of various model sizes. In all cases, we used the same backbone architecture for both the supervised classifier and the SSL models; furthermore, we utilised the same two-layer MLP for position classification. Additionally, the decoder used for the SSL-reconstruction loss is a five-layer architecture with strided de-convolution and batch normalisation.

For every experiment, each model is trained three times while randomising input seeds on each run. As already mentioned in Sect. 3.3, the low occurrence rates of some anomalous features mean we need to sub-sample the anomalous classes in the test data to ensure comparable occurrences relative to normal LOFAR telescope operations. This means we run ten separate evaluation loops for the sub-sampled test data. The results shown in this section reflect the mean and standard deviations from 30 runs of each model. The SSL and the supervised models are trained for 100 epochs, while fine-tuning using the two-layer MLP is done for only 20 epochs to prevent over-fitting. We use a batch size, patch size, and latent dimensionality of 64 across all experiments, utilising the Adam optimiser with a learning rate of 1 × 10⁻³ to maintain consistency. In all cases, we use the official PyTorch-based implementations of the various backbones, with the exception of ViT, for which we utilise an open-source implementation. The code, experiments, and model weights are available online³.

Furthermore, to ensure no vanishing or exploding gradients while training, we clip each autocorrelation to the 1st and 99th percentiles and take its natural log. Additionally, we normalise each magnitude-based autocorrelation between 0 and 1.
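The preprocessing steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the function name, the small `eps` added before the logarithm to avoid log(0), and the handling of a constant input are not taken from the text.

```python
import numpy as np

def preprocess(autocorr, eps=1e-12):
    """Clip to the 1st-99th percentiles, take the natural log, scale to [0, 1]."""
    mag = np.abs(autocorr)                       # magnitude of the autocorrelation
    lo, hi = np.percentile(mag, [1, 99])
    clipped = np.clip(mag, lo, hi)
    logged = np.log(clipped + eps)               # eps guards against log(0) (assumed)
    span = logged.max() - logged.min()
    if span == 0:
        return np.zeros_like(logged)             # constant input: nothing to scale
    return (logged - logged.min()) / span
```

Clipping before the logarithm limits the influence of extreme outliers, so that a few very bright samples do not compress the remaining dynamic range after normalisation.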

Fig. 4

Per-class mean F-2 score-based performance of each model shown in Table 2.

5.2 Anomaly detection and classification

To maximise the model performance relative to the problem specification shown in Eq. (1), we find the best mean performance of several different backbones. These are different sizes of ResNet (He et al. 2016), ConvNeXt (Liu et al. 2022), and ViT (Dosovitskiy et al. 2021). Notably, our method is agnostic to backbone and could easily be extended to include other architectures or model sizes. In Table 2, we present the per-class results after applying the combination of the supervised classifier and the fine tuned anomaly detector specified by Eq. (6). Furthermore, we plot the mean performance of each model in Fig. 4 to facilitate comparison. We note that all evaluated anomaly detection models utilised fine-tuning to ensure they had been exposed to the same amount of data. Additionally, ROAD-KNN utilises a KNN lookup to determine the distances in the latent space rather than using the MLP prediction.

We find that the ResNet34 exhibits the overall best average performance on the classification task, giving an average increase in F-2 score of 1% relative to the purely supervised model. We note that the performance of ROAD is directly dependent on the supervised performance. We show that the SSL pre-training is highly influential to the overall model performance as it gives a <5% increase over the randomly initialised (random init) model without pre-training. Furthermore, we find that our SSL-based approach outperforms the variational autoencoder-based model with fine-tuning (VAE) by <5%, as well as being <3% better than KNN-based anomaly detectors (ROAD-KNN). Finally, we show that using pre-trained weights from ImageNet classification with fine-tuning (ImageNet) results in a 2% decrease in performance relative to our SSL pre-training paradigm.

Across all experiments, it is clear that the high-noise element and oscillating tile classes have the highest standard deviation. We attribute this to the small number of examples present in both the testing and training set after adjusting for occurrence rates. In addition to this, the features represented in these classes can vary significantly from sample to sample and band to band.

To simulate a real-world setting where many unknown anomalies can be present in a given observation, we remove several classes from the training set and test models’ performance on the original test set. We refer to these classes removed as out-of-distribution (OOD). The objective of this experiment is to see how well the model will react to OOD anomalies and whether it can correctly classify them as anomalous. To effectively simulate this scenario, we randomly remove between one and seven classes and do this ten times while training a model for each removal step. Figure 5 shows the average model performance from the ten runs for both the supervised classifier as well as the fine tuned SSL anomaly detector when removing a number of classes from the training set. Here, it is clear that the supervised model suffers much more from the OOD effects than the SSL-pre-trained one, exhibiting a performance drop of between 5% and 18%, thereby illustrating the benefit of using ROAD when both a classifier and detector are in the loop.
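The class-removal step of this experiment can be sketched as follows. The function name and the convention that label 0 denotes the normal class are assumptions for illustration; the held-out classes are removed from the training set only, while the test set keeps all classes.

```python
import random

def hold_out_classes(labels, n_remove, rng):
    """Randomly pick anomaly classes to treat as OOD and drop them from training.

    Returns the indices of the training samples that are kept and the set of
    removed class labels. Label 0 (normal, by assumption) is never removed.
    """
    anomaly_classes = sorted({l for l in labels if l != 0})
    removed = set(rng.sample(anomaly_classes, n_remove))
    keep_idx = [i for i, l in enumerate(labels) if l not in removed]
    return keep_idx, removed
```

Repeating this for one to seven removed classes, with ten random draws per setting, would reproduce the structure of the experiment described above.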

We illustrate the t-distributed stochastic neighbour embedding (t-SNE) projections of the latent dimensions from each model in Fig. 7 to gain an intuition about the model performance. The same random seed and perplexity parameters are used for all plots shown; here, the perplexity estimates the number of neighbours each point should have (for more information, see Wattenberg et al. 2016). In the leftmost plot the non-fine tuned SSL model is shown; we can see that both normal and anomalous classes are grouped closely together, with the exception of clusters pertaining to ‘first-order data loss’, ‘ionospheric RFI reflections’, and ‘solar storms’. Furthermore, we find the normal data are distributed across two clusters, these being LBA and HBA features. It is interesting that even with no explicit training signals the SSL model without fine-tuning is still capable of distinguishing a variety of classes and phenomena. The middle plot shows the effects of fine-tuning on the SSL representations. The fine tuned SSL model is significantly better at distinguishing normal from anomalous samples, with the LBA/HBA separation in the normal samples completely disappearing. Furthermore, the clusters corresponding to features that were once well separated, such as ‘solar storm’, are now better grouped with the anomalous samples. Finally, in the rightmost plot we can see the learnt-supervised representations of the test data. Here, it is clear that the supervised model is the most capable of separating both anomalous and normal classes alike. It must be noted, however, that the classes relating to ‘Galactic plane’, ‘source in the side-lobes’, and ‘normal’ are overlapping. Therefore, by combining the boundary related to the SSL fine tuned embedding with the specificity of the supervised model, we are able to better detect anomalies.

An interesting consequence of the class imbalance and the small number of samples of certain events, such as ‘oscillating tile’, is that ROAD benefits from fewer backbone parameters and does not scale with model size, as larger models over-fit to the training data. This is illustrated in Fig. 6, where it is also shown that ResNets offer the best performance. This being said, we expect that with more samples from the infrequent classes the model performance should scale proportionally with its number of parameters. This is further validated by Fig. 8, where we plot the model performance relative to the amount of training data. Here, it is clear that the model performance scales linearly with training data-size. Furthermore, the fine tuned model outperforms its purely supervised counterpart for all training set sizes.

Table 2

F2-score classification performance on the ROAD dataset.

Fig. 5

One-class anomaly detection performance for a purely supervised model and the fine tuned SSL anomaly detector when removing a number of classes from the training set. The ResNet34 backbone is used for both training paradigms.

Fig. 6

One-class anomaly detection performance after fine-tuning of various backbone networks when varying the number of available parameters.

Fig. 7

t-SNE projections of test data from ROAD dataset using the representation from the final layer of the SSL-pre-trained ResNet-34 with and without fine-tuning as well as the supervised classifier.

Fig. 8

Binary anomaly detection performance when changing the amount of supervision used to train a ResNet-34 backbone for each training paradigm.

5.3 Model ablations

To validate the correctness of the SSL-model training objective, we perform several ablations. In Table 3, we show the effect of only using the reconstruction term, ℒ_recon, or only the context prediction term, ℒ_con, or using the combined loss ℒ_recon + ℒ_con. We show that the combination of the two terms improves both the anomaly detection and the average classification performances by 2%, which at the scale of the LOFAR science data processing pipeline results in a significant improvement.

Furthermore, in order to determine the relative contribution of each of the losses to the overall performance of ROAD we modify the λ hyper-parameter and measure the overall model performance. Figure 10 shows that with 0.3 ≤ λ ≤ 0.7 the SSL anomaly detection obtains optimal performance.

In addition to the loss-function-based ablations we also consider the effect of changing the combination function used between the supervised and SSL model shown in Eq. (6). These results are shown in Fig. 9, where we vary both the anomaly detection threshold set by the maximum F-β score as well as the combination function. In the plot, ‘combination function #1’ uses the definition expressed in Eq. (6), where the anomaly detector defines both normality and the unknown anomaly events. We define ‘combination function #2’ as

$y = \begin{cases} N + 1, & \text{if } y_{\mathrm{ssl}} = 1 \text{ and } y_{\mathrm{sup}} = 0 \\ y_{\mathrm{sup}}, & \text{otherwise}, \end{cases}$ (7)

such that y_ssl is only used to define unknown anomalous events. In the leftmost plot, we can see that combination function #1 consistently offers the best precision level, yet this is at the cost of marginally decreasing the recall (<0.4%). The effect of this is that combination function #1 results in optimal F-2 score performance when β is greater than 1. Furthermore, we evaluate the false positive rate using combination function #1 and find that it is approximately 2%.
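The two combination functions differ in only one situation, which the following sketch makes explicit by enumerating every input pair. The function names and the choice of N = 3 are hypothetical and serve only as an illustration.

```python
def combine_1(y_sup: int, y_ssl: int, n_known: int) -> int:
    """Eq. (6): the detector defines both normality and unknown anomalies."""
    if y_ssl == 0:
        return 0
    return y_sup if y_sup != 0 else n_known + 1

def combine_2(y_sup: int, y_ssl: int, n_known: int) -> int:
    """Eq. (7): the detector is only used to flag unknown anomalies."""
    if y_ssl == 1 and y_sup == 0:
        return n_known + 1
    return y_sup

N = 3  # hypothetical number of known anomaly classes
disagreements = [
    (y_sup, y_ssl)
    for y_sup in range(N + 1)
    for y_ssl in (0, 1)
    if combine_1(y_sup, y_ssl, N) != combine_2(y_sup, y_ssl, N)
]
```

The functions disagree only when the classifier predicts a known anomaly while the detector predicts normal. Function #1 then outputs normal, which removes potential false positives and so favours precision, whereas function #2 keeps the classifier's label and so favours recall.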

Table 3

Model performance (F2 score) after fine-tuning when varying the SSL loss function for a ResNet34 backbone.

5.4 Computation performance analysis

We evaluate the computational performance of ROAD during inference on an Nvidia A10 GPU using CUDA 11.7 and driver release 515.65.01. The KNN-based experiments utilise the GPU-based implementation of FAISS⁴. We use a batch size of 1024, with a patch size and a latent dimensionality of 64. Furthermore, for the case of the KNN search we assume 1000 normal training samples to populate the search space. In all cases we use bfloat16 representations of the input data so as to ensure the tensor cores are fully utilised. With this configuration, we performed 1000 forward passes and measured the resulting latency, throughput in spectrograms per second, and peak memory allocation.

The computation performance of the respective models can be seen in Table 4, where it is clear that the supervised model has the lowest computational overhead. We relate the difference in performance between the supervised and SSL model to the dimensionality of the models’ inputs and required concatenation of the patches on each forward pass. As the SSL operates on the patch level, there are substantially fewer convolution operations that need to be applied (approximately 16), resulting in decreased peak memory performance. ROAD consists of both the supervised and SSL models, and as such the overall performance is given by the addition of the respective values, such that it takes less than 1 ms to predict the normality of a given spectrogram. This is more than 1000× faster than the existing correlator implementations on the IBM Blue Gene/P supercomputer (Romein et al. 2010). Notably, however, the KNN-based model performs significantly worse, suggesting that density-based KNN anomaly detectors are less suitable for real-time applications at observatories.

Fig. 9

Mean classification performance of the ResNet-34 backbone after fine-tuning when changing the threshold used for anomaly detection as well as the combination function. Combinations #1 and #2 correspond to Eqs. (6) and (7), respectively.

Fig. 10

Mean anomaly detection F2-score performance of the ResNet34 when varying the relative contribution of ℒ_recon and ℒ_con. When λ is at its minimum, the loss favours ℒ_recon, while for greater values of λ, ℒ_con dominates.

Table 4

Computational performance of anomaly detectors.

6 Conclusions and future work

In this paper, we present the first real-time anomaly detector for system-wide anomalies in spectrographic data from radio telescopes. We produced a freely available dataset that contains 7050 autocorrelation-based spectrograms from the LOFAR telescope with labels relating to both commonly occurring anomalies and rare events. This work provides a formulation of anomaly detection in the SHM context of telescope operations and illustrates how purely supervised models are ill-suited to the problem. Furthermore, we propose a new SSL paradigm for learning normal representations of spectrographic data. We combine both the SSL and supervised models and demonstrate how it remedies the shortcomings of supervised methods. We demonstrated that even with limited examples of anomalous data, our fine tuned SSL model can significantly outperform its supervised counterpart. The ROAD and dataset are the first major effort to address the system health management problem in radio telescopes and its potential benefit to all radio observatories is very promising.

We expect that, by providing open-source access to both our models and dataset, the continued efforts of the wider community will increase the amount of training data from scarce events, thereby enabling other training paradigms such as contrastive learning with larger models that are currently unsuited to the highly unbalanced problem. Furthermore, we identify several directions for future work in the area of radio observatory anomaly detection, namely using the cross-correlations to enhance training by using radio interferometer-specific losses. Another interesting direction would be to use Bayesian deep learning to give uncertainty estimates from the classifier such that samples with low confidence would rely on the detector output. Finally, we would like to propagate the labels from the down-sampled data to the full-resolution data from the LOFAR Long Term Archive, such that the performance could be better evaluated in the context of the full LOFAR data-processing pipeline.

In future work, we would like to see ROAD tested with data from different radio telescopes. We expect that instruments with roughly the same operating bands and time resolution would be good candidates. In previous work (Mesarcik et al. 2022a), we show that unsupervised machine-learning-based methods for RFI detection are directly transferable between the simulated data from the HERA telescope and real data from LOFAR. One potential problem is that there may be a domain shift between the ROAD dataset and data produced by another instrument. This could be addressed by labelling a few examples of anomalies in other instruments’ spectrograms and fine-tuning the ROAD model using the supplied weights and the new small dataset. In this manner, the overhead of extensive labelling would be avoided. However, in principle, ROAD can be applied to any radio telescope provided that a new labelled dataset is produced for the specific instrument. We expect that the anomaly categorisation used for the ROAD dataset is generic enough to be directly transferred to other instruments. However, we note that features such as ‘oscillating tile’ are LOFAR-specific.

Furthermore, we propose investigating how best to integrate RFI detection and self-supervised anomaly detection for radio telescopes. Foundation models from Bommasani et al. (2021) offer a promising future. Here, a single self-supervised model could be trained on the normal data and then fine-tuned on both RFI-segmentation and anomaly detection tasks. In this manner, a model would be able to learn both representations of anomalous samples as well as RFI-contaminated data, which may improve model performance, generalisability, and false positive rates. We would thus avoid the problem of potentially classifying RFI as anomalies and vice-versa.

Acknowledgements

This work is part of the “Perspectief” research programme “Efficient Deep Learning” (EDL, https://efficientdeeplearning.nl), which is financed by the Dutch Research Council (NWO) domain Applied and Engineering Sciences (TTW). The research makes use of radio astronomy data from the LOFAR telescope, which is operated by ASTRON (Netherlands Institute for Radio Astronomy), an institute belonging to the Netherlands Foundation for Scientific Research (NWO-I).

References

An, J., & Cho, S. 2015, Variational autoencoder based anomaly detection using reconstruction probability. Special lecture on IE, 2, 1
Baireddy, S., Desai, S. R., Mathieson, J. L., et al. 2021, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (IEEE), 1951
Barsdell, B. R., Bailes, M., Barnes, D. G., & Fluke, C. J. 2012, MNRAS, 422, 379
Bassa, C. G., Romein, J. W., Veenboer, B., van der Vlugt, S., & Wijnholds, S. J. 2022, A&A, 657, A46
Bergmann, P., Fauser, M., Sattlegger, D., & Steger, C. 2019a, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9584
Bergmann, P., Löwe, S., Fauser, M., Sattlegger, D., & Steger, C. 2019b, VISI-GRAPP 2019 – Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 5, 372
Bergman, L., Cohen, N., & Hoshen, Y. 2020, arXiv e-prints [arXiv:2002.10445]
Bommasani, R., Hudson, D. A., Adeli, E., et al. 2021, arXiv e-prints [arXiv:2108.07258]
Broekema, P. C., Mol, J. J. D., Nijboer, R., et al. 2018, Astron. Comput., 23, 180
Burlina, P., Joshi, N., & Wang, I. J. 2019, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 11499
Chandola, V., Banerjee, A., & Kumar, V. 2009, ACM Comput. Surv., 41, 1
Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. 2020, arXiv e-prints [arXiv:2002.05709]
Connor, L., & van Leeuwen, J. 2018, AJ, 156, 256
de Gasperin, F., Dijkema, T. J., Drabent, A., et al. 2019, A&A, 622, A5
Doersch, C., Gupta, A., & Efros, A. A. 2015, in IEEE International Conference on Computer Vision (ICCV), 1422
Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al. 2021, in International Conference on Learning Representations
Fei-Fei, L., Deng, J., & Li, K. 2010, J. Vision, 9, 1037
Foley, A. R., Alberts, T., Armstrong, R. P., et al. 2016, MNRAS, 460, 1664
George, D., & Huerta, E. 2018, Phys. Rev. D, 97, 044039
Grill, J.-B., Strub, F., Altché, F., et al. 2020, in Advances in Neural Information Processing Systems
Gunn, J. E., Carr, M., Rockosi, C., et al. 1998, AJ, 116, 3040
Hayat, M. A., Stein, G., Harrington, P., Lukić, Z., & Mustafa, M. 2021, ApJ, 911, L33
He, K., Zhang, X., Ren, S., & Sun, J. 2016, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 770
He, K., Chen, X., Xie, S., et al. 2022, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Kerrigan, J., Plante, P. L., Kohn, S., et al. 2019, MNRAS, 11, 1
La Plante, P., Williams, P., Kolopanis, M., et al. 2021, Astron. Comput., 36, 100489
Li, C.-L., Sohn, K., Yoon, J., & Pfister, T. 2021, arXiv e-prints [arXiv:2104.04015]
Liu, F. T., Ting, K. M., & Zhou, Z.-H. 2008, Eighth IEEE International Conference on Data Mining (Pisa, Italy), 413
Liu, Z., Mao, H., Wu, C.-Y., et al. 2022, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Lochner, M., & Bassett, B. 2021, Astron. Comput., 36, 100481
Ma, P. X., Ng, C., Rizk, L., et al. 2023, Nat. Astron., 7, 492
Margalef-Bentabol, B., Huertas-Company, M., Charnock, T., et al. 2020, MNRAS, 496, 2346
Malanchev, K. L., Pruzhinskaya, M. V., Korolev, V. S., et al. 2021, MNRAS, 502, 5147
Melchior, P., Liang, Y., Hahn, C., & Goulding, A. 2023, AJ, 166, 74
Mesarcik, M., Boonstra, A.-J., Meijer, C., et al. 2020, MNRAS, 496, 1517
Mesarcik, M., Boonstra, A.-J., Ranguelova, E., & van Nieuwpoort, R. V. 2022a, MNRAS, 516, 5367
Mesarcik, M., Ranguelova, E., Boonstra, A.-J., & van Nieuwpoort, R. V. 2022b, Array, 14, 100182
Morello, V., Rajwade, K. M., & Stappers, B. W. 2021, MNRAS, 510, 1393
Muthukrishna, D., Mandel, K. S., Lochner, M., Webb, S., & Narayan, G. 2022, MNRAS, 517, 393
Nan, R., Li, D., Jin, C., et al. 2011, Int. J. Mod. Phys. D, 20, 989
Norris, R. P. 2010, Proceedings – 6th IEEE International Conference on e-Science Workshops, e-ScienceW 2010, 21
Offringa, A. R., de Bruyn, A. G., Biehl, M., et al. 2010, MNRAS, 405, 155
Pidhorskyi, S., Almohsen, R., Adjeroh, D. A., & Doretto, G. 2018, Adv. Neural Inform. Process. Syst., 6822
Prasad, P., Wijnholds, S. J., Huizinga, F., & Wijers, R. A. M. J. 2014, A&A, 568, A48
Reiss, T., & Hoshen, Y. 2021, arXiv e-prints [arXiv:2106.03844]
Romein, J. W., Broekema, P. C., Mol, J. D., & van Nieuwpoort, R. V. 2010, ACM SIGPLAN Notices, 45, 169
Roth, K., Pemula, L., Zepeda, J., et al. 2021, CVPR, accepted [arXiv:2106.08265]
Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., & Platt, J. 1999, in Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, CO, 582
Sclocco, A., van Leeuwen, J., Bal, H., & van Nieuwpoort, R. 2016, Astron. Comput., 14, 1
Sclocco, A., Vohl, D., & Van Nieuwpoort, R. V. 2019, RFI 2019 – Proceedings of 2019 Radio Frequency Interference: Coexisting with Radio Frequency Interference
Spirkovska, L., Iverson, D., Hall, D., et al. 2010, in Space Ops 2010 Conference (Reston, Virginia: American Institute of Aeronautics and Astronautics)
Storey-Fisher, K., Huertas-Company, M., Ramachandra, N., et al. 2021, MNRAS, 508, 2946
Tack, J., Mo, S., Jeong, J., & Shin, J. 2020, arXiv e-prints [arXiv:2007.08176]
Tasse, C., Hugo, B., Mirmont, M., et al. 2018, A&A, 611, A87
van Haarlem, M. P., Wise, M. W., Gunst, A. W., et al. 2013, A&A, 556, A2
van Nieuwpoort, R. V., & Romein, J. W. 2011, Int. J. Parallel Programm., 39, 88
Villar, V. A., Cranmer, M., Berger, E., et al. 2021, ApJS, 255, 24
Vocks, C., Mann, G., Breitling, F., et al. 2018, A&A, 614, A54
Walmsley, M., Lintott, C., Géron, T., et al. 2021, MNRAS, 509, 3966
Walmsley, M., Scaife, A. M. M., Lintott, C., et al. 2022, MNRAS, 513, 1581
Wattenberg, M., Viégas, F., & Johnson, I. 2016, Distill, 1, e2
Weeren, R. J. V., Williams, W. L., Hardcastle, M. J., et al. 2016, ApJS, 223, 2
Wijnholds, S. J., Van Der Tol, S., Nijboer, R., & Van Der Veen, A. J. 2010, IEEE Signal Process. Mag., 27, 30
Yatawatta, S., de Bruyn, A. G., Brentjens, M. A., et al. 2013, A&A, 550, A136
Yi, J., & Yoon, S. 2021, in Lecture Notes in Computer Science, 375
Zhang, Y. G., Hyun Won, K., Son, S. W., Siemion, A., & Croft, S. 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018 – Proceedings, 1114

¹ https://lta.lofar.eu/

² https://zenodo.org/record/8028045

³ https://github.com/mesarcik/ROAD

⁴ https://github.com/facebookresearch/faiss

All Tables

Table 1 Categorisation of data processing, electronic, astronomical, and environmental anomalies in the ROAD dataset.

Table 2 F2-score classification performance on the ROAD dataset.

Table 3 Model performance (F2-score) after fine-tuning when varying the SSL loss function for a ResNet34 backbone.

Table 4 Computational performance of anomaly detectors.

All Figures

Fig. 1 Illustration of ten examples from the ROAD dataset.

Fig. 2 Illustration of the self-supervised training procedure used in ROAD; we used random cropping for augmentation.

Fig. 3 Illustration of the inference pipeline of ROAD; we combine both supervised and self-supervised learning to effectively detect radio-observatory-based anomalies.

Fig. 4 Per-class mean F2-score performance of each model shown in Table 2.

Fig. 5 One-class anomaly detection performance for a purely supervised model and the fine-tuned SSL anomaly detector when removing a number of classes from the training set. The ResNet34 backbone is used for both training paradigms.

Fig. 6 One-class anomaly detection performance after fine-tuning of various backbone networks when varying the number of available parameters.

Fig. 7 t-SNE projections of test data from the ROAD dataset using the representation from the final layer of the SSL-pre-trained ResNet34 with and without fine-tuning as well as the supervised classifier.

Fig. 8 Binary anomaly detection performance when changing the amount of supervision used to train a ResNet34 backbone for each training paradigm.

Fig. 9 Mean classification performance of the ResNet34 backbone after fine-tuning when changing the threshold used for anomaly detection as well as the combination function. Combinations #1 and #2 correspond to Eqs. (6) and (7), respectively.

Fig. 10 Mean anomaly detection F2-score performance of the ResNet34 when varying the relative contribution of ℒ_recon and ℒ_con. When λ is at its minimum, the loss favours ℒ_recon, while for greater values of λ, ℒ_con dominates.

