A&A, Volume 674, June 2023
Article Number A218, 12 pages
Section: The Sun and the Heliosphere
DOI: https://doi.org/10.1051/0004-6361/202346404
Published online: 23 June 2023

© The Authors 2023

Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


1. Introduction

Solar flares are the most intense explosive events in the solar system (Lin 2011). The accelerated particles emit light across the electromagnetic spectrum, from gamma rays to radio waves. High-intensity radio emission characterises solar radio bursts (SRBs), which manifest as complex signals in dynamic spectra. Based on the structure of their dynamic spectra, SRBs are divided into five categories, ranging from Type I to Type V (Pick & Vilmer 2008). Because Type III bursts can occur hundreds of times each day, detecting them and understanding their spectral features is a computational challenge. This problem has become even more difficult in recent years with the development of technologies such as the LOw-Frequency ARray (LOFAR; Van Haarlem et al. 2013), which provides high-volume data streams (up to 3 Gb s−1 at a single station) of radio burst observations that need to be classified accurately in real time. With the development of LOFAR for Space Weather (LOFAR4SW; Carley et al. 2020a), a system upgrade aimed at autonomously monitoring solar radio activity, the need for automated data pipelines for solar radio bursts has become more immediate. Software pipelines for autonomously detecting SRBs are an essential component of such a system. The research presented in this paper identifies deep learning as an important component of such a pipeline.

Recent work on object detection algorithms, such as you only look once (YOLO; Redmon et al. 2016), classification algorithms, such as support vector machines (SVM; Evgeniou & Pontil 2001), and random forest (RF; Louppe 2014), has shown the need for a high-quality simulated training set to improve the accuracy and robustness of algorithms when classifying and detecting Type III SRBs (Carley et al. 2020b). The work presented in this paper shows the crucial role that generative adversarial networks (GANs; Goodfellow et al. 2020) play in generating simulated images for such training sets.

Creating training sets for SRB classification using conventional techniques frequently involves the tedious effort of going through large data archives to locate and collate pertinent Type III SRB images. Generating simulated data significantly reduces this effort by producing data that not only look similar to SRBs but can also be created in volume in a short period of time. Previously, a significant number of SRB-like images were produced using parametric modelling (Kalkan et al. 2018). This method produced Type III SRBs that were random in number, grouping, intensity, drift rate, heterogeneity, start-end frequency, and start-end duration, all of which are traits of a Type III. However, compared to Type IIIs observed in daily observations, these images lacked realism.

Several non-machine-learning attempts have been made to automatically classify SRBs in dynamic spectra. Current algorithms implement the Hough or Radon transform methods as a way of recognising specified parametric shapes in images (Lobzin et al. 2009). Depending on the type of radio burst being categorised, these algorithms can reach up to 84% accuracy. Other methods include constant-false-alarm-rate detection (Lu et al. 2004; Salmane et al. 2018), which is essentially the detection of radio bursts in dynamic spectra employing de-noising and an adaptive threshold. The method works well with various types of radio bursts and has been reported to achieve an accuracy of 70%.

Through the application of multi-modal deep learning to a spectrogram at millimetric wavelengths, deep neural networks have been shown to be highly successful at detecting SRBs (Ma et al. 2017). The technique combines auto-encoders and regularisation to achieve an accuracy of 82% in burst detection, but it has not been applied to metric wavelengths (the LOFAR range), where bursts can have much more intricate geometries.

Generative deep learning models, such as deep convolutional generative adversarial networks (DCGANs; Zhang et al. 2021), are playing a crucial role in classifying SRBs. In that approach, the generative network of a GAN is adapted into a classification technique. This method successfully classifies Type III solar radio bursts with an accuracy of between 89 and 92% when used with LOFAR metric wavelengths.

To identify SRBs, researchers have recently started using object detection algorithms such as faster R-CNN (Hou et al. 2020). With an average precision (AP) of 91%, this deep learning neural network was demonstrated to be accurate at extracting minor aspects of SRBs. However, the model lacks the capability of real-time detection, demonstrating a maximum of 17 frames per second on open-sourced datasets such as COCO (Ren et al. 2017).

Fast radio bursts (FRBs) and the Search for Extra-Terrestrial Intelligence (SETI; Zhang et al. 2018) are another area where machine learning is being used in conjunction with radio interferometer observations. SETI employs machine learning to look for FRBs in planetary systems using its Allen Telescope Array. SETI uses a deep convolutional neural network (CNN; Wu 2017), known as ResNet (He et al. 2016), to extract irregular high-frequency FRB spikes from within the dynamic spectrum, highlighting regular noise frequencies and radio frequency interference (RFI). This model produced a recall score of 95%.

Connor and van Leeuwen use deep neural networks to extract and classify features of FRBs within observations (Connor & van Leeuwen 2018). The research uses both simulated and real single Galactic pulsars to obtain a dataset for training the CNN. In this instance, due to the scarcity of FRBs, the training set was dominated by simulated bursts. They simulated most of their true positives but used only false positives that were generated in real surveys and labelled by eye. CNNs have been applied to this scarce data with above 99% accuracy when classifying such phenomena. This FRB extraction method illustrates how robust a CNN can be when using a hybrid dataset containing both real and simulated data.

With recent research moving to deep CNNs and object detection for classifying and detecting radio frequency phenomena, we decided to investigate these topics further. For object detection, CNNs come in a variety of different flavours, including YOLO, single shot detectors (Liu et al. 2016), region-CNN (R-CNN; Girshick et al. 2016), fast R-CNN (Girshick 2015), and faster R-CNN (Ren et al. 2017). YOLO has been the only algorithm to deliver both high accuracy and real-time detections on standard datasets; the other approaches listed have been shown to be quite successful for object detection, just not in real time.

In our previous research, we adapted the YOLO algorithm to detect Type III SRBs (Scully et al. 2021). Using this configuration we obtained an accuracy score of 82.63%. However, one score we could not obtain then was mean Average Precision (mAP), which determines how precisely the algorithm can locate a certain object in an image, in this case, Type III SRBs. We noted some key areas that we could improve on to allow us to measure mAP, in particular the simulated training set.

In this paper, we apply multiple deep-learning methods to the problem of SRB simulation, detection, and classification. We use GANs to create a simulated training set of images comparable to real observations, which is then fed into YOLO to precisely detect and classify Type III SRBs.

The paper is organised as follows. In Sect. 2, we describe how our data were gathered with LOFAR and the phenomenon that is an SRB. In Sect. 3, we introduce the deep learning technique of convolutional neural networks, how we manipulate this technique for SRB simulation, detection, and classification, and the significance of SRB simulation techniques. In Sect. 4, we introduce GANs and how they can improve SRB simulation. In Sect. 5, we introduce YOLO, our second deep learning method for SRB detection and classification, its architecture, and the datasets on which YOLO is trained and evaluated. In Sect. 6, we visualise and discuss the results produced by YOLO and how the introduction of GANs for simulating SRBs enables a mAP score to be calculated.

2. LOFAR and solar radio bursts

2.1. LOw Frequency ARray

The data for this study came from the LOFAR radio interferometer, which was erected in the north of the Netherlands and across Europe, and includes Ireland’s I-LOFAR station, shown in Fig. 1. LOFAR offers a unique assortment of observational modes for monitoring the radio universe in the comparatively uncharted low-frequency range of 10–240 MHz. Several radio astronomical objects can be observed simultaneously by LOFAR, which can also operate several stations at once. The system can operate as a multi-station, very long baseline interferometer, or each station can function independently as a telescope.

Fig. 1.

Irish Low-Frequency Array station IE613 (I-LOFAR) at Birr Castle, County Offaly. Coaxial cables are used to transport data from the Low Band Antennas (LBAs) and High Band Antennas (HBAs) to the ILT Cabinet (centre right), where it is amplified, filtered, and digitised. Data are sent in international mode at a speed of ∼3.2 Gbps to Groningen, Netherlands. In the I-LOFAR Control Room, data are processed using REALTA in local mode (bottom left; Murphy et al. 2021).

LOFAR antenna stations provide the same basic functions as standard interferometric radio dishes. These stations feature a large collecting area and high sensitivity, as well as pointing and tracking capabilities, similar to typical radio dishes. Unlike traditional radio dishes, LOFAR stations do not physically move; instead, the system combines signals from individual antennae to construct a phased array using a combination of analogue and digital beam-forming techniques, making it more flexible and agile. Rapid telescope re-pointing and multiple, simultaneous observations from a single station are made possible by station-level beamforming. A central processing unit can then receive the stations’ digitised, beam-formed data and correlate it for imaging and observation analysis.

Single-station beamformed data are typically compiled into a dynamic spectrum with a time resolution of 5 microseconds and a frequency resolution of 195 kHz. The 488 frequency channels that make up the dynamic spectra allow for the recording of data at a rate of several terabytes (TB) per hour. The REAL-time Transient Acquisition backend (REALTA), a high-performance computing system for processing and storing unprocessed beamformed data, has recently been developed by the I-LOFAR team (Murphy et al. 2021). Due to the high volume of data, automated algorithms are required to sort and classify any phenomena of interest; in our case, the classification and detection of SRBs is the primary goal. In this study, we constructed training sets on which GANs and YOLO could be trained and assessed against observations from I-LOFAR.
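As a back-of-the-envelope check on the data volumes quoted above, the sketch below estimates the hourly rate from the channel count and time resolution; the assumption of 32-bit samples is ours rather than a figure from this paper.

```python
# Rough estimate of the single-station data rate from the numbers quoted above.
# Assumption (not from the paper): 32-bit (4-byte) samples per channel.
n_channels = 488          # frequency channels in the dynamic spectrum
dt = 5e-6                 # time resolution in seconds (5 microseconds)
bytes_per_sample = 4      # assumed 32-bit samples

samples_per_second = n_channels / dt                 # ~9.8e7 samples per second
rate_bytes_per_second = samples_per_second * bytes_per_sample
rate_tb_per_hour = rate_bytes_per_second * 3600 / 1e12

print(f"{rate_tb_per_hour:.1f} TB per hour")         # of order a terabyte per hour
```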

2.2. Solar radio bursts

SRBs are usually investigated in dynamic spectra and are divided into five major spectral classes, spanning from Type I to Type V, based on their shape, frequency, and time length. The intricate characterisation of these radio bursts, however, makes classification extremely challenging. When employing machine learning approaches to categorise such occurrences, the data that the classification algorithms are trained on is crucial. SRBs are regularly observed in dynamic frequency versus time spectra.

The most frequent SRBs are Type IIIs, which are short-lived and appear in dynamic spectra as a vertical bright strip, as illustrated in Fig. 2. The numerous diverse forms a Type III might assume within this vertical strip make the task more complicated. They can appear smooth or patchy, faint or intense, superimposed on other radio bursts, freestanding or in groups, or immersed in strong RFI. For this research, we used Type IIIs in the frequency range of 10–100 MHz, where they generally occur as a vertical stripe.

Fig. 2.

Example of a dynamic spectrum showing a real Type III solar radio burst between 20 and 90 MHz. Notice the Type III’s vertical strip-like shape and short duration in time, lasting only a couple of seconds.

3. Deep learning and data simulation

3.1. Convolutional neural networks

A CNN is a deep-learning neural network that is specifically designed for visual feature recognition. It is distinguished from a standard NN by the presence of numerous convolutional layers. These layers effectively perform image filtering to produce image intensity gradients. Each filter produces a different gradient and is responsive to particular shapes. For example, the first layers of the CNN may contain filters that respond to horizontal or vertical lines, curves, and other simple geometric shapes. After the first filters are applied, the CNN then applies a max-pooling layer, which is a type of summing and downsampling. Another convolutional layer, max-pooling, and so forth follow. Max-pooling works by allowing deeper layers of the network to access larger and larger portions of the image. While the early layers respond to simple geometries, with max-pooling the subsequent layers become responsive to more complex shapes made from these geometries, such as circles, triangles, and complex polygons. The max-pooling and convolving can continue until the deepest levels of the network react to recognisable aspects, for example, facial characteristics.

In our research, the network’s final layers respond to SRBs and their precise structure. Depending on the complexity of the network, the number of convolutional and max-pooling layers varies, but the final result is a single image vector representation. This vector is then used to classify the image. Similar to smaller neural networks, a CNN’s weights and biases (including those of the convolutional layers) must be trained. Given the size of the networks, fitting tens of thousands or even millions of parameters may be necessary for the most complex networks. To avoid under- or overfitting (Ying 2019), the number of training instances should be at least in or close to the same order of magnitude as the number of weights and biases in the network. A resulting problem is the lack of accessible databases with tens of thousands of images on which to train for phenomena such as radio bursts. There are two ways to address this problem: (i) simulating a large number of training samples for the network, or (ii) using a method called transfer learning (Zhuang et al. 2021), in which we take an advanced and powerful CNN that has already been trained on millions of images of generic scenes (containing, among other things, everyday objects such as cars and people) and retrain a smaller portion of the network on our specific set of data (radio bursts). Transfer learning is based on the notion that the general forms learned by the CNN from ordinary items may be recycled for new objects, with studies showing that this is possible even for images that are wholly morphologically unlike, such as cars and solar radio bursts (Zhuang et al. 2021).
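The following minimal Keras sketch (not the network used in this work) illustrates the convolution and max-pooling stacking described above, ending in a single image vector used for a burst/no-burst decision; the layer sizes and input shape are illustrative assumptions.

```python
# A minimal CNN sketch illustrating stacked convolution + max-pooling layers.
# Layer sizes are illustrative assumptions, not the architecture used in this paper.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(256, 256, 1)),          # greyscale dynamic-spectrum patch
    layers.Conv2D(16, 3, activation="relu"),    # early filters: simple lines and edges
    layers.MaxPooling2D(2),                     # downsample, widen the receptive field
    layers.Conv2D(32, 3, activation="relu"),    # deeper filters: composite shapes
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu"),    # deepest filters: burst-like structure
    layers.GlobalAveragePooling2D(),            # collapse to a single image vector
    layers.Dense(1, activation="sigmoid"),      # burst / no-burst probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```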

In this research, we implemented congruent CNN architectures to address the data problem. In deep learning, congruence refers to the similarity or consistency between different parts of a model or different models. It also refers to the similarity between the weights of different models trained on the same task, or the consistency of the outputs of different models when presented with the same input. In the case of GANs, we created two CNN architectures, one of which is inverted, and pitched them against each other to generate realistic simulated Type III SRBs. We then used transfer learning along with the simulated dataset to train an elongated CNN architecture, YOLO, to detect Type III SRBs.

3.2. SRB simulation

Simulating SRBs has been essential for generating large datasets for training detection and classification algorithms. It essentially eliminates the time-consuming task of searching large data archives to find good, clean images suitable for a training set. Simulating data also removes the need to clean archival data of any artefacts such as RFI. We originally used parametric modelling to simulate SRBs in previous studies, in which polynomials were used to create the overall Type III shape in dynamic spectra, with skewed Gaussians used for the time-dependent intensity profile at each frequency. Using this method, we created Type III radio bursts with a random number, grouping, intensity, drift rate, heterogeneity, start-end frequency, and start-end time. We placed the bursts in front of a background of synthetic and random RFI channels, as shown in Fig. 3.
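The sketch below illustrates this parametric approach under stated assumptions: a polynomial gives the frequency-drift track of the burst and a skewed Gaussian gives the time-dependent intensity profile at each frequency. All numerical values are illustrative, not the parameters used in our earlier model.

```python
# Sketch of parametric Type III simulation: polynomial drift track + skewed-Gaussian
# time profile per frequency channel. Parameter values are illustrative assumptions.
import numpy as np
from scipy.stats import skewnorm

n_freq, n_time = 488, 600                                  # channels x time samples
freqs = np.linspace(10, 100, n_freq)                       # MHz
spectrum = np.random.normal(0.0, 0.05, (n_freq, n_time))   # background noise

def add_type_iii(spec, t0, drift_coeffs, width=4.0, skew=5.0, amp=1.0):
    """Inject one synthetic Type III burst into the dynamic spectrum."""
    t = np.arange(n_time)
    for i, f in enumerate(freqs):
        # the polynomial gives the burst arrival time (in samples) at this frequency,
        # arriving earlier at higher frequencies so the burst drifts downwards
        t_peak = t0 + np.polyval(drift_coeffs, f)
        profile = skewnorm.pdf(t, a=skew, loc=t_peak, scale=width)
        spec[i] += amp * profile / profile.max()
    return spec

spectrum = add_type_iii(spectrum, t0=200, drift_coeffs=[0.002, -0.3, 20.0])
```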

Fig. 3.

Parametric modelling simulation (a) compared to a real Type III observation (b). The parametric modelling method fails to simulate activity seen in a real observation such as small, faint Type III bursts or interference such as embedded RFI.

When training YOLO, this strategy has a number of advantages, including the ability to generate enormous amounts of automatically labelled Type III data in a short amount of time. Using parametric modelling, we were able to construct a dataset of 80 000 images of Type III SRBs for our previous YOLO model. However, this training set created many issues when it came to testing the model’s robustness. The first issue is that the data must be accurately labelled for training; YOLO needs to see what it is looking at because it is a supervised learning technique that requires a labelled dataset in order to identify a class, in this instance Type III SRBs. The automatic labelling system in the parametric modelling saved a lot of time, but there was no change in the Y-axis (height) variable of these labelled bounding boxes. In other words, the height variable became overly saturated with the same static values in the training set, so the final YOLO detections showed the same lack of height variation as the training set, as seen in Fig. 4. As a result, we were unable to calculate the localised precision of our model.

Fig. 4.

10-min segment from the test set of our previous model attempt. The lack of variation and over-saturation in the Y-axis or height variable in the training set meant that, when our previous YOLO model was evaluated, the bounding box predictions (highlighted in green) in the test set had no variation in the Y-axis or height variable no matter the size of the Type III.

The second issue we experienced with the parametric modelling approach was the model’s lack of realism. While it provided many possibilities in terms of position, grouping, and overall shape and intensity, it did not provide us with the exact shape and intensity variation that we would see in a real observation.

With the introduction of GANs, we were able to generate realistic Type III SRB simulations. This allowed us to create SRBs that were similar to those seen in actual I-LOFAR observations, thus removing the need to trawl through data archives for the appropriate images for the training set. They also offered the Y-axis variation that YOLO needed to make localised Type III SRB detections.

4. Generative adversarial networks

GANs are a type of generative modelling that makes use of CNNs. Generative modelling is an unsupervised learning task that automatically finds and learns regularities or patterns in input data. The model created may be used to produce brand new instances that have attributes or features similar to those of the original input dataset. Two sub-models are trained as part of the GAN framework: the generator model, an inverted CNN used to generate new simulated instances, and the discriminator model, a binary classifier that attempts to classify examples as real or fake (generated); see Fig. 5. The purpose of the generator is to accept random input values (noise) and generate an image from them using a deconvolutional neural network. To upsample the data, we employ five transpose convolutional layers with a combination of 2 × 2 and 1 × 1 strides. In order to accommodate small negative values when the input is less than zero, we use batch normalisation and Leaky ReLU activation after each transposition layer. The tanh activation function is used at the top layer because generated images are typically normalised to fall within [0,1] or [−1,1]. The discriminator is then fed a batch of real or fake training data, depending on the training stage, which is downsampled using five convolutional layers with a combination of 2 × 2 and 1 × 1 strides. Each convolutional layer is again followed by batch normalisation and then ReLU activation, which sets all negative values to 0. The sigmoid activation function, used at the top layer, normalises the output to the [0, 1] range. These two sub-models are trained in an adversarial zero-sum game until the discriminator model is tricked approximately half of the time, indicating that the generator model is producing believable examples.
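The following Keras sketch follows the layer description above (five transpose-convolution layers with batch normalisation and Leaky ReLU and a tanh output for the generator; five convolution layers with batch normalisation and a sigmoid output for the discriminator). The 128 × 128 output size matches Sect. 4.1, but the filter counts and kernel sizes are assumptions for illustration rather than the exact configuration used here.

```python
# Sketch of the generator/discriminator pair described above. Layer counts and
# activations follow the text; filter counts and kernel sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

LATENT_DIM = 100

def build_generator():
    g = models.Sequential([layers.Input(shape=(LATENT_DIM,)),
                           layers.Dense(8 * 8 * 256),
                           layers.Reshape((8, 8, 256))])
    for filters, stride in zip([128, 64, 32, 16], [2, 2, 2, 2]):
        g.add(layers.Conv2DTranspose(filters, 5, strides=stride, padding="same"))
        g.add(layers.BatchNormalization())
        g.add(layers.LeakyReLU(0.2))
    # fifth transpose layer, 1x1 stride, tanh output in [-1, 1]
    g.add(layers.Conv2DTranspose(1, 5, strides=1, padding="same", activation="tanh"))
    return g                                      # output: 128 x 128 x 1 image

def build_discriminator():
    d = models.Sequential([layers.Input(shape=(128, 128, 1))])
    for filters, stride in zip([16, 32, 64, 128, 256], [2, 2, 2, 2, 1]):
        d.add(layers.Conv2D(filters, 5, strides=stride, padding="same"))
        d.add(layers.BatchNormalization())
        d.add(layers.ReLU())
    d.add(layers.Flatten())
    d.add(layers.Dense(1, activation="sigmoid"))  # probability that the input is real
    return d
```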

Fig. 5.

An illustration of a GAN architecture for producing Type III SRB data in simulation.

4.1. Type III generation

The GAN was trained to create simulated Type III SRBs. The training set consisted of 2763 real Type III images that I-LOFAR obtained by merging several observation days, with each observation day broken up into 10-min chunks. The vertical strip shape of a Type III is visible in these 10-min chunks. In terms of solar activity, these observation days alternate between active and relatively quiet. The data were then cleaned using colour data augmentation to generate images that are free of interference, such as embedded RFI. We did not need to label any data because the GAN is an unsupervised algorithm, so the images were fed directly into the algorithm for training. The GAN algorithm is quite computationally demanding, as we are trying to produce images from general noise or random vectors. Therefore, the system used to train the GAN included two Nvidia GeForce RTX 2080 Ti GPUs connected via SLI, running Ubuntu 20.04.2 LTS on an AMD Ryzen Threadripper 1950X with 32 GB of RAM. For the training configuration, we used 90% of GPU capacity for a variety of different epochs at a batch size of 32 (see Fig. 6).
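A minimal adversarial training loop consistent with the batch size of 32 quoted above is sketched below, building on the generator and discriminator of the previous sketch; the Adam optimiser settings and the real/fake label convention are standard DCGAN assumptions rather than details taken from our configuration.

```python
# Minimal adversarial training step for the sketch above (batch size 32 as in the text).
# Optimiser settings and label conventions are standard DCGAN assumptions.
import numpy as np
import tensorflow as tf

generator, discriminator = build_generator(), build_discriminator()
discriminator.compile(optimizer=tf.keras.optimizers.Adam(2e-4, 0.5),
                      loss="binary_crossentropy")
# Stack G and D (with D frozen) so generator updates come from the discriminator's feedback.
discriminator.trainable = False
z_in = tf.keras.Input(shape=(LATENT_DIM,))
gan = tf.keras.Model(z_in, discriminator(generator(z_in)))
gan.compile(optimizer=tf.keras.optimizers.Adam(2e-4, 0.5), loss="binary_crossentropy")

BATCH = 32
def train_step(real_images):
    z = np.random.normal(size=(BATCH, LATENT_DIM))
    fake_images = generator.predict(z, verbose=0)
    # Discriminator: real images labelled 1, generated images labelled 0.
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((BATCH, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((BATCH, 1)))
    # Generator: try to make the discriminator output "real" for generated images.
    g_loss = gan.train_on_batch(np.random.normal(size=(BATCH, LATENT_DIM)),
                                np.ones((BATCH, 1)))
    return 0.5 * (d_loss_real + d_loss_fake), g_loss
```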

Fig. 6.

The loss error battle between the discriminator and the generator when generating Type IIIs. This illustrates the GAN’s learning pattern. Notice how no instance of training is the same. During training, the generator and discriminator losses should converge, which is an indication that the network is producing high-quality simulated images. This can be observed in plots (a) and (c). One key feature when training GANs is convergence failure seen in plots (b) and (d) (when generator loss spikes for a period of epochs; Goodfellow et al. 2014). This occurs when there is an inability to find the equilibrium between generator loss and discriminator loss. Images generated during this period are very poor and noisy.

During training, we produced eight generated images after each epoch to create a collection of fake images. The GAN was trained numerous times, which allowed us to build a dataset of over 4500 images containing simulated Type III SRBs that were random in number, grouping, intensity, drift rate, heterogeneity, start-end frequency, and start-end time. The generated SRBs compared very well with real Type IIIs observed by I-LOFAR. We then removed from the simulated dataset the noisy generated images that were produced when the generator error significantly spiked during training. The generated images were small, at 128 × 128 pixels, so they were bulk-rescaled up to 256 × 256. To evaluate the images produced by the GAN, we used human perception as the most efficient way of assessing these GAN-produced Type III SRBs (Borji 2019). We compare the GAN results, along with the parametric method, to real Type III SRBs observed by I-LOFAR in Fig. 7.

Fig. 7.

Parametric modelling and GANs compared to real Type III SRBs observed by I-LOFAR. GANs produce more realistic examples of Type IIIs compared to the parametric modelling method. It is apparent from (a) that parametric modelling fails to create realistic Type III SRBs. The synthetic Type III from the GAN (b) is closer to a real observation (c) than the parametric one. These GAN-produced Type IIIs were combined with real Type IIIs observed by I-LOFAR to create a training set for YOLO.

Human perception can be used to assess GANs, but it is particularly susceptible to observer bias because SRBs are abstract objects that are not like everyday objects such as automobiles and people, and they are therefore more likely to be misclassified. To address this problem, we used the Fréchet inception distance (FID), a metric that measures the similarity of the datasets generated by parametric modelling and by GANs to real observed data (Heusel et al. 2017). The FID uses the last pooling layer of the Inception CNN to capture features of an input image, without the classification scores (Szegedy et al. 2015). The objective is an FID score as close to 0 as possible, indicating good similarity between a generated dataset and real observed data. Using this method, we calculate an FID score of 8.32 when comparing the GAN data to real observed data, indicating a good similarity between datasets. This GAN-generated data was then used as part of a hybrid training set for YOLO.
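For reference, the sketch below shows how an FID of this kind can be computed: activations are taken from the final average-pooling layer of InceptionV3, and the Fréchet distance is evaluated between Gaussian fits to the real and generated feature distributions. The image resizing and preprocessing to 299 × 299 RGB are assumed to have been done beforehand.

```python
# Sketch of the FID calculation: InceptionV3 pooling features, then the Frechet
# distance between Gaussian fits of the real and generated feature distributions.
import numpy as np
from numpy import cov, trace
from scipy.linalg import sqrtm
import tensorflow as tf

inception = tf.keras.applications.InceptionV3(include_top=False, pooling="avg",
                                              input_shape=(299, 299, 3))

def fid(real_images, fake_images):
    """real_images, fake_images: arrays of shape (N, 299, 299, 3), already preprocessed."""
    act1 = inception.predict(real_images, verbose=0)
    act2 = inception.predict(fake_images, verbose=0)
    mu1, sigma1 = act1.mean(axis=0), cov(act1, rowvar=False)
    mu2, sigma2 = act2.mean(axis=0), cov(act2, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):            # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    return np.sum((mu1 - mu2) ** 2) + trace(sigma1 + sigma2 - 2.0 * covmean)
```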

5. You only look once (YOLO)

Identifying what entities are present in a given image, and where they are located, is a computer vision problem known as object detection. Detecting objects in an image is more difficult than classifying them, as classification only distinguishes between objects but does not give their exact positions in an image. Additionally, classification fails when applied to images with numerous objects. YOLO employs a different strategy. YOLO is a CNN that, depending on its configuration, performs real-time object detection. The method divides the image into grid regions and predicts bounding boxes and probabilities for each grid zone using a single CNN (see Fig. 8). Predicted probabilities are used to weight these bounding boxes. The approach only looks once at the image, since it only needs one forward propagation run through the neural network to provide predictions. Using YOLO, a single CNN can predict a variety of bounding boxes and class probabilities for those boxes. By using complete images for training, YOLO enhances detection performance. Because Type III SRBs are often short-lived (∼0.1–3 s) and have a drift rate of ∼500 MHz s−1 in dynamic spectra (Reid & Ratcliffe 2014), we chose YOLO for this investigation. The fundamental benefit of YOLO is that it is very quick and can deliver accuracy that is virtually equivalent to faster R-CNN (Ren et al. 2017).

Fig. 8.

The Darknet-19 CNN architecture of YOLOv2. The Darknet-19 architecture consists of 19 convolutional layers and 5 maxpool layers. The Reorg layer combines both high- and mid-level features for better detection accuracy. To increase accuracy in YOLOv2, the fully connected layers of the CNN are eliminated, and K-means clustering of bounding box dimensions is used for detection and classification (Redmon & Farhadi 2017).

5.1. Dataset

The key feature of our updated YOLOv2 model compared to our previous work is the dataset. Instead of using the parametric-modelling generated data, we used a hybrid dataset, which offered more realistic data for YOLO to train on. The hybrid dataset consisted of data generated by GANs and real observed data from I-LOFAR. This improved dataset of 6732 images, containing just over 60 000 Type III examples, is considerably smaller than the parametric modelling training set (80 000 images). However, the improved dataset is more realistic and offers more robustness when testing on real I-LOFAR observations. The improved training set also occupies less memory, so it can be easily transferred without the risk of data corruption. The only constraint that comes with this approach is that the dataset needs to be manually labelled. This was done using LabelImg, a bounding box annotating tool used to label objects within an image (Tzutalin 2015). Once the dataset was labelled, we had a training set consisting of 6732 images for training YOLOv2.

5.2. Model configuration

After we created the dataset, we set up the model. To create the model in YOLOv2, we used a framework called Darkflow (Trieu 2018), which is a TensorFlow Python port of Darknet. As seen in Fig. 8, YOLOv2 has 19 convolutional layers and 5 maxpool layers. To optimise the model, the number of filters in the final convolutional layer was reduced, and the sizes of the bounding boxes were adjusted using anchor values, or bounding box dimensions. In our previous research, we originally set the height of the bounding box to a static 90 MHz; this was due to the lack of Y-axis value variation seen in the parametric modelling approach. With our new and improved training set, we adjusted the anchor height values to 20 MHz, 30 MHz, 40 MHz, 50 MHz, and 70 MHz. This allowed us to capture most Type III sizes in terms of width and height detected by YOLOv2. The input size was also changed from 416 × 416 to 288 × 288 to achieve real-time frame rates. With this input size, the model can detect Type IIIs in real time (90 frames per second), as it only detects one class in greyscale format (Redmon & Farhadi 2017).
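A sketch of how such a model can be loaded and run through Darkflow's Python interface is shown below. The option names follow the Darkflow README; the file paths are hypothetical, and we assume the anchor dimensions and the 288 × 288 input size are set inside the .cfg file, as is usual for Darknet-style configurations.

```python
# Sketch of running detections with Darkflow. Paths and config names are hypothetical;
# anchors and input size are assumed to live in the .cfg file.
import cv2
from darkflow.net.build import TFNet

options = {
    "model": "cfg/yolov2-srb.cfg",      # hypothetical single-class Type III config
    "load": "bin/yolov2-srb.weights",   # hypothetical trained weights
    "threshold": 0.35,                  # optimised confidence threshold (Sect. 6)
    "gpu": 0.9,
}
tfnet = TFNet(options)

frame = cv2.imread("spectrogram_10min.png", cv2.IMREAD_GRAYSCALE)
frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)     # Darkflow expects 3 channels
for det in tfnet.return_predict(frame):
    # each detection: label, confidence, top-left and bottom-right pixel coordinates
    print(det["label"], det["confidence"], det["topleft"], det["bottomright"])
```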

5.3. Training and validation

The YOLOv2 model was trained to detect and classify Type III SRBs. The training set comprised 3000 simulated GAN images and 3732 real images. The dataset included Type III samples with random start-end frequency, start-end time, drift rate, intensity, grouping, and heterogeneity. A subset of the training set, the validation set, contained 1500 Type III images produced by the GAN. The training and validation sets were both manually labelled, which, although tedious, provides YOLO with precise instructions on what to train on within the specified image. To fulfil the requirements of Darkflow's training set, these manually labelled training set images were converted into XML annotations. To determine whether the model was overfitting or underfitting, we configured the Darkflow framework to validate during training. Leaky ReLU was used as the activation function during training, and the model was trained using a learning rate based on the learning pattern of the model: if the model learned too quickly, the learning rate was updated, resulting in a smoother learning pattern. Stochastic gradient descent (SGD) was used to continuously update the learning parameters until convergence. We trained the YOLO model for 1000 epochs at a batch size of 16. With this configuration, training took 7 days, with both training and validation loss decreasing with every iteration, as shown in Fig. 9.
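For completeness, the sketch below shows how training of this kind can be launched through Darkflow; the paths, pretrained weights, and starting learning rate are assumptions, while the batch size of 16 and 1000 epochs echo the configuration described above.

```python
# Sketch of launching Darkflow training. Option names follow the Darkflow README;
# paths, pretrained weights, and the learning rate are assumptions.
from darkflow.net.build import TFNet

train_options = {
    "model": "cfg/yolov2-srb.cfg",     # hypothetical single-class config
    "load": "bin/yolov2.weights",      # assumed pretrained weights for transfer learning
    "train": True,
    "annotation": "data/annotations/", # LabelImg-style XML annotations
    "dataset": "data/images/",         # corresponding training images
    "batch": 16,
    "epoch": 1000,
    "lr": 1e-5,                        # assumed starting learning rate
    "gpu": 0.9,
}
TFNet(train_options).train()
```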

Fig. 9.

Comparing training loss with validation loss illustrates how well YOLO is learning on the training set. It is also used to prevent the algorithm from overfitting. Each epoch represents 422 iterations, that is, one forward and backward pass of the dataset through YOLO.

5.4. Test set

When testing our previous model, we chose a specific observation date, 10 September 2017. Although it provided us with good benchmark results, it never tested the robustness of our model, as the observation was relatively uneventful. To test the model’s robustness, a test set is needed that has a variety of Type III examples. Therefore, multiple observations from different dates in the I-LOFAR archive were mined to build a test set that contained examples of busy and quiet periods of solar activity. In order for colour not to be an influencing factor in the model’s predictions, the images were converted to greyscale (Rafegas & Vanrell 2017). We concentrated our observations in the 10–90 MHz band, as this is where the Type III’s vertical strip shape can be seen. The observations were then divided into 10-min intervals to provide a test set of 2763 images that contained around 35 000 Type III solar radio bursts. We then precisely annotated our ground truth bounding box values. When a Type III was labelled, the appropriate bounding box coordinates were saved in an XML file so that the mAP calculation could compare the ground truth coordinates with the coordinates predicted by the model.

6. Results

The YOLOv2 model’s performance was measured using the test set described in Sect. 5.4. In our previous research, when evaluating the model’s performance, we viewed the model as a detection-classification problem. This was done by first predicting on the test set and then sifting through it searching for correctly identified Type III SRBs. Once a bounding box encompassed a Type III, we categorised it as correctly identified and then annotated a bounding box around the predicted bounding box. This meant the only metric with which we could represent our previous model’s performance was the f1-score. The f1-score captures the balance between precision and recall, where precision refers to how accurately the model predicted an object’s position and recall is the proportion of true positives to all actual objects. True positives (TP) and false positives (FP) are obtained from the intersection over union (IoU) of tested data. We compared YOLO’s predicted bounding box to the ground truth bounding box to obtain the IoU. A prediction is classified as TP if the IoU is greater than 0.5 and as FP if it is less than 0.5. Figure 10 illustrates false negatives (FN), for images where the model missed a Type III object. Then:

$$ \begin{aligned} \mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP + FP}} \end{aligned} $$(1)

$$ \begin{aligned} \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP + FN}} \end{aligned} $$(2)

$$ \begin{aligned} f1\mathrm{\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \end{aligned} $$(3)

$$ \begin{aligned} \mathrm{Intersection\,over\,Union} = \frac{\mathrm{Area\,of\,Overlap}}{\mathrm{Area\,of\,Union}}. \end{aligned} $$(4)
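The sketch below implements Eqs. (1)-(4) directly: an IoU between two boxes given as corner coordinates, and precision, recall, and f1 from TP, FP, and FN counts.

```python
# Helpers implementing Eqs. (1)-(4): IoU between boxes given as (x1, y1, x2, y2),
# and precision/recall/f1 from TP, FP, FN counts.
def iou(box_a, box_b):
    # intersection rectangle
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A prediction counts as TP only if it overlaps a ground-truth box with IoU > 0.5.
print(iou((10, 20, 50, 90), (12, 25, 48, 85)) > 0.5)
```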

Fig. 10.

A visual representation of IoU thresholding. The green bounding box indicates the ground-truth or actual Type III, the blue bounding box indicates a correctly predicted bounding box by YOLO (TP), and the red bounding box indicates a false detection (FN) or IoU < 0.5.

Although we could determine how accurate our model was, we could never calculate mAP, which is a metric used in computer vision for evaluating the performance of object detection and image classification algorithms. It calculates the average precision across all classes and is expressed as a fraction between 0 and 1. It takes into account both the number of TP detections and FP detections. A higher mAP score indicates a better performance of the algorithm. The TP and FP variables in mAP are determined by comparing the ground truth bounding box to the model’s predicted bounding box (IoU).
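As an illustration of how the mAP quoted in this section is built up, the sketch below computes the average precision for a single class (the only class here being Type III) as the area under the precision-recall curve, assuming the IoU > 0.5 matching of detections to ground truth has already been done as in the previous sketch.

```python
# Single-class average precision: sort detections by confidence, accumulate
# precision and recall, then integrate precision over recall.
import numpy as np

def average_precision(confidences, is_true_positive, n_ground_truth):
    order = np.argsort(confidences)[::-1]            # highest confidence first
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / n_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    return np.trapz(precision, recall)               # area under the PR curve

ap = average_precision([0.9, 0.8, 0.6, 0.4], [1, 1, 0, 1], n_ground_truth=4)
```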

The confidence threshold is an important consideration when assessing the model’s performance. The confidence threshold expresses the model’s level of certainty in predicting a Type III SRB, which ranges between 0 and 1. Figure 11 shows that the lower the confidence, the more detections made on a test image, but also the more false detections made.

Fig. 11.

A visual representation of Table 1. As the confidence threshold decreases, the TP and FP values increase. Here, we can see where YOLO performs at its optimised confidence threshold. The key is to find the balance between the true positive and false positive values. For our model, we evaluated using an IoU threshold of 0.5, as it tests the model's robustness and localised accuracy at detecting Type III SRBs.

The model was found to be optimal with the confidence threshold set to 0.35, as there is a balance between the TP and FP rates in terms of the model's predictions (see Table 1 and Fig. 11). Figure 12 shows the bounding box predictions when the confidence threshold is set to 0.35; the resulting mAP for detecting Type III solar radio bursts is 77.71%. We plot the localised YOLO detections on a dynamic spectrum from an observation made on 10 September 2017 (see Fig. 13).

Fig. 12.

YOLO making localised detections on a 10-min segment at the optimised confidence threshold of 0.35 on the test set (a). When the image is colour-inverted (b), we can see the faint Type IIIs YOLO is picking up. Notice how YOLO picks up most Type IIIs in the image and ignores most RFI.

Fig. 13.

YOLOv2 applied to an I-LOFAR observation made on 10 September 2017. The model's detections capture the Type IIIs' frequency range and duration in time. The model predicts the most intense Type IIIs correctly and ignores some low-intensity ones, as they are quite difficult to distinguish from RFI, even to the human eye.

Table 1.

mAP scores associated with different confidence thresholds set in YOLO at different IoU thresholds.

7. Discussion

In our previous research, we had two issues with the detection quality and robustness of the model. The first issue was detection quality: the predicted bounding boxes were very static in the Y-axis or height variable. The previous parametric modelling method also lacked the ability to produce simulated Type III SRB data comparable to real observed data. With the introduction of GANs for generating Type III examples, we could produce simulated SRB data with variation in the Y-axis or height variable. GANs also provided realistic Type III examples, almost identical to real observed data and free of any interference such as RFI. The second issue was robustness: the model could not handle high volumes of data and had problems with SRB groupings or storms. Using the GAN-simulated data, we could produce all sorts of Type III variations, including groupings and storms, but also other classes of Type IIIs such as inverted-U bursts and Type N bursts. Employing this new diverse training set, we could train YOLO to detect and classify the actual shape of a Type III. With this configuration of congruent deep learning models, we can accurately detect such phenomena.

In this research, we introduced the computer vision metric mAP, which quantifies how accurately the model can localise a Type III. One concern is the approach we took to calculate this metric. We approached the problem as a computer vision problem and evaluated it using computer vision standards applied in COCO dataset competitions, in which the IoU threshold is set to 0.5. This has many advantages in terms of accurate model detection and model robustness when tested against both busy and quiet solar activity observations. However, we could be missing out on some valuable detection data where a visually positive detection has been made (the bounding box is slightly overlapping a Type III) but the detection has been classified as a false detection because the IoU threshold of 0.5 was not met. One could argue that we have been overly harsh on the model's detections, but we have done this with the view of having a more robust and accurate model for detecting Type III SRBs.

A key feature of our configuration of YOLO is the potential to detect Type III SRBs in real time. Our tests were conducted on small data streams; however, with the recent development of REALTA, a computing backend at I-LOFAR, the potential exists for the recording and processing of data in near real time. The combination of YOLO, as a real-time software pipeline, and REALTA's hardware capabilities could prove significant for near real-time space weather monitoring. LOFAR for Space Weather (LOFAR4SW) is a planned LOFAR improvement that would allow for frequent space weather monitoring, enabling near real-time monitoring of space weather phenomena such as solar flares and coronal mass ejections (Carley et al. 2020a). This will benefit not only space weather researchers but also the radio astronomy community. It will increase our understanding of how space weather affects radio wave propagation in the inner heliosphere and ionospheric disturbances, as well as the impact this has on observing astronomical sources. A backend such as REALTA will be required to capture the data streams from a LOFAR4SW-equipped international station in local mode and analyse the raw data so that it may be used by space weather researchers and forecasters. With YOLO's proven ability to detect Type IIIs in real time and REALTA's ability to record and process data in near real time, the possibility of near real-time Type III SRB analysis is promising.

8. Conclusion

We applied a combination of deep learning models to the problem of SRB generation, detection, and classification, with the focus on real-time detection. We trained a GAN to produce realistic Type III SRBs, similar to those observed in real observations from I-LOFAR. We then labelled and combined this generated data with real observed data to produce a training set on which YOLOv2 could be trained and evaluated. This particular configuration of YOLOv2 achieves a mAP of 77.71% on a real test set consisting of over 35 000 Type III solar radio burst examples, while also achieving real-time frame rates (up to 90 fps). The combination of YOLO, as a real-time software pipeline, and REALTA's hardware capabilities could prove significant for near real-time space weather monitoring.

We intend to develop this software pipeline further by increasing the size of the dataset using GANs, but also by adding variety, extending YOLO's capability to detect other SRBs such as Type IIs. We have shown that, with congruent deep learning model techniques, we can create a robust method of detecting Type III SRBs. This illustrates that accurate real-time detection and classification of Type III SRBs is readily attainable.

Acknowledgments

LOFAR is one of the largest astrophysics projects in Europe, consisting of 12 international stations spread across Germany, Poland, France, UK, Sweden and Ireland, with additional stations and a central hub in The Netherlands, operated by the Netherlands Institute for Radio Astronomy (ASTRON). I-LOFAR was the Irish addition to this network and was constructed by members from Trinity College Dublin (TCD), University College Dublin (UCD), Armagh Observatory, Dublin City University (DCU), University College Cork (UCC) and National University of Ireland Galway (NUIG) with funding from Science Foundation Ireland (SFI), Department of Business, Enterprise and Innovation, Open Eir and Offaly County Council. J. Scully acknowledges support from SFI and the Technological University of the Shannon (TUS).

References

  1. Borji, A. 2019, Comput. Vis. Image Understand., 179, 41 [CrossRef] [Google Scholar]
  2. Carley, E. P., Baldovin, C., Benthem, P., et al. 2020a, J. Space Weather Space Clim., 10, 7 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  3. Carley, E. P., Gallagher, P., Mccauley, J., & Murphy, P. 2020b, EGU Gen. Assem. Conf. Abstr., 5109 [Google Scholar]
  4. Connor, L., & van Leeuwen, J. 2018, AJ, 156, 256 [NASA ADS] [CrossRef] [Google Scholar]
  5. Evgeniou, T., & Pontil, M. 2001, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2049, 249 [Google Scholar]
  6. Girshick, R. 2015, Proceedings of the IEEE International Conference on Computer Vision, 1440 [Google Scholar]
  7. Girshick, R., Donahue, J., Darrell, T., & Malik, J. 2016, IEEE Trans. Pattern Anal. Mach. Intell., 38, 142 [CrossRef] [Google Scholar]
  8. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. 2014, in Advances in Neural Information Processing Systems, eds. Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, & K. Weinberger (New York: Curran Associates, Inc.), 27 [Google Scholar]
  9. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. 2020, Commun. ACM, 63, 139 [CrossRef] [Google Scholar]
  10. He, K., Zhang, X., Ren, S., & Sun, J. 2016, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-Decem, 770 [Google Scholar]
  11. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. 2017, in Advances in Neural Information Processing Systems, eds. I. Guyon, U. V. Luxburg, S. Bengio, et al. (New York: Curran Associates, Inc.), 30 [Google Scholar]
  12. Hou, Y. C., Zhang, Q. M., Feng, S. W., et al. 2020, Sol. Phys., 295, 146 [NASA ADS] [CrossRef] [Google Scholar]
  13. Kalkan, E., Okur, F., & Altunışık, A. 2018, J. Constr. Eng. Manag. Innovat., 1, 139 [Google Scholar]
  14. Lin, R. P. 2011, Space Sci. Rev., 159, 421 [NASA ADS] [CrossRef] [Google Scholar]
  15. Liu, W., Anguelov, D., Erhan, D., et al. 2016, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9905, 21 [Google Scholar]
  16. Lobzin, V. V., Cairns, I. H., Robinson, P. A., Steward, G., & Patterson, G. 2009, Space Weather, 7, 1 [Google Scholar]
  17. Louppe, G. 2014, Mach. Learn., 63, 3 [Google Scholar]
  18. Lu, X., Wang, J., Dizaji, R., Ding, Z., & Ponsford, A. M. 2004, Can. Conf. Electr. Comput. Eng., 4, 2081 [Google Scholar]
  19. Ma, L., Chen, Z., Xu, L., & Yan, Y. 2017, Pattern Recognit., 61, 573 [NASA ADS] [CrossRef] [Google Scholar]
  20. Murphy, P. C., Callanan, P., McCauley, J., et al. 2021, A&A, 655, A16 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  21. Pick, M., & Vilmer, N. 2008, A&ARv, 16, 1 [NASA ADS] [CrossRef] [Google Scholar]
  22. Rafegas, I., & Vanrell, M. 2017, Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, 2018-January, 2697 [Google Scholar]
  23. Redmon, J., & Farhadi, A. 2017, Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January, 6517 [Google Scholar]
  24. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. 2016, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016-December, 779 [Google Scholar]
  25. Reid, H. A. S., & Ratcliffe, H. 2014, RAA, 14, 773 [NASA ADS] [Google Scholar]
  26. Ren, S., He, K., Girshick, R., & Sun, J. 2017, IEEE Trans. Pattern Anal. Mach. Intell., 39, 1137 [Google Scholar]
  27. Salmane, H., Weber, R., Abed-Meraim, K., Klein, K.-L., & Bonnin, X. 2018, J. Space Weather Space Clim., 8, A43 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  28. Scully, J., Flynn, R., Carley, E., Gallagher, P., & Daly, M. 2021, Irish Signal and Systems Conference 2021, 1-6 [Google Scholar]
  29. Szegedy, C., Liu, W., Jia, Y., et al. 2015, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 07-12-June, 1 [Google Scholar]
  30. Trieu, T. H. 2018, GitHub Repository Available online: https://github.com/thtrieu/darkflow (accessed on 14 February 2019) [Google Scholar]
  31. Tzutalin 2015, LabelImg, Free Software: MIT License [Google Scholar]
  32. Van Haarlem, M. P., Wise, M. W., Gunst, A. W., et al. 2013, A&A, 556, A2 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
  33. Wang, Z., She, Q., & Ward, T. E. 2021, ACM Comput. Surveys, 54, 1 [Google Scholar]
  34. Wu, J. 2017, Introduction to Convolutional Neural Networks, 1 [Google Scholar]
  35. Ying, X. 2019, J. Phys.: Conf. Ser., 1168, 022022 [NASA ADS] [CrossRef] [Google Scholar]
  36. Zhang, Y. G., Gajjar, V., Foster, G., et al. 2018, ApJ, 866, 149 [NASA ADS] [CrossRef] [Google Scholar]
  37. Zhang, W., Yan, F., Han, F., et al. 2021, Front. Phys., 9, 1 [Google Scholar]
  38. Zhuang, F., Qi, Z., Duan, K., et al. 2021, Proc. IEEE, 109, 43 [CrossRef] [Google Scholar]

Appendix A: GAN’s objective function

The mathematics and loss functions behind GANs involve two neural networks: a generator and a discriminator. The generator network G takes a random noise vector z as input and produces a fake sample, while the discriminator network D takes as input a real sample x or a fake sample and outputs a scalar value indicating the probability that the input is real. During the training phase, the discriminator is updated using the stochastic gradient ∇θd of its objective, the sum of the positive and negative errors over a mini-batch of m samples:

$$ \begin{aligned} \nabla_{\theta_d}=\frac{1}{m}\sum_{i=1}^{m}\left[\log D\left(x^{i}\right)+\log \left(1-D\left(G\left(z^{i}\right)\right)\right)\right] \end{aligned} $$(A.1)

Then, using the discriminator, the amount of error from the generator during the training phase can be easily calculated. The objective of the generator function ∇θg is given by:

$$ \begin{aligned} \nabla_{\theta_g}=\frac{1}{m}\sum_{i=1}^{m}\log \left(1 - D\left(G\left(z^{i}\right)\right)\right) \end{aligned} $$(A.2)

In simple terms, the purpose of a GAN is to achieve a condition of equilibrium between the two neural networks. The search for this equilibrium involves solving the following equation, a minimax condition where the entire network attempts to optimise the V(D,G) function on the entire dataset (Wang et al. 2021):

$$ \begin{aligned} \min_{G}\,\max_{D}\,V(D,G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log \left(1 - D(G(z))\right)\right] \end{aligned} $$(A.3)

In practice, the GAN algorithm is trained using mini-batch stochastic gradient descent, where each batch of training data is used to update the parameters of both the generator and discriminator networks, with each network weight W updated from step n to step n + 1 according to:

$$ \begin{aligned} W_{n+1} = W_n\,(1 + \mathrm{learning\ rate} \times \mathrm{gradient}) \end{aligned} $$(A.4)
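A NumPy sketch of evaluating the mini-batch objectives in Eqs. (A.1) and (A.2) is given below; D and G are assumed to be callables returning the discriminator probability and a generated sample, respectively, and the gradients denoted by ∇θd and ∇θg are taken with respect to the corresponding network parameters during training.

```python
# Sketch of the mini-batch objectives in Eqs. (A.1) and (A.2). D(x) is assumed to
# return the discriminator's probability that x is real; G(z) returns a generated
# sample; eps guards the logarithms against zero.
import numpy as np

def discriminator_objective(D, G, real_batch, z_batch, eps=1e-8):
    d_real = D(real_batch)                        # D(x^i) for the real mini-batch
    d_fake = D(G(z_batch))                        # D(G(z^i)) for the generated batch
    return np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))   # Eq. (A.1)

def generator_objective(D, G, z_batch, eps=1e-8):
    d_fake = D(G(z_batch))
    return np.mean(np.log(1.0 - d_fake + eps))                          # Eq. (A.2)
```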

