Distinguishing between Flaring and Non-Flaring Active Regions: A Machine Learning Perspective

Large-scale solar eruptions significantly impact space weather and damage space-based human infrastructure. Predicting large-scale solar eruptions is necessary to protect the vulnerable infrastructure of modern society. We aim to investigate the difference between flaring and non-flaring active regions. We use photospheric vector magnetogram data from the Helioseismic and Magnetic Imager onboard the Solar Dynamics Observatory to study the time evolution of photospheric magnetic parameters on the solar surface. We build a database of flaring and non-flaring active regions observed on the solar surface from 2010 to 2017. We train machine learning algorithms on the time evolution of these active region parameters and then estimate their performance. We find that some magnetic parameters, namely total unsigned magnetic flux, total unsigned magnetic helicity, total unsigned vertical current and total photospheric magnetic energy density, are much higher in flaring active regions than in non-flaring ones. These magnetic parameters in flaring active regions are highly evolving and complex. We obtain good forecasting capability with a relatively high value of the true skill statistic (TSS). We also find that the time evolution of the total unsigned magnetic helicity and the total unsigned magnetic flux has a very high ability to distinguish flaring from non-flaring active regions. It is possible to distinguish flaring active regions from non-flaring ones with good accuracy. We confirm that there is no single common parameter which can distinguish all flaring active regions from the non-flaring ones. However, the time evolution of the top few magnetic parameters, namely total unsigned magnetic flux and total unsigned magnetic helicity, has very high distinguishing capability.


Introduction
Solar flares and coronal mass ejections are the two biggest explosions in the solar system. These two explosions release a huge amount of magnetic energy in the solar corona, creating disturbances in space weather. They directly impact the Earth's atmosphere, causing geomagnetic disturbances. It is now well known that magnetic field structures in the Sun are responsible for large-scale eruptions. The study of magnetic fields in the Sun is critical for understanding the energy build-up and release mechanism in solar flares and coronal mass ejections. Solar flares and coronal mass ejections are believed to result from a storage-and-release mechanism by which the non-potential magnetic energy of the solar corona is released abruptly (Priest & Forbes 2002; Shibata & Magara 2011). It is also believed that complex magnetic structures on the solar surface are related to the onset of solar eruptions. Many studies have investigated the relationship between solar eruptions and photospheric magnetic parameters. Many active region parameters have been proposed to characterize the nonpotentiality of the magnetic field structures on the solar surface. Some of the well-known nonpotentiality parameters are current helicity (Abramenko et al. 1996; Zhang & Bao 1999), vertical electric current (Leka et al. 1993), horizontal gradient of the longitudinal magnetic field (Zirin & Wang 1993; Tian et al. 2002), total photospheric magnetic free energy density (Metcalf et al. 2005), strong magnetic shear (Low 1977; Kusano et al. 1995), reverse magnetic shear (Kusano et al. 2004; Vemareddy et al. 2012), shear angle (Ambastha et al. 1993), twist parameter (Pevtsov et al. 1994; Holder et al. 2004), etc. Although individual case studies indicate a strong relationship between these nonpotentiality parameters and flare productivity, it is still not clear which property is common to all eruptive active regions and distinguishes them from non-eruptive active regions.
Article number, page 1 of 13 arXiv:2003.03878v1 [astro-ph.SR] 9 Mar 2020 A&A proofs: manuscript no. ms
It is now well known that magnetic field structures on the solar surface change significantly with time. A detailed study of this photospheric magnetic field evolution may shed light on the energy build-up and release mechanism of solar eruptions. The most frequently discussed mechanism for the change in photospheric magnetic field structure is flux emergence and cancellation (Livi et al. 1989; Spirock et al. 2002; Sudol & Harvey 2005; Burtseva & Petrie 2013). Flux emergence and cancellation play a significant role in some theories of solar eruptions (van Ballegooijen & Martens 1989; Amari et al. 2010). Flux cancellation is also one of the necessary conditions for the formation of solar filaments (Martin et al. 1985; Gaizauskas et al. 1997; Martens & Zwaan 2001). Solar filaments are believed to be one of the major precursors of solar eruptions (Sinha et al. 2019). One can also predict the possible orientation of the magnetic field ejected in solar eruptions by studying the hemispheric preference of the filament chirality (Martin et al. 1994; Hazra et al. 2018). In summary, the time evolution of photospheric magnetic field parameters plays an important role in the onset phase of a solar flare. However, given the large amount of solar data, it is almost impossible to analyze every solar eruptive event manually. We need a reliable automated method which can analyze eruptive active regions and distinguish them from non-eruptive active regions.
In recent times, machine learning has emerged as a promising automated candidate for reliable forecasting of solar eruptive events (Ahmed et al. 2013; Bobra & Couvidat 2015; Bobra & Ilonidis 2016; Nishizuka et al. 2017; Hamdi et al. 2017; Ma et al. 2017; Filali Boubrahimi & Angryk 2018; Florios et al. 2018; Inceoglu et al. 2018). Machine learning is also used to identify the parameters that are most important for distinguishing eruptive active regions from non-eruptive ones. Dhuri et al. (2019) used machine learning to find the critical criteria at the onset phase which can lead to a solar flare. Different types of data sets are used for predicting eruptive events with machine learning. Yu et al. (2009) and Yuan et al. (2010) used line-of-sight magnetogram data obtained from the Michelson Doppler Imager (MDI) for flare prediction. Aggarwal et al. (2018) used filament metadata for the prediction of eruptive events. However, most studies used vector magnetogram data obtained from the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO) for flare prediction. Different machine learning classifiers have been used for solar flare prediction. Some studies also used the time series of the magnetic field data obtained from HMI for flare prediction (Ma et al. 2017).
In this paper, we aim to investigate the importance of the time evolution of magnetic parameters for flare forecasting. We find that there is a significant difference between eruptive and non-eruptive active regions in terms of both the strength and the time evolution of photospheric magnetic parameters. We also try to find a common magnetic parameter which clearly separates the eruptive active regions from the non-eruptive ones. For this purpose, we train machine learning algorithms using the time evolution of photospheric magnetic parameters. We are able to predict solar flares quite well. We find that total unsigned magnetic helicity and total unsigned magnetic flux have a higher distinguishing capability than the other photospheric magnetic parameters.
Section 2 describes the details of the data used in this study. We present a detailed manual comparison study between eruptive and non-eruptive active regions in terms of the time evolution of magnetic parameters in Section 3. We present a comparison study between eruptive and non-eruptive active regions using the machine learning algorithm in Section 4. We also describe the details about the machine learning algorithm and its performance in Section 4. Finally, we present the summary and conclusions of our study in Section 5.

Data for Active Regions
The Helioseismic and Magnetic Imager (HMI), an instrument onboard the Solar Dynamics Observatory (SDO) spacecraft, provides continuous full-disk photospheric magnetic field data (Schou et al. 2012). The HMI team developed an automated method that detects active region patches in the full-disk vector magnetogram data and provides a derived data product named Space-weather HMI Active Region Patches (SHARP) (Bobra et al. 2014). The automatic detection algorithm operates on the line-of-sight magnetic field image and creates a smooth bounding curve, called the bitmap, centered at the flux-weighted centroid. The HMI Stokes I, Q, U, V data are inverted within the smooth bounding curve by the Very Fast Inversion of the Stokes Vector (VFISV) code, which is based on the Milne-Eddington model of the solar atmosphere. The 180° ambiguity in the transverse component of the magnetic field is corrected using the minimum-energy algorithm (Metcalf 1994; Crouch et al. 2009). The inverted and disambiguated vector magnetic field data are remapped to a Lambert Cylindrical Equal-Area projection, which gives the decomposed Bx, By and Bz data. We downloaded these decomposed data from the JSOC webpage. From the SHARP data we calculated 17 active region magnetic field parameters at a cadence of 12 minutes. These parameters are listed with keywords and formulae in a table (see below). Please note that we follow the same procedure to calculate the active region magnetic field parameters as defined in Bobra & Couvidat (2015). For the parameter calculation, we consider only the pixels which are within the bitmap and above a high-confidence disambiguation threshold (coded value greater than 60). We use a finite difference method to compute the derivatives needed for the parameter calculation.
We use the Green's function technique with a monopole depth of 0.00001 pixels to calculate the potential magnetic field, which is necessary for the calculation of the total photospheric magnetic free energy density. We neglect active regions near the limb, where it is difficult to see magnetic structures due to the projection effect. The calculated magnetic field parameters are also not reliable near the limb. Thus, we only consider data within ±70° of the disk center for our study. We note that data for all of these magnetic parameters are available in the SHARP header. SHARP data products from SDO/HMI can be found at jsoc.stanford.edu.
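The pixel selection described above can be illustrated with a minimal sketch for one of the parameters, the total unsigned flux (the sum of |Bz| times the pixel area over the trusted pixels). The function name, the boolean `in_patch` mask (a simplification of the SHARP bitmap) and the pixel area argument are illustrative assumptions, not the code used for this study; only the confidence threshold of 60 comes from the text.

```python
def total_unsigned_flux(bz, in_patch, conf_disambig, pixel_area_cm2,
                        conf_threshold=60):
    """Sum |Bz| * dA over pixels inside the patch with trusted disambiguation.

    bz             : 2-D list of vertical field values in Gauss (Mx / cm^2)
    in_patch       : 2-D list of booleans (hypothetical stand-in for the bitmap)
    conf_disambig  : 2-D list of coded disambiguation-confidence values
    pixel_area_cm2 : area of one pixel in cm^2 (assumed constant)
    """
    flux = 0.0
    for bz_row, patch_row, conf_row in zip(bz, in_patch, conf_disambig):
        for b, inside, conf in zip(bz_row, patch_row, conf_row):
            # Keep only pixels inside the patch and above the threshold.
            if inside and conf > conf_threshold:
                flux += abs(b) * pixel_area_cm2
    return flux


# Toy 2x2 patch: one pixel is outside the patch, one has low confidence.
example_flux = total_unsigned_flux(
    bz=[[100.0, -200.0], [50.0, 300.0]],
    in_patch=[[True, True], [False, True]],
    conf_disambig=[[90, 90], [90, 50]],
    pixel_area_cm2=1.0,
)
```

With a unit pixel area, only the two top-row pixels contribute in this toy example.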

Data for Solar Flare
We select solar flares for our study based on the peak X-ray flux observed by the GOES X-ray satellites. When a GOES satellite detects a flare, the flare is generally reported in the flare catalog and then paired with its parent active region. Five classes of flares, namely A, B, C, M, and X, are observed by the GOES satellites. While X and M class flares are high-intensity flares (peak flux greater than $10^{-5}$ W m$^{-2}$), A, B and C class flares are less intense. For our study, we only consider X and M class flares as flares. We also only consider flares which occur within ±70° of the central meridian and which have an associated parent active region.
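As a small illustration of the flare selection above, the sketch below maps a GOES peak X-ray flux to its class letter using the standard decade thresholds and applies the M/X cut used in this study. The function names are hypothetical.

```python
def goes_class(peak_flux_wm2):
    """Return the GOES class letter for a peak X-ray flux in W m^-2."""
    if peak_flux_wm2 >= 1e-4:
        return "X"
    if peak_flux_wm2 >= 1e-5:
        return "M"
    if peak_flux_wm2 >= 1e-6:
        return "C"
    if peak_flux_wm2 >= 1e-7:
        return "B"
    return "A"


def is_major_flare(peak_flux_wm2):
    """True for the M and X class flares that count as flares in this study."""
    return goes_class(peak_flux_wm2) in ("M", "X")
```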

Comparing Flaring and Non-Flaring Regions
Active regions NOAA 11166 (SHARP 401), NOAA 11283 (SHARP 833) and NOAA 11143 (SHARP 335) were chosen for the comparison study between flaring and non-flaring active regions. All of these active regions transit over the visible solar disk for a long duration. AR 11166 produces one X class and two M class solar flares during its passage over the visible solar disk, while AR 11283 produces two X class and five M class solar flares. In contrast, AR 11143 produces no flare during its transit. One may pose the question: why do these three active regions behave so differently during their transit over the solar disk? Is it possible to distinguish flaring and non-flaring active regions?
It is now well known that the different magnetic nature of active regions is responsible for their different behavior. Here, we study the temporal evolution of photospheric active region magnetic parameters to get an idea of the difference between flaring and non-flaring active regions. Figures 1, 2, and 3 show the temporal evolution of four magnetic parameters: the total unsigned magnetic flux ($\Phi_{\rm tot}$), the total unsigned current helicity ($h_{c,{\rm tot}}$), the total unsigned vertical current ($J_{z,{\rm tot}}$) and the proxy of the total photospheric magnetic free energy density ($\rho_{\rm tot}$). All four parameters have a much higher value in the flaring active regions (AR 11166 and 11283) than in the non-flaring active region (AR 11143). All four magnetic parameters also show significant evolution. Although the total unsigned magnetic flux of AR 11166 decreases before the first large-scale flare, it increases before the other two flares (Fig. 1). The other three magnetic parameters of AR 11166 show an increasing trend before the first flare. The total unsigned magnetic flux of AR 11283 shows a significant decreasing trend before the first flare and an increasing trend later (Fig. 2). The total unsigned magnetic helicity of both flaring active regions (AR 11166 and 11283) shows an increasing trend before the first flare, and both active regions start their flaring activity when the value of the magnetic helicity is sufficiently high. Another interesting point is that once an active region starts flaring, it keeps on flaring. All four magnetic parameters also show significant evolution for the non-flaring AR 11143 but have a much lower value than in the two flaring active regions (see Fig. 3). We also note that the signal-to-noise ratio of the data is low near the solar limb; thus, in our time evolution study, the values of the magnetic parameters are not reliable at the start and end of the transit, when the active regions are near the limb.
The change in the total unsigned magnetic flux during the active region transit is mostly due to flux cancellation and emergence on the solar surface. The disappearance of magnetic flux is always observed when the magnetic flux of one particular polarity encounters flux fragments of the opposite polarity. Some previous studies indicate that flux cancellation plays an important role in triggering solar eruptions (van Ballegooijen & Martens 1989; Amari et al. 2010). The total unsigned magnetic flux of the active region and the magnetic flux near the polarity inversion line (R-value) are also found to correlate well with the flaring activity and the coronal X-ray luminosity (Schrijver 2007; Leka & Barnes 2007; Barnes & Leka 2008; Burtseva & Petrie 2013; Hazra et al. 2015). The emergence of new magnetic flux is also a well-observed phenomenon and is believed to be one of the mechanisms for the formation of current sheets (Tur & Priest 1976; Wang & Tang 1993; Sudol & Harvey 2005). Our results indicate that the total unsigned magnetic flux is considerably higher in flaring active regions than in non-flaring active regions.
We also found that flaring active regions are magnetically more complex than non-flaring active regions. One can characterize the magnetic complexity of an active region by different magnetic parameters, namely vertical electric current, magnetic helicity, twist, shear angle, photospheric magnetic free energy density, etc. (Abramenko et al. 1996; Metcalf et al. 2005; Pevtsov et al. 1994; Park et al. 2008). Magnetic helicity, a measure of the twist, shear, and inter-linkage of magnetic field lines, is a conserved quantity in an ideal MHD scenario (Berger & Field 1984). A change in the magnetic helicity reflects a deviation from the ideal MHD scenario, indicating the evolution of magnetic complexity inside the active region. We find significantly higher magnetic helicity and excess magnetic free energy in the flaring active regions than in the non-flaring region. Recent theoretical and observational studies also suggest that the injection of magnetic helicity of either the same or the opposite sign to the system's global helicity can trigger a solar flare (Kusano et al. 2002; Moon et al. 2002; Park et al. 2008, 2012, 2013). Kusano et al. (2003) also developed a theoretical model of solar flares based on the annihilation of magnetic helicity. Our result indicates both the accumulation and the annihilation of magnetic helicity before the onset of a solar flare.
In summary, our result highlights the importance of the time evolution of magnetic parameters in distinguishing flaring and non-flaring active regions. However, one may pose the question: which magnetic parameter is more important? It is very difficult to tell. In reality, a large number of active regions appear on the solar surface within a few days. It is difficult to analyze all active regions manually to predict the probability that a given active region will erupt. We have to develop an automated method which will help us to predict whether an active region will flare or not.

Comparing Flaring and Non-Flaring Regions using Machine Learning
In the previous section, we discussed the differences between flaring and non-flaring active regions based on the time evolution of magnetic parameters. In this section, we distinguish them using automated machine learning methods. Machine learning is a branch of artificial intelligence which gives computers the ability to learn automatically and improve from past experience without being explicitly programmed. Two types of learning, namely unsupervised learning and supervised learning, exist in the machine learning literature. In an unsupervised learning scenario, we do not have to supervise the model; we allow the computer model to work on its own to discover structure in the data. This learning technique mainly works with unlabelled data. In the supervised machine learning scenario, we have to train the computer model on known, well-labeled data. As we already have a well-known database of flaring and non-flaring active regions, we use supervised learning techniques for our problem. First, we train the computer on known flaring and non-flaring active region data. After the training, the computer is able to tell us the possibility of an active region eruption.
Table fragment: active region magnetic field parameters (keyword | description | unit | formula)
… | Mean photospheric magnetic free energy density | erg cm$^{-3}$ | $\overline{\rho} \propto$ …
… | Fraction of area with shear > 45° | m$^2$ | Area with shear > 45° / total area
AREA_ACR | Area of strong field pixels in an active region | m$^2$ | Area = Pixels
MEANGBT | Mean gradient of total field | G/Mm | …

Data Preparation
To train the supervised machine learning algorithms, we first have to define the positive and negative classes. We follow the same definition as prescribed by Ma et al. (2017) and Dhuri et al. (2019) for this purpose. An active region which produces at least one X or M class solar flare during its transit over the visible solar disk belongs to the positive class, and an active region which does not produce any X or M class flare during its transit belongs to the negative class.
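The class definition above can be written as a short labeling rule. This is a minimal sketch: the function name and the dictionary layout are assumptions, and the flare lists only encode the class letters reported earlier for AR 11166 and AR 11143, plus one hypothetical region with C class flares only.

```python
def label_active_regions(flares_by_region):
    """Map each active region to 1 (positive) or 0 (negative).

    flares_by_region: dict of region id -> list of GOES class strings
    (for example "M" or "X1.5"); only the leading letter is inspected.
    """
    return {
        region: int(any(cls[:1] in ("M", "X") for cls in classes))
        for region, classes in flares_by_region.items()
    }


labels = label_active_regions({
    "AR 11166": ["X", "M", "M"],    # one X and two M class flares
    "AR 11143": [],                 # no flare during its transit
    "hypothetical AR": ["C", "C"],  # C class only: still negative
})
```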
Most of the previous flare prediction studies consider only the magnetic parameters measured 24 hours before the flare time. They do not consider the time evolution of the magnetic parameters for prediction. However, most theoretical models suggest a change in magnetic parameters before a solar flare. The standard flare model indicates that flux cancellation near the polarity inversion line is an important determinant for solar flares (van Ballegooijen & Martens 1989). Thus, it is necessary to include the essence of the time evolution of the magnetic parameters in the training.
In this study, we consider the time evolution of magnetic parameters during a time window, named span, for training. The span time window always precedes the loopback time. The loopback is the time window before the occurrence of the solar flare. Figure 4 shows a graphical representation of this selection. Let us assume that we want to predict a solar flare 24 hours before its occurrence, and that for this purpose we consider 12 hours of the time evolution of the magnetic parameters, ending 24 hours before the flare occurrence. In this situation, the loopback is 24 hours and the span is 12 hours. As we consider 17 magnetic parameters for our study, we get 17 time series of magnetic parameter evolution for training. We represent each time series by seven statistical parameters, namely mean, median, skewness, kurtosis, standard deviation, first quartile and third quartile. For a time series $T = [x_1, x_2, x_3, ..., x_n]$, the statistical summary parameters are defined in the following way. The mean is the arithmetic average of the values of T. The median is the middle number of the time series T sorted in ascending order, the first quartile is the middle value between the median and the smallest number of T, and the third quartile is the middle value between the median and the largest number of T. The standard deviation ($\sigma$) represents the dispersion around the mean. Skewness and kurtosis are statistical measures that describe the shape of the distribution: skewness measures the asymmetry of the dataset, while kurtosis tells us how the tails of the distribution differ from the tails of the normal distribution. As each of the 17 time series is represented by 7 statistical parameters, we have 17 × 7 = 119 entries in the resultant feature matrix for each sample.
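The windowing and summary steps above can be sketched as follows. The 12-minute cadence and the seven statistics come from the text; everything else is an assumption made for illustration: the function names, the use of the population standard deviation, the non-excess definition of kurtosis, and quartiles computed by linear interpolation (which can differ slightly from the "middle value" definition given above).

```python
import math

CADENCE_MIN = 12  # one sample every 12 minutes, as stated in the text


def span_window(series, flare_index, loopback_h, span_h):
    """Return the samples in the span that ends loopback_h hours before the flare."""
    per_hour = 60 // CADENCE_MIN
    end = flare_index - loopback_h * per_hour
    start = end - span_h * per_hour
    if start < 0:
        raise ValueError("time series too short for this loopback and span")
    return series[start:end]


def _quantile(sorted_vals, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize(window):
    """Return [mean, median, skewness, kurtosis, std, Q1, Q3] for one window."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std > 0:
        skew = sum((x - mean) ** 3 for x in window) / (n * std ** 3)
        kurt = sum((x - mean) ** 4 for x in window) / (n * std ** 4)
    else:
        skew = kurt = 0.0  # constant series: shape statistics are undefined
    s = sorted(window)
    return [mean, _quantile(s, 0.5), skew, kurt, std,
            _quantile(s, 0.25), _quantile(s, 0.75)]


def feature_vector(windows):
    """Concatenate the seven statistics of every parameter's window."""
    return [stat for w in windows for stat in summarize(w)]
```

With a loopback of 24 hours and a span of 12 hours, `span_window` returns 60 samples per parameter, and 17 such windows give a feature vector of length 119.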

Different Supervised Machine Learning Techniques
Hazra, Sardar & Chowdhury: Distinguishing between Flaring and Non-Flaring Active Regions
There are different supervised machine learning classifiers in the literature. Some of the well-known supervised machine learning classifiers are logistic regression, decision tree, KNN, Naive Bayes, support vector machine, multilayer perceptron, random forest, etc. In machine learning, every problem is unique. We do not know in advance which algorithm to use or whether the problem can be modeled effectively. A baseline model is the simplest possible prediction model. The result of this baseline model tells us whether the use of more advanced algorithms adds any value. There is no need for a complex, advanced machine learning algorithm for a particular problem if a simple baseline algorithm can do the same. In this study, we use the logistic regression classifier as a baseline model and compare the baseline result with the results obtained from more complex machine learning algorithms, namely the support vector machine and the multilayer perceptron.

Baseline Model: Logistic Regression
Logistic Regression (LR) is one of the simplest and most commonly used machine learning algorithms for binary classification problems. It is easy to implement, easily interpretable, efficient and does not require high computation power; thus it can be used as a baseline model for binary classification. This model estimates the probability of an event occurrence by fitting the data to a logistic function. The equation used for logistic regression is:
$$\log\left(\frac{p}{1-p}\right) = \alpha_0 + \alpha_1 x_1 + \dots + \alpha_n x_n$$
where p is the probability of the event occurrence and $x_1, x_2, \dots, x_n$ are the independent variables. The ratio $p/(1-p)$ is known as the odds. If the logarithm of the odds is positive, then the probability of event occurrence is more than 50%. One of the major drawbacks of the algorithm is the assumption of linearity between the log-odds of the dependent variable and the independent variables. This algorithm separates the classes by constructing a linear decision boundary between them, i.e., it is a linear classifier. It does not perform well if the classes are not linearly separable. We use the lbfgs solver for our logistic regression classifier.
Support Vector Machine: Support Vector Machine (SVM) is a classification algorithm that separates the data of two classes by finding a line (in 2-D) or a hyperplane (in higher dimensions) between the two classes. In the SVM algorithm, the points nearest to the line or the hyperplane are called support vectors, and the distance between the support vectors and the line or the hyperplane is called the margin. This algorithm tries to find the hyperplane or line by maximizing the margin. SVM is highly suitable for linear classification problems. However, SVM can also solve nonlinear classification problems by mapping the data from a lower-dimensional space to a higher-dimensional space such that we can find the separating hyperplane in the higher dimension. This transformation is known as the kernel trick.
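Returning briefly to the logistic regression relation above, the following minimal sketch converts a log-odds value into a probability and shows that a positive log-odds corresponds to a probability above 50%. The coefficient and feature values are hypothetical and are not fitted values from this study.

```python
import math


def log_odds(alpha0, alphas, x):
    """Linear predictor: alpha_0 + alpha_1 x_1 + ... + alpha_n x_n."""
    return alpha0 + sum(a * xi for a, xi in zip(alphas, x))


def probability(z):
    """Invert log(p / (1 - p)) = z to recover p with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and features, for illustration only.
z = log_odds(0.5, [1.0, -2.0], [1.0, 0.5])  # 0.5 + 1.0 - 1.0 = 0.5
p = probability(z)                          # slightly above 0.5
```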
Let us assume we have N training points, where each input $\mathbf{x}_i$ has D attributes and belongs to one of the two classes $y_i = +1$ or $-1$. In most cases, the classes are not fully linearly separable. In this situation, one generally uses the soft-margin SVM algorithm, which introduces a slack variable and a trade-off between minimizing the misclassification rate and maximizing the margin. One can describe the hyperplane by the equation $\mathbf{w} \cdot \mathbf{x} + b = 0$, where $\mathbf{w}$ is the normal to the hyperplane and $b/||\mathbf{w}||$ is the normal distance from the origin to the hyperplane. In the SVM scenario, $1/||\mathbf{w}||$ is the margin. In the soft-margin SVM algorithm, we have to select the variables $b$ and $\mathbf{w}$ such that the training data satisfy:
$$\mathbf{w} \cdot \mathbf{x}_i + b \geq +1 - \psi_i \quad \text{for } y_i = +1,$$
$$\mathbf{w} \cdot \mathbf{x}_i + b \leq -1 + \psi_i \quad \text{for } y_i = -1,$$
where $\psi_i \geq 0$ is the slack variable. We can combine these two equations into a single equation:
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \psi_i \geq 0.$$
In the soft-margin SVM algorithm, one has to maximize the margin and also reduce the misclassification rate. This can be done by minimizing an objective function subject to the previous condition. In a more general way, this problem can be defined as:
$$\min_{\mathbf{w}, b, \psi} \; \frac{1}{2}||\mathbf{w}||^2 + C \sum_{i=1}^{N} \psi_i \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b) \geq 1 - \psi_i, \quad \psi_i \geq 0.$$
The parameter C controls the trade-off between the size of the margin and the slack variables. This parameter is also known as the regularization parameter. $\Phi$ is the function that maps the input data into a higher-dimensional space; the corresponding kernel function is the inner product $K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}_j)$. It is sometimes tricky to find an appropriate kernel for a particular problem. We do not know in advance whether our problem is linearly separable or not, so we use the kernel trick. It has been previously shown that the radial basis function (RBF) kernel projects vectors into an infinite-dimensional space. Motivated by this fact, we use a radial basis function kernel for our study.
The choice of the RBF kernel allows our support vector machine algorithm to separate the classes by constructing a non-linear decision boundary.
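As an illustration of the quantities that enter the soft-margin formulation, the sketch below evaluates the RBF kernel in its common form $K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma ||\mathbf{x} - \mathbf{z}||^2)$, the slack of a point with respect to a linear hyperplane, and the resulting objective value. The kernel width `gamma`, the function names and the toy data are assumptions, and the code only evaluates these quantities; it does not train an SVM.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def slack(w, b, x, y):
    """Smallest psi >= 0 that satisfies y * (w . x + b) >= 1 - psi."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * sum of slacks, for data given as (x, y) pairs."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(slack(w, b, x, y) for x, y in data)


# Toy example: one point outside the margin, one point inside it.
toy_data = [([2.0, 0.0], +1), ([0.5, 0.0], +1)]
objective = soft_margin_objective([1.0, 0.0], 0.0, toy_data, C=10.0)
```

A larger C penalizes the slack term more strongly, which is the trade-off described in the text.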
Multilayer Perceptron: The multilayer perceptron (MLP) uses the concept of a neural network to predict the output based on some input features. A perceptron is a linear classifier which separates the input into two classes by a straight line. The output of a perceptron depends on the input, i.e., the feature vector $\mathbf{x}$, which is multiplied by the weight $\mathbf{w}$ and added to a bias $b$ (simply, output $= \mathbf{w} \cdot \mathbf{x} + b$). The final prediction is obtained after passing the output of the perceptron through a non-linear activation function.
The multilayer perceptron is a feed-forward neural network. It consists of an input layer where the feature vector is fed, an output layer which makes the prediction about the input, and an arbitrary number of hidden layers between the input and output layers. Each neuron in the multilayer perceptron is connected with all neurons of the previous layer. A neuron is a processing unit in which the inputs are summed using weights and the result is passed through an activation function. In summary, the output of each basic processing unit (neuron) is:
$$\text{output} = \Phi(\mathbf{w} \cdot \mathbf{x} + b)$$
where $\mathbf{x}$ denotes the vector of inputs, $\mathbf{w}$ is the vector of weights, $b$ represents the bias and $\Phi$ is the activation function. We use the ReLU activation function for each neuron in the hidden layers and a sigmoid activation function in the output layer.
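The neuron output described above, with ReLU in the hidden layers and a sigmoid at the output, can be sketched as a tiny forward pass. The layer sizes, weights and function names are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, activation):
    """Output of one neuron: activation(w . x + b)."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


def forward(x, hidden_layers, output_neuron):
    """Propagate x through ReLU hidden layers and one sigmoid output neuron.

    hidden_layers: list of layers, each a list of (weights, bias) pairs
    output_neuron: a single (weights, bias) pair
    """
    activations = x
    for layer in hidden_layers:
        activations = [neuron(activations, w, b, relu) for w, b in layer]
    w_out, b_out = output_neuron
    return neuron(activations, w_out, b_out, sigmoid)


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
hidden = [[([1.0, -1.0], 0.0), ([-1.0, 1.0], 0.0)]]
output = ([1.0, 1.0], -1.0)
prob = forward([2.0, 1.0], hidden, output)  # hidden = [1, 0], sigmoid(0)
```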
Training the MLP algorithm involves adjusting the biases and weights to minimize the error in the output. This is achieved by using forward and backward propagation. The MLP is a feed-forward network whose training involves repeated forward and backward propagation until we achieve the desired result.
Forward Propagation: In this method, we move the signal from the input layer to the output layer via hidden layers. We measure the output or decision of the output layer with respect to the ground truth label. This measurement is also known as error.
Backward Propagation: In this process, we backpropagate the partial derivatives of the error with respect to the weights and biases from the output layer. A stochastic gradient descent algorithm is used to adjust the weights and biases in this process. The multilayer perceptron model has some advantages. A very complex model can be trained with the MLP and feature engineering is not required before training. However, it is difficult to explain the MLP model simply, and its parameterization is also complex. This model also needs more training data.
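One forward and backward step can be illustrated on the simplest possible case, a single sigmoid neuron. This is a minimal sketch under stated assumptions: the binary cross-entropy loss, the learning rate and the function names are not specified in the text and are chosen here for illustration. For this loss the derivative with respect to the pre-activation is simply (p − y).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Forward propagation for a single sigmoid neuron."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(p, y):
    """Binary cross-entropy of prediction p against label y (assumed loss)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sgd_step(w, b, x, y, lr):
    """One stochastic gradient descent update on a single sample."""
    err = predict(w, b, x) - y                     # dL/dz for sigmoid + BCE
    new_w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    new_b = b - lr * err
    return new_w, new_b


w0, b0 = [0.0, 0.0], 0.0
x, y = [1.0, 2.0], 1
w1, b1 = sgd_step(w0, b0, x, y, lr=0.1)
loss_before = bce_loss(predict(w0, b0, x), y)
loss_after = bce_loss(predict(w1, b1, x), y)
```

In a full MLP the same error signal is propagated layer by layer through the chain rule; the single-neuron case only shows the form of one update.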

Performance Measure and Class Imbalance Problem
Our classification algorithm yields a confusion matrix which consists of four entries, namely TP, TN, FP, and FN. Here, TP (True Positive) are the cases where positively labeled samples are correctly predicted as positive, TN (True Negative) are the cases where negatively labeled samples are correctly predicted as negative, FP (False Positive) are the cases where negatively labeled samples are wrongly predicted as positive, and FN (False Negative) are the cases where positively labeled samples are wrongly predicted as negative. Accuracy in a classification problem is defined as the number of correct predictions made by the model over the total number of predictions made:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy is a good performance measure when the data set is balanced. Some other performance measures are Precision ($TP/(TP + FP)$), Recall ($TP/(TP + FN)$) and the F-score (harmonic mean of precision and recall).
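The four confusion-matrix entries and the basic scores built from them can be computed as in the following sketch, where label 1 denotes a flaring region and 0 a non-flaring one. The function names and the toy labels are illustrative assumptions.

```python
def confusion_matrix(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def basic_scores(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F-score); empty ratios give 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return accuracy, precision, recall, f_score


# Toy labels: 2 flaring and 4 non-flaring regions.
counts = confusion_matrix([1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0])
scores = basic_scores(*counts)
```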
As solar active regions do not flare or erupt most of the time, non-flaring active regions are far more numerous than flaring active regions. There is a huge imbalance between the number of flaring and non-flaring active regions. This problem is known as the class imbalance problem. In this case, accuracy will be very high even if the model predicts almost all active regions as non-flaring (as the number of non-flaring regions is very high). However, we aim to predict the flaring active regions, which are rare. Thus, accuracy is not a good performance measure for the class imbalance problem. Later, some other performance measures, namely the Heidke Skill Scores ($HSS_1$ and $HSS_2$) and the Gilbert Score (GS), have been used:
$$HSS_1 = \frac{TP + TN - N}{P}$$
$$HSS_2 = \frac{2\,[(TP \times TN) - (FN \times FP)]}{P\,(FN + TN) + (TP + FP)\,N}$$
$$GS = \frac{TP - CH}{TP + FP + FN - CH}, \quad CH = \frac{(TP + FP)(TP + FN)}{P + N}$$
where $P = TP + FN$ and $N = FP + TN$ are the total numbers of actual positive and negative samples. $HSS_1$ measures the improvement of the prediction over an all-negative prediction, while $HSS_2$ measures the improvement over a random forecast. The Gilbert Score (GS) takes into account the number of true positives (TP) obtained by chance.
However, all these measures ($HSS_1$, $HSS_2$ and GS) still have some dependency on the class imbalance ratio. To alleviate this problem, Bloomfield et al. (2012) introduced a new performance measure, namely the True Skill Statistic (TSS), which is independent of the class imbalance ratio. The true skill statistic is defined as:
$$TSS = \frac{TP}{TP + FN} - \frac{FP}{FP + TN}$$
The value of TSS varies from −1 to +1, where a perfect prediction scores +1, an always wrong prediction scores −1 and a random prediction scores zero. Flare prediction is a highly imbalanced problem. TSS is the most meaningful measure for flare prediction, as it does not depend on the class imbalance ratio.
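The skill scores discussed in this section can be evaluated directly from the confusion-matrix entries. The sketch below assumes the standard definitions, with P = TP + FN and N = FP + TN (HSS1 = (TP + TN − N)/P, HSS2 and GS as commonly defined in the flare forecasting literature, TSS = TP/(TP + FN) − FP/(FP + TN)); the function name and the example counts are hypothetical. The second example shows why accuracy is misleading for imbalanced data: a classifier that always predicts "non-flaring" reaches 99% accuracy but a TSS of zero.

```python
def skill_scores(tp, tn, fp, fn):
    """Return a dict with HSS1, HSS2, GS and TSS from confusion-matrix counts."""
    p = tp + fn  # actual positives
    n = fp + tn  # actual negatives
    hss1 = (tp + tn - n) / p
    hss2 = 2 * (tp * tn - fn * fp) / (p * (fn + tn) + (tp + fp) * n)
    chance_hits = (tp + fp) * (tp + fn) / (p + n)  # TP expected by chance
    gs = (tp - chance_hits) / (tp + fp + fn - chance_hits)
    tss = tp / (tp + fn) - fp / (fp + tn)
    return {"HSS1": hss1, "HSS2": hss2, "GS": gs, "TSS": tss}


# Small hypothetical confusion matrix.
toy = skill_scores(tp=1, tn=3, fp=1, fn=1)

# All-negative forecast on a hypothetical 10 : 990 imbalanced sample.
all_negative = skill_scores(tp=0, tn=990, fp=0, fn=10)
all_negative_accuracy = (0 + 990) / 1000
```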

Results
We use active region vector magnetogram data from June 2010 to December 2017 for this study. After generating the data set, we preprocess it to handle missing values, which we simply replace with the mean of the corresponding feature. It is also necessary to normalize the data set before training so that all feature values fall within a uniform range. In this study, we use the zero-one data transformation technique for normalization. We randomly divide our data set into a training set (70%) and a testing set (30%). We maintain the same class ratio (N/P) in both the training and testing sets, following the prescription of Bobra & Couvidat (2015) and Ahmed et al. (2013). Please note that we do not include C class flares in our positive data set. Our data set is highly imbalanced, as flaring regions are far fewer than non-flaring ones. Oversampling and undersampling are the two well-known strategies to balance an imbalanced data set. Oversampling adds more positive examples to the data set, whereas undersampling removes examples of the majority negative class. However, both methods have limitations. In oversampling, adding many replicas of positive examples may cause the model to memorize the pattern, making it prone to overfitting. In undersampling, on the other hand, many negative examples are removed, so the classifier does not learn from the entire data set.
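The preprocessing steps described above (mean imputation, zero-one normalization, and a 70/30 split that preserves the class ratio) can be sketched with Scikit-Learn as follows. This is a minimal sketch on synthetic data, not the code used in this study; the function name, the random seed and the synthetic feature matrix are assumptions made only for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def preprocess_and_split(X, y, test_size=0.3, seed=0):
    """Impute missing values with feature means, scale to [0, 1], split 70/30."""
    X = SimpleImputer(strategy="mean").fit_transform(X)  # replace NaN by column mean
    X = MinMaxScaler().fit_transform(X)                  # zero-one transformation
    # stratify=y keeps the same class ratio (N/P) in both subsets
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


# Synthetic stand-in for the feature matrix: 1000 samples, 5 features, ~10% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X[rng.random(X.shape) < 0.02] = np.nan          # sprinkle some missing values
y = (rng.random(1000) < 0.1).astype(int)

X_train, X_test, y_train, y_test = preprocess_and_split(X, y)
print(X_train.shape, X_test.shape, y_train.mean(), y_test.mean())
```

The sketch follows the order given in the text (preprocess, then split). An alternative design is to fit the imputer and scaler on the training set only and apply them to the testing set, which avoids passing any information from the testing set into the training features.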
In this study, we use weighting techniques to tackle the issue of class imbalance, in which more weight is given to the class we aim to predict. Ahmadzadeh et al. (2019) found that weighted techniques handle the class imbalance issue better than oversampling and undersampling techniques in flare prediction studies. The weighted technique is also free from the limitations of undersampling and oversampling. In our study, we aim to predict flaring active regions, which form the minority class. We give more weight to the flaring class than to the non-flaring one so that our classifier does not focus exclusively on the majority non-flaring class. In our SVM classifier, we set the cost parameter of the flaring class higher than that of the non-flaring one. We use a similar technique for our baseline classifier. For our other classifier, the multilayer perceptron, we use the weight balancing technique to counter the class imbalance. In this technique, we alter the weight of each training sample in the computation of the loss function. Generally, both positive and negative classes carry the same weight of 1.0 in the loss function. As our primary aim is to predict the minority (flaring) class, we give more weight to the flaring class than to the non-flaring one in the calculation of the loss function. Table 1 lists the performance metrics found after running the baseline weighted logistic regression classifier on the five different data sets. The means and standard deviations in the table are obtained by repeating the training and testing phases a few times. We use a 24-hour forecasting window for the data sets loop24span12, loop24span24, and loop24span0. 
For training, we use the 12-hour and 24-hour time evolution of magnetic parameters for the data sets loop24span12 and loop24span24, respectively (spans of 12 and 24 hours), while we use no time evolution of magnetic parameters for the data set loop24span0. The loop24span0 data set is similar to that of Bobra & Couvidat (2015) but covers a longer time duration (June 2010 to December 2017). For the data sets loop12span12 and loop48span12, we use the 12-hour time evolution of magnetic parameters for training, but with different forecasting windows: 12 hours for loop12span12 and 48 hours for loop48span12. We find a very good TSS value in all five cases (see Table 1). We get the maximum TSS value (0.92) for the data set loop12span12. Interestingly, we also find a very good TSS value for the data set loop24span0, where we do not use any time evolution for training. This suggests that the time evolution does not have much impact on the performance of the classifier. We also find a good TSS value for the data set loop48span12. Our baseline model result indicates that it is possible to distinguish a flaring active region from a non-flaring one.
Table 1. Flare prediction capabilities obtained from our baseline model, logistic regression, for five different data sets. The first three data sets, loop24span12, loop24span24, and loop24span0, have the same forecasting window of 24 hours but different span times of 12 hours, 24 hours and zero hours, respectively. The last two data sets, loop12span12 and loop48span12, have the same span time of 12 hours but different forecasting windows of 12 hours and 48 hours, respectively.
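A weighted logistic regression baseline evaluated over repeated training and testing phases can be sketched as below. This is only an illustration under stated assumptions: the data are synthetic (generated with `make_classification`), the class weights use Scikit-Learn's `class_weight="balanced"` option rather than the specific weights used in this study, and the number of repetitions (five) is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedShuffleSplit


def tss_score(y_true, y_pred):
    """True skill statistic: TP/(TP + FN) - FP/(FP + TN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) - fp / (fp + tn)


# Synthetic, imbalanced stand-in data: about 5% positive (flaring) samples.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)

# Repeat the stratified 70/30 split a few times to get a mean and a spread.
splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.3, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X, y):
    # "balanced" weights each class inversely to its frequency (an assumption here)
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])
    scores.append(tss_score(y[test_idx], clf.predict(X[test_idx])))

mean_tss, std_tss = float(np.mean(scores)), float(np.std(scores))
print(f"TSS = {mean_tss:.3f} +/- {std_tss:.3f}")
```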
Next, we use two more advanced classifiers, namely the support vector machine (SVM) and the multilayer perceptron (MLP). We want to know whether these advanced classifiers can improve the performance significantly compared to our baseline model. Unlike the baseline, these two classifiers can also separate classes that are not linearly separable. We tune the hyperparameters of the SVM by using the grid search algorithm provided by the Scikit-Learn software package (Pedregosa et al. 2012). This algorithm finds the best combination of hyperparameters by an exhaustive search over a predefined set of values. We vary the SVM regularization parameter (C) between 0.001 and 10 and the RBF kernel parameter γ between 0.001 and 1. For the MLP, we use binary cross-entropy as the loss function and rmsprop as the optimizer, together with the Keras l2 regularizer as a kernel regularizer. Table 2 lists the eleven performance metrics found after running the two classifiers, SVM and MLP, on the five different data sets. We find that both the SVM and the MLP perform well in distinguishing the flaring active regions from the non-flaring ones. We also note that the TSS values obtained from the SVM and MLP classifiers are higher than those obtained from our baseline logistic regression classifier. However, the improvement is small: the performance of the advanced classifiers is very close to that of the baseline model. The baseline logistic regression classifier (a linear classifier) already works very well in distinguishing flaring and non-flaring regions.
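The grid search over the SVM hyperparameters can be sketched with Scikit-Learn's `GridSearchCV` as follows. The ranges of C and γ are those quoted above, but the individual grid points, the use of TSS as the selection score, the three-fold cross-validation and the synthetic data are assumptions made for this illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC


def tss_score(y_true, y_pred):
    """True skill statistic: TP/(TP + FN) - FP/(FP + TN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) - fp / (fp + tn)


# Small synthetic, imbalanced data set so the exhaustive search runs quickly.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# Hypothetical grid points spanning the quoted ranges of C and gamma.
param_grid = {"C": [0.001, 0.01, 0.1, 1, 10],
              "gamma": [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(
    SVC(kernel="rbf", class_weight="balanced"),  # higher cost for the minority class
    param_grid,
    scoring=make_scorer(tss_score),              # select hyperparameters by TSS
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```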
Table 2. Flare prediction capabilities obtained from our complex classifier models, namely support vector machine and multilayer perceptron, for five different data sets. The first three data sets, loop24span12, loop24span24, and loop24span0, have the same forecasting window of 24 hours but different span times of 12 hours, 24 hours and zero hours, respectively. The last two data sets, loop12span12 and loop48span12, have the same span time of 12 hours but different forecasting windows of 12 hours and 48 hours, respectively. We consider all magnetic parameters to create the data set.
One may ask which magnetic parameter is most critical for distinguishing flaring and non-flaring active regions. We follow the suggestions of Hamdi et al. (2017) to find the best active region parameter. In this study, we use the time evolution of magnetic parameters to train the machine learning algorithms. The magnetic parameter whose time evolution gives the maximum true skill statistic (TSS) after classification by the logistic regression (baseline) and SVM algorithms is considered the best among all active region magnetic parameters in terms of distinguishing capability. We use the data set loop24span12 for this experiment. We train both the SVM and LR algorithms using the time evolution of a single magnetic parameter at a time and measure the TSS value in each case. The bar plot of Figure 5 shows that the total unsigned magnetic helicity and the total unsigned magnetic flux achieve the maximum TSS. These results indicate that the time evolution of these two parameters is the best indicator in terms of distinguishing capability. Bobra & Couvidat (2015) also found that the total unsigned magnetic helicity parameter is the best active region parameter based on the Fisher criterion.
However, we find that the time evolution of the total unsigned magnetic flux has an equally good distinguishing capability. We also note that the time evolution of some other parameters, namely total photospheric magnetic free energy density, total unsigned vertical current, and AREA_ACR, also has a good distinguishing capability (see Fig. 5). The time evolution of the total unsigned magnetic flux is an indicator of flux cancellation and emergence on the solar surface. On the other hand, the time evolution of magnetic helicity indicates the magnetic complexity of the active region. Our finding in terms of the critical active region parameters is consistent with earlier theoretical and observational findings. Previous studies suggested that there will be no overfitting if we use twelve to eighteen magnetic parameters (Bobra & Ilonidis 2016; Inceoglu et al. 2018). Because of this, we did not apply any feature selection criterion before the application of the machine learning algorithms. We now repeat the experiment with only the top five magnetic parameters. These are the total unsigned magnetic flux (USFLUX), total unsigned current helicity (TOTUSJH), total unsigned vertical current (TOTUSJZ), total photospheric magnetic free energy density (TOTPOT), and the area of strong field pixels in the active region (AREA_ACR). We train our baseline logistic regression algorithm using the time evolution of these five magnetic parameters. The bottom part of Table 1 lists the performance metrics found after running the classifier over the data sets that consider only these five magnetic parameters. We find that our baseline classifier performs quite well in terms of distinguishing capability even if we consider only the top five magnetic parameters. We also note that the TSS values obtained by using only the top five magnetic parameters are very close to the values obtained by considering all magnetic parameters.
Fig. 5. Top bar plot shows the distribution of TSS values after running the logistic regression (baseline) classifier on the summarized time series of individual active region parameters. Bottom bar plot shows the same for the support vector machine algorithm. These plots clearly show that the top few parameters, specifically the total unsigned magnetic flux and the total unsigned magnetic helicity, have the best distinguishing capability.
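The single-parameter experiment (train on the summarized time series of one magnetic parameter at a time and rank the parameters by TSS) can be sketched as below. The parameter names `param_a`, `param_b` and `param_c`, the grouping of feature columns and the synthetic data are all hypothetical; in this toy data set only `param_a` is constructed to carry information about the label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split


def tss_score(y_true, y_pred):
    """True skill statistic: TP/(TP + FN) - FP/(FP + TN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) - fp / (fp + tn)


def rank_parameters(X, y, groups, seed=0):
    """Train on one parameter's feature columns at a time; sort by test-set TSS.

    groups maps a parameter name to the column indices holding its features.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    scores = {}
    for name, cols in groups.items():
        clf = LogisticRegression(class_weight="balanced", max_iter=1000)
        clf.fit(X_tr[:, cols], y_tr)
        scores[name] = tss_score(y_te, clf.predict(X_te[:, cols]))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic data: three hypothetical parameters, two summary features each.
rng = np.random.default_rng(0)
y = (rng.random(1500) < 0.1).astype(int)
X = rng.normal(size=(1500, 6))
X[:, 0] += 3.0 * y   # only the columns of "param_a" depend on the label
X[:, 1] += 2.0 * y
groups = {"param_a": [0, 1], "param_b": [2, 3], "param_c": [4, 5]}

ranking = rank_parameters(X, y, groups)
top_two = [name for name, _ in ranking[:2]]  # subset for a reduced-feature run
print(ranking, top_two)
```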
How does the classifier performance change with the forecast window? To investigate this issue, we create data sets with different forecasting windows (loop) but the same span, namely loop12span12, loop24span12, and loop48span12. We use a 12-hour time evolution of magnetic parameters for training in all cases. Table 1 shows the classification metrics obtained after running the classifier over these three data sets. The TSS value decreases as the forecasting window increases. However, we get very good TSS values for all three data sets, so our selected forecasting windows have very little impact on the performance of the classifier.
How does the time evolution of magnetic parameters over different time windows (span) change the performance of the classifier? For this purpose, we consider data sets having a fixed forecasting window of 24 hours but different span windows. Figure 6 shows the change of TSS values after running the SVM and MLP classifiers over these data sets. We do not find any increasing or decreasing trend of TSS values with the span size, so the span size may not have much impact on the performance of the classifier. Table 3 shows the different performance metrics found after running some other classifiers, namely random forest, KNN and Naive Bayes, on the loop24span12 data set. We have also listed the results of our previous three classifiers for comparison. The hyperparameters for these other classifiers are the following: Random Forest (n_estimators=10, max_depth=None, criterion=gini, class_weight=balanced), Naive Bayes (priors=None), KNN (number of neighbors k=1, distance=Euclidean). A comparison between the results obtained from different classifiers confirms the robustness of our model in distinguishing flaring and non-flaring active regions. We also note that our baseline classifier, logistic regression (which is easily interpretable), has very good performance. The code for this study can be found in our github repository (http://gitub.com/soumitrahazra/Flaring_Region_Prediction).
Fig. 6. Variation of the true skill statistic (TSS) after running the support vector machine and multilayer perceptron over the data sets with different span but fixed 24 hour forecasting window, i.e. loopback.
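The comparison with the other classifiers can be sketched with the hyperparameters quoted above. The data here are synthetic, and the choice of the Gaussian variant of Naive Bayes (`GaussianNB`) is an assumption, since the text only specifies that the priors are left unset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def tss_score(y_true, y_pred):
    """True skill statistic: TP/(TP + FN) - FP/(FP + TN)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) - fp / (fp + tn)


# Synthetic, imbalanced stand-in for a data set such as loop24span12.
X, y = make_classification(n_samples=1500, n_features=10, n_informative=5,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "Random Forest": RandomForestClassifier(
        n_estimators=10, max_depth=None, criterion="gini",
        class_weight="balanced", random_state=0),
    "Naive Bayes": GaussianNB(priors=None),   # Gaussian variant assumed
    "KNN": KNeighborsClassifier(n_neighbors=1, metric="euclidean"),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    results[name] = tss_score(y_te, clf.predict(X_te))
print(results)
```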

Summary
We have performed a comparison study between eruptive and non-eruptive active regions in terms of the time evolution of photospheric magnetic parameters. We first performed this study manually. We selected two eruptive active regions and one non-eruptive active region to find the difference between the time evolution of magnetic complexity in eruptive and non-eruptive active regions. We find a significant difference between the eruptive and non-eruptive active regions in terms of both the strength and the time evolution of photospheric magnetic parameters. All of our selected magnetic parameters, namely total unsigned magnetic flux, total unsigned helicity, total photospheric magnetic free energy density, and total unsigned vertical current, have much higher values in the eruptive active regions than in the non-eruptive active region. We find the signature of flux emergence and cancellation in the time evolution of the total unsigned magnetic flux. The time evolution of the total unsigned current helicity shows a strong indication of both helicity accumulation and annihilation at the onset of a solar flare.
As it is not possible to analyze all flaring events manually due to the large amount of solar data, we have used machine learning algorithms to distinguish eruptive active regions from non-eruptive ones. We use the time evolution of photospheric magnetic parameters to train our baseline machine learning algorithm, logistic regression. Motivated by Bloomfield et al. (2012) and Bobra & Couvidat (2015), we have selected the true skill statistic (TSS) as the performance measure of our forecasting algorithm, as TSS does not depend on the class imbalance. Solar flare prediction is a highly imbalanced problem, as there are far fewer flaring active regions than non-flaring ones. We obtain high TSS values for our baseline algorithm. A higher value of TSS also implies a lower false-negative rate, which is very important for a forecasting study. We have then compared our baseline result with the results obtained from more advanced classifiers, namely support vector machine and multilayer perceptron. We find that the use of advanced machine learning algorithms improves performance, but the improvement is not significant.
We note that our TSS value is higher than that of previous studies. This may be due to our data set (which differs from those of previous studies) or to our choice of training the algorithm with the evolution of photospheric magnetic parameters. Martens & Angryk (2018) describe the importance of a "benchmark data set" for this kind of prediction study. Preparation of a "benchmark data set" involves many processes, such as gathering a large amount of data, cleaning the data and balancing them. We have also performed one study where we did not use the time evolution of photospheric magnetic parameters to train the machine learning algorithm, and we found a very good TSS value in that case. We are therefore not sure whether the time evolution of magnetic parameters is really important for flare forecasting. However, we note that we describe the time evolution by eight statistical parameters, which may not be a very good way to describe the time evolution (although it is the simplest one). We have not considered any statistical quantities to represent seasonal or periodic features in this study. Features generated from the fast Fourier transform (FFT) of the time series would help to capture information about its periodic features. Different characteristics of the time series carry important pieces of information. There is also no established work at this moment that clearly determines the effective statistical features for improved flare forecasting. We leave this investigation for future studies.
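To illustrate the two ways of describing a time series mentioned above, the sketch below summarizes a series by eight statistics and, separately, extracts its strongest periodic components with an FFT. The particular eight statistics, the function names, the `cadence_hours` parameter and the synthetic 12-hour series are assumptions for illustration; they are not necessarily the statistics used in this study.

```python
import numpy as np


def summary_features(series):
    """Eight illustrative summary statistics of a parameter's time series."""
    s = np.asarray(series, dtype=float)
    mean, std = s.mean(), s.std()
    z = (s - mean) / std if std > 0 else np.zeros_like(s)
    slope = np.polyfit(np.arange(len(s)), s, 1)[0]  # linear trend per sample
    return np.array([mean, std, s.min(), s.max(), np.median(s),
                     (z ** 3).mean(),   # skewness
                     (z ** 4).mean(),   # kurtosis
                     slope])


def fft_features(series, cadence_hours, k=3):
    """Frequencies (cycles/hour) and amplitudes of the k strongest FFT components."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()                                 # drop the constant offset
    amp = np.abs(np.fft.rfft(s)) / len(s)
    freq = np.fft.rfftfreq(len(s), d=cadence_hours)
    top = np.argsort(amp[1:])[::-1][:k] + 1          # skip the zero-frequency bin
    return freq[top], amp[top]


# Synthetic 12-hour series sampled every 0.2 hours with a 6-hour oscillation.
t = np.arange(60) * 0.2
series = 5.0 + np.sin(2 * np.pi * t / 6.0)

stats = summary_features(series)
freqs, amps = fft_features(series, cadence_hours=0.2)
print(stats, freqs[0], amps[0])
```

For this synthetic series the strongest component sits at one cycle per six hours, a periodicity that the summary statistics alone do not expose.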
Next, we try to find a common critical magnetic parameter that clearly distinguishes the eruptive active regions from the non-eruptive ones. We find that the time evolution of the total unsigned magnetic helicity and the total unsigned magnetic flux has a very high distinguishing capability. We note that the time evolution of unsigned magnetic flux is an indicator of flux emergence and cancellation on the solar surface, while the time evolution of unsigned magnetic helicity represents helicity accumulation and cancellation. Previous theoretical models describe in detail the importance of both these mechanisms at the onset phase of the solar flare. Our manual analysis with three active regions also supports this view. We now compare this result with some earlier studies. Bobra & Couvidat (2015) found that the total unsigned magnetic helicity has the best distinguishing capability compared to others. However, they did not find any significant distinguishing capability of the total unsigned magnetic flux. Please note that they did not use the time evolution of magnetic parameters to train the machine learning algorithm. Ma et al. (2017) used univariate time series clustering and a multivariate time series decision tree for flare prediction and found that both the total unsigned magnetic flux and the total unsigned magnetic helicity have very high distinguishing capability compared to others.
We also find a very high TSS value when we consider only the evolution of the top five magnetic parameters for the training of the machine learning algorithms. In summary, although we are not able to find a single critical magnetic parameter, we find that a combination of the top few magnetic parameters gives almost the same distinguishing capability as the full set. This result is consistent with earlier studies, which also found that the top few magnetic parameters can produce a forecasting capability comparable to that of their entire data set (Leka & Barnes 2003, 2007; Ahmed et al. 2013; Bobra & Couvidat 2015; Hamdi et al. 2017). We note that "total" parameters are more valuable in flare studies than "mean" ones. Welsch et al. (2009) also found that extensive magnetic parameters (whose value increases with size) have a stronger correlation with flare productivity than intensive ones (whose value does not increase with size). This result indicates that larger and more complex active regions are more flare prone than smaller ones.