A&A 476, 951-957 (2007)
DOI: 10.1051/0004-6361:20078004
M. Carbonell1 - J. Terradas2 - R. Oliver2 - J. L. Ballester2
1 - Departament de Matemàtiques i Informàtica, Universitat de les Illes Balears,
07122 Palma de Mallorca, Spain
2 - Departament de Física, Universitat de les Illes Balears,
07122 Palma de Mallorca, Spain
Received 4 June 2007 / Accepted 26 July 2007
Abstract
Aims. Many studies of the North-South asymmetry of solar activity and its features have been performed. However, most of these studies do not consider whether the asymmetry of the time series under consideration is statistically significant. If the asymmetry is statistically insignificant, any study of its behavior is meaningless. Here, we discuss the difficulties found when trying to assess the statistical significance of the North-South asymmetry (hereafter SSNSA) of the most commonly considered time series of solar activity.
Methods. We distinguish between solar activity time series composed of integer or non-integer and dimensionless data, or composed of non-integer and dimensional data. For each of these cases, we discuss the most suitable statistical tests which can be applied and highlight the difficulties in obtaining valid information about the statistical significance of solar activity time series.
Results. Our results suggest that, apart from the need to apply suitable statistical tests, other effects such as data binning, the considered units and the need, in some tests, to consider groups of data, substantially affect the determination of the statistical significance of the asymmetry.
Conclusions. The assessment of the statistical significance of the N-S asymmetry of solar activity is difficult and an absolute answer cannot be given, since many different effects influence the results given by the statistical tests. The quantitative results about the statistical significance of the N-S asymmetry of solar activity provided by different authors, as well as studies of its behaviour, must be considered with care because they depend on the chosen values of different parameters or on the considered units.
Key words: Sun: activity - methods: data analysis - methods: statistical
There is an important solar activity feature, the photospheric magnetic flux, for which the behavior of the N-S asymmetry has also been studied. Howard (1974), using magnetic flux data from Mt. Wilson, analyzed the period between 1967 and 1973; Rabin et al. (1991), using magnetic flux data from Kitt Peak, studied the period between 1975 and 1987; Knaack et al. (2004) used Kitt Peak photospheric magnetic flux density data to study the interval between 1975 and 2003; and Song et al. (2005) used Kitt Peak data between 1978 and 2002. However, of all these authors, only Song et al. (2005) estimated the statistical significance of the North-South asymmetry (hereafter SSNSA) of the photospheric magnetic flux time series under consideration, by applying the binomial distribution, so most of the conclusions of the other studies about the N-S asymmetry of photospheric magnetic flux were obtained by visual inspection of the plot of the asymmetry versus time. Given the fundamental role of the magnetic flux in solar activity, a quantitative assessment of its N-S asymmetry should be of great interest.
Before studying the behavior of the N-S asymmetry, the most important point, sometimes forgotten, is to assess the SSNSA of the time series under consideration. The most straightforward way to determine the SSNSA is by means of the binomial distribution (Li et al. 1998; Li et al. 2003; Song et al. 2005). However, other statistical tests have also been used. For instance, Joshi (1995), Temmer et al. (2001) and Joshi & Joshi (2004) followed Leftus (1960) in using a χ²-test to assess the SSNSA of sunspot groups, Hα flares and active prominences; Temmer et al. (2002, 2006) used a paired Student's t-test to study the SSNSA of the hemispheric sunspot number; Ataç & Özgüç (1996) used a sign test, introduced by Gleissberg (1947), to determine the SSNSA of the flare index; and Vizoso & Ballester (1990) and Carbonell et al. (1993) used the Excess (Reid 1968; Wilson 1987) to obtain the SSNSA of sunspot areas. In all the cases mentioned above, the authors concluded that, to a great extent, the N-S asymmetry of the considered time series was statistically significant.
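The binomial estimate mentioned above can be sketched as follows. This is a minimal illustration and not the procedure of any of the cited works: it assumes integer hemispheric counts, a null hypothesis in which each event falls in either hemisphere with probability 1/2, and a one-sided tail probability. The function name `binomial_tail` and the example counts are hypothetical.

```python
from math import comb


def binomial_tail(n_north: int, n_south: int) -> float:
    """Probability of a split at least as uneven as the observed one,
    assuming each event is equally likely to occur in either hemisphere."""
    total = n_north + n_south
    k = max(n_north, n_south)
    # P(X >= k) for X ~ Binomial(total, 1/2), computed with exact integers
    favourable = sum(comb(total, i) for i in range(k, total + 1))
    return favourable / 2 ** total


# Hypothetical counts: 8 events in the north, 2 in the south
p_example = binomial_tail(8, 2)
print(f"tail probability = {p_example:.4f}")
```

A small tail probability means that a split as uneven as the observed one is unlikely to arise by chance; where to place the boundaries between significant and insignificant results remains a matter of convention.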
However, a blind application of the above statistical tests to any solar activity time series can lead to misleading results. The commonly considered solar activity time series differ in two respects: (a) they are composed of integer or non-integer data; (b) they are composed of dimensional or dimensionless data. The first characteristic, integer or non-integer data, is relevant for the statistical tests that can be applied. In the case of integer data records any statistical test can be applied, but in the case of non-integer data one needs to choose carefully which tests can be used. The second characteristic, dimensional or dimensionless data, raises two interesting problems when dimensional data are considered: (a) how does a change of units affect the statistical significance? (b) Is it possible to find a statistical test whose results are independent of the considered units?
Here, we search for quantitative conclusions about the SSNSA of the most often considered solar activity time series. To this end, we analyse the results obtained after applying different statistical tests to solar activity time series made of integer and dimensionless data, examine the effects of data binning on the SSNSA, and point out the problems encountered when trying to assess the SSNSA of non-integer and dimensional solar activity time series. We perform these analyses because many conclusions about the behaviour of the N-S asymmetry of solar activity have been drawn without a proper quantitative evaluation of its statistical significance, or without considering whether a definitive answer about the SSNSA can be obtained.
The different solar activity time series analyzed in our study are:
Usually, the application of a statistical test is based on the hypothesis of independence of experiments. However, when we consider the N-S asymmetry of solar activity, the data records of many of the considered time series are not independent. For instance, if we consider daily sunspot areas, magnetic flux, solar prominences, sunspot number, etc., the values of these quantities today are not independent of the values yesterday; in particular, daily sunspot areas are correlated with a typical correlation time of about 7 days (Oliver & Ballester 1995). Then, strictly speaking, solar activity does not satisfy the above hypothesis and, for this reason, the application of statistical tests to determine the SSNSA is not appropriate.
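The lack of independence can be illustrated with a synthetic series. The sketch below assumes a first-order autoregressive process whose correlation time is set to 7 days purely for illustration; it is not a model of real sunspot areas, and the names `ar1_series` and `autocorrelation` are hypothetical.

```python
import math
import random


def ar1_series(n, corr_time, seed=0):
    """Synthetic unit-variance AR(1) series with the given correlation time (in samples)."""
    rng = random.Random(seed)
    phi = math.exp(-1.0 / corr_time)      # lag-1 correlation of the process
    noise_scale = math.sqrt(1.0 - phi ** 2)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + noise_scale * rng.gauss(0.0, 1.0))
    return x


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given non-negative lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) * (v - mean) for v in x)
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return cov / var


series = ar1_series(5000, corr_time=7.0)
r1 = autocorrelation(series, 1)    # consecutive "days" are strongly correlated
r30 = autocorrelation(series, 30)  # records far apart are nearly uncorrelated
print(f"lag 1: {r1:.2f}, lag 30: {r30:.2f}")
```

Records separated by much less than the correlation time carry largely redundant information, so counting each of them as an independent experiment overstates the effective sample size.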
Another consideration of interest is related to the characteristics of the studied solar activity time series, which can be classified as composed of integer and dimensionless data, such as the International sunspot number, the number of X-ray flares and the number of solar active prominences, or composed of non-integer and dimensional data, such as the Hα flare index, sunspot areas, Mount Wilson total magnetic flux and Kitt Peak averaged magnetic flux density. This second consideration is of paramount importance since it constrains the statistical tests that can be applied to the above time series. The third consideration is that in all the studies about the N-S asymmetry we only consider the solar activity corresponding to the visible hemisphere.
Table 1: Surface expressed in dm². The labels of the columns, from left to right, correspond to the normal distribution approximation to the binomial distribution, to Pearson's chi-square test, to the Excess, and to the binomial distribution. The rows, from top to bottom, correspond to a highly statistically significant result, a statistically significant result, a marginally significant result, and a statistically insignificant result. The numbers shown in each column correspond to the number of events and its corresponding percentage with respect to the data records of the considered time series.
In the case of non-integer and dimensionless time series, only the tests Excess, Normal approximation to the Binomial distribution, and Pearson's chi-square test can be applied.
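Two of these tests can be sketched for a single pair of hemispheric values. The sketch assumes the textbook forms of both statistics under the hypothesis of equal probability for each hemisphere, without continuity correction; these conventions, the function names and the sample values are assumptions. The Excess is not included in the sketch.

```python
import math


def normal_approx_z(north: float, south: float) -> float:
    """z score of the normal approximation to Binomial(north + south, 1/2)."""
    total = north + south
    # mean = total / 2, standard deviation = sqrt(total) / 2
    return (north - south) / math.sqrt(total)


def pearson_chi2(north: float, south: float) -> float:
    """Pearson chi-square statistic against equal expected values (1 degree of freedom)."""
    expected = (north + south) / 2.0
    return (north - expected) ** 2 / expected + (south - expected) ** 2 / expected


def two_sided_p_from_z(z: float) -> float:
    """Two-sided tail probability of a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical non-integer, dimensionless values for one record
z = normal_approx_z(12.4, 7.9)
chi2 = pearson_chi2(12.4, 7.9)
print(f"z = {z:.3f}, chi2 = {chi2:.3f}, p = {two_sided_p_from_z(z):.3f}")
```

With only two categories the chi-square statistic equals the square of the z score, which is consistent with the close agreement between these two tests that one would expect when they are applied to the same data.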
When the data are dimensional, another problem appears, related to the considered units. In solar activity time series, the accuracy of the non-integer and dimensional data is determined by the measurement process, so the data are truncated after some decimal places. One way to obtain integer data is to modify the units of the considered time series and then apply the statistical tests described in Sect. 2.3. An interesting experiment is therefore to consider what happens to the statistical significance when the units are modified. A simple way to do this is to generate two synthetic time series, corresponding to the Northern and Southern hemispheres, made of non-integer and dimensional data. We have chosen two time series composed of data records representing surfaces expressed in square meters up to two decimal places. These non-integer data can be transformed to integer data by multiplying by 10² to obtain surfaces in dm², or by multiplying by 10⁴ to obtain surfaces in cm². To the resulting time series we have applied the tests of the previous section, and the results are shown in Tables 1 and 2, which show that, when going from dm² to cm², the SSNSA changes and increases. These results suggest that transforming non-integer dimensional data to integer data by changing the units modifies the statistical significance. This is a difficult problem because, when dealing with dimensional time series, the SSNSA will depend on the considered units, at least when the above tests are used. Our problem now is to find a statistical test that is applicable to non-integer and dimensional time series and whose results are independent of the considered units.
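The effect of the change of units can be reproduced with a single pair of values. The sketch below is only illustrative: the surfaces of 0.12 and 0.10 square meters are invented, the one-sided binomial tail probability is assumed as the test, and `binomial_tail` is a hypothetical helper.

```python
from math import comb


def binomial_tail(n_north: int, n_south: int) -> float:
    """One-sided tail probability of the observed split under equal probabilities."""
    total = n_north + n_south
    k = max(n_north, n_south)
    return sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total


# Hypothetical surfaces in square meters, given to two decimal places
north_m2, south_m2 = 0.12, 0.10

# The same surfaces written as integers in dm^2 (x 10^2) and in cm^2 (x 10^4)
n_dm2, s_dm2 = round(north_m2 * 10 ** 2), round(south_m2 * 10 ** 2)
n_cm2, s_cm2 = round(north_m2 * 10 ** 4), round(south_m2 * 10 ** 4)

p_dm2 = binomial_tail(n_dm2, s_dm2)
p_cm2 = binomial_tail(n_cm2, s_cm2)
print(f"dm^2: {n_dm2} vs {s_dm2}, p = {p_dm2:.3f}")
print(f"cm^2: {n_cm2} vs {s_cm2}, p = {p_cm2:.2e}")
```

The physical asymmetry is identical in both cases, but the test treats the integers as event counts, so the larger numbers obtained in the smaller unit yield a much smaller tail probability.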
Table 2: Surface expressed in cm². Columns, rows and numbers in the rows have the same meaning as in Table 1.
A suitable test satisfying the above conditions is the paired Student's t-test (Larson 1982). The characteristic statistic is expressed as
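A minimal sketch of the paired statistic in its usual textbook form, t = d̄ / (s_d / √n), where d_i = N_i - S_i are the paired differences, d̄ is their mean, s_d their sample standard deviation and n the number of pairs. This notation and the sample values are assumptions made for illustration. The sketch checks that rescaling both series by a common factor, that is, a change of units, leaves t unchanged.

```python
import math


def paired_t(north, south):
    """Paired Student's t statistic for two equally long sequences."""
    d = [a - b for a, b in zip(north, south)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance of the differences
    return mean / math.sqrt(var / n)


# Hypothetical surfaces in square meters
north_m2 = [1.25, 0.80, 2.10, 1.60, 0.95, 1.40]
south_m2 = [1.05, 0.90, 1.70, 1.20, 1.00, 1.10]

t_m2 = paired_t(north_m2, south_m2)
# Same data expressed in cm^2: every value is multiplied by 10^4
t_cm2 = paired_t([1e4 * v for v in north_m2], [1e4 * v for v in south_m2])
print(f"t (m^2) = {t_m2:.4f}, t (cm^2) = {t_cm2:.4f}")
```

Both the mean and the standard deviation of the differences scale with the unit factor, so the factor cancels in the ratio.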
Table 3: Daily international sunspot number. Columns, rows and numbers in the rows have the same meaning as in Table 1.
Table 4: Yearly number of active solar prominences. Columns, rows and numbers in the rows have the same meaning as in Table 1.
Table 5: Daily number of X-ray flares. Columns, rows and numbers in the rows have the same meaning as in Table 1.
We study the effect of binning data on the statistical significance. We have binned the time series corresponding to the international sunspot number and the number of X-ray flares in bins of 10, 20, 30, 40, 50 and 60 days. The same statistical tests as in the previous section have been applied to these new time series, and in Figs. 1 and 2 the SSNSA versus the number of days per bin is plotted. Binning data modifies the statistical significance, and the general trend is an increase of the SSNSA. If one starts from the north and south daily time series and computes the SSNSA, the results would be different from those obtained by binning the time series per Carrington rotation and computing the statistical significance again. In order to compute a meaningful statistical asymmetry, what is the appropriate data binning, if any, to be considered?
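The trend can be reproduced with a deliberately simple synthetic example. The sketch below assumes constant daily counts of 3 (north) and 2 (south), the one-sided binomial tail probability as the test, and a threshold of 0.05 chosen only for illustration; the data and the helper names are hypothetical.

```python
from math import comb


def binomial_tail(n_north: int, n_south: int) -> float:
    """One-sided tail probability of the observed split under equal probabilities."""
    total = n_north + n_south
    k = max(n_north, n_south)
    return sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total


def bin_sum(series, width):
    """Sum consecutive records in bins of `width`; an incomplete last bin is dropped."""
    usable = len(series) - len(series) % width
    return [sum(series[i:i + width]) for i in range(0, usable, width)]


def fraction_significant(north, south, width, threshold=0.05):
    """Fraction of bins whose tail probability falls below the assumed threshold."""
    pairs = zip(bin_sum(north, width), bin_sum(south, width))
    flags = [binomial_tail(a, b) < threshold for a, b in pairs]
    return sum(flags) / len(flags)


# Hypothetical daily counts over 120 days
north = [3] * 120
south = [2] * 120

fractions = {w: fraction_significant(north, south, w) for w in (1, 10, 30, 60)}
for width, frac in fractions.items():
    print(f"{width:2d} days per bin: {100 * frac:.0f}% of bins below threshold")
```

The relative asymmetry is the same at every bin width; only the totals per bin grow, and with them the apparent significance.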
Figure 1: SSNSA of the international sunspot number time series versus the number of days per bin for the binomial distribution (dash-dot line); Excess (dashed line); normal approximation to the binomial distribution (solid line); and Pearson's chi-square test (dotted line).
Figure 2: SSNSA of the number of X-ray flare time series versus the number of days per bin for the binomial distribution (dash-dot line); Excess (dashed line); normal approximation to the binomial distribution (solid line); and Pearson's chi-square test (dotted line).
The only considered solar activity time series whose data records are non-integer and dimensionless is that of hemispheric sunspot numbers. To this time series we have applied the tests given in Sect. 2.5. The results are shown in Table 6, which shows that there is a strong agreement between the statistical significances obtained from the different tests. Furthermore, the results indicate that the N-S asymmetry of hemispheric sunspot numbers is highly significant in about 60% of the considered days.
Table 6: Daily hemispheric sunspot numbers. Columns, rows and numbers in the rows have the same meaning as in Table 1.
Here, the Student's t-test described in Sect. 2.5 has been applied. We have chosen […] and n = 30, which means that for sunspot areas we have considered groups of 30 months; for the Hα flare index, groups of 30 days; and for the Mount Wilson total magnetic flux and the Kitt Peak averaged magnetic flux density, groups of 30 Carrington rotations.
Table 7 shows the results for the SSNSA. In the case of sunspot areas, only […] of the thirty-month groups are significant at the […] level; in the case of the Hα flare index, only […] of the thirty-day groups are significant at the […] level; in the case of the Mount Wilson total magnetic flux, only […] of the thirty-Carrington-rotation groups are significant at the […] level; and in the case of the Kitt Peak averaged magnetic flux density, only […] of the thirty-Carrington-rotation groups are significant at the […] level.
However, an important point to be considered when applying this test is the value chosen for n. We can highlight this point by considering different values for n and repeating the above calculations. In Fig. 3 the SSNSA versus n, the number of elements in each group, has been plotted, and it can be seen that the significance increases with n. Thus, the choice of n is important because it determines the significance of the asymmetry: changing n changes the result.
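The dependence on n can be seen in a small synthetic example. The sketch below assumes the textbook paired statistic and a pair of invented series whose differences alternate between 3 and -1, so that the mean difference is the same in every group; the data and the helper names are hypothetical.

```python
import math


def paired_t(north, south):
    """Paired Student's t statistic for two equally long sequences."""
    d = [a - b for a, b in zip(north, south)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)


def group_t_values(north, south, n):
    """t statistic of each consecutive group of n pairs; an incomplete last group is dropped."""
    usable = len(north) - len(north) % n
    return [paired_t(north[i:i + n], south[i:i + n]) for i in range(0, usable, n)]


# Hypothetical series of 64 records; the differences alternate between 3 and -1
north = [5.0, 1.0] * 32
south = [2.0, 2.0] * 32

t_by_n = {n: group_t_values(north, south, n)[0] for n in (4, 16, 64)}
for n, t in t_by_n.items():
    print(f"n = {n:2d}: t = {t:.3f}")
```

For a fixed mean difference and scatter, t grows roughly as the square root of n, so the same data can appear insignificant in small groups and significant in large ones.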
Table 7: Student's t-test. The numbers in the row define the percentage of groups of each time series that are statistically highly significant ([…]). The value of n in the Student's t-test formula is 30.
Figure 3: Student's t-test: SSNSA of sunspot area (solid line), Hα […]
[…] = 60, 60, 60, 60, 60, 60, 2, 3, 4, 5, 6, 7, 60, 60, 60, 60, 60, 60, 2, 3, 4, 5, 6, 7, 60, 60, 60, 60, 60, 60, 2, 3, 4, 5, 6, 7
[…] = 1, 2, 3, 4, 5, 6, 60, 60, 60, 60, 60, 60, 1, 2, 3, 4, 5, 6, 60, 60, 60, 60, 60, 60, 1, 2, 3, 4, 5, 6, 60, 60, 60, 60, 60, 60
The results shown in Tables 3-7 correspond to the SSNSA of solar activity time series spanning different time intervals that cover different phases of the solar activity cycle. Because the covered time intervals differ, these time series only overlap during a few solar cycles. In order to study the behaviour of the SSNSA with the phase of the solar cycle we have considered the Northern and Southern daily number of X-ray flares and the Northern and Southern daily hemispheric sunspot number time series. The first time series covers three solar cycles and the second covers six solar cycles, and they overlap during the period 1976-2004. We have split each time series as many times as possible using the following criterion: from the middle of the descending phase of one solar cycle to the middle of the ascending phase of the following solar cycle, covering the minimum of solar activity, and from the middle of the ascending phase to the middle of the descending phase of the same solar cycle, covering the maximum of solar activity. In this way, we obtained 5 and 11 shorter time series for the daily number of Northern and Southern X-ray flares and daily hemispheric sunspot numbers, respectively. Then, we applied to both time series the statistical tests of Sect. 2.5, except the binomial distribution test, because the daily hemispheric sunspot number is a non-integer time series. The results obtained applying the Excess test to the daily hemispheric sunspot number time series are shown in Fig. 4. This figure shows the differences between the SSNSA around the maximum and the minimum of solar activity and, although they are not large, a systematic difference appears: the SSNSA is always higher around the minimum of solar activity. It has been pointed out (Swinson et al. 1986; Vizoso & Ballester 1990) that the North-South asymmetry of solar activity reaches very high values around the minimum of solar activity; thus, if only a few sunspot groups appear on the Sun and all of them are in the same hemisphere, the value of the asymmetry and the SSNSA would be very high. However, around the maximum of solar activity, sunspot groups are more evenly distributed between hemispheres, giving rise to a lower asymmetry and SSNSA. This could be an explanation for the dependence of the SSNSA on the solar cycle phase.
A similar behaviour appears for the daily number of X-ray flares time series, although in this case the SSNSA in any considered phase of the solar cycle is very low, as can be expected from the low SSNSA obtained when this time series is considered as a whole (see Table 5). The results obtained with the remaining tests are similar to those shown in Fig. 4.
Figure 4: SSNSA of the daily hemispheric sunspot numbers versus solar cycle. Triangles denote the SSNSA around the maximum of solar activity. Squares denote the SSNSA around the minimum of solar activity.
We have discussed the difficulties encountered when trying to assess the SSNSA of the most common solar activity time series. We have found that in the case of integer or non-integer and dimensionless data sets several statistical tests, such as the binomial distribution, the normal approximation to the binomial distribution, the chi-square test and the Excess, can be used; however, the results strongly depend on the data binning applied. On the other hand, when non-integer and dimensional data are considered, a statistical test independent of the units can be used, the Student's t-test, but the results again depend on the value chosen for the binning. Our results also suggest that there is a systematic difference between the values of the SSNSA around the maximum and the minimum of solar activity, which points to a dependence of the SSNSA on the solar cycle phase.
Taking into account these results, how can we assess the SSNSA? It seems that a definitive answer cannot be given because in one case it strongly depends on the data binning, which is mostly based on our terrestrial calendar (days, months, Carrington rotations, years, solar cycles, etc.), to which solar activity is indifferent, and in the other case the answer depends on the number of elements in each considered group. In the case of integer or non-integer and dimensionless time series, the length of the bin should help to reveal physically meaningful results. Thus, for a dataset of daily values, taking n = 30 would be appropriate since this would correspond to about one solar rotation. Furthermore, if one wants to obtain some information about phases of solar activity such as the ascending or the descending branch of the solar cycle, or about the period around the maximum or the minimum of solar activity, then the length of the bin has to be chosen in agreement with the time intervals under study. These considerations could also be applied to the case of non-integer and dimensional time series because when using the Student's t-test we also need to choose the value of n, the number of elements in each of the groups into which we split the dataset. Thus, a visual inspection of time series is not appropriate to ascertain the N-S asymmetry of solar activity time series but, on the other hand, determining an absolute value of the SSNSA is difficult, a difficulty worsened by the fact that the records of solar activity are not independent.
All the results obtained up to now on the SSNSA of different solar activity time series by different authors must be considered with care, as must the studies of the behaviour of the N-S asymmetry of different solar activity features that assume there is a real and significant asymmetry between hemispheres.
Acknowledgements
We acknowledge the National Geophysical Data Center, from whose ftp server the Kandilli flare index and international sunspot number were downloaded. We also acknowledge the Solar Influences Data Center (SIDC) for the compilation of the International Sunspot Number. NSO/Kitt Peak data used here were produced cooperatively by NSF/NOAO, NASA/GSFC, and NOAA/SEL. This study includes magnetic flux data from the synoptic program at the 150-Foot Solar Tower of the Mt. Wilson Observatory, which were kindly provided by J. Boyden. The Mt. Wilson 150-Foot Solar Tower is operated by UCLA, with funding from NASA, ONR, and NSF, under agreement with the Mt. Wilson Institute. The sunspot area data were compiled by D. Hathaway of NASA's Marshall Space Flight Center. J. Terradas thanks the Spanish Ministry of Science and Education for the funding provided by a Juan de la Cierva fellowship.