On the size distribution of spots within sunspot groups

Size distribution of sunspots provides key information about the generation and emergence processes of the solar magnetic field. Previous studies on the size distribution have primarily focused on either the whole group or individual spot areas. In this paper, we investigate the organization of spot areas within sunspot groups. In particular, we analyze the ratio, $\rm{R}$, of the area of the biggest spot ($\rm{A_{big\_spot}}$) inside a group, to the total area of that group ($\rm{A_{group}}$). We use sunspot observations from Kislovodsk, Pulkovo and Debrecen observatories, together covering solar cycles 17 to 24. We find that at the time when the group area reaches its maximum, the single biggest spot in a group typically occupies about 60% of the group area. For half of all groups, $\rm R$ lies in the range between roughly 50% and 70%. We also find R to change with the group area, $\rm{A_{group}}$, such that $\rm{R}$ reaches a maximum of about 0.65 for groups with $\rm{A_{group}}\approx 200\mu$Hem and then remains at about 0.6 for lager groups. Our findings imply a scale invariant emergence pattern, providing an observational constraint on the emergence process. Furthermore, extrapolation of our results to larger sunspot groups may have a bearing on the giant unresolved starspot features found in Doppler images of highly active sun-like stars. Our results suggest that such giant features are composed of multiple spots, with the largest spot occupying roughly 55--75% of the total group area (i.e. of the area of the giant starspots seen in Doppler images).


Introduction
Magnetic field of the Sun, the driving force of its activity and variability, is generated in its interior by a dynamo action and emerges at the surface as bipolar regions. The larger of these regions typically host sunspots which are dark photospheric features that trace out the locations of highly concentrated magnetic field on the solar surface (Solanki 2003). Direct measurements of the solar magnetic fields are only available for the last few decades, and hence, sunspot area records, which have been observed more or less regularly over more than a century, act as a proxy of the surface magnetism and provide an indirect way of understanding the long-term behaviour of the magnetic activity and variability.
Sunspots usually emerge in groups, and within a typical group, spots are arranged into two sub-groups, leading (closer to the solar equator) and following. Spots in the leading subgroup are bigger and more coherent than following one (Bray & Loughhead 1979;McIntosh 1981). Although the reason behind such a configuration is not yet understood, it is believed to be related to the flux emergence process where various forces (e.g. Coriolis force, convective motions, sub-surface shears etc.) act on a rising flux tube and determine its surface morphology (Borrero & Ichimoto 2011;van Driel-Gesztelyi & Green 2015).
One of the aspects providing clues to understanding the processes of solar magnetic field generation and emergence, is the size distribution of spots. Studies of the size distribution of both the sunspot groups and the individual spots have been carried out in the past by various authors (Kuklin 1980;Bogdan et al. 1988;Baumann & Solanki 2005;Nagovitsyn et al. 2012;Tlatov & Pevtsov 2014;Muñoz-Jaramillo et al. 2015). Using the group area data from the Royal Greenwich Observatory (RGO), Bog-dan et al. (1988) and Baumann & Solanki (2005) found that the size distribution, over its most range, could be fairly well described by a log-normal function. Physically, a log-normal size distribution suggests that all spots are generated by a global dynamo action, with smaller spots being the fragmented products of the larger flux concentrations (Kolmogorov 1941). Later, Parnell et al. (2009) and Thornton & Parnell (2011) analysed various observations of the magnetic field emergence and concluded that they could all be described by a single power-law distribution, covering seven orders of magnitude in flux. However, using individual spot area measurements from the Kislovodsk Mountain Station and group areas from RGO, Solar Observing Optical Network (SOON), Pulkovo, and Kislovodsk, Nagovitsyn et al. (2012) and Muñoz-Jaramillo et al. (2015), respectively, showed that the overall size distribution was better characterized by two separate distribution functions for 'small' and 'big' groups or spots. They argued that such a bi-modal size distribution could possibly imply a more complex process of spot formation, e.g. driven by small-scale and global dynamos in parallel (Nagovitsyn & Pevtsov 2016). A similar conclusion was also reached by Tlatov & Pevtsov (2014) who analysed the magnetic flux and sunspot area records from Helioseismic and Magnetic Imager (HMI). However, results from recent high-resolution observations and numerical simulations suggest that small-scale dynamo is mainly responsible for the generation of the (non-spot) weak fields (Pietarila Graham et al. 2010;Stenflo 2012;Rempel 2014;Karak & Brandenburg 2016).
Thus, understanding the spot size distribution calls for further investigations. Furthermore, understanding the spot area distribution on the Sun can help us to understand the poorly known area distribution of starspots on other Sun-like stars (e.g., Article number, page 1 of 7 arXiv:2104.03534v1 [astro-ph.SR] 8 Apr 2021 Solanki & Unruh 2004). In this paper, we re-address the question of the size distribution of sunspots. However, instead of analysing the overall spot or group size distribution as done in previous studies, we here focus on the distribution of spot areas within individual groups. We describe the data and the approach used in this study in Sect. 2 and the results in Sect. 3, while Sect. 4 summarises our conclusions.

Data and Methods
We use sunspot area catalogues from three different sources, namely, Debrecen Observatory 1 (Baranyi et al. 2016;Győri et al. 2017), Kislovodsk Mountain station 2 (Nagovitsyn et al. 2007) and Pulkovo observatory 3 (Mikhailov 1955). Kislovodsk (1954Kislovodsk ( -2019 and Pulkovo (1932Pulkovo ( -1991 catalogues provide daily records of the total area of every individual group, A group (including umbrae and penumrae), the total area of the biggest spot, A big_spot (including umbra and penumra), as well as the number, N, of individual umbrae in each group. Since some spots might include multiple umbrae (i.e. multiple umbrae embedded within the same penumbra, e.g., see Fig. 2), for some groups N can be higher than the number of the individual spots in the group. The Debrecen record is the official continuation of the RGO programme after it was ceased in 1976 (Baranyi et al. 2016). This catalogue 4 lists daily measurements of A group and N values. Each spot, in this catalogue, carries a unique group label using which we calculate A big_spot from Debrecen as well.
Since area measurements from different observatories show systematic differences (Fligge & Solanki 1997;Baranyi et al. 2001;Balmaceda et al. 2009), a cross-calibration is necessary before using them for any analysis. Various past studies have found that both the daily and the individual area measurements from the catalogues considered here (Deberecen, Kislovodsk and Pulkovo) are very similar to each other, and also to the measurements from RGO, with mutual cross-calibration factors being close to unity (Nagovitsyn et al. 2007;Balmaceda et al. 2009;Baranyi et al. 2013;Muñoz-Jaramillo et al. 2015;Mandal et al. 2020). For the purpose of this work, we use the latest calibration by Mandal et al. (2020) who compared these datasets statistically over the overlapping periods to generate a consistent and homogeneous area record.
To understand the area distribution of spots within individual groups, we focus on the biggest spot in the group and derive the ratio, R, between the area of the biggest spot and the total group area: According to this definition, the maximum possible value of R is 1, for groups with only one listed spot. Let us remind here that, throughout this paper, the term 'spot' or 'sunspot' refers to a structure that has a solitary umbra or multiple umbrae embedded within a single penumbra. All the areas used in this study are corrected for the foreshortening effect.

Instantaneous distributions
We start by considering all data on every given day, i.e. irrespective of the group's evolutionary phase on that day. R values are then calculated for all individual group entries. Figure 2a shows the derived distributions of R from Debrecen (black histogram), Kislovodsk (green) and Pulkovo (red) data. All three distributions display some common features, e.g., a sharp primary peak at R=1 and a flatter secondary peak around R = 0.5. The peak at R=1 comes from the single-spot groups for which A big_spot = A group . Such groups make approximately 19% of all the listed entries in our catalogues. Since we are interested in the distribution of spots within a group, we remove these single spot groups from our sample. In addition to that, we also reject small groups, with area less than 30 µHem, as they are likely to be associated with larger measurement uncertainties (see Muñoz-Jaramillo et al. 2015). We note, however, that a change of this threshold to different values within the range 30-80 µHem led to the results only marginally different from those presented below, while reducing the statistics. The final distributions, after implementing these two restrictions, are shown in Fig. 2b. Two distinct features are evident in all three distributions. First, the distributions are clearly asymmetric, being skewed towards higher ratios (skewness=0.03) and their mode values lying around R 0.5. Secondly, although the peak at R=1 has now disappeared, a weak secondary peak is still seen at R 0.97, which is pronounced in Debrecen and Pulkovo data and is significantly weaker in Kislovodsk data. We will return to this peak in Sect. 3.2.
The shape of these histograms might possibly be affected by the complex evolutionary scenarios of sunspot groups, such as fragmentation or coalescence (Schrijver et al. 1997). Furthermore, while spots emerge relatively fast, their decay times are much longer (Howard 1992). Thus, as has been pointed out in the past, the results derived from a snapshot distribution (i.e. considering all spots independently of their evolution) can potentially be dominated by the longer decay phase of sunspot groups (Baumann & Solanki 2005). To examine the influence of these effects on the obtained distributions, we track every group in our catalogues and re-derive the R distributions such that each group is considered only once.

Tracking individual sunspot groups
We first visually inspected a subset of Debrecen images 6 from the period 1986-1999, which trace individual groups. These images are extractions of the full-disc photographs and within each of these snapshots, spots forming a group are uniquely numbered. These images allowed us to trace out 40 individual groups (chosen randomly). By following these groups from their first emergence to their disappearance, we have noticed that, during the initial growth or final decay phase, the following spots are significantly smaller than the leading spots. In other words, as has also been noticed in earlier studies, leading spots form faster than the following ones (McIntosh 1981;Rempel & Cheung 2014), and decay more slowly that following spots (Bumba 1963). This leads to R values very close to (though not exactly) 1 during these periods. This is, in fact, the reason behind the weak  secondary peak at R∼0.97 which we see in the instantaneous distributions ( Figure 2b).
Therefore, we now use the information from all the catalogues considered here to track sunspot groups over their lifetime. To minimise the errors introduced by the foreshortening corrections, only groups within central meridian distances of ±65°are considered here. Following the snapshot analysis described in the previous section, we leave out single-spot (N=1) groups, and groups with area less than 30 µHem. Our aim here is to consider every group only once over its entire lifetime. Usually, such studies consider the time when a group area (A group ) reaches its maximum value. However, in our case, the second parameter entering R, the size of the biggest spot A big_spot , also evolves with time and A big_spot and A group do not necessarily reach their maximum at the same time. In fact, we find that only 65% of the total groups analyzed here, have the maximum of these two quantities on the same day. For 20% of all groups, A big_spot reaches its maximum after the maximum in A group (termed a positive delay here) whereas 15% of the remaining groups show a negative delay. Both, the positive and the negative delays mainly happen in big complex groups with many spots, which require a separate study. For the purpose of this paper, we only consider groups with no delay, i.e. when the groups and the biggest spot areas reach their maximum on the same day.
The distribution of R derived for these groups is shown in Figure 3a. around R= 0.5, and unlike the snapshot distributions, there are no secondary peaks. To further understand these distributions, we calculate the quartiles Q1, Q2 and Q3, which describe the sample 25, 50 and 75 percentiles, respectively (Ross 2000); see Table 1. In Figure 3a, the shaded regions highlight the quartiles computed for the distribution averaged over the three datasets. The range between Q1 and Q3 represents the interquartile range (IQR=Q3-Q1), where 50% of the data points around the median value, Q2, lie. IQR is considered to be a measure of the spread of the distribution. For all the individual and the average distributions, IQR is only 0.26. Such a low IQR indicates that the data are clustered about the central tendency, i.e. the median value (Q2) of R=0.6. Thus, in the majority of groups, a single large spot occupies about 48-74% of the total group area. Some properties of sunspot groups, such as their rotation rates (Balthasar et al. 1986), growth and decay rates (Howard 1992;Hathaway & Choudhary 2008;Muraközy et al. 2014;Muraközy 2020Muraközy , 2021, hemispheric asymmetry (Mandal & Banerjee 2016) etc, have been found to show a clear dependence on the group sizes. Therefore, we now investigate whether a similar behaviour also exists for R. To examine this, we bin the group areas (in bins of equal widths in log(A group )) and calculate the median value of R in each bin. Figure 3b shows the dependence of R on A group for the three observatories. Barring the first few points below 100 µHem and a single bin centred at 650 µHem for Kislovodsk, R stays above 0.6 over the entire range of sunspot group sizes. We also observe an interesting three-phase evolution in R. First, as A group increases from 30 to 200 µHem, R rises rapidly from about 0.55 to 0.65. Between about A group of 200 and 600 µHem it decreases somewhat and then eventually settles at a constant value of around 0.6 beyond A group ≥600 µHem. Such a trend can be explained by the fact that most of the small groups (A group <50 µHem) have a simple bipolar structure with roughly equal distribution of areas between the two constituent spots and thus R 0.5. As A group increases, the compact leading spot grows rapidly (McIntosh 1981;Rempel & Cheung 2014). This leads to an increase in R value and explains the initial rise of R from 0.55 to 0.65 below A group 200 µHem. The drop beyond that to a constant value of 0.6 is unexpected and interesting. We suggest that this aspect of the data indicates a degree of scale-invariance in the flux emergence process, i.e a large active region is just a scaled up version of a small active region, with the size of all the spots increasing in proportion. In this interpretation, the slow decrease in R seen for A group ≥200 µHem is a consequence of the larger bright-points in a small active region being scaled up in flux to become small spots in larger active regions.
Our findings are also of relevance for understanding giant starspots seen on highly active (rapidly rotating) sun-like stars (see, e.g., Berdyugina 2005;Strassmeier 2009). In Fig. 3c, we show an extrapolation of the R vs A group relation up to A group =5000 µHem. We perform this extrapolation by applying a linear fit to the averaged data (i.e averaged over the three individual datasets) over the range A group =500-1200 µHem. The grey shaded region in Fig. 3c highlights the 1-σ uncertainty associated with this fitting. We find that the largest spot within a huge group (of ≈5000 µHem) would cover roughly 55-75% of the group area. These giant, though unresolved, spot structures have already been observed on highly active (rapidly rotating) sunlike stars and thus, our result puts constrains on the properties of such starspots. We stress, however, that this is a phenomenological projection of the solar case to other stars, and other factors might play a role in determining the size of individual spots on such stars.
Thus far we have shown that, at maximum group growth, approximately 60% of the total group area is occupied by a single big spot. We now analyse the spot distribution within the remaining 40% of the group area. For this, we consider the relationship between N and A group . We bin the group areas A group in log scale and calculate the average values of N in each bin. The result is plotted in Figure 4. Overall, the number of individual umbrae within groups, N, increases with A group . The rate of increase slows down with the increasing group area and the overall profile can be described with a power law function, N∝A 0.58 group (dashed line in Fig. 4). Since N is the number of umbrae within a given group while some spots have multiple penumbrae, the dependence of the number of spots within a group on its area would be somewhat flatter than N(A group ). Our result implies that, with the increasing group area, new flux emerges in the form of new spots (such that the number of spots in a group increases), while the primary biggest spot within the group keeps growing, too (which keeps the R constant).

Variation of R with solar activity
It is well known that many sunspot parameters, such as sunspot areas, number, etc, exhibit periodic changes with the solar cycle (Hathaway 2015). Therefore, we also analyze the behaviour of R with time. In particular, we look for changes in R from cycle to cycle, as well as during the activity maxima and minima. We first analyze the distribution of R during solar activity maxima and minima. For better statistics, we merged the data from all cycles and from all the observatories. We define cycle maxima and minima as 12-month periods centered at the times of the maxima and minima in the 13-months running mean of the sunspot area (in the merged series), respectively. Now, there are significantly more bigger spots and groups during cycle maxima than during minima, while R depends on the group area ( Figure 3b). Hence, to eliminate this size dependency, we first fit a lognormal function (R ∼ exp − (log(A group )) 2 /2σ 2 ) to the data in Fig. 3b and then divide each measured R by the value this fit returns for the corresponding A group to obtain the normalized R value. Figure 5a shows these normalized R distributions for cycle maxima and minima in blue and red, respectively. These distributions are nearly identical in shape and their mean, median and mode values (listed in the legend) are very similar. We further test the hypothesis that the two samples come from the same distribution by applying the two-sided Kolmogorov-Smirnov (K-S) test which checks for the equivalence of two datasets by comparing their empirical cumulative distribution functions (ECDFs) (Berger & Zhou 2014). The test statistics, D, is a measure of the difference between the ECDFs. The null hypothesis, that both samples come from the same underlying population, is rejected if D > D crit . On our two R distributions, D=0.06 and D crit =0.12, with p value being 0.44. Thus, in our case the null hypothesis is true, and we therefore conclude that no substantial variation in R is observed between the minimum and maximum of a sunspot cycle. Finally, in Figure 5b, we show the cycle-to-cycle variation of the normalized R. Filled circles represent the median values (second quartile) of R over each cycle, whereas the third and first quartiles of the corresponding R distributions are shown as the upper and lower limits of the error bars. We see no significant variation in R from cycle to cycle.

Conclusion
Analysis and understanding of the size distribution of sunspots and sunspot groups provides insights into the origin and evolution of solar magnetism. In this work, we have investigated the distribution of the spot sizes within sunspot groups. Historical sunspot archives from Kislovodsk (1954-2019), Pulkovo (1932-1991 and Debrecen (1974Debrecen ( -2019 observatories have been analyzed to derive the ratio, R, between the area of the biggest spot in a group (A big_spot ) and the total area of that group (A group ). Our conclusions are: 1. When considering all sunspot groups independently of their evolution, the distribution of R shows a clear peak around 0.5, i.e. a single big spot within a group occupies roughly 50% of the whole group area (Fig 2). The distribution, however, shows a secondary peak at, or close to, unity, which we attribute to the group's evolution. Namely, the following spots in a group typically emerge later and decay faster, so that during the initial and final stages of the group evolution, the leading spot is often the only observable spot in the group or there are only small following spots. 2. By tracking individual groups over their lifetime we found that at the time when the group area reaches its maximum, the biggest spot in a group typically occupies about 60% of the group area. For half of all groups, R lies in the range between roughly 50% and 70% (Fig 3). 3. We also find R to change with the group area, A group . In smaller groups (30≤ A group <200 µHem), R increases from 0.55 to 0.65. It then decreases slightly before settling at a nearly constant value of about 0.6 for A group ≥700 µHem (Fig 3). form N ∝ A group 0.58 (Fig 4). Since some spots have multiple umbrae, the number of individual spots within groups will increase somewhat weakly with the group size than the rate given by this function. 5. Extrapolation of our results to significantly bigger sunspot groups (such as unresolved starspot groups observed on highly active stars) suggests that a) such groups are composed of multiple spots, in agreement with the study by Solanki & Unruh (2004); b) a significant fraction of 55-75% of the group area is occupied by a single giant spot. Note that both these conclusions are, however, limited to spots formed by magnetic flux emerging in bipolar regions (e.g., Schuessler et al. 1996) and do not apply to starspots formed by magnetic flux being carried together, e.g. to the poles (Schrijver & Title 2001;Is , ık et al. 2018). 6. The distribution of R does not change between solar activity maxima and minima, and we see no cycle to cycle variation ( Fig 5).
In summary, this paper deals with the distribution of spots within a sunspot group, providing constraints on the flux emergence processes. In Babcock-Leighton dynamo models, flux emergence plays a key role as it is responsible for both toroidal magnetic flux through the photosphere (Cameron & Schüssler 2020) and for the generation of new poloidal magnetic flux through Joy's law (Hale et al. 1919). Despite its critical role, the details of the emergence process remain poorly understood. For example, the cause of Joy's law is not yet known: the Coriolis force is clearly implicated, but it is uncertain whether the underlying flows are those of the turbulent convective background (e.g., Parker 1955) or those internal to the rising flux tube (e.g., Caligari et al. 1995). We are now reaching the point where these possibilities can be modelled in detail. For example, numerical simulations starting from a magnetic flux concentration near the base of the convection zone and ending with after the flux has emerged through the photosphere into the solar atmosphere exist (Hotta & Iijima 2020), as do models where the magnetic and velocity fields from global dynamo simulations are coupled to models covering the emergence through the photosphere (Chen et al. 2017). Our findings of universal properties of this process, suggestive of a scale invariant emergence pattern, are an observational constraint on flux emergence process which will be useful in evaluating the different model simulations, and is thus critical for understanding the solar dynamo. Our findings require that in order to match the observations the ratio of the area of the largest spot in a group relative to the area of all spots in the group should be approximately 0.55-0.65 for the dominant majority of sunspot groups. More conservatively, we can say that a very strong constraint emerging from our work is that the largest sunspot covers more than half the area of the whole sunspot group and hence also carries more than half of the magnetic flux. In fact, flux emergence is also a critical driver for the dynamics of the solar atmosphere, and our results also relevant for understanding and modelling the atmospheric response to flux emergence where the distribution of the field at the surface is important. Further studies using a combination of simultaneous white-light and magnetic data (e.g. from SoHO/MDI (Scherrer et al. 1995), SDO/HMI (Lemen et al. 2012) or in the future from SO/PHI ) will additionally help us to understand the relation between the magnetic flux distribution and the observed spot distribution within a group.