Open Access
Issue
A&A
Volume 640, August 2020
Article Number A78
Number of page(s) 12
Section The Sun and the Heliosphere
DOI https://doi.org/10.1051/0004-6361/202037547
Published online 14 August 2020

© S. Mandal et al. 2020

Licence: Creative Commons. Open Access article, published by EDP Sciences, under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Open Access funding provided by Max Planck Society.

1. Introduction

Sunspots, the largest known dark photospheric features, are probably the most recognized manifestation of solar activity. Solar activity is driven and modulated by a common process, the solar magnetic field and its interaction with solar convection. Sunspots are one of the oldest (although indirect) measurements of solar magnetic fields. Hence, sunspot area records play an important role in our understanding of the long-term behavior of solar magnetic activity and variability.

Barring a few individual measurements (see Vaquero 2007 for a review of historical sunspot observations), systematic monitoring of sunspot areas began at the Royal Greenwich Observatory (RGO) in 1874. RGO recorded the daily areas and positions of sunspots. In the 20th century, various observatories around the world (e.g., Kodaikanal, Pulkovo, Mt. Wilson, Kislovodsk, to name a few) also initiated similar observing programs and started accumulating sunspot data. After continuing for a century, RGO stopped its campaign in 1976 and transferred the program to the Debrecen observatory, where such area observations are still underway on a daily basis. If all these available records are stitched together, the combined series covers a period of almost 150 years, which yields a data set suitable for studies of long-term changes in solar magnetism.

Such a composite series is extremely important for multiple solar applications. For example, individual sunspot group areas are required for reconstructions of the long-term evolution of the solar surface magnetic field (e.g., Jiang et al. 2011, 2014), estimates of the solar radiative flux suppression via the Photometric Sunspot Index (PSI; Brandt et al. 1994), or assessment of the sunspot magnetic field and its long-term changes (Tlatov & Pevtsov 2014; Nagovitsyn et al. 2017), while historical solar irradiance reconstructions (e.g., Foukal & Lean 1990; Fligge et al. 2000; Krivova et al. 2007, 2010; Dasi-Espuig et al. 2014, 2016; Yeo et al. 2017) often also use the daily total areas as input. Understanding and reconstructions of the past solar variability are, in turn, important for an assessment of the solar influence on Earth’s climate (see, e.g., Solanki et al. 2013).

It is, therefore, not surprising that significant efforts have been made towards cross-calibrating the various individual sunspot area datasets (Nagovitsyn 1997; Fligge & Solanki 1997; Baranyi et al. 2001, 2013; Hathaway et al. 2002; Balmaceda et al. 2009). This is not, however, a trivial task. Variations in the observing facilities, seeing conditions, capturing devices, data processing techniques, etc., introduce systematic differences between the records, some of them significant. Two of the widely used area catalogs of modern times, as produced by Hathaway et al. (2002) and Balmaceda et al. (2009), utilize a combination of area observations from RGO and the Solar Optical Observing Network (SOON). However, SOON data have several critical limitations. Sunspot area values in this catalog are significantly (by almost 50%) underestimated as compared to RGO (Fligge & Solanki 1997; Hathaway et al. 2002; Balmaceda et al. 2009). To a large extent, this is related to the fact that these data missed spots smaller than 10 μHem, and since the number of small spots varies with solar activity, a single calibration factor might introduce artifacts in the derived catalogs (see Foukal 2014). Furthermore, SOON has no direct overlap with RGO. Hence, the cross-calibration has to be done indirectly, for example, using Russian data as was done by Balmaceda et al. (2009), which amplifies the uncertainties further. Debrecen data, whose area measurements are found to be similar to those from RGO (Baranyi et al. 2013), have a very short overlap (of three years) with RGO.

Over the past few years, more sunspot data have become publicly available in digital form. One significant development is that all data from the Pulkovo observatory (St. Petersburg) and its Mountain station in Kislovodsk have been digitized and made public (Nagovitsyn 1997). These data are unique in the sense that (i) they cover a long period (1932–2018), allowing for a significant direct overlap with RGO, (ii) the smallest areas recorded in these catalogs are the same as in RGO, that is, 1 millionth of a solar hemisphere (μHem), and (iii) earlier studies (Gnevysheva 1968; Balmaceda et al. 2009; Baranyi et al. 2013; Muñoz-Jaramillo et al. 2015) showed that their statistical properties were very similar to those of the RGO data. Also, daily sunspot observations from the Kodaikanal solar observatory in India have recently been digitized and cataloged (Mandal et al. 2017). Like Pulkovo and Kislovodsk, they cover an extended period (1921–2011) and have a significant overlap with RGO.

In this work, we update and extend the calibrated sunspot area series of Balmaceda et al. (2009, hereafter BA09) by employing the additional and updated data sets. We describe the data we use in Sect. 2 and our methods in Sect. 3. In Sect. 4 we present and discuss our composite records of sunspot areas, that is: the daily corrected areas in Sect. 4.1, daily projected areas in Sect. 4.2, and individual group areas in Sect. 4.3. In Sect. 4.4, we present the calculated daily PSI values (constructed using our area composite) which serve as an important input for empirical irradiance models. Our conclusions are summarized in Sect. 5.

2. Data

In this study, we use sunspot area data from a total of nine observatories. Figure 1 shows the timeline of all these data sets, while Table 1 also lists the periods covered by each of them, the fractional temporal coverage, and the minimum reported sunspot area.

Fig. 1.

Sunspot area datasets used in this work. Shaded curve in grey highlights the sunspot group number record for the reference period. See Table 1 for the abbreviations.

Table 1.

Details of each dataset used in this work.

The most extensive record comes from the RGO, which started observing the Sun in 1874 and continued until 1976 (Willis et al. 2013). These observations were carried out at several observatories at different locations (Royal Greenwich Observatory, England; Cape of Good Hope, South Africa; the Dehra Dun Observatory, India; the Kodaikanal Observatory, India; the Royal Alfred Observatory, Mauritius; along with contributions from the Harvard College Observatory; Melbourne Observatory; Mount Wilson Observatory and the US Naval Observatory), and then processed and combined into the final record at RGO. This allowed for an uninterrupted and consistent daily coverage over a period of about 100 years. This catalog provides daily individual group areas as well as their heliographic positions.

The next two datasets listed in Table 1 come from the Kislovodsk (1952–2018) (Nagovitsyn et al. 2007) and Pulkovo (1932–1991) observatories (Mikhailov 1955). The Pulkovo Observatory, originally established in 1839 with the aim of cataloging the positions of stars, started accumulating solar images (photosphere and chromosphere) in 1932. As in the case of RGO, observations were carried out at a number of locations in the Soviet Union and then collected and processed at Pulkovo for the purposes of consistency within the final series. During the Second World War, Pulkovo observatory was severely damaged, regular observations were not possible, and the original photographic plates from the pre-war period were destroyed. In 1945, the observatory received support from the government for the restoration and continuation of the observational programme. Furthermore, the construction of a new branch, the Kislovodsk mountain station, was initiated in 1948. Afterwards, both of these observatories independently recorded daily sunspot data, and their catalogs provide individual group areas and positions. It is worth mentioning here that prior to May 2011, the positional information (latitude and longitude) of each group is provided only for the day of its first appearance, while afterwards positions are available on each day throughout the entire lifetime of a group.

The Debrecen observatory has officially continued the RGO programme since 1977 (Győri et al. 2011; Baranyi et al. 2016). Most of the observations are taken at Debrecen observatory and its Gyula Observing Station. However, to fill gaps in this catalog, observations from several contributing observatories (Abastumani Astrophysical Observatory, Georgia; Ebro Observatory, Spain; Helwan Observatory, Egypt; Kanzelhöhe Solar Observatory, Austria; Kiev University Observatory, Ukraine; Kislovodsk Observing Station of Pulkovo Observatory, Russia; Kodaikanal Observatory, India and Tashkent Observatory, Uzbekistan) are also used. Recently (2016 onwards), the observatory started using calibrated SDO/HMI observations to fill the missing days in their catalog. In order to maintain consistency and also to avoid propagation of potential uncertainties due to this additional scaling, we only used the Debrecen data between 1974 and 2015 during our cross-calibration process. We do, however, use the post-2016 Debrecen data to fill the remaining gaps (247 days have been filled with these data) in our final area composite after 2016.

The newly digitized data from the Kodaikanal solar observatory in India are next on our list. This set of newly digitized high resolution white-light solar images spans more than a century (1904–2011). However, due to issues with the observing plates, the current sunspot catalog lists data only from 1921 to 2011 (Mandal et al. 2017). This catalog provides individual spot areas and positions; however, the spots are yet to be classified into sunspot groups.

We also use sunspot observations from SOON (Solar Optical Observing Network; Giersch et al. 2018). This is a network of solar observatories operated by the US Air Force (USAF), which allows continuous, 24-h monitoring of the Sun. Finally, the last three data sets are those from: (i) Rome Astronomical Observatory (Cimino 1967); (ii) Yunnan Observatory in China (Wang et al. 1988); and (iii) Catania Astrophysical Observatory in Italy (D’Arrigo & Zappalà 1986; Zuccarello et al. 2011). All three of these catalogs provide group areas and positions. A summary of all the data used in this work is given in Table 1.

Among all the sources (Table 1), RGO has the longest observing period (about 100 years), the highest data coverage (99%), as well as the smallest (together with Kislovodsk, Pulkovo and Debrecen) reported spot area (1 μHem). This makes RGO the most suitable as the reference series against which we calibrate all other records, as was also done by other studies in the past (Hathaway et al. 2002; Balmaceda et al. 2009; Baranyi et al. 2013; Muñoz-Jaramillo et al. 2015). The remaining data sets differ in quality as well as in data coverage. The longest among the remaining records is from Kodaikanal (90 years), followed by Kislovodsk (68 years), Pulkovo (60 years), Debrecen (42 years), Rome (43 years) and SOON (36 years). Catania and Yunnan are relatively shorter records of roughly 10 years each. Besides RGO, spots as small as 1 μHem have also been reported by Kislovodsk, Pulkovo and Debrecen, whereas the smallest spots recorded by Kodaikanal, Rome, Yunnan and Catania are larger (see Table 1). Finally, as mentioned above, SOON has a significantly higher minimum area threshold of 10 μHem. Earlier studies (Gnevysheva 1968; Balmaceda et al. 2009; Baranyi et al. 2013; Muñoz-Jaramillo et al. 2015) have shown that area measurements from Kislovodsk, Pulkovo and Debrecen are of mutually similar scale (having calibration coefficients close to 1). Therefore, we divided the listed sources into two categories, primary and secondary. The purpose behind such labelling is to prioritize the “primary” sources when creating a composite series later, while the secondary datasets should then be used to fill the gaps which could not be covered by the primary ones. This classification is based on the following criteria. A dataset which is sufficiently long in duration (three solar cycles or more) and has comparatively few data gaps (i.e., data coverage of 80% or more) qualifies as a primary source. 
As seen in Table 1, five sources (RGO, Pulkovo, Kislovodsk, Debrecen, and SOON) appear to satisfy these conditions. However, the minimum sunspot area reported in the SOON database is significantly higher (10 μHem) than that of the other four observatories (1 μHem), and this can potentially affect the calibration process (see Foukal 2014). Hence, SOON, along with the four remaining observatories (Kodaikanal, Yunnan, Catania and Rome), is labeled as a secondary source.
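The selection rule described above can be written as a small decision function. This is only an illustrative sketch: the function name, the nominal 11-year cycle length, and the explicit 1 μHem cut (which mirrors the reason given for demoting SOON) are assumptions, and the SOON coverage value is a placeholder rather than a number taken from Table 1.

```python
def classify_source(years_covered, coverage, min_area_uhem, cycle_years=11.0):
    """Label a record 'primary' or 'secondary' using the stated criteria."""
    long_enough = years_covered >= 3 * cycle_years  # three solar cycles or more
    well_sampled = coverage >= 0.80                 # data coverage of 80% or more
    # Assumption: the 1 μHem cut mirrors the reason given for demoting SOON.
    resolves_small_spots = min_area_uhem <= 1.0
    if long_enough and well_sampled and resolves_small_spots:
        return "primary"
    return "secondary"


# RGO values are quoted in the text; the SOON coverage is a placeholder.
rgo_label = classify_source(100, 0.99, 1.0)
soon_label = classify_source(36, 0.85, 10.0)
```

With these inputs RGO is labeled primary, while SOON falls into the secondary group solely because of its higher minimum area.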

3. Method

The method we adopted to cross-calibrate the individual records is similar to that described by BA09. First, we identify the common observing days between any two observatories and perform the subsequent analyses over these overlapping periods only. A scatter diagram of daily area values between every given pair is then drawn. Representative examples are shown in Fig. 2. We then fit a straight line forced to pass through the origin to the data:

$$ \begin{aligned} X = b \cdot Y, \end{aligned} $$ (1)

Fig. 2.

Different steps of the calibration process for the following pairs of observatories: RGO-Pulkovo (panels a, b); RGO-Kislovodsk (panels c, d); SOON-Kislovodsk (panels e, f); SOON-Debrecen (panels g, h). Blue dotted lines (in panels a, c, e, g) highlight the 3σ boundaries which are used to remove the outliers and bias. Final “cleaned” scatter diagrams are plotted (in panels b, d, f, h) along with the best linear fit (red lines) and the 45° slope (green lines). See text for details.

as shown by the solid red lines in Figs. 2a, c, e, g.

Points outside the 3σth threshold, defined as:

$$ \begin{aligned} \sigma_{\rm th} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left( A_{i}^{\mathrm{Obs1}} - b\cdot A_{i}^{\mathrm{Obs2}}\right)^2 }, \end{aligned} $$ (2)

from the previously obtained regression line, are considered to be “outliers” and removed (highlighted with the blue dotted lines in Figs. 2a, c, e, g). We also reject points close to the origin (below the line joining the points [0, 3σth] and [3σth, 0]) as they may introduce a bias into the calculated slope. At this stage, we are left only with the data which satisfy all the above criteria, as shown in Figs. 2b, d, f, h. The linear regression (Eq. (1)) is then applied once again to obtain a slope bxy. Since the choice of dependent and independent variable is completely arbitrary and may have an impact on the derived slope (commonly referred to as the “attenuation bias”; Spearman 1904), we repeat the above procedure with the variables swapped and obtain a new slope byx. The final calibration factor (“b”), following the “bisector line” method (Isobe et al. 1990), is computed as:

$$ \begin{aligned} b = \left(b_{xy} + 1/b_{yx}\right)/2. \end{aligned} $$ (3)
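The steps leading to Eq. (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the paper: the function names are hypothetical, the fit is taken to be ordinary least squares through the origin, and the distance of a point from the regression line is taken to be its residual in the dependent variable.

```python
import math


def slope_through_origin(dep, indep):
    """Least-squares slope of dep = slope * indep (line through the origin)."""
    return sum(d * i for d, i in zip(dep, indep)) / sum(i * i for i in indep)


def cleaned_slope(dep, indep):
    """Fit Eq. (1), drop 3-sigma outliers and near-origin points, then refit."""
    n = len(dep)
    if n < 2 or n != len(indep):
        raise ValueError("need two equally long series with at least 2 points")
    b0 = slope_through_origin(dep, indep)
    # Eq. (2): scatter of the residuals about the first regression line.
    sigma_th = math.sqrt(
        sum((d - b0 * i) ** 2 for d, i in zip(dep, indep)) / (n - 1)
    )
    limit = 3.0 * sigma_th
    kept = [
        (d, i)
        for d, i in zip(dep, indep)
        # Assumption: the distance from the line is the residual in `dep`.
        if abs(d - b0 * i) <= limit
        # Reject points below the line joining [0, 3*sigma] and [3*sigma, 0].
        and d + i >= limit
    ]
    if not kept:
        raise ValueError("no points left after cleaning")
    return slope_through_origin([d for d, _ in kept], [i for _, i in kept])


def calibration_factor(area_obs1, area_obs2):
    """Eq. (3): mean of b_xy and 1 / b_yx, so that Obs1 ~ b * Obs2."""
    b_xy = cleaned_slope(area_obs1, area_obs2)  # Obs1 = b_xy * Obs2
    b_yx = cleaned_slope(area_obs2, area_obs1)  # Obs2 = b_yx * Obs1
    return 0.5 * (b_xy + 1.0 / b_yx)
```

For perfectly proportional input, for example `calibration_factor([200, 400, 600], [100, 200, 300])`, both fits agree and the function returns the common ratio.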

The error associated with the final calibration factor (“b”) has contributions from several sources: (i) σslope: the fitting errors associated with the individual slopes bxy and byx; (ii) σdiff: the difference between the slopes bxy and 1/byx; and (iii) σcycle: effects of the time dependent changes in the data on the final “b” factor. For a chosen pair, we compute “b” for different solar cycles separately and the standard deviation of these individually measured “b” values is taken as σcycle. Thus, the final uncertainty, rather conservatively, is calculated as $ \sigma=\sqrt{\sigma_{\mathrm{slope}}^2+\sigma_{\mathrm{diff}}^2+\sigma_{\mathrm{cycle}}^2} $.
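As a small worked illustration of this error budget, the three terms can be combined as follows. The function name and the numerical inputs are hypothetical, and the use of the sample (N − 1) standard deviation for σcycle is an assumption, since the text does not specify the estimator.

```python
import math
import statistics


def total_uncertainty(sigma_slope, sigma_diff, per_cycle_b):
    """Combine the three error terms in quadrature."""
    # sigma_cycle: spread of the b values measured cycle by cycle
    # (assumption: sample standard deviation).
    sigma_cycle = statistics.stdev(per_cycle_b)
    return math.sqrt(sigma_slope**2 + sigma_diff**2 + sigma_cycle**2)


# Hypothetical inputs, chosen only to show the call.
sigma = total_uncertainty(0.03, 0.04, [1.00, 1.02, 0.98])
```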

4. Results and discussion

4.1. Corrected areas

4.1.1. Comparison of individual records

We applied the method described in Sect. 3 on all pairs of observatories that have an overlap with each other. The derived parameters, obtained using the corrected areas, are tabulated in Table 2. The table lists individual calibration factors bxy and byx (columns four and five) as well as the final calibration factor “b” (last column). We multiplied the areas recorded by observatories listed under “Obs2” with “b” to match the values from those listed under “Obs1”. With this definition, values of “b” close to 1 imply that the original area measurements obtained at the two observatories, on average, are similar to each other.

Table 2.

Calibration factors derived for different observatories.

Let us first discuss those cases where an observatory has a direct overlap with RGO (this is the case for Kislovodsk, Pulkovo, Rome and Kodaikanal). The “b” factor between RGO and Pulkovo, bRGO − Pul, is derived to be 1.014 ± 0.069, in agreement with BA09’s result of 1.019. Similarly, bRGO − Kisl comes out to be 0.984 ± 0.094, which also agrees with the value of 0.979 reported by Baranyi et al. (2013). Thus, area measurements from Pulkovo and Kislovodsk are similar to those from RGO. Furthermore, Muñoz-Jaramillo et al. (2015) found that the individual sunspot group size distributions are also similar to each other. This is important for building a composite area series. The situation is different when we compare RGO to Kodaikanal. We find bRGO − Kodai to be 1.166 ± 0.132, which indicates that the spot areas in the current Kodaikanal catalog are lower (≈17%) than the RGO values. This can also be seen (at least qualitatively) in Mandal et al. (2017). Between RGO and Rome, bRGO − Rome equals 1.091 ± 0.036, which is similar to the value obtained by BA09.

Next, we look at those observatories whose overlap with RGO is too short (less than several years) to be useful. Their calibration to RGO is accomplished “indirectly”, i.e., by using another source which overlaps with these observatories. Two of the longest series in this list are Debrecen and SOON. Using the overlap between Debrecen and Pulkovo, we calculate bRGO − Deb to be 1.061 ± 0.091, which is consistent with the factor of 1.08 reported by Baranyi et al. (2013), even though it was obtained from a shorter period of Debrecen data.
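One way to picture such an indirect calibration is as a chain of two factors through the intermediate record. The sketch below assumes that the chained factor is the product of the two direct factors and that their errors are independent and combine to first order in quadrature; neither assumption is stated in the text, the function name is hypothetical, and the Pulkovo-to-Debrecen value is a placeholder that is not meant to reproduce the quoted result.

```python
import math


def chain_factors(b_ref_mid, sigma_ref_mid, b_mid_target, sigma_mid_target):
    """Chain two calibration factors and propagate their relative errors."""
    b = b_ref_mid * b_mid_target
    # Assumption: independent errors, first-order propagation in quadrature.
    rel = math.hypot(sigma_ref_mid / b_ref_mid, sigma_mid_target / b_mid_target)
    return b, b * rel


# b(RGO-Pulkovo) is quoted in the text; the second factor is a placeholder.
b_chain, sigma_chain = chain_factors(1.014, 0.069, 1.05, 0.05)
```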

For SOON, bRGO − SOON is estimated via Kislovodsk and is found to be 1.48 ± 0.102. Once again, it matches the values previously reported in the literature (Hathaway et al. 2002; Balmaceda et al. 2009). It is important to note that among all our data sources, SOON has the maximum area departure from RGO (almost 50%). As discussed in the introduction, there can be a number of reasons for this significant underestimation. According to Foukal (2014), it is those “too small to draw” spots (< 10 μHem) in the SOON catalog which are mostly responsible for this deficit. However, Győri et al. (2017) argued that the omission of small spots can only account for ≈3.4% of the area deficit and the measurement procedure may be responsible for the rest. By reanalyzing a portion of SOON data, Giersch et al. (2018) concluded that the rounding errors associated with the limb-correction overlay, used on the SOON drawings, can actually lead to an underestimation of spot areas by as much as 8.5%.

One of the main issues in calibrating areas between two observatories is to address the temporal evolution within a dataset. These fluctuations can arise due to changes in quality of instruments or capturing devices and measuring techniques, as well as from aging due to preservation of sunspot drawings and photographic plates over a longer time. Now, any such changes in one or both series will show up as time evolution in the derived calibration factor. To see the extent of such an effect, we plot the values of “b”, computed for each cycle separately, as a function of time in Fig. 3.

Fig. 3.

Value of the calibration factor “b” between various pairs of observatories (see the legends in the panels) computed for each solar cycle separately.

Variations over shorter timescales (monthly or yearly) are not considered here as they are significantly affected by uncertainties coming from insufficient statistics. Different lines in Fig. 3 with various colors and symbols represent the evolution of “b” for different pairs of observatories (see legend in the figure).

Figure 3 demonstrates that “b” does vary with time for all tested combinations of observatories. However, for the cases when both data sets in question are our “primary” choice (see Sect. 2 and Table 1), the variations are within the error-bars. In some other cases, the calibration factor shows significantly larger variations, e.g., between SOON and Debrecen or Kodaikanal and RGO. However, all cases with large fluctuations (e.g., where Kodaikanal or SOON data enter) are found for the secondary sources which were merely meant to be used to fill occasional data gaps. This result (1) supports our choice of the primary sources and (2) justifies the use of a single “b” value for each pair of observatories (as listed in Table 2) for building the composite record.

4.1.2. Composite

At this stage, we are ready to generate a calibrated and homogeneous sunspot area series between 1874 and 2019. We start by using the data from four primary observatories from our list (Table 1), i.e., RGO, Pulkovo, Kislovodsk and Debrecen. RGO, being the absolute reference (for the reasons discussed in Sect. 2), is used as it is. Next, both Kislovodsk and Pulkovo have a direct and sufficiently long overlap with RGO (which Debrecen does not have). Their “b” values are also similar (see Table 2). However, observations from Kislovodsk are considered to be better suited for the extension of RGO because of the stable background history of this catalog (Nagovitsyn et al. 2007). The other advantage of the Kislovodsk record over Pulkovo is that it offers an additional 28 years of observations beyond 1991. Hence, we use areas from Kislovodsk as the main record to extend our catalog after the RGO period. The leftover missing days are first filled with areas from Pulkovo and then from Debrecen (see Fig. 4 and Table 3 for a summary of the observations constituting the final composite of daily corrected areas).
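The filling order described above amounts to a priority merge of the calibrated daily records. The following sketch shows one way to express it; the data layout (dictionaries keyed by ISO date strings), the function name, and the sample areas are hypothetical, while the calibration factors are the ones quoted in Sect. 4.1.1.

```python
def build_composite(records, factors, priority):
    """Merge daily records, taking each day from the first source that has it.

    records:  {source: {date: area}} daily areas per observatory
    factors:  {source: b} factor that scales a source to the RGO level
    priority: source names, highest priority first
    """
    composite = {}
    for source in priority:
        b = factors[source]
        for day, area in records[source].items():
            # setdefault keeps the entry from a higher-priority source.
            composite.setdefault(day, (b * area, source))
    return composite


# Hypothetical daily areas in μHem.
records = {
    "RGO": {"1976-12-30": 100.0},
    "Kislovodsk": {"1976-12-30": 90.0, "1977-01-01": 200.0},
    "Pulkovo": {"1977-01-02": 300.0},
    "Debrecen": {"1977-01-01": 210.0, "1977-01-03": 50.0},
}
factors = {"RGO": 1.0, "Kislovodsk": 0.984, "Pulkovo": 1.014, "Debrecen": 1.061}
priority = ["RGO", "Kislovodsk", "Pulkovo", "Debrecen"]
composite = build_composite(records, factors, priority)
```

Each entry of `composite` stores the calibrated area together with the contributing observatory, which is the kind of bookkeeping needed to produce a coverage summary per source.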

Fig. 4.

Top panel: overview of the structure and the coverage of the final composite of the corrected sunspot areas. Different colours (see legend at the top) show data from different observatories. Y-axis is the number of days per year, for which data are available. Bottom panel: pie-charts highlighting the percentage of contributions of observatories to the complete calibrated series (1874–2019: left chart) and only to the post-RGO period (1977–2019: right chart).

Table 3.

Summary of the observations used for the final daily corrected area composite.

Our final catalog contains about 145 years of daily sunspot area values between 1874 and 2019. The total number of missing days in this series is 776 (corresponding to 1.4% of the total coverage). We could not fill these missing days with data from any of the remaining five observatories (Kodaikanal, Rome, Yunnan, SOON, Catania) because out of the 776 missing days, 443 days are between 1874 and 1922, where only RGO observations are available, and 321 days are between July 2018 and December 2019, where only observations from Kislovodsk are available (Fig. 4). We note here that the cataloging process at Kislovodsk and Debrecen for the last two years (2018 onward) is still in progress and we plan to fill these missing days as soon as the data become available. While we have compared a total of nine archives, only four of them have actually entered the final composite. We nevertheless show the results obtained from inter-comparisons of these “secondary” datasets and list their scaling coefficients in Table 2 for completeness. Panels a and b of Fig. 5 show the calibrated monthly and yearly averaged time series of corrected areas. To visualize the uncertainty, we overplot two area series generated with the two extreme limits of the errors in “b”, i.e., b + σ and b − σ (from Table 2), shown as the shaded regions in Fig. 5b. As expected, the effect is prominent mostly during solar maxima when the total spot coverage is higher. This results in a corresponding uncertainty in the cycle amplitudes over the post-RGO period, which has to be kept in mind in relevant studies.

Fig. 5.

Monthly (panel a) and yearly (panel b) averaged calibrated sunspot area series. Calculated error values (grey shaded regions) are only shown for the yearly series. The dotted vertical line marks the year 1976, when RGO stopped its observing campaign.

4.1.3. Comparison with BA09

While we have generally followed the procedure by BA09, there are also some differences. (i) Firstly, instead of SOON we use Kislovodsk and Debrecen data for the post-RGO period. With Kislovodsk data we extend our series until 2019 (the BA09 series ended in 2009), while with Debrecen data, we improve the daily data coverage by filling most of the intermittent data gaps. (ii) Secondly, the three observatories we add to RGO (Kislovodsk, Pulkovo, Debrecen) all have calibration factors (“b”) close to 1, whereas for the SOON data, used by BA09, the value of “b” is ≈1.5. Hence, the uncertainties are expected to be lower in our catalog. (iii) Next, since RGO and SOON do not overlap directly, BA09 employed published Russian data for their cross-calibration. We use data from Kislovodsk and Pulkovo to extend the RGO series and both of them have significant overlaps with RGO. It is worth mentioning that the “Russian data” used by BA09 started only in 1968 whereas the updated Pulkovo catalog which we use here goes back to 1932. This significantly increases the overlap with RGO, which again helps to minimize the uncertainties.

We then compare the two compilations quantitatively. Since the RGO dataset is essentially the same in both studies, we focus only on the post-RGO era, that is, between 1977 and 2008 (when the BA09 series ended). In this period, our catalog utilizes daily data (AS) from Kislovodsk whereas BA09 used observations (A09) from Russian books “Solnechnye Dannye” (Period-I; between 1977 and 1985) and SOON (Period-II; between 1986 and 2008). The daily difference between the two composites, δA = AS − A09, is plotted in Fig. 6a. We also separately plot the histograms of δA for the two periods in Figs. 6b and c. As seen from the figure, for Period-I (black dots), our area values are systematically lower (by ∼6%) compared to areas in BA09. This was, in fact, already noted by BA09, who reported that the Russian area measurements used in their study were systematically larger than RGO by ∼8% between 1971 and 1976. However, without being able to do a detailed analysis of the reasons for this change of the correction factor with time, they refrained from correcting it. The Kislovodsk data that we use here do not show such an offset, and thus, resolve this issue with the compilation of BA09. For Period-II (red dots), BA09 used data from SOON, whereas our catalog uses Kislovodsk areas throughout. For this period, we do not see any systematic drift, but rather the differences are distributed symmetrically, with most of the values (∼80%) being below 200 μHem (Fig. 6). The differences are clearly higher during higher-activity periods, when the number of spots is considerably larger.

Fig. 6.

Panel a: differences (δA = AS − A09) between daily area values obtained in this study (AS) and those published by BA09 (A09). Panels b and c: histograms of δA for the two periods (Period-I: 1977–1985; Period-II: 1986–2008).

Now, these smaller differences (≤ ∣200∣ μHem) are rather difficult to diagnose due to various uncertainties in area measurements as well as in the analysis procedure. But let us take a closer look at those days where the absolute difference is more than 500 μHem (although such cases are rather rare, < 8%). As mentioned already, in this period BA09 used data from SOON whereas we use data from Kislovodsk. To better identify the source of the discrepancies, we also compare measurements on the same days from three other observatories: Debrecen, Kodaikanal, and Rome. A small sub-sample (showing extreme departures of δA ≥ 1000 μHem) is presented in Table 4.

Table 4.

Examples of area values when δA is ≥1000 μHem.

After comparing the area values across observatories on various days, we could identify essentially all possible scenarios, from AS or A09 matching no other record to both of them matching only some of the records. This can also be presented quantitatively by using a set of tolerance values (which account for the possible measurement errors in the original datasets). Comparing area values in this way, we find that, for ∼50%–70% of cases, either AS or A09 is the single outlier. In roughly 30%–50% of the cases, at least one other observatory provided a value similar to either AS or A09. These results show that there is no systematic bias towards one of the data sets, for example, that of Kislovodsk, Debrecen, SOON, etc. A more sophisticated and robust technique, for example, a “spot-to-spot” calibration, is needed to address and correct for these problems. This is beyond the scope of the current work and needs a separate study.

We also compared the two composites with the sunspot number series. We only compared the post-RGO period (1977–2008). Figure 7a shows a scatter diagram between daily (monthly) sunspot number and daily (monthly) sunspot areas from this work, in black (red). The same but using the BA09 series is shown in Fig. 7b. The Pearson correlation coefficients (Rc) for daily records are Rc, this_work = 0.883 vs. Rc, BA09 = 0.866 and for monthly data Rc, this_work = 0.960 vs. Rc, BA09 = 0.950. Allowing for the non-linear relation between sunspot number and area, we also calculate the Spearman’s rank correlation coefficients (ρ) for daily records, ρ_this_work = 0.934 vs. ρ_BA09 = 0.934, and for monthly data, ρ_this_work = 0.971 vs. ρ_BA09 = 0.960. We also compute the scatter (as the standard deviation, σ) in the area values within bins of 20 in the sunspot number as well as the 90% confidence intervals of σ. Panels c and d of Fig. 7 show these results for the daily and monthly data, respectively. The scatter in the daily values (panel c) in our series is lower than that in BA09 for a significant part of the sunspot number range. The scatter in the monthly data (panel d) is comparable in the two series.
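The three statistics used in this comparison (Pearson correlation, Spearman rank correlation, and the scatter of areas within sunspot-number bins of width 20) can be sketched with the Python standard library alone. The function names are hypothetical, ties in the Spearman ranking are assumed to share their mean rank, and the 90% confidence intervals of σ are not reproduced here.

```python
import math
import statistics
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        first = values[order[start]]
        while end + 1 < len(order) and values[order[end + 1]] == first:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def binned_scatter(sunspot_number, area, bin_width=20):
    """Standard deviation of the areas in each sunspot-number bin."""
    bins = defaultdict(list)
    for n, a in zip(sunspot_number, area):
        bins[int(n // bin_width)].append(a)
    # Bins with a single value have no defined sample scatter and are skipped.
    return {
        k * bin_width: statistics.stdev(v)
        for k, v in sorted(bins.items())
        if len(v) > 1
    }
```

For a monotonic but non-linear relation the rank correlation reaches 1 while the Pearson coefficient stays below it, which is the motivation given above for quoting both.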

Fig. 7.

Panel a: scatter diagram between daily (monthly) sunspot number and daily (monthly) sunspot areas from this work, in black (red). Panel b: same but for the areas from the BA09 series. The relevant correlation coefficients (cc) are printed in the respective panels. Panels c and d: binned values of scatter (σ) in sunspot areas (this work in black and BA09 series in blue) vs. sunspot number (binned values of 20) for the daily and monthly data, respectively. Error-bars represent the 90% confidence intervals of σ.

4.2. Projected areas

Studies such as irradiance reconstructions (Krivova et al. 2010; Yeo et al. 2017) use the projected area values as an input. Hence, in addition to the corrected area values, we also perform the cross-calibration with the projected areas. To achieve this, we use the same set of primary observatories as for the corrected area composite, except Pulkovo. Pulkovo does not provide the projected areas and only lists the corrected ones (Table 1). This would not have been an issue had Pulkovo provided the times of observation, which are required to transform the corrected areas into projected ones (as the longitudes are listed in Carrington coordinates). Hence, we decided to leave out data from Pulkovo and only use data from RGO, Kislovodsk and Debrecen to generate this catalog. The method of cross-calibration is the same as described previously and the results are plotted in Fig. 8. The derived calibration factors are bRGO − Kisl = 1.02 ± 0.025 and bDeb − Kisl = 1.01 ± 0.026. A summary of the final calibrated series of daily projected areas is given in Table 5.
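The dependence on the observing time can be made explicit with the standard foreshortening geometry. In the sketch below, the heliographic latitude B0 and the Carrington longitude L0 of the disk centre both change with time, so a Carrington longitude alone does not fix the position of a spot on the disk. The function names are hypothetical, and the factor of 2 assumes that corrected areas are expressed in millionths of the hemisphere and projected areas in millionths of the disk, a unit convention that is not stated in the text.

```python
import math


def cos_heliocentric_angle(lat_deg, carrington_lon_deg, b0_deg, l0_deg):
    """Cosine of the angle between a spot and the disk centre.

    b0_deg, l0_deg: heliographic latitude and Carrington longitude of the
    disk centre at the time of observation.
    """
    lat = math.radians(lat_deg)
    b0 = math.radians(b0_deg)
    dlon = math.radians(carrington_lon_deg - l0_deg)
    return math.sin(b0) * math.sin(lat) + math.cos(b0) * math.cos(lat) * math.cos(dlon)


def projected_from_corrected(corrected_uhem, cos_theta):
    """Undo the foreshortening correction.

    Assumption: corrected area in millionths of the hemisphere,
    projected area in millionths of the disk.
    """
    return 2.0 * corrected_uhem * cos_theta
```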

Fig. 8.

Similar to Fig. 2, but generated using projected area values. Blue dotted lines (in panels a–c) highlight the 3σ boundaries which are used to remove the outliers and bias. Final “cleaned” scatter diagrams are plotted (in panels b–d), along with the best linear fit (red lines) and the 45° slopes (green lines).

Table 5.

Summary of the observations used for the final daily projected area composite.

4.3. Individual group areas

For some applications, such as the Surface Flux Transport (SFT) models often used to reconstruct the evolution of the surface magnetic fields and irradiance (see, e.g., Jiang et al. 2011, 2014), it is important to have information on individual groups. Hence, in addition to the daily calibrated areas, we also provide the individual group areas. A direct comparison of individual sunspot groups among multiple datasets is not a trivial task. It requires not only an identification of the same group across different datasets, but also an account of the group evolution due to the difference in observing times between observatories. This is a subject for a separate study and is beyond the scope of the current paper. Nonetheless, since this information is needed, we perform a simple comparison here, short of such a detailed study, and provide a preliminary record of individual group areas.

For this purpose, we use individual group areas (corrected for foreshortening) from RGO, Kislovodsk and Debrecen and only choose the biggest individual group per day in each of these three observatories. The rest of the analysis is the same as presented in Sect. 3. The results for RGO vs. Kislovodsk and Debrecen vs. Kislovodsk are shown in Figs. 9a,b and 9c,d, respectively. The derived calibration factors are b_RGO−Kisl = 1.031 ± 0.056 and b_Deb−Kisl = 1.006 ± 0.046. They are similar to the ones we previously obtained with the daily corrected areas (Table 2). Thus, this preliminary analysis suggests that the calibration factors listed in Table 2 are also applicable, to a first approximation, to individual group areas. Therefore, we construct the composite series of individual group areas using the corresponding “b” values from Table 2.
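The selection step described above, picking the biggest group per day in each observatory and pairing the days common to both, can be sketched as follows. The function names and the input layout (a list of date and area pairs) are hypothetical, and the numbers are synthetic. Note that the largest group on a given day is only assumed to be the same physical group in both records, which is one reason why this comparison is preliminary.

```python
def largest_group_per_day(records):
    """records: iterable of (date, group_area) pairs.
    Returns a dict mapping each date to the largest group area on that day."""
    best = {}
    for date, area in records:
        if date not in best or area > best[date]:
            best[date] = area
    return best


def paired_largest(obs_a, obs_b):
    """Pairs the largest daily group areas of two observatories on the
    days for which both have at least one group recorded."""
    a = largest_group_per_day(obs_a)
    b = largest_group_per_day(obs_b)
    common_days = sorted(a.keys() & b.keys())
    return [(day, a[day], b[day]) for day in common_days]


# Synthetic, illustrative records only (ISO date string, area).
obs_a = [("1960-01-01", 300.0), ("1960-01-01", 120.0), ("1960-01-02", 80.0)]
obs_b = [("1960-01-01", 290.0), ("1960-01-03", 60.0)]
pairs = paired_largest(obs_a, obs_b)
```

The resulting pairs are the input to the same cross-calibration procedure that is applied to the daily total areas.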

Fig. 9.

Similar to Fig. 2, but generated using corrected individual group areas. Blue dotted lines (in panels a–c) highlight the 3σ boundaries which are used to remove the outliers and bias. Final “cleaned” scatter diagrams are plotted (in panels b–d) along with the best linear fit (red lines) and the 45° slopes (green lines).

4.4. Photometric Sunspot Index (PSI)

In this section, we present a daily Photometric Sunspot Index (PSI) series since 1874. PSI (Hudson et al. 1982; Brandt et al. 1994) is widely used in empirical irradiance reconstructions. PSI is a simple measure of reduction in solar output due to the presence of spots on the visible solar disc.

Quantitatively, the suppression of the radiative output due to a single spot is defined as:

$$ \begin{aligned} \Delta S_S = \frac{\mu A_S(C_S-1)(3\mu +2)}{2}. \end{aligned} $$ (4)

Here A_S is the individual projected sunspot group area and μ is the cosine of the heliocentric angle. The quantity (C_S − 1) represents the residual intensity contrast of a sunspot with respect to the quiet photosphere. Following Brandt et al. (1992, 1994) and Froehlich et al. (1994), we calculate it as:

$$ \begin{aligned} C_S - 1 = 0.2231 + 0.0244 \cdot \log (A_S). \end{aligned} $$ (5)

The contributions from individual spots are summed up to derive the PSI as:

$$ \begin{aligned} P_S =\sum _{i=1}^{n} \left( \frac{\Delta S_S}{S_Q}\right)_i, \end{aligned} $$ (6)

where n is the total number of spots on the disc on a particular day. The result is expressed in units of S_Q, the quiet-Sun solar irradiance, which is taken as 1361 W m−2 (Kopp & Lean 2011).
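As an illustration, Eqs. (4)–(6) can be written as a short Python sketch. Several points are assumptions of this sketch rather than statements from the text: the logarithm in Eq. (5) is taken to be base 10, the area entering Eq. (5) is taken in millionths of the solar hemisphere, the same area is converted to a fraction (factor 10^-6) when used as the multiplicative factor in Eq. (4), and S_Q is set to 1 by default so that the sum is returned directly in units of S_Q. The function names are hypothetical and the input values are synthetic.

```python
import math


def contrast(area_millionths):
    """Eq. (5): residual contrast C_S - 1 (base-10 log assumed)."""
    return 0.2231 + 0.0244 * math.log10(area_millionths)


def delta_s(mu, area_millionths):
    """Eq. (4): contribution of one group; the area is converted from
    millionths to a fraction (assumption of this sketch)."""
    area_fraction = area_millionths * 1e-6
    return (mu * area_fraction * contrast(area_millionths)
            * (3.0 * mu + 2.0) / 2.0)


def psi(groups, s_q=1.0):
    """Eq. (6): sum over all groups on the disc on one day.
    groups: iterable of (mu, area_millionths) pairs."""
    return sum(delta_s(mu, area) / s_q for mu, area in groups)


# Synthetic example: one group of 100 millionths at disc centre (mu = 1).
psi_single = psi([(1.0, 100.0)])
# Two groups on the same day; contributions simply add up.
psi_two = psi([(1.0, 100.0), (0.5, 400.0)])
```

In this form the limb-darkening factor (3μ + 2)/2 and the foreshortening factor μ both reduce the contribution of groups near the limb, so an error in the measured position translates into an error in the PSI.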

We calculate the daily PSI series by inserting the area values from our calibrated individual group area series into Eq. (6). The monthly and yearly values are plotted in Figs. 10a and b, respectively. Shaded regions in Fig. 10b highlight the upper and lower limits of P_S, which are generated using the two extreme limits of the calibrated areas shown in Fig. 5. Next, we compare our PSI series (P_S) with the PSI values from BA09 (P_S,BA09). We restrict this comparison to the period 1986–2008, for which BA09 used the SOON data; the results are plotted in Fig. 11. The differences (δP = P_S − P_S,BA09) are small, mostly below 1%. Differences in the derived PSI values between the two series can have multiple causes. Errors in the sunspot area measurements are one such source and, by the definition of PSI (Eq. (4)), errors in the measured spot positions (via μ) also contribute. Hence, the true errors associated with individual PSI values are possibly slightly larger than our current estimate. In recent years, some studies (e.g., Foukal 2014) have claimed that missing small spots in sunspot area catalogs may introduce larger uncertainties in the derived PSI values due to the different contrast of small and big spots. The PSI series of BA09 between 1986 and 2008 was constructed using the SOON catalog, which is known to have regularly missed smaller spots. A comparison of that series with our values (which include smaller spots) shows only minor differences, below 1%. Thus, the errors in PSI introduced by the calibration of records that miss small spots (such as SOON) seem to be low. Again, a further detailed study that includes individual “spot-to-spot” comparisons is necessary to confirm this conclusion.

Fig. 10.

Generated PSI series for monthly (top panel) and yearly (bottom panel) averaged data. Calculated error values (grey shaded regions) are only shown for the yearly series. The dotted vertical line marks the year 1976, when RGO stopped its observing campaign.

Fig. 11.

Panel a: relative differences (δP in %) of daily PSI values between this study (P_S) and BA09 (P_S,BA09). Panel b: a histogram of δP values.

5. Summary and conclusions

A number of observatories around the globe have carried out measurements of sunspot areas and positions over the last century. RGO, the longest sunspot area database to date, started its campaign in 1874 and after continuing for a century, stopped it in 1976. Several other observatories from different parts of the world (e.g., Kodaikanal, Kislovodsk, Debrecen, Rome etc.) also carried out such observing programs throughout the 20th century. Sunspot area datasets are invaluable historical records of solar magnetic fields and are key to understanding the solar variability and its historical reconstructions. Hence, a long and consistent area series is expected to be of considerable use to the solar community. However, area measurements in each of these datasets are different from the others and hence, a merger is not a trivial task.

In this work, we analyze and compare sunspot group areas from a total of nine observatories (RGO, Kislovodsk, Pulkovo, Debrecen, Kodaikanal, SOON, Rome, Catania, Yunnan). It turns out that data from only four observatories (RGO, Kislovodsk, Pulkovo, Debrecen) are sufficient to produce cross-calibrated, up-to-date (1874–2019) catalogs of daily total and individual group areas. The remaining gaps (776 days in total) could not be filled with data from the other archives, as the missing days lie either before 1922 or after 2016 and none of the other archives cover these periods. For completeness, we still list the derived scaling coefficients for all the data sets in Table 2, as future studies might find them useful. As in earlier studies, we found that areas from the Kislovodsk and Pulkovo observatories are in good agreement with RGO, while also having very good temporal coverage. This is a significant advantage over previous similar studies, in which composites of the total sunspot area were generated using SOON areas, which are 50% smaller than the RGO measurements. In addition, SOON does not have any direct overlap with RGO, whereas both Kislovodsk and Pulkovo, used in our series, have extensive overlaps.

The selection of these observatories for constructing this catalog is further justified by our analysis of the variation of the calibration factors with time. We find that our chosen observatories (RGO, Kislovodsk, Pulkovo, Debrecen) are significantly more stable than the other observatories (SOON, Kodaikanal, Yunnan, Rome). In fact, RGO and Kislovodsk alone cover 94% of the observing days between 1874 and 2019. Overall, the use of Kislovodsk (and Pulkovo) helped us to reduce the uncertainties in the generated catalog. The remaining gaps are filled with areas from Debrecen, whose area measurements are also similar to those of RGO (with the calibration factor “b” being 1.06). Thus, our entire catalog is based on observations that come either directly from RGO or from selected observatories that are very similar to RGO in their properties. This increases the quality as well as the reliability of our catalog. In this paper, we also compare data from Kodaikanal, SOON, Rome, and Catania. Our results show that although some of these data sets cover extended periods of time (e.g., SOON and Kodaikanal), their area measurements differ rather significantly from RGO and, more importantly, they display considerable scatter and/or trends compared to the other observatories.
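The way such a composite is assembled can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure used to build the catalog: the priority order (RGO, then Kislovodsk, Pulkovo and Debrecen) is inferred from the description above, a raw area is assumed to be brought onto the RGO scale by multiplying it with the factor “b”, and the function names are hypothetical. Apart from the Debrecen value of 1.06 quoted above, the factors are placeholders set to 1.0 (the actual values are those of Table 2), and the area values are synthetic.

```python
def build_composite(series, factors, priority):
    """series: {observatory: {day: area}}; factors: {observatory: b}.
    For each day, the first observatory in `priority` that has data is
    used, and its area is scaled by its factor b (assumed convention)."""
    composite = {}
    for obs in priority:
        b = factors[obs]
        for day, area in series.get(obs, {}).items():
            if day not in composite:
                composite[day] = (b * area, obs)
    return composite


def coverage(composite, all_days):
    """Fraction of the requested days that have a composite value."""
    covered = sum(1 for day in all_days if day in composite)
    return covered / len(all_days)


priority = ["RGO", "Kislovodsk", "Pulkovo", "Debrecen"]
# Placeholders except Debrecen; see Table 2 for the derived values.
factors = {"RGO": 1.0, "Kislovodsk": 1.0, "Pulkovo": 1.0, "Debrecen": 1.06}
series = {
    "RGO": {"d1": 100.0},
    "Kislovodsk": {"d1": 90.0, "d2": 200.0},
    "Debrecen": {"d2": 210.0, "d3": 50.0},
}
composite = build_composite(series, factors, priority)
fraction_covered = coverage(composite, ["d1", "d2", "d3", "d4"])
```

Keeping the source observatory alongside each value makes it straightforward to compute the contribution of each archive to the final series.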

We have compared our area values to the earlier version of the composite by BA09. In particular, by using the Kislovodsk data we have accounted for a systematic offset between roughly 1977 and 1985 which was present in BA09’s series. This offset, already noted by BA09, was due to the use of old Russian data in their series. Compared to the sunspot number, the scatter in our area values is smaller than in BA09. We emphasize, however, the need for an in-depth, “spot-to-spot” calibration study to address some complicated individual cases. In addition to the corrected areas, we also provide a calibrated projected area series and a preliminary series of individual group areas. Furthermore, by using our calibrated area catalog, we calculated the daily PSI, which is often used in irradiance reconstructions. Compared to the earlier PSI record from BA09, we found that the effect of “missed small spots” (e.g., due to their usage of the SOON data) on the calculated PSI is not significant.

To take this work further, we plan to add more data sets in the future. Sunspot data from four Chinese sources (Qingdao Observing Station, Purple Mountain Astronomical Observatory, Yunnan Astronomical Observatory, and Chinese Solar-Geophysical data) have recently been digitized, including the parameter extractions (Lin et al. 2019). These data sets cover almost 90 years (1925–2015) and will serve as a great source to further improve the catalog. The other follow-up work planned in this context is to perform a “spot-to-spot” calibration between observatories. This will be a detailed comparison (of, e.g., the shape and size) of every sunspot that has been simultaneously recorded by multiple observatories. Such an analysis will help us to better understand the dependence of measurement errors on spot size. It could also provide insights into quantifying the time variation effects within an observatory. All the catalogs produced here are available online at the CDS.


1

From Russian books “Solnechnye Dannye”.

10

This catalog is available at the CDS and at http://www2.mps.mpg.de/projects/sun-climate/data.html

12

Sunspot number series V2.0 from SIDC; http://www.sidc.be/silso/datafiles

Acknowledgments

We thank the anonymous reviewer for the encouraging comments and helpful suggestions. We also thank the teams of the archives used in this study for all the work they have invested in obtaining these data and making them available to the community.

References

  1. Balmaceda, L. A., Solanki, S. K., Krivova, N. A., & Foster, S. 2009, J. Geophys. Res., 114, A07104
  2. Baranyi, T., Győri, L., Ludmány, A., & Coffey, H. E. 2001, MNRAS, 323, 223
  3. Baranyi, T., Király, S., & Coffey, H. E. 2013, MNRAS, 434, 1713
  4. Baranyi, T., Győri, L., & Ludmány, A. 2016, Sol. Phys., 291, 3081
  5. Brandt, P. N., Schmidt, W., & Steinegger, M. 1992, Photometry of Sunspots Observed at Tenerife, 130
  6. Brandt, P. N., Stix, M., & Weinhardt, H. 1994, Sol. Phys., 152, 119
  7. Cimino, M. 1967, Sol. Phys., 2, 375
  8. D’Arrigo, C., & Zappalà, R. A. 1986, Solar Observation Made at Catania Astrophysical Observatory During 1984, 58
  9. Dasi-Espuig, M., Jiang, J., Krivova, N. A., & Solanki, S. K. 2014, A&A, 570, A23
  10. Dasi-Espuig, M., Jiang, J., Krivova, N. A., et al. 2016, A&A, 590, A63
  11. Fligge, M., & Solanki, S. K. 1997, Sol. Phys., 173, 427
  12. Fligge, M., Solanki, S. K., & Unruh, Y. C. 2000, A&A, 353, 380
  13. Foukal, P. 2014, Sol. Phys., 289, 1517
  14. Foukal, P., & Lean, J. 1990, Science, 247, 556
  15. Froehlich, C., Pap, J. M., & Hudson, H. S. 1994, Sol. Phys., 152, 111
  16. Giersch, O., Kennewell, J., & Lynch, M. 2018, Sol. Phys., 293, 138
  17. Gnevysheva, R. S. 1968, Sov. Astron., 11, 976
  18. Győri, L., Baranyi, T., & Ludmány, A. 2011, in Physics of Sun and Star Spots, eds. D. Prasad Choudhary, & K. G. Strassmeier, IAU Symp., 273, 403
  19. Győri, L., Ludmány, A., & Baranyi, T. 2017, MNRAS, 465, 1259
  20. Hathaway, D. H., Wilson, R. M., & Reichmann, E. J. 2002, Sol. Phys., 211, 357
  21. Hudson, H. S., Silva, S., Woodard, M., & Willson, R. C. 1982, Sol. Phys., 76, 211
  22. Isobe, T., Feigelson, E. D., Akritas, M. G., & Babu, G. J. 1990, ApJ, 364, 104
  23. Jiang, J., Cameron, R. H., Schmitt, D., & Schüssler, M. 2011, A&A, 528, A83
  24. Jiang, J., Hathaway, D. H., Cameron, R. H., et al. 2014, Space Sci. Rev., 186, 491
  25. Kopp, G., & Lean, J. L. 2011, Geophys. Res. Lett., 38, L01706
  26. Krivova, N. A., Balmaceda, L., & Solanki, S. K. 2007, A&A, 467, 335
  27. Krivova, N. A., Vieira, L. E. A., & Solanki, S. K. 2010, J. Geophys. Res., 115, A12112
  28. Lin, G. H., Wang, X. F., Liu, S., et al. 2019, Sol. Phys., 294, 79
  29. Mandal, S., Hegde, M., Samanta, T., et al. 2017, A&A, 601, A106
  30. Mikhailov, A. 1955, Observatory, 75, 28
  31. Muñoz-Jaramillo, A., Senkpeil, R. R., Windmueller, J. C., et al. 2015, ApJ, 800, 48
  32. Nagovitsyn, Y. 1997, Soln., Dannye, 38
  33. Nagovitsyn, Y. A., Makarova, V. V., & Nagovitsyna, E. Y. 2007, Sol. Syst. Res., 41, 81
  34. Nagovitsyn, Y. A., Pevtsov, A. A., & Osipova, A. A. 2017, Astron. Nachr., 338, 26
  35. Solanki, S. K., Krivova, N. A., & Haigh, J. D. 2013, ARA&A, 51, 311
  36. Spearman, C. 1904, Am. J. Psychol., 15, 72
  37. Tlatov, A. G., & Pevtsov, A. A. 2014, Sol. Phys., 289, 1143
  38. Vaquero, J. M. 2007, Adv. Space Res., 40, 929
  39. Wang, J. L. 1988, in Solar and Stellar Coronal Structure and Dynamics, ed. R. C. Altrock, 292
  40. Willis, D. M., Coffey, H. E., Henwood, R., et al. 2013, Sol. Phys., 288, 117
  41. Yeo, K. L., Krivova, N. A., & Solanki, S. K. 2017, J. Geophys. Res., 122, 3888
  42. Zuccarello, F., Contarino, L., & Romano, P. 2011, Contrib. Astron. Observ. Skalnate Pleso, 41, 85


