Sunspot area catalogue revisited: Daily cross-calibrated areas since 1874

Long and consistent sunspot area records are important for understanding the long-term solar activity and variability. Multiple observatories around the globe have regularly recorded sunspot areas, but such individual records only cover restricted periods of time. Furthermore, there are also systematic differences between them, so that these records need to be cross-calibrated before they can be reliably used for further studies. We produce a cross-calibrated and homogeneous record of total daily sunspot areas, both projected and corrected, covering the period between 1874 and 2019. A catalogue of calibrated individual group areas is also generated for the same period. We have compared the data from nine archives: Royal Greenwich Observatory (RGO), Kislovodsk, Pulkovo, Debrecen, Kodaikanal, Solar Optical Observing Network (SOON), Rome, Catania, and Yunnan Observatories, covering the period between 1874 and 2019. Mutual comparisons of the individual records have been employed to produce homogeneous and inter-calibrated records of daily projected and corrected areas. As in earlier studies, the basis of the composite is formed by the data from RGO. After 1976, the only datasets used are those from Kislovodsk, Pulkovo and Debrecen observatories. This choice was made based on the temporal coverage and the quality of the data. In contrast to the SOON data used in previous area composites for the post-RGO period, the properties of the data from Kislovodsk and Pulkovo are very similar to those from the RGO series. They also directly overlap the RGO data in time, which makes their cross-calibration with RGO much more reliable. We have also computed and provide the daily Photometric Sunspot Index (PSI) widely used, e.g., in empirical reconstructions of solar irradiance.


Introduction
Sunspots, the largest known dark photospheric features, are probably the most famous manifestation of solar activity. Solar activity is driven and modulated by a common process, the solar magnetic field and its interaction with solar convection. Sunspots are one of the oldest (although indirect) measurements of the solar magnetic fields. Hence, sunspot area records play an important role in our understanding of the long term behaviour of solar magnetic activity and variability.
Barring a few individual measurements (see Vaquero 2007 for a review of historical sunspot observations), systematic monitoring of sunspot area started at the Royal Greenwich Observatory (RGO) in 1874. RGO recorded daily areas and positions of sunspots. In the 20th century, various observatories around the world (e.g. Kodaikanal, Pulkovo, Mt. Wilson and Kislovodsk, to name a few) initiated similar observing programs and started accumulating sunspot data. After continuing for a century, RGO stopped its campaign in 1976 and transferred the program to Debrecen observatory, where such area observations are still carried out on a daily basis. If all these available records are stitched together, the combined series covers a period of almost 150 years, which yields a data set suitable for studies of the long-term changes in solar magnetism.
Generated composites are available online at http://www2.mps.mpg.de/projects/sun-climate/data.html or at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/
Such a composite series is extremely important for multiple solar applications. For example, individual sunspot group areas are required for reconstructions of the long-term evolution of the solar surface magnetic field (e.g., Jiang et al. 2011, 2014), estimates of the solar radiative flux suppression via the Photometric Sunspot Index (PSI; Brandt et al. 1994), or assessment of the sunspot magnetic field and its long-term changes (Tlatov & Pevtsov 2014; Nagovitsyn et al. 2017), while historical solar irradiance reconstructions (e.g., Foukal & Lean 1990; Fligge et al. 2000; Krivova et al. 2007, 2010; Dasi-Espuig et al. 2014; Yeo et al. 2017) often also use the daily total areas as input. Understanding and reconstructing the past solar variability are, in turn, important for an assessment of the solar influence on Earth's climate (see, e.g., Solanki et al. 2013).
arXiv:2004.14618v1 [astro-ph.SR] 30 Apr 2020
It is, therefore, not surprising that significant effort has been made towards cross-calibrating the various individual sunspot area datasets (Nagovitsyn 1997; Fligge & Solanki 1997; Baranyi et al. 2001; Hathaway et al. 2002; Balmaceda et al. 2009; Baranyi et al. 2013). This is, however, not a trivial task. Differences in the observing facilities, seeing conditions, capturing devices, data processing techniques, etc., introduce systematic differences between the records, some of which are significant. Two of the widely used area catalogues of modern times, as produced by Hathaway et al. (2002) and Balmaceda et al. (2009), utilize a combination of area observations from RGO and SOON (Solar Optical Observing Network). However, the SOON data have several critical limitations. Sunspot area values in this catalogue are significantly (by almost 50%) underestimated compared to RGO (Fligge & Solanki 1997; Hathaway et al. 2002; Balmaceda et al. 2009). To a large extent, this is related to the fact that these data missed spots smaller than 10 µHem. As the number of small spots varies with solar activity, a single calibration factor might introduce artefacts in the derived catalogues (see Foukal 2014). Furthermore, SOON has no direct overlap with RGO. Hence, the cross-calibration has to be done indirectly, e.g. using Russian data (from the Russian books "Solnechnye Dannye") as done by Balmaceda et al. (2009), which amplifies the uncertainties further. Debrecen data, whose area measurements are found to be similar to those from RGO (Baranyi et al. 2013), have a very short (3 years) overlap with RGO.
Over the past few years, more sunspot data have become publicly available in digital form. A significant development is that all data from the Pulkovo observatory (St. Petersburg) and its Mountain station in Kislovodsk have been digitised and made public (Nagovitsyn 1997). These data are unique in the sense that (i) they cover a long period, allowing for a significant direct overlap with RGO, (ii) the smallest areas recorded in these catalogues are the same as in RGO, i.e. 1 millionth of a solar hemisphere (µHem), and (iii) earlier studies (Gnevysheva 1968; Balmaceda et al. 2009; Baranyi et al. 2013; Muñoz-Jaramillo et al. 2015) showed that their statistical properties were very similar to those of the RGO data. Also, daily sunspot observations from Kodaikanal solar observatory in India have recently been digitised and catalogued (Mandal et al. 2017). Similarly to Pulkovo and Kislovodsk, they cover a long period and have a significant overlap with RGO.
In this work, we update and extend the calibrated sunspot area series of Balmaceda et al. (2009) (hereafter BA09) by employing the additional and updated data sets. We describe the data we use in Sect. 2 and our methods in Sect. 3. In Sect. 4 we present and discuss our composite records of sunspot areas, i.e. daily corrected areas in Sect. 4.1, daily projected areas in Sect. 4.2 and individual group areas in Sect. 4.3. In Sect. 4.4, we present the calculated daily PSI values (constructed using our area composite), which are an important input to empirical irradiance models. Our conclusions are summarized in Sect. 5.

Data
In this study we use sunspot area data from a total of nine observatories. Figure 1 shows the timeline of all these data sets, while Table 1 lists the periods covered by each of them, the fractional temporal coverage and the minimum reported sunspot area. The longest record comes from the Royal Greenwich Observatory (RGO), which started observing the Sun in 1874 and continued until 1976 (Willis et al. 2013). The observations were carried out at several observatories at different locations (Royal Greenwich Observatory, England; Cape of Good Hope, South Africa; the Dehra Dun Observatory, India; the Kodaikanal Observatory, India; the Royal Alfred Observatory, Mauritius; along with contributions from the Harvard College Observatory, Melbourne Observatory, Mount Wilson Observatory and the US Naval Observatory), and then processed and combined into the final record at RGO. This allowed for an uninterrupted and consistent daily coverage over a period of about 100 years. This catalogue provides daily individual group areas as well as their heliographic positions.
The next two datasets listed in Table 1 are from Kislovodsk (1952-2018; Nagovitsyn et al. 2007) and Pulkovo (1932-1991; Mikhailov 1955) observatories. Pulkovo Observatory, originally established in 1839 with the aim of cataloguing the positions of stars, started accumulating solar images (photosphere and chromosphere) in 1932. As in the case of RGO, observations were carried out at a number of locations in the Soviet Union and then collected and processed at Pulkovo, ensuring the consistency of the final series. During the Second World War, Pulkovo observatory was severely damaged, regular observations were not possible, and the original photographic plates from the pre-war period were destroyed. In 1945, the observatory received support from the government for restoration and continuation of the observational programme. Furthermore, the construction of a new branch, the Kislovodsk mountain station, was initiated in 1948. Afterwards, both of these observatories independently recorded daily sunspot data, and their catalogues provide individual group areas and positions. It is worth mentioning here that prior to May 2011, the positional information (latitude and longitude) of each group is provided only for the day of its first appearance, while afterwards positions are available on each day throughout the entire lifetime of a group.
The Debrecen observatory has been the official continuation of the RGO programme since 1977 (Győri et al. 2011; Baranyi et al. 2016). Most of the observations are taken at Debrecen observatory and its Gyula Observing Station. However, to fill gaps in this catalogue, observations from several contributing observatories (Abastumani Astrophysical Observatory, Georgia; Ebro Observatory, Spain; Helwan Observatory, Egypt; Kanzelhöhe Solar Observatory, Austria; Kiev University Observatory, Ukraine; Kislovodsk Observing Station of Pulkovo Observatory, Russia; Kodaikanal Observatory, India and Tashkent Observatory, Uzbekistan) are also used. Recently (2016 onwards), the observatory started using calibrated SDO/HMI observations to fill the missing days in their catalogue. In order to maintain consistency and also to avoid propagation of potential uncertainties due to this additional scaling, we only used the Debrecen data between 1974 and 2015 during our cross-calibration process. We do, however, use the Debrecen data from 2016 onwards to fill the remaining gaps in our final area composite after 2016 (247 days have been filled with these data).
The newly digitized data from Kodaikanal solar observatory in India are next on our list. This set of newly digitized high-resolution white-light solar images spans more than a century (1904-2011). However, due to issues with the observing plates, the current sunspot catalogue lists data only from 1921 to 2011 (Mandal et al. 2017). This catalogue provides individual spot areas and positions; however, the spots are yet to be classified into sunspot groups.
We also use sunspot observations from SOON (Solar Optical Observing Network) (Giersch et al. 2018). This is a network of solar observatories operated by the US Air Force (USAF), which allows a continuous, 24-h monitoring of the Sun. Finally, the last three data sets are those from: (i) Rome Astronomical Observatory (Cimino 1967); (ii) Yunnan Observatory in China (Wang 1988); and (iii) Catania Astrophysical Observatory in Italy (D'Arrigo & Zappalà 1986; Zuccarello et al. 2011). All three catalogues provide group areas and positions. A summary of all the data used in this work is given in Table 1.
Among all the sources (Table 1), RGO has the longest observing period (about 100 years), the highest data coverage (99%) as well as the smallest (together with Kislovodsk, Pulkovo and Debrecen) reported spot area (1 µHem). This makes RGO the most suitable reference series against which we calibrate all other records, as was also done by other studies in the past (Hathaway et al. 2002; Balmaceda et al. 2009; Baranyi et al. 2013; Muñoz-Jaramillo et al. 2015). The remaining data sets differ in quality as well as in data coverage. The longest among the remaining records is from Kodaikanal (90 years), followed by Kislovodsk (68 years), Pulkovo (60 years), Debrecen (42 years), Rome (43 years) and SOON (36 years). Catania and Yunnan are relatively shorter records of roughly 10 years each. Besides RGO, spots as small as 1 µHem have also been reported by Kislovodsk, Pulkovo and Debrecen, whereas the smallest spots recorded by Kodaikanal, Rome, Yunnan and Catania are larger (see Table 1). Finally, as mentioned above, SOON has a significantly higher minimum area threshold of 10 µHem. Earlier studies (Gnevysheva 1968; Balmaceda et al. 2009; Baranyi et al. 2013; Muñoz-Jaramillo et al. 2015) have shown that area measurements from Kislovodsk, Pulkovo and Debrecen are of mutually similar scale (having calibration coefficients close to 1). Therefore, we divide the listed sources into two categories, primary and secondary.
The purpose behind such labelling is to prioritize the "primary" sources when creating a composite series later, while the "secondary" datasets are then used to fill the gaps which could not be covered by the primary ones. This classification is based on the following criteria. A dataset which is sufficiently long in time (3 solar cycles or more) and has comparatively few data gaps (i.e. data coverage of 80% or more) qualifies as a primary source. From Table 1 we notice that five sources, RGO, Pulkovo, Kislovodsk, Debrecen and SOON, appear to satisfy these conditions. However, the minimum sunspot area reported in the SOON database is significantly higher (10 µHem) than that of the other four observatories (1 µHem), and this can potentially affect the calibration process (see Foukal 2014). Hence, SOON, along with the four remaining observatories (Kodaikanal, Yunnan, Catania and Rome), is labelled as a secondary source.
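The selection criteria above can be written down compactly. The following Python sketch is only an illustration of the stated rules (length of at least three solar cycles, coverage of at least 80%, and a minimum reported area of 1 µHem); the `Record` container, the function name `classify`, the nominal 11-year cycle length and the SOON coverage value used in the example are assumptions made for this sketch and are not taken from Table 1.

```python
from dataclasses import dataclass

SOLAR_CYCLE_YEARS = 11.0  # assumed nominal cycle length, for illustration only


@dataclass
class Record:
    name: str
    years: float          # length of the record in years
    coverage: float       # fraction of days with data, between 0 and 1
    min_area_uhem: float  # smallest reported spot area in µHem


def classify(rec, min_cycles=3, min_coverage=0.80, max_min_area=1.0):
    """Return 'primary' or 'secondary' following the criteria in the text."""
    long_enough = rec.years >= min_cycles * SOLAR_CYCLE_YEARS
    dense_enough = rec.coverage >= min_coverage
    records_small_spots = rec.min_area_uhem <= max_min_area
    if long_enough and dense_enough and records_small_spots:
        return "primary"
    return "secondary"


# RGO values are quoted in the text; the SOON coverage is a placeholder.
rgo = Record("RGO", years=100, coverage=0.99, min_area_uhem=1.0)
soon = Record("SOON", years=36, coverage=0.90, min_area_uhem=10.0)
labels = {r.name: classify(r) for r in (rgo, soon)}
```

Here SOON passes the length and coverage tests but fails the minimum-area test, which mirrors the reasoning given above for labelling it as secondary.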

Method
The method we adopted to cross-calibrate the individual records is similar to that described by BA09. First, we identify the common observing days between any two observatories and perform the subsequent analyses over these overlapping periods only. We then plot a scatter diagram of daily area values between every given pair. Representative examples are shown in Fig. 2. We then fit a straight line forced to pass through the origin to the data:
$$y = b\,x, \tag{1}$$
as shown by the solid red lines in Fig. 2a, c, e, g. Points lying more than $3\sigma_{\mathrm{th}}$ from the previously obtained regression line are considered to be "outliers" and removed (the threshold is highlighted with the blue dotted lines in Fig. 2a, c, e, g). We also reject points close to the origin (below the line joining the points $[0, 3\sigma_{\mathrm{th}}]$ and $[3\sigma_{\mathrm{th}}, 0]$) as they may introduce a bias into the calculated slope. At this stage, we are only left with the data which satisfy all the above criteria, as shown in Fig. 2b, d, f, h. The linear regression (Eq. 1) is then applied once again to obtain a slope $b_{xy}$. Since the choice of dependent and independent variable is completely arbitrary and may have an impact on the derived slope (commonly referred to as the 'attenuation bias'; Spearman 1904), we repeat the above procedure by swapping the variables and obtain a new slope $b_{yx}$. The final calibration factor ("b"), following the 'bisector line' method (Isobe et al. 1990), is computed as the slope of the line bisecting the two regression lines with slopes $b_{xy}$ and $1/b_{yx}$:
$$b = \tan\left[\frac{1}{2}\left(\arctan b_{xy} + \arctan\frac{1}{b_{yx}}\right)\right].$$
The error associated with the final calibration factor ("b") has contributions from several sources: (i) $\sigma_{\mathrm{slope}}$: the fitting errors associated with the individual slopes $b_{xy}$ and $b_{yx}$; (ii) $\sigma_{\mathrm{diff}}$: the difference between the slopes $b_{xy}$ and $1/b_{yx}$; and (iii) $\sigma_{\mathrm{cycle}}$: effects of the time-dependent changes in the data on the final "b" factor. For a chosen pair, we compute "b" for different solar cycles separately and the standard deviation of these individually measured "b" values is taken as $\sigma_{\mathrm{cycle}}$.
Thus, the final uncertainty, rather conservatively, is calculated as
$$\sigma = \sqrt{\sigma_{\mathrm{slope}}^2 + \sigma_{\mathrm{diff}}^2 + \sigma_{\mathrm{cycle}}^2}.$$
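To make the sequence of steps concrete, the procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the code used for the catalogue: the threshold $\sigma_{\mathrm{th}}$ is assumed here to be the root-mean-square residual about the first fit, the bisector is taken as the line at the mean angle of the two regression lines, the three error terms are assumed to be combined in quadrature, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def slope_through_origin(x, y):
    """Least-squares slope of y = b * x with the intercept fixed at zero."""
    return float(np.sum(x * y) / np.sum(x * x))


def clean_pair(x, y):
    """Remove 3-sigma outliers and points close to the origin."""
    b0 = slope_through_origin(x, y)
    resid = y - b0 * x
    sigma_th = np.sqrt(np.mean(resid ** 2))  # ASSUMED definition: RMS residual
    keep = np.abs(resid) <= 3.0 * sigma_th
    # Reject points below the line joining [0, 3*sigma_th] and [3*sigma_th, 0].
    keep &= (x + y) >= 3.0 * sigma_th
    return x[keep], y[keep]


def fitted_slope(x, y):
    """Slope of y on x after outlier and near-origin rejection."""
    xc, yc = clean_pair(x, y)
    return slope_through_origin(xc, yc)


def bisector_factor(x, y):
    """Return (b, b_xy, b_yx), with b the slope of the bisector line."""
    b_xy = fitted_slope(x, y)  # y regressed on x
    b_yx = fitted_slope(y, x)  # variables swapped
    b = np.tan(0.5 * (np.arctan(b_xy) + np.arctan(1.0 / b_yx)))
    return float(b), b_xy, b_yx


def total_uncertainty(sigma_slope, sigma_diff, sigma_cycle):
    """Combine the three error terms in quadrature (assumed form)."""
    return float(np.sqrt(sigma_slope ** 2 + sigma_diff ** 2 + sigma_cycle ** 2))


# Synthetic daily areas (µHem) for two hypothetical observatories.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.0, 3000.0, 2000)
y_demo = 1.2 * x_demo + rng.normal(0.0, 50.0, x_demo.size)
b_demo, b_xy_demo, b_yx_demo = bisector_factor(x_demo, y_demo)
```

Because both slopes are positive, the bisector slope always lies between $b_{xy}$ and $1/b_{yx}$, so the spread between these two values gives a natural measure of the $\sigma_{\mathrm{diff}}$ term.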

Comparison of individual records
We apply the method described in Sect. 3 to all pairs of observatories that overlap with each other. The derived parameters, obtained using the corrected areas, are tabulated in Table 2. The table lists the individual calibration factors $b_{xy}$ and $b_{yx}$ (columns four and five) as well as the final calibration factor "b" (last column). We multiplied the areas recorded by observatories listed under 'Obs2' by "b" to match the values from those listed under 'Obs1'. With this definition, values of "b" close to 1 imply that the original area measurements obtained at the two observatories are, on average, similar to each other. Let us first discuss those cases where an observatory has a direct overlap with RGO (this is the case for Kislovodsk, Pulkovo, Rome and Kodaikanal). The "b" factor between RGO and Pulkovo, $b_{\mathrm{RGO-Pul}}$, is derived to be 1.014±0.069, in agreement with BA09's result of 1.019. Similarly, $b_{\mathrm{RGO-Kisl}}$ comes out to be 0.984±0.094, which also agrees with the value of 0.979 reported by Baranyi et al. (2013). Thus, area measurements from Pulkovo and Kislovodsk are similar to those from RGO. Furthermore, Muñoz-Jaramillo et al. (2015) found that the individual sunspot group size distributions are also similar to each other. This is important for building a composite area series. The situation is different when we compare RGO to Kodaikanal. We find $b_{\mathrm{RGO-Kodai}}$ to be 1.166±0.132, which indicates that the spot areas in the current Kodaikanal catalogue are lower (≈17%) than the RGO values. This can also be seen (at least qualitatively) in Mandal et al. (2017). Between RGO and Rome, $b_{\mathrm{RGO-Rome}}$ equals 1.091±0.036, which is similar to the value obtained by BA09.
Next, we look at those observatories whose overlap with RGO is too short (less than several years) to be useful. Their inter-calibrations are accomplished 'indirectly', i.e. by using another source which overlaps with both these observatories and RGO. Two of the longest series in this list are Debrecen and SOON. Using the overlap between Debrecen and Pulkovo, we calculate $b_{\mathrm{RGO-Deb}}$ to be 1.061±0.091, which is consistent with the factor of 1.08 reported by Baranyi et al. (2013), even though it was obtained from a shorter period of Debrecen data. For SOON, $b_{\mathrm{RGO-SOON}}$ is estimated via Kislovodsk and is found to be 1.48±0.102. Once again, it matches the values previously reported in the literature (Hathaway et al. 2002; Balmaceda et al. 2009). It is important to note that among all our data sources, SOON has the maximum area departure from RGO (almost 50%). As discussed in the introduction, there can be a number of reasons for this significant underestimation. According to Foukal (2014), it is those 'too small to draw' spots (<10 µHem) in the SOON catalogue which are mostly responsible for this deficit. However, Győri et al. (2017) argued that the omission of small spots can only account for ≈3.4% of the area deficit and the measurement procedure may be responsible for the rest. By reanalyzing a portion of SOON data, Giersch et al. (2018) concluded that the rounding errors associated with the limb-correction overlay used on the SOON drawings can lead to an underestimation of spot areas by as much as 8.5%.
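The text does not spell out how an indirect factor is formed from the two overlapping pairs. One common way, shown here purely as an assumption, is to multiply the factor linking the reference to the intermediate record by the factor linking the intermediate record to the target, and to add the relative uncertainties in quadrature. In the Python sketch below, the function name `chain_factors` is hypothetical, the RGO-Pulkovo value is the one quoted above, and the Pulkovo-Debrecen value is a placeholder, not a result from Table 2.

```python
import math


def chain_factors(b_ref_mid, s_ref_mid, b_mid_tgt, s_mid_tgt):
    """Chain two calibration factors through an intermediate record.

    ASSUMPTION: b_ref_tgt = b_ref_mid * b_mid_tgt, with relative errors
    added in quadrature. Returns (b_ref_tgt, sigma_ref_tgt).
    """
    b = b_ref_mid * b_mid_tgt
    rel = math.hypot(s_ref_mid / b_ref_mid, s_mid_tgt / b_mid_tgt)
    return b, b * rel


# b_RGO-Pul = 1.014 +/- 0.069 is quoted in the text; the second pair of
# numbers is a hypothetical Pulkovo-Debrecen factor used only as an example.
b_rgo_deb, s_rgo_deb = chain_factors(1.014, 0.069, 1.05, 0.05)
```

Since each link contributes its own error, an indirectly derived factor is necessarily less certain than either of the direct ones, which is why direct overlap with RGO is preferred wherever it exists.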
One of the main issues in calibrating areas between two observatories, is to address the temporal evolution within a dataset. These fluctuations can arise due to changes in quality of instruments or capturing devices and measuring techniques, as well as from aging due to preservation of sunspot drawings and photographic plates over a longer time. Now, any such changes in one or both series will show up as time evolution in the derived calibration factor. To see the extent of such an effect, we plot the values of "b", computed for each cycle separately, as a function of time in Fig. 3.
Variations over shorter timescales (monthly or yearly) are not considered here as they are significantly affected by uncertainties arising from insufficient statistics. The lines with different colours and symbols in Fig. 3 represent the evolution of "b" for different pairs of observatories (see the legend in the figure). Figure 3 demonstrates that "b" does vary with time for all tested combinations of observatories. However, when both data sets in question are "primary" sources (see Sect. 2 and Table 1), the variations are within the error bars. In some other cases, the calibration factor shows significantly larger variations, e.g. between SOON and Debrecen or between Kodaikanal and RGO. However, all cases with large fluctuations (e.g., where Kodaikanal or SOON data enter) involve secondary sources, which were merely meant to be used to fill occasional data gaps. This result (1) supports our choice of the primary sources and (2) justifies the use of a single "b" value for each pair of observatories (as listed in Table 2) for building the composite record.
At this stage, we are ready to generate a calibrated and homogeneous sunspot area series between 1874 and 2019. We start by using the data from the four primary observatories in our list (Table 1), i.e. RGO, Pulkovo, Kislovodsk and Debrecen. RGO, being the absolute reference (for the reasons discussed in Sect. 2), is used as it is. Next, both Kislovodsk and Pulkovo have a direct and sufficiently long overlap with RGO (which Debrecen does not have). Their "b" values are also similar (see Table 2). However, observations from Kislovodsk are considered to be better suited for the extension of RGO because of the stable background history of this catalogue (Nagovitsyn et al. 2007). The other advantage of the Kislovodsk record over Pulkovo is that it offers an additional 28 years of observations beyond 1991.
Hence, we use areas from Kislovodsk as the main record to extend our catalogue after the RGO period. The leftover missing days are first filled with areas from Pulkovo and then from Debrecen (see Fig. 4 and Table 3 for a summary of the observations constituting the final composite of daily corrected areas).
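The filling order described above amounts to a simple priority rule: for each day, take the first observatory in the list RGO, Kislovodsk, Pulkovo, Debrecen that has a measurement, and scale it to the RGO level with its "b" factor. The Python sketch below illustrates this rule with the factors quoted in the text; the data layout (one dictionary of day to area per observatory) and the function name `build_composite` are assumptions made for the example.

```python
import math

# Calibration factors relative to RGO, as quoted in the text.
FACTORS = {"RGO": 1.0, "Kislovodsk": 0.984, "Pulkovo": 1.014, "Debrecen": 1.061}
PRIORITY = ["RGO", "Kislovodsk", "Pulkovo", "Debrecen"]


def build_composite(days, records, factors=FACTORS, priority=PRIORITY):
    """Return {day: (calibrated_area, source)} using the first available record.

    `records` maps an observatory name to a dict {day: area in µHem}.
    Days without any measurement get (nan, None).
    """
    composite = {}
    for day in days:
        composite[day] = (float("nan"), None)
        for obs in priority:
            area = records.get(obs, {}).get(day)
            if area is not None and not math.isnan(area):
                composite[day] = (area * factors[obs], obs)
                break
    return composite
```

Keeping the source label next to each value makes it straightforward to reproduce a summary such as the one in Table 3, i.e. how many days each observatory contributes.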
Our final catalogue contains about 145 years of daily sunspot area values between 1874 and 2019. This catalogue is available online with this publication and at http://www2.mps.mpg.de/projects/sun-climate/data.html. The total number of missing days in this series is 776 (corresponding to 1.4% of the total coverage). We could not fill these missing days with data from any of the remaining five observatories (Kodaikanal, Rome, Yunnan, SOON, Catania) because, out of the 776 missing days, 443 days fall between 1874 and 1922, when only RGO observations are available, and 321 days fall between July 2018 and December 2019, when only observations from Kislovodsk are available (Figure 4). We note here that the cataloguing process at Kislovodsk and Debrecen for the last two years (2018 onward) is still in progress and we plan to fill these missing days as soon as the data become available. While we have compared a total of nine archives, only four of them have actually entered the final composite. We nevertheless show the results obtained from inter-comparisons of the "secondary" datasets and list their scaling coefficients in Table 2 for completeness. Panels a and b of Fig. 5 show the calibrated monthly and yearly averaged time series of corrected areas. To visualize the uncertainty, we overplot two area series generated with the two extreme limits of the errors in "b", i.e. b + σ and b − σ (from Table 2), shown as the shaded regions in Fig. 5b. As expected, the effect is prominent mostly during solar maxima, when the total spot coverage is higher. This results in a corresponding uncertainty in the cycle amplitudes over the post-RGO period, which has to be kept in mind in relevant studies.

Comparison with BA09
While we have generally followed the procedure by BA09, there are also some differences. (i) Firstly, instead of SOON we use Kislovodsk and Debrecen data for the post-RGO period. With Kislovodsk data we extend our series till 2019 (the BA09 series ended in 2009), while with Debrecen data we improve the daily data coverage by filling most of the intermittent data gaps. (ii) Secondly, all the observatories that we calibrate to RGO (Kislovodsk, Pulkovo, Debrecen) have calibration factors ("b") close to 1, whereas for the SOON data used by BA09 the value of "b" is ≈1.5. Hence, the uncertainties are expected to be lower in our catalogue. (iii) Next, since RGO and SOON do not overlap directly, BA09 employed published Russian data for their cross-calibration. We use data from Kislovodsk and Pulkovo to extend the RGO series and both of them have significant overlaps with RGO. It is worth mentioning that the 'Russian data' used by BA09 started only in 1968, whereas the updated Pulkovo catalogue which we use here goes back to 1932. This significantly increases the overlap with RGO, which again helps to minimize the uncertainties.
Let us now compare the two compilations quantitatively. Since the RGO dataset is essentially the same in both studies, we focus only on the post-RGO era, i.e. between 1977 and 2008 (when the BA09 series ended). In this period, our catalogue utilises daily data ($A_S$) from Kislovodsk, whereas BA09 used observations ($A_{09}$) from the Russian books "Solnechnye Dannye" (Period-I; 1977-1985) and SOON (Period-II; 1986-2008). The daily difference between the two composites, $\delta A = A_S - A_{09}$, is plotted in Fig. 6a. We also separately plot the histograms of δA for the two periods in Fig. 6b. BA09 noted that the Russian area measurements used in their study were systematically larger than RGO by ∼8% between 1971 and 1976. However, without being able to do a detailed analysis of the reasons for this change of the correction factor with time, they refrained from correcting it. The Kislovodsk data that we use here do not show such an offset, and thus solve this issue with the compilation of BA09. For Period-II (red dots), BA09 used data from SOON whereas our catalogue uses Kislovodsk areas throughout. For this period, we do not see any systematic drift, but rather the differences are distributed symmetrically, with most of the values (∼80%) being below 200 µHem (Fig. 6). The differences are clearly higher during higher-activity periods, when the number of spots is considerably larger. These smaller differences (|δA| ≤ 200 µHem) are rather difficult to diagnose due to various uncertainties in the area measurements as well as in the analysis procedure. But let us take a closer look at those days where the absolute difference is more than 500 µHem (although such cases are rather rare, <8%). As mentioned already, in this period BA09 used data from SOON whereas we use data from Kislovodsk. To better identify the source of the discrepancies, we also compare measurements on the same days from three other observatories: Debrecen, Kodaikanal and Rome.
A small sub-sample (showing extreme departures of δA ≥ 1000 µHem) is presented in Table 4. After comparing the area values across observatories on various days, we could identify essentially all possible scenarios, from $A_S$ or $A_{09}$ matching no other record to both of them matching only some of the records. This can also be presented quantitatively by using a set of tolerance values (which account for the possible measurement errors in the original datasets). Comparing area values in this way, we find that, for ∼50% to 70% of cases, either $A_S$ or $A_{09}$ is the single outlier. In roughly 30% to 50% of the cases, at least one other observatory provided a value similar to either $A_S$ or $A_{09}$. These results show that there is no systematic bias towards one of the data sets, e.g., Kislovodsk, Debrecen, SOON, etc. A more sophisticated and robust technique, e.g. a 'spot-to-spot' calibration, is needed to address and correct for these problems. This is beyond the scope of the current work and needs a separate study.
We have also compared the two composites with the sunspot number series. We only compare the post-RGO period. Panel 7a shows a scatter diagram between daily (monthly) sunspot number and daily (monthly) sunspot areas from this work, in black (red). The same but using the BA09 series is shown in panel 7b. The Pearson correlation coefficients ($R_c$) for the daily records are $R_{c,\mathrm{this\,work}}=0.883$ vs $R_{c,\mathrm{BA09}}=0.866$, and for the monthly data $R_{c,\mathrm{this\,work}}=0.960$ vs $R_{c,\mathrm{BA09}}=0.950$. Allowing for the non-linear relation between sunspot number and area, we also calculate the Spearman's rank correlation coefficients (ρ): for the daily records, $\rho_{\mathrm{this\,work}}=0.934$ vs $\rho_{\mathrm{BA09}}=0.934$, and for the monthly data, $\rho_{\mathrm{this\,work}}=0.971$ vs $\rho_{\mathrm{BA09}}=0.960$. We also compute the scatter (as the standard deviation, σ) in the area values within bins of 20 in the sunspot number, as well as the 90% confidence intervals of σ. Panels c and d of Fig. 7 show these results for the daily and monthly data, respectively. The scatter in the daily values (panel c) in our series is lower than that in BA09 for a significant part of the sunspot number range. The scatter in the monthly data (panel d) is comparable in the two series.
Fig. 7. a: Scatter diagram between daily (monthly) sunspot number and daily (monthly) sunspot areas from this work, in black (red). b: The same but for the areas from the BA09 series. The relevant correlation coefficients (cc) are printed in the respective panels. c and d: Binned values of scatter (σ) in sunspot areas (this work in black and BA09 series in blue) vs. sunspot number (binned values of 20) for the daily and monthly data, respectively. Error-bars represent the 90% confidence intervals of σ.
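For readers who wish to repeat this kind of comparison on their own series, the three statistics used here (Pearson correlation, Spearman rank correlation, and the scatter of areas within sunspot-number bins of width 20) can be computed as in the Python sketch below. It is an illustration only: the function names are hypothetical, ties are handled with average ranks, bins with fewer than two points are skipped, and the 90% confidence intervals of σ are not computed.

```python
import numpy as np


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(v.size, dtype=float)
    ranks[order] = np.arange(1, v.size + 1)
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def binned_scatter(sunspot_number, area, width=20.0):
    """Sample standard deviation of `area` in bins of `sunspot_number`.

    Returns {lower bin edge: sigma}; bins with fewer than 2 points are skipped.
    """
    ssn = np.asarray(sunspot_number, dtype=float)
    area = np.asarray(area, dtype=float)
    index = np.floor(ssn / width).astype(int)
    scatter = {}
    for k in np.unique(index):
        selected = area[index == k]
        if selected.size > 1:
            scatter[float(k * width)] = float(np.std(selected, ddof=1))
    return scatter
```

The rank correlation is insensitive to any monotonic non-linearity between sunspot number and area, which is why it is quoted alongside the Pearson coefficient.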

Projected areas
Studies such as irradiance reconstructions (Krivova et al. 2010; Yeo et al. 2017) use the projected area values as an input. Hence, in addition to the corrected area values, we also perform the cross-calibration with the projected areas. To achieve this, we use the same set of primary observatories as used for the corrected area composite, except Pulkovo. Pulkovo does not provide the projected areas and only lists the corrected ones (Table 1). This would not have been an issue had Pulkovo provided the times of observation, which are required to transform the corrected areas into projected ones (as the longitudes are listed in Carrington coordinates). Hence, we decided to leave out data from Pulkovo and only use data from RGO, Kislovodsk and Debrecen to generate this catalogue. The method of cross-calibration is the same as described previously and the results are plotted in Figure 8. The derived calibration factors are $b_{\mathrm{RGO-Kisl}}$=1.02±0.025 and $b_{\mathrm{Deb-Kisl}}$=1.01±0.026. A summary of the final calibrated series of daily projected areas is given in Table 5.

Individual group areas
For some applications, such as the Surface Flux Transport (SFT) models often used to reconstruct the evolution of the surface magnetic fields and irradiance (see, e.g., Jiang et al. 2011, 2014), it is important to have information on individual groups. Hence, in addition to the daily calibrated areas, we also provide the individual group areas. A direct comparison of individual sunspot groups among multiple datasets is not a trivial task. This requires not only an identification of the same group across different datasets, but also accounting for the group evolution due to the difference in observing times between observatories. This itself is a subject of a separate study and is beyond the scope of the current paper. Nonetheless, since this information is needed, we perform a simple comparison here without such a detailed study and provide a preliminary record of individual group areas. For this purpose, we use individual group areas (corrected for foreshortening) from RGO, Kislovodsk and Debrecen and only choose the biggest individual group per day in each of these three observatories. The rest of the analysis is the same as presented in Sect. 3. The results for RGO vs. Kislovodsk and Debrecen vs. Kislovodsk are shown in Fig. 9a,b and Fig. 9c,d, respectively. The derived calibration factors are $b_{\mathrm{RGO-Kisl}}$=1.031±0.056 and $b_{\mathrm{Deb-Kisl}}$=1.006±0.046. They are similar to the ones we previously obtained with the daily corrected areas (Table 2). Thus, this preliminary analysis suggests that the calibration factors listed in Table 2 are also applicable, to a first approximation, to individual group areas. Therefore, we construct the composite series of individual group areas using the corresponding "b" values from Table 2.

Photometric Sunspot Index (PSI)
In this section we present a daily Photometric Sunspot Index (PSI) series since 1874. PSI (Hudson et al. 1982; Brandt et al. 1994) is widely used in empirical irradiance reconstructions. PSI is a simple measure of the reduction in solar output due to the presence of spots on the visible solar disc. Quantitatively, the suppression of the radiative output due to a single spot is defined as:
$$\frac{\Delta S_S}{S_Q} = \frac{\mu A_S (C_S - 1)(3\mu + 2)}{2}. \qquad (5)$$
Here $A_S$ is the individual projected sunspot group area and $\mu$ is the cosine of the heliocentric angle. The quantity $(C_S - 1)$ represents the residual intensity contrast of a sunspot with respect to the quiet photosphere. Following Brandt et al. (1992, 1994) and Froehlich et al. (1994), we calculate it as $C_S - 1 = 0.2231 + 0.0244 \cdot \log(A_S)$.
The contributions from individual spots are summed up to derive the PSI as:
$$P_S = \sum_{i=1}^{n} \left(\frac{\Delta S_S}{S_Q}\right)_i, \qquad (6)$$
where $n$ is the total number of spots on the disc on a particular day. The result is expressed in units of $S_Q$, the quiet-Sun solar irradiance, which is taken as 1361 W/m$^2$ (Kopp & Lean 2011). We calculate the daily PSI series by plugging the area values from our calibrated individual group area series into Eq. 6. The monthly and yearly values are plotted in Fig. 10a and b, respectively. Shaded regions in Panel 10b highlight the upper and lower limits of $P_S$, which are generated using the two extreme limits of calibrated areas shown in Fig. 5.
[...] data were used by BA09 and the results are plotted in Fig. 11. Looking at the plot, we conclude that the differences ($\delta P = P_S - P_{S,\rm BA09}$) are small and are mostly below 1%. Differences in the derived PSI values between the two series can be due to multiple reasons. Errors in sunspot area measurements are one such source and, by definition of PSI (Eq. 5), errors in the measured spot positions (via $\mu$) also contribute. Hence, the true errors associated with individual PSI values are possibly slightly larger than our current estimate. In recent years, some studies (e.g., Foukal 2014) claimed that missing small spots in sunspot area catalogues may introduce larger uncertainties in the derived PSI values due to the different contrast of small and big spots. Now, the PSI series of BA09 between 1986 and 2008 was constructed using the SOON catalogue, which is known to have regularly missed smaller spots. A comparison of that series with our values (which include smaller spots) shows small differences below 1%. Thus, the errors in PSI introduced by the calibration of records that miss small spots (such as SOON) seem to be low. Again, a further detailed study including individual 'spot-to-spot' comparisons is necessary to confirm this conclusion.
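The PSI calculation described in this section can be sketched in a few lines. The sketch assumes the commonly used form of the single-spot deficit, $\mu A_S (C_S - 1)(3\mu + 2)/2$, with the contrast relation quoted in the text. Several details are assumptions made here for illustration only: the logarithm is taken to be base 10, the area is supplied in millionths and converted to a fraction before entering the product, and the function names are hypothetical. The sketch does not settle which area convention the catalogue uses.

```python
import math

# Quiet-Sun irradiance quoted in the text (Kopp & Lean 2011), in W/m^2.
S_Q_W_M2 = 1361.0


def spot_contrast(area_millionths):
    """Residual contrast C_S - 1 = 0.2231 + 0.0244 * log10(A_S).

    Base-10 logarithm and area in millionths are assumptions of this sketch.
    """
    if area_millionths <= 0.0:
        raise ValueError("area must be positive")
    return 0.2231 + 0.0244 * math.log10(area_millionths)


def spot_deficit(area_millionths, mu):
    """Relative irradiance deficit of one group, in units of S_Q."""
    area_fraction = area_millionths * 1e-6  # millionths -> fraction
    contrast = spot_contrast(area_millionths)
    return mu * area_fraction * contrast * (3.0 * mu + 2.0) / 2.0


def daily_psi(groups):
    """Sum the deficits of all groups observed on one day.

    groups: iterable of (area_millionths, mu) pairs.
    """
    return sum(spot_deficit(area, mu) for area, mu in groups)
```

Multiplying the returned value by `S_Q_W_M2` expresses the deficit in W/m$^2$ instead of units of $S_Q$; a day without spots gives a PSI of zero.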

Summary and Conclusion
A number of observatories around the globe carried out measurements of sunspot areas and positions over the last century. RGO, the longest sunspot area database to date, started its campaign in 1874 and, after continuing for a century, stopped it in 1976. Several other observatories from different parts of the world (e.g. Kodaikanal, Kislovodsk, Debrecen and Rome) also carried out such observing programs during the 20th century. Sunspot area datasets are invaluable historical records of solar magnetic fields and are key to understanding solar variability and its historical reconstructions. Hence, a long and consistent area series is expected to be of considerable use to the solar community. However, the area measurements in each of these datasets differ from the others and hence merging them is not a trivial task.
In this work, we have analysed and compared sunspot group areas from a total of nine observatories (RGO, Kislovodsk, Pulkovo, Debrecen, Kodaikanal, SOON, Rome, Catania, Yunnan). It turned out that data from only four observatories (RGO, Kislovodsk, Pulkovo, Debrecen) are sufficient to produce cross-calibrated, up-to-date (1874-2019) catalogues of daily total and individual group areas. The remaining gaps (776 days in total) could not be filled with data from the other archives, as the missing days lie either before 1922 or after 2016 and none of the other archives covers these periods. For completeness, we still list the derived scaling coefficients for all the data sets in Table 2, as future studies might find them useful. As in earlier studies, we found that the areas from the Kislovodsk and Pulkovo observatories are in good agreement with RGO, while also having very good temporal coverage. This is a significant advantage over previous similar studies, in which composites of total sunspot area were generated using SOON areas, which are 50% smaller than the RGO measurements. In addition, SOON does not have any direct overlap with RGO, whereas both Kislovodsk and Pulkovo, used in our series, have long overlaps. The choice of these observatories in constructing this catalogue is further justified by our analysis of the variation of the calibration factors with time. We find that our chosen observatories (RGO, Kislovodsk, Pulkovo, Debrecen) are significantly more stable than the other observatories (SOON, Kodaikanal, Yunnan, Rome). In fact, RGO and Kislovodsk alone cover 94% of the observing days between 1874 and 2019. Overall, the use of Kislovodsk (and Pulkovo) helped us to reduce the uncertainties in the generated catalogue. The gaps left by these records are filled with areas from Debrecen, whose area measurements are also similar to those of RGO (with the calibration factor "b" being 1.06).
Thus, our entire catalogue is made out of observations which are either directly from RGO or very similar to RGO in their properties. This increases the quality as well as the reliability of our catalogue. In this paper, we also compared data from Kodaikanal, SOON, Rome and Catania. Our results show that, although some of these data sets cover long periods (e.g. SOON and Kodaikanal), their area measurements are rather significantly different from RGO and, more importantly, display considerable scatter and/or trends compared with the other observatories.
We have compared our area values to the earlier version of the composite by BA09. In particular, by using the Kislovodsk data we have accounted for a systematic offset between roughly 1977 and 1985 which was present in BA09's series. This offset, already noted by BA09, was due to the use of old Russian data in their series. When compared to the sunspot number, our area values show less scatter than those of BA09. We emphasize, however, the need for an in-depth, 'spot-to-spot' calibration study to address some complicated individual cases. In addition to the corrected areas, we also provide a calibrated projected area series and a preliminary series of individual group areas. Furthermore, by using our calibrated area catalogue we have calculated the daily PSI, which is often used in irradiance reconstructions. Comparing it to the earlier PSI record from BA09, we found that the effect of 'missed small spots' (e.g. due to their usage of the SOON data) on the calculated PSI is not significant.
To take this work further, we plan to add more data sets in the future. Sunspot data from four Chinese stations (Qingdao Observing Station, Purple Mountain Astronomical Observatory, Yunnan Astronomical Observatory and Chinese Solar-Geophysical data) have recently been digitized, including the parameter extractions (Lin et al. 2019). These sets of data cover almost 90 years and will be a great source to further improve the catalogue. The other follow-up work planned in this context is to perform a 'spot-to-spot' calibration between observatories. This will essentially be a detailed comparison (of properties such as shape and size) of every sunspot that has been simultaneously recorded by multiple observatories. Such an analysis will help us to better understand the dependence of measurement errors on spot size. It could also provide insights on quantifying the time variation effects within an observatory.
All catalogues produced here are available online with this publication and at http://www2.mps.mpg.de/projects/sun-climate/data.html.

Acknowledgment
We thank the anonymous reviewer for the encouraging comments and helpful suggestions. We also thank the teams of the archives used in this study for all the work they have invested in obtaining these data and making them available to the community.