Issue 
A&A
Volume 602, June 2017



Article Number  A69  
Number of page(s)  18  
Section  The Sun  
DOI  https://doi.org/10.1051/00046361/201630045  
Published online  14 June 2017 
New reconstruction of the sunspot group numbers since 1739 using direct calibration and “backbone” methods^{⋆}
^{1} MaxPlanckInstitut für Sonnensystemforschung, JustusvonLiebigweg 3, 37077 Göttingen, Germany
email: chatzistergos@mps.mpg.de
^{2} Space Climate Research Unit, University of Oulu, 90014 Oulu, Finland
^{3} Sodankylä Geophysical Observatory, University of Oulu, 90014 Oulu, Finland
^{4} Ioffe PhysicalTechnical Institute, 194021 St. Petersburg, Russia
^{5} School of Space Research, Kyung Hee University, Yongin, 446701 Gyeonggi, Republic of Korea
Received: 10 November 2016
Accepted: 18 February 2017
Context. The group sunspot number (GSN) series constitute the longest instrumental astronomical database providing information on solar activity. This database is a compilation of observations by many individual observers, and their intercalibration has usually been performed using linear rescaling. There are multiple published series that show different longterm trends for solar activity.
Aims. We aim at producing a GSN series, with a nonlinear nonparametric calibration. The only underlying assumptions are that the differences between the various series are due to different acuity thresholds of the observers, and that the threshold of each observer remains constant throughout the observing period.
Methods. We used a daisy chain process with backbone (BB) observers and calibrated all overlapping observers to them. We performed the calibration of each individual observer with a probability distribution function (PDF) matrix constructed considering all daily values for the overlapping period with the BB. The calibration of the BBs was carried out in a similar manner. The final series was constructed by merging different BB series. We modelled the propagation of errors straightforwardly with Monte Carlo simulations. A potential bias due to the selection of BBs was investigated and the effect was shown to lie within the 1σ interval of the produced series. The exact selection of the reference period was shown to have a rather small effect on our calibration as well.
Results. The final series extends back to 1739 and includes data from 314 observers. This series suggests moderate activity during the 18th and 19th century, which is significantly lower than the high level of solar activity predicted by other recent reconstructions applying linear regressions.
Conclusions. The new series provides a robust reconstruction, based on modern and nonparametric methods, of sunspot group numbers since 1739, and it confirms the existence of the modern grand maximum of solar activity in the second half of the 20th century.
Key words: Sun: activity / sunspots / methods: statistical
Values of the group sunspot number series are only available at the CDS via anonymous ftp to cdsarc.ustrasbg.fr (130.79.128.5) or via http://cdsarc.ustrasbg.fr/vizbin/qcat?J/A+A/602/A69
© ESO, 2017
1. Introduction
Observations of sunspots on the solar disc have been performed regularly since the advent of telescopes in the early 17th century. These measurements constitute the longest ongoing observational programme in astrophysics, providing important insights into solar activity and variability on centennial timescales.
However, these observations have been carried out by different people, with different instruments, at various locations. In some cases observations were taken for a different purpose but were also later used to define sunspot numbers. The definition of a sunspot group might have changed with time, gaps exist within the series of individual observers, and the various series do not necessarily all overlap with each other. Even for the same observer, the quality of the record may vary with time owing to, for example gaining experience, ageing of the observer (e.g. deteriorating eyesight), change of instrumentation, or varying conditions at the observing location. There have been several attempts to harmonize these measurements and to produce a homogeneous composite series. The first effort was made by Rudolf Wolf from Zürich who introduced the Wolf sunspot number (WSN) in 1848 (Wolf 1850, continued and updated as the international sunspot number, ISN), given by the formula (1)where k is a weighting factor to normalize the various observers with each other, S the number of sunspots, and G the number of sunspot groups. It is important that, for the sake of homogeneity, data from only one primary observer were used for each day. If the data from the primary observer were not available for a given day, data from the secondary, tertiary, etc., observer were used, but only one observation was used per day, ignoring all other available data. The original records and notebooks of Wolf are not readily available now, implying that WSN cannot be reconstructed from scratch. This series contains annual values back to 1700, while monthly and daily values go back to 1749 and 1818, respectively. Since 1981 the WSN/ISN series has been synthesized by the Royal Observatory of Belgium (Clette et al. 2007), adapted to include all available observers for each day, rather than only the primary observer. The WSN/ISN series has been recently updated as version 2.0 by correcting for some proposed inhomogeneities (Clette et al. 2014).
More than a century after the work by Wolf, Hoyt & Schatten (1998) introduced the group sunspot number (GSN) series (HoSc98, hereafter), which is based on the number of sunspot groups only, neglects individual spots and includes data from all observers on the same day. The daily GSN is defined as (2)where k_{i} is the individual correction factor of the ith observer, G_{i} is the GSN reported by the ith observer, N is the total number of observers on the given day, and the constant 12.08 was introduced to match the average level of R_{g} to that of R_{s} over the period 1874–1976. The GSN series was designed to be more robust than WSN/ISN since it only considers sunspot groups and reduces uncertainties in the counts of individual sunspots. In addition, the GSN series includes a much greater number of raw data than WSN and is extended further back in time to 1610. An important advantage is that for the GSN series, a complete database of the raw data (published as Hoyt & Schatten 1998, and revised recently by Vaquero et al. 2016) is available, which makes it possible to reconstruct the entire series from scratch.
The homogenization and crosscalibration of the data recorded by earlier observers was always performed through a daisychaining sequence of linear scaling normalization of the various observers, using the k–factors. This means that starting with a reference observer, the k–factors are derived for overlapping observers. The latter data are in turn used as the reference for the next overlapping observers, etc. As is apparent, this leads to error accumulation in time when moving further away from the reference observer.
It has become obvious that the old series need to be revised because of the newfound data and the outdated methodology based on constant k–factors. The issue with such methods is twofold. Firstly, such methods assume that counts by two observers are proportional to each other, which is generally not correct. Secondly, the k–factors are assumed to be constant for the entire operational period of each observer, whereas in reality the acuity of the observers and sensitivity of the instruments may vary with time. A dedicated activity of the research community (Clette et al. 2014) has led to several new sunspot series discussed below.
Cliver & Ling (2016, ClLi16, hereafter) have attempted to revise the GSN series using essentially the same methodology as Hoyt & Schatten (1998). They claim, however, that the earlier part of the Royal Greenwich Observatory (RGO hereafter) data (i.e. 41 yr before 1915) might suffer from uneven quality owing to the purported learning curve process. Therefore, they corrected the GSN values over this period by normalizing them to the data by Wolfer using a second degree polynomial fit. The inhomogeneity of the early RGO data is still a matter of debate, however. Other studies did not find any extensive problem with RGO data: Sarychev & Roshchina (2009), Clette et al. (2014), and Lockwood et al. (2016b) reported as potentially problematic periods before 1880, 1900, and 1877, respectively, while data from Aparicio et al. (2014) and Carrasco et al. (2013) do not exhibit any apparent trend with respect to RGO data after ~1885 and 1890, respectively. Thus, the period of 1874–1915 used by ClLi16 to “recalibrate” the RGO dataset is not well defined. The ClLi16 series covers the period 1841–1980 and yields the highest level of sunspot activity in the mid19th century among all available reconstructions.
Svalgaard & Schatten (2016, SvSc16, hereafter) also used the method of daisychaining k–factors. But these authors introduced five key observers (called “backbones”, BB hereafter) to calibrate each overlapping secondary observer to these BBs. Thus, they seemingly reduced the number of daisychain steps because some daisychain links are moved into the BB compilation rather than being eliminated. The problem with this method is that most of the BB observers did not overlap with each other. Thus their intercalibration was performed via series extended using secondary observers with lower quality and poorer statistics. In the end, this introduces even more daisychain steps, since each BB observer is normalized to the neighbouring observer using a threestep procedure. The SvSc16 series also reduced the number of sunspot groups after 1940 by 7% to take into account the possible effect of the introduction of the Waldmeier classification of sunspot groups (Waldmeier 1939). However Lockwood et al. (2016a,c) have questioned the necessity for such a correction for the GSN. The SvSc16 series covers the period 1610–2015 and suggests a rather high level of solar activity in the 18th and, especially, 17th centuries.
All of these sunspot number series used calibration methods based on the linear scaling regression to derive constant k–factors. However, this linear k–factor method has been demonstrated to be unsuitable for such studies (Lockwood et al. 2016d; Usoskin et al. 2016a,b), leading to errors in the reconstructions that employ them.
An alternative method was proposed by Usoskin et al. (2016b, UEA16, hereafter), who calibrated each observer directly to the reference dataset, avoiding the daisy chain and error accumulation. The method is based on comparison of the active day fraction statistics of an observer with that in the reference dataset (RGO data for the period 1900–1976). The quality of each observer is characterized by the acuity observational threshold so that the observer is assumed to miss all sunspot groups that are smaller than this threshold, and to report all sunspot groups that are larger than this threshold. The acuity threshold for each observer is found by matching their active day fraction statistic with that of an artificially created reference dataset. The UEA16 series covers the period 1749–1995 and yields a moderate level of sunspot activity in the 18th and 19th centuries, lying between the HoSc98 and SvSc16 series.
Another revision of the GSN series was carried out by Lockwood et al. (2014) who corrected it for some apparent inhomogeneities. However, since this study is close to the HoSc98 series, we do not consider it separately here.
Backbones used in this study.
Thus, presently there are a number of sunspot reconstructions using different methods of calibration and yielding results that are inconsistent with each other. The most critical implication of these series is that they yield different longterm trends for the activity of the Sun (Lockwood et al. 2016b; Kopp et al. 2016). Over the 19th and 20th centuries, ClLi16 and SvSc16 show no trend, while HoSc98 and UEA16 show an increase in solar activity.
In an attempt to bridge the methodologies underlying previous studies and present more accurate error estimates, we present here a recalibration of the GSN data using an amendment of the most direct nonparametric calibration method described in Usoskin et al. (2016a). Similarly to SvSc16, we incorporate BB observers. However, the calibration of overlapping observers is performed with a nonlinear nonparametric probability distribution function (PDF) derived from sunspot group counts for days when two observers overlap. This allows us to account for the error propagation in a straightforward manner. Calibration of the different resulting BB series is achieved with daisy chaining.
The data we use are introduced in Sect. 2. The procedure, including information about all individual BB observers and their processing is described in Sect. 3. Our composite series is presented and compared with other existing series in Sect. 4, where we also discuss the stability of our method and potential problems of our series. We summarize our results in Sect. 5.
2. Data
We employ the database^{1} of the sunspot group numbers recorded by individual observers that was recently published by Vaquero et al. (2016) as an update of the HoSc98 database. Observers are uniquely identified by their identification number in the database. Here we use these identification numbers as well.
We apply the following filters to these data:

Data by Wolfer (1880–1928, id 338) were merged with those by Billwiller and Wolfer (1876–1879, id 335). The two series were combined together to a single series, since they do not directly overlap. The two series differ in that the former includes observations solely by Wolfer, while the latter includes observations made by both Wolfer and Billwiller. By merging these two series together, we can increase the length of the Wolfer series and its overlap with observations by Schmidt.

Data from Flaugergues, H., Aubenas (1794–1795, id 22) were also merged with those from Flaugergues, H., Viviers (1788–1830, id 227) using the same procedure. These two datasets were obtained by the same observer, Flaugergues, who performed the bulk of his observations in Viviers, Ardéche, but who relocated to Aubenas for a period of about two years. The dataset from Aubenas contains merely 91 observations for these two years, a period of otherwise sparse observations (we have only nine records from all other observers used here). The overlap of the observations of Flaugergues from Aubenas to other observers is less than three days and does not provide adequate statistics to properly calibrate this series. Considering that the two locations are close to each other in the south of France, we make the assumption that the observing conditions were not significantly different. This enables us to merge the two Flaugergues series. Furthermore, because of the poor overlap with other series, inclusion of these data does not affect the rest of our series.
The HoSc98, ClLi16, SvSc16, and UEA16 series were downloaded from the SILSO^{2} (Royal Observatory of Belgium) website.
3. Calibration process
3.1. Algorithm and primary observers
We have developed an automated algorithm to perform the calibration of sunspot records by individual observers which includes the following steps:

First, we selected primary BB observers who provided long andhighquality observations.

Next, we calibrated the data from all other observers, denoted as secondary observers hereafter, to the primary BB observers using periods of overlapping observations (sufficient overlap is required, see Sect. 3.3), and produced the “BB series”, which are composites of data from the BB observer and all other observers calibrated to him/her.

Individual BB series were crosscalibrated to each other, using the daisychain procedure.

Finally, the composite series of daily GSN was constructed by averaging the calibrated BB series.
The calibration was carried out using a direct nonparametric method to a single reference dataset with a straightforward propagation of errors. No regression was used and the acuity of the observers was assumed constant over their entire observing life. The method is described in detail in Sect. 3.2
The selected sequence of the primary BB observers is Kanzelhöhe, RGO, Wolfer, Schmidt, Schwabe, Flaugergues, and Horrebow (see Table 1). The BB observers were selected to be those with sufficiently long observational records of high quality. We also used Schubert, Zucconi, and Hagen as standalone BBs. Because of the lacking bridge in the data in the middle of the 18th century, we were unable to directly calibrate these three observers to a single observer acting as a BB. Thus we did this by the extended statistics of the calibrated BB series. These observers are important since they cover periods over the 18th century when no other data are available. Our reference observer is RGO (but restricted to the period between 1900–1976) and all other BB series were calibrated to the level of RGO.
All data from RGO prior to 1900 were ignored when considering the primary BB observer because of the disputed inhomogeneity, as discussed in the Introduction. We discuss the effect of this decision on our calibration in Sect. 4.2.3.
Fig. 1 Temporal coverage by the BBs used here. Solid black lines represent the primary BB observers, while grey lines depict the extension of the BBs using calibrated secondary observers. 
3.2. Secondary observers
Each BB series was also filled with all available secondary observers calibrated to the primary BB observers. As secondary observers we selected all the observers that have at least one nominal year of overlap with the primary BB observer. To avoid a distortion of statistics, each observer was included only in one BB. The assignment of observers to the BBs was made based on the length of the overlapping period and by trying to match observers with comparable quality BB observers. The only two successive BB observers whose observations do not overlap in time are Horrebow and Flaugergues. The bridging was made using Staudacher data. In this case, we chose Horrebow as the BB over Staudacher, because he observed more frequently and the data are of higher quality. Unfortunately, we were not able to go further back in time than Hagen (1739), because of the very sparse observations over this period with no observer making observations both before and after 1739 with adequate data to perform the calibration. Table 1 and Fig. 1 provide key information about the BB observers and series.
All the observers we used for various BBs are listed in Tables A.1 through A.7. Figure 2 shows the number of days within each year covered by (a) the different BB series (i.e. including both primary and secondary observers) and by (b) our final composite series. One can see that the coverage is very good after ca 1800, but very poor in 1780–1795. This poorly covered period has led to large uncertainties in the daisychain method in the 18th century.
Fig. 2 Annual coverage (number of observational days per year) by the different BB series (coloured curves in panel a)) and by our final composite series (panel b)). 
3.3. Construction of the backbone series
We started by building a direct calibration matrix (cf. Usoskin et al. 2016a) between the secondary observer to be calibrated and the primary BB observer for the days when both have observations. If, on a given day, N_{1} and N_{2} groups were recorded by the primary and secondary observers, respectively, then unity was added to the row N_{1} and column N_{2} of the matrix. In this way, the matrix was filled with all the overlapping days. Then the matrix was normalized such that each of its values were divided by the total sum over the corresponding column. Thus, we obtained a matrix of probability density functions (PDF) to find a value of G^{∗} reported by the primary observer for each day with the given value G reported by the secondary observer. This allows a direct calibration of the secondary observer to the primary observer by replacing the G value with the PDF of G^{∗}. This is the most straightforward method for calibration applied directly to the data.
However, this matrix can potentially have some gaps due to poor statistics and limited range of overlap between the observers. In such cases, we fill the gaps by fitting the statistically significant part of the matrix with a function (3)where ⟨ G^{∗} ⟩ are the mean counts of the primary observer (i.e., the mean of the PDF of each column of the matrix) for a given count of the secondary observer G, R_{0}, B, and a are constants calculated for each pair of observers individually. We used the weighted least mean squares to find the bestfit parameters. This functional shape (asymptotic exponential approach to a constant offset in the difference) was proposed by Usoskin et al. (2016a) and found suitable for this kind of dependence, using synthetic data that were based on RGO sunspot group area data.
Only those columns of the matrix that contain more that 20 overlapping days were included into the fitting procedure. If the fit deviated by more than one group from the actual mean ⟨ G^{∗} ⟩, such columns were excluded, and the fit was redone. In such cases we refilled the column matrices using a PDF derived with a bootstrap Monte Carlo (MC, hereafter) simulation. For this, we randomly selected half of the overlapping days from the two observers, reconstructed the matrix using this halfstatistics and recalculated the fit for the matrix. This process was repeated 1000 times. The result of this simulation was used as a PDF for the corresponding column in the matrix.
An example of the matrix is shown in Fig. 3a for Winkler (secondary observer, G) and RGO (primary reference observer, G^{∗}) over the period of their overlap (1900–1910 with 2480 common days). It is apparent that RGO typically reported more groups than Winkler for the same day, since most of the matrix values lie above the line expected for a perfect match between the two (black line). The matrix of the difference, G^{∗}−G versus G, is shown in Fig. 3b. The red circles with error bars represent the mean ⟨ G^{∗} ⟩ value in each G column and its (asymmetric) 1σ intervals. The green curve shows the best fit of the functional form of Eq. (3). It is obvious that the relation between G^{∗} and G is nonlinear and cannot be represented by a simple linear scaling k–factor. One can see that, because of the limited overlap, the matrix is well constructed only for G < 9. For higher values, the fit (Eq. (3)) has to be used. The full matrix with the values filled with the MC method for G > 8 is shown in Fig. 3c.
Fig. 3 Example of the construction of the calibration matrix for Winkler (secondary observer, G) to RGO (primary, G^{∗}) over 1900–1910. Panel a) shows the original distribution matrix G^{∗} vs. G: the black line has a slope of unity. Panel b) shows the difference, G^{∗}−G vs. G. Panel c) is the same as b) but the empty columns for G^{∗}> 8 have been filled with the results of the MC simulation. The red circles with error bars depict the mean G^{∗} values for each G column and their 1σ uncertainty. The yellow line shows the k–factor used in Hoyt & Schatten (1998). 
Each secondary observer was calibrated to the BB observer by replacing, from the matrix, every daily count G with the PDF of the calibrated counts G^{∗}. In this way we directly convert the observations of the secondary observer to the BB condition without making any assumption about the type of relationship (e.g. linearity) and with a straightforward error estimate.
For each BB we constructed a composite series by averaging all the PDFs of all the available observations for every day, so that again, instead of one count for each day, we get a distribution based on all available observers. This composite of averaged PDFs includes possible errors in a straightforward way.
Only observers with a sufficiently long record of relatively good quality were included into the analysis. The selection of secondary observers was made using the following criteria:

1.
The overlap with the primary BB observer should be not lessthan 20 common days of observations. This criterion was notapplied for early years (see Sect. 3.3.1).

2.
Observers with an overall record longer than 10 yr were considered only if their overlap with the primary BB observer was at least 4 yr. This is merely to make sure that longrunning observers are not calibrated with a small fraction of their observations that might not be representative.

3.
In cases in which we need to perform the fit to extrapolate to missing values in the matrix, we requested the conversion matrix for a selected observer to have sufficient data to cover at least three Gvalue bins. This is necessary since the function described by Eq. (3) has three parameters.

4.
The matrix should cover, with sufficient statistics, at least onequarter of the range of counts reported by the secondary observer.

5.
Observers were excluded from the analysis if the difference matrix (see an example in Fig. 3b) had an average offset of more than two groups for the G values from 0 to 5.

6.
Observers, whose data could not be fitted accurately enough (χ^{2} per degree of freedom <6), were also excluded.
After the calibration process of all observers, we compared each individual observer with the composite BB series they were part of. We excluded those that showed significant and systematic discrepancies. Four observers were removed as they showed such differences, namely Taipei observatory (Id 456), Lunping (Id 457), Mojica, Cochabamba, Bolivia (Id 628), and XE (Id 715). We also excluded the Locarno station (Id 614), because of the possible lack of stability after 1980 (Clette et al. 2016).
There are also some special cases, which are described below in detail.
3.3.1. Sparse data: Schwabe and earlier backbones
Because of the lack of data for the first years of the Schwabe BB, we have not applied the criterion 1 from the list above to his data. Furthermore, while constructing the calibration matrix we considered observations not only during overlapping days but also within ±1 day; if there was no direct overlap, we first checked one day earlier and then one day later, making sure that no more than one pair entered the matrix. Possible errors due to shortlived groups are negligible compared to the gain of the increased statistical sample (Willis et al. 2016; Usoskin et al. 2016b). These relieved constraints were also applied to the BBs covering earlier periods, when the statistics were poor.
3.3.2. Correcting for low quality observations: Flaugergues, Schubert, Zucconi, and Hagen backbones
For most BBs, we were able to match observers with a relatively similar quality. This was not the case for Flaugergues, though. Flaugergues’ data are very important, because they are the only record covering a relatively extended period in the early 1800s. However, the G values he reported are significantly lower than those by other observers during that period, implying that his observations are of lower quality (higher acuity observational threshold). Therefore, a calibration of all other observers, with higher quality data, directly to Flaugergues would reduce their quality while increasing the uncertainties. In order to avoid that, we made use of a corrected Flaugergues series, calibrated to the mean level of the other observers of the period. In order to make the correction, we assumed that the acuity threshold for Flaugergues is A = 100 msd, which is greater than for any other observer (Usoskin et al. 2016b). In this case the acuity threshold for Flaugergues does not even have to be the correct one, but it only should allow us to calibrate the overlapping observers without downgrading their quality. Applying the 100 msd threshold and the method described in Usoskin et al. (2016a), we obtained the following parameters for Eq. (3) for Flaugergues: a = 0.18, R = 6.94, and B = 6.03. Then other observers were calibrated to this “corrected” Flaugergues series.
The same process with the same threshold was used for the Schubert, Zucconi, and Hagen BBs.
3.4. Intercalibration of backbone series
Once the BB series were constructed and calibrated to the primary BB observer, different BB series had to be intercalibrated to each other. We used the RGO BB as the reference one, and the others were calibrated to it using a daisy chain. The calibration of the BB series was performed using a procedure similar to that for the individual observers, by constructing the crosscalibration matrix between the whole BB series this time. However in this case, we have, for each day, not a single G value but a PDF from each observer (now the entire composite BB series is considered an observer). In order to account for that, we constructed the calibration matrix using a MC simulation as described below. For each day with simultaneous observations from both “observers” (the BB series), we randomly selected G values corresponding to the PDFs and filled the matrix. This process was performed 1000 times for each day, and the final matrix was computed as the average among all the individual matrices.
Monte Carlo simulations were used to calibrate the secondary BB to the reference one accounting for the error propagation. We randomly picked a G value from the PDF for each day of the secondary BB series and obtained, from the matrix, the PDF of the G^{∗} values for the reference BB. This was repeated 1000 times and the average PDF of the G^{∗} values was considered as the calibrated PDF of the secondary BB series for that day.
The procedure is illustrated in Fig. 4, which shows the result of the calibration of the secondary Wolfer BB series to the primary RGO BB series. It is evident from the panel a) that the RGO BB G values are systematically higher than those of the Wolfer BB (the difference is positive), implying that RGO is a better observer than Wolfer. After the calibration (panel b), the two series match each other so that the mean difference is consistent with zero in the entire range of G values implying that the calibration was carried out correctly.
Fig. 4 Difference between the Wolfer and RGO backbones. Panel a) shows an uncalibrated matrix after the full MC filling; panel b) shows the same matrix after the calibration. The red circles depict the average values in every column with their 1σ uncertainty ranges. 
This procedure works well for all the BBs. However, the results for the Horrebow BB series are very uncertain. The overlap of this series with the Flaugergues BB series is short and occurs only during activity minima around 1775 and 1795, which gives merely four points (G values) to perform the fit and to extrapolate to the rest of the range of values. Since the method gives a realistic estimate of the uncertainties, this is clearly expressed in large error bars for the 18th century.
3.5. Construction of the final series
After all the BBs were calibrated to the reference RGO series, the final composite series was produced. First, for each day, all the available BB series values (in the form of a PDF) were merged into a single PDF for that day. From the daily PDFs of the calibrated G values we produced the monthly G values using a MC simulation. For this, for each day with available data within a month, we randomly selected a G value from the final daily PDF and then computed the monthly value as the arithmetic mean of these daily values. This procedure was repeated 1000 times, and the PDF of the monthly values was constructed for each month. This MC method considers all the uncertainties straightforwardly. Finally, we collected the mean and asymmetric ±1σ uncertainty level (a table is available at the CDS).
Next, the annual numbers of sunspot groups with their asymmetric ±1σ uncertainties were calculated from the monthly values in the same manner as monthly values from the daily values. The final annual series is given in Table B.1 and shown in Fig. 5. The GSN in years without reliable values are denoted by − 99.
Fig. 5 Annually averaged number of sunspot groups. This work is indicated in black with the ±1σ area shaded; HoSc98 is indicated in yellow; UEA16 is shown in blue; SvSc16 is shown in green; and ClLi16 is indicated in red. Numbers on top of the curves denote the conventional solar cycle numbering. 
Fig. 6 Differences of the annual GSN between our series and other series (as denoted in the legend). Positive values imply that our series is higher. The grey shading denotes the ±1σ range of our series. The numbers denote the conventional solar cycle numbering. 
4. Validation of the results
4.1. Comparison with other series
Other published GSN series are also shown in Fig. 5, but without the uncertainties. While all the series are dominated by the 11yr solar cycle, the centennial variability differs among different reconstructions. The ClLi16 and SvSc16 series are systematically higher than our reconstruction in the 19th and 18th centuries, while the HoSc98 series is somewhat lower. The present result is close to UEA16 and lies between the “high” and “low” models.
Figures 6 and 7 show the difference between various other series and the result presented here.
One can see that all the series agree with each other in the 20th century, except the SvSc16 series which is systematically lower than all others, although still within the error bars.
The UEA16 series is very close to our series during cycle maxima, while there are noticeable differences around the minima. The two series diverge for cycles 2 (our series is lower than UEA16), 8–9 (ours is higher), and 21–22 (ours is lower). The differences in cycles 22–23 can be explained by different observers used: while UEA16 used only RGO and Koyama over that period, we used here more than 150 observers, which allows us to estimate the activity more accurately.
During the solar cycle minima our series agrees with SvSc16, but there are distinct differences during the maxima. The SvSc16 series gives higher values over the cycles 1–5 and 8–11, while lower values are found for almost all cycles over the 20th century. These differences can be at least partly explained by the −7% ad hoc adjustment applied by SvSc16 to the data after 1940 and by the choice of Koyama as the reference observer (see also a discussion about this in Sect. 4.2).
Over the 20th century, the ClLi16 series is essentially the same as that of HoSc98, but they deviate over the 19th century so that maxima in the ClLi16 series are 3–4 groups higher than in HoSc98, and hence also than in ours. Keeping in mind that we ignored the RGO data before 1900 and used Wolfer as the reference for that period, the higher values by ClLi16 suggest a possible overcorrection of the RGO series by these authors. This is in agreement with the findings of Lockwood et al. (2016b).
In Fig. 8 we show the secular trends of different series considered here, using the nonparametric SSA (singular spectrum analysis, Vautard et al. 1992). The SSA method is based on decomposition of a time series into several components with distinct temporal behaviours. It is very convenient for the identification of longterm trends and quasiperiodic oscillations, especially in the conditions when the secular trend is subdominant with respect to the main periodicity. As the secular trend we consider the first SSA components of the SN series. We used the time window for the SSA in the range of 80–100 yr, where the result is stable. All series show that the activity level was highest in the late 20th century, corresponding to the modern grand maximum, but the relative enhancement differs among series. The greatest increase over the last 200 yr (defined as the ratio of the values in 2000 and in 1750) is observed for the HoSc98 series (≈2.6), followed by the UEA (1.9) and our final series (1.7). Finally, SvSc16 series yields 1.3. Thus, the modern grand maximum is observed in all series. According to this work, this grand maximum is weaker than that in the HoSc16 series but greater than in the SvSc16 series.
Fig. 7 Differences of the solar cycle averaged GSN between our series and other series (as denoted in the legend). Positive values imply that our series is higher. The grey shading denotes the ±1σ range of our series. 
Fig. 8 Longterm secular trend in different SN series, studied here, defined as the first SSA component. The shading represents only statistical uncertainties of the SSA method. 
4.2. Tests of stability
4.2.1. Choice of backbone observers
Fig. 9 Matrices of the G value difference between Wolf and Schmidt, where Schmidt (panel a)) and Wolf (panel b)) are selected as reference observers. 
As primary BB observers, we selected those with sufficiently long observational periods of the best quality for each epoch. This is illustrated in Fig. 9, which shows the difference matrices for Wolf and Schmidt for two cases: Schmidt is considered as the primary observer and Wolf as the secondary (panel a) and vice versa (panel b). It is apparent that Schmidt was a better quality observer and is more appropriate to be chosen as the primary BB observer. By choosing Wolf as the BB observer, we would need to degrade Schmidt and other observers.
Fig. 10 Difference between the main reconstructed series and all 50 auxiliary series produced with different backbone combinations. Annual values are shown. Grey shaded area indicates the ±1σ uncertainties of the main series. 
To test whether our final series is robust against the choice of the primary BB observers, we repeated the same analysis for different BB combinations. We used all possible combinations of highquality longlasting observers over four different intervals: (1) RGO (1900–1976), Koyama (1947–1984), Mt Wilson (1923–1958); (2) Wolfer (1880–1928), Quimby (1889–1921); (3) Schmidt (1841–1883), Spoerer (1861–1893), Weber (1859–1883), Wolf (1848–1893); (4) Schwabe (1826–1867), and Stark (1813–1836). This led to 48 alternative reconstruction series. Additionally, we constructed two more series by replacing Kanzelhöhe (1957–2010) with Cragg (1947–2009) and Locarno (1958–2010) and keeping all the other BBs as in the main series. Thus the total number of various GSN reconstructions was 50. We also included Flaugergues and Horrebow BBs in all series, but excluded the standalone BBs. The reference observer was chosen between RGO, Koyama, and Mt Wilson. Locarno has been excluded from all composites and our main series, however, we include it here as a BB to evaluate its effects on the calibration. We note that Quimby, as an individual observer, has overlap only with RGO, Wolf and Spoerer, while Stark has no overlap with any other BB observer used here. Thus, many of these auxiliary series result from disconnected BBs and are sometimes based on poor statistics. They can be used to assess uncertainties related to the BB selection, but as individual series, they are much less reliable than our main composite series. In this process, we did not exclude any other observers except those automatically rejected by the code (Sect. 3.3). The selection of observers within the BBs was performed automatically and may, of course, differ from those listed in Tables A.1–A.7.
Figures 10 and 11 show the differences between our main series and the different auxiliary series, described above. The difference is mostly within the ± 1σ interval. Moreover, if the three main BB observers, i.e. RGO, Wolfer, and Schwabe, are fixed, the differences among the reconstructed series are quite small (Fig. 11) and, thus, the choice of other BBs is not important. Using Koyama as the BB observer instead of RGO leads to systematically lower counts of sunspot groups (see blue curve in Fig. 10), but these counts are still within the 1σ error bars.
Thus, we can conclude that the method is stable regarding the exact choice of the BB observers with the potential uncertainty lying within the formal error bars.
Fig. 11 Difference between the main reconstructed series and the auxiliary series produced with different backbone combinations that include RGO, Wolfer, and Schwabe. Grey shaded area represents the ±1σ uncertainties of the main series. 
4.2.2. Shape of the matrix
The majority of the calibration matrices constructed for individual observers have a shape (see Fig. 3) similar to that expected from synthetic data with an artificial acuity threshold applied (Usoskin et al. 2016a). This implies that the quality of an observer can be adequately quantified by his/her acuity observational threshold. However, distorted behaviour was found for some observers during periods of high solar activity, so that an observer, who is “poor” (counting less groups than the reference observer) during periods of low and moderate activity, may appear to report more groups during solar activity maxima as if he/she were a better observer than the reference observer. This is caused by the low statistics and such columns in the matrix were replaced by the fit (Sect. 3.3). In the case in which this behaviour occurred over an extended region of the matrix, the observers were rejected by the code.
Fig. 12 Differences between the main annual reconstructed G series and those based on the reference RGO dataset for 1874–1976 and for 1916–1976 (blue and red, respectively). The grey shaded area depicts the ±1σ uncertainties of the main series. 
4.2.3. Quality of the RGO dataset
We also tested how crucial the choice of the exact reference period of the RGO dataset is. We repeated the same analysis, but considering the RGO dataset to start in 1874 and in 1916. Since a change of the reference period affects the statistics used for the calibration, allocation of some individual observers to specific BBs was automatically changed and was different than in Tables A.1 through A.7. Figure 12 shows the differences between the main series proposed here and these two alternative series. The result within the Kanzelhöhe BB is not affected at all, and for the rest of the BBs the difference is significantly smaller than the error bars, which are on average 0.14 and 0.10 for the annual values using RGO data for the periods of 1874–1976 and 1916–1976, respectively. At the same time, the use of the reference period shortened to after 1916 significantly decreases statistics, ignoring 42 yr of RGO data. Thus, we conclude that the present reconstruction is also robust against the choice of the reference period of the RGO dataset.
4.2.4. Other issues
Our method may suffer from an intrinsic problem related to a possible overestimate of G for periods of low activity. If a secondary “poor” observer reports no spots, the method corrects it to a finite nonzero value of G^{∗} (see e.g. Fig. 3). This is different from the linear k–factor method (e.g. SvSc16), in which zero values of a lowquality observer are always translated to zero values of the highquality reference observer.
We explicitly assume, similar to all other SN reconstructions, that the observational record of any observer is error free in the sense that they report exactly the number of sunspot groups that should be visible to them on the Sun on a given day (cf. Spearman 1904; Dudok de Wit et al. 2016). If this assumption were violated (e.g. weather or health conditions may temporarily reduce the acuity of the observer), the method would tend to slightly underestimate the reconstructed values at high activity levels, while overestimating the values at activity minima. However, at present there is no way to assess these kinds of errors and we have to rely on this assumption. We note that this also affects all other methods, including the linear k–factor.
We also assume (as is done in all other reconstructions) that the observational quality of an observer is constant in time. On the other hand, if it changed over time, especially outside the calibration period, it may introduce some additional uncertainties in the final result. However, in this work we cannot account for that and have to make the assumption on the constancy of the quality of the observer, as done by all the other reconstructions as well.
5. Summary and conclusions
We present a new reconstruction of the number of sunspot groups since 1739, along with realistic uncertainties, with daily, monthly, and annual time resolutions. The reconstruction is based on the daisychain normalization of individual observers via socalled “backbones” built up on the records of the key observers of different epochs. In contrast to most of the previous works, based on a simple linear k–factor scaling (e.g. Hoyt & Schatten 1998; Clette et al. 2014; Svalgaard & Schatten 2016), our reconstruction employs a direct nonparametric calibration of observers by linking the values during days of simultaneous observations (Usoskin et al. 2016a). This method is based on the assumption that the quality of the data of the various observers is maintained throughout their observing period, which may not be well validated (Lockwood et al. 2016b). This will be studied elsewhere. We also assume, as all other methods do, that daily records of each observer are error free. A further assumption is that the main differences between the observers is due to their different observing capabilities. This assumption is used merely to extrapolate for the values that are missing from the overlapping period. Thus this method works with a minimum number of assumptions and allows for a direct comparison of two observers with different observational skills. Uncertainties of the reconstruction were assessed using a Monte Carlo method applied to the derived PDFs. This approach accounts naturally for the error propagation without making additional assumptions (e.g. about the normality and independence of errors). In other words, we present a highly advanced daisychain reconstruction of GSN based on the most direct calibration of observers.
We tested the sensitivity of the method to the choice of the BB observers and of the reference period. We found that the reconstruction was robust and the result remained within the provided uncertainties.
The new series has been compared with other published GSN reconstructions, i.e. HoSc98, ClLi16, SvSc16, and UEA16. The new series lies close to UEA16, but is slightly higher than that in the 18th century. In contrast, it is systematically lower than ClLi16 in the 19th century and lower than SvSc16 in the 18th century. The latter two series are based on the k–factor scaling, which is shown to overestimate solar activity during solar cycle maxima (Lockwood et al. 2016d; Usoskin et al. 2016a,b). The new series confirms the existence of the modern grand maximum of activity in the second half of the 20th century, when sunspot cycles were significantly higher than during the 19th and 18th centuries.
The new GSN series provides a robust reconstruction of solar activity (the number of sunspot groups) with a realistic estimate of uncertainties and forms a basis for further investigation of centennial variability of solar activity over the last 270 yr.
Available at http://haso.unex.es/?q=content/data
Acknowledgments
We thank the anonymous referee for useful comments. T.C. acknowledges a postgraduate fellowship of the International Max Planck Research School on Physical Processes in the Solar System and Beyond. This work was performed in the framework of the ReSoLVE Center of Excellence (the Academy of Finland, project No. 272157). This work was partly supported by the BK21 plus programme through the National Research Foundation (NRF) funded by the Ministry of Education of Korea.
References
 Aparicio, A. J. P., Vaquero, J. M., Carrasco, V. M. S., & Gallego, M. C. 2014, Sol. Phys., 289, 4335 [NASA ADS] [CrossRef] [Google Scholar]
 Carrasco, V. M. S., Vaquero, J. M., Gallego, M. C., & Trigo, R. M. 2013, New Astron., 25, 95 [NASA ADS] [CrossRef] [Google Scholar]
 Clette, F., Berghmans, D., Vanlommel, P., et al. 2007, Adv. Space Res., 40, 919 [NASA ADS] [CrossRef] [Google Scholar]
 Clette, F., Svalgaard, L., Vaquero, J. M., & Cliver, E. W. 2014, Space Sci. Rev., 186, 35 [NASA ADS] [CrossRef] [Google Scholar]
 Clette, F., Lefèvre, L., Cagnotti, M., Cortesi, S., & Bulling, A. 2016, Sol. Phys., 291, 2733 [Google Scholar]
 Cliver, E. W., & Ling, A. G. 2016, Sol. Phys., 291, 2763 [NASA ADS] [CrossRef] [Google Scholar]
 Clette, F., Lefèvre, L., Cagnotti, M., Cortesi, S., & Bulling, A. 2016, Sol. Phys., 291, 2733 [Google Scholar]
 Dudok de Wit, T., Lefèvre, L., & Clette, F. 2016, Sol. Phys., 291, 2709 [NASA ADS] [CrossRef] [Google Scholar]
 Hoyt, D. V., & Schatten, K. H. 1998, Sol. Phys., 179, 189 [NASA ADS] [CrossRef] [Google Scholar]
 Kopp, G., Krivova, N., Wu, C. J., & Lean, J. 2016, Sol. Phys., 291, 2951 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Lockwood, M., Owens, M. J., & Barnard, L. 2014, J. Geophys. Res. (Space Phys.), 119, 5172 [Google Scholar]
 Lockwood, M., Owens, M. J., & Barnard, L. 2016a, Sol. Phys., 291, 2843 [NASA ADS] [CrossRef] [Google Scholar]
 Lockwood, M., Owens, M. J., Barnard, L., & Usoskin, I. G. 2016b, ApJ, 824, 54 [Google Scholar]
 Lockwood, M., Owens, M. J., Barnard, L., & Usoskin, I. G. 2016c, Sol. Phys., 291, 2829 [Google Scholar]
 Lockwood, M., Scott, C. J., Owens, M. J., Barnard, L., & Willis, D. M. 2016d, Sol. Phys., 291, 2785 [NASA ADS] [CrossRef] [Google Scholar]
 Sarychev, A. P., & Roshchina, E. M. 2009, Solar System Research, 43, 151 [NASA ADS] [CrossRef] [Google Scholar]
 Spearman, C. 1904, American Journal of Psychology, 15, 88 [Google Scholar]
 Svalgaard, L., & Schatten, K. H. 2016, Sol. Phys., 291, 2653 [Google Scholar]
 Usoskin, I. G., Kovaltsov, G. A., & Chatzistergos, T. 2016a, Sol. Phys., 291, 3793 [NASA ADS] [CrossRef] [Google Scholar]
 Usoskin, I. G., Kovaltsov, G. A., Lockwood, M., et al. 2016b, Sol. Phys., 291, 2685 [Google Scholar]
 Vaquero, J. M., Svalgaard, L., Carrasco, V. M. S., et al. 2016, Sol. Phys., 291, 3061 [Google Scholar]
 Vautard, R., Yiou, P., & Ghil, M. 1992, Physica D Nonlinear Phenomena, 58, 95 [NASA ADS] [CrossRef] [Google Scholar]
 Waldmeier, M. 1939, Astronomische Mitteilungen der Eidgenössischen Sternwarte Zurich, 14, 439 [NASA ADS] [Google Scholar]
 Willis, D. M., Wild, M. N., & Warburton, J. S. 2016, Sol. Phys., 291, 2519 [NASA ADS] [CrossRef] [Google Scholar]
 Wolf, R. 1850, Astronomische Mitteilungen der Eidgenössischen Sternwarte Zurich, 1, 3 [NASA ADS] [Google Scholar]
Appendix A: List of observers
In this section we list all observers that were used in each BB series. The tables contain information on the Id of the observer in the Vaquero et al. (2016) database, the name of the observer, the first year of observations employed here, the last year of observations employed here, the number of daily observations N_{d} used, and the number of overlap days of observations with the BB observer M_{d} (for Schwabe, Flaugergues, and Horrebow BBs, the values for ±1 days are also given). The BB observer is listed first and the others are sorted based on their Id.
List of observers used for the RGO backbone.
Appendix B: Additional table
Annual values of the proposed GSN series with the asymmetric 1σ intervals.
All Tables
All Figures
Fig. 1 Temporal coverage by the BBs used here. Solid black lines represent the primary BB observers, while grey lines depict the extension of the BBs using calibrated secondary observers. 

In the text 
Fig. 2 Annual coverage (number of observational days per year) by the different BB series (coloured curves in panel a)) and by our final composite series (panel b)). 

In the text 
Fig. 3 Example of the construction of the calibration matrix for Winkler (secondary observer, G) to RGO (primary, G^{∗}) over 1900–1910. Panel a) shows the original distribution matrix G^{∗} vs. G: the black line has a slope of unity. Panel b) shows the difference, G^{∗}−G vs. G. Panel c) is the same as b) but the empty columns for G^{∗}> 8 have been filled with the results of the MC simulation. The red circles with error bars depict the mean G^{∗} values for each G column and their 1σ uncertainty. The yellow line shows the k–factor used in Hoyt & Schatten (1998). 

In the text 
Fig. 4 Difference between the Wolfer and RGO backbones. Panel a) shows an uncalibrated matrix after the full MC filling; panel b) shows the same matrix after the calibration. The red circles depict the average values in every column with their 1σ uncertainty ranges. 

In the text 
Fig. 5 Annually averaged number of sunspot groups. This work is indicated in black with the ±1σ area shaded; HoSc98 is indicated in yellow; UEA16 is shown in blue; SvSc16 is shown in green; and ClLi16 is indicated in red. Numbers on top of the curves denote the conventional solar cycle numbering. 

In the text 
Fig. 6 Differences of the annual GSN between our series and other series (as denoted in the legend). Positive values imply that our series is higher. The grey shading denotes the ±1σ range of our series. The numbers denote the conventional solar cycle numbering. 

In the text 
Fig. 7 Differences of the solar cycle averaged GSN between our series and other series (as denoted in the legend). Positive values imply that our series is higher. The grey shading denotes the ±1σ range of our series. 

In the text 
Fig. 8 Longterm secular trend in different SN series, studied here, defined as the first SSA component. The shading represents only statistical uncertainties of the SSA method. 

In the text 
Fig. 9 Matrices of the G value difference between Wolf and Schmidt, where Schmidt (panel a)) and Wolf (panel b)) are selected as reference observers. 

In the text 
Fig. 10 Difference between the main reconstructed series and all 50 auxiliary series produced with different backbone combinations. Annual values are shown. Grey shaded area indicates the ±1σ uncertainties of the main series. 

In the text 
Fig. 11 Difference between the main reconstructed series and the auxiliary series produced with different backbone combinations that include RGO, Wolfer, and Schwabe. Grey shaded area represents the ±1σ uncertainties of the main series. 

In the text 
Fig. 12 Differences between the main annual reconstructed G series and those based on the reference RGO dataset for 1874–1976 and for 1916–1976 (blue and red, respectively). The grey shaded area depicts the ±1σ uncertainties of the main series. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.