Scrutinising the relationship between plage areas and sunspot areas and numbers

Context. Studies and reconstructions of past solar activity require data on all magnetic regions on the surface of the Sun (i


Introduction
Sunspots and faculae are the most prominent manifestations of solar surface magnetism (Solanki et al. 2006). Sunspots are relatively dark and cool areas on the surface of the Sun, whereas faculae are comparatively small and patchy bright regions usually seen in the vicinity of sunspots and in remnants of active magnetic regions. Faculae were originally discovered in full-disc white light images, where they are seen mostly near the limb. The co-spatial chromospheric features observed in the Ca ii K spectral range, called plage, are bright and easily observable over the whole solar disc (Solanki 1993). The connection of facular regions to strong magnetic fields was first pointed out by Babcock & Babcock (1955). Being a manifestation of the same physical process, the concentration of magnetic field at the solar surface, sunspots and faculae are closely associated (Solanki 2003). However, their evolution differs. While individual facular elements generally do not live as long as sunspots, within a given active region facular regions can last much longer than sunspots, partly because sunspot decay products form new facular elements (e.g. Wang et al. 2012).
Send offprint requests to: Theodosios Chatzistergos, e-mail: chatzistergos@mps.mpg.de
Article published by EDP Sciences, to be cited as https://doi.org/10.1051/0004-6361/202244913
Table 1. Previous studies of the relationship between sunspot and facular measurements.
Study | Plage data | Sunspot data | Period | Relation (annual, monthly, daily)
Ca ii K plage areas and sunspot number series
Dorotovič et al. (2010) | Co | ISNv1 | 1996–2006 | quadratic (a)
Foukal (1996) | MW | ISNv1 | 1915–1984 | linear
[...] | MW (b) | ISNv1 | 1944–1954 | quadratic
Kuriyan et al. (1982) | Ko | ISNv1 | [...] | [...]
Yeo et al. (2020) | SP (d) | CEA17, HoSc98 | 1976–2017 | power law (e)
Ca ii K plage areas and sunspot area series
Foukal & Vernazza (1979) | SGD | SGD | 1969–1974 | (∼0.5685) (f)
Schatten et al. (1985) | SGD | SGD | 1969–1982 | (∼18–40)
Lawrence (1987b) | SGD | SGD | 1974–1985 | (∼5–30)
Foukal (1993) | SGD | RGO | 1954–1987 | linear
Foukal (1996) | MW | RGO | 1915–1984 | linear
Foukal (1998 [...]
Chatzistergos et al. (2020b) | Ky (k) | BEA09 (j) | 1928–1969 | (∼13–20)
Magnetogram facular areas and sunspot area/number series
Solanki & Unruh (2013) | KP (l) | KP areas | 1974–2002 | quadratic
Shapiro et al. (2014) | KP (l) | KP areas | 1974–2002 | quadratic
Borgniet et al. (2015) | MDI (m) | SOON (n) areas | 1996–2007 | (∼10–160)
Criscuoli (2016) | MDI (m) | ISNv2 | 1996–2011 | linear; quadratic
Notes. Columns are: Bibliography entry; Type of plage and sunspot data used; Period covered by the analysed data; Form of the derived relationship (considering sunspot data as a function of plage areas) for annual, monthly, and daily data (values in parentheses are reported values for the ratio of plage to sunspot area). The column corresponding to monthly values is for periods intermediate to annual and daily. Only Tlatov et al. (2009), Bertello et al. (2016), and Singh et al. (2021) used monthly values; Chapman et al. (1997, 2001) and Chapman et al. (2011) used 100-day bins, Shapiro et al. (2014) 58-day bins, and Criscuoli (2016) and Priyal et al. (2017) 6-month averages. See Tables 2 and 3, and the footnotes in this table for information on the abbreviations. (a) Not explicitly stated by the authors, but can be inferred from Fig. 6A in Dorotovič et al. (2010). (b) The MW plage area series used in this study was produced by Foukal (1996). (c) The Ko plage area series used in this study was produced by Tlatov et al. (2009). (d) This is the Bertello et al. (2016) disc-integrated 1 Å Ca ii K composite index with SP and Synoptic Optical Long-term Investigations of the Sun Integrated Sunlight Spectrometer data. The SP data are not the full-disc spectroheliograms used in our study. (e) This is a convolution of a power-law relation with a finite impulse response filter. (f) This was roughly estimated from Fig. 4 in Foukal & Vernazza (1979), who argued that there is large scatter in the data rendering sunspot areas poor indicators of plage areas. (g) The Ko plage area series used in this study was produced by Chatterjee et al. (2016).
Notes. (a) The series is available at http://www2.mps.mpg.de/projects/sun-climate/data.html (b) Available at https://wwwbis.sidc.be/silso/datafiles
Sunspots have been observed on the solar disc since antiquity (Vaquero & Vázquez 2009; Arlt & Vaquero 2020). However, regular monitoring of sunspots began only with the advent of the telescope in the early 17th century. Later, Wolf (1850) started compiling sunspot measurements by various observers, and created the first sunspot number series, the Wolf number, later renamed the international sunspot number series (ISNv1), which continues to be produced to this day (Clette et al. 2007). This series goes back to 1818 with daily cadence, while annual values are available back to 1700. More recently, Hoyt & Schatten (1998) introduced a different measure of solar activity based on sunspot counts, namely the number of sunspot groups or group sunspot number (GSN). This enabled the early sunspot observations performed between 1609 and 1700 to be exploited as well. In the last few years, the database of sunspot data has been scrutinised and updated to include more data (e.g. Vaquero et al. 2016; Carrasco et al. 2018, 2020, 2021; Hayakawa et al. 2020a,b, 2021; Vokhmyanin et al. 2020). The method of how to include individual sunspot series has also been a matter of debate (Lockwood et al. 2016; Usoskin et al. 2016a).
This led to the release of a new version of the international sunspot number, ISNv2 (Clette & Lefèvre 2016), and a number of alternative GSN series (e.g. Lockwood et al. 2014; Cliver & Ling 2016; Svalgaard & Schatten 2016; Usoskin et al. 2016b, 2021; Chatzistergos et al. 2017; Willamo et al. 2017). In addition to sunspot and sunspot group numbers, the areas of sunspots have also been recorded since the early telescopic observations (Arlt et al. 2013; Arlt & Vaquero 2020), albeit considerably less systematically than sunspot numbers, which makes cross-calibration of the individual records extremely challenging. Therefore, the earliest sunspot area measurements typically employed in solar activity and irradiance studies are those from the Royal Greenwich Observatory (RGO) dating back to 1874.
Observations of bright faculae have also been reported since the advent of telescopes (Vaquero & Vázquez 2009). However, due to their low contrast when observed in the continuum, observations of facular regions had long been rather episodic, with regular observations over a long time span potentially carrying information on such regions going back to the late 19th century. More detailed information on facular regions is provided by Ca ii K observations, which have been performed at many places around the globe since 1892 (see Chatzistergos 2017; Chatzistergos et al. 2020c, and references therein), and thus provide a very good temporal coverage of the entire 20th century. These observations sample the lower chromosphere where faculae are seen as bright plage regions. Thus, an accurate measurement of plage properties in these observations is of primary interest for reconstructions of past solar magnetism (Chatzistergos et al. 2019d).
Various studies require information about both sunspots and faculae, such as irradiance reconstructions (e.g. Foukal & Lean 1990; Krivova et al. 2010; Dasi-Espuig et al. 2016; Wu et al. 2018; Lean 2018), reconstructions of the long-term evolution of the solar magnetic field (e.g. Solanki et al. 2000, 2002; Cameron et al. 2010; Jiang et al. 2011, 2013; Nagovitsyn et al. 2016; Bhowmik & Nandy 2018; Krivova et al. 2021), or analyses of large-scale flow patterns (Rincon & Rieutord 2018, and references therein). However, even if the Ca ii K data are used directly, they cover only about 130 years, compared with 400 years of sunspot number data. The Ca ii K data series are too short for many purposes. For example, to get a good idea of solar influence on climate, it is important to have solar irradiance reconstructions going back to the Maunder minimum, that is until at least 1700 CE. To do this, it is necessary to reconstruct the plage area starting from sunspot data. A first step in that direction is to obtain a reliable relationship between faculae and sunspots spanning more than the last couple of sunspot cycles.
Multiple studies have endeavoured to determine the relationship between faculae and sunspots (e.g. Foukal & Vernazza 1979; Foukal 1996, 1998; Kuriyan et al. 1982; Lawrence 1987b; Chapman et al. 1997; Tlatov et al. 2009; Solanki & Unruh 2013; Shapiro et al. 2014; Bertello et al. 2016; Criscuoli 2016; Yeo et al. 2020; de Paula & Curto 2020; Berrilli et al. 2020; Nèmec et al. 2022) using different data and methodologies. Table 1 summarises the findings of some previous works, emphasising the analyses that used Ca ii K plage areas as the facular data. The majority of the previous studies supported a linear and a quadratic relation (considering sunspot data as a function of plage areas) when annual and daily areas were considered, respectively. Lawrence (1987a,b) also found that the ratio of the annual plage to sunspot areas varies over the solar cycle (SC), being lowest during SC maxima, hence hinting at a non-linear relation. Furthermore, Lawrence (1987b) and Bertello et al. (2016) reported that the relation depends on the SC strength. Most of these studies used plage areas derived from uncalibrated historical Ca ii K data (except for Chapman et al. 1997, 2001, 2011, who used CCD-based data) and mainly either data from a single archive or simply appending data from different observatories without cross-calibrating them, for example in the Solar Geophysical data series (SGD, hereafter) (see Sect. 2). At the same time, relations derived from historical archives carry significant uncertainties due to intrinsic inconsistencies of the various datasets and their processing (see Sect. 3.3). Therefore, any study based on these data (e.g. reconstruction of solar irradiance variations) is also affected by the intrinsic inconsistency of the various archives. These limitations can be overcome by exploiting the results of the analysis of Ca ii K observations by Chatzistergos et al. (2018b, 2019b, 2020c). Here we use the consistent record of plage areas derived by Chatzistergos et al. (2020c) from the analysis of 38 Ca ii K archives together with several available series of sunspot observations to study the relationship between plage areas and sunspot data. We examine the functional form of the relationship for various sunspot datasets, and also analyse the dependence of this relationship on the activity level and the bandwidth of the Ca ii K observations.
Fig. 1. Probability distribution functions (PDFs) for plage fractional areas as a function of the sunspot fractional areas from MEA20. Both plage and spot areas were taken on the same days. The PDFs are shown in bins of 0.005 and 0.0005 fractional areas for plage and sunspots, respectively. Each panel shows the PDF matrix for a given Ca ii K archive: Co (a), Ko (b), MD (c), MW (d), RP (e), SF2 (f), SP (g), and the CEA20 plage area composite (h). The PDFs are colour-coded between white for 0 and bright blue for 1. Circles denote the average plage area value within each column. For columns with less than 20 days of data these circles are shown in black, otherwise in red. Also shown is the asymmetric 1σ interval for each column. Two different fits to the average values are overplotted: a power-law fit (black) and a square-root function (green). The number of days included in each column is also shown as a solid black histogram (see right-hand axis). The period of overlap between the two archives shown in a given panel (expressed in solar cycles), as well as the total number of overlapping days used to construct each matrix (N) and the total number of days within this time interval are listed in the top part of each panel.
Fig. 2. Scatter plots between annual plage fractional areas and the annual MEA20 sunspot fractional areas (left columns) and the annual CEA17 GSN series (right columns). Two different fits are overplotted, a power-law (black) and a linear (yellow) fit. The dotted line has a slope of 17, which corresponds to the mean ratio between the CEA20 plage area series and the MEA19 sunspot area series (see Sect. 3.1). The period of overlap between the two archives (expressed in solar cycles) as well as the slope of the linear fit are shown in each panel.
In Sect. 2 we give an overview of the various full-disc Ca ii K observations, and the published plage and sunspot time series considered in this study. In Sect. 3 we study the relationship between the plage areas and the various sunspot records. We then use these relationships to reconstruct plage time series based on sunspot observations, which we test by comparing with the actual plage datasets. We also discuss our findings and compare them to results presented in the literature. Finally, we summarise our results and draw our conclusions in Sect. 4.
Fig. 3. Same as Fig. 1, but for the CEA17 GSN series. A linear fit (orange) and a power-law fit (black dashed) were performed to the mean values of each column. The slope of the linear fit is also listed in each panel. The black dotted line has a slope of 0.005.

Ca ii K plage series
We use the time series of projected plage areas produced and made available by Chatzistergos et al. (2020c).
These plage area series were produced after accurately processing over 290,000 images from 43 historical and modern Ca ii K archives (Chatzistergos et al. 2016, 2018a, 2019a,b,c, 2020a) spanning the period 1892–2019. A composite of plage areas was produced by combining results from 38 archives (Chatzistergos et al. 2020c,d, hereafter CEA20) covering the period 1892–2019. The composite includes both projected areas and areas corrected for foreshortening; however, here we used only the projected areas. The five archives considered by Chatzistergos et al. (2020c), but finally not used for the composite, included observations taken off-band. While we actually considered all individual series from Chatzistergos et al. (2020c), some of them are quite short and their analysis does not fortify the results or conclusions presented in the following. Therefore, for the sake of simplicity and brevity, we do not show the results for these series here. In the following we present results only for the 15 archives summarised in Table 2, where we also give the acronyms used for them in this paper. We refer the reader to Table 1 in Chatzistergos et al. (2020c) for the main characteristics of these archives. One exception is the plage area record from Meudon (Malherbe et al. 2022). Instead of using the series by Chatzistergos et al. (2020c), we produced a new plage area series by applying exactly the same process as Chatzistergos et al. (2020c). This was done to include newly digitised data covering mainly the period 1948–1961. The new dataset includes 5865 new images, which were missing in the series produced by Chatzistergos et al. (2020c).
Fig. 4. Upper panel: CEA20 plage area composite (black line) along with scaled sunspot series by MEA20 (yellow asterisks), CEA17 (blue circles), ISNv2 (red triangles), and SvSc16 (green squares). All series are scaled linearly to the plage areas except MEA20, which is scaled with a square-root function. The solid lines are annual median values, while the grey shaded surface is the asymmetric 1σ interval. Lower panel: Ratio of the CEA20 plage areas composite to the sunspot area series by MEA20 (black) and sunspot number series by CEA17 (blue), SvSc16 (green), and ISNv2 (red) as a function of time. The ratios to the GSN series have been multiplied by 2000 to allow a direct comparison with the ratios to the sunspot areas. The blue shaded surface gives the range of annual values for the plage areas to sunspot number ratios from all sunspot number series used in this study. The ratios are calculated only for the days on which sunspot areas are greater than 0.0005 of the disc area or the sunspot number is greater than 0. The yellow (light blue) horizontal line indicates the mean ratio of plage areas to sunspot numbers (sunspot areas), while the purple horizontal line is the mean ratio for the plage areas to sunspot number series over four-year intervals around cycle maxima. The numbers below the curves in each panel denote the conventional SC numbering and are placed at times of SC maxima.
Details of the processing procedure and the derivation of plage areas can be found in Chatzistergos et al. (2018b, 2019b, 2020c). Briefly, images from the photographic archives were photometrically calibrated and compensated for limb-darkening as described by Chatzistergos et al. (2018b). Images taken with a CCD were also processed to compensate for the limb-darkening with the same method. All images were segmented to identify plage areas with a multiplicative factor to the standard deviation of the quiet Sun intensity values (Chatzistergos et al. 2019b). Chatzistergos et al. (2018b, 2019b) showed that the developed method significantly reduces errors in the estimates of plage areas from the various historical Ca ii K archives. We also considered the previously published Ca ii K plage area series by Kuriyan et al. (1983), Foukal (1996, 2002), among others. The processing and data selection vary considerably among the above published series (see Chatzistergos et al. 2019b, for more details). For example, the series by Kuriyan et al. (1983) was created by manually selecting the plage regions from the actual Ko photographs. Similarly, the SGD series includes areas determined manually from the physical photographs of the MM (06/1942–09/1979), MW (10/1979–09/1981), and BB (10/1981–11/1987) observatories. Foukal (1996) manually selected plage areas from an earlier version of the MW dataset digitised with an 8-bit device. Foukal (2002) [...] (2002), for which 12-month running mean values are provided. In this work we use the series by Bertello et al. (2010) after applying a linear scaling to match our derived plage area series from the 16-bit MW data. We note that all these series include areas corrected for foreshortening in fractions of the hemisphere, while the ones by Chatterjee et al. (2016), Priyal et al. (2017), and Singh et al. (2018) include projected areas.
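The segmentation step mentioned above (a threshold set by a multiplicative factor applied to the standard deviation of quiet-Sun intensity values) can be illustrated with a minimal sketch. This is not the procedure of Chatzistergos et al. (2019b): the function name, the use of the quiet-Sun mean as the baseline, the way quiet-Sun pixels are supplied, and the value of `factor` are all assumptions made for illustration.

```python
import statistics

def plage_fraction(disc_contrast, quiet_sun_contrast, factor):
    """Fraction of on-disc pixels flagged as plage.

    disc_contrast: contrast values of all on-disc pixels (flat list).
    quiet_sun_contrast: contrast values of pixels assumed to be quiet Sun.
    factor: hypothetical multiplicative factor applied to the quiet-Sun
            standard deviation (its actual value is not given in the text).
    """
    mean_qs = statistics.fmean(quiet_sun_contrast)
    sigma_qs = statistics.pstdev(quiet_sun_contrast)
    # Assumed thresholding rule: quiet-Sun mean plus factor times sigma.
    threshold = mean_qs + factor * sigma_qs
    n_plage = sum(1 for c in disc_contrast if c > threshold)
    # The pixel fraction corresponds to a projected area in disc fractions.
    return n_plage / len(disc_contrast)
```

Counting pixels directly gives projected areas in fractions of the disc, which is the quantity used in this study; a correction for foreshortening would need the position of each pixel on the disc and is not sketched here.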

Sunspot series
For our study we used the sunspot area record compiled by Mandal et al. (2020b, hereafter MEA20). We used the projected areas from this dataset as we did for the plage areas. This series is a composite of the RGO (1874–1976), Debrecen (1976–2019), and Kislovodsk (1977–2019) sunspot area records.
We also used the various available sunspot number and group number series. These series and their acronyms used here are summarised in Table 3. Briefly, ISNv1, ISNv2, and SvSc16 used simple linear scaling to calibrate the records by the various observers. CEA17 used a non-linear and non-parametric approach to cross-calibrate the counts by different observers, thus taking their diverse observing capabilities into account.
2 Available at https://www.ngdc.noaa.gov/stp/solar/calciumplages.html
We note that the values from ISNv1 and ISNv2 were divided by 12.08 and 20.13 (12.08/0.6), respectively, to bring them to the level of the GSN series. Owing to the corrections introduced to ISNv1 to produce ISNv2 (see e.g. Clette & Lefèvre 2018), this scaling results in a small discontinuity between ISNv1 and ISNv2 over 1947 (Clette & Lefèvre 2016), with ISNv2 being consistently lower than ISNv1 afterwards. We further note that the series by SvSc16 has annual cadence, while the other series have daily values.
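The scaling of the international sunspot number to the level of the GSN series described above amounts to a division by a constant. A minimal sketch, using the two factors quoted in the text (the function and constant names are illustrative, not part of any published code):

```python
# Scaling factors quoted in the text; names are illustrative.
ISN_V1_TO_GSN = 12.08
ISN_V2_TO_GSN = 12.08 / 0.6  # approximately 20.13

def isn_to_gsn_level(isn_values, version):
    """Divide international sunspot numbers by the factor for the given version."""
    if version == 1:
        factor = ISN_V1_TO_GSN
    elif version == 2:
        factor = ISN_V2_TO_GSN
    else:
        raise ValueError("version must be 1 or 2")
    return [value / factor for value in isn_values]
```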

Relationship between plage and sunspot series
We first analysed the relationship between the daily plage areas derived from the analysis of each individual Ca ii K archive and the sunspot area series from MEA20 by considering their entire respective overlap periods. Following Chatzistergos et al. (2017), we did this by using probability distribution function (PDF) matrices between the daily plage and sunspot areas. Briefly, for a given archive, the PDF matrices are created by using another archive as the reference. For this, we first sort the values of the considered archive in bins of specific width. For each bin, we compute the histogram of the co-temporal observations from the reference archive. Each histogram is then normalised to the total number of data points included, thus resulting in a PDF. Examples of such PDF matrices when considering the Ca ii K series as the references (ordinate) are shown in Fig. 1. We note that the PDF matrices were computed with bin sizes of $10^{-3}$, $10^{-4}$, and 0.1, for plage area, sunspot area, and sunspot number series, respectively; however, to aid the visualisation of the PDF matrices in the following they are shown with bins of $5\times10^{-3}$, $5\times10^{-4}$, and 1. We then fit the plage area values averaged over every sunspot-area bin with a square-root and a power-law function. Since our purpose is to use sunspot data to reconstruct plage areas, we expressed plage areas as a function of sunspot data, which is the opposite of how previous results are listed in Table 1. The results are qualitatively similar for all archives, and the overall shape of the relationship is in general in agreement with previous studies. However, we find that the power law fits the data even better than the square-root function reported in previous studies, although at the cost of an additional free parameter (the exponent). This also holds when the sunspot areas are taken as reference.
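The construction of a PDF matrix described above (bin one series, histogram the co-temporal values of the reference series within each bin, normalise each histogram) can be sketched as follows. This is a minimal pure-Python illustration, not the implementation of Chatzistergos et al. (2017); the function names and the dictionary-based layout are assumptions.

```python
import math

def pdf_matrix(binned, reference, binned_width, reference_width):
    """Column-normalised 2D histogram of two co-temporal daily series.

    binned: values of the series sorted into columns (e.g. sunspot areas).
    reference: same-day values of the reference series (e.g. plage areas).
    Returns {column_index: {row_index: probability}}.
    """
    columns = {}
    for x, y in zip(binned, reference):
        if x is None or y is None:  # skip days missing in either series
            continue
        col = math.floor(x / binned_width)
        row = math.floor(y / reference_width)
        counts = columns.setdefault(col, {})
        counts[row] = counts.get(row, 0) + 1
    # Normalise each column by its own number of data points.
    return {
        col: {row: n / sum(counts.values()) for row, n in counts.items()}
        for col, counts in columns.items()
    }

def column_means(binned, reference, binned_width):
    """Mean reference value per column; returns {column_index: (mean, n_days)}."""
    sums = {}
    for x, y in zip(binned, reference):
        if x is None or y is None:
            continue
        col = math.floor(x / binned_width)
        total, n = sums.get(col, (0.0, 0))
        sums[col] = (total + y, n + 1)
    return {col: (total / n, n) for col, (total, n) in sums.items()}
```

The column means returned by `column_means` are the quantities to which the square-root and power-law functions are fitted; the number of days per column is returned alongside so that sparsely populated columns can be identified.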
Considering the CEA20 plage area composite and the MEA20 sunspot area record, we find the relationship between them to be a power law (Eq. 1), where $p_a$ and $s_a$ are the plage and sunspot areas, respectively. Table 4 lists the best fit parameters for the fits between the CEA20 plage area and MEA20 sunspot area composite series along with the sums of squared residuals per degree of freedom (RSS/DOF). Interestingly, even when considering annual values (seen in Fig. 2), we see a clear tendency for a non-linearity in the relationship, although there is a considerable scatter for some series. The parameters of the fits for the annual values of the CEA20 plage area composite and the MEA20 sunspot area series are listed in Table 5.
These results are in agreement with those by Chapman et al. (1997), but in contrast to those of Foukal (1996, 1998) and Chapman et al. (2011), who reported a linear relationship between sunspot and plage areas when annual values were considered. Figure 3 shows the relationship between the various Ca ii K plage area series and the CEA17 GSN series. To a good first approximation, the relation can be considered linear for all archives, although some archives hint at a weak non-linearity (see e.g. the results for SP, MD, or Co data). The parameters of the power-law fit between the CEA20 plage area composite and the various sunspot series are given in Table 4 along with the parameters of a linear fit and the resulting RSS/DOF. For the CEA20 plage area composite and the CEA17 GSN series, we find that the power-law function fits the data better than the linear relation. Table 4 shows that the relation is only mildly affected by the choice of the sunspot series. For the CEA17 GSN series we adopt a power-law relationship (Eq. 2), where $s_{gn}$ is the group count. The relation is, to a good approximation, linear for the annual values (see Fig. 2h). However, the slope of the fit to the annual values tends to be slightly higher than that to the daily values. This is most likely due to the greater lifetime of plage ensemble regions compared to sunspots. The parameters of the fits for the annual values of the CEA20 plage area composite and the various sunspot series are listed in Table 5. This is in agreement with the results by Kuriyan et al. (1982), Foukal (1996), and Bertello et al. (2016), who found a linear relation between annual ISNv1 and Ca ii K plage areas. We note that the shape of the relationships we find for plage-sunspot areas and plage-sunspot number series implies a non-linear relation between sunspot areas and sunspot number series, which is in agreement with previous results (e.g. Fligge & Solanki 1997; Balmaceda et al. 2009; Carrasco et al. 2016; Nagovitsyn & Osipova 2021; Mandal et al. 2021).
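The comparison of functional forms through RSS/DOF discussed above can be illustrated with a short sketch. The fitting method used in the study is not stated in this section, so the choices here are assumptions: the power law $p = a\,s^{b}$ is fitted by linear least squares in log-log space, the square-root function is taken as $p = c\sqrt{s}$ with a single coefficient, and all function names are hypothetical.

```python
import math

def fit_power_law(s, p):
    """Fit ln(p) = ln(a) + b*ln(s) by least squares; returns (a, b).

    Only pairs with s > 0 and p > 0 are used (assumed simplification).
    """
    pairs = [(math.log(si), math.log(pi)) for si, pi in zip(s, p) if si > 0 and pi > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def fit_square_root(s, p):
    """Least-squares coefficient c of p = c*sqrt(s), for non-negative s."""
    numerator = sum(pi * math.sqrt(si) for si, pi in zip(s, p))
    denominator = sum(s)
    return numerator / denominator

def rss_per_dof(observed, modelled, n_params):
    """Sum of squared residuals divided by (number of points - free parameters)."""
    rss = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    return rss / (len(observed) - n_params)
```

Dividing by the degrees of freedom penalises the additional free parameter of the power law (the exponent), so that the two forms are compared on a more equal footing than with the raw sum of squared residuals.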
Figure 4 (top panel) shows the annual values of the plage area composite along with the scaled series by MEA20, CEA17, SvSc16, and ISNv2. We find good agreement between all the series and the CEA20 plage area composite. An exception is the period before 1905; however, the coverage of the Ca ii K plage area composite during this period is poorer than at other times, which leads to poorer statistics in comparison to sunspot observations. Figure 4 (bottom panel) shows the ratio of the plage area composite to the various sunspot series. To compute the ratios we ignored the days when the sunspot number was 0 or sunspot areas were lower than 0.0005 of the disc area. We note that a lower threshold for the sunspot areas would leave the ratio over activity maximum periods unaffected, but would increase the ratio during activity minima. The ratio of the plage to sunspot areas is in the range [6,30] with an average of 17 ± 4 when annual values are considered. The range is [0.6,147] for the daily values with an average of 21 ± 11. These values are consistent with those from Schatten et al. (1985), Lawrence (1987a), and Chapman et al. (2001), but they are lower than those reported by Chapman et al. (2011). It is noteworthy that the ratio increases during the descending phase of SC 19. Also interesting is the ratio of the plage areas to the sunspot number, which was roughly the same (∼0.0043) during all activity maxima over SCs 15–23.
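The selection of days used for the ratios above can be written compactly. The sketch below is illustrative only: the function names are hypothetical, and the use of the sample standard deviation for the quoted ± values is an assumption, since the text does not state how the uncertainties were computed.

```python
import statistics

def plage_to_spot_ratios(plage, spots, threshold):
    """Daily plage-to-sunspot ratios for days with sunspot values above threshold.

    threshold: 0.0005 (disc fraction) for sunspot areas, 0 for sunspot numbers.
    Days with missing data in either series are skipped.
    """
    ratios = []
    for p, s in zip(plage, spots):
        if p is None or s is None or s <= threshold:
            continue
        ratios.append(p / s)
    return ratios

def summarise(ratios):
    """Return (min, max, mean, sample standard deviation) of the ratios."""
    return min(ratios), max(ratios), statistics.fmean(ratios), statistics.stdev(ratios)
```

Lowering `threshold` admits days with very small sunspot areas, which mostly occur around activity minima and yield large ratios; this is why the choice of threshold affects the ratio during minima far more than during maxima.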
A careful comparison of different panels of Figs. 1 and 3 seems to hint at changes in the relationship between faculae and spots with time. For example, panels e and f, which are limited to cycles 22–24, exhibit weaker slopes than relationships covering earlier periods. In Appendix A we take a closer look at this, and actually do find a weak dependence of the relationships on cycle strength.

Reconstructing plage areas from sunspot series
We now use the relationships derived in Sect. 3.1 to reconstruct plage areas from sunspot data and to analyse the performance of these reconstructions. We use the power-law function on sunspot area (Eq. 1) and GSN (Eq. 2), respectively. Figure 5a shows the reconstructed plage areas from the MEA20 sunspot area and the CEA17 GSN series, along with the CEA20 plage area composite. The residual plage areas between the CEA20 composite and the reconstructions from the MEA20 sunspot area and CEA17 GSN series are shown in Fig. 5b and c, respectively. Both reconstructions perform well, with RMS differences (linear correlation coefficients) between the CEA20 plage area composite and the reconstructions from the MEA20 sunspot area and the CEA17 GSN series being 0.009 and 0.0079 (0.83 and 0.88), respectively. Appendix C presents reconstructions using a linear, quadratic, or power-law relation with SC-dependent parameters, and also uses ISNv2 to reconstruct plage areas.
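The reconstruction and its two performance measures (RMS difference and linear correlation coefficient) can be sketched as follows. The power-law coefficients `a` and `b` are placeholders to be taken from the fits (Table 4); no values are assumed here, and the function names are illustrative.

```python
import math

def reconstruct_plage(spot_series, a, b):
    """Apply p = a * s**b to each sunspot value; missing days stay missing."""
    return [None if s is None else a * s ** b for s in spot_series]

def _valid_pairs(x, y):
    """Keep only days on which both series have a value."""
    return [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]

def rms_difference(x, y):
    """Root-mean-square difference between two co-temporal series."""
    pairs = _valid_pairs(x, y)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in pairs) / len(pairs))

def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two co-temporal series."""
    pairs = _valid_pairs(x, y)
    n = len(pairs)
    mean_x = sum(xi for xi, _ in pairs) / n
    mean_y = sum(yi for _, yi in pairs) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
    syy = sum((yi - mean_y) ** 2 for _, yi in pairs)
    return sxy / math.sqrt(sxx * syy)
```

The residuals shown in Fig. 5b and c correspond to the day-by-day differences that enter `rms_difference`, computed between the observed composite and the output of `reconstruct_plage`.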

Comparison to other published plage areas
To compare our results with those based on earlier published plage area series in a consistent way, we repeated our analysis by applying the same procedure to such records. Figures 6 and 7 show the ratios of the previously published Ko and MW Ca ii K plage area series to the MEA20 sunspot area series. The figures clearly demonstrate that even when based on the same Ca ii K archive, but processed by different authors using different techniques and digitisation versions, the plage-to-spot ratio exhibits quite different behaviour.
The divergence is even more substantial when different archives are considered. In this respect, we note that five of the considered plage series shown in Fig. 6 have only annual values and the results derived from them have very different statistics and are thus less informative than those derived from the daily values, while three of the series are in fractional projected areas and the rest in hemispheric areas corrected for foreshortening. The ratios for the Ko series by Chatterjee et al. (2016) and Ermolli et al. (2009b) slightly decrease with time, while for the record by Kuriyan et al. (1982) a slight increase is seen. Furthermore, the ratio for the same archive derived from the data by Priyal et al. (2017) first increases towards SC 19 and then decreases again. All MW series show an increase in the ratio over SC 19 and then an abrupt drop over SC 21, consistent with the report of instrumental issues over that period (Chatzistergos et al. 2019b). The ratio for the MW series by Foukal (1996) [...] suggesting that the plage areas over this period are probably overestimated in this series, as also suggested by Ermolli et al. (2009b) and Chatzistergos et al. (2019b). The SGD series returns an average plage-to-sunspot area ratio of ∼11 compared to the value of 18 obtained from the CEA20 composite. Furthermore, the ratio based on SGD also slightly decreases with time. The series by Foukal (2002) results in a value for the plage-to-sunspot area ratio similar to ours. However, in contrast to our result, this ratio decreases over SC 20 and 21 and increases again over SC 22. This is consistent with the conjecture of a potential problem in the MW data over SC 21 (Chatzistergos et al. 2019b), while the following increase is mainly due to the use of SP data after 1985, when MW stopped operation. These findings are in agreement with the studies by Ermolli et al. (2018) and Chatzistergos et al. (2020c) who reported inconsistencies within the various published plage area series discussed here.
Fig. 5. Comparison of plage areas reconstructed from sunspot areas and sunspot group number series. Panel a: Reconstructed plage areas from MEA20 sunspot area series (black) and the CEA17 GSN series (green) along with the CEA20 plage area composite (blue). Shown are annual median values (solid lines) along with the asymmetric 1σ intervals (shaded surfaces) only for the period covered by CEA20 plage area composite. The numbers below the curves denote the conventional SC numbering and are placed at SC maximum periods. Panels b–c, left: Residual areas in fractions of the disc between the CEA20 plage area composite and the plage areas reconstructed from the MEA20 sunspot areas (panel b) and the CEA17 GSN series (panel c). The red dashed line denotes 0 plage area differences. Panels b–c, right: Distributions of the differences in bins of 0.001 in fractional areas.
To our knowledge, only three of the previously published plage area records had been used for irradiance reconstructions: the series by SGD (used e.g. by Oster et al. 1982; Schatten et al. 1985; Foukal & Lean 1986, 1988; Lean & Skumanich 1983; Lean et al. 2001), Foukal (2002, used e.g. by Foukal 2012), and Bertello et al. (2010, used by Ambelu et al. 2011). All three of these series show a slight decreasing trend with time (Fig. 6), which is in contrast to our results here. This suggests that the competing contributions from sunspots and plage to irradiance variations might not have been accounted for correctly in those irradiance reconstructions. Furthermore, the decreasing trend in the plage-to-spot area ratio with time can potentially disguise any long-term trend in solar irradiance.

Summary and conclusions
We have studied the relation of plage areas versus sunspot areas or numbers. To do so, we used the plage area composite created by Chatzistergos et al. (2020c) by combining the data derived from 38 Ca ii K archives. For comparison, we have also analysed the relationship using the underlying individual archives. The sunspot records considered are the sunspot area composite by Mandal et al. (2020a) and the various available sunspot number and group number series.
Fig. 6. Ratio of published plage area series to the sunspot area composite by MEA20 (black), and the group sunspot number series by CEA17 (blue) and SvSc16 (green). The plage area series shown are those for the Ko data by Kuriyan et al. (1982, a), Tlatov et al. (2009, b), Chatterjee et al. (2016, c), Ermolli et al. (2009b, d), Priyal et al. (2017, e), and Singh et al. (2018, f). The first two series provide only annual mean values, while the others have daily cadence. The ratios to the group sunspot number series have been multiplied by 2000 to be plotted alongside the ratios to the sunspot areas. The solid lines are annual median values. The blue shaded area denotes the range of annual values for the ratios from all the sunspot number series used in this study (see Sect. 2). The ratios for the series with daily values are calculated only for the days when the plage areas are greater than 0.0005 in disc fraction. The yellow (light blue) horizontal line represents the average ratio value between the MEA20 sunspot areas and the corresponding plage series (the CEA20 composite plage area series). The numbers at the bottom of each panel denote the conventional SC numbering, and are placed at SC maximum periods.
The relationship between the daily plage and sunspot areas is best described by a power-law function and remains slightly non-linear also when annual values are considered. Recently, Nèmec et al. (2022) used a surface flux transport model to show that a more efficient cancellation of the diffuse magnetic flux in faculae leads to a slower increase in facular coverage with activity compared to spot coverage, eventually resulting in a non-linear relationship between them. The relationship between the plage areas and the various sunspot number records considered here is also best represented with a power-law function; however, the linear relation is a good approximation for the annual values.
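A power-law relation of this kind can be illustrated with a short sketch. Here we assume the simplest form, p = a s^b, with p the plage area and s the sunspot area, and fit it by ordinary least squares in log-log space. Both the functional form (without any offset term) and the fitting procedure are assumptions of this example and may differ from those used for the fits reported in this study; the data are synthetic.

```python
import math


def fit_power_law(spot, plage):
    """Fit plage = a * spot**b by linear least squares on the logarithms.

    Only pairs with strictly positive values are used, since the
    logarithm is undefined otherwise.
    """
    pairs = [(math.log(s), math.log(p))
             for s, p in zip(spot, plage) if s > 0 and p > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx                      # exponent
    a = math.exp(mean_y - b * mean_x)  # scaling factor
    return a, b


# Synthetic, noise-free data generated with a = 0.3 and b = 0.5.
spot = [0.0001 * k for k in range(1, 21)]
plage = [0.3 * s ** 0.5 for s in spot]
a, b = fit_power_law(spot, plage)
```

A fit in log-log space weights low-activity days more strongly than a direct fit to the areas would, so the recovered parameters for real, noisy data depend on this choice.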
We also find that the relation between the plage and sunspot areas depends on the bandwidth employed for the observations. Furthermore, the relation shows a dependence on the solar cycle strength. However, accounting for this dependence when reconstructing plage areas from sunspot areas (numbers) results in only small (minute) improvements.
Furthermore, the relationship between plage and sunspot data is affected by the accuracy of the processing of the Ca ii K images. We showed that employing various published plage area series from the literature resulted in rather diverging trends in their ratios to sunspot areas and numbers. This holds both when considering plage areas derived from data from the same archive (but treated differently) and when considering data from different archives. The advantage of our study is that we have considered not only numerous individual archives, but also their composite, which builds on a careful and consistent processing, analysis, and cross-calibration of the individual records (see Chatzistergos et al. 2020c).
Thus, the results of our study have significant implications for reconstructions of past irradiance variations (e.g. Domingo et al. 2009; Yeo et al. 2015; Krivova 2018, and references therein) and stellar activity studies (e.g. Lanza et al. 2003, 2019; Gondoin 2008; Reinhold et al. 2019). Such studies are often carried out assuming a constant ratio of faculae to sunspot areas, which we showed to not be very accurate.

Same as Fig. 6, but for the plage area series for MW data by Bertello et al. (2010, a); MW data by Ermolli et al. (2009b, b); MW data by Foukal (1996, c); MW and SP data by Foukal (2002, d); MW data by Tlatov et al. (2009, e); and for MM, MW, and BB data by SGD (f). The series in panels b), c), and f) have daily values, while the others only have annual mean values. The Bertello et al. (2010) series, which is a Ca ii K emission index series, was linearly scaled to match the plage areas from our MW data (see Sect. 2).

Appendix A: Dependence on cycle strength
Comparing the different panels of Figs. 1 and 3, it is evident that the relationship between the plage areas and the sunspot observations shows differences among the datasets. While some of the differences are clearly due to specifics of the individual plage series, there also seems to be a hint of changes in the relationship with time. Thus, for example, the relationships limited to cycles 22 to 24 (see panels e and f) have a weaker slope than relationships covering earlier periods. Here we take a closer look at the temporal behaviour of the relationship. In Fig. A.1 we compare the relationship between plage and sunspot areas for individual SCs for the Ko, MD, and MW series, which are the longest series considered in this study, and for the CEA20 plage area composite. We defined the strength of the SC as the median value of sunspot areas over each SC and use different colours to represent this in Fig. A.1, with dark blue for the weakest SCs through green and yellow for intermediate SCs to dark red for the strongest ones.

Figure A.1 shows a clear dependence of the plage versus spot area relationship on the SC strength, with plage areas generally rising faster with sunspot areas for stronger cycles. This is probably because during stronger cycles, more active regions emerge overall, which in turn leads to an increased number of decaying regions, eventually seen as faculae. As a result, at a given instant in time there are more facular regions on the surface than there would be during weaker cycles for the same spot coverage. However, some deviations from strong cycles rising faster are also evident in our results. Thus, interestingly, the two strongest cycles 19 and 21 show a rather similar relationship (except for MW), but in Ko the values for SC 21 are somewhat higher than those for SC 19, while the opposite is seen for MD. We note, however, that a significant scatter at higher activity levels might affect the fits. In MW, plage areas for SC 19 reach notably higher values than in other series, while values for SC 21 are significantly lower than in other records and than would be expected for such a strong cycle. This hints at potential problems and inconsistencies in the MW archive over these periods of time.

Figure A.1d shows the relation between the composite plage areas from CEA20 and MEA20 sunspot areas for each SC between 13 and 24. The general form of the studied relation and its dependence on the cycle strength is overall the same as when individual series are considered (panels a–c). The change with the cycle strength is, however, significantly clearer, with flatter relations for weaker cycles. Again, SC 19 shows a very similar relationship to the next two cycles in strength, SCs 21 and 22.

Same as Fig. A.1, except that it shows the Ca ii K plage area series versus the CEA17 GSN series (instead of the MEA20 sunspot area composite shown in Fig. A.1).
The dependence of the relation between plage areas and the GSN on the SC strength is evident, although less pronounced than between the plage and sunspot areas. For most series, it manifests for higher sunspot group number, while it appears to be absent from the MD series. The results from the archives showing anomalous behaviour over certain SCs when compared to the sunspot areas exhibit the same behaviour when compared to the GSN series. This is unsurprising, considering that the various GSN and sunspot area series are rather consistent with each other over the 20th century.
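The cycle-strength measure used in this appendix, the median of the sunspot areas over each SC, can be sketched as follows. The function names and the input layout are hypothetical, and the values are synthetic and serve only to show the bookkeeping.

```python
from collections import defaultdict
from statistics import median


def cycle_strengths(daily):
    """Median sunspot area per solar cycle.

    daily: iterable of (cycle_number, sunspot_area) pairs
           (hypothetical layout chosen for this sketch).
    """
    by_cycle = defaultdict(list)
    for sc, area in daily:
        by_cycle[sc].append(area)
    return {sc: median(vals) for sc, vals in by_cycle.items()}


def rank_weakest_to_strongest(strengths):
    """Cycle numbers ordered by increasing strength (e.g. for colour coding)."""
    return sorted(strengths, key=strengths.get)


# Synthetic values in disc fractions, for illustration only.
daily = [
    (19, 0.003), (19, 0.001), (19, 0.002),
    (20, 0.001), (20, 0.0005), (20, 0.0015),
]
strengths = cycle_strengths(daily)
order = rank_weakest_to_strongest(strengths)
```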
To highlight the differences between the archives and issues affecting them, Fig. A.3 shows the same relationships for the Co, Ko, MD, and MW series, and for the CEA20 composite, but now for individual cycles 15–22 in each panel. We note that these four archives employ different nominal bandwidths for their observations, with Co and MD using 0.15 Å, MW 0.2 Å, and Ko 0.5 Å. Furthermore, the CEA20 composite series was created with RP as the reference dataset, which has a nominal bandwidth of 2.5 Å. A narrower bandwidth generally results in higher plage areas compared to a broader one (Chatzistergos et al. 2020c). The effect of the bandwidth on the relation between plage areas and sunspot data is discussed in Appendix B. The differences between these four archives are generally larger for earlier cycles and are largest for SCs 18 and 19. In SC 20 the Ko, MD, and MW values lie comparatively close to each other, whereas the Co values are a factor of about 2–3 lower. The plage areas from the Ko and Co archives are generally close to the composite values and lower than MD and MW areas. However, in SC 21 MW plage areas are the lowest, while they are highest in SC 19. In addition, in SC 19 the Co values are lower than all others, while in SC 22 the Ko values are clearly the lowest. Unfortunately, such a comparison for most of the other plage series is not possible due to their insufficient coverage. Figure [...] of multiple overlapping archives is of great advantage (see also Chatzistergos et al. 2021).

[...], and MW (c) archives, and from the CEA20 plage area composite (d). The symbols denote the mean values of the PDF matrices (as shown in Fig. 3). The dashed lines are power-law fits and are coloured according to the strength of the SC with the strength increasing from black for the weakest cycle, via dark blue, turquoise, green, lime, orange, bright red to dark red for the strongest SC.
To further study the variation of the derived relationship with the SC strength we relate the fit parameters to the SC-averaged sunspot area. Figure A.4 shows the exponents of the power-law fits to the relation between plage and sunspot areas as a function of the SC-averaged sunspot area. Also shown are the linear fits to the values for individual cycles (solid red lines). SCs 13 and 14 were excluded from the fits due to poor data coverage. The fits for the CEA17 GSN series also exclude SC 24 since these records do not cover the entire SC 24. The exponent of the power-law fit decreases consistently with increasing SC-averaged sunspot areas with a slope of ∼ −53. The slope of the linear relation between plage areas and the CEA17 GSN increases weakly with SC-averaged GSN with a coefficient of 0.02. The opposite behaviour of the exponents for the sunspot areas and sunspot number series occurs because sunspot areas are measured in fractions and are always smaller than one, whereas sunspot numbers are typically greater than 1. We note that SC 21 has the lowest (highest) value of the exponent of the fits to sunspot areas (CEA17 GSN). SC 19 is trailing SC 21, even though SC 19 has a stronger amplitude in all the analysed series. We find similar results for ISNv2 to those reported for the CEA17 GSN series, with a slightly lower slope of 0.019 instead of 0.02. The results for ISNv1, however, show a much lower dependence of the exponents on the cycle strength, with a slope of -9× [...] where $s_a$ and $s_{gn}$ are the cycle-averaged sunspot area and group number, respectively. The values of $s_a$ and $s_{gn}$ for the various sunspot series mentioned in this study are given in [...]

To illustrate this in a more systematic way, Fig. B.1 (panels a and b) shows the exponent of the power-law fits to the average values of the PDF bins (as shown in Figs. 1 and 3, respectively) as a function of the nominal bandwidth of each archive.
To avoid the potential effects of the activity variations with time, we only show the data covering SC 23. We chose this cycle because it is covered by the largest number of available long-running archives, including photographic archives. There is a slight tendency for the exponent to increase (decrease) with the nominal bandwidth for the sunspot areas (GSN series), although with a high scatter. We recall that the opposite behaviour of the exponent is due to the sunspot areas always being lower than one, while sunspot number series are in general greater than 1. We note, however, that the actual bandwidth of the observations does not always correspond to the nominal value, and considerable ambiguities and temporal variations have been noticed for some of the archives (Chatzistergos et al. 2019b, 2020c), hindering a more accurate evaluation of the bandwidth effect. These archives are shown by triangles in the figure, and it is evident that these archives (see Chatzistergos et al. 2019b, 2020c) are exactly the ones that show the highest scatter with respect to the expected relationships. For instance, Fig. B.1 suggests that SP and Kh used a narrower bandwidth, while BB, Ko, and UP a broader bandwidth than their nominal ones. Furthermore, inconsistencies within the individual archives (Chatzistergos et al. 2019b) also affect this comparison, for example the changes in the instrument used to record the BB and ML archives.
We further note that the dependence of the exponent of the fits on the bandwidth of the observations is lifted when using the series after their cross-calibration to RP, as done by Chatzistergos et al. (2020c) in order to merge them into the CEA20 plage area composite series. This is illustrated in panels c and d of Fig. B.1, which are the same as panels a and b, but for the plage area series after applying the scaling by Chatzistergos et al. (2020c). We find that the exponents for all cross-calibrated series lie closer to that of RP compared to the original series (without the cross-calibration). The slope of the exponent is significantly reduced compared to the values for the original plage area series. These results support the corrections done by Chatzistergos et al. (2020c).
Appendix C: Reconstructing plage areas from sunspot data and different relations
Here we use the relationships derived in Sect. 3.1 and Appendix A to reconstruct plage areas from the various sunspot series and analyse the performance of these reconstructions. For comparison, we use three different relations: 1) a power law (Eq. 1) and 2) a square root (linear for CEA17 and ISNv2), both taken to be constant over the entire period analysed in this study (see Table 4), and 3) a power-law function with a linear dependence of the exponent [...]. Figure C.1a shows the reconstructed plage areas from the MEA20 sunspot area series with the three above-mentioned relations along with the CEA20 plage area composite. Figures C.1b–d show the residuals between the CEA20 plage area composite and the plage areas reconstructed as described above with all three relations from the MEA20 sunspot areas, the CEA17 GSN series, and ISNv2, respectively.
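The three types of relation compared here can be written down schematically as below. This is a sketch under stated assumptions: the functional forms are taken without offset terms, the exponent of the third relation is assumed to vary linearly with a cycle-strength measure, and all function names and parameter values are hypothetical placeholders rather than the fit results of this study (see Table 4 and Appendix A for those).

```python
import math


def plage_sqrt(spot_area, a):
    """Square-root relation: p = a * sqrt(s)."""
    return a * math.sqrt(spot_area)


def plage_power_law(spot_area, a, b):
    """Power law with time-independent parameters: p = a * s**b."""
    return a * spot_area ** b


def plage_power_law_sc(spot_area, a, b0, b1, sc_strength):
    """Power law whose exponent depends linearly on the cycle strength:
    b = b0 + b1 * sc_strength (assumed form for this sketch)."""
    b = b0 + b1 * sc_strength
    return a * spot_area ** b


# Placeholder parameter values, chosen only so that the three
# relations coincide for this single input.
spot_area = 0.0004
p_sqrt = plage_sqrt(spot_area, a=0.5)
p_pow = plage_power_law(spot_area, a=0.5, b=0.5)
p_sc = plage_power_law_sc(spot_area, a=0.5, b0=0.6, b1=-50.0, sc_strength=0.002)
```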
In all cases we find that the power-law relation with the SC-dependent parameters performs better; however, the improvement is rather small. In particular, the RMS differences between the CEA20 composite series and the series reconstructed with a square root, a power law, and a power law with SC-dependent exponents used on the MEA20 sunspot area series are 0.0096, 0.0090, and 0.0086, respectively, while the linear correlation coefficients are 0.83, 0.83, and 0.85, respectively. Hence, the overall improvement when using an SC-dependent power-law function is rather small. Nevertheless, this reduces the activity-dependent effect on the reconstructed plage areas and improves the agreement over cycle maxima, which tend to be slightly underestimated for strong cycles for the reconstructions with a power law with time-independent parameters. Figure C.1 (panel c) shows the residuals between the CEA20 plage area composite and the reconstructed plage areas derived from the CEA17 GSN record. Again, for comparison, we also show the plage areas reconstructed using a linear relation, a power law with time-independent parameters, and a power-law function with SC-dependent exponents. All three reconstructions perform equally well, with RMS differences to the plage area composite for the linear, power law, and power law with SC-dependent exponents being 0.0079, 0.0079, and 0.0078, respectively, while the linear correlation coefficients are 0.87, 0.88, and 0.88. We note that using the ISNv2 series for the plage area reconstruction returns a marginally better agreement with the composite than when CEA17 is employed (Figure C.1 panel c; RMS differences of 0.0076 and 0.0078 for power-law function with SC-dependent and SC-independent exponents, and 0.0078 for the linear relationship, all when considering the overlapping periods between ISNv2 and CEA17).
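The two performance measures quoted above, the RMS difference and the linear (Pearson) correlation coefficient between the composite and a reconstruction, can be computed as in the following sketch. The function names and the input values are illustrative only.

```python
import math


def rms_difference(a, b):
    """Root-mean-square difference between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def pearson_r(a, b):
    """Linear (Pearson) correlation coefficient of two series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic plage areas in disc fractions, for illustration only.
composite = [0.01, 0.02, 0.03, 0.04]
reconstructed = [0.012, 0.018, 0.032, 0.038]
rms = rms_difference(composite, reconstructed)
r = pearson_r(composite, reconstructed)
```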
We note that the distribution of residuals of the reconstructed plage areas from ISNv2 to the CEA20 plage area composite exhibits two peaks; for instance, in the series reconstructed with a power-law function one peak is close to 0 and the other at about 0.0055. The latter arises due to days with sunspot number in ISNv2 of 0 for which the plage areas are closer to 0.01 (see Fig. 4). Similar double peaks are seen in the residuals for the reconstructions with MEA20 and CEA17, though they are less evident for CEA17.

Comparison of reconstructed plage areas from sunspot areas using different relationships. Panel a: Reconstructed plage areas from sunspot data and from the CEA20 plage area composite. Shown are annual median values (solid lines) along with the asymmetric 1σ intervals (shaded surfaces). The numbers below the curves denote the conventional SC numbering and are placed at SC maximum periods. Panels b–d, left: Residual areas in fractions of the disc between the CEA20 plage area composite and the plage areas reconstructed from the MEA20 sunspot areas (top panels), the CEA17 GSN series (middle panels), and the ISNv2 series (bottom panels). The reconstructions were produced with a simple linear scaling (blue, only for CEA17 GSN and ISNv2), with a square-root function (blue, only for the MEA20 sunspot areas), with a power-law function (black), and a power-law function with SC-dependent exponents (orange). The red dashed line denotes 0 plage area differences. Panels b–d, right: Distributions of the differences in bins of 0.001 in fractional areas.
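The distributions of residuals in bins of 0.001 in fractional area can be obtained with a simple histogram, as in the sketch below. The function name, the convention of indexing bins by the integer part of the residual divided by the bin width, and the synthetic residuals (chosen to mimic one peak near 0 and another near 0.0055) are assumptions of this example.

```python
import math
from collections import Counter


def residual_histogram(residuals, bin_width=0.001):
    """Count residuals per bin.

    The key k of the returned Counter labels the bin covering
    [k * bin_width, (k + 1) * bin_width).
    """
    return Counter(math.floor(r / bin_width) for r in residuals)


# Synthetic residuals (composite minus reconstruction), illustration only.
residuals = [0.0002, 0.0004, -0.0003, 0.0055, 0.0056, 0.0058]
hist = residual_histogram(residuals)
```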