Issue 
A&A
Volume 586, February 2016



Article Number  A36  
Number of page(s)  14  
Section  Numerical methods and codes  
DOI  https://doi.org/10.1051/00046361/201526733  
Published online  22 January 2016 
New insights into time series analysis
I. Correlated observations
SUPA (Scottish Universities Physics Alliance) WideField Astronomy Unit,
Institute for Astronomy, School of Physics and Astronomy, University of Edinburgh,
Royal Observatory,
Blackford Hill,
Edinburgh
EH9 3HJ,
UK
email:
cfl@roe.ac.uk
Received: 12 June 2015
Accepted: 2 November 2015
Context. The first step when investigating time varying data is the detection of any reliable changes in star brightness. This step is crucial to decreasing the processing time by reducing the number of sources processed in later, slower steps. Variability indices and their combinations have been used to identify variability patterns and to select nonstochastic variations, but the separation of true variables is hindered because of wavelengthcorrelated systematics of instrumental and atmospheric origin or due to possible data reduction anomalies.
Aims. The main aim is to review the current inventory of correlation variability indices and measure the efficiency for selecting nonstochastic variations in photometric data.
Methods. We test new and standard datamining methods for correlated data using public timedomain data from the WFCAM Science Archive (WSA). This archive contains multiwavelength calibration data (WFCAMCAL) for 216,722 point sources, with at least ten unflagged epochs in any of five filters (YZJHK), which were used to test the different indices against. We improve the panchromatic variability indices and introduce a new set of variability indices for preselecting variable star candidates. Using the WFCAMCAL Variable Star Catalogue (WVSC1) we delimit the efficiency of each variability index. Moreover we test new insights about these indices to improve the efficiency of detection of timeseries data dominated by correlated variations.
Results. We propose five new variability indices that display high efficiency for the detection of variable stars. We determine the best way to select variable stars with these indices and the current tool inventory. In addition, we propose a universal analytical expression to select likely variables using the fraction of fluctuations on these indices (f_{fluc}). The f_{fluc} can be used as a universal way to analyse photometric data since it displays a only weak dependency with the instrument properties. The variability indices computed in this new approach allow us to reduce misclassification and these will be implemented in an automatic classifier which will be addressed in a forthcoming paper in this series.
Key words: methods: data analysis / techniques: photometric / astronomical databases: miscellaneous
© ESO, 2016
1. Introduction
The tremendous development in astronomical instrumentation and automation during the last few decades has given rise to several questions about how to analyse and synthesize the growing amount of data. Recently, various dedicated telescope systems, both on the ground and in space, have been used for widefield shallow, multiepoch, imaging surveys of low resolution to scan the sky in different wavebands with aims ranging from comprehensive stellar variability searches to exoplanet hunting; e.g. PanSTARRS, Kaiser et al. (2002); OGLE, Udalski (2003); SUPERWASP, Pollacco et al. (2006); CoRoT, Baglin et al. (2007); NSVS, Hoffman et al. (2009); Kepler, Borucki et al. (2010). These data have led to many discoveries in several areas of modern astronomy: asteroseismology, exoplanets, and stellar evolution (e.g. Huber et al. 2012; De Medeiros et al. 2013; Walkowicz & Basri 2013; PazChinchón et al. 2015). The next generation of these surveys, such as Gaia (BailerJones et al. 2013) and the VISTA Variables in Vía Láctea survey (VVV; Minniti et al. 2010), are providing a high data flow for a wide range of science applications in order to understand the dynamics and stellar variability of the Milky Way galaxy.
The first step to investigating time varying data is the detection of any reliable changes in star brightness (e.g. Welch & Stetson 1993; Stetson 1996; Wozniak 2000; Shin et al. 2009; Ferreira Lopes et al. 2015). This step is crucial to decreasing the running time by reducing the number of sources that slower steps, such as period finding and classification, are run on. The stochastic variations are mainly related to very bright sources, caused by saturation of the detector whereby the flux within the aperture bleeds out into nearby pixels and the measured magnitude becomes dependent on the sky brightness and seeing, or very faint sources where the sky noise dominates, providing an increase in the uncertainty of the measurements and a dependency on the sky brightness and seeing. Variability indices and their combinations have been used to identify variability patterns and to select nonstochastic variations (e.g. Damerdji et al. 2007; Shin et al. 2009; Ferreira Lopes et al. 2015), but the separation of true variables from noisy data is hindered because of wavelengthcorrelated systematics of instrumental and atmospheric origin, or due to possible data reduction anomalies. Detection methods have been optimized for specific variability signals to detect supernovae, microlensing, transits, and other variable sources (e.g. Alard & Lupton 1998; Wozniak 2000; Gössl & Riffeser 2002; Becker et al. 2004; Corwin et al. 2006; Yuan & Akerlof 2008; Renner et al. 2008). An important step to optimising this process is to review the current inventory of variability indices and determine the efficiency level for selecting nonstochastic variations in photometric data.
There are additional second order calibration effects, such as the Forbes Effect (e.g. Pakštienė 2003; Süveges 2014), which can produce apparent waves in the light curve due to variable extinction corrections for different coloured stars and may be difficult to distinguish from real quasiperiodic variations. This does not have a detectable effect in the public WFCAM Calibration (WFCAMCAL – Hodgkin et al. 2009; Cross et al. 2009) data used in this work because the extinction corrections are very low at the nearinfrared wavelengths of UKIRTWFCAM (0.9−2.5 μm), the wavelength dependence is also much lower than at ultravioletoptical wavelengths, and the colours of stars also have a narrower range.
Variability indices analyses in the present work.
The second step is to determine the main periods. There are various methods used in astronomy for frequency analysis, including the Deeming method (Deeming 1975), PDMJurkevich (Stellingwerf 1978; Dupuy & Hoffman 1985), string length minimization (Lafler & Kinman 1965; Stetson 1996; Clarke 2002), information entropy (Cincotta et al. 1995), the analysis of variance (ANOVA, SchwarzenbergCzerny 1996) and the LombScargle and its extension using error bars (Lomb 1976; Scargle 1982; Zechmeister & Kürster 2009). These methods are based on the fact that the phase diagram of the light curves (LCs) is smoothest when it is visualized with its real frequencies. Assessment of the significance of these frequencies is a pertinent problem due to nonGaussianity, multiperiodicity, nonperiodic variations, and the manner of how they should be taken into account (Süveges 2014). From this view the variability indices are a fundamental part of the variability analysis to save running time and decrease the number of miscalculations in the frequency analysis. The detection of nonperiodic variables, transients, and other aspects in regard to the significance of peaks in a periodogram has not been completely solved yet.
The last point is that the variability classification is intrinsically related to the determination of reliable periods and determining a set of parameters that allows us to distinguish all variability types. Automatic classifiers based on machine learning have been applied to several large timeseries data sets (e.g. Woźniak et al. 2004; Debosscher et al. 2007; Sarro et al. 2009; Blomme et al. 2010; Richards et al. 2011; Dubath et al. 2011). The inclusion of periodic and nonperiodic features, statistics, and more sophisticated model parameters have improved automatic classifiers (e.g. Richards et al. 2011). Misclassification, fuzzy boundaries between variable stars’ classes, mislabelled training sets, as well as full processing of terabytes of data are current scientific challenges (Eyer 2006).
The present paper is the first in a series of papers covering different aspects of variable star selection and classification. The first two articles are related to selection of variable stars using variability indices. In this paper, we discuss the selection of variable stars using correlation variability indices, while in the second of this series we will discuss noncorrelation variability indices; Paper III will be about periodicity search methods; Paper IV will be about the variable star classifier. In this work, we perform a comprehensive stellar variability analysis on time varying data. In Sect. 2.1, we describe the data used to compare each index, using a preselected catalogue of known variable stars to test how well each index selects these stars and the efficiency of the selection measured by how few additional stars are selected by the same cutoff value. In Sect. 3, we present an overview of commonly used correlation variability indices and propose five new variability indices. Next, in Sect. 4 we analyse the limits of correlated variability indices as well as proposing a false alarm probability for variability indices. We present our results and discussions in Sect. 6. Finally, in Sect. 7, we draw our conclusions and discuss some future perspectives.
2. Data
2.1. WFCAMCAL database
WFCAMCAL is a unique programme that is well fitted to test the panchromatic variability indices and our assumptions. This programme contains panchromatic data for 58 different pointings distributed over the full range in right ascension and spread over declinations of and . These were used to calibrate the UKIDSS surveys (Lawrence et al. 2007). The pointing closest to the zenith was chosen whenever a calibration field was observed. This was typically every hour early on in the UKIDSS observations and later every two hours, and some early nights had many additional observations (up to 40 in a night). During each visit the fields were usually observed with a sequence of filters, either through JHK or ZYJHK filters, within a few minutes. This led to an irregular sampling with fields observed again roughly on a daily basis, although longer time gaps are common and, of course, large seasonal gaps are also present in the data set.
The WFCAMCAL data are archived in the WFCAM Science Archive (WSA; Hambly et al. 2008). The data are processed by the Cambridge Astronomy Survey Unit (CASU) Irwin et al. (2004) and the Wide Field Astronomy Unit (WFAU) in Edinburgh, which also published the data through the WSA. The design of the WSA, the details of the data curation procedures and the layout of the database are described in detail in Hambly et al. (2008) and Cross et al. (2009). We use data from the WFCAMCAL08B release (observations up to the end of UKIRT semester 08B).
2.2. The WFCAMCAL variable star catalogue
Ferreira Lopes et al. (2015) performed a comprehensive stellar variability analysis of the WFCAMCAL database and presented the photometric data and characteristics of the identified variable stars as the WFCAM Variable Star Catalogue (WVSC1). The authors used standard datamining methods and introduced new variability indices designed for multiband data with correlated sampling. To summarize, the authors performed a careful analysis via cutoff surfaces to obtain a preselection with 6651 stars based on criteria established by numerical tests of the noise characteristics of the data. Next they combined four frequency analysis methods to search for the real frequencies in the LCs in each waveband and in the chromatic LC, i.e. comprised of the sum of all broadband filters. Finally, they obtained a ranked list of the best periods for each method and selected the very best period, which gave the minimum χ^{2} to cope with aliasing. Finally, the authors visually inspected all the phase diagrams of the 6651 stars and recovered a catalogue containing 319 stars in which 275 are classified as periodic variable stars and 44 objects as suspected variables or apparently aperiodic variables.
In this paper, we analyse this same sample from Ferreira Lopes et al. (2015). First, we selected all sources classified as a star or probable star having at least ten unflagged epochs in any of the five filters. This selection was performed from an initial database of 216,722 stars. Next, we test the efficiency of selection of variable stars with the variability indices presented in Sect. 3.
3. Variability indices
Table 1 summarizes 12 variability indices of which five are new indices proposed in this work. The present work discusses the efficiency of selection of each one and the best way to select variable stars via the current tool inventory. Sets of variability indices have been used, instead of one, to improve the selection process in the last few years (e.g. Shin et al. 2009). Indeed automatic classifiers also use these parameters to facilitate the classification of variable stars (e.g. Richards et al. 2011). The variability indices are a fundamental tool to improving all processes of the time domain analysis.
Currently, the WelchStetson indices (e.g. Welch & Stetson 1993; Stetson 1996, i.e. I_{WS}, J_{WS}, K_{WS}, and L_{WS} indices) are found to be significantly more sensitive than the “traditional” χ^{2}test for single variance, which uses the magnituderms scatter distribution of the data as a predictor (e.g. Pojmanski 2002). The improvements proposed by Stetson (1996) incorporated in the J_{WS} index allow us to compare wavebands with different numbers of epochs on an equal basis. The author uses the Bessel correction () to reduce the bias related to the sample size even though the index is the square of the correlation rather than the mean variance. The I_{WS} index was modified to quantify panchromatic flux correlations providing a new set of variability indices () by Ferreira Lopes et al. (2015). Moreover the authors proposed a new set of flux independent variability indices (). These were the first variability indices developed to analyse panchromatic surveys.
The statistical period search based in the analysis of variance (ANOVA; SchwarzenbergCzerny 1996) has been used to select nonstochastic variations. Nevertheless, this method is limited to the identification of periodic variations and requires more running time once its significance level is determined on phase diagrams for each frequency test. Using variability indices, we can identify nonstochastic variations independently based on their nature and thereby reduce the running time. The main goal of this work is to determine the best way to select variable stars without computing the variability periods. In the follow subsection, we summarize the and variability indices as well as improvements on these indices with a new approach.
3.1. The I and I panchromatic variability indices
The current tool inventory was added to by Ferreira Lopes et al. (2015) with a new set of variability indices to separate LCs that are dominated by correlated variations from those that are noisedominated. The authors introduced a new set of variability indices designed for multiband data with correlated sampling, which included one index that is highly insensitive to the presence of outliers in the timeseries data. First, the authors extended I_{WS} to create the I_{pfc} index defined as (1)where m is the number of filters, s is the combination type (between two or more epochs), n_{s} is the total number of correlations, and the Λ^{(s)} correction factor is (2)and Γ is given by (3)where u_{ijs} is the measured magnitude at the ith epoch and is in the j_{s}th combination. These indices allow us to compute correlations among s epochs (for more detail see Appendix A). As shown by Ferreira Lopes et al. (2015) increasing the number of correlated wave bands (s) makes the separation between correlated and uncorrelated variables more evident. Next the authors proposed a new index, , using Eq. (2), which is the sum of discrete values 1 or –1. This index is defined as (4)where , and is defined in Eq. (2). Finally, Ferreira Lopes et al. (2015) propose a general expression to determine the probability of a random event leading to a positive index. In the case of statistically independent events, this is given by (5)On the other hand, the expected value of for a random distribution is about 0. Meanwhile, the number of sources with negative values increases with s because there is an increase in the number of possible combinations that give a negative correlation.
3.2. Improvements on panchromatic and flux independent indices
The variability indices (see Eq. (1)) would work equally well if a set of observations are in the same bandpass, and if we correlate groups of observations observed over a short interval. Therefore we need to modify the indices to make them still more robust against different numbers of observations in each group. Similarly, we propose a new panchromatic, flux independent, variability index () and additionally combine these indices to create two new indices. To provide an expression to be used in multi or single waveband data we propose the follow notation:

1.
First, we compute the values of δ_{i} given by (6)where n_{x} is the number of epochs of waveband x, x_{i} are the flux measurements, is the mean flux, and σ_{x,i} denotes the flux errors. This parameter is equal to that used by Stetson (1996) to improve I_{WS} index and according to that work the measurements of correlations using δ_{i} allow us to compare data from different wavebands with unequal numbers of observations on an equal basis.

2.
Next, the δ_{i} values are computed for all measurements in each waveband using the respective values of n_{x}. As a result, we obtain a vector δ with N measurements collected in any waveband. In addition, we save the observation time for each δ_{i} value.

3.
We determine the time interval (ΔT) for which measurements enclosed in this interval are considered to be at virtually the same epoch. The chosen ΔT value comes from the arrangement of epochs and thus gives the minimum period (see Sect. 4.3). The total number of boxes (N_{box}) is given by T_{tot}/ ΔT, where T_{tot} is the total time span. The accuracy of the index must increase as ΔT decreases.

4.
Next, we compute the value of the variability index of order s in the kth box; (7)This equation performs all possible combinations without repetition among the n_{k} values. Indeed, the total number of combinations calculated is given by (8)

5.
Now, we can express the flux independent indices on a simple expression given by (9)where are the number of positive correlations according to Eq. (2). The indices include measurements obtained in one filter in contrast to indices where measurements are obtained in different filters.

6.
Next, we can compute the index given by (10)where it reduces to in the case when we only have measurements in different filters. The indices can be used to perform comparisons on an equal basis between stars with different number of epochs as well as using measurements obtained in one filter in contrast to and indices.

7.
An alternative way to compute the characteristic value of correlation is computing the median of correlations, given by (11)where Q_{(s)} encloses all Q_{(s,k)} correlations. The median value may provide a more robust value than the mean in the presence of outliers.

8.
Finally, we can use the in a correction factor related to instrument properties and outliers. This kind of factor can be defined as (12)where P_{s} is the expected value of pure noise for . The F^{(s)} ranges from 0 to 2 × (1−2 /s^{2}) providing an increase of its weight with s values. For instance, the maximum value of F^{(s)} is 1 for s = 2 and ~1.6 for s = 3. F^{(s)} is more efficient than because this increases the difference between values of correlated and uncorrelated data and we concentrate the pure noise values about zero. The parameter F^{(s)} is used to provide a new set of indices given by FL^{(s)} and FM^{(s)} (see Table 1).
The variability indices proposed above are determined from properties related to the magnitude and signal correlation values. Stellar variability searches based on such indices follow general assumptions: (i) intrinsic stellar variability can be typically identifiable from analysis of correlation measures observed in multiple or single wavebands; (ii) there are a minimum number of correlations required to discriminate between stochastic and nonstochastic variations (see Sect. 4.1); (iii) the interval between any two observations (τ) used to compute the correlations must be sufficiently phaselocked (see Sect. 4.3); and (iv) nonintrinsic variations are typically stochastic. Indeed, measurements due to the systematics of instrumental and atmospheric origin, or possible data reduction anomalies that display correlated properties may decrease the confidence level of variability indices. Such measurements are mainly related to temporal saturation of bright objects as well as systematic variations in the sky noise for faint stars.
4. Detection limits of correlated variability indices
The number of measurements and how many measurements designated as close, i.e. within a time span much shorter than the period of any variability, are fundamental information necessary to set better variability indices. The number of measurements determines how stringent the cutoff values must be, while the number of close measurements sets the most appropriate variability index. To determine which measurements are close or not, we need to determine a ΔT value such that it is a compromise between the number of correlated measurements and the minimum period that we are searching for. For instance, variability indices computed in boxes of ΔT, greater than the period, return values closer to those expected for noise. Lower values of ΔT lead to higher accuracy for variability indices that use correlation measurements.
Moreover, variability indices can be used in all processes of the timeseries analysis, as discussed in previous sections. Therefore, we must find new ways to allow us to increase the precision, reliability of these indices, and their connexions with the different types of variability. In Sect. 3, we described the current tool inventory and we proposed new variability indices with new correction factors to reduce bias. In the present section, we propose new ways to increase the precision of these variability indices and how to evaluate their reliability.
4.1. Number of correlated measurements
The minimum number of measurements that are sufficient for the use of a variability index are determined by the capability of separating variable and nonvariable stars. The statistical properties like mean, standard deviation, skewness, and kurtosis are not dependent on the number of measurements, although the errors on these measurements increase greatly as the number of measurements reduces towards the minimum for each measurement. On the other hand, we need contemporary (close) measurements to use correlated variability indices. These features may change for each variability index.
Consider two cases: one with N_{s} correlated data points and the other with N_{s} of pure noise for s = 2. In the case of pure noise, the number of positive and negative correlation must be the same , while for correlated data . Using indices we can determine the minimum number of correlations necessary to separate a purely correlated signal from pure noise assuming that there is an uncertainty in the sign of some correlations. We assume that this uncertainty (or fluctuation) on the number of positive correlations given by n_{f} provides an increase in the variability index of pure noise and a decrease otherwise. So the minimum separation between correlated and uncorrelated data is given by
where n_{f} is an integer with values less than N_{2}/ 2. The minimum number of correlations must be at least five according to this relation, given a single error. The general expression is given by
where the minimum value () to validate this relation is given by
where n_{f}/N_{s} is the fractional fluctuation of positive correlated measures (f_{fluc}). This equation analytically explains the increase in precision as well as the detachment between correlated and noncorrelated data with increasing s (see Fig. 8 of Ferreira Lopes et al. 2015). with the increasing s, so correlated data becomes more easily separable from pure noise with increasing s.
4.2. False alarm probability on variability indices
The statistical significance of a value is associated with the false alarm probability (FAP), which is the probability that the observed value was caused by random fluctations. The smaller the FAP then the larger is the statistical significance of this measurement and the tolerance usually adopted is about 1%. The determination of FAP in period searches is hindered because of nonGaussian distributions, observations scattered irregularly over a long time span, the unclear meaning of the number of independent frequencies, and the manner in which these should be taken into account (e.g. Terebizh 1998; Andronov 1994; Süveges 2014). Moreover, strong peaks in the periodogram can add some unrelated periods to the real periodicities since multiple periodic variations can interfere with each other and with the periodic patterns. A more detailed discussion about how account for this will be addressed in a forthcoming paper.
Additionally, significance values may depend on the function employed to make the periodicity search. Therefore, in some cases, it indicates the use of different techniques on different types of variable stars (Templeton 2004). For instance, the periodicity search methods based on Fourier series is less sensitive to nonsinusoidal and aperiodic signals. Methods based on the analyses of variance and multiharmonic periodograms allow us to assess more complex signals more easily (e.g. Wehlau & Leung 1964; Andronov 1994, 2003; SchwarzenbergCzerny 1996; Baluev 2009, 2013). A reliable period determination is critical to determine the variability nature. From this analysis, we may determine whether there is multi or single periodicity, the signal complexity, and the weight of each contribution to the light curve. This process may be facilitated if we can first determine whether the time series has reliable variability or not.
We can consider the significance of the variability indices by comparing the null hypothesis H_{0} of the observed time series (purely noise) against the alternative H_{1}, stating that there is no correlated signal in it. One way to evaluate statistical fluctuations on variability indices is to generate a large number of test timeseries sequences by shuffling the times, called bootstrapping. Following this approach, we are able to keep part of the correlated nature of the noise intrinsic to the data, as opposed to numerical tests based on pure Gaussian noise (Ferreira Lopes et al. 2015). However, we need many iterations to provide accurate values for FAP, i.e. we need to compute the variability indices n times, implying a longer running time.
On the other hand an analytical expression of FAP for variability indices may depend on the deviation from the mean, which varies according to the survey analysed since it must depend on detector efficiency, number of measurements, magnitude, etc. However, has a weak dependence on the properties of the survey in which Eq. (15) provides a value above which correlated sources may be distinguished from noise. The only term to be determined is the fractional fluctuation in positive correlated measurements (f_{fluc} = n_{f}/N_{s}). From our results, we propose the following empirical equation:
i.e. a constant (α> 0) plus a term related to the number of correlations. The second term in this equation decays quickly to zero as N_{s} increases and it provides high cutoff values on data with few epochs. We find since the f_{fluc} for must be greater than 0 for any number of correlations. Large values of α give a more complete selection, while smaller values result in a more reliable sample. The value was established in the Sect. 4.1. For instance, β = 4α^{2} is the lower limit for s = 2. Figure 1 shows the number of sources selected using Eq. (16) for s = 2 (black lines) and s = 3 (grey lines) as a function of α values. Solid lines indicate the number of sources of WVSC1 stars, while dashed lines show the efficiency of selection (ratio of total number of sources to number of known variables). The total number of sources is normalization by the total number of WVSC1 stars (319) to give an efficiency metric (E_{tot}). Table 2 shows the number of total sources selected (E_{tot}) and the fraction of WVSC1 stars (E_{WVSC1}) for some α values. For instance, to select about 90% of WVSC1 stars we need a subsample of 3.77 × 319 stars using α = 0.30 for s = 2. On the other hand to select about 92% of WVSC1 stars, we need a subsample of 3.71 × 319 stars using α = 0.48 for s = 3.
Fig. 1
Efficiency metric (E_{tot}: the ratio of total number of sources in the selection to good known variables in WVSC1) using Eq. (16) for s = 2 (black lines) and for s = 3 (grey lines). Solid lines denote E_{WVSC1}, the fraction of the good variables in the selection, while the dashed lines denote E_{tot}. A good choice of α returns a high fraction of good variables E_{WVSC1} for a low value of the efficiency metric E_{tot}. 
Efficiency metric for some α values.
4.3. Estimate of ΔT and correlated observations
The variation of variability indices with ΔT depends on many factors such as the variability period (P), the shape of the light curve, the signal to noise, and outliers. To estimate the influence of ΔT, we simulate a pure sinusoidal variation with a period (P). Next, we compute the indices and see how changing ΔT as a function of P affects how well we can separate a sinusoidal signal from random noise.
Figure 2 shows the indices as a function of ΔT/P. The indices decreases quickly to the expected values for random variations when s = 2, while, when s = 3, remains higher than the expected noise value for all ΔT/P. This result helps us to understand and use Δ T values. For instance, if Δ T< 0.1P, we obtain large values for , clearly separated from noise and thus detect variability more easily.
The distance between two measurements for a sinusoidal variation must be much less than ΔT/P = 0.5 to provide a large number of positive correlation values and allow variable stars to be separated from noise, according our simulations (see Fig. 2), and, preferably, ΔT/P ≤ 0.1. An interval of ΔT/P = 0.5 corresponds to the Nyquist frequency, so higher frequency signals cannot be properly reconstructed through aliasing, but it is imperative to sample at a much higher frequency than the Nyquist frequency to have adequately correlated signals necessary for discrimination from noisy signals. The Running Parabola method (Andronov 1997) determines an optimal time interval to describe light curves: their Fig. 4 is very similar to our Fig. 2 because both do better when sampling a greater range of phases of periodic or semiperiodic signals.
Fig. 2
versus ΔT/P for s = 2 (black lines) and for s = 3 (grey lines). The dashed lines indicate the expected values for random variation. 
Fig. 3
Histogram of the interval between observations for the WFCAMCAL08B data; tau is the interval between any two observations regardless of filter and these are binned logarithmically. The peaks at ~10^{3} days are intervals during a ZYJHK sequence and the peaks at ~1 day and multiples of 1day are repeat observations on subsequent nights. 
We determine Δ T predominantly by the cadence of the data. For the WFCAMCAL data, a sequence of three to five filters were observed over a period of ~0.005 day, and then each pointing reobserved roughly one day later, with longer intervals because of weather or seasonal limits on the observations; see Sect 2.1. This is shown in Fig. 3, which indicates the histogram of time between subsequent observations. There is a strong peak at τ ~ 10^{3} day, which corresponds to a ZYJHK sequence with duration of ~0.005 day and a second peak at ~1 day. There are also a variety of smaller peaks at other durations, often a few days apart (bad weather, nonphotometric nights), and a few small peaks at long durations of tens or hundreds of days (field not observed because it was too close to the Sun) or on timescales of 0.01 to 0.1 days: days when many calibration fields were taken for test purposes. The best choice of ΔT should be the minimum duration that encloses the correlated observations, up to the end of the last integration. The Δ T = 0.01day is a sensible choice as it is slightly wider than the typical box size so no observation is missed and allows us to look for variables with P ≥ 0.1day, although the main sampling peak at ~1 day may be expected to limit us to P> 0.5 days from the Nyquist frequency. Since the sampling is not rigidly at oneday intervals shorter periods are possible. Since we have correlated sampling at more than 20 times the frequency of the main sampling rate, we do not place additional constraints on the range of well sampled periods.
However, if we have equally spaced data, we are severely restricted. If we have s = 2 correlations and a spacing of τ, then Δ T ≥ 2τ and P ≥ 20τ, if s = 2. Given that at least two full periods are required for a confident identification of periodic behaviour, at least 40 observations would be required to constrain a very narrow range of periods and any highfrequency behaviour with f> 1 / 20τ would be in danger of being aliased, unless we took a more relaxed constraint on the correlation strength between each pair of measurements. Of course, that would reduce the discrimination power of the index to differentiate between noise and real variables. These indices do not require periodic signals: all they require is a high degree of correlation between consecutive measurements, so quasiperiodic and semiregular variables and even transients can be selected as long as they do not have frequencies that are higher than 0.1 × the sampling rate.
Thus, correlation indices become very inefficient for equally spaced data. Some deep extragalactic surveys have observations designed to maximize depth, so the observations are taken when seeing and sky levels are best. Hence, the observation structure can be pseudocorrelated, but not on fixed timescales. In Paper II of this series we will discuss indices that work better with uncorrelated observations.
A correlated data set may be expected to have at least half of the τ values in a peak or small set of peaks (if several filters with slightly different exposure times) at around the correlation frequency and then the main sampling peaks at τ_{samp}> 20τ_{cor}.
When we consider VISTA surveys, e.g. the VVV (Minniti et al. 2010), the data are observed as pawprints, which then get coadded into tiles (Cross et al. 2012), so there are always repeat observations on a short timescale compared to repeat epochs. These observations are also observed almost contemporaneously, with some tiling patterns jumping between the same jitter on different pawprints before moving onto the next jitter^{1}, so the time between pawprints is usually shorter than the integration time of the pawprint. Therefore, these are ideal for correlated indices applied to the pawprints.
Gaia (BailerJones et al. 2013) is another mission where correlated indices are extremely valuable. The main astrometric instrument observes stars as they transit across nine strips of detectors with τ ~ 5s. Stars are then reobserved by a second field of view 2h later or on the next revolution 6h later or on longer timescales due to the orbit and precession of the spacecraft.
Fig. 4
Histograms of the number of correlations N_{s} for s = 2 and s = 3 using bins of width 1. 
5. Data analysis
5.1. Broad selection and bias
From Sects. 4.1–4.3, we can determine the main constraints on variability analysis. We consider that there is at least one incorrect correlation measurement n_{f} = 1 in each LC, therefore, we limit our analyses to sources with more than four correlation measurements, according to Eq. (14). This is the minimum number of correlation measurements adopted in our analysis. Moreover, we adopted Δ T = 0.01 days based on the duration of the ZYJHK sequences. By following these constraints, we are considering all LCs that can possibly discriminate a correlated signal from noise with periods of at least greater than 0.1 days (see Sect. 4.3) We revisited the WFCAMCAL data instead of testing the variability indices with simple sinusoidal light curves, since this gives a more realistic test with correlated observations, real noise values, and a range of variable types.
Next, we compute the K_{fi}, L_{pfc}, , FL^{(s)}, and FM^{(s)} variability indices using a multiwaveband approach, as discussed in Sect. 3.2, on the data described in Sect. 2.1. Figure 4 shows the histogram of the number of correlation N_{s}, ranging from 1 to 1467 for s = 2 and from 1 to 863 for s = 3. These numbers are different because the ZYJHK measurements are obtained within a few minutes of each other but not necessarily in all filters. Additionally, the number of correlation measurements decreases quickly for very faint objects around the detection threshold. The total baseline varies from a few months up to three years and the cadence in a single passband can be considered to be quasistochastic with rather irregular gaps (see Hodgkin et al. 2009; Cross et al. 2009; Ferreira Lopes et al. 2015, for a better discussion).
5.2. Searching for periodic variations
To search for the best period, a LombScargle periodogram (Lomb 1976; Scargle 1982) was computed for each LC. We set the lowfrequency limit (f_{0}) for each periodogram to be f_{0} = 2 /T_{tot} days^{1}, where T_{tot} is the total time spanned by the LC. The highfrequency limit was fixed to days^{1} and the periodogram size was scaled to 10^{5} elements. Initially, we compute the LombScargle periodogram independently for each broadband filter (Y, Z, J, H, and K) as well as for the chromatic light curve (as described in Ferreira Lopes et al. 2015). The use of all broadband filters allows us to find variability periods in the cases where the photometry is high quality in some filters, but not others; e.g. in some filters the object may be saturated (sometimes leading to a nondetection at the correct location), very faint and close to the detection limit, or even too faint to be detected.
For each broadband filter, as well as for the chromatic light curve, we retain the ten periods corresponding to the highest peaks. Next these periods were refined following De Medeiros et al. (2013), namely, by maximizing the ratio of the variability amplitudes to the minimum dispersion in the phase diagram given by Dworetsky (1983). The minimum value provided is the refined period value and the full width at half maximum (FWHM) of the peak in the periodogram was assumed to be its uncertainty. Finally, we use the χ^{2} test as described in Ferreira Lopes et al. (2015) to select the very best period.
Fig. 5
Correlation variability indices versus K magnitude (left diagram) and versus number of correlations (right diagram) in each panel. The maximum number of sources per pixel is shown in brackets in each panel. The black circles denote the WVSC1 sources and the solid and dashed lines indicate the value that encloses 90% and 80% of them. 
6. Results and discussions
We analyse the efficiency of variability indices for selecting variable stars in the WFCAMCAL database. We evaluate responses of these indices as a function of magnitude and the number of correlations. This study allows us to trace important remarks about the most efficient way to select variable stars. The most efficient index is the one that encloses the majority of the WVSC1 stars with the fewest stars that do not belong to the WVSC1 catalogue and are mostly misclassifications. We compute the variability indices as described in Sect. 3.2. We detail our results in an analysis of correlation variability indices that were computed using a panchromatic approach as described in Sect. 3.2. Below we present our results using stars from the WVSC1 catalogue as a comparison.
6.1. Efficiency of variability indices
Figure 5 shows the distribution of K_{fi}, L_{pfc}, , FL^{(s)}, and FM^{(s)} variability indices as a function of magnitude and the number of correlations N_{s}, for s = 2 and s = 3. The solid and dashed lines are set to the values that must be adopted if we want to select 90% and 80% of the final WVSC1 catalogue, respectively. Figure 6 shows the histogram of these indices as a function of magnitude for stars selected using the cutoff value that includes 90% of the WVSC1 catalogue.
Fig. 6
Normalized histograms of the sources selected with a constant cutoff value for (black line), and FL^{(s)} (red line), and and FM^{(s)} (blue line) indices. The E_{tot} values for each index is shown in parentheses. The upper panel shows the histograms for s = 2, while the lower panel shows the histograms for s = 3. The cutoff values were determined considering a value that includes 90% of WVSC1 stars (see Table 3). 

presents a clear separation of WVSC1 stars from the other stars for K ≤ 17 mag. The lines that appear for K ≥ 17 mag are due to the high number of sources with just a few epochs (typically N_{s}< 20). This index produces discrete values and these are more evident for a small number of epochs. The right panel shows a higher dispersion for low numbers of N_{s} as expected. Statistical fluctuations may provide high contamination in this region, even though the number of correlations are above the minimum number that allows us to discriminate between them, according to Eq. (14). The index shows a similar behaviour although it displays a greater separation of WVSC1 stars from the other stars.

is equivalent to two previous indices under some constraints: it is equal J_{WS} for s = 2 and equivalent to when the correlations are obtained in different filters. J_{WS} has been used in the selection criteria for several surveys (e.g. Christiansen et al. 2008; McCommas et al. 2009; MoralesCalderón et al. 2009; Bhatti et al. 2010; Shappee & Stanek 2011; Pasternacki et al. 2011). provides a higher selection efficiency than previous indices and it performs combinations among measurements in ΔT intervals rather than across wavelengths. The indices present a clear separation of WVSC1 stars from other stars. If we assume a constant value of as a selection criterion, we observe that most of the nonvariable stars selected are faint stars. The separation between WVSC1 and other stars is clearer for s = 3 than for s = 2. Indeed, the number of sources preselected to enclose 90% of WVSC1 stars are about 30% fewer for s = 3 than s = 2. Two main features that are expected with increasing s are observed: the increase in the number of stars with negative indices values and a better discrimination between variable and nonvariable stars.

calculates the median of the correlation values in contrast to the mean encapsulated by . Both mean and median values are used to determine the central or typical value in a statistical distribution. The weight of the outliers is reduced in the median compared to the mean. Outliers in photometric data are commonly associated with variations in brighter stars caused by detector nonlinearity and saturation. On the other hand, the increasing dominance of correlated noise is expected for faint stars as the errors become dominated by the sky noise, rather than photon statistics. Figure 6 shows an underestimation of and indices implying on increase of misclassification. Indeed, the number of stars is higher of than for faint stars and against for brighter stars. Figure 6 shows an increase in the fractions of stars selected at both the bright and faint magnitudes for and indices, implying an increase in the misclassification rate. The misclassification rate is higher for faint stars when using than when using , and viceversa for bright stars. Therefore stars that match both criteria should have a lower misclassification rate at both the bright and faint ends, so agreement between these indices should be considered a selection criteria. Using larger s values gives less weight to outliers and partiallycorrelated noise, and this leads to a better estimations of the centre of the distribution for both the mean and median. Therefore, the efficiency of and increases and they tend to have similar E_{tot} values for different indices for higher s, especially if sources with small numbers of correlations are removed, as observed in Table 3 for s = 3 and N_{s}> 20. Meanwhile, the are the best indices to perform a selection of variable stars, when we only consider higher s values as well as only those sources with N_{s}> 20 (see Table 3).

FL^{(s)} and FM^{(s)} provide better efficiency values (E_{tot}) among those indices computed from correlation magnitudes (see Table 3). The F^{(s)} factor provides a concentration of nonvariables with values around zero as well as a reduction in the spread of bright sources. We observe a reduction of more than 300% on the number of sources preselected by and indices for s = 2 and about 20% for s = 3 (see Fig. 6). The large reduction is not found for s = 3 because these indices become more accurate with increasing s values and this therefore decreases the weight of F^{(s)}. On the other hand, only a slight decrease in E_{tot} was observed when we use as correction factor K_{WS}/ 0.789 (see Table 3). This factor is used to build the L_{WS} Stetson index (Stetson 1996), which can be expressed by .
Fig. 7
LCs and phase diagrams of C1 catalogue. The identifiers and periods are shown above each panel. 
Fig. 8
Examples of LCs shown intrumental bias. The identifiers are shown above each panel. 
Efficiency metric (E_{tot}) for variability indices analysed to select 90% and 80% of WVSC1 stars for N^{(s)}> 4 and N^{(s)}> 20, respectively.
In summary, the correlation variability indices discriminate between uncorrelated and correlated data, which is a typical feature of variable stars. We can enclose almost all WVSC1 stars in a sample with fewer than about 1500 sources. These indices still present a low efficiency for discrimination at the faint end or bright end. The root cause in each case may well be different: excess high values for bright sources are due to temporal saturation and few epochs and increases in measurement uncertainties for the faint stars, which makes the variability indices more sensitive to statistical fluctuations and systematics.
Figure 6 shows the histograms of sources selected for a constant cutoff value for the higher ranked indices. These indices return most of the WVSC1 stars with fewer nonvariable stars (see Table 3). However, these indices have a clear bias on selection from the point of view of magnitude since they are not evenly distributed along all K values. For s = 2, we observe that (black line) and FL^{(s)} (red line) display a prominent overselection for faint stars while FM^{(s)} (blue line) is biased for both brighter and faint stars. On the other hand, the three indices have similar bias for s = 3. This kind of result indicates that the efficiency of these indices may be similar for higher s values since the difference in E_{tot} between them is smaller for s = 3. Indeed, more than 60% of stars with k> 17.5 have N_{s}< 20. This low detection efficiency for the instrument in this region gives a maximum magnitude limit where we can sensibly use these indices.
On the other hand, if we use the analytical expression for f_{fluc,s} (see Sect. 4.2) we obtain a higher efficiency. This function allows us to analyse stars with lower N_{s} values (of course, above the minimum number of correlations N_{s}> 4) with a similar efficiency such as that obtained for FL^{(s)} and FM^{(s)} considering a more strict selection, i.e. N_{s}> 20. Using f_{fluc,s} we can enclose a greater number of WVSC1 stars with fewer contaminating sources selected (see Table 2). f_{fluc,s} is an empirical relation and it may be adapted according to its purpose.
6.2. Searching for variable stars
We use the α values (see Sect.4.2) to select at least 90% of the WVSC1 stars. Therefore, we adopt α = 0.30 for s = 2 and α = 0.46 for s = 3, which return a combined sample of 1133 variable stars candidates that were not included in the WVSC1 catalogue. The periods and their uncertainties were computed according to Sect. 5.2 and we visually inspected each star. Uncertainties in periods are of the order of 10%. According to our analysis, these stars can be divided into five main groups: (a) variable stars measured in few epochs; (b) variable stars with low signaltonoise and low confidence periods; (c) variable stars with amplitudes that are close to the noise level; (d) aperiodic variable stars or variables of such long periods that these data were insufficient for deriving them; and (e) false variables due to instrumental or reduction problems.
Our procedure has resulted in a catalogue with four new sources (C1). Figure 7 shows the C1 stars that may be included in [a, b, c] groups with variability indices’ values near to those expected from noise. Figure 8 shows some instrumental variations that can appear to give false positives for variable stars, including false variables due to instrumental saturation (left and middle panels) and data reduction problems (right panel). The separation between stars of abc and de groups is not possible using only variability indices. Discriminating between these sources using statistical analyses will be discussed in a forthcoming paper. Table 4 lists coordinates, periods, mean magnitudes, and the number of epochs in each filter for this sample. WVSC336 has a period estimate of 192 days, but its true period may be higher since we do not observe repeatable phases close to the minimum. This star was previously identified as a young stellar object at the Serpens molecular cloud (Oliveira et al. 2009). The set of stars identified in the WVSC1 catalogue in the dark nebula (Ferreira Lopes et al. 2015) as well as WVSC336 have a peculiar variability signature. These are almost certainly young stellar objects (YSOs) and they will be investigated in forthcoming works.
6.3. Twodimensional view of correlated data
Periodic objects in the WFCAM Variable Star Catalogue (C1).
Fig. 9
FL^{(s)}, FM^{(s)}, and logarithmic FL^{(s)} × N_{s} versus the variability indices for stars selected by f_{fluc} expression, using α = 0.30 for s = 2 and α = 0.46 for s = 3 (see Sect. 6.2) for s = 2 (left panels) and s = 3 (right panels). The WVSC1 stars are indicates with open black circles and the colours are set with the K magnitude. Objects shown with crosses do not have a Kband magnitude because they are either too faint or saturated. These objects have measurements in other bands. 
The f_{fluc} defined for variability indices, using the expression in Eq. (16), presents the best efficiency for selecting WVSC1 stars (see Tables 2 and 3). However, it returns a lot of false positives when we have few correlations. The combination of indices based on correlation signals () with those based with correlation values (FL^{(s)} and FM^{(s)}) may provide twodimensional view of correlated data.
Figure 9 shows FL^{(s)}, FM^{(s)}, and logarithmic of FL^{(s)} × N_{s} as a function of (named KFLs diagram) for s = 2 (left panels) and s = 3 (right panels). The KFLs diagrams allow us to discriminate between two main groups; (G1) faint stars where about 90% have due to a small number of correlations; (G2) is composed of stars with K ≤ 16.5, which includes 91% of WVSC1 stars. These groups display a clear separation if you multiply FL^{(s)} by the number of correlations (N_{s}) where G1 has log FL^{(s)} × N_{(s)}< 1.5 and G2 is delimited by log FL^{(s)} × N_{s}> 1.5. However, the last diagram is biased for N_{s} and so comparisons between different sources are difficult to make.
Stars included in G2 with low values of are mainly bright, saturated stars showing a low level of variability and false positive variations due to instrumental bias. The boundary between WVSC1 and other stars is not well defined in this diagram. Nevertheless, the KFLs diagram helps us enclose about 90% of WVSC1 stars with E_{tot} ~ 2 in G2.
7. Conclusions
From our results we can conclude that analysis of databases with fewer than four correlated measurements is not possible using and related indices (using factor F^{(s)}) when we consider the case of one wrong value. In these cases, we may use or indices to discriminate correlated from uncorrelated data. On the other hand, when we have enough correlation measurements, the variability indices provide unique features to carry out time domain analysis, which allows us to define a general method that can be applied to any survey with correlated epochs: the presents a low sensitivity to outliers and does not undergo strong variations with magnitude In addition, the has a clear interpretation, a theoretical definition of a value expected for noise, and a welldefined range of values from 0 to 1, and this range does not depend on the error bars, so a selection can be made that is independent of the number of measurements. Therefore, it may be used as a universal method to select correlated variations. Moreover, KFLs diagrams display two unique variability features related to intensity and number of positive correlated measurements, which allow us to improve E_{tot} by at least 40% (see Sect. 6.3).
The FL^{(s)} and FM^{(s)} values in the KFLs diagrams (see Fig. 9) may vary for different surveys. However, is not strongly dependent on instrumental features and its cutoff values can be adopted as universal values as can f_{fluc,s}. The f_{fluc,s} values are used to discriminate between correlated variations (variable stars) and noncorrelated variables (noise) and the fraction of variables is unbiased with respect to magnitude of observed wavelength. After selecting the variable star candidates using f_{fluc} we may improve the selection with the KFLs diagrams. Next, we can remove G1 stars and use levels of significance of some periodicity methods to determine which of these display periodic variations.
This work is the first in a series that make a detailed analysis of all processes of variable photometric data. In this first paper, we have investigated which indices give the most efficient selection when we have correlated observations. In the second paper of this series, we will consider uncorrelated observations and determine which are the best indices for selecting variable stars. In the coming years, we will apply these methods to very large surveys of the Milky Way, e.g. VVV, PanSTARRS, Gaia, and in the longer term to LSST. These studies will provide fast and reliable classifications of variable stars within the Milky Way, which will improve our understanding of the evolution of different stellar populations and thus the formation of structures within our Galaxy.
Acknowledgments
C. E. F. L. acknowledges a postdoctoral fellowship from the CNPq. N. J. G. C. acknowledges support from the UK Science and Technology Facilities Council. We would like to thank Lorenzo Rimoldini for his helpful comments which have improved this paper.
References
 Alard, C., & Lupton, R. H. 1998, ApJ, 503, 325 [NASA ADS] [CrossRef] [Google Scholar]
 Andronov, I. L. 1994, Odessa Astronomical Publications, 7, 49 [Google Scholar]
 Andronov, I. L. 1997, A&AS, 125, 207 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Andronov, I. L. 2003, in Interplay of Periodic, Cyclic and Stochastic Variability in Selected Areas of the HR Diagram, ed. C. Sterken, ASP Conf. Ser., 292, 391 [Google Scholar]
 Baglin, A., Auvergne, M., Barge, P., et al. 2007, in Fifty Years of Romanian Astrophysics, eds. C. Dumitrache, N. A. Popescu, M. D. Suran, & V. Mioc, AIP Conf. Ser., 895, 201 [Google Scholar]
 BailerJones, C. A. L., Andrae, R., Arcay, B., et al. 2013, A&A, 559, A74 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Baluev, R. V. 2009, MNRAS, 395, 1541 [NASA ADS] [CrossRef] [Google Scholar]
 Baluev, R. V. 2013, MNRAS, 436, 807 [NASA ADS] [CrossRef] [Google Scholar]
 Becker, A. C., Wittman, D. M., Boeshaar, P. C., et al. 2004, ApJ, 611, 418 [NASA ADS] [CrossRef] [Google Scholar]
 Bhatti, W. A., Richmond, M. W., Ford, H. C., & Petro, L. D. 2010, ApJS, 186, 233 [NASA ADS] [CrossRef] [Google Scholar]
 Blomme, J., Debosscher, J., De Ridder, J., et al. 2010, ApJ, 713, L204 [Google Scholar]
 Borucki, W. J., Koch, D., Basri, G., et al. 2010, Science, 327, 977 [NASA ADS] [CrossRef] [PubMed] [Google Scholar]
 Christiansen, J. L., Derekas, A., Kiss, L. L., et al. 2008, MNRAS, 385, 1749 [NASA ADS] [CrossRef] [Google Scholar]
 Cincotta, P. M., Mendez, M., & Nunez, J. A. 1995, ApJ, 449, 231 [NASA ADS] [CrossRef] [Google Scholar]
 Clarke, D. 2002, A&A, 386, 763 [Google Scholar]
 Corwin, T. M., Sumerel, A. N., Pritzl, B. J., et al. 2006, AJ, 132, 1014 [NASA ADS] [CrossRef] [Google Scholar]
 Cross, N. J. G., Collins, R. S., Hambly, N. C., et al. 2009, MNRAS, 399, 1730 [NASA ADS] [CrossRef] [Google Scholar]
 Cross, N. J. G., Collins, R. S., Mann, R. G., et al. 2012, A&A, 548, A119 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Damerdji, Y., Klotz, A., & Boër, M. 2007, AJ, 133, 1470 [NASA ADS] [CrossRef] [Google Scholar]
 De Medeiros, J. R., Ferreira Lopes, C. E., Leão, I. C., et al. 2013, A&A, 555, A63 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Debosscher, J., Sarro, L. M., Aerts, C., et al. 2007, A&A, 475, 1159 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Deeming, T. J. 1975, Ap&SS, 36, 137 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]
 Dubath, P., Rimoldini, L., Süveges, M., et al. 2011, MNRAS, 414, 2602 [NASA ADS] [CrossRef] [Google Scholar]
 Dupuy, D. L., & Hoffman, G. A. 1985, International AmateurProfessional Photoelectric Photometry Communications, 20, 1 [NASA ADS] [Google Scholar]
 Dworetsky, M. M. 1983, MNRAS, 203, 917 [NASA ADS] [Google Scholar]
 Eyer, L. 2006, in Astrophysics of Variable Stars, eds. C. Aerts, & C. Sterken, ASP Conf. Ser., 349, 15 [Google Scholar]
 Ferreira Lopes, C. E., Dekany, I., Catelan, C., et al. 2015, A&A, 573, A100 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Gössl, C. A., & Riffeser, A. 2002, A&A, 381, 1095 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Hambly, N. C., Collins, R. S., Cross, N. J. G., et al. 2008, MNRAS, 384, 637 [NASA ADS] [CrossRef] [Google Scholar]
 Hodgkin, S. T., Irwin, M. J., Hewett, P. C., & Warren, S. J. 2009, MNRAS, 394, 675 [NASA ADS] [CrossRef] [Google Scholar]
 Hoffman, D. I., Harrison, T. E., & McNamara, B. J. 2009, AJ, 138, 466 [NASA ADS] [CrossRef] [Google Scholar]
 Huber, D., Ireland, M. J., Bedding, T. R., et al. 2012, ApJ, 760, 32 [NASA ADS] [CrossRef] [Google Scholar]
 Irwin, M. J., Lewis, J., Hodgkin, S., et al. 2004, in Optimizing Scientific Return for Astronomy through Information Technologies, ed. P. J. Quinn, & A. Bridger, SPIE Conf. Ser., 5493, 411 [Google Scholar]
 Kaiser, N., Aussel, H., Burke, B. E., et al. 2002, in Survey and Other Telescope Technologies and Discoveries, eds. J. A. Tyson, & S. Wolff, SPIE Conf. Ser., 4836, 154 [Google Scholar]
 Lafler, J., & Kinman, T. D. 1965, ApJS, 11, 216 [NASA ADS] [CrossRef] [Google Scholar]
 Lawrence, A., Warren, S. J., Almaini, O., et al. 2007, MNRAS, 379, 1599 [NASA ADS] [CrossRef] [MathSciNet] [Google Scholar]
 Lomb, N. R. 1976, Ap&SS, 39, 447 [NASA ADS] [CrossRef] [Google Scholar]
 McCommas, L. P., Yoachim, P., Williams, B. F., et al. 2009, AJ, 137, 4707 [NASA ADS] [CrossRef] [Google Scholar]
 Minniti, D., Lucas, P. W., Emerson, J. P., et al. 2010, New Astron., 15, 433 [NASA ADS] [CrossRef] [Google Scholar]
 MoralesCalderón, M., Stauffer, J. R., Rebull, L., et al. 2009, ApJ, 702, 1507 [NASA ADS] [CrossRef] [Google Scholar]
 Oliveira, I., Merín, B., Pontoppidan, K. M., et al. 2009, ApJ, 691, 672 [NASA ADS] [CrossRef] [Google Scholar]
 Pakštienė, E. 2003, Balt. Astron., 12, 584 [NASA ADS] [Google Scholar]
 Pasternacki, T., Csizmadia, S., Cabrera, J., et al. 2011, AJ, 142, 114 [NASA ADS] [CrossRef] [Google Scholar]
 PazChinchón, F., Leão, I. C., Bravo, J. P., et al. 2015, ApJ, 803, 11 [CrossRef] [Google Scholar]
 Pojmanski, G. 2002, Acta Astron., 52, 397 [NASA ADS] [Google Scholar]
 Pollacco, D. L., Skillen, I., Collier Cameron, A., et al. 2006, PASP, 118, 1407 [NASA ADS] [CrossRef] [Google Scholar]
 Renner, S., Rauer, H., Erikson, A., et al. 2008, A&A, 492, 617 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Richards, J. W., Starr, D. L., Butler, N. R., et al. 2011, ApJ, 733, 10 [NASA ADS] [CrossRef] [Google Scholar]
 Sarro, L. M., Debosscher, J., López, M., & Aerts, C. 2009, A&A, 494, 739 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Scargle, J. D. 1982, ApJ, 263, 835 [NASA ADS] [CrossRef] [Google Scholar]
 SchwarzenbergCzerny, A. 1996, ApJ, 460, L107 [NASA ADS] [CrossRef] [Google Scholar]
 Shappee, B. J., & Stanek, K. Z. 2011, ApJ, 733, 124 [NASA ADS] [CrossRef] [Google Scholar]
 Shin, M.S., Sekora, M., & Byun, Y.I. 2009, MNRAS, 400, 1897 [NASA ADS] [CrossRef] [Google Scholar]
 Stellingwerf, R. F. 1978, ApJ, 224, 953 [NASA ADS] [CrossRef] [Google Scholar]
 Stetson, P. B. 1996, PASP, 108, 851 [NASA ADS] [CrossRef] [Google Scholar]
 Süveges, M. 2014, MNRAS, 440, 2099 [NASA ADS] [CrossRef] [Google Scholar]
 Templeton, M. 2004, J. Am. Assoc. Variable Star Observers, 32, 41 [Google Scholar]
 Terebizh, V. Y. 1998, Astrophysics, 41, 198 [NASA ADS] [CrossRef] [Google Scholar]
 Udalski, A. 2003, Acta Astron., 53, 291 [NASA ADS] [Google Scholar]
 Walkowicz, L. M., & Basri, G. S. 2013, MNRAS, 436, 1883 [NASA ADS] [CrossRef] [Google Scholar]
 Wehlau, W., & Leung, K.C. 1964, ApJ, 139, 843 [NASA ADS] [CrossRef] [Google Scholar]
 Welch, D. L., & Stetson, P. B. 1993, AJ, 105, 1813 [NASA ADS] [CrossRef] [Google Scholar]
 Wozniak, P. R. 2000, Acta Astron., 50, 421 [NASA ADS] [Google Scholar]
 Woźniak, P. R., Williams, S. J., Vestrand, W. T., & Gupta, V. 2004, AJ, 128, 2965 [NASA ADS] [CrossRef] [Google Scholar]
 Yuan, F., & Akerlof, C. W. 2008, ApJ, 677, 808 [NASA ADS] [CrossRef] [Google Scholar]
 Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Appendix A: Examples of panchromatic indices
Consider multiwavelength data with measurements obtained contemporaneously in F_{1}F_{2}F_{3}F_{4} filters, given by; F_{1} = [a_{1},a_{2},··· ,a_{n}], F_{2} = [b_{1},b_{2},··· ,b_{n}], F_{3} = [c_{1},c_{2},··· ,c_{n}], and F_{4} = [d_{1},d_{2},··· ,d_{n}]. We build a vector with all
measurements U = [u_{11},u_{12},u_{13},u_{14},··· ,u_{n1},u_{n2},u_{n3},u_{n4}] where the subindices set the ith measurement and the respective filter. Thereafter, we can compute the of order 2, by expanding Eq. (1): (A.1)and for s = 3, (A.2)where the s last subindices of each Λ function set the filters that are being combined. Indeed the procedure proposed in Sect. 3.2 provides an easier and more general way to compute the panchromatic correlated indices.
The is equivalent to a geometric mean. Meanwhile the term in the sroot is always positive because of the modulus function. On the other side, and of order 2 is greater than and of order 3 if all Λ values are positive. This can be proved using the inequality of arithmetic and geometric means. Consider a set of multiwavelength data obtained at same time given by F_{1}F_{2}F_{3} filters or U = [u_{11},u_{12},u_{13}]. We can write (A.3)This equation is equivalent to the inequality of arithmetic and geometric means since we can make the substitutions , , and . Therefore the panchromatic variability indices must decrease with increasing s values if we assume that the most important contribution in the sum are those in which all Λ values are positive or all are negative.
All Tables
Efficiency metric (E_{tot}) for variability indices analysed to select 90% and 80% of WVSC1 stars for N^{(s)}> 4 and N^{(s)}> 20, respectively.
All Figures
Fig. 1
Efficiency metric (E_{tot}: the ratio of total number of sources in the selection to good known variables in WVSC1) using Eq. (16) for s = 2 (black lines) and for s = 3 (grey lines). Solid lines denote E_{WVSC1}, the fraction of the good variables in the selection, while the dashed lines denote E_{tot}. A good choice of α returns a high fraction of good variables E_{WVSC1} for a low value of the efficiency metric E_{tot}. 

In the text 
Fig. 2
versus ΔT/P for s = 2 (black lines) and for s = 3 (grey lines). The dashed lines indicate the expected values for random variation. 

In the text 
Fig. 3
Histogram of the interval between observations for the WFCAMCAL08B data; tau is the interval between any two observations regardless of filter and these are binned logarithmically. The peaks at ~10^{3} days are intervals during a ZYJHK sequence and the peaks at ~1 day and multiples of 1day are repeat observations on subsequent nights. 

In the text 
Fig. 4
Histograms of the number of correlations N_{s} for s = 2 and s = 3 using bins of width 1. 

In the text 
Fig. 5
Correlation variability indices versus K magnitude (left diagram) and versus number of correlations (right diagram) in each panel. The maximum number of sources per pixel is shown in brackets in each panel. The black circles denote the WVSC1 sources and the solid and dashed lines indicate the value that encloses 90% and 80% of them. 

In the text 
Fig. 6
Normalized histograms of the sources selected with a constant cutoff value for (black line), and FL^{(s)} (red line), and and FM^{(s)} (blue line) indices. The E_{tot} values for each index is shown in parentheses. The upper panel shows the histograms for s = 2, while the lower panel shows the histograms for s = 3. The cutoff values were determined considering a value that includes 90% of WVSC1 stars (see Table 3). 

In the text 
Fig. 7
LCs and phase diagrams of C1 catalogue. The identifiers and periods are shown above each panel. 

In the text 
Fig. 8
Examples of LCs shown intrumental bias. The identifiers are shown above each panel. 

In the text 
Fig. 9
FL^{(s)}, FM^{(s)}, and logarithmic FL^{(s)} × N_{s} versus the variability indices for stars selected by f_{fluc} expression, using α = 0.30 for s = 2 and α = 0.46 for s = 3 (see Sect. 6.2) for s = 2 (left panels) and s = 3 (right panels). The WVSC1 stars are indicates with open black circles and the colours are set with the K magnitude. Objects shown with crosses do not have a Kband magnitude because they are either too faint or saturated. These objects have measurements in other bands. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.