Issue 
A&A
Volume 627, July 2019



Article Number  A120  
Number of page(s)  11  
Section  Numerical methods and codes  
DOI  https://doi.org/10.1051/00046361/201935560  
Published online  12 July 2019 
Discretetime autoregressive model for unequally spaced timeseries observations
^{1}
Departmento de Matemáticas, Facultad de Ciencia, Universidad de Santiago de Chile, Av. Libertador Bernardo O’Higgins 3663. Estacion Central, Santiago, Chile
email: susana@mat.puc.cl
^{2}
Departmento de Estadística, Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, 7 820436 Macul, Santiago, Chile
^{3}
Millennium Institute of Astrophysics, Santiago, Chile
^{4}
MaxPlanckInstitut für Astronomie, Heidelberg, Germany
^{5}
Faculty of Engineering and Sciences, Universidad Adolfo Ibañez, Diagonal Las Torres 2700, Peñalolén, Santiago, Chile
Received:
27
March
2019
Accepted:
14
June
2019
Most timeseries models assume that the data come from observations that are equally spaced in time. However, this assumption does not hold in many diverse scientific fields, such as astronomy, finance, and climatology, among others. There are some techniques that fit unequally spaced time series, such as the continuoustime autoregressive moving average (CARMA) processes. These models are defined as the solution of a stochastic differential equation. It is not uncommon in astronomical time series, that the time gaps between observations are large. Therefore, an alternative suitable approach to modeling astronomical time series with large gaps between observations should be based on the solution of a difference equation of a discrete process. In this work we propose a novel model to fit irregular time series called the complex irregular autoregressive (CIAR) model that is represented directly as a discretetime process. We show that the model is weakly stationary and that it can be represented as a statespace system, allowing efficient maximum likelihood estimation based on the Kalman recursions. Furthermore, we show via Monte Carlo simulations that the finite sample performance of the parameter estimation is accurate. The proposed methodology is applied to light curves from periodic variable stars, illustrating how the model can be implemented to detect poor adjustment of the harmonic model. This can occur when the period has not been accurately estimated or when the variable stars are multiperiodic. Last, we show how the CIAR model, through its state space representation, allows unobserved measurements to be forecast.
Key words: methods: statistical / methods: data analysis / stars: general
© ESO 2019
1. Introduction
Timeseries theory is vast and provides a myriad of methods to model the serial correlation of a temporal sequence. Most of these methods assume that the observed values are measured regularly in time. However, there are many cases where this assumption is not valid. When this happens, we say that the time series is sampled irregularly.
An irregular time series is defined as a real valued sequence measured at observational times t_{1}, …, t_{n}, such that the sequence t_{1}, …, t_{N} is strictly increasing and the distance between consecutive times t_{j} − t_{j − 1} is not constant.
Irregular time series can be observed in many disciplines. For example, natural disasters such as earthquakes do not occur regularly in time. Paleoclimate data are often sampled irregularly due to the difficulty in obtaining historical information. In astronomy, objects can be observed irregularly in time because of, for example, the dependency of optical telescopes on clear skies.
Analyzing irregular time series is challenging because there are still few robust statistical tools available. In astronomy, some efforts have been made that attempt to model irregular time series. One approach is to transform the irregular times into regular ones by interpolation (Rehfeld et al. 2011). Other approaches implement Gaussian processes to fit the light curves in the time domain (ForemanMackey et al. 2017), but this solution can be computationally expensive. In some cases, regular timeseries models have also been used to model astronomical data (for a review of such methods see e.g., Feigelson et al. 2018). Alternatively, the CARMA family of models has been popular for fitting astronomical time series (Kelly et al. 2014), however these models are defined as the solution of a differential stochastic equation, underlying the assumption of small time gaps between observations. Therefore, we consider it important to develop alternative models that can fit irregularly spaced time series under the assumption of discrete time gaps.
In Eyheramendy et al. (2018) we introduced a new model called the Irregular Autoregressive (IAR) model to fit unequally spaced time series. The IAR model is a discrete representation of the continuous autoregressive model of order 1 (CAR(1)), which is strictly stationary and ergodic. The IAR has some extra flexibility when compared with the CAR(1). In the model assumptions, the IAR can assume distributions other than Gaussian^{1}.
The IAR model has two parameters that need to be estimated from the observed series. One parameter represents the variance of the process and the other one the value of the autocorrelation function (ACF). A limitation of the IAR model (and also of the CAR(1) model) is that the ACF derived from the estimated model parameters – via Eq. (1) – is positive only. This is a limitation that the regular autoregressive model does not have; the latter can detect both positive and negative values of the ACF.
Negative values of the ACF appear often in some disciplines. For instance, they have been detected in some physical phenomena, such as the velocity for hard spheres (e.g., Williams et al. 2006; Alder & Wainwright 1970). In financial time series, it is common to find significant negative autocorrelation for weekly and monthly stock returns (further discussion can be found in Sewell 2011). Negative values of the ACF are also generally observed in stocks with a high trading frequency (e.g., Conrad et al. 1994; Campbell et al. 1997).
Wellknown examples of time series with negative values for the ACF are the antipersistent processes, which are characterized as having negative values of the ACF for all positive lags (Bondon & Palma 2007). One of the best known antipersistent processes is the Kolmogorov energy spectrum of turbulence (Gao et al. 2007). There are several examples of antipersistent processes in meteorology. For instance, Ausloos & Ivanova (2001) detect antipersistence in the fluctuations of the Southern Oscillation Index, such as the sea level pressure. Carvalho et al. (2007) also detected antipersistence in temperature anomalies. Other examples are in financial data; for example, the electricity prices in some Canadian provinces show an antipersistent behavior (e.g., Uritskaya & Uritsky 2015).
The problem of detecting negative values of the ACF in irregularly sampled time series has scarcely been addressed in the literature. Chan & Tong (1987) showed that a discretetime regular autoregressive model of order 1 (AR(1)) with negative coefficient can always be embedded in a suitably chosen continuoustime ARMA(2,1) process, but this is not necessarily a parsimonious solution. Alternatively, when an irregular time series has an antipersistent behavior it can be fitted by the continuoustime ARFIMA (CARFIMA) process (Tsai 2009) with an intermediate memory, that is, the Hurst parameter H is such that 0 < H < 1/2. In this work we propose another alternative that corresponds to an extension of the IAR model. We call this model the Complex Irregular Autoregressive (CIAR) model because it needs complex numbers to represent the series.
Complex stochastic processes have been addressed by several authors, such as Miller (1974) and Picinbono & Bondon (1997). These processes are often used in signalprocessing analyses since for example in telecommunication systems it is common to find complex signals. In addition, Dubois & Glanz (1986) and Sekita et al. (1991) introduced a complex extension of a regular autoregressive model, where they proposed to use the coefficients of these models for shape recognition. Furthermore, Martin (1999) proposed a method to estimate the parameters of the CAR(1) model in a complex time series. However, in these studies it is assumed that both the real and the imaginary part of the complex process are observed. The model proposed here assumes that only the real part of the process is observed and the imaginary part is a latent process.
In this study, we apply the CIAR model to astronomical data, but the model can be applied in any other field in which irregular time series are available. In astronomy, the analysis of the temporal behavior of variable stars, transients, or supernovae attracts significant interest because several properties of the objects can be obtained from the analysis of their light curves (e.g., Richards et al. 2011; Kelly et al. 2014; Elorrieta et al. 2016).
In particular, in the analysis of variable stars the temporal modeling of the light curves is an important step to classify them. From a harmonic model fitted to the light curves of periodic variable stars, several features are extracted which describe the physical behavior of a specific class of variable stars. We implement the CIAR model on the residuals of the harmonic model fitted on variable stars from OGLE (Udalski et al. 1999) and HIPPARCOS (Perryman et al. 1997). Under this scenario, our aim is to assess whether the harmonic model is capable of describing the temporal structure of the light curves of variable stars or some autocorrelation remains on the residuals.
The structure of this paper is as follows. In Sect. 2 we present the irregular discrete time series models: the IAR model proposed by Eyheramendy et al. (2018) and the CIAR model. In both cases the maximum likelihood estimation procedure is described. In the case of the CIAR model, we also provide expressions for forecasting unobserved measurements. In Sect. 3 we assess the variance and bias of the estimated parameters of the CIAR model using Monte Carlo simulations. Furthermore, we perform simulations in order to compare the performance in fitting negative values of the ACF of wellknown time series models and the CIAR model. In Sect. 4 we apply the CIAR model to light curves of variable stars from the OGLE and HIPPARCOS surveys. We also implement the CIAR model on an AGN to demonstrate the forecasting procedure. We compare the results obtained with the IAR model and illustrate some applications for these data. The article ends with a discussion and proposals for future work in Sect. 5.
2. Methods
2.1. Irregular Autoregressive (IAR) model
Letting {y_{tj}} be a time series observed at irregular times, where {t_{j}} is an increasing sequence of observational times for j = 1, …, n, the irregular autoregressive (IAR) process is defined as
where ϕ is the parameter of the ACF and ε_{tj} is a white noise sequence^{2} with zero mean and unit variance. We note that
The autocovariance function between two observational times, s and t, with s < t, is given by , and the ACF, . Therefore, if 0 < ϕ < 1 the sequence {y_{tj}} corresponds to a secondorder or weakly stationary process, that is, the time series has constant mean and finite second moment, and possesses an autocovariance function.
Furthermore, under some regularity conditions, the process {y_{tj}} is strictly stationary and ergodic (Eyheramendy et al. 2018).
We note that the IAR process is an extension of the regular autoregressive process. If t_{j} − t_{j − 1} = 1 is assumed, the IAR process becomes the autoregressive model of order 1 (AR(1)). Also, the IAR process is equivalent to the continuous autoregressive process of order 1 (CAR(1)) when a Gaussian distribution is assumed on the white noise sequence ε_{tj}. However, the IAR process is more flexible since it allows also nonGaussian data (Eyheramendy et al. 2018).
The finite past predictor of the process at time t_{j} is given by,
where the initial value is . Consequently, e_{t1} = y_{t1} and . Furthermore, is the innovation with variance .
The estimation of the model parameters can be performed by maximum likelihood. Minus the loglikelihood of the process when a Gaussian distribution is assumed on ε_{tj} is given by
For other distributional assumptions it can be written similarly. We can obtain the maximum likelihood estimator of by directly maximizing the loglikelihood (3),
but it is not possible to find a closed form expression for the maximum likelihood estimator of ϕ. However, iterative methods can be used.
A drawback of this model is that ϕ can only take values in the interval Alder & Wainwright (1970), since a negative ϕ to the power of a real (and not integer) number is a complex number. Therefore, the IAR model only allows us to estimate positive values of the autocorrelation. To detect negative values of the ACF, the IAR model must be extended. In the following section we introduce the CIAR which allows for negative as well as positive autocorrelation to be modeled.
2.2. Complex Irregular Autoregressive (CIAR) model
To derive a complex extension of model (1), we follow the approach of Sekita et al. (1991), which builds a complex autoregressive model for regular times. Supposing that x_{tj} is a complex valued sequence, such that, x_{tj} = y_{tj} + iz_{tj} ∀j = 1, …, n, and likewise, ϕ = ϕ^{R} + iϕ^{I} is the complex coefficient of the model and is a complex white noise, we define the CIAR process as,
where and . is the modulus of a complex number. We assume that only the real part y_{tj} is observed and that the imaginary part z_{tj} is a latent process. In addition, and , the real and imaginary part of ε_{tj}, respectively, are assumed to be independent with zero mean and positive variance and , respectively, where c is a fixed parameter that takes values in ℝ^{+}. Generally, we assume c = 1, and the initial values are and . In the following lemma we state some of the properties of this process.
Lemma 1. Consider the CIAR process x_{tj} described by Eq. (5). Define , and ρ_{k} as the variance, autocovariance, and autocorrelation, respectively, of the process x_{tj}. Subsequently, the expected value, the variance, the autocovariance, and autocorrelation of the process respectively satisfy

(a)
𝔼(x_{tj})=0,

(b)

(c)

(d)
,
where Δ_{k} = t_{j + k} − t_{j} denotes the time differences between the observational times t_{j + k} and t_{j}. In addition, is the complex conjugate of x_{tj}. See Appendix A for a proof of this lemma.
If ϕ=ϕ^{R} + iϕ^{I} < 1, the results on Lemma 1 above show that the complex sequence x_{tj} is a weakly stationary process. We note that the ACF ρ_{k} of the CIAR process decays at a rate ϕ^{tj + k − tj} (the socalled exponential decay). This autocorrelation structure is different to an antipersistent or intermediate memory CARFIMA process, since the ACF of the latter decays more slowly than an exponential decay. Thus, although both models can fit irregular time series with negative values of the ACF, the appropriate use of these models will depend on the correlation structure of the data. To make a decision about the most suitable model for the data, techniques based on the likelihood of the data, such as AIC, BIC, and so on, can be used.
In what follows, we express the CIAR model in terms of a statespace system. This representation allows us: (i) to implement the Kalman filter to obtain maximum likelihood estimators of the parameters of the model; and (ii) to forecast unobserved measurements.
2.3. Statespace system
A linear statespace system may be described by the following equations.
where Eq. (6) is known as the state equation which determines a vdimensional state variable X_{t}. Equation (7) is called the observation equation, which expresses the wdimensional observation Y_{t}. In addition, F_{t} is a sequence of v × v matrices called the transition matrices, and G ∈ ℝ^{w × v} is the observation linear operator of the observation matrix. Finally, W_{t} ∼ WN(0, R_{t}), V_{t} ∼ WN(0, Q_{t}) and V_{t} is uncorrelated with W_{t}.
In order to represent the CIAR model in a statespace system we need to rewrite Eq. (5). In the following lemma an alternative way of writing the CIAR model is proposed.
Lemma 2. The CIAR process described by Eq. (5) can be expressed by the following equation.
where , , δ_{j} = t_{j} − t_{j − 1}, and ϕ = ϕ^{R} + iϕ^{I}. See Appendix B for a proof of this lemma.
By following the representation of the CIAR model described in Lemma 2, we can express the observed process as
and the latent process as
We note that the observed process y_{tj} is an IAR with parameter ϕ, if we assume and . In addition, it is straightforward to show that is equivalent to ϕ_{I} = 0.
Another important consideration is that the observed process y_{tj} does not depend directly on . Consequently, the variance of the imaginary part c is a nuisance parameter, in the sense that it takes any value in ℝ^{+} and does not cause significant changes in the model.
Equation (8) can be represented by the statespace system of Eqs. (6)–(7) assuming t = t_{j} and . Given that only y_{tj} are actually observed, we obtain Y_{t} = y_{tj}. Therefore, G = (1 0) is the observation matrix under this representation.
To complete the specification we define the transition matrix as and the noise of Eqs. (6) and (7) as and W_{tj} = 0, respectively. Finally, the observation and state equations of the statespace representation of the CIAR model are,
We note that in this representation, the transition matrix and the variance of noise term of the state equation Q_{tj} = σ_{tj}^{2}𝕍(ε_{tj}) depend on time.
Lemma 3. Now we let . If supα_{tj} < 1, the process in Eqs. (11)–(12) is stable and has a unique stationary solution given by
where . Appendix C presents a proof of this lemma.
2.4. Estimation
For the statespace model of Eqs. (6)–(7), the onestep predictors X̂_{tj} = P_{tj−1}(X_{tj}) and their error covariance matrices are unique and determined by the initial values: and . Using the properties of model (5) we can rewrite Ω_{t1} as,
The Kalman recursions, for j = 1, …, n − 1 are defined as follows.
where {ν_{tj}} is called the innovation sequence.
The maximum likelihood estimators of the CIAR model parameters ϕ^{R} and ϕ^{I} can be obtained by minimizing the reduced likelihood defined as,
where Λ_{tj} and ν_{tj} come from the Kalman recursion. We developed scripts in the statistical language/software R and in Python to perform the estimation of the parameters of the model using Kalman recursions.
Another feature of this model is that it allows forecasting, the process of which is explained below.
2.5. Forecasting
Using the spacestate Eqs. (11)–(12) it is straightforward to define a forecasting expression for the CIAR model. We note that from the Kalman filter recursions (14) we obtain the estimates of the state X̂_{tj}, Λ_{tj}, Θ_{tj} and ν_{tj} at the last time point t_{n}. A onestep forecasting for the CIAR model can be obtained from the following equations.
where and δ = t_{n + 1} − t_{n}.
In addition, a confidence interval for the forecast Ŷ_{tn+1} can be defined by,
where e_{tn+1} = Y_{tn+1} − Ŷ_{tn+1} is the forecast error with variance 𝕍(e_{tn + 1}) = σ^{2}(1 − ϕ^{tj − tj − 1}^{2}) and z_{1 − α/2} is the 1 − α/2 quantile of the standard normal distribution.
3. Simulation results
3.1. Assessing the estimation performance of the CIAR model
In this section, we assess the performance of the estimation procedure proposed for the CIAR model. We perform Monte Carlo experiments based on 1000 repetitions of each simulation. In each repetition we generate a CIAR sequence from the model (5) using coefficients with different positive and negative values for the real part ϕ^{R}. In addition, both the imaginary part of the coefficient and the imaginary variance are set to ϕ^{I} = 0 and c = 1, and the real and imaginary errors are assumed to follow a Gaussian distribution with a mean equal to zero and a variance of one. The irregular times are generated using the following mixture of two exponential distributions,
We choose λ_{1} = 15 and λ_{2} = 2 as the means of each exponential distribution, respectively, and ω_{1} = 0.15 and ω_{1} = 0.85 as their respective weights. Under these parameters, the mean of the time differences in the simulated data is ≈3.95. The rationale for choosing this distribution came from mimicking time gaps observed in surveys such as the VVV^{3} where some observations are very close to each other followed by observations that are more separated from each other.
Table 1 shows the results of the Monte Carlo simulations, which suggest that the finitesample performance of the proposed methodology is accurate, both for positive and negative values of the parameter ϕ^{R}. In addition, we also estimate the parameter ϕ of the IAR model in each simulation. A precise IAR coefficient estimate is obtained when ϕ^{R} is positive, but when the CIAR process is generated with negative ϕ^{R} the estimation of the IAR coefficient is close to zero. This result is important since it shows that the IAR model cannot detect negative values of the ACF, unlike the CIAR model. Finally, we note that the accuracy of the estimated values does not depend on the magnitude of the coefficient.
Maximum likelihood estimation of complex ϕ computed by the CIAR model in the real part of the simulated CIAR data.
Another method commonly used for estimating autocorrelation in irregular time series is the discrete correlation function (DCF; Edelson & Krolik 1988). This method is based on computing a Pearson correlation coefficient between data pairs. Each data pair is composed by observations with a time difference in the interval (τ − δ/2, τ + δ/2) where τ is the order of the autocorrelation and δ is the bin window size. In order to assess whether this method can detect the autocorrelation of the CIAR process, we implement the DCF using the package sour of R (Edelson et al. 2017). The last two columns in Table 1 correspond to the mean of the DCF estimates of order τ = 1 and its standard deviation, respectively. We note that the DCF estimates are close to the ϕ^{R} parameter when it is positive, but with a large standard deviation. On the other hand, when ϕ^{R} is negative, the DCF estimates are not accurate.
3.2. Comparison of the CIAR with other time series models
We perform a simulation experiment to compare the performance of wellknown time series models, that is, IAR, AR(1), ARFIMA and CAR(1), in fitting a CIAR process. This experiment consists in generating 1000 sequences of the CIAR process {y_{1}, …, y_{n}} with length n = 300. In order to generate positive and negative values of the ACF, ϕ^{R} = 0.99 and ϕ^{R} = −0.99 are used. The remaining parameters are set as ϕ^{I} = 0, c = 1 and the irregular times are generated with a mixture of two exponential distributions. We then fit each sequence using the different timeseries models, and the CIAR model. The AR(1), ARFIMA, and CAR(1) models were estimated by maximum likelihood using the functions arima, arfima, and cts, respectively, from the statistical software R. To compare its performance we compute the root mean squared error (RMSE). Figure 1a shows that the RMSE estimated by the irregular timeseries models is smaller in comparison with that of either the AR or ARFIMA model which assume regular sampling. However, as can be seen in Fig. 1b the CIAR model shows significantly better performance in fitting the negatively correlated processes than the other implemented models. Therefore, we verify that a CIAR process with a large and negative value of ϕ^{R} cannot be correctly modeled with the conventional timeseries models, including those that assume irregular sampling.
Fig. 1. Boxplot of the root mean squared error computed for the fitted models on the 1000 sequences simulated of the real part of the CIAR process. In panel a, each CIAR process was generated using ϕ^{R} = 0.99. In panel b, each CIAR process was generated using ϕ^{R} = −0.99. The other parameters of the models are defined as ϕ^{I} = 0, c = 0 and length n = 300. The observational times are generated using a mixture of Exponential distribution with λ_{1} = 15 and λ_{2} = 2, ω_{1} = 0.15 and ω_{2} = 0.85. 

Open with DEXTER 
3.3. Computation of the autocorrelation in a harmonic model
An important advantage of the CIAR process over other timeseries models for unequally spaced observation is its ability to model weakly stationary time series with negative as well as positive values of the ACF. A wellknown example of a time series that can be negatively correlated is the following harmonic process,
where f is the frequency of the process and ϵ_{ti} is a Gaussian sequence with a mean of zero and a variance of σ^{2}. In addition, the amplitude A is a fixed parameter and the phase ψ is a random variable with uniform distribution between −π and π. We note that the weakly stationarity of the process y_{ti} is guaranteed when this distribution for ψ is assumed (for more details, see e.g., Lindgren et al. 2013). Assuming regular observational times, the onestep autocorrelation is given by ρ_{1} = cos(f) (Broersen 2006). This result is also satisfied under irregular times. We note that this autocorrelation is negative for f ∈ (π/2, π). In addition, we note that for higherfrequency values the harmonic process (18) becomes more antipersistent (Alperovich et al. 2017).
We performed a simulation study in order to assess whether the CIAR model can detect the correlation structure of an irregular harmonic model. We generated the irregular observational times t_{i} with i = 1, …, n using the mixture of exponentials distributions (Eq. (17)). The process y_{ti} is simulated with length n = 300, amplitude A = 20, and a unit variance for ϵ_{ti}. Simulations using (18) are run with k = 200 different frequencies taken equally spaced from the interval (0, π). We fit a CIAR model to each simulated sequence. In Fig. 2 the parameters estimated (yaxis) from the CIAR model for each of the 200 frequencies (xaxis) are shown as the black line, and the mapping of the frequency values to the theoretical autocorrelation cos(f) is depicted as a red line. We note that the ϕ^{R} parameter estimated by the CIAR model fits the theoretical values almost perfectly.
Fig. 2. Estimated coefficients ϕ^{R} (yaxis) by the CIAR model in k = 200 harmonic processes generated using frequencies (xaxis) in the interval (0, π). The black line corresponds to the coefficients estimated by the CIAR model. The red line is the theoretical autocorrelation of the process y_{ti} 

Open with DEXTER 
4. Application of the CIAR model to astronomical data
In this section we show some results of the implementation of the CIAR model in astronomical time series. Astronomical data are naturally irregularly sampled since it is not always possible to get observational data from optical telescopes due to a dependency on clear skies. Many astronomical objects, such as variable stars, transients, and supernovae, can be characterized by their brightness. Generally, this brightness is represented by a time series (light curve) of this object. We apply the CIAR model to the residuals of a harmonic model fitted to the light curves of variable stars.
4.1. Modeling light curves
One of the most important challenges in the analysis of variable stars is to classify them based on their temporal behavior. The pulsating variables represent a particular class of variable stars that are characterized by having a periodical behavior. Therefore, the light curves of pulsating variable stars are generally fitted by a harmonic model (see e.g. Debosscher et al. 2007; Richards et al. 2011; Elorrieta et al. 2016). The pharmonic model is defined as
where f_{1} corresponds to the dominant frequency estimated by the generalized LombScargle periodogram (GLS; Zechmeister & Kürster 2009). Following Debosscher et al. (2007), for example, p = 4 is assumed to fit the light curves. The main goal of implementing the irregular time series models in light curves from periodic variable stars is to detect whether the harmonic model is sufficient to capture all the temporal dependency in the light curve. Otherwise, the residuals of the harmonic model remain with autocorrelation.
We apply the CIAR model to the residuals of the harmonic model fitted on light curves of variable stars from the optical surveys OGLE and HIPPARCOS. We use these data since many classes of variable stars are available in the catalogue of these surveys.
In order to fit both models on these data, each light curve of the OGLE and HIPPARCOS surveys was fitted by a fourharmonic model (Eq. (19)). The residuals of each harmonic model are then fitted using both the IAR and CIAR models. In Fig. 3, we note that there is a high correlation between the estimated coefficients and of the IAR and CIAR models, respectively, on the light curves when both values are positive, which is consistent with the results of the Monte Carlo simulations.
Fig. 3. Plot of ϕ^{R} vs. ϕ_{IAR}, the parameter coefficients estimated by the CIAR and IAR models, respectively, from light curves in OGLE and HIPPARCOS. 

Open with DEXTER 
However, we observe that several cases identified as uncorrelated by the IAR model, that is, ϕ_{IAR} ≈ 0, have a high but negative autocorrelation estimated by the CIAR model. In other words, these light curves remain with negative dependency structure on the residuals after fitting the harmonic model. Doublemode cepheids (DMCEP) and eclipsing binaries (EB) make up the majority of these variable stars.
The time structure detected can be due to the fact that a light curve was incorrectly fitted by the harmonic model, for example when using an incorrect period. Also the CIAR model can detect a correlation structure in the residuals of a harmonic model fitted to the light curve of a multiperiodic variable star. Intuitively, if the light curve has two or more periods, a harmonic model fitted using only the dominant frequency is not sufficient to account for all its temporal dependency structure.
It is also important to assess the goodness of fit performance of the IAR and CIAR models on these data. We compute the RMSE after fitting each irregular model on the residuals of the harmonic model. These results do not vary significantly when ϕ^{R} is positive. However, if this coefficient is negative, the RMSE estimated by the CIAR model is smaller than the one obtained when we fit these data using the IAR model, as can be seen in Fig. 4. This result is consistent with those obtained in Sect. 3.2.
Fig. 4. Kernel density of the RMSE computed for the residuals of harmonic fit in the light curves when the CIAR coefficient is negative. The red density corresponds to the RMSE computed using the CIAR model, and the blue density corresponds to the RMSE computed using the IAR model. 

Open with DEXTER 
4.2. Statistical test for the autocorrelation parameter
The magnitude of the coefficients estimated by the irregular time series models depends on the period of the light curve. Light curves with short periods have, in general, smaller estimated coefficients. In other words, a small value of the estimated coefficient does not always imply uncorrelated residuals. If the light curve has a large period, a small coefficient could mean uncorrelated residuals. However, if the variable star has a shorter period, it could mean the opposite. Therefore, we cannot make decisions based only on the value of the parameter estimated by the model.
To overcome this problem we designed a statistical test that can assess when a value of ϕ is significantly different from (larger than) the values that can arise by chance. To do that, we estimated the distribution of the parameter for time series with autocorrelation. We then assessd how likely it is to observe a particular value of under this distribution. Following the methodology proposed by Eyheramendy et al. (2018) we selected 38 different frequencies to fit each light curve incorrectly, where each of them is a variation of the correct period in the interval (f_{1} − 0.5f_{1}, f_{1} + 0.5f_{1}). The factor f_{1} corresponds to the correct frequency of the light curve. To develop the test we assumed that the follows a Gaussian distribution, estimated using the 38 obtained from the CIAR fit of the time series that each assume a different incorrect frequency from the interval. Consequently, the null hypothesis here is that the belongs to this Gaussian distribution. We note that the test is formulated so that each estimated coefficient is compared only with the estimated ones in the same light curve fitted incorrectly.
Using this test, there are some examples in which the IAR model cannot distinguish if the model is correctly fitted or not, but the CIAR model can always do this. Figure 5 shows an example of a RRc star observed by the HIPPARCOS survey. Figure 5a shows the perfect fit of the harmonic model to the light curve. Figures 5b and c show that the CIAR and IAR models, respectively, have an estimate of the parameter close to zero on the residuals of the 39 fitted models. However, the CIAR model gives a value of significantly smaller than the remaining ones when the light curve is correctly fitted, giving a pvalue close to zero (Fig. 5b).
Fig. 5. Panel a: light curve of a RRc star observed by the HIPPARCOS survey. The continuous blue line is the harmonic best fit. Panel b: natural logarithm of the absolute value of the estimated parameter by the CIAR model on the residuals of the harmonic model fitted with different frequencies; xaxis shows percentage variation from the correct frequency, yaxis shows the natural logarithm of . Panel c: natural logarithm of the estimated parameter by the IAR model on the residuals of the harmonic model fitted with different frequencies; xaxis shows the percentual variations from the correct frequency, yaxis shows the natural logarithm of . 

Open with DEXTER 
4.3. Classification features estimated from the irregular timeseries models
One of the most important aims in the light curve analysis from variable stars is to find features that can discriminate one class of variable star from another. Finding a good feature is the key to building a classifier with good performance to detect stars of a given class. Generally, these features are extracted from the temporal behavior of the brightness of each variable star.
In this study, we propose to use the parameter estimated by the CIAR model as a feature. For example, for a multiperiodic variable star, it can then be expected that a harmonic model fitted using only the dominant frequency will not be enough to describe its temporal dependency structure. In other words, if we fit the CIAR and IAR models to the residuals of the harmonic model in Eq. (19) we should obtain values for the parameter of the ACF that are significantly different from zero. Therefore, it is natural to think that these coefficients are capable of distinguishing multiperiodic variable stars from other classes. In the OGLE and HIPPARCOS catalogs there are two classes of multiperiodic variable stars: the doublemode RR Lyraes (RRDs) and the doublemode Cepheids (DMCEPs).
We implement the IAR and the CIAR models on the residuals after fitting a harmonic model with only one period to the light curves of these classes. We found several examples of negative autocorrelation in the residuals of the harmonic model, which cannot be detected using the IAR model. One of these examples is a biperiodic doublemode Cepheid (OGLE ID:175210) in which the IAR and the CIAR coefficients estimated on these residuals are and . Consequently, we focus on the CIAR model estimates.
The features that can be extracted from the CIAR model are the parameter ϕ^{R} and the pvalue associated to the test performed on ϕ^{R}. It is interesting to assess whether these features can separate the multiperiodic classes from the other RRLyraes and Cepheids, respectively. Figures 6a and b show the distribution of the pvalues computed by the CIAR model. As can be seen from these figures there are significant differences between the classes of RRLyraes and Cepheids in the distributions of the computed pvalues. We note that the RRD and DMCEP classes take larger values in comparison to the other classes.
Fig. 6. Panel a: boxplot of the pvalue estimated from the CIAR model in the RRLyraes variable stars separated by subclass. Panel b: boxplot of the pvalue estimated from the CIAR model in the Cepheids variable stars separated by subclass. 

Open with DEXTER 
The use of the pvalue as a feature reflects the multiperiodic behavior of the RRD and DMCEP classes.
4.4. Forecasting astronomical data
In this section, we illustrate the forecasting procedure implemented for the CIAR model. For this we use the light curve of the AGN MCG63015 (Lira et al. 2015) measured in the Kband. The time series of this object has n = 237 observations taken over a period of approximately 4.5 years.
Initially, we normalize the time series and use the first 90% of the data to estimate the parameters of the CIAR model. We forecast the next observation at the time t_{j + 1} given the observational times t_{1}, …, t_{j} corresponding to the first 90% of the data. Later, we include the observation at the moment t_{j + 1} and reestimate the model to forecast the observation at the following time t_{j + 2}. This procedure is repeated iteratively until the remaining 10% of the data is forecasted, obtaining the vector of onestep forecasted values (ŷ_{j+1},…,ŷ_{n}). Figure 7a shows the normalized MCG63015 light curve and the forecasted values represented with red dots.
Fig. 7. Panel a: normalized Kband MCG63015 light curve. The red dots are the forecasted values. Panel b: zoom of the last 10% of the MCG63015 light curve. The red dots are the forecasted values using the CIAR model and the gray bars are the confidence intervals at the 90% level. 

Open with DEXTER 
The parameters estimated for this light curve were and for the first 90% of the data and and for all the data. In addition, we also compute the confidence interval at a 90% level for each forecasted value, which is shown in Fig. 7b. We note that the interval size is larger for larger time gaps.
5. Discussion
In this work we present an extension of the irregular autoregressive (IAR) model (Eyheramendy et al. 2018), that we call the Complex Irregular Autoregressive (CIAR) model. As opposed to the IAR model that can only estimate positive values of the ACF, the CIAR model can estimate both positive and negative values of the ACF. We have shown that this model is weakly stationary and its statespace representation is stable under regular assumptions. We propose a maximum likelihood estimation procedure for the parameters of this model, where the solution is reached using Kalman recursions. We developed a code in R and Python to perform an estimation of the model.
There is a strong connection between the three models: CAR(1), IAR, and CIAR. Assuming a null imaginary part and positive real part of the complex autocorrelation parameter, the CIAR model becomes the IAR model. The connection between both models is verified by Monte Carlo simulations in Sect. 3.1. Also, when Gaussian data is fitted in the models, the IAR and the CAR(1) are equivalent. Both the IAR and the CIAR models fit the irregularity of the time gaps between observations as discrete times as opposed to the CAR(1) model which considers time as a continuous variable. At this stage, only the IAR model can fit data that is not Gaussian; the CIAR and the CAR models can only fit Gaussian data.
We illustrate the contribution of the CIAR model in two applications. First, the CIAR process can fit irregular time series with negative values of the ACF, the shape of which is an exponential decay. Second, the CIAR process can identify series with negative values of the ACF such as a high frequency harmonic model or an antipersistent process. In both applications, we show with simulated data that the CIAR model performs better than other popular timeseries models.
A negative autocorrelated time series is characterized by more fluctuations (Box et al. 2015). Take for example Δ_{t} = y_{t} − y_{t − 1} and the variance . This variance is larger for ϕ < 0 than for positive values of ϕ. Figure 8 illustrates the behavior mentioned above from a CIAR process simulated with ϕ^{R} = 0.99 and ϕ^{R} = −0.99. We note that there are more oscillations in the CIAR process generated with negative autocorrelation.
Fig. 8. Panel a: CIAR process with positive autocorrelation generated using parameters ϕ^{R} = 0.99, ϕ^{I} = 0, c = 0 and length n = 300. Panel b: CIAR process with negative autocorrelation generated using parameters ϕ^{R} = −0.99, ϕ^{I} = 0, c = 0 and length n = 300. 

Open with DEXTER 
The application of the proposed model is performed on astronomical data. Particularly, we use the light curves of variable stars observed by the OGLE and HIPPARCOS surveys. As many variable stars show periodical behavior in terms of their brightness, it is common to fit the light curves by a harmonic model. We have discussed that the irregular time series models are useful to determine whether or not the residuals of the harmonic fit are still autocorrelated. When a time dependency structure remains on the residuals, we can conclude either that the light curve was not correctly fitted or that it corresponds to a multiperiodic variable star. Furthermore, we noticed that the estimated coefficients are highly related to the frequency of the light curves. In order to interpret the coefficient correctly, we have developed a statistical test.
After fitting both models to the light curves, we can see a strong relationship between the IAR and CIAR models when the coefficient ϕ^{R} of the CIAR model is positive. However, we have also found several cases of negative values of the ACF of light curves that the IAR model is not capable of identifying. In addition, we show examples where the inability to estimate negative values of ϕ from the IAR model prevents us from finding multiperiodic variable stars or correctly detecting whether the harmonic model is incorrectly specified.
It is expected that negative values of the ACF will be less frequent than positive values. This hypothesis is reflected in the astronomical datasets analyzed. However, there are several examples of negative values obtained from time series of light curves from the OGLE and HIPPARCOS surveys. This result shows how important it is to have a model that can identify the negative as well as the positive time dependencies in an irregular time series. Among the timeseries models that assume irregular sampling, only the CARFIMA process can fit a negative autocorrelated time series if this has an antipersistent behavior. The contribution of the model proposed here is that it can model series with negative exponential decay in the ACF. In other words, both processes can be negatively autocorrelated, but still have different correlation structures.
In addition, we have shown that the pvalues obtained from the test proposed in this work for the CIAR model are useful for characterizing the multiperiodic classes of RRLyraes (RRDs) and Cepheids (DMCEPs). This result indicates that the pvalue can be an important feature for a machinelearned classifier implemented on these classes. In a future study, a classifier for multiperiodic variable stars will be implemented using this feature.
The main aim of this work is to continue developing models for irregularly observed time series where time is considered discrete as opposed to continuous. This will allow us to extend the popular ARMA models for regular observations to the irregular case. We have shown that the CIAR model extends the IAR model by allowing negative as well as positive autocorrelations to be captured. We will continue working on expanding the scope of irregularly observed time series.
Acknowledgments
Support for this research was provided by grant IC120009, awarded to The Millennium Institute of Astrophysics, MAS, and from Fondecyt grant 1160861. F.E. acknowledges support from CONICYTPCHA (Doctorado Nacional 2014 21140566).
References
 Alder, B., & Wainwright, T. 1970, Phys. Rev. A, 1, 18 [NASA ADS] [CrossRef] [Google Scholar]
 Alperovich, Y., Alperovich, M., & Spiro, A. 2017, Tenth International Conference Management of LargeScale System Development (MLSD), 1 [Google Scholar]
 Ausloos, M., & Ivanova, K. 2001, Phys. Rev. E, 63, 047201 [NASA ADS] [CrossRef] [Google Scholar]
 Bondon, P., & Palma, W. 2007, J. Time Ser. Anal., 28, 261 [CrossRef] [Google Scholar]
 Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. 2015, Time Series Analysis: Forecasting and Control, 5th edn. (John Wiley& Sons, Inc.) [Google Scholar]
 Brockwell, P., & Davis, R. 2002, Introduction to Time Series and Forecasting (New York: SpringerVerlag) [CrossRef] [Google Scholar]
 Broersen, P. M. T. 2006, Automatic Autocorrelation and Spectral Analysis (Secaucus, NJ, USA: SpringerVerlag, New York, Inc.) [Google Scholar]
 Campbell, J. Y., Lo, A. W. C., & MacKinlay, A. C. 1997, The Econometrics of Financial Markets (Princeton University Press), 632 [Google Scholar]
 Carvalho, L. M. V. D., Tsonis, A. A., Jones, C., Rocha, H. R. D., & Polito, P. S. 2007, Nonlinear Processes Geophys., 14, 723 [NASA ADS] [CrossRef] [Google Scholar]
 Chan, K. S., & Tong, H. 1987, J. Time Ser. Anal., 8, 277 [CrossRef] [Google Scholar]
 Conrad, J. S., Hameed, A., & Niden, C. 1994, J. Finance, 49, 1305 [CrossRef] [Google Scholar]
 Debosscher, J., Sarro, L. M., Aerts, C., et al. 2007, A&A, 475, 1159 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Dubois, S. R., & Glanz, F. H. 1986, IEEE Trans. Pattern Anal. Mach. Intell., 8, 55 [CrossRef] [Google Scholar]
 Edelson, R. A., & Krolik, J. H. 1988, ApJ, 333, 646 [NASA ADS] [CrossRef] [Google Scholar]
 Edelson, R., Gelbord, J., Cackett, E., et al. 2017, ApJ, 840, 41 [NASA ADS] [CrossRef] [Google Scholar]
 Elorrieta, F., Eyheramendy, S., Jordán, A., et al. 2016, A&A, 595, A82 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
 Eyheramendy, S., Elorrieta, F., & Palma, W. 2018, MNRAS, 481, 4311 [NASA ADS] [CrossRef] [Google Scholar]
 Feigelson, E. D., Babu, G. J., & Caceres, G. A. 2018, Front. Phys., 6, 80 [CrossRef] [Google Scholar]
 ForemanMackey, D., Agol, E., Ambikasaran, S., & Angus, R. 2017, AJ, 154, 220 [NASA ADS] [CrossRef] [Google Scholar]
 Gao, J., Cao, Y., Tung, W. W., & Hu, J. 2007, Multiscale Analysis of Complex Time Series: Integration of Chaos and Random Fractal Theory, and Beyond (WileyInterscience) [Google Scholar]
 Kelly, B. C., Becker, A. C., Sobolewska, M., Siemiginowska, A., & Uttley, P. 2014, ApJ, 788, 33 [NASA ADS] [CrossRef] [Google Scholar]
 Lindgren, G., Rootzén, H., & Sandsten, M. 2013, Stationary Stochastic Processes for Scientists and Engineers (Chapman and Hall) [CrossRef] [Google Scholar]
 Lira, P., Arévalo, P., Uttley, P., McHardy, I. M. M., & Videla, L. 2015, MNRAS, 454, 368 [NASA ADS] [CrossRef] [Google Scholar]
 Martin, R. 1999, Sign. Proces., 77, 139 [CrossRef] [Google Scholar]
 Miller, K. 1974, in Complex Stochastic Processes: an Introduction to Theory and Application (AddisonWesley Publishing Company, Advanced Book Program), Adv. Book Program [Google Scholar]
 Perryman, M. A. C., Lindegren, L., Kovalevsky, J., et al. 1997, A&A, 323, L49 [NASA ADS] [Google Scholar]
 Picinbono, B., & Bondon, P. 1997, IEEE Trans. Sign. Proces., 45, 411 [NASA ADS] [CrossRef] [Google Scholar]
 Rehfeld, K., Marwan, N., Heitzig, J., & Kurths, J. 2011, Nonlinear Processes Geophys., 18, 389 [NASA ADS] [CrossRef] [Google Scholar]
 Richards, J. W., Starr, D. L., Butler, N. R., et al. 2011, ApJ, 733, 10 [NASA ADS] [CrossRef] [Google Scholar]
 Sekita, I., Kurita, T., & Otsu, N. 1991, Complex Autoregressive Model and its Properties (Electrotechnical Laboratory) [Google Scholar]
 Sewell, M. 2011, Characterization of Financial Time Series [Google Scholar]
 Tsai, H. 2009, Bernoulli, 15, 178 [CrossRef] [MathSciNet] [Google Scholar]
 Udalski, A., Soszynski, I., Szymanski, M., et al. 1999, Acta Astron., 49, 223 [Google Scholar]
 Uritskaya, O. Y., & Uritsky, V. M. 2015, Energy Econ., 49, 72 [CrossRef] [Google Scholar]
 Williams, S. R., Bryant, G., Snook, I. K., & van Megen, W. 2006, Phys. Rev. Lett., 96, 087801 [NASA ADS] [CrossRef] [Google Scholar]
 Zechmeister, M., & Kürster, M. 2009, A&A, 496, 577 [NASA ADS] [CrossRef] [EDP Sciences] [Google Scholar]
Appendix A: Proof of Lemma 1
Consider the CIAR process x_{tj} described by Eq. (5). We note that 𝔼(y_{tj}) = 0 and 𝔼(z_{tj}) = 0. Therefore, x_{tj} = y_{tj} + iz_{tj} is such that 𝔼(x_{tj}) = 0. According to the definition of the model, we have
Applying expectation 𝔼 and using the properties of the model (5)
from which we obtain
The autocovariance of the process is defined by , such that
It can be shown that by recursion the autocovariance function is given by
and therefore, the autocorrelation can be expressed as □.
Appendix B: Proof of Lemma 2
The CIAR model is defined as . Let us focus on the term (ϕ^{R} + iϕ^{I})^{tj − tj − 1},
Using the polar representation for complex numbers we obtain
Using the De Moivre formula, we have
Finally, the Complex IAR model can be represented by the expression
Appendix C: Proof of Lemma 3
At a fixed time t_{j}, the eigenvalues of the transition matrix of the CIAR process satisfy the following equation.
Since , then the process is stable if supα_{tj} < 1. Therefore, under this assumption the CIAR process has the unique stationary solution (Brockwell & Davis 2002) given by
Therefore, the general form can be written as,
Since and F_{tj − k} < 1 due to the stability of the process, then . Finally, if n → ∞ then the unique stationary solution is given by,
All Tables
Maximum likelihood estimation of complex ϕ computed by the CIAR model in the real part of the simulated CIAR data.
All Figures
Fig. 1. Boxplot of the root mean squared error computed for the fitted models on the 1000 sequences simulated of the real part of the CIAR process. In panel a, each CIAR process was generated using ϕ^{R} = 0.99. In panel b, each CIAR process was generated using ϕ^{R} = −0.99. The other parameters of the models are defined as ϕ^{I} = 0, c = 0 and length n = 300. The observational times are generated using a mixture of Exponential distribution with λ_{1} = 15 and λ_{2} = 2, ω_{1} = 0.15 and ω_{2} = 0.85. 

Open with DEXTER  
In the text 
Fig. 2. Estimated coefficients ϕ^{R} (yaxis) by the CIAR model in k = 200 harmonic processes generated using frequencies (xaxis) in the interval (0, π). The black line corresponds to the coefficients estimated by the CIAR model. The red line is the theoretical autocorrelation of the process y_{ti} 

Open with DEXTER  
In the text 
Fig. 3. Plot of ϕ^{R} vs. ϕ_{IAR}, the parameter coefficients estimated by the CIAR and IAR models, respectively, from light curves in OGLE and HIPPARCOS. 

Open with DEXTER  
In the text 
Fig. 4. Kernel density of the RMSE computed for the residuals of harmonic fit in the light curves when the CIAR coefficient is negative. The red density corresponds to the RMSE computed using the CIAR model, and the blue density corresponds to the RMSE computed using the IAR model. 

Open with DEXTER  
In the text 
Fig. 5. Panel a: light curve of a RRc star observed by the HIPPARCOS survey. The continuous blue line is the harmonic best fit. Panel b: natural logarithm of the absolute value of the estimated parameter by the CIAR model on the residuals of the harmonic model fitted with different frequencies; xaxis shows percentage variation from the correct frequency, yaxis shows the natural logarithm of . Panel c: natural logarithm of the estimated parameter by the IAR model on the residuals of the harmonic model fitted with different frequencies; xaxis shows the percentual variations from the correct frequency, yaxis shows the natural logarithm of . 

Open with DEXTER  
In the text 
Fig. 6. Panel a: boxplot of the pvalue estimated from the CIAR model in the RRLyraes variable stars separated by subclass. Panel b: boxplot of the pvalue estimated from the CIAR model in the Cepheids variable stars separated by subclass. 

Open with DEXTER  
In the text 
Fig. 7. Panel a: normalized Kband MCG63015 light curve. The red dots are the forecasted values. Panel b: zoom of the last 10% of the MCG63015 light curve. The red dots are the forecasted values using the CIAR model and the gray bars are the confidence intervals at the 90% level. 

Open with DEXTER  
In the text 
Fig. 8. Panel a: CIAR process with positive autocorrelation generated using parameters ϕ^{R} = 0.99, ϕ^{I} = 0, c = 0 and length n = 300. Panel b: CIAR process with negative autocorrelation generated using parameters ϕ^{R} = −0.99, ϕ^{I} = 0, c = 0 and length n = 300. 

Open with DEXTER  
In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.