Modeling financial interval time series

In financial economics, a large number of models are developed based on the daily closing price. When only the daily closing price is used to model the time series, valuable intra-daily information, such as the maximum and minimum prices, may be discarded. In this study, we propose an interval time series model that includes the daily maximum, minimum, and closing prices, and then apply the proposed model to forecast the entire interval. The likelihood function and the corresponding maximum likelihood estimates (MLEs) are obtained using a stochastic differential equation and the Girsanov theorem. To capture the heteroscedasticity of volatility, we consider a stochastic volatility model. The efficiency of the proposed estimators is illustrated by a simulation study. Finally, based on real data for the S&P 500 index, the proposed method outperforms several alternatives in terms of forecast accuracy.


Introduction
A large number of models have been developed to analyze financial data. Conventionally, most established models are built on the daily closing price. In doing so, valuable intra-daily information, such as the maximum and minimum prices, may be discarded. According to the recent literature, the maximum and minimum prices can be treated as an interval-valued observation, and symbolic data methodologies can be applied to such data. For instance, Billard and Diday [1,2] propose the evaluation of the mean, variance, and covariance, along with regression analysis, based on interval-valued observations. Once time dependency is integrated, their method evolves into the analysis of interval time series. Recent research pays more attention to modeling and forecasting the interval time series process. In this study, we propose an interval time series model and apply it to forecast the consecutive interval.
A naïve approach to interval time series is to treat the maximum and minimum processes as a vector, which leads to the vector autoregressive (VAR) model. However, uncontrolled noise terms can produce a predicted lower value that exceeds the predicted upper value. To deal with this problem, one can recast the interval time series as a bivariate time series of the center and the radius. For example, Neto and Carvalho [3] fit autoregressive models to the center and radius processes separately, which may ignore the correlation between the center and the radius. Arroyo et al. [4] consider a VAR model based on the first-order differenced center process and the radius process. Similarly, Rodrigues

Main results
Following Andersen et al. [13] and Aït-Sahalia et al. [14], the intra-daily log price (the high-frequency data) on the i-th day, $Y_t$, follows the stochastic differential Eq (1), where $W_t$ is a standard Brownian motion. In this study, we assume that all high-frequency data are latent, except for the opening, maximum, minimum, and closing prices. Denote by $X_i = (O_i, U_i, L_i, C_i)$ the observed random vector on the i-th day, where $O_i$, $U_i$, $L_i$, and $C_i$ are the log opening, maximum, minimum, and closing prices, respectively. The log maximum and minimum values are given by $U_i = \max_{i-1<t<i} Y_t$ and $L_i = \min_{i-1<t<i} Y_t$. Applying the Girsanov theorem to $Y_t$, together with the connection between the maximum and the closing price expressed by Theorem 3.7.3 of Shreve [15], we have the following result.
Theorem 1 Suppose that the log price $Y_t$ satisfies the stochastic differential Eq (1), and let
Analogously, we obtain the joint density of the minimum and the closing log prices.
Theorem 2 Suppose that the log price $Y_t$ satisfies (1), and let
Then the joint density of $(L, C)$ conditional on $O = o$ is
In addition, according to Choi and Roh [16], denoting $W^u_t = \sup_{0 \le s \le t} W_s$ and $W^l_t = \inf_{0 \le s \le t} W_s$,
with $a \ge 0$ and $b \le 0$. Applying the Girsanov theorem, we obtain the joint density of the maximum, minimum, and closing log prices in the following theorem.
Theorem 3 Assume that the log price $Y_t$ satisfies (1), and that the conditions of Theorem 1 and Theorem 2 hold. Then the joint density of $(U, L, C)$ conditional on $O = o$ is
$f_{U,L,C \mid O}(u, \ell, c \mid o) =$
where $\ell \le c$, $o \le u$ and
From Theorem 1 and Theorem 2, we can obtain the maximum likelihood estimators (MLEs) of the drift term $\mu$ and the volatility $\sigma^2$ as follows.
Proposition 1 Suppose that the conditions of Theorem 1 and Theorem 2 hold, and let $\{X_i\}_{i=1}^{n}$ be the observed data, with $X_i$ observed on the i-th day, for the realization of $Y$. The likelihood function of $(\mu, \sigma^2)$ based on Theorem 1 is given by
Then the MLEs of $\mu$ and $\sigma^2$ are
Similarly, using Theorem 2, the MLEs of $\mu$ and $\sigma^2$ are
Since $\hat{\mu}_u = \hat{\mu}_\ell$, for simplicity we use the notation $\hat{\mu}$ for both $\hat{\mu}_u$ and $\hat{\mu}_\ell$.
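The maximum-based estimators of Proposition 1 can be illustrated numerically. The sketch below assumes that Eq (1) is the constant-coefficient diffusion dY_t = μ dt + σ dW_t over a day of unit length, and uses the textbook joint density of the running maximum and terminal value of a Brownian motion with drift. The closed forms in `mle_max_close` are derived under that assumption and may differ in presentation from the expressions of Proposition 1; all function names are illustrative.

```python
import math


def loglik_max_close(mu, sigma2, opens, highs, closes):
    """Log-likelihood of (mu, sigma2) from daily (open, high, close) log prices.

    Assumption: within each day dY = mu dt + sigma dW on a unit time interval.
    """
    total = 0.0
    for o, u, c in zip(opens, highs, closes):
        d = 2.0 * u - c - o  # equals (u - o) + (u - c), positive when u > max(o, c)
        total += (
            math.log(2.0 * d)
            - 1.5 * math.log(sigma2)
            - 0.5 * math.log(2.0 * math.pi)
            - d * d / (2.0 * sigma2)
            + mu * (c - o) / sigma2
            - mu * mu / (2.0 * sigma2)
        )
    return total


def mle_max_close(opens, highs, closes):
    """Closed-form maximizer of loglik_max_close under the stated assumption."""
    n = len(opens)
    # Drift estimate: average open-to-close log return.
    mu_hat = sum(c - o for o, c in zip(opens, closes)) / n
    # Variance estimate: uses the maximum through d = 2u - c - o.
    s = sum((2.0 * u - c - o) ** 2 for o, u, c in zip(opens, highs, closes))
    sigma2_hat = (s - n * mu_hat ** 2) / (3.0 * n)
    return mu_hat, sigma2_hat
```

Under the same assumption, the minimum-based version follows by symmetry (replace each log price by its negative), which leaves the drift estimate unchanged in magnitude and is consistent with the remark that the two drift estimators coincide.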

Remark 1 According to Theorem 3, the likelihood function can be written as
$L(\mu, \sigma^2 \mid \tilde{u}, \tilde{\ell}, \tilde{o}, \tilde{c}) = \prod_{i=1}^{n} f_{U,L,C \mid O}(u_i, \ell_i, c_i \mid o_i)$,
where $\tilde{u} = (u_1, \dots, u_n)$, $\tilde{\ell} = (\ell_1, \dots, \ell_n)$, $\tilde{o} = (o_1, \dots, o_n)$, and $\tilde{c} = (c_1, \dots, c_n)$. The MLEs of $\mu$ and $\sigma^2$, denoted by $(\hat{\mu}_{all}, \hat{\sigma}^2_{all})$, can be obtained numerically.
We then calculate the one-step predictions for the log maximum and minimum prices, which involve the term $1 - 2\Phi(-\hat{\mu}/\hat{\sigma})$, where $\hat{\mu}$ and $\hat{\sigma}^2$ are the MLEs based on $X_1, \dots, X_t$. Note that Proposition 1 and Remark 1 provide two candidates for the MLE of $\mu$ (written as $\hat{\mu}$ and $\hat{\mu}_{all}$) and three candidates for the MLE of $\sigma^2$ (written as $\hat{\sigma}^2_\ell$, $\hat{\sigma}^2_u$, and $\hat{\sigma}^2_{all}$); see also the further discussion in Section: Simulations. Note also that $O_{t+1}$ can either be set to $C_t$ or be treated as known, in which case any decision is made after $O_{t+1}$ is revealed.
In real-life applications, it is reasonable to assume that the mean return of each day equals zero. Then, we can obtain a simplified form for the one-step prediction.
Corollary 1 Let the assumptions of Proposition 1 hold, and further assume that μ = 0. Then we have $\hat{U}_{t+1} = O_{t+1} + \sqrt{2\hat{\sigma}^2/\pi}$ and $\hat{L}_{t+1} = O_{t+1} - \sqrt{2\hat{\sigma}^2/\pi}$.
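The one-step prediction can be sketched numerically as the conditional expectation of the next day's maximum and minimum given the opening price. The snippet assumes the within-day dynamics dY_t = μ dt + σ dW_t on a unit interval (an assumed reading of Eq (1)) and uses the standard closed form for the expected running maximum of a Brownian motion with drift; in the driftless case it reduces to the half-width √(2σ²/π). The function names are illustrative.

```python
import math


def _pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_running_max(mu, sigma2):
    """E[max over 0<=s<=1 of (mu*s + sigma*W_s)] for Brownian motion with drift."""
    sigma = math.sqrt(sigma2)
    if mu == 0.0:
        return math.sqrt(2.0 * sigma2 / math.pi)
    z = mu / sigma
    # 2*Phi(z) - 1 = 1 - 2*Phi(-z) = erf(z / sqrt(2)); erf keeps precision for small z.
    return (sigma2 / (2.0 * mu)) * math.erf(z / math.sqrt(2.0)) + mu * _cdf(z) + sigma * _pdf(z)


def predict_interval(open_next, mu_hat, sigma2_hat):
    """Return (lower, upper) one-step predictions of the log minimum and maximum."""
    upper = open_next + expected_running_max(mu_hat, sigma2_hat)
    # The running minimum of a path is minus the running maximum of the negated path.
    lower = open_next - expected_running_max(-mu_hat, sigma2_hat)
    return lower, upper
```

For example, `predict_interval(o, 0.0, s2)` returns an interval symmetric around `o`, while a positive drift estimate shifts both bounds upward.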

Stochastic volatility model
In a stochastic volatility model, the logarithmic price follows a stochastic diffusion equation and the volatility follows another diffusion process; see, for instance, Hull and White [17], Stein and Stein [18], and Heston [19]. Define the stochastic volatility model as follows:
where $(B_t, W_t)_{t>0}$ is a two-dimensional standard Brownian motion, $O$ is the initial log price, and z is a random variable drawn from the stationary distribution of $\sigma_t^2$ and independent of $(B_t, W_t)$. Following Bibby et al. [20], we assume that the drift function $b(\cdot)$ is mean reverting, that is, $b(\sigma_t^2) = \rho(\theta - \sigma_t^2)$. Then the non-negative diffusion function $v(\cdot)$ is uniquely specified by the invariant density of $\sigma_t^2$. For example, if $v(x)$ is proportional to a constant, to $x$, or to $x^2$, the invariant density of $\sigma_t^2$ is normal, gamma, or inverse gamma, respectively. However, if the intra-daily volatility is a stochastic process, the Girsanov theorem cannot be applied straightforwardly. In this section, we assume that $\sigma_t^2$ is stochastic on the discrete times $i = 1, 2, \dots, n$, but has a stationary distribution during a fixed time interval.
To illustrate, we study a particular model. Following Bibby et al. [20], for $i = 1, \dots, n$, the volatility $\sigma_t^2$ satisfies the following diffusion process
where $\tilde{Z}_i$, $i = 1, \dots, n$, are standard normal random variables. By Bibby et al. [20], the stationary distribution of $\sigma_i^2$ is inverse gamma. Then, given the i-th day volatility $\sigma_i^2$, the intra-daily log price $Y_t$ on the i-th day satisfies the following stochastic volatility model,
where $Z_j$, $j = 1, \dots, m$, are independently and normally distributed with mean zero and standard deviation $\Delta = m^{-1}$. Namely, we assume that there are $m$ log prices per day. For simplicity, we further assume that $Z$ and $\tilde{Z}$ are mutually independent.
Then, the joint density of $(U, C, L)$ can be obtained by the Bayesian method as below
The likelihood functions are derived in the following theorem.
Theorem 4 Suppose that the log price $Y_t$ and the volatility $\sigma_t^2$ satisfy (5) and (4), respectively.
where $\ell \le c$, $o \le u$ and

Simulations
We construct the observations as follows. Set the i-th intra-daily log price to satisfy
where $W_\Delta$ is normally distributed with mean 0 and variance $\Delta$, and the sampling frequency is $\Delta = 1/5000$. We record the log opening, maximum, minimum, and closing prices, and repeat the above procedure for $i = 1, 2, \dots, n-1$. We consider three practically oriented experiments based on real observable data. According to the empirical evidence, the higher annualized market volatility is around 0.24, whereas the lower one is around 0.04. We also consider a moderate case with an annualized market volatility of 0.12, and two more volatile cases with annualized market volatilities of 0.36 and 0.48. The daily volatilities are therefore 0.04/250, 0.12/250, 0.24/250, 0.36/250, and 0.48/250. For the drift term, we study two cases for the coefficient of variation: σ/μ = 1 (unit dispersion) and σ/μ = 2 (over dispersion). Proposition 1 gives the MLE $\hat{\mu}$ for $\mu$ and the MLEs $(\hat{\sigma}^2_u, \hat{\sigma}^2_\ell)$ for $\sigma^2$, and Theorem 3 provides the MLEs $\hat{\mu}_{all}$ and $\hat{\sigma}^2_{all}$ for $\mu$ and $\sigma^2$, respectively. For comparison, we consider the conventional MLE $s^2$ for $\sigma^2$ based on discrete-time closing prices, given by
where $R_i = C_i - C_{i-1}$ are the log returns of closing prices and $\bar{R} = (n-1)^{-1} \sum_{i=2}^{n} R_i$. After 1000 replications, the relative error (RE; see, for instance, Helfrick and Cooper [21]) is defined as
where RMSE stands for the root mean square error between the estimators and the true values.
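The data-generating step described above can be sketched as follows. The code assumes an Euler discretization of dY_t = μ dt + σ dW_t with a configurable number of steps per day (5000 in the text), sets each day's opening price to the previous closing price, and computes a close-to-close sample variance as a stand-in for the conventional estimator; the divisor used there, like all function names, is an assumption of this sketch.

```python
import math
import random


def simulate_day(open_price, mu, sigma2, steps, rng):
    """Return (open, high, low, close) log prices for one simulated day."""
    dt = 1.0 / steps
    sd = math.sqrt(sigma2 * dt)  # each increment has variance sigma2 * dt
    y = high = low = open_price
    for _ in range(steps):
        y += mu * dt + sd * rng.gauss(0.0, 1.0)
        high = max(high, y)
        low = min(low, y)
    return open_price, high, low, y


def simulate_ohlc(n_days, mu, sigma2, steps=5000, seed=0, start=0.0):
    """Simulate n_days of daily (open, high, low, close) log prices."""
    rng = random.Random(seed)
    days = []
    o = start
    for _ in range(n_days):
        day = simulate_day(o, mu, sigma2, steps, rng)
        days.append(day)
        o = day[3]  # assumption: the next open equals the previous close
    return days


def closing_price_variance(days):
    """Sample variance of close-to-close log returns R_i = C_i - C_{i-1}."""
    closes = [d[3] for d in days]
    returns = [b - a for a, b in zip(closes, closes[1:])]
    mean = sum(returns) / len(returns)
    # Assumption: divide by the number of returns (maximum-likelihood style).
    return sum((r - mean) ** 2 for r in returns) / len(returns)
```

Because the maximum and minimum are taken over a finite grid, the simulated range slightly understates the continuous-time range; a fine grid such as 5000 steps keeps this discretization effect small.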
The values of $\hat{\mu}$, $\hat{\mu}_{all}$, $\hat{\sigma}^2_u$, $\hat{\sigma}^2_\ell$, $\hat{\sigma}^2_{all}$, and $s^2$ are shown in Tables 1 and 2. The RE of $\hat{\mu}$ is slightly smaller than that of $\hat{\mu}_{all}$, possibly because the infinite series is truncated to a finite sum of ±20 terms. On the other hand, the REs of $\hat{\sigma}^2_u$, $\hat{\sigma}^2_\ell$, and $\hat{\sigma}^2_{all}$ are much smaller than that of $s^2$. In particular, $\hat{\sigma}^2_{all}$ performs best, with the smallest RE. Meanwhile, the relative efficiencies of $s^2$ compared to $\hat{\sigma}^2_u$ and $\hat{\sigma}^2_\ell$ are written as
The results indicate that the proposed estimators ($\hat{\sigma}^2_u$, $\hat{\sigma}^2_\ell$, and $\hat{\sigma}^2_{all}$) display substantial improvements and are stable across scenarios. We conclude that the easy-to-implement estimator $\hat{\mu}$ has a lower relative error than $\hat{\mu}_{all}$. In addition, the estimator $\hat{\sigma}^2_{all}$, which is based on all observations, is more accurate than $\hat{\sigma}^2_u$, $\hat{\sigma}^2_\ell$, and the conventional $s^2$ in terms of relative error.
We obtain the MLEs $\hat{\alpha}$ and $\hat{\beta}$ via Theorem 4, and the MLE $\hat{\sigma}_v$ given by $\hat{\sigma}_v = \hat{\beta}/(\hat{\alpha} - 1)$. For comparison, we then consider the conventional estimator $s^2$, the estimator $\hat{\sigma}^2_{all}$ discussed in Theorem 3 for the constant volatility case, and the volatility estimator proposed by Chou [10,11], denoted by $\hat{\sigma}^2_C$. Since the volatility estimators proposed by Chou [10,11] based on the ranges, upward ranges, and downward ranges are quite similar, we discuss only one particular case among them. For simplicity, we fit a GARCH(1,1) model for $\hat{\sigma}^2_C$; the results are shown in Table 3. In the high volatility case, the relative efficiencies of $s^2$ compared to $\hat{\sigma}_v$, $\hat{\sigma}^2_{all}$, and $\hat{\sigma}^2_C$ are 2.19, 1.60, and 1.81, respectively.
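The ratio β/(α − 1) is the mean of an inverse gamma distribution with shape α and scale β, which is one way to read the plug-in estimator above. The following sketch, which assumes that shape/scale parameterization and uses illustrative function names, draws daily variances from an inverse gamma law and compares their sample mean with the closed form.

```python
import random


def inverse_gamma_mean(alpha, beta):
    """Mean of an inverse gamma law with shape alpha and scale beta."""
    if alpha <= 1.0:
        raise ValueError("the mean is finite only for alpha > 1")
    return beta / (alpha - 1.0)


def sample_inverse_gamma(alpha, beta, size, seed=0):
    """Draw inverse gamma variates as reciprocals of gamma variates."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); a gamma variate with scale 1/beta
    # has rate beta, so its reciprocal is inverse gamma with scale beta.
    return [1.0 / rng.gammavariate(alpha, 1.0 / beta) for _ in range(size)]
```

With `alpha = 5.0` and `beta = 4.0`, for instance, the closed-form mean is 1.0 and the average of a large sample from `sample_inverse_gamma` should be close to it.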
As expected, more information (the maximum and minimum prices) improves the accuracy of the volatility estimate. Besides, the estimators $\hat{\sigma}_v$ and $\hat{\sigma}^2_C$, estimated under the stochastic volatility model, perform better than $\hat{\sigma}^2_{all}$, which is estimated under the constant volatility model. Meanwhile, $\hat{\sigma}_v$ has the lowest relative error since it is obtained from the exact likelihood function rather than from a quasi-likelihood function. For the moderate and low volatility cases, $\hat{\sigma}_v$ is still the best estimator, with the lowest relative error. Note that the estimator $\hat{\sigma}^2_C$ performs worse in the case of the

Real application
We present one-step predictions of an interval-valued time series for the S&P 500 index. Following Arroyo et al. [4], the daily high/low prices of the S&P 500 index are used to compare the prediction performance of various methods. We make a one-step prediction with a rolling window in which the historical data of the previous year are used to estimate the parameters. To quantify the accuracy of the one-step forecast, we adopt the mean distance error (MDE), defined as
where $X_t = [L_t, U_t]$ is the true interval-valued observation and $\hat{X}_t = [\hat{L}_t, \hat{U}_t]$ is the estimated one. Following Rodrigues and Salish (2011), descriptive statistics are also evaluated by
1. coverage rate:
3. normalized symmetric difference:
where $w(X)$ represents the length of an interval $X$. Using Proposition 2, we obtain one-step forecasts of the high and low prices. We then compare our results with those of the Naïve method, EWMA, k-NN, VAR(3), and VECM(3) (cf. Arroyo et al. [4]), and CR-SETAR (cf. Rodrigues and Salish [5]). Let $[U_1, L_1], \dots, [U_n, L_n]$ be the observations; our goal is to forecast the interval on day $n+1$, i.e., $[\hat{U}_{n+1}, \hat{L}_{n+1}]$.
The Naïve method predicts the interval by the previous one, that is, $[\hat{U}_{n+1}, \hat{L}_{n+1}] = [U_n, L_n]$.
EWMA provides the predicted interval as follows: $[\hat{U}_{n+1}, \hat{L}_{n+1}] = \sum_{j=0}^{n-1} \lambda (1-\lambda)^j [U_{n-j}, L_{n-j}]$. We set λ = 0.04 as suggested by Arroyo et al. [4].
The k-NN method finds the $k$ historical sequences of $d$ points that are closest to the current sequence in terms of MDE, and then averages the intervals that follow these $k$ closest sequences. Let
The equal-weights k-NN method, denoted as k-NN(eq), is given by
Further, the proportional-weights k-NN method, denoted as k-NN(prop), is given by
where $w_j = \psi_j / \sum_{m=1}^{k} \psi_m$ and $\psi_j$ is defined as the inverse of the MDE between $[U, L]^{(d)}_{n}$ and $[U, L]^{(d)}_{t_j}$ plus a small constant, say $10^{-8}$. Following Arroyo et al. [4], we set d = 2 for the three periods considered and k = 23, 18, and 26 for the similar volatility, high volatility, and dissimilar volatility periods, respectively.
According to the results in Arroyo et al. [4], the VAR(3) model based on the vector of differenced center and radius time series can be written as
Using the historical observations to fit the VAR(3) model and estimate all parameters, the predicted interval is
Next, assuming that $(U_t, L_t)$ satisfies the VECM(3) model, this implies
where $\Delta U_t = U_t - U_{t-1}$ and $\Delta L_t = L_t - L_{t-1}$. Using the historical observations to fit the VECM(3) model and estimate all parameters, the predicted interval is
Following Rodrigues and Salish [5], the two-regime CR-SETAR model based on the center and radius time series is
where $I_{\{\cdot\}}$ represents the indicator function. We choose p = 6, q = 8, and d = 1, the same values as in Rodrigues and Salish [5]. Using the historical observations to fit the CR-SETAR model and estimate all parameters, the predicted interval is
The improvement for MDE is around 26%, 23%, and 38% in the similar volatility, high volatility, and dissimilar volatility periods, respectively. Meanwhile, compared to the best of the other methods, the improvements are 17%, 20%, and 35% in the similar volatility, high volatility, and dissimilar volatility periods, respectively. In addition, the measures $R_E$ and $R_N$ of our proposed prediction method are also the largest. These results show that the proposed model yields more accurate forecasts of interval-valued financial time series on real data.
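The two simplest benchmarks, the Naïve method and EWMA, can be sketched in a few lines. The code assumes that intervals are stored as (upper, lower) pairs ordered from oldest to newest and that the EWMA weights λ(1 − λ)^j run over the whole available history without renormalization; both conventions and the function names are assumptions of this sketch.

```python
def naive_forecast(intervals):
    """Predict the next interval as the most recent observed interval."""
    return intervals[-1]


def ewma_forecast(intervals, lam=0.04):
    """Exponentially weighted average of past (upper, lower) intervals.

    The most recent interval receives weight lam, the one before it
    lam * (1 - lam), and so on. The weights sum to 1 - (1 - lam) ** n,
    which is close to 1 only when the history is long.
    """
    upper = 0.0
    lower = 0.0
    for j, (u, l) in enumerate(reversed(intervals)):
        w = lam * (1.0 - lam) ** j
        upper += w * u
        lower += w * l
    return upper, lower
```

With λ = 0.04 and a one-year window of roughly 250 trading days, the omitted weight mass (1 − λ)^250 is negligible, so leaving the weights unnormalized has little practical effect in that setting.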

Conclusion
We propose the joint densities of the daily log opening, maximum, and closing prices and of the daily log opening, minimum, and closing prices based on stochastic differential equations. Simulation studies show that, in terms of RE, the proposed estimators are more efficient than the conventional one. In the real data analysis of the S&P 500 index, the one-step forecasts of the proposed method outperform several alternatives in terms of MDE, $R_E$, and $R_N$. The proposed methodology has several interesting extensions. In this paper, we study the stochastic volatility model in discrete time, where the stochastic volatility follows a stationary distribution during a fixed time interval. It is natural to consider instead that the intra-daily volatility is governed by a stochastic process. However, owing to the stochastic nature of the volatility, the Girsanov theorem cannot be applied straightforwardly. Based on Akahori et al. [22], over a small time interval, asymptotic results obtained from a Taylor expansion can be used to simplify the Girsanov theorem. The likelihood function can then be derived and the corresponding maximum likelihood estimators obtained. We leave this issue for future work. Alternatively, from an investment strategy point of view, it is also interesting to study high-dimensional financial interval time series for multiple assets, which leads to the corresponding estimation problem for the proposed high-dimensional model.

Proof of Theorem 1
Let $Y_t = \log S_t$ be given by the dynamics
Let $M_t = \sup_{0 \le s \le t} Y_s$. The joint cdf of $Y_t$ and $M_t$ is written as
Using $t = i - (i - 1) = 1$, we obtain (2).

Proof of Theorem 2
Define the minimum to date for Brownian motion to be
By the reflection principle, we have
for $w \ge l$ and $l < 0$. The rest of the proof follows the same procedure as the proof of Theorem 1.

Proof of Theorem 3
Given
For the term $L_1$, we obtain the result by a change of variable. To handle the term $L_2$, we exchange the order of integration as follows.
A similar procedure can be applied to L t (1), which completes the proof.