Clustering time series based on dependence structure

The clustering of time series has attracted growing research interest in recent years. The most popular clustering methods assume that the time series are only linearly dependent but this assumption usually fails in practice. To overcome this limitation, in this paper, we study clustering methods applicable to time series with a general and dependent structure. We propose a copula-based distance to measure dissimilarity among time series and consider an estimator for it, where the strong consistency of the estimator is guaranteed. Once the pairwise distance matrix for time series has been obtained, we apply a hierarchical clustering algorithm to cluster the time series and ensure its consistency. Numerical studies, including a large number of simulations and analysis of practical data, show that our method performs well.


Introduction
Massive amounts of data relating to time series are frequently collected in fields ranging from science, engineering, and business to economics, healthcare, and government. It is often desirable to find groups of similar time series in a set or panel of such series using clustering techniques. In the case of panel time series, especially for short time series, it is beneficial for model estimation and forecasting performance to pool time series with similar data-generating mechanisms [1]. Consequently, as one of the most frequently used exploratory techniques in data mining, clustering is a crucial step in time series analysis.
A popular approach to clustering time series is model-based clustering. This approach assumes that each time series is generated by a specific underlying model or a mixture of underlying probability distributions. The time series are considered similar when the models characterizing them are similar. The most commonly considered models include the ARIMA process [2], the GARCH model [3], and the dynamic regression model [1]. Researchers in machine learning have also used Markov chains [4] and hidden Markov models [5].
Unlike model-based clustering methods, distance-based methods cluster time series in a simple and efficient way, where the choice of a proper distance or dissimilarity measure is a critical step. Once the dissimilarity measure is determined, an initial pairwise dissimilarity matrix can be obtained and many conventional clustering algorithms can then be used to form PLOS  groups of objects. Different distances are pursued according to the aim of time series clustering. The selected distance should be able to capture particular discrepancies between series that are relevant to the final objective of the clustering. The R package TSclust [6] provides a brief overview of well-established peer-reviewed time series dissimilarity measures, including measures based on raw data, extracted features, underlying parametric models, levels of complexity, and forecast behaviors. An interesting overview of time series clustering methods and their applications can be found in [7]. A central issue in the analysis of time series data is to consider the structure of the temporal dependence. It is often helpful for model estimation to cluster time series into several groups according to their underlying dependency structures. In most research on the issue, it is assumed that the temporal dependences of time series are only linear. The ARIMA model is the most commonly used linear model (see, for example, [2,[8][9][10][11][12]). However, the assumption of linearity often fails to hold in practice. When time series are nonlinearly dependent, linear methods suffer from a severe model mismatch problem. Scant attention has been paid in the literature thus far to the clustering of nonlinear time series. Nonparametric model-free methods are usually employed to deal with nonlinear problems. Dissimilarity in nonparametric distance-based clustering methods is measured by comparing serial features extracted from the original series that aim to represent the dynamic structure of each series, such as autocorrelation [13,14], partial autocorrelation [15], cross-correlation [16], and spectral features [17]. Even though nonparametric methods do not make any model assumptions, most aforementioned dissimilarities are quantities related to Pearson's correlation, which can only measure linear dependence. It is thus natural that these features are inadequate at recognizing more general, temporal, and nonlinear dependence structures. They are not expected to perform well at clustering more general time series, which was also shown in our simulation experiments in this study.
We ignore model-based clustering methods and focus on the distance-based nonlinear time series clustering approach due to its popularity and simplicity. In view of the aforementioned considerations, a distance measure of global dependence is required. Specifically, we need to construct a distance measure that can capture linear and nonlinear dependencies without requiring the specification of any kind of model. However, to the best of our knowledge, no prevalent measure of association or dependence can satisfy this requirement. Numerous diagnostics or tests can be used to only examine departures from independence, and include mutual information, entropy measure, and the Hellinger distance. They cannot distinguish between dependence structures or gauge dissimilarity between them.
In this paper, we propose a distance measure based on a copula function to measure the dissimilarity among the general serial dependence of two time series. The advantages of this distance measure can be summarized as follows: First, it overcomes the limitations of prevalent time series clustering methods designed for linear processes. Our simulations show that the proposed measure performs well, particularly in terms of classifying nonlinear models. As the proposed distance measure is designed in terms of a discrepancy in the serial structure of global dependence, which includes linear structures, it can also be used for linear processes. Second, it is nonparametric. To obtain the distance measure, we rely on an empirical estimator of the copula function. The superiority of the proposed estimation approach resides precisely in its ability to account for the divergence of global dependencies with no need to specify an exact model. Third, this is a rank-invariant approach, as copulas possess an invariance property with respect to a monotonically increasing transformation of the variables [18]. Fourth, we can theoretically guarantee the consistency of the distance estimator. Fifth, this distance measure takes a close form that can be efficiently computed.
The remainder of the paper is organized as follows: In Section 2, we described the proposed clustering method with relevant statistical properties. Simulations and analysis of data from a practical scenario are provided in Sections 3 and 4, respectively. A short conclusion is provided in Section 5. The proofs of all theorems are in the Appendix.

Methodology
Notations and copula-based distance where T i is its length and n is the number of time series. We assume that these time series are strictly stationary drawn from J 0 clusters, and the time series in each cluster share a common dynamic pattern. The purpose is to identify these J 0 clusters. We propose a method based on the copula function to represent the dynamic pattern of the time series. Specifically, for a fixed positive integer h, (X ij , X i(j+h) ) and (X i 0 j , X i 0 (j+h) ) have the same copula function if time series X i and X i 0 belong to a common cluster.
For random variables X and Y with continuous marginal cumulative distribution functions F X (x) and F Y (y), respectively, we denote the joint cumulative distribution functions by F(x, y). Sklar's theorem [19] claims that a unique copula function C(u, v) exists connecting F(x, y) to F X (x) and F Y (y) via F(x, y) = C(F X (x), F Y (y)), which is equivalent to Cðu; vÞ ¼ FðF À 1 X ðuÞ; F À 1 Y ðvÞÞ. This means that the copula function C(u, v) can capture the structure of dependence between random variables X and Y. We use the copula to capture the dynamic pattern of the time series.
For each 1 � i � n, we denote by C i,h (u, v) the copula function of (X ij , X i(j+h) ). For arbitrary i 6 ¼ i 0 , we define the following copula-based Cramér-von Mises distance to measure the dissimilarity between time series X i and X i 0 : The copula-based distance D h (i, i 0 ) satisfies the following three classical properties: ) and (X i 0 j , X i 0 (j+h) ) share a common copula function.

(Triangle inequality property)
Nonnegativity and symmetry are apparent in the foregoing, and the triangle inequality property can be obtained from the Cauchy-Schwarz inequality. Traditionally, the dependence structure of time series is often captured using an autocovariance-based linear correlation, which fails in nonlinear dependency structures. Nonlinear dependence may be caused by various nonlinear structures. The copula function does not make any assumptions about the model, and is a flexible method that can capture them. Moreover, copulabased distance is not affected by strict monotonic transformations (i.e., logarithmic transform and exponential transform) due to the property of mapping invariance of the copula function [18].
If time series X i (1 � i � n) are dependent in the order of one, we let h = 1 and directly use D 1 (i, i 0 ) as the dissimilarity measure of X i and X i 0 . If the times series are dependent in a higher order, the dissimilarity measure of X i , X i 0 can be defined as the following weighted version: where K is the highest order of dependence of time series X i and X i 0 considered in the dissimilarity measure definition, and ω h is the weight of D h (i, i 0 ). We can allow ω h to decrease as h becomes larger.
The selection of K depends on the unknown underlying model. We may entertain several possible values of K and use model selection criteria such as AIC and BIC to select the optimal value of K. On this point, [20] claimed that the aim is not the goodness-of-fit to the underlying models but clustering them properly. We thus do not make any significant effort on this issue. In practice, the value of K should be chosen with specific knowledge of the application. It is often the case that the strongest serial dependency occurs in small lags, and a larger value of K means greater computational time and redundant information. Moreover, we can always obtain reasonably satisfactory results using small values of K in our simulations and applications. A large number of studies focusing on application have shown that it suffices for a large number of time series to use a lag of one to obtain goodness of fit. Therefore, in practice K = 1 is highly recommended.

Estimation of copula-based distance
In practice, the copula function is usually unknown, and thus the copula-based distance (1) cannot be used directly. We propose a nonparametric estimation of the copula function that can be plugged into (1) to obtain the estimated distance.
Specifically, we denote the empirical distribution functions of (X it , t = 1, � � �, Then, the nonparametric estimator for the copula function of (X it , X i(t+h) ) (i.e., IðU it � uÞIðV it � vÞ: vÞ and obtain the corresponding copula-based distance estimator: b D h ði; i 0 Þ ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi Z Z If we further assume that the time series are α mixing processes, b C i;h ðu; vÞ; b D h ði; i 0 Þ are strong consistency estimators for C i,h (u, v) and D h (i, i 0 ), respectively. We summarize the results as the following theorem: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi The proof of Proposition 1 is provided in Appendix B.

Clustering algorithms based on copula distance
Once the pairwise distance between the time series has been obtained, we can apply any clustering method that uses a distance matrix as input, such as partition around medoids (PAM) [21], spectral clustering [22], and hierarchical clustering [23]. In PAM methods, we need to find a centroid time series in each cluster to represent the group. PAM methods require that the user specify the number of clusters. Through a hierarchical clustering algorithm, we can obtain a tree dendrogram built starting from the leaves and combining clusters to the trunk. The hierarchical clustering algorithm applied in this paper can be summarized as follows:

Clustering Algorithm
Step 1. For i, i 0 = 1, � � �, n and i 6 ¼ i 0 , compute the estimator b Dði; i 0 Þ for the copula-based distance between time series pair X i and X i 0 . Then, we can obtain the distance matrix. Treat each time series as its own cluster.
Step 2. For J = n, n − 1, � � �, 2: a. Compare all pairwise dissimilarity measures among the J clusters and identify the pair of clusters most similar. Merge them into as one.
b. Compute the pairwise dissimilarities of the new J − 1 clusters.
It is worth mentioning that dissimilarities should be defined among clusters based on distances between time series prior to applying the clustering algorithm. Single linkage, complete linkage, average linkage, centroid linkage, and Ward's linkage are the most common methods to extend dissimilarities among observations to dissimilarities among clusters [24].
In practice, The computations of clustering can be performed in a more efficient way. [25] introduced a recurrence formula which can be used to compute the updated inter-cluster distance values efficiently. If a clustering procedure does not satisfy such a recurrence relation, the initial data should be retained throughout the entire process when updating cluster distances. All these aforementioned linkage methods fit into the Lance-Williams formalism, and can therefore easily be implemented with user-defined time series distance. In this paper, the efficient method of implementing clustering is to store the matrix of copula distances and update inter-cluster distance using Lance-Williams recursive formula.
Specifically, the traditional Ward's linkage method minimizes the increase in total withincluster sum of squared error. This increase is proportional to the squared Euclidean distance between cluster centers. Here, we extends the classical Ward's linkage method that relies on Euclidean distance to copula distance, which still shares the same parameters in the update formula with original Ward's linkage method [26]. With the Lance-Williams formula, we do not need to assign the cluster center to compute the new inter-cluster distance.
Recall that we assume these time series are drawn from J 0 clusters. We denote these J 0 clusters by fM 1 ; � � � ; M J 0 g. Let C j0,h (u, v) be the common copula shared by the time series in the j-th cluster. We further define which is the copula-based distance between the j-th and the j 0 -th clusters. Let � = min j6 ¼j 0 D 0,h (j, j 0 ), which is the minimal distance between the clusters. Then, for the foregoing clustering algorithm, the following theorem holds: We call the results of Theorem 2 the consistency of clustering. The proof of Theorem 2 is provided in Appendix C. If J 0 is known, we can directly cluster the time series into J 0 groups. However, the number of clusters J 0 is usually unknown, and we should detect the optimal number of clusters in practice. A useful approach to determine this optimal number is the silhouette method [27]. For each time series X i with i = 1, 2, . . ., n, its silhouette width s(i) is defined as sðiÞ ¼ bðiÞ À aðiÞ maxðaðiÞ; bðiÞÞ ; where a(i) is the average copula distance between series X i and all other time series of the cluster to which X i belongs, and b(i) is the average copula distance between series X i and time series in neighboring cluster, i.e., the nearest cluster to which it does not belong. Let the set of time series be partitioned into J clusters. The corresponding average silhouette width is defined as The average silhouette of the clusters is calculated according to the number of clusters. A high average silhouette width indicates good clustering. Thus, the optimal number of clusters J is that which maximizes the average silhouette over a range of possible values for J. We choose the number of clusters as J � , which yields the maximum value of Sil(J).

Simulations
We used three examples to assess the performance of nonlinear time series clustering based on our proposed distance measure. For the sake of comparison, in each simulation scenario, we performed clustering using some representative dissimilarity measures proposed in the literature, including model-based measures d PIC of [8] and d M of [28], which are based on ARIMA models. We also made comparisons with model-free dissimilarity measures. In the temporal domain, distances d ACF and d PACF are defined as Euclidean distances between the estimated ACF and PACF using a number of significant lags. In the frequency domain, the dissimilarity measures were designed to assess the discrepancy between corresponding spectral densities. [15] proposed distance measures based on the periodogram, Euclidean distances between periodograms (d P ), log-periodograms (d LP ), normalized periodograms (d NP ), and log-normalized periodograms(d LNP ). The other nonparametric dissimilarity measures in the frequency domain, d W(DLS) , d W(LK) , d GLK , and d ISD proposed in [17], were also considered. They are different versions of nonparametric spectral dissimilarity measures. The differences among them were in terms of estimation methods of spectral density and discrepancy functions. For more details concerning these methods, the interested reader can refer to [6].
In the simulation experiments, the ground truth was known in advance. We assessed the clustering methods by using the cluster similarity index proposed in [29], which is defined as where G ¼ fG 1 ; G 2 ; � � � ; G J 0 g are the set of J 0 true clusters, A ¼ fA 1 ; A 2 ; � � � ; A J 0 g is the solution to the clusters by the clustering method evaluated, and where |�| denotes the cardinality of an arbitrary set. The similarity index has values ranging from zero to one, with one corresponding to the case when G and A are identical.

Example 1: Nonlinear time series clustering
In this example, we considered the following four models: 1. Threshold autoregressive (TAR) model X t ¼ 0:5X tÀ 1 IðX tÀ 1 � 0Þ À 2X tÀ 1 IðX tÀ 1 > 0Þ þ ε t ; 2. Exponential autoregressive (EXPAR) model 3. Linear moving average (MA) model 4. Nonlinear moving average (NLMA) model where the error process ε t independently followed N(0, 1). These models were used in [30] for linearity tests and [17] to study time series clustering. Except for model (3), the others were conditional mean nonlinear models. The stationarity of these models can be guaranteed. We here only take the TAR model as an instance to illustrate its stationarity. For a TAR model where {ε t } are independent and identically distributed with zero mean and a finite variance.
Then the necessary and sufficient condition for the strictly stationarity to above TAR model when γ = δ = 0 is α < 1, β < 1 and αβ < 1 [31][32][33][34]. In model (1), we set γ = δ = r = 0, and α = 0.5, β = −2, hence the necessary and sufficient condition holds, and furhter the stationarity of the TAR model (1) is guaranteed. We generated four time series from each model, and thus the sample size was 16. We set the lengths of all series as a common parameter T, and we considered two values of T (i.e. T = 100, 200). For sake of simplicity, in all of the experiments we used uniform weight for copula distance, that is w h = 1 for 1 � h � K. The experiment was repeated 100 times using all the aforementioned distance measures. The clustering similarity indices were calculated and summarized by the boxplot in Fig 1. The distance based on the copula function always yielded the best performance. For each of the distances, a larger series size seems to benefit more. When the series length T = 200, the similarity indices of the copula distance were almost equal to one, which meant that the copula distance could cluster the series into the true group from which they were generated. The first eight distances (d ACF * d LNP ) needed the assumption of linearity of the time series. Therefore, their results were significantly worse than those of the other nonparametric distances (d DLS * d ISD ) in this example. Fig 2 shows the multidimensional scaling (MDS) plot [35] used to visualize observations in two dimensions (T = 200), where the dissimilarity among the time series was based on our proposed copula distance. With the dissimilarity measures obtained from the data, the MDS plot sought a lower-dimensional representation of the data that preserved pairwise distances as closely as possible. Fig 2 shows a clear separation of the four clusters and the capability of copula distance to discriminate among them. Furthermore, it is evident that the series from the MA models and NLMA models are closer to one another because both models expressed time series as a function of white noise.
To gain further insights into copula distance, based on Example 1, we designed two more challenging clustering tasks. The first involved increasing heterogeneity within each cluster and the second explored the performance of copula distance when the strength of nonlinear dependence was changing.  (2) and (4), instead of using a fixed constant in each model, we used varying coefficients generated randomly from some given uniform distributions. Specifically, the models considered were:

Smooth transition autoregressive (STAR) model
where F(X t−1 ) = (1 + exp(−X t−1 )) −1 is the smooth transition function. 2. Exponential autoregressive (EXPAR) model where the smooth transition function is specified as 4. Nonlinear moving average (NLMA) model The results are shown in Fig 3. As the heterogeneity in each group increased, almost all of the distance measured led to different levels of performance degradation except copula distance. The copula distance still generated the best performance among all measures, and its performance has hardly any degradation.

Example 3: Nonlinear time series clustering by adjusting nonlinear strength
In this example, we studied the sensitivity of copula distance to the nonlinear strength of the time series with the length T = 200. We wanted to determine the clustering performance of copula distance when the nonlinear strength of the time series varied. We controlled the strength of nonlinearity by adjusting the coefficients of the models considered in Example 1 as follows: 1. Threshold autoregressive (TAR) model 2. Exponential autoregressive (EXPAR) model 3. Linear moving average (MA) model 4. Nonlinear moving average (NLMA) model We can see that model (3) remains identical to that in Example 1 and, if (b 1 , b 2 , b 4 ) was equal to (2, 10, 0.8), the remaining three models were identical to those in Example 1. On the contrary, if we set (b 1 , b 2 , b 4 ) equal to (−0.5, 0, 0), the models degenerated to the following linear models: We can see that the strength of nonlinear dependency decreased when (b 1 , b 2 , b 4 ) changed from (2, 10, 0.8) to (−0.5, 0, 0). Therefore, if we assigned (b 1 , b 2 , b 4 ) = (2.5α − 0.5, 10α, 0.8α), 0 � α � 1, the size of α represented the strength of nonlinear dependence. The larger the value of α, the stronger the nonlinear dependence. When α = 1, this strength was the highest, which is the situation in Example 1. On the contrary, when α = 0 the strength of nonlinear dependence was the weakest, which was linear. We provided six uniformly spaced values in [0, 1] to α (i.e., α = 0, 0.2, 0.4, 0.6, 0.8, 1), and the results of clustering are shown in Fig 4 below. Clustering time series based on dependence structure From Fig 4, we see that when the time series were simulated from linear models, i.e., α = 0, the distance measures based on the assumption of model linearity yielded the best performance, such as d PACF , d PIC , and d M . The performance of copula distance was not inferior to that of the other methods. As nonlinear strength increased (i.e., α increased), the advantage of nonparametric distance measures become ever more apparent while the clustering performance of methods based on linearity degenerated. In most cases, copula distance yielded far better results than the competition.

Real data analysis
We further illustrate the use of copula distance for time series clustering with two practical examples.

Case 1: Annual real GDP data analysis
We considered data concerning the gross domestic product (GDP) obtained from https:// www.conference-board.org/retrievefile.cfm?filename=Output-Labor-and-Labor-Productivity-1950-20111.xls&type=subsite. It contained the annual real GDP of the 23 most developed countries in the world from 1950 to 2011: Austria, Belgium, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom, Canada, the United States, Australia, New Zealand, and Japan. We considered data normalized by the EKS method [36]. We used annual GDP growth rate log(GDP t ) − log(GDP t−1 ) in the clustering procedures rather than annual GDP. These series were clustered via Ward's linkage method based on copula distance. When K = 2, the dendrogram is shown in Fig 5. The result was relatively insensitive to the maximum lag K used, with similar clustering results obtained when K ranged from two to nine.
In practice, we do not know how many distinct populations generate n time series. In general, as in any cluster analysis, the optimal number of clusters can be chosen according to some objective criterion, such as the average silhouette criterion. The average silhouette of the data is a useful criterion for assessing the natural number of clusters, which can be determined by maximizing the coefficient. It is a measure of how tightly grouped all data in the cluster are. The average silhouette coefficients were examined for different numbers of clusters, and two clusters appeared to yield a compact solution (Fig 6). Fig 7 shows the grouping of the two clusters. In this figure, the two colors represent two groups. To display the clustering effect of copula distance, the area occupied by Europe is magnified many times. It is interesting to note that the countries were grouped primarily by geographical location. The group in blue contains five northern European countries and nearly all developed non-European countries except Japan. The northern European countries have a

Case 2: Population growth data analysis
This application considers the annual population series of 20 US states, and is aimed at identifying similarities among the population growth trend. This data is obtained from the U.S Census Bureau, Population Distribution Division (https://www.census.gov/programs-surveys/ popest/data/data-sets.2007.html). It is a collection of time series of the population estimates from 1991 to 2010 in 20 states of the US, and has been used in [10]. In [10], two different groups of time series in the dataset were identified. Group 1 consisting of CA, CO, FL, GA, MD, NC, SC, TN, TX, VA, and WA had an exponentially growth trend, while Group 2 consisting of IL, MA, MI, NJ, NY, OK, PA, ND, and SD had a stable trend. In this case, we assume the above finding is the ground truth. In the following analysis, we use the growth series by taking log difference of the original time series. We cluster the data with hierarchical clustering algorithm using Ward's linkage method. The dendrogram clustered by copula distance with Ward's linkage method are shown in Fig 8. As we know, in this application the optimal number of clusters is 2. We can get the exact same optimal cluster number by maximizing the average silhouette width (see Fig 9). The copula distance method has the similarity index 0.8. If we section the dendrogram at the highest level we can obtain two groups, and all of the states are correctly classified except three states, CA, FL and MD.

Conclusion
The ability to successfully cluster sets of time series is a popular area of research in many fields. In this paper, we proposed a clustering method that is not limited to linear processes. Because of the diversity of the structures of dependence of time series, we avoided model-based clustering and used the distance-based method instead. We proposed a distance measure based on the copula function to measure the dissimilarity between the general serial dependence of time series. This distance can be calculated easily by the empirical estimator of the copula function. We theoretically guaranteed the consistency of the distance estimator as well. This method takes advantage of the ability of the copula function to measure the global dependence of the time series and fills gaps in research on time series clustering. Three simulation examples and the examination of a practical scenario illustrated the usefulness of our method. However, our proposed method has its limitation. Beyond the linear time series, there are infinitely many nonlinear forms to be explored. In this paper we only explore nonlinear relationship between lagged variables, which is only one general kind of nonlinear time series.

Appendix A: Proof of Theorem 1
For every 1 � i � n, denote by F i (x) the marginal distribution function of X it . Then we will prove the theorem by the following four steps. Step 1. For arbitrary u, v 2 [0, 1], definẽ IðF i ðX it Þ � u; F i ðX iðtþhÞ Þ � vÞ: In this step, we will prove that as T i ! 1,C i;h ðu; vÞ ! a:s: C i;h ðu; vÞ.
is a strictly stationary α mixing processe, one can see that Y i = {Y it , 1 � t � T i } is also a strictly stationary α mixing processe. Then by Proposition 2.8 of [37], one can see that as T i ! 1, Clustering time series based on dependence structure Step 2. In this step, we will prove that as T i ! 1, sup À 1<x<1 j b F i ðxÞ À F i ðxÞj À ! a:s: 0: Because X i = (X it , 1 � t � T i ) is a strictly stationary α mixing processe, X i is ergodic. Then by the theorem of [38], the Glivenko-Cantelli theorem holds for time series X i , hence we have that as T i ! 1, sup À 1<x<1 j b F i ðxÞ À F i ðxÞj À ! a:s: 0: Step 3. In this step, we will prove that b C i;h ðu; vÞ ! a:s: C i;h ðu; vÞ. Define On the event A i , for an arbitrary �, there exists a T such that j b F i ðxÞ À F i ðxÞj < � for 8x 2 (−1, 1). Hence we have that I F i ðX it Þ � T i þ1 Step 4. In this step, we will prove that b D h ði; i 0 Þ ! a:s: D h ði; i 0 Þ as T i , T i 0 ! 1. By the results of Step This completes the whole proof of Theorem 1.