Fig 1.
The comparison between Euclidean distance and DTW distance.
(A) and (B) show the alignment rule of Euclidean distance. Ignoring time shifts, the distance between the unimodal curve (load curve 1) and the unimodal curve (load curve 2), dEU(curve1, curve2) = 4.2742, is smaller than the distance between the unimodal curves (load curve 2 and load curve 3), dEU(curve2, curve3) = 4.8454. (C) and (D), by contrast, show the alignment rule of DTW. Taking time shifts into account, the distance between the unimodal curve (load curve 1) and the unimodal curve (load curve 2), dDTW(curve1, curve2) = 0.0299, is larger than the distance between the unimodal curves (load curve 2 and load curve 3), dDTW(curve2, curve3) = 0.0099. Note: to show the alignment clearly, the lower curve is shifted down by 2 units.
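The reversal described in the Fig 1 caption can be reproduced on toy data. The sketch below is a minimal illustration, not the paper's implementation: it assumes the textbook dynamic-programming form of DTW with an absolute-difference point cost and no normalization, and the curves `peak`, `shifted`, and `broad` are invented for this example rather than taken from the load data.

```python
import math


def euclidean(a, b):
    """Point-by-point distance: sample i of a is always compared with sample i of b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook DTW: cheapest monotone alignment of a and b (assumed cost: |x - y|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


# Hypothetical toy curves (not from the paper).
peak = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]      # one sharp peak
shifted = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]   # the same peak, three steps later
broad = [0, 1, 1, 1, 1, 1, 1, 1, 1, 0]     # a differently shaped, flat plateau

print("Euclidean:", euclidean(peak, shifted), euclidean(peak, broad))
print("DTW:      ", dtw(peak, shifted), dtw(peak, broad))
```

Under Euclidean distance the time-shifted copy looks farther from `peak` than the plateau does, because the peaks never line up index by index. Under DTW the ordering flips: the shifted copy can be warped onto `peak` at no cost, while the plateau cannot match the peak height however it is aligned.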
Fig 2.
Presentation of active set and its 1st-neighbor.
The distribution of distances between the 1st-neighbor and all data items is presented. The histogram shows the frequency of the distances; the corresponding blue line is the smoothed frequency, which approximates the probability density function (PDF).
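A histogram with a smoothed overlay of this kind can be sketched as follows. The caption does not state how the smoothing was done, so the Gaussian kernel estimate, the bandwidth, the bin count, and the synthetic `distances` sample are all assumptions made for illustration.

```python
import math
import random


def histogram_density(values, bins):
    """Equal-width histogram scaled so that the bar areas sum to 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[k] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    heights = [c / (len(values) * width) for c in counts]
    return heights, edges


def gaussian_kde(values, grid, bandwidth):
    """Smoothed frequency: average of Gaussian bumps centred on each observation."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


# Synthetic stand-in for the distances from one item to all others (assumption).
random.seed(0)
distances = [abs(random.gauss(0.0, 1.0)) for _ in range(300)]

heights, edges = histogram_density(distances, bins=15)

bandwidth = 0.2  # hypothetical smoothing parameter
lo = min(distances) - 4 * bandwidth
hi = max(distances) + 4 * bandwidth
step = (hi - lo) / 400
grid = [lo + k * step for k in range(401)]
pdf = gaussian_kde(distances, grid, bandwidth)
```

Both `heights` and `pdf` integrate to approximately 1 over their ranges, which is what allows the smooth curve to be read as an approximate PDF rather than as raw counts.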
Fig 3.
(1) Generation of the adjacency matrix; (2) judgement of the 1st-neighbor; and (3) the integral loop layer.
Fig 4.
Heterogeneous clusters with noise.
(A) shows the clustering result as a scatter plot, in which members of the same cluster are shown in the same color and noise is labeled as tiny clusters; (B) shows the adjacency matrix after clustering, in which the blocks on the diagonal are the separated clusters.
Fig 5.
(A) shows a long period of missing values in a load curve; (B) shows the heat map of dynamic warping and the new alignment path.
Fig 6.
Application of DTW to load curves with missing values.
(A) The red circles highlight the missing values in the raw data. (B) The missing values are repaired by dsDTW.
Fig 7.
The proportions of clusters and noise under different values of r.
When r is set to 2, most of the load curves are excluded from the six major clusters; that is, normal load curves are wrongly recognized as noise. When r is set to 6, cluster 2 is composed of two different types of load curves; that is, the time tolerance is too high to divide the clusters properly. Setting r = 4 strikes a trade-off between these two situations.
Fig 8.
Clustering results of different clustering methods.
Note: The proposed method yields 6 major clusters and many tiny clusters, each composed of only a few members. In applications, these tiny clusters are treated as noise, so only the 6 major clusters are presented for conciseness.