A density-based matrix transformation clustering method for electrical load

doi:10.1371/journal.pone.0272767

Fig 1.

The comparison between Euclidean distance and DTW distance.

(A) and (B) show the alignment rule of the Euclidean distance. With time shifts ignored, the distance between the unimodal curves load curve 1 and load curve 2, d^EU(curve1, curve2) = 4.2742, is smaller than the distance between the unimodal curves load curve 2 and load curve 3, d^EU(curve2, curve3) = 4.8454. (C) and (D), by contrast, show the alignment rule of DTW. With time shifts taken into account, the distance between load curve 1 and load curve 2, d^DTW(curve1, curve2) = 0.0299, is larger than the distance between load curve 2 and load curve 3, d^DTW(curve2, curve3) = 0.0099. Note: to show the alignment clearly, the lower curve is shifted down by 2 units.
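
The contrast between the two alignment rules can be illustrated with a minimal sketch. It assumes the textbook DTW recurrence with a squared point cost and no window constraint, which may differ from the variant and normalization used for the figure, so the values it produces are not expected to match those in the caption. The helper `bump` and the synthetic curves are hypothetical and serve only as an example.

```python
import math

def euclidean_distance(a, b):
    """Lock-step distance: point i of a is compared only with point i of b."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Textbook DTW (assumed variant): squared point cost, unconstrained warping."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])

def bump(center, length=24, width=2.0):
    """Hypothetical unimodal load curve: one Gaussian-shaped peak at `center`."""
    return [math.exp(-((t - center) / width) ** 2) for t in range(length)]

curve_a = bump(8)
curve_b = bump(12)  # same shape, peak shifted by 4 time steps

shifted_eu = euclidean_distance(curve_a, curve_b)
shifted_dtw = dtw_distance(curve_a, curve_b)
print(f"Euclidean: {shifted_eu:.4f}  DTW: {shifted_dtw:.6f}")
```

In this sketch the two curves differ only by a time shift, so the lock-step Euclidean distance is large while the DTW distance is close to zero, because warping lets the two peaks be matched to each other.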


Fig 2.

Presentation of the active set and its 1st-neighbor.

The distribution of distances between the 1st-neighbor and all data items is presented. The histogram shows the frequency of distances, and the corresponding blue line is the smoothed frequency, which approximates the probability density function (PDF).
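
A histogram with a smoothed frequency curve of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the figure: the smoothing here is a Gaussian kernel density estimate with a hand-picked bandwidth, and the `distances` list is synthetic placeholder data rather than values from the load data set.

```python
import math

def histogram(values, bins=10):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges

def gaussian_kde(values, grid, bandwidth):
    """Smoothed density: average of Gaussian kernels centered on each value."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]

# Synthetic placeholder distances (hypothetical, for illustration only).
distances = [0.5 + 0.1 * ((i * 7) % 13) for i in range(40)]

counts, edges = histogram(distances, bins=8)

step = 0.01
grid = [-0.5 + step * k for k in range(321)]  # covers the data plus wide margins
density = gaussian_kde(distances, grid, bandwidth=0.15)
area = sum(density) * step  # Riemann sum; should be close to 1 for a density
```

The bandwidth controls how smooth the curve is: a larger value flattens the curve, a smaller one follows the histogram bars more closely.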


Fig 3.

Workflow with 3 modules.

(1) generation of the adjacent matrix; (2) judgement on the 1st-neighbor; and (3) the integral loop layer.


Fig 4.

Heterogeneous clusters with noise.

(A) shows the clustering result as a scatter plot, in which members of the same cluster are shown in the same color and noise is labeled as tiny clusters; (B) shows the adjacent matrix after clustering, in which the blocks on the diagonal are the separated clusters.


Fig 5.

Long-period missing values.

(A) shows the long-period missing values in a load curve; (B) shows the heat map of dynamic warping and the new alignment path.


Fig 6.

The applications of DTW on load curves with missing values.

(A) The red circles highlight the missing values in the raw data. (B) The missing values are repaired by dsDTW.


Fig 7.

The proportions of clusters and noise under different values of r.

When r is set to 2, most of the load curves are excluded from the six major clusters; that is, normal load curves are wrongly recognized as noise. When r is set to 6, cluster 2 is composed of two different types of load curves; that is, the time tolerance is too high to divide the clusters properly. Setting r = 4 is a trade-off between these two situations.


Fig 8.

Clustering results of different clustering methods.

Note: The proposed method yields 6 major clusters and many tiny clusters, each composed of only a few members. In applications, these tiny clusters are recognized as noise, so only the 6 major clusters are presented for conciseness.
