Fig 1.
Cluster center selection process at time t.
At time t, the current cluster center is chosen based on the previous γ cluster centers, i.e., the centers at times t − q for 1 ≤ q ≤ γ. The drift time window γ determines for how many previous time steps centers must be within maximum drift λ of one another. The objective in Expression 2 is evaluated for all potential cluster centers; the center that minimizes the objective is chosen.
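The selection step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_center`, the `distance` and `objective` callables, and the reading that a candidate must lie within λ of each of the previous γ centers are all assumptions. Expression 2 is not reproduced here, so the objective is passed in as a callable.

```python
from typing import Callable, Hashable, Optional, Sequence


def select_center(
    candidates: Sequence[Hashable],
    previous_centers: Sequence[Hashable],
    distance: Callable[[Hashable, Hashable], float],
    objective: Callable[[Hashable], float],
    gamma: int,
    max_drift: float,
) -> Optional[Hashable]:
    """Return the feasible candidate with the lowest objective value.

    Assumption: a candidate is feasible when its distance to each of the
    last `gamma` centers is at most `max_drift` (the caption's lambda).
    """
    recent = list(previous_centers[-gamma:]) if gamma > 0 else []
    feasible = [
        v for v in candidates
        if all(distance(v, c) <= max_drift for c in recent)
    ]
    if not feasible:
        return None  # no candidate satisfies the drift constraint
    return min(feasible, key=objective)


# Toy example (hypothetical data): vertices 0..4 on a path graph.
members = [2, 3, 4]


def path_distance(u: int, v: int) -> int:
    return abs(u - v)


def toy_objective(v: int) -> int:
    # Stand-in for Expression 2: total distance from v to the cluster members.
    return sum(path_distance(v, m) for m in members)


chosen = select_center(
    candidates=range(5),
    previous_centers=[1, 1],
    distance=path_distance,
    objective=toy_objective,
    gamma=2,
    max_drift=1,
)
unconstrained = min(range(5), key=toy_objective)
```

In this toy case the unconstrained minimizer is vertex 3, but the drift constraint restricts the choice to vertices within distance 1 of the previous center, so vertex 2 is selected instead.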
Fig 2.
Visualization of a Clique-cross-Clique over three timesteps.
The graph consists of three “ground truth” clusters with five members each.
Fig 3.
Visualization of a Theseus Clique, with n = 5 nodes in each ground-truth cluster.
The clusters are disconnected every n time steps, and the connectivity pattern repeats every n² time steps.
Fig 4.
Visualization of a Random Clique-cross-Clique containing three clusters with five nodes each, intra-cluster connection probability p = .30, and inter-cluster connection probability p′ = .20.
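One snapshot of such a graph could be generated along the following lines. This is a hedged sketch under stated assumptions: edges are undirected and drawn independently for each vertex pair, and the function name `random_clique_cross_clique` and its parameters are hypothetical rather than taken from the paper. A dynamic graph would then be a sequence of such snapshots.

```python
import random
from typing import List, Optional, Tuple


def random_clique_cross_clique(
    num_clusters: int = 3,
    cluster_size: int = 5,
    p_intra: float = 0.30,
    p_inter: float = 0.20,
    seed: Optional[int] = None,
) -> Tuple[List[List[int]], List[int]]:
    """Return (adjacency matrix, ground-truth labels) for one time step.

    Assumption: each unordered vertex pair is connected independently,
    with probability p_intra inside a cluster and p_inter across clusters.
    """
    rng = random.Random(seed)
    n = num_clusters * cluster_size
    labels = [i // cluster_size for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_intra if labels[i] == labels[j] else p_inter
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1  # undirected edge
    return adj, labels


# Ten snapshots with the parameters quoted in the caption.
snapshots = [random_clique_cross_clique(seed=t)[0] for t in range(10)]
```

Setting `p_intra = 1` and `p_inter = 0` in this sketch yields three disjoint five-node cliques, which is a convenient sanity check.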
Table 1.
Average AMI scores (higher is better) of 100 independent runs of various community detection methods over a range of synthetic datasets.
STGkM is our method, CC uses dynamic connected components, k-medoids compresses a dynamic graph into a single static one and uses k-medoids, and DCDID is a heuristic method [4]. The best performance is bolded.
Fig 5.
Cluster assignment histories of STGkM, CC, and DCDID run on a Clique-cross-Clique.
CC and DCDID identify only the single, connected component at every timestep, whereas STGkM finds three stable clusters.
Fig 6.
Cluster assignment histories of STGkM, CC, and DCDID run on a Strong Random Clique-cross-Clique.
Due to the strong connectivity, CC can only ever find a single connected component. DCDID sometimes finds multiple clusters, but they are inconsistent. STGkM finds three relatively stable, evolving communities.
Fig 7.
Cluster assignment histories of STGkM, CC, and DCDID run on a Mixed Random Clique-cross-Clique.
Though all three methods have three unique groups of cluster histories, only STGkM’s persist throughout time and give insight into cluster evolution.
Fig 8.
Cluster assignment histories of STGkM, CC, and DCDID run on a Weak Random Clique-cross-Clique.
Due to the disconnected nature of the network, CC and DCDID find upwards of fifty unique clusters throughout the duration of the simulation. In contrast, STGkM tracks the evolution of three cohesive communities.
Fig 9.
Cluster assignment histories of STGkM, CC, and DCDID run on a Theseus Clique.
CC and DCDID identify the two connected components at every time step, while STGkM attempts to maintain some level of stability in cluster identity over time.
Fig 10.
Cluster assignment histories of STGkM, CC, and DCDID run on Three Clusters.
Only STGkM is able to track how three clusters evolve. CC and DCDID predominantly find only a single cluster.
Fig 11.
Sensitivity of AMI score for STGkM run on Standard and Random Clique-cross-Cliques with varying spatial complexities, time complexities, and levels of connectivity.
Though STGkM may take much longer to converge on small communities when inter- and intra-cluster connectivity are both very high or very low, there is a clear upward trend over time. STGkM will eventually find the ground truth communities.
Fig 12.
Average silhouette score versus number of clusters on a synthetic dataset.
Note: a higher score is better.
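For reference, the average silhouette score can be computed from a pairwise distance matrix and a label vector as sketched below. This follows the standard silhouette definition; which distance the paper feeds into it is not stated in this caption, so the distance matrix here is an assumption, and the function name `average_silhouette` is hypothetical.

```python
from typing import Dict, List, Sequence


def average_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette over all points, given pairwise distances and labels."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a score of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)


# Toy data (hypothetical): four points on a line at 0, 1, 10, and 11.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
good = average_silhouette(dist, [0, 0, 1, 1])  # nearby points grouped together
bad = average_silhouette(dist, [0, 1, 0, 1])   # distant points grouped together
```

Sweeping the number of clusters and keeping the value with the highest average score is the usual way a plot like this one is used to choose k.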
Fig 13.
Average silhouette score versus number of clusters on the roll call dataset.
Note: a higher score is better.
Fig 14.
Matrices showing the similarity scores of short-term cluster assignment histories between every pair of vertices in the roll call dataset using STGkM with γ = 5, λ = 1, and k = 2 on the left and k = 3 on the right.
Rows and columns are ordered based on long-term cluster membership.
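A matrix of this kind could be built as in the sketch below. The similarity measure is an assumption made for illustration only: it is the fraction of time steps at which two vertices share a short-term cluster label, which may differ from the definition used in the paper. The function name `history_similarity` is hypothetical.

```python
from typing import List, Sequence


def history_similarity(histories: Sequence[Sequence[int]]) -> List[List[float]]:
    """Pairwise similarity of per-vertex cluster assignment histories.

    histories[v][t] is the cluster label of vertex v at time step t.
    Assumption: similarity is the fraction of time steps with equal labels.
    """
    n = len(histories)
    num_steps = len(histories[0])
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            matches = sum(1 for a, b in zip(histories[i], histories[j]) if a == b)
            sim[i][j] = matches / num_steps
    return sim


# Toy histories (hypothetical): three vertices over three time steps.
toy_histories = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
toy_similarity = history_similarity(toy_histories)
```

Ordering the rows and columns by long-term cluster membership, as the caption describes, would then place vertices with similar histories in contiguous blocks.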
Fig 15.
Average silhouette score versus number of clusters on the Semantic Scholar dataset.
Note: a higher score is better.
Fig 16.
Matrix storing the similarity scores of short-term cluster assignment histories between every pair of nodes in the Semantic Scholar dataset using STGkM with γ = 1, λ = 1, and k = 18.
Rows and columns are ordered based on long-term cluster membership.
Fig 17.
Four most common journals for a selection of clusters from the Semantic Scholar dataset using STGkM with γ = 1, λ = 1, and k = 18.
The most common journals in each cluster tend to cover similar topics.
Fig 18.
Average silhouette score versus number of clusters on the Reddit dataset for subreddits with positive sentiment on the left and negative sentiment on the right.
Note: a higher score is better.
Fig 19.
Matrices showing the similarity scores of short-term cluster assignment histories between every pair of vertices in the Reddit dataset using STGkM with λ = 1, γ = 1, and k = 19 on positive sentiment data on the left and λ = 1, γ = 1, and k = 2 on negative sentiment data on the right.
Rows and columns are ordered based on long-term cluster membership.
Fig 20.
Top row: Four most common subreddits for a selection of clusters from the positive sentiment Reddit data using STGkM with γ = 1, λ = 1, and k = 19. Bottom row: Ten most common subreddits for the two dynamic clusters from the negative sentiment Reddit data using STGkM with γ = 1, λ = 1, and k = 2.