Table 1.
Table of symbols.
Fig 1.
An example showing the input and output for each step of UniCon.
Fig 2.
UniStar-naïve causes a data explosion problem.
A darker red background indicates more edges. The blue lines show how edge (17, 16) propagates; the edge is copied in every round. The dashed line indicates the partition.
Fig 3.
An example of UniStar when the number of partitions ρ = 2.
Red and blue areas represent partitions 0 and 1, respectively. UniStar divides G into overlapping subgraphs G0 and G1, computes an output subgraph from G0 by connecting each node u ∈ V0 to m(Λ(u, G0)) = 1 or m([Λ(u, G0)]ξ(u)) = 2, and computes an output subgraph from G1 in the same way. Then, G′ is the union of the two output subgraphs.
Fig 4.
Intact edges in round r are marked in blue.
The output of UniStar in round r becomes the input in round r + 1. An intact edge (u, v) in round r such that ξ(u) ≠ ξ(v) is excluded from a subgraph in round r + 1. The excluded intact edges are marked with red dashed lines.
Fig 5.
An example of UniStar-opt when the threshold τ = 4.
Intact edges are represented as blue lines. The edges filtered by cases 1, 2, and 3 are marked with orange, green, and purple dashed lines, respectively. As the number of remaining edges after round 3 is less than τ = 4, the output edges of Rem on the graph induced by the remaining edges and the edges filtered by cases 1 and 2 become the input of the finishing step.
Table 2.
The summary of datasets.
Fig 6.
The running time of UniCon-opt for various values of τ.
Fig 7.
The running time of UniCon-opt and UniCon-base for various numbers of partitions ρ.
On large datasets such as GSH, CW, and HL, once ρ exceeds the optimal value for the dataset, the running time of UniCon increases only marginally as ρ increases.
Fig 8.
The input and intermediate data sizes of UniStar and UniStar-naïve in each round.
UniStar reduces both sizes by up to 4× and 8×, respectively, compared to UniStar-naïve.
Fig 9.
The per-round running times of UniStar and UniStar-naïve and their cumulative sums.
UniStar is faster than UniStar-naïve in all rounds and requires fewer rounds.
Fig 10.
The number of input edges (denoted by lines) for UniCon-opt (τ = 0) and UniCon-base in each round.
By filtering dispensable edges (denoted by bars), UniCon-opt shrinks the input size by 80.4% on average in every round on CW.
Fig 11.
The running time of UniStar-opt and UniStar in each round.
Thanks to edge filtering, the running time of UniStar-opt drops quickly.
Fig 12.
The number of data items stored in HybridMap, a hash table, and an array in each round of UniCon-opt.
The y-axis is on a log scale. HybridMap significantly reduces the data stored in memory compared to an array, letting UniCon-opt succeed in processing the large graphs GSH, CW, and HL. HybridMap and a hash table show no significant difference in the peak number of stored items.
Fig 13.
The running time of UniStar-opt with HybridMap, a hash table, and an array in each round.
UniCon-opt with HybridMap outperforms UniCon-opt with hash tables when the graph is large enough (FS, SD, GSH, CW, and HL), thanks to HybridMap's fast random access. UniCon-opt with arrays fails on large graphs (GSH, CW, and HL) because of an out-of-memory error.
Fig 14.
(left) UniCon handles up to 4096× larger graphs than competitors. (right) UniCon-opt shows the best performance regardless of the number of machines.
Fig 15.
The relative running time, compared to UniCon-opt, of competitors on real-world graphs.
o.o.m.: out-of-memory error. UniCon-opt with optimal τ outperforms all competitors on all graphs except for LJ. Rem, ConnectIt, and PowerGraph are faster than UniCon-opt on LJ but fail on all other graphs because of out-of-memory errors.
Fig 16.
The number of rounds required by each algorithm except Rem on real-world graphs.
UniCon-opt requires up to 11 fewer rounds than competitors.