UniCon: A unified star-operation to efficiently find connected components on a cluster of commodity hardware

doi:10.1371/journal.pone.0277527

Table 1.

Table of symbols.

Fig 1.

An example showing the input and output for each step of UniCon.

Fig 2.

UniStar-naïve causes a data explosion problem.

The more edges, the darker the red background. The blue lines show how edge (17, 16) propagates; the edge is copied in every round. The dashed line marks the partition.

Fig 3.

An example of UniStar when the number of partitions ρ = 2.

Red and blue areas represent partitions 0 and 1, respectively. UniStar divides G into overlapping subgraphs G₀ and G₁, computes an output subgraph from G₀ by connecting each node u ∈ V₀ to m(Λ(u, G₀)) = 1 or m([Λ(u, G₀)]_ξ(u)) = 2, and computes an output subgraph from G₁ in the same way. Then, G′ is the union of the two output subgraphs.
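
The division into overlapping subgraphs can be illustrated with a minimal sketch. It assumes, as one plausible reading of the caption, that an edge is placed in the subgraph of each endpoint's partition, so an edge whose endpoints fall in different partitions appears in both subgraphs. The partition function `xi` (here `u % rho`) and the helper `split_overlapping` are hypothetical names chosen for illustration; the caption does not define ξ, and the sketch does not implement the star operation itself.

```python
from collections import defaultdict


def xi(u, rho):
    """Hypothetical partition function (assumption: node id modulo rho)."""
    return u % rho


def split_overlapping(edges, rho):
    """Place each edge in the subgraph of each endpoint's partition.

    An edge whose endpoints lie in different partitions is copied into
    both subgraphs, which is what makes the subgraphs overlap.
    """
    subgraphs = defaultdict(set)
    for u, v in edges:
        subgraphs[xi(u, rho)].add((u, v))
        subgraphs[xi(v, rho)].add((u, v))
    # Return one (possibly empty) edge set per partition id.
    return {i: subgraphs.get(i, set()) for i in range(rho)}


edges = [(1, 2), (2, 4), (3, 5)]
parts = split_overlapping(edges, rho=2)
```

In this toy input, edge (1, 2) crosses the two partitions and is therefore present in both `parts[0]` and `parts[1]`, while (2, 4) and (3, 5) each stay in a single subgraph.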

Fig 4.

Intact edges in round r are marked in blue.

The output of UniStar in round r becomes the input of round r + 1. An intact edge (u, v) in round r with ξ(u) ≠ ξ(v) is excluded from a subgraph in round r + 1. The excluded intact edges are marked with red dashed lines.

Fig 5.

An example of UniStar-opt when the threshold τ = 4.

Intact edges are represented as blue lines. The edges filtered by cases 1, 2, and 3 are marked with orange, green, and purple dashed lines, respectively. As the number of remaining edges after round 3 is less than τ = 4, the output edges of Rem on the graph induced by the remaining edges and the edges filtered by cases 1 and 2 become the input of the finishing step.
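
The switch to the finishing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain union-find stands in for Rem, and the names `should_finish` and `finish_on_single_machine` are hypothetical. The sketch only covers the threshold test and the single-machine computation on the remaining edges; it does not model the three filtering cases.

```python
def find(parent, x):
    """Return the root of x, compressing the path by halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, u, v):
    """Merge the sets of u and v; the smaller root id becomes the root."""
    ru, rv = find(parent, u), find(parent, v)
    if ru != rv:
        parent[max(ru, rv)] = min(ru, rv)


def should_finish(num_remaining_edges, tau):
    """Threshold test: finish once fewer than tau edges remain."""
    return num_remaining_edges < tau


def finish_on_single_machine(edges):
    """Label every node with the smallest node id in its component."""
    parent = {}
    for u, v in edges:
        union(parent, u, v)
    return {u: find(parent, u) for u in list(parent)}


remaining = [(5, 7), (7, 9), (2, 3)]
tau = 4
labels = finish_on_single_machine(remaining) if should_finish(len(remaining), tau) else None
```

With three remaining edges and τ = 4, the threshold test passes and the remaining edges are resolved in one pass on a single machine rather than in further distributed rounds.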

Table 2.

The summary of datasets.

Fig 6.

The running time of UniCon-opt for various values of τ.

Fig 7.

The running time of UniCon-opt and UniCon-base for various numbers of partitions ρ.

On large datasets such as GSH, CW, and HL, once ρ exceeds the optimal value for the dataset, the running time of UniCon increases only marginally as ρ grows.

Fig 8.

The input and intermediate data sizes of UniStar and UniStar-naïve in each round.

UniStar reduces the input and intermediate data sizes by up to 4× and 8×, respectively, compared to UniStar-naïve.

Fig 9.

The per-round running times of UniStar and UniStar-naïve, and their cumulative sums.

UniStar is faster than UniStar-naïve in all rounds and requires fewer rounds.

Fig 10.

The number of input edges (denoted by lines) for UniCon-opt (τ = 0) and UniCon-base in each round.

By filtering out dispensable edges (denoted by bars), UniCon-opt shrinks the input size by 80.4% on average every round on CW.

Fig 11.

The running time of UniStar-opt and UniStar in each round.

Thanks to edge filtering, the running time of UniStar-opt drops quickly.

Fig 12.

The number of data items stored in HybridMap, a hash table, and an array in each round of UniCon-opt.

The y-axis is on a log scale. HybridMap significantly reduces the data stored in memory compared to an array, letting UniCon-opt succeed in processing the large graphs GSH, CW, and HL. HybridMap and a hash table show no significant difference in the peak number of stored items.

Fig 13.

The running time of UniStar-opt with HybridMap, a hash table, and an array, respectively, each round.

UniCon-opt with HybridMap outperforms UniCon-opt with hash tables when the graph is large enough (FS, SD, GSH, CW, and HL) thanks to the fast random accesses of HybridMap. UniCon-opt with arrays fails on large graphs (GSH, CW, and HL) because of out-of-memory errors.

Fig 14.

Data and machine scalability.

(left) UniCon handles up to 4096× larger graphs than competitors. (right) UniCon-opt shows the best performance regardless of the number of machines.

Fig 15.

The relative running time, compared to UniCon-opt, of competitors on real-world graphs.

o.o.m.: out-of-memory error. UniCon-opt with optimal τ outperforms all competitors on all graphs except for LJ. Rem, ConnectIt, and PowerGraph are faster than UniCon-opt on LJ but fail on all other graphs because of out-of-memory errors.

Fig 16.

The numbers of rounds required by all algorithms except Rem on real-world graphs.

UniCon-opt requires up to 11 fewer rounds than competitors.
