
Fig 1.

General structures and their representation as a standard SBM.

The standard SBM is represented as a block matrix with the probabilities visualized in grey-scale: a) assortative structure, b) disassortative structure, c) core-periphery, d) hierarchy.
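The four structures can be pinned down as block probability matrices. A minimal sketch, with placeholder probabilities chosen purely for illustration (not values from the paper):

```python
# Illustrative block probability matrices for the four structures of
# Fig 1. The concrete probabilities (0.9, 0.5, 0.1) are placeholders.
p_hi, p_mid, p_lo = 0.9, 0.5, 0.1

assortative = [[p_hi, p_lo],      # dense inside the blocks,
               [p_lo, p_hi]]      # sparse between them

disassortative = [[p_lo, p_hi],   # sparse inside the blocks,
                  [p_hi, p_lo]]   # dense between them

core_periphery = [[p_hi, p_hi],   # block 0 (core) connects to everything,
                  [p_hi, p_lo]]   # block 1 (periphery) mostly to the core

hierarchy = [[p_hi, p_mid, p_lo, p_lo],   # super-blocks {0,1} and {2,3}:
             [p_mid, p_hi, p_lo, p_lo],   # denser within a super-block
             [p_lo, p_lo, p_hi, p_mid],   # than between super-blocks
             [p_lo, p_lo, p_mid, p_hi]]
```

Visualizing such a matrix in grey-scale yields the panels of the figure.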


Fig 2.

Visualization of a complex network with three partitions: the results of the standard SBM and the degree-corrected SBM, as well as the metadata.

The networks are drawn by ordering the nodes by degree, with the highest at the top, and splitting the nodes into two sets according to the known metadata. The color of each node indicates its membership in one of the two groups given by the partition of the respective column. Below each network graphic, the degree distribution is colored according to the partition: each node is drawn as a bar whose height is its degree and whose color is that of its group.


Fig 3.

The resulting SBM of a graph can be represented as a multi-graph that contains one node per block and, for every edge of the underlying graph, an edge between the corresponding blocks.

Based on this representation of an SBM, the inference process can be iterated by applying the method again to the multi-graph.
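The collapse of a graph into this block-level multi-graph can be sketched as follows; the toy graph and block labels are hypothetical:

```python
from collections import Counter

def block_multigraph(edges, partition):
    """Collapse a graph into the block-level multi-graph of Fig 3: one
    node per block and, for every edge of the underlying graph, one
    (multi-)edge between the blocks of its endpoints."""
    counts = Counter()
    for u, v in edges:
        r, s = sorted((partition[u], partition[v]))
        counts[(r, s)] += 1  # (r, r) entries count within-block edges
    return counts

# Hypothetical toy graph with two blocks: {a, b} and {c, d}.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
partition = {"a": 0, "b": 0, "c": 1, "d": 1}
counts = block_multigraph(edges, partition)
# One edge inside each block, two edges between the blocks.
```

Applying an SBM inference method to `counts` as a weighted graph gives the next level of the iteration.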


Table 1.

SBM variants, authors and assigned abbreviations.


Fig 4.

Results of Girvan-Newman test applied with a known number of groups.

Each marker represents the average of 10 network instances. For each network, each inference algorithm (see Fig 5) was executed 10 times from random partitions with 4 blocks for each of the shown SBM variants. The AMI of the partition with the best objective value over all inference algorithms and all executions is taken into account. Since the results of the DCP algorithm are identical to those of SKN in the case without model selection, the diagram only includes the results of SKN.
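For reference, the Girvan-Newman planted-partition setting can be sampled roughly as below. The parameter values (128 nodes, 4 groups, average degree 16) are the classic GN test settings; the sampler itself is only a sketch, not the generator used for the experiments:

```python
import random

def girvan_newman_graph(k_out, n=128, groups=4, k_avg=16, seed=0):
    """Sample one Girvan-Newman planted-partition instance: n nodes in
    equal-sized groups with expected degree k_avg, of which on average
    k_out edges lead to other groups."""
    rng = random.Random(seed)
    size = n // groups
    block = [v // size for v in range(n)]       # planted partition
    p_in = (k_avg - k_out) / (size - 1)         # within-group edge prob.
    p_out = k_out / (n - size)                  # between-group edge prob.
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if block[u] == block[v] else p_out
            if rng.random() < p:
                edges.append((u, v))
    return edges, block
```

With k_out = 0 all edges stay inside the planted groups; as k_out approaches the average degree, the groups become increasingly hard to detect.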


Fig 5.

Overview of the performance of the applied inference algorithms.

The results shown are based on the same executions as Fig 4, i.e. 10 executions for each of the 10 networks for each SBM variant. Each value shown is the mean over all executions, all networks with the same kout and all SBM variants of Fig 4. Fig 6 includes the results for selected inference algorithms for individual SBM variants.


Fig 6.

Results from starting the inference algorithms from the true partition 10 times for 10 network instances for each kout = 0, 0.5, …, 16.

The resulting difference to the original objective value (designated with a blue dot ●), the average AMI of all runs on all networks with the same parameter (green downward triangles ▼) and, for comparison, the average AMI reached in the same manner from a random starting partition with the known number of groups (red triangles ▲) are shown. The AMI values (▼, ▲) of each row are plotted against the right axis of that row. The difference of the objective functions is calculated as Pobserved − Ptrue and is measured on the left axis of each row. If the regarded model describes a likelihood, positive values represent partitions that are more likely than the planted partition. Each small diagram contains the values of the model of its row and the inference algorithm of its column. To reduce the total number of diagrams, only the results of the Metropolis-Hastings algorithm with 50 000 steps are shown. Because PAH is not designed to start from a given partition, it is not included either.
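A generic single-node-move Metropolis-Hastings search of the kind referenced here can be sketched as follows. The function is a simplified stand-in, not the paper's implementation; `objective` is a user-supplied callable, and an efficient implementation would compute the deltas incrementally rather than re-evaluating the whole objective:

```python
import math
import random

def metropolis_hastings(partition, objective, num_blocks,
                        steps=50_000, beta=1.0, seed=0):
    """Sketch of a Metropolis-Hastings search over partitions: propose
    moving a single node to a random block and accept with probability
    min(1, exp(beta * delta)), where delta is the change in the
    (log-)objective. Tracks the best partition seen."""
    rng = random.Random(seed)
    partition = list(partition)
    current = objective(partition)
    best, best_val = list(partition), current
    n = len(partition)
    for _ in range(steps):
        v = rng.randrange(n)
        old = partition[v]
        partition[v] = rng.randrange(num_blocks)
        delta = objective(partition) - current
        if delta >= 0 or rng.random() < math.exp(beta * delta):
            current += delta                      # accept the move
            if current > best_val:
                best, best_val = list(partition), current
        else:
            partition[v] = old                    # reject: restore
    return best, best_val
```

Starting from the planted (true) partition instead of a random one simply means passing it as the initial `partition`.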


Fig 7.

The recorded number of calculated deltas of the studied inference algorithms during all runs performed for Fig 4.

The green triangle marks the mean and the orange line represents the median of the values.


Fig 8.

Results of Girvan-Newman test with model selection.

Like before, we used 10 network instances for each kout. For each network and each SBM variant we executed the MHA 50k 10 times for each K = 1, …, 10. Then we considered one execution for each network over all numbers of groups K ∈ {1, …, 10} as one unit and executed the model selection based on these results. Therefore, a data point is the average of 100 AMI values resulting from the 10 network instances and the 10 selected partitions. For the classic models only the best model selection criterion according to Table 2 is shown.
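One way such a selection step can work is sketched below with a BIC-style score over the candidate group sizes. The penalty term (K(K+1)/2 free block probabilities) is an illustrative assumption, not necessarily the parameter count used by the criteria compared in the paper:

```python
import math

def select_k_by_bic(log_likelihoods, n_nodes):
    """Sketch of a model selection step over candidate group sizes
    K = 1 .. len(log_likelihoods): choose the K minimizing a BIC-style
    score -2*logL + (#params)*ln(n)."""
    best_k, best_score = None, math.inf
    for k, loglik in enumerate(log_likelihoods, start=1):
        n_params = k * (k + 1) / 2     # assumed: free block probabilities
        score = -2.0 * loglik + n_params * math.log(n_nodes)
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

AIC- or MDL-style variants differ only in the penalty term added to the fit.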


Table 2.

Normalized AUMIC of the different SBM variants of the GN test for 0 ≤ kout ≤ 8, based on 10 executions of the MHA with 50 000 steps from 10 random partitions for each group size K = 1, …, 10 and each kout = 0, 0.5, …, 8.
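Assuming AUMIC denotes the area under the AMI-versus-kout curve, a normalized version can be computed by trapezoidal integration; this is a sketch of one plausible definition, and the paper's exact normalization may differ:

```python
def normalized_aumic(kout_values, ami_values):
    """Trapezoidal integral of AMI over k_out, divided by the k_out
    range, so that a method with AMI = 1 everywhere scores 1."""
    area = 0.0
    for i in range(len(kout_values) - 1):
        width = kout_values[i + 1] - kout_values[i]
        area += 0.5 * (ami_values[i] + ami_values[i + 1]) * width
    return area / (kout_values[-1] - kout_values[0])
```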


Fig 9.

Ratio of the average selected number of blocks to the true number of blocks K for the classic SBM variants with the presented model selection criteria.

As in Fig 10, the values shown are based on the same setting as Fig 8.


Fig 10.

Ratio of the average selected number of blocks to the true number of blocks K for those SBM variants which include a model selection step.

The values shown are based on the same setting as Fig 8.


Fig 11.

Results of LFR benchmark applied with the known number of groups.

Each marker represents the average of 10 network instances. For each network, each inference algorithm (see Fig 12) was executed 10 times from random partitions with the same number of blocks as the planted partition for each of the shown SBM variants. The AMI of the partition with the best objective value over all inference algorithms and all executions is used as the result for the combination of network and model. Since the results of the DCP algorithm are identical to those of SKN in the case without model selection, the diagram only includes the results of SKN.
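The LFR mixing parameter μt controls the fraction of a node's edges that attach outside its community. Measuring it empirically on a partitioned edge list can be sketched as follows; this is one common definition of μt, not necessarily the exact quantity used by the benchmark generator:

```python
def mixing_parameter(edges, community):
    """Empirical mixing parameter of an LFR-style graph: the fraction
    of each node's edges that leave its community, averaged over all
    nodes."""
    degree, external = {}, {}
    for u, v in edges:
        for a in (u, v):
            degree[a] = degree.get(a, 0) + 1
            if community[u] != community[v]:
                external[a] = external.get(a, 0) + 1
    return sum(external.get(v, 0) / degree[v] for v in degree) / len(degree)

# Two triangles joined by a single bridge edge (2, 3): only that edge
# leaves a community, so mu = (1/3 + 1/3) / 6 = 1/9.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
mu = mixing_parameter(edges, community)
```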


Fig 12.

Overview of the performance of the applied inference algorithms.

The results shown are based on the same executions as Fig 11, i.e. 10 executions for each of the 10 network instances for each SBM variant. Each value shown is the mean over all executions with the same μt and all SBM variants of Fig 11. Fig 14 includes the results of selected inference algorithms for individual SBM variants.


Fig 13.

The recorded number of calculated deltas of the studied inference algorithms during all runs performed for Fig 11.


Fig 14.

Results from starting the inference algorithms from the true partition 10 times for 10 network instances for each μt = 0, 0.1, …, 0.6 of the LFR benchmark.

The figure is structured in the same way as Fig 6. The resulting difference to the original objective value (designated with a blue dot ●), the average AMI of all runs on all networks with the same parameter (green downward triangles ▼) and, for comparison, the average of the results from a random starting partition with the known number of groups (red triangles ▲) are shown. The AMI values of each row are plotted against the right axis of that row. The difference between the objective function values is calculated as Pobserved − Ptrue and is measured on the left axis of each row.


Fig 15.

Ratio of the average selected number of blocks to the average true number of blocks K for the classical SBM variants.

As in Fig 16, the values shown are based on the setting of Fig 17.


Fig 16.

Ratio of the average selected number of blocks to the average true number of blocks K for those SBM variants which include a model selection step.

The values shown are based on the same setting as Fig 17.


Fig 17.

Results of LFR benchmark using model selection.

Like before, we used 10 network instances for each μt. For each network and each SBM variant we executed the MHA 250k 10 times for each number of groups K = 1, …, 30. Then we considered one execution for each network over all numbers of groups K ∈ {1, …, 30} as one unit and executed the model selection based on these results. Therefore, a data point is the average of 100 AMI values resulting from the 10 network instances and the 10 selected partitions. For the classic models only the best model selection criterion according to Table 3 is shown.


Table 3.

Normalized AUMIC of the different SBM variants of the LFR benchmark for 0 ≤ μt ≤ 0.5, based on 10 executions of the MHA with 250 000 steps from 10 random partitions for each number of groups K = 1, …, 30 and each μt = 0, 0.1, …, 0.5.


Fig 18.

Results of inference of real networks.

Each model besides the hierarchical ones (HSPC, HSDCPU, HDCPUH) was executed 10 times with the MHA with 250 000 steps, starting from the partition given by the metadata and, for each group size between 1 and 15 (25 for the football network), from 10 random starting points. For these models, the figure includes the results of the start from the metadata and for partitions with the same number of groups as the metadata. The classical SBM variants additionally contain the results for the three presented model selection criteria AIC, BIC and MDL, whereas the models that include a model selection step show the results of their own model selection instead. The hierarchical SBM variants show only the results of 10 executions with the algorithm for hierarchical models. The bars represent the average of the 10 independent runs (for those with model selection, one execution per group size) and the solid line in the middle of each bar marks the AMI of the partition with the best objective value. Bars that are not visible correspond to results near zero; the dashed gray line with its surrounding white space separates the models.
