Figure 1.
Examples of link similarity calculation.
(A) A simple example of link similarity calculation. (B) First example illustrating the limitation of the original link similarity calculation. (C) Second example illustrating the limitation of the original link similarity calculation. (D) Third example illustrating the limitation of the original link similarity calculation.
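In Ahn et al. (2010), the original link similarity is a Jaccard coefficient on inclusive neighbourhoods. Two links e_ik and e_jk that share node k have similarity |n+(i) ∩ n+(j)| / |n+(i) ∪ n+(j)|, where n+(i) is node i together with its neighbours. The Python sketch below implements only that original LC measure. It assumes an unweighted, undirected graph given as an edge list, and the function names are hypothetical. ELC's modified similarity is not reproduced because these captions do not define it.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def link_similarity(adj, e1, e2):
    """Original LC similarity of two links sharing exactly one node k.

    Returns |n+(i) & n+(j)| / |n+(i) | n+(j)| for links (i, k) and (j, k),
    and 0.0 when the links share no node (or are the same link).
    """
    shared = set(e1) & set(e2)
    if len(shared) != 1:
        return 0.0
    (k,) = shared
    i = e1[0] if e1[1] == k else e1[1]  # endpoint of e1 other than k
    j = e2[0] if e2[1] == k else e2[1]  # endpoint of e2 other than k
    n_i = adj[i] | {i}  # inclusive neighbourhood n+(i)
    n_j = adj[j] | {j}  # inclusive neighbourhood n+(j)
    return len(n_i & n_j) / len(n_i | n_j)


def all_link_similarities(edges):
    """Similarity for every pair of links that share a node."""
    adj = build_adjacency(edges)
    return {
        (e1, e2): link_similarity(adj, e1, e2)
        for e1, e2 in combinations(edges, 2)
        if len(set(e1) & set(e2)) == 1
    }
```

For the edge list [(1, 2), (1, 3), (2, 3), (3, 4)], the links (1, 3) and (2, 3) get similarity 1.0. The links (1, 2) and (1, 3) get 0.75, and (1, 3) and (3, 4) get 0.25.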
Figure 2.
A simple example network for ELC and LC calculations.
(A) The simple example network from Ahn et al. (2010). (B) The transform matrix and (C) the dendrogram obtained by ELC on the example network in (A). (D) The transform matrix and (E) the dendrogram obtained by LC on the example network in (A).
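In LC, the dendrogram in panel (E) comes from single-linkage hierarchical clustering of links by similarity. The following is a minimal union-find sketch of that step. It assumes the link similarities are supplied as a dictionary keyed by link pairs. `dendrogram_merges` and `link_communities_at` are hypothetical helpers. ELC's clustering and the transform matrices shown in the figure are not modelled here.

```python
def _find(parent, e):
    """Union-find root lookup with path halving."""
    while parent[e] != e:
        parent[e] = parent[parent[e]]
        e = parent[e]
    return e


def dendrogram_merges(edges, sim):
    """Single-linkage merge sequence: (similarity, clusters remaining) per merge."""
    parent = {e: e for e in edges}
    clusters = len(edges)
    merges = []
    for (a, b), s in sorted(sim.items(), key=lambda item: -item[1]):
        ra, rb = _find(parent, a), _find(parent, b)
        if ra != rb:  # only pairs joining two clusters create a dendrogram node
            parent[rb] = ra
            clusters -= 1
            merges.append((s, clusters))
    return merges


def link_communities_at(edges, sim, threshold):
    """Cut the single-linkage dendrogram at `threshold`.

    Returns the link communities and the node communities they induce;
    a node touching links in several link communities appears in each.
    """
    parent = {e: e for e in edges}
    for (a, b), s in sim.items():
        if s >= threshold:
            ra, rb = _find(parent, a), _find(parent, b)
            if ra != rb:
                parent[rb] = ra
    groups = {}
    for e in edges:
        groups.setdefault(_find(parent, e), set()).add(e)
    link_comms = list(groups.values())
    node_comms = [{node for e in g for node in e} for g in link_comms]
    return link_comms, node_comms
```

With the similarities of the four-link example above, the merges occur at 1.0, 0.75 and 0.25. Cutting at 0.5 yields the triangle {1, 2, 3} and the link (3, 4), so node 3 belongs to both node communities.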
Figure 3.
Karate network (34 nodes/2 classes).
The transform matrix (A) and the dendrogram (B) obtained by ELC; the transform matrix (C) and the dendrogram (D) obtained by LC. (E–G) Communities obtained by ELC, LC and CPM, with the corresponding values of Extended Quality of modularity (EQ), Partition Density (PD), In-Group-Proportion (IGP), Community Number (CN), Cover Rate (CR) and number of Uncovered Nodes (UN). *Values in red bold marked with an asterisk (*) are the best of the three methods for each evaluation measure on this dataset.
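Several of the evaluation measures listed in (E–G) have standard definitions that can be sketched directly. The Python below makes the following assumptions:

- Partition density follows the definition in Ahn et al. (2010).
- EQ takes the overlapping-modularity form of Shen et al. (2009). Each node pair is weighted by 1/(O_i O_j), where O_i is the number of communities containing node i.
- CR and UN are the fraction and the count of nodes that lie outside every community.

The paper's exact conventions, such as any minimum community size, may differ. IGP is omitted, and all function names are hypothetical.

```python
from collections import Counter


def partition_density(link_communities):
    """Partition density D (Ahn et al., 2010) of a partition of the links."""
    total_links = sum(len(links) for links in link_communities)
    acc = 0.0
    for links in link_communities:
        m_c = len(links)
        n_c = len({node for edge in links for node in edge})
        if n_c > 2:  # communities with two nodes contribute 0 by definition
            acc += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * acc / total_links


def cover_rate_and_uncovered(all_nodes, node_communities):
    """Assumed definitions: CR = share of nodes in >= 1 community, UN = count of the rest."""
    nodes = set(all_nodes)
    covered = set().union(*node_communities)
    uncovered = nodes - covered
    return 1.0 - len(uncovered) / len(nodes), len(uncovered)


def extended_modularity(edges, node_communities):
    """EQ for overlapping node communities, in the form of Shen et al. (2009)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    two_m = 2 * len(edges)
    membership = Counter(n for comm in node_communities for n in comm)  # O_i
    eq = 0.0
    for comm in node_communities:
        for i in comm:
            k_i = len(adj.get(i, ()))
            for j in comm:
                k_j = len(adj.get(j, ()))
                a_ij = 1 if j in adj.get(i, ()) else 0
                eq += (a_ij - k_i * k_j / two_m) / (membership[i] * membership[j])
    return eq / two_m
```

As an example, take two triangles joined by the link (3, 4). Splitting the links into the two triangles plus the bridge gives D = 6/7. The node cover {1, 2, 3}, {4, 5, 6} gives EQ = 5/14, which equals ordinary modularity because no node overlaps.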
Figure 4.
Dolphin network (62 nodes/2 classes).
The transform matrix (A) and the dendrogram (B) obtained by ELC; the transform matrix (C) and the dendrogram (D) obtained by LC. (E–G) Communities obtained by ELC, LC and CPM, with the corresponding values of Extended Quality of modularity (EQ), Partition Density (PD), In-Group-Proportion (IGP), Community Number (CN), Cover Rate (CR) and number of Uncovered Nodes (UN). *Values in red bold marked with an asterisk (*) are the best of the three methods for each evaluation measure on this dataset.
Figure 5.
US politics network (105 nodes/3 classes).
The transform matrix (A) and the dendrogram (B) obtained by ELC; the transform matrix (C) and the dendrogram (D) obtained by LC. (E–G) Communities obtained by ELC, LC and CPM, with the corresponding values of Extended Quality of modularity (EQ), Partition Density (PD), In-Group-Proportion (IGP), Community Number (CN), Cover Rate (CR) and number of Uncovered Nodes (UN). *Values in red bold marked with an asterisk (*) are the best of the three methods for each evaluation measure on this dataset.
Figure 6.
Football network (115 nodes/12 classes).
The transform matrix (A) and the dendrogram (B) obtained by ELC; the transform matrix (C) and the dendrogram (D) obtained by LC. (E–G) Communities obtained by ELC, LC and CPM, with the corresponding values of Extended Quality of modularity (EQ), Partition Density (PD), In-Group-Proportion (IGP), Community Number (CN), Cover Rate (CR) and number of Uncovered Nodes (UN). *Values in red bold marked with an asterisk (*) are the best of the three methods for each evaluation measure on this dataset.
Figure 7.
Y2H network (1647 nodes/3 sources).
The transform matrix (A) and the dendrogram (B) obtained by ELC; the transform matrix (C) and the dendrogram (D) obtained by LC. (E–G) Communities obtained by ELC, LC and CPM, with the corresponding values of Extended Quality of modularity (EQ), Partition Density (PD), In-Group-Proportion (IGP), Community Number (CN), Cover Rate (CR) and number of Uncovered Nodes (UN). *Values in red bold marked with an asterisk (*) are the best of the three methods for each evaluation measure on this dataset.
Figure 8.
GO enrichment analysis of the Y2H network.
(A) Numbers of communities and GO enrichment values for the Y2H network obtained by ELC, LC and CPM. The x-axis is the log10 of the number of communities, and the y-axis is the −log10 p-value of GO enrichment over all modules for the biological process, molecular function and cellular component categories. At smaller p-values, the average community size found by ELC is much larger than that found by LC and CPM across the GO categories, especially for p-values below 1E-8. (B) Statistics on the number of nodes per community in the Y2H network for ELC, LC and CPM.
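The caption does not say which test produced the GO enrichment p-values. A common choice is the upper tail of the hypergeometric distribution. The sketch below assumes that test, with these parameters:

- N is the total number of annotated proteins.
- K is the number of proteins carrying a given GO term.
- n is the size of the community, and k is the number of its proteins that carry the term.

The function names are hypothetical.

```python
import math


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    Assumed enrichment test; the caption does not state which test was used.
    """
    upper = min(n, K)
    tail = sum(math.comb(K, x) * math.comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / math.comb(N, n)


def neg_log10_p(p):
    """Value plotted on a -log10 p-value axis."""
    return -math.log10(p)
```

For instance, a five-protein community drawn from ten proteins in which all five annotated proteins fall gives p = 1/252, or about 2.4 on the −log10 scale. A p-value of 1E-8 corresponds to 8 on that axis.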
Figure 9.
A selected set of artificial networks with different average node degrees and pinside values.
Table 1.
Number of proteins (PN) in the top 10 communities found by each of the three methods, ranked by the GO enrichment p-values of all modules for the biological process, molecular function and cellular component categories.
Table 2.
Performance of ELC on artificial datasets under different conditions.
Table 3.
Performance of LC on artificial datasets under different conditions.
Table 4.
Performance of CPM on artificial datasets under different conditions.
Table 5.
Comparison of the three methods on five real-world networks using four evaluation measures.
Table 6.
Comparison of the three methods on five real-world networks by cover rate and number of uncovered nodes.