Pattern detection in bipartite networks: A review of terminology, applications, and methods

doi:10.1371/journal.pcsy.0000010

Fig 1.

Bipartite networks are ubiquitous in the real world.

Examples of systems that can be represented by bipartite networks (left) and their corresponding binary matrix representation (right). Black/gray cells in each matrix indicate presence of links (i.e., 1s) between the items in rows and the items in columns, while white cells indicate the absence of links (i.e., 0s). The networks link, respectively: (a) buyers to purchases; (b) ruminants to associated microbiota; (c) plants to pollinators; (d) authors to articles; (e) listeners to songs; (f) visitors to exhibitions; (g) genes to samples; (h) species occurrences to localities; and (i) countries to exported commodities.

Table 1.

Examples of bipartite networks and their applications in the real world.

Fig 2.

Bipartite networks and their projections.

A hypothetical bipartite network connecting different ruminants to associated microorganisms (a), and its two one-mode projections. One projection (b) connects all the microorganisms that are found together in at least one host. The other projection (c) connects all the ruminants sharing at least one microorganism, generating a fully connected network in this example.
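The one-mode projections described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' code: the host and microorganism names are made up, and `project_rows` and `transpose` are hypothetical helper names introduced here.

```python
from itertools import combinations

# Hypothetical host -> microorganism links (illustrative data, not taken from Fig 2).
links = {
    "cow": {"m1", "m2", "m4"},
    "sheep": {"m2", "m3"},
    "goat": {"m1", "m3"},
}

def project_rows(adjacency):
    """Connect two row nodes whenever they share at least one column node."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(adjacency), 2)
        if adjacency[a] & adjacency[b]
    }

def transpose(adjacency):
    """Swap the two node sets: column node -> set of row nodes linked to it."""
    flipped = {}
    for row, cols in adjacency.items():
        for col in cols:
            flipped.setdefault(col, set()).add(row)
    return flipped

# Hosts sharing at least one microorganism (fully connected for this toy data).
host_projection = project_rows(links)
# Microorganisms found together in at least one host.
microbe_projection = project_rows(transpose(links))
```

In this toy data every pair of hosts shares a microorganism, so the host projection is fully connected, while `m3` and `m4` never co-occur and stay unlinked in the microorganism projection. A projection of this kind discards how many partners were shared unless edge weights are kept.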

Fig 3.

Pattern detection in bipartite networks.

Schematic example of how matrix randomization is used to detect nonrandom patterns in bipartite networks/rectangular matrices. Black/gray cells in each matrix indicate presence of links (i.e., 1s) between the items in rows and the items in columns, while white cells indicate the absence of links (i.e., 0s). First, the structural measure of interest (in this case, a nestedness metric, NODF [71]) is computed on the target matrix (a). Then, a large set (ideally some hundreds or thousands) of randomized versions of the starting matrix is generated, and the target metric is computed for each of them (b). The possible rules for generating the random matrices, the reasoning behind them, the implications of choosing one set of rules over another, and the practical implementation of randomization procedures are described in detail in the following sections. The target metric computed on the original matrix is then compared with the distribution of the metric values computed on the random matrices. This comparison yields an estimated p-value, computed as the fraction of random matrices for which the target metric is equal to or higher than that of the original matrix. In some fields, and particularly in ecology [29], it is also common practice to compute a standardized effect size (Z) as (x – μ) / σ, where μ and σ are the average and standard deviation of the target metric across the randomized matrices, and x is the value of the metric in the original matrix. Note that the use of Z values assumes that the target metric values in the set of randomized matrices follow a normal distribution, which might not always be the case.
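The comparison step of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `null_model_summary` is a hypothetical helper name, the null values are made-up numbers standing in for NODF computed on random matrices, and Z is taken here as (x – μ) / σ so that a positive value means the observed metric lies above the null mean, matching the upper-tail p-value.

```python
from statistics import mean, stdev

def null_model_summary(observed, null_values):
    """Return (p_value, z_score) for an observed metric against null values."""
    # Upper-tail p-value: share of random matrices scoring >= the observed one.
    p_value = sum(v >= observed for v in null_values) / len(null_values)
    mu = mean(null_values)
    sigma = stdev(null_values)  # sample standard deviation (an assumption)
    # Z is undefined when all null values coincide.
    z_score = (observed - mu) / sigma if sigma > 0 else float("nan")
    return p_value, z_score

# Made-up metric values standing in for NODF computed on random matrices.
null_nodf = [40.0, 42.0, 44.0, 46.0, 48.0]
p, z = null_model_summary(47.0, null_nodf)
```

With a finite number of random matrices this estimate can be exactly zero. A common alternative adds one to both the count and the number of matrices so that the reported p-value is never zero.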

Fig 4.

A classification scheme of bipartite randomization algorithms, based on whether the matrix row and column sums are preserved exactly (Fixed, F), preserved on average (Proportional, P), or unconstrained (Equiprobable, E).

Black cells in each matrix indicate presence of links (i.e., 1s) between the items in rows and the items in columns, while white cells indicate the absence of links (i.e., 0s).
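The two extremes of this scheme can be sketched as below. This is a rough illustration rather than a reference implementation: `randomize_ee` shuffles all cells (keeping only size and fill), `randomize_ff` uses repeated 2×2 "checkerboard" swaps, which is one common way to keep row and column sums fixed, and the function names, the example matrix, and the default `n_attempts` are all assumptions made here.

```python
import random

def randomize_ee(matrix, rng):
    """Equiprobable rows and columns: shuffle all cells, keeping size and fill."""
    n_cols = len(matrix[0])
    cells = [v for row in matrix for v in row]
    rng.shuffle(cells)
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]

def randomize_ff(matrix, rng, n_attempts=1000):
    """Fixed rows and columns: repeated 2x2 checkerboard swaps."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_attempts):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Swappable submatrices are [[1, 0], [0, 1]] and [[0, 1], [1, 0]].
        is_checkerboard = (
            m[r1][c1] == m[r2][c2]
            and m[r1][c2] == m[r2][c1]
            and m[r1][c1] != m[r1][c2]
        )
        if is_checkerboard:
            # Flipping the 2x2 block leaves every row and column sum unchanged.
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

def row_sums(matrix):
    return [sum(row) for row in matrix]

def col_sums(matrix):
    return [sum(col) for col in zip(*matrix)]

rng = random.Random(0)  # seeded only to make the sketch reproducible
observed = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
]
ee_matrix = randomize_ee(observed, rng)
ff_matrix = randomize_ff(observed, rng)
```

The number of swap attempts needed for the FF matrices to be well mixed depends on matrix size and fill, so the default used here should not be read as a recommendation. Proportional (P) constraints, where cells are filled with probabilities derived from the marginal totals, are not shown.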

Fig 5.

Effect of different randomization constraints on pattern detection.

Example of how applying different constraints to matrix randomization can lead to contrasting results in pattern detection. In this example, we apply the same pattern detection workflow as described in Fig 3. Black/gray cells in each matrix indicate presence of links (i.e., 1s) between the items in rows and the items in columns, while white cells indicate the absence of links (i.e., 0s). First, the structural measure of interest (in this case, a nestedness metric, NODF [71]) is computed on the target matrix (a). Then, two sets of 1,000 randomized versions of the starting matrix are generated using, alternatively, an algorithm that generates random matrices with exactly the same row and column totals as the starting matrix (FF), and an algorithm that generates random matrices having the same size, shape, and fraction of occupied cells as the starting matrix, but with varying (equiprobable) row and column totals (EE). The target metric is computed for each random matrix in the two sets (b, d). Then, the starting NODF value is compared against the two distributions of “null” values in the two sets of randomized matrices. In this example, the starting NODF does not depart significantly from the null expectation from the set of matrices generated with the FF algorithm (Z = 1; p = 0.186). Conversely, the pattern is identified as particularly strong when compared with the metrics measured in the random matrices generated with the EE algorithm (Z = 6; p = 0).
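One plausible reason the two null models disagree is that NODF depends directly on differences in marginal totals, which FF holds fixed and EE does not. The sketch below follows the usual definition of NODF (paired overlap counted only between rows, or columns, with different totals). It is an illustrative implementation written for this caption, with hypothetical function names, and has not been checked against the code used to produce the figure.

```python
from itertools import combinations

def _paired_nestedness(vectors):
    """Sum of paired nestedness (0-100) over all pairs, plus the pair count."""
    sets = [{i for i, v in enumerate(vec) if v} for vec in vectors]
    total, n_pairs = 0.0, 0
    for a, b in combinations(sets, 2):
        n_pairs += 1
        if len(a) == len(b):
            continue  # equal marginal totals contribute zero
        big, small = (a, b) if len(a) > len(b) else (b, a)
        if small:
            # Share of the poorer vector's links also present in the richer one.
            total += 100.0 * len(big & small) / len(small)
    return total, n_pairs

def nodf(matrix):
    """NODF of a binary matrix given as a list of equal-length rows."""
    columns = [list(col) for col in zip(*matrix)]
    row_total, row_pairs = _paired_nestedness(matrix)
    col_total, col_pairs = _paired_nestedness(columns)
    return (row_total + col_total) / (row_pairs + col_pairs)

perfectly_nested = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
checkerboard = [
    [1, 0],
    [0, 1],
]
nested_score = nodf(perfectly_nested)
checkerboard_score = nodf(checkerboard)
```

Because pairs with equal totals score zero, matrices with homogeneous row and column totals, as EE randomization tends to produce, will generally have lower NODF than matrices that retain the observed totals.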
