Backbone: An R package for extracting the backbone of bipartite projections

Bipartite projections are used in a wide range of network contexts including politics (bill co-sponsorship), genetics (gene co-expression), economics (executive board co-membership), and innovation (patent co-authorship). However, because bipartite projections are always weighted graphs, which are inherently challenging to analyze and visualize, it is often useful to examine the ‘backbone,’ an unweighted subgraph containing only the most significant edges. In this paper, we introduce the R package backbone for extracting the backbone of weighted bipartite projections, and use bill sponsorship data from the 114th session of the United States Senate to demonstrate its functionality.


Funding: The author(s) received no specific funding for this work. The authors have declared that no competing interests exist.

Introduction

Networks are useful for studying many different phenomena in the natural and social worlds, but network data can be difficult to collect directly. As a result, it is common for research to measure an unobserved unipartite network of interest using a bipartite projection in which the edges capture whether (or the extent to which) two vertices co-participate in a relevant event. For example, friendship networks are measured using event co-attendance [1], political networks are measured using bill co-sponsorship [2], executive networks are measured using board co-membership [3], scholarly collaboration networks are measured using paper co-authorship [4], knowledge networks are measured using paper co-citation [5], and genetic networks are measured using gene co-expression [6].

However, there are several challenges to studying bipartite projections. First, the projection function "transforms the problem of analysing a bipartite structure into the problem of analysing a weighted one, which is not easier" [7, p. 34-35]. Second, when transforming a bipartite graph into a unipartite graph via projection, information about the 'events' responsible for edges between vertices is lost [7]. Third, bipartite projections introduce topological artifacts such as inflated clustering, such that the projection of "even a random [bipartite] network - one that has no particular structure built into it at all - will be highly clustered" [8, p. 128]. Finally, the scale of each edge weight in a bipartite projection is driven by characteristics of both sets of vertices in the original bipartite graph. For example, observing two people attending many of the same events (i.e. a large edge weight) may still be uninteresting if they each attended many large events, but observing two people attend just a few of the same events (i.e. a small edge weight) may be interesting if they each attended only a few small events [9].
For these reasons, it is often useful to extract and study the backbone of a bipartite projection. The backbone is an unweighted (i.e. an edge is either present or absent) or signed (i.e. an edge can be positive, negative, or absent) graph that preserves only the most "significant" edges from the weighted bipartite projection. Multiple methods of backbone extraction exist for bipartite projections [9], but until now there has not been a single software package that implements the most common methods. In this paper, we introduce and demonstrate the backbone R package, which implements four backbone extraction methods - a universal threshold, a hypergeometric model, a stochastic degree sequence model, and a fixed degree sequence model - in a common framework that facilitates their use.

Notation

If two vertices i and j are joined by an edge, we denote the edge e = ij. We make no distinction between a graph and its adjacency matrix. We call a graph bipartite if the set of vertices can be partitioned into two sets U and W such that each edge of the graph is of the form ij where i ∈ U and j ∈ W. The sets U and W are called independent, meaning there are no edges present within the sets.

These graphs can be represented by a biadjacency matrix, B. This matrix uses rows to represent vertices in the set U and columns to represent vertices in the set W. We set B_ij = 1 if there is an edge between vertex i of U and vertex j of W, and set B_ij = 0 otherwise. As with graphs, we conflate a bipartite graph with its biadjacency matrix.

The row sum of the ith row of the biadjacency matrix, denoted R_i, is equal to the degree of the ith vertex in the set U. Similarly, the column sum of the jth column of B, denoted C_j, is the degree of the jth vertex in W.

We can use bipartite graphs to study social networks where the vertices of U are agents (e.g., people), the vertices of W are artifacts (e.g., events), and edges represent the affiliation of a person with an artifact. These bipartite networks are often also called affiliation networks or two-mode networks. To transform these data into a weighted graph, we project the bipartite adjacency matrix B by multiplying it by its transpose, B^T. This produces a weighted graph G = B B^T called the bipartite projection, where G_ij is equal to the number of artifacts of W with which both i and j are affiliated when i ≠ j. The value G_ii is equal to the total number of artifacts with which i is affiliated. The value of each off-diagonal entry G_ij is therefore bounded above by min(G_ii, G_jj).

Bipartite projections are of interest in social network analysis because they allow us to construct a network from artifact affiliations, which are often easier to obtain than taking a survey of the network members. If the number of members of the network is large, getting complete and reliable information regarding relationships between members can involve long and repetitive survey techniques, which can lead to survey fatigue and costly field work. Additionally, in some settings, individuals may be reluctant to provide information about their relationships. In these cases, it is often beneficial and easier to collect bipartite network information, such as event attendance, and then project it to infer a weighted network among the social actors.
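As a small illustration of the projection, the following base-R sketch builds a toy biadjacency matrix (the values are invented for the example, not data from the paper) and computes G = B B^T:

```r
# Toy biadjacency matrix B: 3 agents (rows) x 4 artifacts (columns).
# Entries are invented solely to illustrate the algebra.
B <- matrix(c(1, 1, 0, 1,
              1, 1, 1, 0,
              0, 1, 1, 0),
            nrow = 3, byrow = TRUE)

# Bipartite projection G = B B^T: off-diagonal entries count shared
# artifacts; diagonal entries G_ii equal each agent's degree R_i.
G <- B %*% t(B)

G[1, 2]   # agents 1 and 2 share two artifacts
diag(G)   # equals rowSums(B), the row sums R_i
```

Note that the diagonal of G reproduces the row sums of B, which is why the off-diagonal entries are bounded by min(G_ii, G_jj).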

Because bipartite projections are weighted, and because what counts as a 'large' or 'small' weight can differ for each edge, it can be useful to reduce this information by focusing on an unweighted subgraph that contains only the most important edges. We call this subgraph a backbone of G, which we denote as G′.

73
Backbone extraction methods

The simplest approach to extracting a backbone of a bipartite projection applies a single, universal threshold T to all edges such that:

G′_ij = 1 if G_ij > T, and G′_ij = 0 otherwise.

For example, a threshold value of T = 0 indicates that as long as a pair of vertices are affiliated with at least one of the same artifacts (i.e. they have a nonzero edge weight), an edge between them will be present in the backbone. We call this type of backbone binary. It is also possible to extract a signed backbone by selecting distinct upper (T+) and lower (T−) thresholds, such that:

G′_ij = 1 if G_ij > T+, G′_ij = −1 if G_ij < T−, and G′_ij = 0 otherwise.

However, backbones extracted using universal thresholds are very dense and clustered, and therefore tend to be uninformative [7,9].
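The threshold rules above can be sketched in a few lines of base R. This is an illustrative sketch with an invented weighted projection, not the package's implementation:

```r
# Invented weighted projection G for three vertices.
G <- matrix(c(0, 3, 1,
              3, 0, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)

# Binary backbone: keep an edge wherever the weight exceeds T.
T_univ <- 0
binary_backbone <- (G > T_univ) * 1
diag(binary_backbone) <- 0  # self-loops are not part of the backbone

# Signed backbone: +1 above an upper threshold, -1 below a lower one.
T_upper <- 2
T_lower <- 1
signed_backbone <- (G > T_upper) * 1 - (G < T_lower) * 1
diag(signed_backbone) <- 0
```

With these thresholds, the weight-3 edge is retained as positive and the weight-0 dyad becomes a negative edge in the signed backbone.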

An alternative approach to backbone extraction, and the approach that backbone is designed to facilitate, identifies important edges to be retained in the backbone using a statistical test that compares an edge's observed weight to the distribution of its weights under a null model. Given a null model that can generate a distribution of an edge's weights G*_ij, it is possible to compute the probability that an edge's observed weight lies in the upper (P(G*_ij ≥ G_ij)) or lower (P(G*_ij ≤ G_ij)) tail of this distribution, and therefore is not likely to be the result of such a null model. Using these probabilities and a significance level α for the test, a signed backbone can be extracted such that:

G′_ij = 1 if P(G*_ij ≥ G_ij) < α/2, G′_ij = −1 if P(G*_ij ≤ G_ij) < α/2, and G′_ij = 0 otherwise.

In many cases, a simpler binary backbone may be more useful, which can be achieved by discarding the negative edges such that:

G′_ij = 1 if P(G*_ij ≥ G_ij) < α/2, and G′_ij = 0 otherwise.

However, whether a binary or signed backbone is extracted, a two-tailed significance test is used because, for any given edge, the observed value in the projection could be in either tail of the null distribution.

The challenge in using the statistical approach to backbone extraction lies in defining a suitable null model and computing the required probabilities (i.e. edge-wise p-values). A family of nine null models can be defined by the constraints they place on the row and column sums in a random bipartite graph B*: the row sums in B*, and separately the column sums in B*, can be unconstrained, constrained to match those in B on average, or constrained to match those in B exactly [12,13]. Let R denote a set of such constraints, and let B(R) denote the set of all bipartite graphs that satisfy them.

The hypergeometric model

The hypergeometric model constrains the row sums in B* to equal their values in B (i.e. fixed), but places no constraints on the column sums [14,15]. The distribution of edge weights G*_ij under this model is hypergeometric.

The hypergeometric distribution is a discrete probability distribution that models the probability of having k "successes" in a random sample of size n (drawn without replacement) from a population of size N, where K of the objects are considered successes. The probability mass function is given by

P(X = k) = C(K, k) C(N − K, n − k) / C(N, n),

where X is the random variable of the distribution and C(a, b) denotes the binomial coefficient "a choose b".

In the case of a bipartite projection, let B* be a bipartite graph with independent sets U and W. Let U = {u_1, u_2, ..., u_m}. We denote the neighborhood of a vertex i by N(i), meaning the set of vertices to which i is adjacent. Then the entry G*_ij = (B* B*^T)_ij is the number of vertices in N(u_i) which are also in N(u_j). So the corresponding distribution is hypergeometric with population size N = |W|, sample size n = |N(u_i)|, and number of successes K = |N(u_j)|.

The stochastic degree sequence model

The stochastic degree sequence model constrains B* so that its expected row sums and expected column sums equal those observed in B [9]. That is, for any given B*, each row and column sum may be higher or lower than its observed value in B, but on average they are equal. When B* is generated by filling each B*_ij with the outcome of an independent Bernoulli trial, the distribution of (B* B*^T)_ij over all bipartite graphs B* ∈ B(R) is given by a Poisson binomial distribution [16]^1. A Bernoulli trial is a random variable with exactly two outcomes, often referred to as "success" and "failure", or "1" and "0". The Poisson binomial distribution is the distribution of a sum of independent Bernoulli trials. We let p_k be the probability of success of the kth trial and call the p_k the parameters of the distribution. When the probability of success on each Bernoulli trial is equal, this reduces to the ordinary binomial distribution.
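With these parameters, the edge-wise p-values can be computed directly with base R's phyper(). The degrees below are invented for illustration; this is a sketch of the calculation, not the internals of the package's hyperg() function:

```r
# Hypothetical example: |W| = 100 artifacts, vertex i affiliated with
# 20 of them, vertex j with 30, and an observed overlap G_ij = 12.
N_pop <- 100  # population size N = |W|
n_i   <- 20   # sample size n = |N(u_i)|
K_j   <- 30   # number of successes K = |N(u_j)|
G_ij  <- 12   # observed edge weight

# Upper-tail p-value P(G*_ij >= G_ij) under the hypergeometric model.
# phyper parameterization: m = successes, n = failures, k = draws.
p_upper <- phyper(G_ij - 1, m = K_j, n = N_pop - K_j, k = n_i,
                  lower.tail = FALSE)

# Lower-tail p-value P(G*_ij <= G_ij), used for negative edges.
p_lower <- phyper(G_ij, m = K_j, n = N_pop - K_j, k = n_i)
```

Here the expected overlap under the null is n_i * K_j / N_pop = 6, so an observed overlap of 12 yields a small upper-tail p-value.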

The Poisson binomial distribution models the probability of getting k successes in n trials, where each trial has a specific probability of success [16,17]. To apply the Poisson binomial distribution to compute P(G*_ij ≥ G_ij), we must first determine the independent probabilities P(B*_ij = 1). There are several ways to determine P(B*_ij = 1), including simple equations such as (R_i C_j)/|B|, where |B| is the sum of each entry of the biadjacency matrix of B [13,16]^2, and predicted probabilities from binary outcome models [9]. These older methods are available in the backbone package, but here we introduce and describe the polytope method [18].

The polytope method revolves around creating a convex set. A subset C of R^n is convex if for any a, b ∈ C, the line segment joining a and b is also in C. The convex hull of a set of points A ⊆ R^n, denoted conv(A), is the smallest convex set containing A. When A is a finite set, the convex hull of A is called the polytope generated by A. The polytope of interest here is the set of matrices with entries in [0, 1] whose row and column sums equal those of B. From this set of matrices, we find the matrix M^max that maximizes the entropy function, and use its entries as the probabilities P(B*_ij = 1).

Once we have the matrix of probabilities P_ij = P(B*_ij = 1), we note that G*_ij = Σ_k B*_ik B*_jk is a sum of products of independent Bernoulli random variables. Furthermore, since the probabilities are independent, each product B*_ik B*_jk is itself a Bernoulli trial with success probability P_ik P_jk. It follows that G*_ij is Poisson binomial with parameters p_1 = P_i1 P_j1, ..., p_n = P_in P_jn.
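For small n, the exact Poisson binomial distribution can be computed by convolving the Bernoulli trials one at a time, which makes the construction above concrete. This is a base-R sketch with invented trial probabilities, not the method the package uses at scale:

```r
# Exact Poisson binomial PMF by sequential convolution of independent
# Bernoulli trials; in the SDSM, the kth probability is p_k = P_ik * P_jk.
pois_binom_pmf <- function(p) {
  pmf <- 1  # distribution of a sum of zero trials: P(0) = 1
  for (pk in p) {
    pmf <- c(pmf * (1 - pk), 0) + c(0, pmf * pk)  # add one trial
  }
  pmf  # pmf[k + 1] = P(sum = k)
}

p   <- c(0.2, 0.5, 0.7, 0.1)  # invented parameters
pmf <- pois_binom_pmf(p)

# Upper-tail p-value P(G*_ij >= 2) for an observed edge weight of 2:
p_upper <- sum(pmf[(2 + 1):length(pmf)])
```

When all p_k are equal, this reduces to the ordinary binomial distribution, which provides a convenient check of the convolution.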

Computing the exact cumulative distribution function for a Poisson binomial distribution is difficult. However, the Poisson binomial distribution is well-approximated by a refined normal distribution [17], which we use in the stochastic degree sequence model.

The fixed degree sequence model

The fixed degree sequence model constrains both the row and column sums in B* to be equal to their values in B (i.e. both are fixed) [19]. The probability distribution of edge weights under this model is not known in closed form, so edge-wise p-values are instead estimated by simulation:

1. Randomly sample a B* from B(R).
2. Project B* (i.e. compute B* B*^T) to obtain G*.
3. Repeat steps 1 and 2 N times to sample the space of possible G*_ij.

^1 Although [16] suggested using the Poisson binomial distribution in this case, their equation (6) for computing the probability that two given primary nodes are connected to a given secondary node can sometimes yield numbers greater than 1. We provide a correct proof here.

^2 Although this approach is simple, fast, and recommended by multiple authors, it can yield values greater than 1. In practice, these values are often truncated to 1 for use as probabilities.

The matrix multiplication required in step 2 is computationally expensive but straightforward [20]. However, the random sampling of a B* from B(R) in step 1 is more challenging. Several methods have been suggested, including Markov swap methods [21] and sequential importance sampling [22], but in the fixed degree sequence model we use the curveball algorithm [23], which is one of the fastest algorithms that has been proven to sample the space uniformly at random [24; see also 25].

The differences in the number of bills co-sponsored prompt an important underlying question: how many bills do two senators have to co-sponsor before we would be justified in concluding that they are political collaborators? Similarly, how few bills do they have to co-sponsor before we would be justified in concluding that they are political opponents?

We plot the backbone extracted with an upper threshold of 0 using the igraph package [29]. Note that the backbone.extract() function has six arguments: matrix, signed, alpha, class, narrative, and fwer.

The matrix argument takes a backbone object generated by hyperg(), sdsm(), or fdsm(), and the function returns a backbone graph of class class using a two-tailed significance test with significance level α = alpha. If the signed parameter is set to TRUE then a signed backbone is returned; if it is set to FALSE then a positive (binary) backbone is returned. If the narrative parameter is set to TRUE then suggested narrative text for a manuscript, including possible citations, is displayed. A family-wise error rate correction is applied when fwer = 'holm' [31].
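The extraction step itself amounts to applying the two-tailed decision rule described earlier to the stored p-value matrices. The following base-R sketch uses invented p-value matrices of the kind the models return; it illustrates the rule, not the package's actual extraction code:

```r
# Invented edge-wise p-value matrices:
# positive[i, j] = P(G*_ij >= G_ij), negative[i, j] = P(G*_ij <= G_ij).
positive <- matrix(c(1.00, 0.01, 0.40,
                     0.01, 1.00, 0.98,
                     0.40, 0.98, 1.00), nrow = 3, byrow = TRUE)
negative <- matrix(c(1.00, 0.99, 0.70,
                     0.99, 1.00, 0.02,
                     0.70, 0.02, 1.00), nrow = 3, byrow = TRUE)
alpha <- 0.05

# Two-tailed test at level alpha: +1 for edges in the upper tail,
# -1 for edges in the lower tail, 0 otherwise.
signed <- (positive < alpha / 2) * 1 - (negative < alpha / 2) * 1
diag(signed) <- 0

# A binary backbone simply discards the negative edges.
binary <- (signed == 1) * 1
```

Edge (1, 2) is retained as positive and edge (2, 3) as negative, while edge (1, 3) is absent from both backbones.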

To apply the hypergeometric model to a bipartite graph, one uses the hyperg() function. The hyperg() function returns a backbone class object that contains: a positive matrix whose (i, j) entry equals the probability that G*_ij is equal to or above the corresponding entry in G; a negative matrix whose (i, j) entry equals the probability that G*_ij is equal to or below the corresponding entry in G; and a summary data frame that includes the name of the model and the number of rows and columns in the input matrix.

We can now examine how this method has changed the appearance of our network, focusing only on the positive edges of the signed backbone in fig. 3. We can see that the hypergeometric model has reduced the density of our network, and that we begin to see some of the two-party structure that is inherent in the United States Senate.

Specifically, for our example, the hypergeometric model fixes the number of bills that each senator sponsors, while allowing each bill to be sponsored by a varying number of senators. The function computes the probability of each pair of senators co-sponsoring at least (or at most) the observed number of bills when the bills they sponsor are chosen at random.

The stochastic degree sequence model: sdsm( )

The sdsm() function returns a backbone object containing the same objects as hyperg(). Occasionally backbone will report that the polytope results are not optimal. This suboptimality is related to computational precision; however, because the polytope results are used to compute an approximate Poisson binomial distribution with a large number of parameters, such suboptimal precision likely has little practical impact.
In fig. 4 we are able to see more of the partisan structure that is suggested to be present in the US Senate, and this visualization provides more information than the extremely dense graph found using a universal threshold.

The fixed degree sequence model: fdsm( )

The fixed degree sequence model first constructs a random bipartite graph B* that preserves both degree sequences using the curveball algorithm [12]. This bipartite graph B* is then projected to obtain a random weighted bipartite projection G* = B* B*^T.
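The core "trade" of the curveball algorithm can be sketched in base R as follows. This is an illustrative sketch of the trading step (the function name curveball_trade is invented here), not the package's implementation, and a full run performs many trades before each projection:

```r
# One curveball "trade": pick two rows, keep the artifacts they share,
# and randomly reallocate the remaining artifacts between them.
# Both row sums and column sums are preserved exactly.
curveball_trade <- function(B) {
  rows   <- sample(nrow(B), 2)
  a      <- which(B[rows[1], ] == 1)
  b      <- which(B[rows[2], ] == 1)
  shared <- intersect(a, b)
  pool   <- c(setdiff(a, shared), setdiff(b, shared))
  if (length(pool) > 1) {
    pool <- sample(pool)  # shuffle the tradable artifacts
    take <- seq_len(length(a) - length(shared))  # row 1's share
    B[rows[1], ] <- 0
    B[rows[2], ] <- 0
    B[rows[1], c(shared, pool[take])] <- 1
    B[rows[2], c(shared, pool[setdiff(seq_along(pool), take)])] <- 1
  }
  B
}

set.seed(42)
B <- matrix(rbinom(6 * 8, 1, 0.4), nrow = 6)  # invented bipartite graph
B_star <- B
for (i in 1:100) B_star <- curveball_trade(B_star)

all(rowSums(B_star) == rowSums(B))  # degree sequences preserved
all(colSums(B_star) == colSums(B))
```

Because each trade leaves every row and column sum unchanged, the chain stays inside B(R) while mixing across the space of degree-preserving matrices.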

These two steps are repeated a number of times to sample the space of possible G*_ij. At each iteration, we compare G_ij to the generated value G*_ij and keep a record of how often the generated value was above, below, or equal to the observed value. The fdsm() function returns a backbone object containing: a matrix object positive giving the proportion of times G*_ij is equal to or above the corresponding entry in G; a matrix object negative giving the proportion of times G*_ij is equal to or below the corresponding entry in G; and a summary data frame as in hyperg() and sdsm().
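The sample-project-tally loop can be sketched in base R as follows. For brevity this sketch uses a simple checkerboard-swap sampler in place of the curveball algorithm, and the names fdsm_pvalues and swap_step are invented for the example; it illustrates the Monte Carlo logic, not the package's fdsm() internals:

```r
# One checkerboard swap: if a random 2x2 submatrix is a checkerboard
# ([[1,0],[0,1]] or [[0,1],[1,0]]), flip it; margins are preserved.
swap_step <- function(B) {
  r <- sample(nrow(B), 2)
  cl <- sample(ncol(B), 2)
  s <- B[r, cl]
  if ((s[1, 1] == 1 && s[2, 2] == 1 && s[1, 2] == 0 && s[2, 1] == 0) ||
      (s[1, 1] == 0 && s[2, 2] == 0 && s[1, 2] == 1 && s[2, 1] == 1)) {
    B[r, cl] <- 1 - s
  }
  B
}

# Estimate edge-wise p-values: proportion of sampled projections whose
# entries are at or above (positive) / at or below (negative) G.
fdsm_pvalues <- function(B, trials = 100, mixing = 50) {
  G <- B %*% t(B)
  upper <- matrix(0, nrow(B), nrow(B))
  lower <- matrix(0, nrow(B), nrow(B))
  Bs <- B
  for (t in seq_len(trials)) {
    for (s in seq_len(mixing)) Bs <- swap_step(Bs)  # mix the chain
    Gs <- Bs %*% t(Bs)                              # project the sample
    upper <- upper + (Gs >= G)
    lower <- lower + (Gs <= G)
  }
  list(positive = upper / trials, negative = lower / trials)
}

set.seed(7)
B <- matrix(rbinom(5 * 12, 1, 0.4), nrow = 5)  # invented bipartite graph
res <- fdsm_pvalues(B)
```

Because the row sums are fixed, the diagonal entries of every sampled projection equal those of G, so the diagonal proportions are always 1; the off-diagonal proportions are the estimated upper- and lower-tail p-values.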

The function can also save each value of G*_ij for a given i, j, which is useful for examining the shape of the null distribution for that dyad. Using the fixed degree sequence model on the senate data set allows us to compare our observed values to a distribution in which each senator sponsors the exact same number of bills, and each bill is sponsored by the exact same number of people, as in the observed data. We can find the backbone using the fixed degree sequence model as follows:

Using the fixed degree sequence model allows us to see, in fig. 6, more of the partisan structure we assume to be present in the United States Senate.

Conclusion

We have presented four methods - universal threshold, HM, SDSM, and FDSM - for identifying significant edges in a weighted bipartite projection, and thus for extracting its binary or signed backbone. We have also introduced and demonstrated the backbone R package, which implements these methods in a common framework that facilitates their use. Together, the methods and package offer ways for researchers to reduce the structural artifacts (e.g., excessive density and clustering) and complexity of bipartite projections, making them easier to analyze and visualize. This is likely to be most useful in cases where the researcher is interested in an unobserved unipartite network that is not simply a bipartite projection of something (e.g., a social network), but that can be inferred (even with some error) from observed bipartite data (e.g., co-behaviors).

There are two important cases where backbone is not appropriate and should not be used. First, if the research question can be answered by directly analyzing the original bipartite graph, or the central message can be communicated by visualizing the original bipartite graph, then it is not necessary, and indeed may be harmful, to examine a bipartite projection or its backbone. In these cases, researchers should instead use tools designed for the analysis of bipartite networks, including, for example, bipartite extensions of ERGM [32] and SIENA [33]. Second, it is possible to obtain a weighted (via projection) or unweighted (via backbone) square symmetric matrix from any bipartite data; however, this does not imply that the matrix can be meaningfully interpreted as a network. For some bipartite data - for example, psychological networks in which clinical symptoms (the 'agents') are linked because they co-occur in patients (the 'artifacts') [34] - the rationale for projection is far from obvious [35].