Using Random Walks to Generate Associations between Objects

Measuring similarities between objects based on their attributes has been an important problem in many disciplines. Object-attribute associations can be depicted as links on a bipartite graph. A similarity measure can be thought of as a unipartite projection of this bipartite graph. The most widely used bipartite projection techniques make assumptions that are often not fulfilled in real-life systems, or focus on the bipartite connections more than on the unipartite connections. Here, we define a new similarity measure that utilizes a practical procedure to extract unipartite graphs without making a priori assumptions about underlying distributions. Our similarity measure captures the relatedness between two objects via the likelihood of a random walker passing through these nodes sequentially on the bipartite graph. An important aspect of the method is that it is robust to heterogeneous bipartite structures and it controls for the transitivity of similarity, avoiding the creation of unrealistic homogeneous degree distributions in the resulting unipartite graphs. We test this method using real-world examples and compare the obtained results with alternative similarity measures, by validating the actual and orthogonal relations between the entities.


Introduction
An object can be described by its attributes. Given a set of objects, it is often desirable to quantify the similarity between any two objects based on the attributes that they possess. A similarity measure is then used to predict the events in which these two objects behave similarly. For instance, one can ask whether two senators would vote concordantly given the similarity between their voting records. Or one can quantify the likelihood of a person switching occupations based on the task similarities between the occupations.
Here, we think of the object-attribute associations as a bipartite graph with two types of nodes (i.e., objects and attributes), where a link is present (often with a weight) between an object and an attribute if the object possesses that attribute. Then, the object-object similarities can be modeled as a unipartite graph. Most of the recent interest in large-scale social, biological, and communication networks has been devoted to unipartite graphs [1,2]. As a result, unipartite graphs are well understood in the literature [3]. An impressive number of tools help us extract knowledge from such structures.
The methodology presented in this paper can be thought of as a unipartite projection of an underlying bipartite graph. Many complex systems have an underlying bipartite representation: a scientist can be connected to the papers that she wrote [7], an actor can be connected to the movies that he or she acted in [8], a country can be connected to the products it exports [9], flavors can be connected to the food that they are tested in [10], human diseases can be connected to the genes that cause them [11], and many others (e.g., [4][5][6]). A unipartite projection is useful because it lets us exploit the rich set of methods developed in recent years for the analysis of unipartite graphs and, therefore, gain an improved vantage point over the influence or interdependence of entities in bipartite structures. For instance, projections of the bipartite graphs mentioned above have resulted in the scientific collaboration network [7], the co-actorship network [8], the product space [9], the flavor network [10] and the human disease network (diseasome) [11], respectively.
In network analysis, techniques aimed at uncovering the actual similarity value between entities in complex networks are popular. Some examples are the network back-boning technique to evaluate the significance of a link in weighted graphs [12]; or the graph deconvolution method to evaluate the direct connections between not directly connected entities [13]. Here, we are focusing on projecting bipartite graphs into unipartite entities. In the following, we refer to the graph projection as the construction of a unipartite graph map exploiting connections in a bipartite graph and allowing us to evaluate the similarity between the objects, for instance predicting which occupations are similar by looking at their common tasks.
Projection techniques make use of distance measures and/or counts of common elements [7,14,15]. The projection criteria are very important as they affect the usefulness of the graph itself. Suppose we want to create a network map of nodes of class I from Figure 1. Each node of class J can be considered as a vectorial element. Then, the link strength in the bipartite graph reflects the load of the class I node in that dimension. Using this vectorial representation, a classical spatial distance can be calculated, such as the Euclidean or Cosine distance measures, which have been extensively used in adaptive filtering [16]. We can also represent an I node as a set that contains all the J type nodes to which it connects. Then a set difference measure like the Jaccard measure can be calculated between all possible I node pairings. Issues arise with these approaches. Simpler distance measures like the Jaccard measure cannot cope with what is known as the saturation effect: an additional shared J node between two I nodes that share only one J node should count more than between two nodes that already share 100 J nodes [17]. Moreover, in a scale-free bipartite graph, the degree distribution decays as $k^{-\alpha}$ in both sets of nodes I and J. Therefore, a few hubs from set I connect with many nodes in set J, which on average have a low degree, as a consequence of the scaling of the power law. If we project the structure to connect nodes from set J, the low-degree nodes will connect with one another, as their few connections cannot outweigh their common I hub connection. Moreover, the similarity is transitive: $i_1 \sim i_2$ and $i_2 \sim i_3$ implies $i_1 \sim i_3$. As a result, unipartite projections using these measures end up having a normal degree distribution, which differs from most real-world scenarios [18].
Some of these drawbacks do not affect other techniques. In [19] and subsequent papers by the same group [20,21], the authors propose to overcome the saturation problem by using a resource allocation process. In practice, each node in I is considered to be a bearer of a certain quantity of resources, which it scatters equally to all its J neighbors. Then, each node in J disseminates equally all the resources it gathered back to the I nodes. Using this process, we can quantify the information that originated from one I node and ended up in another I node. This amount is the degree of similarity of that node to the other I nodes. This approach, however, neglects crucial structural properties of the graph as a whole. The position of a node in the graph and the structural importance of the connections between I and J nodes influence their significance when projecting the graph, beyond what the simple degree can capture. In fact, the focus in [19] is to use bipartite graph projection as a tool for personal recommendation. In other words, the aim is not to predict an I-I edge, but an I-J edge. In this case, the structural properties of the graph as a whole are not important, as the focus is on the direct neighborhood of the node. For example, in a customer-product bipartite graph where customers connect to the products they buy, the method presented in [19] aims to understand which products a customer will likely buy next, given what other customers similar to her purchased.
We have a different aim, namely to predict I-I edges, which is equivalent to building the unipartite graph. In such a scenario, we cannot rely on the immediate neighbors alone but must take into account the overall structure of the complex network. For this reason, we propose an approach that is an alternative to [19], with a complementary application. In this method, we let a random walker explore the bipartite structure. In doing so, we can overcome most of the problems of traditional similarity measures. Two nodes from set I are similar if they frequently appear as successive visiting sites of the random walker. Since hubs in J are connected to many nodes in I, their contribution to each node pair in I is low, as the probability of consistently choosing the same endpoints can be considered insignificant. In this way, the random walker gives us information that takes into account the overall structural properties of the graph. Random walkers have been extensively used in the literature with this precise aim. For example, they are at the basis of the modular organization detection of many community discovery algorithms [22,23]. Other applications include centrality measures, used to rank nodes according to their structural importance [24].
The numerical simulations indicate that this approach is able to predict I-I edges with higher confidence when the unipartite graph maps extracted from the bipartite graphs are tested against real-world knowledge about the I-I connections. This happens in four different realms: occupation flows, industry employee flows, political activities in the US Congress and a citation graph between international aid agencies. We also tested our method in ranking I-J edges, where it performs comparably to the other methods.

Methods
The proposed approach consists of projecting the bipartite graph into a unipartite graph by creating a weighted edge between two nodes in the unipartite graph from the information present in the bipartite graph. The weight is directly proportional to how often one would observe a random walker on the bipartite graph visiting the two nodes consecutively. Formally, let us assume that in the bipartite graph there are two types of nodes indexed with $i$ and $j$, respectively. Assume that there are $n$ ($m$) $i$ ($j$) type nodes that form the set $I = \{i_1, \ldots, i_n\}$ ($J = \{j_1, \ldots, j_m\}$). An edge in the bipartite graph is defined as a link between an $i$ type and a $j$ type node. We can define the $n \times m$ adjacency matrix, $B$, whose $(i, j)$th entry represents the strength of the link between node $i$ and node $j$. In the binary case, $B_{i,j}$ will be 1 if there is an edge between $i$ and $j$, and 0 otherwise. In general, a bipartite graph can be represented by the triplet $\{I, J, B\}$. The unipartite projection of this bipartite graph onto the I domain requires defining an $n \times n$ edge matrix, $U$, from the bipartite edge matrix $B$.
Here, we propose to build the $U$ matrix from the number of times a random walker (RW) passes through a pair of I type nodes on the bipartite graph, separated by a single J type node. Suppose the RW is on node $i$. Then, the RW moves to a $j$ type node with probability:

$$P(i \to j) = \frac{B_{i,j}}{\sum_{j' \in J} B_{i,j'}}$$

Once on a $j$ type node, the probability that the RW goes to node $i'$ is:

$$P(j \to i') = \frac{B_{i',j}}{\sum_{i'' \in I} B_{i'',j}}$$

Therefore, the probability of moving between nodes $i$ and $i'$ is the sum over all paths $i \to j \to i'$ that pass through any $j \in J$:

$$P(i \to i') = \sum_{j \in J} P(i \to j)\, P(j \to i')$$

We can rewrite the transition probabilities in terms of a Markov transition matrix $T$, such that $T_{i,i'} = P(i \to i')$. The frequency of observing the path $i \to i'$ also depends on how often the RW visits node $i$ in general. Suppose that $\bar{\pi}^{(n)}$ denotes the probability vector whose $i$th element is the probability of the RW being on node $i$ at the $n$th step of the random walk. We initialize the process with $\bar{\pi}^{(0)} = \frac{1}{|I|}\bar{1}$, where $\bar{1}$ denotes a row vector of ones. Therefore:

$$\bar{\pi}^{(n)} = \bar{\pi}^{(n-1)} T = \bar{\pi}^{(0)} T^{n}$$

Since $T$ is a right-stochastic matrix (i.e., its elements are nonnegative and each of its rows sums to 1), the stationary distribution $\bar{\pi}$ will satisfy:

$$\bar{\pi} = \bar{\pi} T$$

Here, we will assume that the transition matrix is irreducible (i.e., every node can be reached from every other node in a finite number of steps) and aperiodic (i.e., there is no $\bar{x}$ and integer $m > 1$ such that $\bar{x} T^{m} = \bar{x}$ but $\bar{x} \neq \bar{x} T$). If either of these properties is violated, we cannot ensure a unique stationary distribution. In our case, we ensure that the bipartite graph is connected, which satisfies the irreducibility property. Moreover, we only work with bipartite graphs with non-directed edges, which justifies the aperiodicity property: the walker can always return from $i$ to itself through any of its neighbors $j$, so $T_{i,i} > 0$. With these properties at hand, the Perron-Frobenius theorem guarantees the existence of a unique stationary distribution, which is the left eigenvector of the $T$ matrix with eigenvalue 1.
Given the stationary distribution $\bar{\pi}$, as the RW moves infinitely many times, the random-walk similarity between nodes $i$ and $i'$ is:

$$\sigma_{i,i'} = \pi_i\, T_{i,i'}$$

We would like to remark that Zhou et al. [19] define a similar metric based on their ProbS methodology, but they do not include $\pi_i$ in their definition. The $\pi_i$ element is the one that contributes information about the overall graph structure. It allows the similarity to consider not only the immediate neighbors of a node, but also its position in the graph, enabling the measure to avoid the saturation and transitivity issues described in the Introduction.
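As an illustration of the procedure above, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the transition probabilities are the row- and column-normalized entries of $B$, that the similarity is $\pi_i T_{i,i'}$, and that the graph is connected. The function name `bpr_similarity`, the power-iteration settings and the small example matrix are hypothetical choices made for this sketch.

```python
def bpr_similarity(B, n_iter=1000, tol=1e-12):
    """Sketch of the random-walk projection for a bipartite weight matrix B.

    B is a list of n rows (I nodes) with m entries each (J nodes).
    Assumes a connected graph, so every I node has at least one link.
    Returns (T, pi, sigma).
    """
    n, m = len(B), len(B[0])
    row_sum = [sum(B[i]) for i in range(n)]
    col_sum = [sum(B[i][j] for i in range(n)) for j in range(m)]

    # T[i][k] = sum over j of P(i -> j) * P(j -> k)
    T = [[sum(B[i][j] / row_sum[i] * B[k][j] / col_sum[j]
              for j in range(m) if col_sum[j] > 0)
          for k in range(n)]
         for i in range(n)]

    # Power iteration from the uniform vector: pi <- pi T until it stops changing.
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new_pi = [sum(pi[i] * T[i][k] for i in range(n)) for k in range(n)]
        change = max(abs(a - b) for a, b in zip(new_pi, pi))
        pi = new_pi
        if change < tol:
            break

    # Similarity: stationary probability of i times the transition i -> k.
    sigma = [[pi[i] * T[i][k] for k in range(n)] for i in range(n)]
    return T, pi, sigma


# Hypothetical example: three I nodes, five J nodes, binary links.
B_example = [
    [1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
T_example, pi_example, sigma_example = bpr_similarity(B_example)
```

For an undirected graph the stationary probability of an I node is proportional to its total link weight, so in this example `pi_example` approaches (3/7, 2/7, 2/7) and `sigma_example` comes out symmetric.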

Other Projection Techniques
In this section we provide our implementation of the methods with which we compared our technique. In each technique, the entities of the bipartite graph $N$ are considered as binary vectors. Suppose that we have a bipartite graph with two classes of entities $A = \{a_1, a_2, a_3\}$ and $B = \{b_1, b_2, b_3, b_4, b_5\}$. Suppose that $a_1$ is connected to $b_1$, $b_4$ and $b_5$. Then $a_1 = \{1,0,0,1,1\}$. In the following discussion, we adopt the convention of always projecting onto nodes of class A.
ProbS. This is the bipartite projection technique presented in [19]. The assumption at the basis of this measure is the same as the one we implemented, namely that the relatedness of $a_1$ and $a_2$ depends on the resource flow from $a_1$ and $a_2$ to the B nodes and back. Instead of implementing this idea with random walks, Zhou et al. pass the entire resource equally to all B nodes and back.
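So, in the first step, all the resource flows from A to B as:

$$f(b_l) = \sum_{i=1}^{|A|} \frac{N_{il}\, f(a_i)}{k(a_i)}$$

where $k(a_i)$ is the degree of $a_i$ and $N$ is the $|A| \times |B|$ adjacency matrix representing the bipartite graph, containing 1 if $a_i$ is connected to $b_l$, and 0 otherwise. In the next step, all the resource flows back to A, and the final resource located on $a_i$ reads:

$$f'(a_i) = \sum_{l=1}^{|B|} \frac{N_{il}\, f(b_l)}{k(b_l)} = \sum_{l=1}^{|B|} \frac{N_{il}}{k(b_l)} \sum_{j=1}^{|A|} \frac{N_{jl}\, f(a_j)}{k(a_j)}$$

This can be rewritten as:

$$f'(a_i) = \sum_{j=1}^{|A|} w_{ij}\, f(a_j)$$

where:

$$w_{ij} = \frac{1}{k(a_j)} \sum_{l=1}^{|B|} \frac{N_{il}\, N_{jl}}{k(b_l)} \qquad (7)$$

which sums the contribution from all two-step paths between $a_i$ and $a_j$, and is ultimately the similarity between the two nodes. Using a standard example that we will adopt from now on, we assume that $a_1 = \{b_1, b_4, b_5\}$ and $a_2 = \{b_1, b_3\}$, and that the B nodes have no other connection with any other A node. Then, $s(a_1, a_2) = 1/4$.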
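The worked value of 1/4 can be checked with a short sketch. It assumes the ProbS weight is the sum over shared B nodes of $1/k(b_l)$, divided by the degree of the second node, which is the form that reproduces the value quoted above. The helper name `probs_weight` is hypothetical.

```python
from fractions import Fraction


def probs_weight(N, i, j):
    """ProbS-style weight between A nodes i and j (sketch, exact arithmetic).

    N is a binary |A| x |B| adjacency matrix given as a list of rows.
    """
    k_a_j = sum(N[j])  # degree of the node that sends the resource
    total = Fraction(0)
    for l in range(len(N[0])):
        if N[i][l] and N[j][l]:
            k_b_l = sum(row[l] for row in N)  # degree of the shared B node
            total += Fraction(1, k_b_l)
    return total / k_a_j


# Standard example: a_1 = {b_1, b_4, b_5}, a_2 = {b_1, b_3}.
N_example = [
    [1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
]
s_12 = probs_weight(N_example, 0, 1)
s_21 = probs_weight(N_example, 1, 0)
```

Here `s_12` equals 1/4, while the reverse direction `s_21` equals 1/6, which shows that this weight is not symmetric in its two arguments.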
HeatS. The HeatS method, introduced by Zhou et al. in [20], is related to the ProbS method, but instead of normalizing by column, it normalizes by row. Mathematically,

$$w_{ij} = \frac{1}{k(a_i)} \sum_{l=1}^{|B|} \frac{N_{il}\, N_{jl}}{k(b_l)} \qquad (8)$$

The difference between the HeatS (Eq. 8) and ProbS (Eq. 7) similarity measures is the first fraction. For the example introduced above, the HeatS similarity would be 1/6, lower than the ProbS similarity of 1/4.
Hybrid. The Hybrid methodology, introduced in [21], combines ProbS and HeatS by taking a weighted geometric average of their normalizing factors. The similarity in this measure is defined as:

$$w_{ij} = \frac{1}{k(a_i)^{1-\lambda}\, k(a_j)^{\lambda}} \sum_{l=1}^{|B|} \frac{N_{il}\, N_{jl}}{k(b_l)} \qquad (9)$$

Assuming $\lambda = 1/2$, the similarity will be $1/(2\sqrt{6})$, a value between the ProbS and HeatS similarities.
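The three resource-allocation measures can be compared on the standard example with one small function. This is a sketch under an assumed exponent convention, in which $\lambda = 0$ recovers HeatS and $\lambda = 1$ recovers ProbS; the name `hybrid_weight` is hypothetical.

```python
import math


def hybrid_weight(N, i, j, lam):
    """Hybrid-style weight between A nodes i and j (sketch).

    Assumed convention: lam = 0 gives the HeatS normalization (degree of i),
    lam = 1 gives the ProbS normalization (degree of j).
    """
    k_a_i, k_a_j = sum(N[i]), sum(N[j])
    shared = 0.0
    for l in range(len(N[0])):
        if N[i][l] and N[j][l]:
            shared += 1.0 / sum(row[l] for row in N)  # 1 / k(b_l)
    return shared / (k_a_i ** (1 - lam) * k_a_j ** lam)


# Standard example: a_1 = {b_1, b_4, b_5}, a_2 = {b_1, b_3}.
N_std = [
    [1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
]
heats = hybrid_weight(N_std, 0, 1, 0.0)
probs = hybrid_weight(N_std, 0, 1, 1.0)
hybrid = hybrid_weight(N_std, 0, 1, 0.5)
```

With these inputs `heats` is 1/6, `probs` is 1/4 and `hybrid` is $1/(2\sqrt{6}) \approx 0.204$, which lies between the other two.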
Jaccard. In this bipartite projection technique, each class A node is seen as a set of elements. So, if $a_1 = \{1,0,0,1,1\}$, then we consider it equivalent to $a_1 = \{b_1, b_4, b_5\}$. Then, the similarity between two nodes $a_1$ and $a_2$ is the Jaccard similarity:

$$s(a_1, a_2) = \frac{|a_1 \cap a_2|}{|a_1 \cup a_2|}$$

For instance, if $a_2 = \{b_1, b_3, b_5\}$, then $s(a_1, a_2) = 2/4$.
Cosine. This bipartite projection technique is based on the Cosine similarity. The Cosine similarity between two vectors of the same length is defined as:

$$s(a_1, a_2) = \frac{\sum_{b \in B} a_1(b)\, a_2(b)}{\sqrt{\sum_{b \in B} a_1(b)^2}\, \sqrt{\sum_{b \in B} a_2(b)^2}}$$

We recall that $a_1$ and $a_2$ are both binary vectors. For each $b \in B$ to which either (or both) of the nodes is not attached, the contribution to the sum in the numerator is 0. Only when both $a_1(b)$ and $a_2(b)$ are equal to 1 is there a contribution of 1 to the sum. Again considering our standard example $a_1 = \{1,0,0,1,1\}$ and $a_2 = \{1,0,1,0,1\}$, we obtain $s(a_1, a_2) = 2/3$.
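The two worked values above can be reproduced with a few lines of Python. This is only a sketch on binary vectors; the function names are hypothetical and nothing here is taken from the authors' code.

```python
import math


def jaccard(u, v):
    """Jaccard similarity of two binary vectors seen as sets of B nodes."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return both / either


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


a1 = [1, 0, 0, 1, 1]  # a_1 = {b_1, b_4, b_5}
a2 = [1, 0, 1, 0, 1]  # a_2 = {b_1, b_3, b_5}
jaccard_12 = jaccard(a1, a2)
cosine_12 = cosine(a1, a2)
```

Both functions are undefined for an all-zero vector, which does not occur when every A node has at least one connection.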
Euclidean. The Euclidean projection technique takes advantage of the concept of Euclidean distance. The $a_1$ and $a_2$ vectors are seen as points in a $|B|$-dimensional space. We then calculate the Euclidean distance between points $a_1$ and $a_2$ as follows:

$$d(a_1, a_2) = \sqrt{\sum_{b \in B} \left(a_1(b) - a_2(b)\right)^2}$$

The Euclidean similarity is inversely proportional to the Euclidean distance, thus $s(a_1, a_2) = 1/d(a_1, a_2)$. Euclidean similarity gives weight not only to the co-presence of 1s in $a_1$ and $a_2$, but also to the co-presence of 0s. Keeping $a_1 = \{1,0,0,1,1\}$ and $a_2 = \{1,0,1,0,1\}$ fixed, we obtain $s(a_1, a_2) = 1/\sqrt{2}$.
Pearson. This is the bipartite projection technique based on the well-known Pearson correlation coefficient. We calculate the correlation coefficient of the $a_1$ and $a_2$ vectors as follows:

$$s(a_1, a_2) = \frac{\mathrm{cov}(a_1, a_2)}{\sigma_{a_1}\, \sigma_{a_2}}$$

where $\mathrm{cov}(a_1, a_2)$ is the covariance of $a_1$ and $a_2$, and $\sigma_{a_1}$ and $\sigma_{a_2}$ are the standard deviations of the $a_1$ and $a_2$ vectors, respectively. Just like in the Euclidean case, the Pearson similarity gives some weight to the co-absence of a connection, not only to its co-presence. For our standard example, in which $a_1 = \{1,0,0,1,1\}$ and $a_2 = \{1,0,1,0,1\}$, the sample covariance is 0.05 and both sample variances are 0.3, so we obtain $s(a_1, a_2) = 0.05/(\sqrt{0.3} \times \sqrt{0.3}) = 1/6$.
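The Euclidean and Pearson values for the standard example can be checked with the sketch below. It assumes the usual definition of the Pearson coefficient, with standard deviations in the denominator and the sample (n - 1) normalization. Under that definition the covariance of 0.05 and the variance of 0.3 give 0.05/0.3 = 1/6. The function names are hypothetical.

```python
import math


def euclidean_similarity(u, v):
    """Inverse Euclidean distance; undefined when u and v are identical."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return 1.0 / distance


def pearson(u, v):
    """Pearson correlation with sample (n - 1) covariance and variances."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((x - mean_u) * (y - mean_v) for x, y in zip(u, v)) / (n - 1)
    var_u = sum((x - mean_u) ** 2 for x in u) / (n - 1)
    var_v = sum((y - mean_v) ** 2 for y in v) / (n - 1)
    return cov / math.sqrt(var_u * var_v)


a1_vec = [1, 0, 0, 1, 1]
a2_vec = [1, 0, 1, 0, 1]
euclid_12 = euclidean_similarity(a1_vec, a2_vec)
pearson_12 = pearson(a1_vec, a2_vec)
```

The choice between the sample and population normalization does not change the Pearson coefficient, because the factor cancels between numerator and denominator.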

Results
We run two sets of experiments. In the first one, we compare the performance of all the methods using the experiments described in [19], on the very same dataset extracted from MovieLens and processed as described there. We therefore refer the reader to [19] for the details of this experiment. The aim of the first experiment is to test the efficiency of the different methods in ranking I-J edges. In the second set of experiments, we study the projection of four real-world bipartite graphs. In this case, we also have unipartite graphs with observed relations between I entities. The aim of this experiment is to show that the I-I edges, as ranked by the proposed methodology, are closer to the observed relations than those of any other methodology.
In both experiments, we compare the obtained results with the seven alternative projection techniques presented in the previous section. Four of them are based on distance measures: Jaccard, Cosine, Euclidean and Pearson. The other three alternatives are ProbS [19], HeatS [25] and Hybrid [20]. We refer to the proposed method as Bipartite Projection via Random-walk, or ''BPR''.

I-J Edges
In this numerical simulation, we have user-to-movie connections if a user (I) liked the movie (J), and the aim is to suggest other movies to the user (an I-J edge). To test the efficiency of the methods, a random subset of connections is removed from the original bipartite graph. Then we calculate the movie-to-movie similarity in the remaining graph using the measures presented above. Finally, for each user $i$ and movie $j$ we average (for Cosine, Euclidean, Jaccard and Pearson) or sum (for ProbS, HeatS, Hybrid and BPR) the similarities to $j$ of all movies that are liked by $i$. At the end of the procedure, for each user $i$ we have a list of J nodes, sorted by the computed value. We calculate the quality of this suggestion list in two ways. First, given a user $i$ and a movie $j$ that was removed from the graph, $r_{i,j}$ is equal to the rank of $j$ in $i$'s suggested list divided by the length of the list. Second, we shorten the suggested list to different lengths (including 10, 20 and 50 elements) and we record the share of the randomly removed movies that are included in the list. We refer to this measure as Hit Rate (HR-X, where X is the length of the recommendation list). The Hybrid method also includes a free parameter ($\lambda$ in Equation 9). We selected $\lambda = 0.2$, which maximized the predictive power.
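The evaluation steps described above can be sketched as follows. This is an illustration only: the function names, the toy similarity values and the choice of a 1-based rank are assumptions made for the sketch, not details taken from [19].

```python
def suggestion_list(liked, candidates, similarity, aggregate=sum):
    """Rank candidate movies for one user.

    Each candidate is scored by aggregating its similarity to the movies
    the user liked; `aggregate` can be `sum` or an averaging function.
    """
    scores = {c: aggregate([similarity[(m, c)] for m in liked])
              for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


def rank_score(suggestions, removed_movie):
    """r_ij: rank of the removed movie (1-based, an assumption) over list length."""
    return (suggestions.index(removed_movie) + 1) / len(suggestions)


def hit_rate(suggestions_by_user, removed_pairs, x):
    """Share of removed (user, movie) pairs found in the user's top-x list."""
    hits = sum(1 for user, movie in removed_pairs
               if movie in suggestions_by_user[user][:x])
    return hits / len(removed_pairs)


# Toy data: one user who liked movies A and B; C and D are candidates.
toy_similarity = {("A", "C"): 0.6, ("B", "C"): 0.2,
                  ("A", "D"): 0.1, ("B", "D"): 0.1}
ranked = suggestion_list(["A", "B"], ["C", "D"], toy_similarity)
r_uC = rank_score(ranked, "C")
hr1_C = hit_rate({"u": ranked}, [("u", "C")], 1)
hr1_D = hit_rate({"u": ranked}, [("u", "D")], 1)
```

A lower rank score and a higher hit rate both indicate that the removed connection was easier to recover.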
The results of this numerical simulation are reported in Figures 2 and 3, and Table 1. In Figure 2 we report the cumulative value of $r$ as the recommendation list grows. A lower value here indicates a better prediction method. In Figure 2, ProbS, Hybrid, BPR and HeatS appear to easily outperform all other methods, with Hybrid performing the best. BPR performed better than HeatS but was slightly worse than ProbS. In the first column of Table 1 we report the overall average value of $r$. The ranking of the methodologies remains the same.
In Figure 3 we report the hit rate at different lengths of the recommendation list. Again, the result is confirmed: Hybrid, ProbS and BPR perform best in this task, in that order. The hit rates at different list lengths are reported in the HR-X columns of Table 1. All these results confirm that BPR works for I-J edges, but that there are more efficient methodologies for this task, namely Hybrid and ProbS.

I-I Edges
For the task of predicting I-I edges we consider four different bipartite graphs:
(i) Occupations connected to the tasks they fulfil, from the O-Net database [26,27] (referred to here as ''O-Net'').
(ii) Industries connected to the fields of education of the people they employ, from the IPUMS dataset [28] (referred to here as ''IPUMS'').
(iii) International aid organizations connected to the countries and the development issues they talk about on their websites [29] (referred to here as ''Aid'').
(iv) Congressmen from the 111th US Congress, connected to the topics they wrote a bill on (referred to here as ''Congress'').
For additional information about how we built the bipartite and the unipartite graphs used, see Material S1.
For each of these bipartite graphs we have a corresponding unipartite graph that we use to evaluate the goodness of the projection. The test graphs are:
(i) For the O-Net dataset, the occupation-occupation job flows.
(ii) For the IPUMS dataset, the job flows across industries.
(iii) For the Aid dataset, the mentions of other aid organizations on an organization's website.
(iv) For the Congress dataset, the co-sponsorship of bills.
The procedure is the same as in the previous section: for each pair of I nodes we calculate the similarity using one of the proposed measures. For each node $i$, we obtain an ordered list of similarities. We use this list to predict the actual I-I edges observed in the corresponding unipartite graphs. In Figure 4 and Table 2 we report the performance in the prediction task for all methods and for all graphs. Figure 4 presents the receiver operating characteristic (ROC) curves of the various methods. We can see that BPR comes out as the winner or a close second in most cases. Table 2 reports the area under the ROC curve, which summarizes the overall quality of the predictions shown in Figure 4. Table 2 confirms that BPR is the best predictor of the I-I edges, based on the observed I-J edges in the test graphs, with the exception of the O-Net graph. However, in that case, BPR is beaten by Pearson, which scores poorly in the other scenarios. The second best predictor is different for each graph, while BPR's performance across all graphs is consistently at the top. Since we are dealing with weighted graphs, we need a threshold, $\delta$, to determine when an observed weight is significant and when it is not. $\delta$ influences the prediction scores, but not the performance ranking of the methods (see Material S1 and Figure S1).
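The area under the ROC curve can be computed without drawing the curve, since it equals the probability that a randomly chosen true edge receives a higher similarity than a randomly chosen non-edge (ties counted as one half). The sketch below uses this identity. Treating an observed weight as significant when it is strictly greater than the threshold is an assumption, and the function names and toy values are hypothetical.

```python
def binarize(observed_weights, delta):
    """Label a node pair as a true edge when its observed weight exceeds delta."""
    return [w > delta for w in observed_weights]


def roc_auc(scores, labels):
    """AUC as the share of (true edge, non-edge) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one true edge and one non-edge")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: four node pairs with projected similarities and observed flows.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_weights = [5, 0, 2, 0]
toy_labels = binarize(toy_weights, delta=1)
toy_auc = roc_auc(toy_scores, toy_labels)
```

In the toy example three of the four (true edge, non-edge) pairs are ordered correctly, so the AUC is 0.75; a value of 0.5 would correspond to a random predictor.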
Prediction quality is not the only quality criterion for evaluating the unipartite projections. We also want the unipartite graph map to have topological properties comparable to those of the real-world complex graphs in the literature. One such property is the small-world property [30]: the shortest path lengths are normally distributed around a mean much lower than in random Erdős-Rényi graphs, usually $\sim \log n$, where $n$ is the number of nodes in the graph [3]. Figure 5 shows the distribution of the shortest path lengths in different bipartite graph projections. Each graph map has been generated by extracting the maximum spanning tree from all the I-I edge similarities returned by each method, and then adding edges until the average degree reaches 3. We can see that BPR is the only method that consistently generates unipartite graphs with the expected distribution of shortest path lengths. With the exception of the Euclidean method in the O-Net dataset, the other methods usually have either higher averages or distributions more skewed towards higher values, or both.
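The graph map construction described above can be sketched with Kruskal's algorithm run on edges sorted by decreasing similarity. The text does not state in which order the extra edges are added, so adding them by decreasing similarity is an assumption of this sketch, as are the function name and the toy edge list.

```python
def graph_map(n, weighted_edges, target_avg_degree=3.0):
    """Maximum spanning tree plus strongest leftover edges (sketch).

    n is the number of nodes (labelled 0..n-1); weighted_edges is a list of
    (u, v, similarity) tuples. Edges are added until 2 * |E| / n reaches
    target_avg_degree or no edges remain.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ranked = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    kept, leftover = [], []
    for u, v, w in ranked:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # edge joins two components: tree edge
            kept.append((u, v, w))
        else:
            leftover.append((u, v, w))

    for edge in leftover:  # assumed order: strongest leftover edge first
        if 2 * len(kept) / n >= target_avg_degree:
            break
        kept.append(edge)
    return kept


toy_edges = [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7), (2, 3, 0.6), (3, 4, 0.5),
             (0, 3, 0.4), (1, 3, 0.3), (0, 4, 0.2), (1, 4, 0.1), (2, 4, 0.05)]
toy_map = graph_map(5, toy_edges)
```

For the five-node toy graph the tree has four edges and four more are added, giving an average degree of 3.2, the first value at or above the target.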
Another property of real-world graphs is a broad degree distribution. Real-world graphs are characterized by a few hubs with high degree and many nodes with degree equal to one [18]. However, transitive similarity measures may be prone to boosting transitivity beyond what is reasonable, creating large cliques and inflating the degree of most nodes. Therefore, for a similarity measure, a more highly skewed distribution is a desirable property because it signals the absence of large cliques, which lower the practicality of the network map. We depict the cumulative degree distributions of the graph projections in Figure 6. We can see that BPR has very broad degree distributions, clearly the broadest in the Aid graph and the broadest in Congress and O-Net after the Euclidean graph. However, we saw that for practical purposes the only serious contender was ProbS (Figure 4 and Table 2). Here, ProbS is affected exactly by the problem of a very homogeneous degree distribution: in all graphs the nodes with degree lower than 3 are very few (always less than 10%), while the most connected nodes have half or a third of the degree they have in BPR.

Table 1. Average predicted position ($\langle r \rangle$) and hit rate for recommendation lists of length 10 (HR-10), 20 (HR-20) and 50 (HR-50).

Discussion
The proposed bipartite projection technique gives differential weights to elements according to their commonality. The methodology generates an edge in the graph map whenever the random walker frequently visits the two nodes in the same path, traversing their common elements, which ensures that hubs do not artificially drive up the similarity measure. As a result, the random walk similarity allows the creation of significant and meaningful graph maps, which are structurally very similar to the corresponding real-world graphs. Consequently, the resulting graph projections carry some fundamental properties that are observed in many other naturally occurring graphs. As a criticism, one could say that the method only works in the case of bipartite graphs that exhibit non-overlapping scale-free degree distributions, where there are hubs in one or all classes of nodes. In any case, any projection technique has limitations, and the choice of one algorithm over another has to be made considering the objective of the exercise. We do not exclude the existence of a scenario in which our methodology will not yield significant results. Yet, it has been shown that broad (scale-free or exponential) degree distributions are ubiquitous in real-world graphs: from social graphs to scientific co-authorship, from the physical Internet infrastructure to the virtual hyperlinks in the World Wide Web, from financial graphs to protein interactions. Therefore, we conclude that our methodology may be applied in this wide range of scenarios.

Note to Table 2: The AUC is the area below the ROC curve, as shown in Figure 4. If we obtain an AUC equal to 0.5, then the prediction is said to have a performance equivalent to that of a random predictor. doi:10.1371/journal.pone.0104813.t002

Material S1 (PDF)

Figure 6. Degree distributions. The cumulative degree distributions of the unipartite graphs generated with each technique, for all datasets: O-Net (top left), IPUMS (top right), Aid (bottom left) and Congress (bottom right). We calculate the probability (y-axis) for a node to have a degree equal to or higher than a given degree (x-axis). doi:10.1371/journal.pone.0104813.g006
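The quantity plotted in Figure 6, the probability that a node has a degree equal to or higher than a given value, can be computed from an edge list as in the sketch below. The function name and the toy star graph are hypothetical.

```python
from collections import Counter


def cumulative_degree_distribution(n, edges):
    """Map each observed degree k to the share of nodes with degree >= k.

    n is the number of nodes (labelled 0..n-1); edges is a list of (u, v) pairs.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    degrees = [degree[node] for node in range(n)]  # isolated nodes count as 0
    return {k: sum(1 for d in degrees if d >= k) / n
            for k in sorted(set(degrees))}


# Toy star graph: node 0 is a hub connected to three leaves.
star_ccdf = cumulative_degree_distribution(4, [(0, 1), (0, 2), (0, 3)])
```

For the star graph every node has degree at least 1, and only the hub has degree 3, so the result is `{1: 1.0, 3: 0.25}`.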