Potential Theory for Directed Networks

  • Qian-Ming Zhang,

    Affiliation Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People’s Republic of China

  • Linyuan Lü,

    linyuan.lue@unifr.ch

    Affiliations Institute of Information Economy, Alibaba Business College, Hangzhou Normal University, Hangzhou, People’s Republic of China, Department of Physics, University of Fribourg, Chemin du Musée 3, Fribourg, Switzerland

  • Wen-Qiang Wang,

    Affiliation Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People’s Republic of China

  • Yu-Xiao Zhu,

    Affiliation Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People’s Republic of China

  • Tao Zhou

    Affiliation Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, People’s Republic of China

Correction

2 Aug 2013: Zhang QM, Lü L, Wang WQ, Zhu YX, Zhou T (2013) Correction: Potential Theory for Directed Networks. PLOS ONE 8(8): 10.1371/annotation/6dff4052-f7c3-4b0a-88da-85cdd5d3addd. https://doi.org/10.1371/annotation/6dff4052-f7c3-4b0a-88da-85cdd5d3addd

Abstract

Uncovering the factors underlying network formation is a long-standing challenge for data mining and network analysis. In particular, the microscopic organizing principles of directed networks are less understood than those of undirected networks. This article proposes a hypothesis named potential theory, which assumes that every directed link corresponds to a decrease of one unit of potential and that subgraphs with definable potential values for all nodes are preferred. Combining the potential theory with the clustering and homophily mechanisms, it is deduced that the Bi-fan structure, consisting of 4 nodes and 4 directed links, is the most favored local structure in directed networks. Our hypothesis receives strong support from extensive experiments on 15 directed networks drawn from disparate fields, as indicated by the most accurate and robust performance of the Bi-fan predictor within the link prediction framework. In summary, our main contribution is twofold: (i) we propose a new mechanism for the local organization of directed networks; (ii) we design the corresponding link prediction algorithm, which not only tests our hypothesis but also has direct applications in missing link prediction and friendship recommendation.

Introduction

Many social, biological and technological systems can be well described by networks, where nodes represent individuals and links denote the relations or interactions between nodes. The study of the structure and functions of networks has therefore become a common focus of many branches of science [1]. A big challenge that has attracted increasing attention in the past decade is to uncover the mechanisms underlying the formation of networks [2]. Macroscopic mechanisms include the rich-get-richer [3], the good-get-richer [4], the stability constraints [5], and so on, while microscopic mechanisms include homophily [6], clustering [7], balance theory [8], and so on. Mechanisms can also play a part in regulating the mesoscopic structure, such as the formation and transformation of groups and communities [9]–[11]. Real networks usually result from a hybrid of several mechanisms. For example, new nodes may form links according to the rich-get-richer mechanism while, simultaneously, new links among old nodes arise from the clustering mechanism [12].

The so-called clustering mechanism states that two nodes have a high probability of forming a link between them if they share some common neighbors [13]. This mechanism is indirectly supported by increasing evidence of high clustering coefficients in disparate networks [7] (the clustering coefficient of a node is defined as the density of links among its neighbors, and the clustering coefficient of the network is the average of all nodes’ clustering coefficients [14]). Through an investigation of a social network consisting of 43,553 university members, Kossinets and Watts [15] found direct evidence that two students sharing more common acquaintances are more likely to become acquainted with each other. The clustering mechanism also works for directed networks. For example, in Twitter, more than 90% of new links are added between nodes sharing at least one common neighbor [16]. In addition, evolving network models driven by common neighbors can reproduce some significant features of both directed and undirected networks [17], [18].

The homophily mechanism describes the observed tendency of people to communicate with others of similar profiles or experiences [6]. Experiments on social networks strongly support this mechanism. Positive evidence comes from various examples, such as an acquaintance network of university members [15], a large-scale instant-messaging network [19], friendship networks of a set of American high schools [20], a social network of a cohort of college students on Facebook [21], and so on. A variety of characteristics, such as race, tastes in music and movies, grade, age, location, language and shared experience, are significant for link formation. The homophily mechanism also plays a role in other kinds of networks. For example, in directed document networks, links (e.g., hyperlinks between web pages and citations between articles) tend to connect documents with similar content [22]. In some literature, the clustering mechanism is considered a special case of the homophily mechanism, where two nodes having some common neighbors are recognized as being in similar network surroundings. In this article, we prefer to distinguish these two mechanisms. Recent experiments on directed social networks show that the clustering mechanism may be even stronger than the homophily mechanism [23].

The reciprocity mechanism is the tendency of nodes to respond to incoming links by creating links back to the source [24]. It is a specific mechanism for some directed networks, but is not applicable everywhere. For example, the reciprocity mechanism plays a significant role in the growth of the social networks of a Facebook-like community [25] and Flickr [26], but it has much less impact on Slashdot [27] and does not work at all in food webs [28].

This article focuses on directed networks. Examples of directed networks are numerous: the world wide web is made up of directed hyperlinks, food webs consist of directed links from predators to prey, and in microblogging social networks, fans form links pointing to their opinion leaders. High reciprocity is a specific property of some directed networks. In addition, the formation of directed links also obeys the aforementioned mechanisms. For example, users in Twitter are likely to form links to neighbors of their neighbors and to friends of their friends of similar age, in accordance with the clustering and homophily mechanisms [16]. Apart from a few representative works on the local organization (e.g., loops, small-order subgraphs, etc.) of directed networks [29]–[33], link formation in directed networks has received less attention and is not as well understood as in undirected networks. Here we propose a hypothesis of link formation for general directed networks, named potential theory. Combining the potential theory with the clustering and homophily mechanisms, we deduce a certain preferred subgraph. We apply the link prediction approach [34] to verify our deduction. That is, we hide a fraction of links and predict them by assuming that a link generating more preferred subgraphs has a higher probability of existing (see details in Materials and Methods). Experiments on disparate directed networks, ranging from large-scale social networks containing millions of individuals to small-scale food webs consisting of about a hundred species, show that the prediction according to the preferred subgraph is more accurate and robust than prediction according to other comparable subgraphs. Besides the insights into the underlying mechanism of directed network formation, our work could find applications in friendship recommendation for social networks and missing link prediction for biological networks.

Results

Potential Theory

A graph is called potential-definable if each node can be assigned a potential such that for every pair of nodes $i$ and $j$, if there is a link from $i$ to $j$, then $i$’s potential is one unit higher than $j$’s. Clearly, a single link is potential-definable, whereas a graph containing reciprocal links is not. Figure 1 illustrates some example graphs with orders from 2 to 4, where graphs (a) and (c) are not potential-definable and graphs (b) and (d) are. Notice that the condition “potential-definable” is only meaningful for very small graphs, since a graph consisting of many nodes is very probably not potential-definable. Although potential-definable networks are always acyclic, directed acyclic networks [35] are usually not potential-definable. For example, feed forward loops are directed acyclic networks but are not potential-definable.
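The definition can be checked mechanically on small graphs. The following is a minimal Python sketch, not the authors' code: the function name `assign_potentials` and the four toy graphs are illustrative assumptions. It fixes one node per weakly connected component at potential 0, propagates the "one unit down per link" constraint, and reports a conflict as soon as two constraints disagree.

```python
from collections import defaultdict, deque


def assign_potentials(links):
    """Return {node: potential} if the graph is potential-definable, else None.

    `links` is an iterable of (source, target) pairs. Every link must lower
    the potential by exactly one unit from source to target.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    # Store each constraint in both directions so the graph can be walked
    # regardless of link direction.
    neighbours = defaultdict(list)
    for u, v in links:
        neighbours[u].append((v, -1))  # along the link: one unit down
        neighbours[v].append((u, +1))  # against the link: one unit up

    potential = {}
    for start in neighbours:
        if start in potential:
            continue
        potential[start] = 0  # arbitrary reference level for this component
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other, delta in neighbours[node]:
                expected = potential[node] + delta
                if other not in potential:
                    potential[other] = expected
                    queue.append(other)
                elif potential[other] != expected:
                    return None  # two constraints disagree
    return potential


# Toy graphs (assumptions for illustration, not the graphs of Figure 1).
single_link = [("a", "b")]
reciprocal = [("a", "b"), ("b", "a")]
feed_forward_loop = [("a", "b"), ("b", "c"), ("a", "c")]
directed_path = [("a", "b"), ("b", "c"), ("c", "d")]

for name, graph in [("single link", single_link), ("reciprocal", reciprocal),
                    ("feed forward loop", feed_forward_loop),
                    ("directed path", directed_path)]:
    print(name, assign_potentials(graph))
```

Only relative potentials matter, so the choice of 0 for the starting node is arbitrary; shifting all potentials by a constant gives an equally valid assignment.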

Figure 1. Illustration of four example graphs.

Graphs (b) and (d) are potential-definable, and the numbers labeled beside the nodes are example potentials. Graphs (a) and (c) are not potential-definable: if we set the top nodes’ potential to 1, some nodes’ potentials cannot be determined under the constraint that a directed link is always associated with a decrease of one unit of potential.

https://doi.org/10.1371/journal.pone.0055437.g001

The potential theory claims that a link that can generate more potential-definable subgraphs is more significant and thus has a higher probability of appearing. Our definition of subgraph is more general than the traditional one. Consider a directed graph $G(V,E)$, with $V$ and $E$ the sets of nodes and directed links. A graph $G'(V',E')$ is called a deduced subgraph of $G$ (commonly known as an induced subgraph) if $V' \subseteq V$ and $E'$ contains all the links in $E$ that connect two nodes in $V'$. Our definition only requires $V' \subseteq V$ and $E' \subseteq E$, that is, $E'$ need not include all links connecting nodes in $V'$. As shown in figure 2, (b), (c) and (d) are subgraphs of (a) according to our definition, but only (b) is a deduced subgraph of (a).
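The difference between the two subgraph definitions can be made concrete with a short Python sketch. The parent graph below is a hypothetical example in which nodes 1 and 2 are reciprocally linked; the function names are illustrative assumptions as well.

```python
from itertools import combinations


def induced_subgraph(links, nodes):
    """Links of the deduced (induced) subgraph: every link with both ends in `nodes`."""
    nodes = set(nodes)
    return {(u, v) for u, v in links if u in nodes and v in nodes}


def all_subgraphs(links, nodes):
    """Every subgraph on `nodes` under the broader definition.

    Any subset of the induced links is allowed, including the empty set.
    """
    candidates = sorted(induced_subgraph(links, nodes))
    result = []
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            result.append(set(subset))
    return result


# Hypothetical parent graph: 1 and 2 point to each other, plus one extra link.
parent_links = {(1, 2), (2, 1), (2, 3)}

print(induced_subgraph(parent_links, {1, 2}))
for sub in all_subgraphs(parent_links, {1, 2}):
    print(sorted(sub))
```

On this toy graph the node set {1, 2} has a single deduced subgraph but four subgraphs under the broader definition, one of which is the empty graph.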

Figure 2. Considering subgraphs of (a) that contain nodes {1,2}.

If we only consider the deduced subgraph, (b) is the unique one, while in our method, graphs (b), (c) and (d) are all subgraphs under consideration. Notice that, the empty graph containing nodes 1 and 2 and no link is also a subgraph of (a) according to our definition.

https://doi.org/10.1371/journal.pone.0055437.g002

Since any graph containing reciprocal links is not potential-definable, we do not take the reciprocity mechanism into account here. The clustering mechanism prefers short loops (not necessarily directed loops) and only works in the local surroundings of a node, so we only consider loop-embedded subgraphs of orders 3 and 4. Two nodes connected by reciprocal links are not treated as a loop. To avoid repeated counting, we only consider the minimal loop-embedded subgraphs, that is, those that do not themselves contain a smaller loop-embedded subgraph.

Figure 3 illustrates all six different minimal loop-embedded subgraphs of orders 3 and 4. These subgraphs are named after Ref. [29], but our motivation differs from motif analysis and we adopt a different definition of subgraph (Ref. [29] only considers deduced subgraphs). Among these six subgraphs, only Bi-fan and Bi-parallel are potential-definable. Since we generally cannot obtain the explicit attributes of nodes, the homophily mechanism here only refers to the homogeneity in topology related to the potential levels. In a potential-definable subgraph, two nodes with the same potential cannot directly connect to each other, and thus the homophily mechanism only works when we consider each subgraph as a whole. Specifically, a subgraph is more homogeneous if its nodes occupy fewer potential levels. In Bi-fan the links are equivalent to each other and the nodes have two different potentials, while in Bi-parallel the links are different (two go from high-potential nodes to moderate-potential nodes, and the other two go from moderate-potential nodes to low-potential nodes) and the nodes have three different potentials. According to the assigned potentials, the Bi-fan structure is more homogeneous (it has fewer potential levels) than the Bi-parallel structure, so the homophily mechanism prefers the former.

Figure 3. All the six minimal loop-embedded subgraphs of orders 3 and 4.

They are named after Ref. [29], where 3-FFL and 4-FFL stand for three-order and four-order feed forward loops, and 3-Loop and 4-Loop mean three-order and four-order feedback loops, respectively.

https://doi.org/10.1371/journal.pone.0055437.g003
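The selection argument can be reproduced numerically. The sketch below is an illustration under stated assumptions: Figure 3 is not reproduced in the text, so the link lists in `SUBGRAPHS` are assumed layouts chosen to match the subgraph names, and `potential_levels` is a hypothetical helper. It tests each candidate for potential-definability and counts the distinct potential levels of those that pass.

```python
from collections import defaultdict, deque

# Assumed link lists for the six minimal loop-embedded subgraphs.
SUBGRAPHS = {
    "3-FFL": [("a", "b"), ("b", "c"), ("a", "c")],
    "3-Loop": [("a", "b"), ("b", "c"), ("c", "a")],
    "Bi-fan": [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")],
    "Bi-parallel": [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")],
    "4-Loop": [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
    "4-FFL": [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")],
}


def potential_levels(links):
    """Number of distinct potential levels, or None if not potential-definable.

    Assumes the subgraph is weakly connected, which holds for all six above.
    """
    neighbours = defaultdict(list)
    for u, v in links:
        neighbours[u].append((v, -1))  # along a link: one unit down
        neighbours[v].append((u, +1))  # against a link: one unit up
    start = next(iter(neighbours))
    potential = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other, delta in neighbours[node]:
            expected = potential[node] + delta
            if other not in potential:
                potential[other] = expected
                queue.append(other)
            elif potential[other] != expected:
                return None  # conflicting constraints
    return len(set(potential.values()))


levels = {name: potential_levels(links) for name, links in SUBGRAPHS.items()}
for name, count in levels.items():
    print(name, "not potential-definable" if count is None else f"{count} levels")
```

Under these assumed layouts only Bi-fan and Bi-parallel pass the test, with two and three potential levels respectively, which matches the ordering used in the homophily argument.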

In short, taking into account the potential theory together with the clustering and homophily mechanisms, we expect the Bi-fan subgraph to be the most preferred one, and a link that can generate more Bi-fan subgraphs should have a higher probability of existing. This hypothesis receives strong support, as indicated by the most accurate and robust performance of the Bi-fan predictor within the link prediction framework. Figure 4 illustrates the selection procedure that leads to Bi-fan as the final winner, as well as the respective contributions of the three mechanisms.

Figure 4. Illustration of the reason why Bi-fan is selected to be the final winner according to the homophily mechanism, clustering mechanism and potential theory.

https://doi.org/10.1371/journal.pone.0055437.g004

Experimental Results

Corresponding to these six subgraphs, we obtain 12 individual predictors by removing one link from each subgraph (S1–S12, see figure 5). To evaluate the accuracy of a predictor, the links of a network are divided into two parts: a training set and a testing set. Calling each pair of nodes not connected by a link a nonexistent link, all links can be classified into three categories: observed links are those in the training set, missing links are those in the testing set, and nonexistent links are the remaining ones. The missing links and nonexistent links together constitute the set of non-observed links. A good predictor assigns higher scores to missing links than to nonexistent ones. We adopt the area under the receiver operating characteristic curve (AUC) to evaluate the prediction accuracy: a higher AUC value corresponds to a better predictor. Details of the link prediction algorithm and the evaluation metric are given in Materials and Methods.

Figure 5. Illustration of the twelve predictors corresponding to the subgraphs shown in figure 3.

The red dashed arrows represent the links removed from the original subgraphs. The predictors relate to the subgraphs as follows: {S1, S2, S3}: 3-FFL; {S4}: 3-Loop; {S5}: Bi-fan; {S6, S7}: Bi-parallel; {S8}: 4-Loop; {S9, S10, S11, S12}: 4-FFL.

https://doi.org/10.1371/journal.pone.0055437.g005

Table 1 shows the prediction accuracy, measured by AUC values, of all 12 individual predictors. In 14 of the 15 real networks (all except Youtube), the Bi-fan predictor performs best. Its advantage over the other predictors is usually remarkable, and for Youtube its performance is very close to that of the optimal predictor. The last row of Table 1 shows the average AUC values, which again emphasize the great advantage of the Bi-fan predictor. Roughly speaking, the very simple rule that a link generating more Bi-fan subgraphs has a higher probability of existing is nearly 90% right.

Table 2 compares the prediction accuracy of some hybrid predictors. Recall that a hybrid predictor such as S1+S2+S3 defines the score of a non-observed link as the total number of S1, S2 and S3 subgraphs created by the addition of this link. In fact, the six predictors in Table 2 correspond to the six minimal loop-embedded subgraphs in figure 3. Therefore, Table 2 directly compares the six candidate subgraphs. Again, Bi-fan wins.

Looking at the results presented in Table 1 and Table 2, another significant advantage of the Bi-fan structure is its high robustness: even in the cases where the Bi-fan predictor is not the best, its performance is very close to the optimal one. In contrast, every other predictor, whether individual or hybrid, is very sensitive to the network structure and occasionally gives very bad predictions.

Discussion

This article studied the underlying mechanism of link formation in directed networks. We presented a hypothesis named potential theory, which claims that a link that can generate more potential-definable subgraphs has a higher probability of appearing. This mechanism cannot be used on its own to infer network structure, because there are too many potential-definable subgraphs (e.g., directed paths of any length are potential-definable). Therefore, we also take into account two well-known local mechanisms: clustering and homophily. By combining the three mechanisms, it is inferred that Bi-fan is the most preferred subgraph in directed networks. In the comparison of the link prediction accuracies of the 12 individual predictors as well as the six minimal loop-embedded subgraphs, Bi-fan performs best: not only does it have a higher AUC value than the others, but it is also robust, namely for disparate testing networks its performance is either the best or very close to the best. Notice that although the experimental results provide supportive evidence, they can only be considered a necessary condition, not a sufficient condition or a solid proof of the potential theory.

The local mechanisms driving directed network formation are less understood than those for undirected networks. This kind of study is thus of theoretical significance, and our work provides insights into the microscopic architecture of directed networks. Although the potential theory is more complicated than the clustering and homophily mechanisms as well as the balance theory, its meaning is easy to grasp: the potential-definable property implies a local hierarchy, and the potential value of a node indicates its level in the hierarchical structure. For example, directed loops are not hierarchy-embedded, whereas a directed path is strictly hierarchically organized; the former is not potential-definable and the latter is. Hierarchical organization is a well-known macroscopic feature of many undirected [36], [37] and directed [38], [39] networks, and our work indicates that in directed networks, nodes tend to be locally self-organized in a hierarchical manner. We conjecture that this kind of microscopic hierarchical organization contributes to the macroscopic hierarchical structure. In the near future, we will study more data sets in more detail to check whether the potential theory and our hypothesis about hierarchical organization are valid, and to determine the applicable range of the potential theory (for which networks it works and to what extent it can explain network formation).

Lastly, we emphasize again that the link prediction problem is fundamental to both information filtering and network analysis [34], [40], and it has countless applications. In this work, we applied the link prediction approach to evaluate the driving mechanisms of network formation. At the same time, our method can be directly applied to predicting missing links and recommending friendships in large-scale directed networks, since its accuracy is much higher than that of the common-neighbor-based methods, as indicated by the performance of the three-node predictors in Table 1.

Materials and Methods

Link Prediction Algorithm

Given a directed network $G(V,E)$, the fundamental task of a link prediction algorithm is to rank all non-observed links in the set $U \setminus E$, where $U$ is the universal set containing all possible directed links. If one wants to find missing links or recommend friendships, one can pick the links with the highest ranks. The mainstream method is to assign each non-observed link a score, with higher-scoring links ranked ahead.

We design the predictors corresponding to the six minimal loop-embedded subgraphs shown in figure 3. By removing one link from each subgraph, we get twelve predictors, as shown in figure 5. Adopting the predictor $S_i$ means that the score of a non-observed link is defined as the number of subgraphs of the $i$th type created by the addition of this link. Notice that a link may generate ten 3-FFLs in which it plays different roles. For example, these ten 3-FFLs may include two S1, three S2 and five S3. So if we adopt the predictor S2, the score of this link is three. Therefore, if we would like to see the contribution of a link to the created 3-FFLs, we can adopt the hybrid predictor S1+S2+S3, which means that the score of a non-observed link is defined as the total number of S1, S2 and S3 created by this link, equivalent to the number of created 3-FFLs. Figure 6 illustrates a simple example of how we calculate the scores.

Figure 6. Illustration of the scores of links according to our method.

The red dashed arrows are probe links. Under a given predictor, the score of each probe link is the number of corresponding subgraphs that the addition of this link would create.

https://doi.org/10.1371/journal.pone.0055437.g006
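As an illustration of how such a score might be computed, the Python sketch below counts the Bi-fans that a candidate link would close. It is a sketch under stated assumptions rather than the authors' implementation: the function names, the toy training links and the candidate list are all hypothetical. It follows the article's broader subgraph definition, so extra links among the four nodes are not checked.

```python
from collections import defaultdict


def build_adjacency(links):
    """Return (successors, predecessors) dictionaries for directed links."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for u, v in links:
        successors[u].add(v)
        predecessors[v].add(u)
    return successors, predecessors


def bifan_score(x, y, successors, predecessors):
    """Number of Bi-fans that adding the link x -> y would create.

    A Bi-fan has two source nodes that both point to the same two target
    nodes. With x -> y as the new link we need a second target v and a
    second source u such that x -> v, u -> v and u -> y are observed.
    """
    score = 0
    for v in successors.get(x, ()):            # observed x -> v
        if v in (x, y):
            continue
        for u in predecessors.get(v, ()):      # observed u -> v
            if u in (x, y, v):
                continue
            if y in successors.get(u, ()):     # observed u -> y
                score += 1
    return score


# Toy training set (an assumption for illustration).
training_links = [("a", "c"), ("b", "c"), ("b", "d"), ("e", "c"), ("e", "d")]
successors, predecessors = build_adjacency(training_links)

candidates = [("a", "d"), ("a", "b"), ("d", "c")]
ranking = sorted(
    candidates,
    key=lambda link: bifan_score(*link, successors, predecessors),
    reverse=True,
)
for link in ranking:
    print(link, bifan_score(*link, successors, predecessors))
```

In the toy network, adding a → d closes two Bi-fans (one through b and one through e), so it is ranked ahead of the other two candidates. A hybrid predictor would simply add the scores of its component predictors.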

Given a predictor, we can rank all the non-observed links according to their scores. To evaluate the algorithmic performance, we randomly divide the observed links $E$ into two parts: the training set $E^T$ is treated as known information, while the testing set (probe set) $E^P$ is used for testing and no information therein is allowed to be used for prediction. Clearly, $E^T \cup E^P = E$ and $E^T \cap E^P = \emptyset$. In our experiments, the training set always contains 90% of the links, and the remaining 10% constitute the testing set.
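A random 90/10 division of this kind can be sketched in a few lines of Python. The function name, the `test_fraction` and `seed` parameters and the toy link set are assumptions made for illustration.

```python
import random


def split_links(links, test_fraction=0.1, seed=None):
    """Randomly split observed links into a training set and a probe set."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    probe = set(shuffled[:n_test])
    training = set(shuffled[n_test:])
    return training, probe


# Toy network: all 90 directed links among 10 nodes, without self-loops.
links = {(i, j) for i in range(10) for j in range(10) if i != j}
training, probe = split_links(links, test_fraction=0.1, seed=42)

print(len(training), len(probe))
```

The two returned sets are disjoint and their union is the original link set, so the probe links carry no information into the training set.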

Evaluation Metric

We use a standard metric, the area under the receiver operating characteristic (ROC) curve [41], to test the accuracy of link prediction algorithms. It is usually abbreviated as the AUC (Area Under Curve) value. This metric can be interpreted as the probability that a randomly chosen missing link (a link in the probe set $E^P$) is given a higher score than a randomly chosen nonexistent link (a link in $U \setminus E$, that is, a possible directed link that is not in the network). In the implementation, among $n$ independent comparisons, if there are $n'$ times in which the missing link has the higher score and $n''$ times in which the missing link and the nonexistent link have the same score, we define the AUC value as [34]:

$$\mathrm{AUC} = \frac{n' + 0.5\,n''}{n}.$$

If all the scores are generated from an independent and identical distribution, the AUC value should be about 0.5. Therefore, the degree to which the AUC value exceeds 0.5 indicates how much better the algorithm performs than pure chance.
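The sampling procedure can be sketched as follows. This is a minimal Python illustration, assuming the usual convention that a tie counts as half a win; the function name `sampled_auc`, its parameters and the toy scores are hypothetical.

```python
import random


def sampled_auc(score, probe_links, nonexistent_links, n=10000, seed=None):
    """Estimate the AUC from n independent comparisons.

    `score` maps a link (source, target) to a number. Each comparison pits
    one random missing link against one random nonexistent link.
    """
    rng = random.Random(seed)
    probe_links = list(probe_links)
    nonexistent_links = list(nonexistent_links)
    higher = 0  # comparisons where the missing link scores strictly higher
    equal = 0   # comparisons where both links score the same
    for _ in range(n):
        missing = rng.choice(probe_links)
        absent = rng.choice(nonexistent_links)
        s_missing, s_absent = score(missing), score(absent)
        if s_missing > s_absent:
            higher += 1
        elif s_missing == s_absent:
            equal += 1
    return (higher + 0.5 * equal) / n


# Toy scores (assumptions): a perfect predictor and an uninformative one.
perfect = {(0, 1): 2, (1, 0): 0}
print(sampled_auc(perfect.get, [(0, 1)], [(1, 0)], n=100, seed=0))
print(sampled_auc(lambda link: 1, [(0, 1)], [(1, 0)], n=100, seed=0))
```

A predictor that always ranks the missing link higher reaches 1, while a predictor that gives every link the same score lands exactly on the chance level of 0.5.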

Data Description

Our experiments include 15 real directed networks drawn from disparate fields. Details are given below and the basic structural features are presented in Table 3. If a network is not connected, we only consider its largest weakly connected component.

Table 3. The basic structural features of the studied 15 real networks.

https://doi.org/10.1371/journal.pone.0055437.t003
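Restricting a network to its largest weakly connected component is a routine preprocessing step. A minimal Python sketch, with a hypothetical function name and toy data, ignores link directions, finds the components by breadth-first search and keeps the links of the largest one.

```python
from collections import defaultdict, deque


def largest_weakly_connected_component(links):
    """Return the links that lie in the largest weakly connected component."""
    undirected = defaultdict(set)
    for u, v in links:
        undirected[u].add(v)
        undirected[v].add(u)

    seen = set()
    best = set()
    for start in undirected:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for other in undirected[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    # Both endpoints of a link share a component, so testing one is enough.
    return [(u, v) for u, v in links if u in best]


toy_links = [(1, 2), (3, 2), (4, 5)]
print(largest_weakly_connected_component(toy_links))
```

In the toy data, nodes 1, 2 and 3 form the largest component even though no directed path connects 1 and 3, because direction is ignored when testing weak connectivity.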

Biological networks.

Three of them are food webs, representing predator-prey relations, and the fourth is a neural network of C.elegans.

  • FW1 [42] – A food web consisting of 69 species living in the Everglades Graminoids during the wet season.
  • FW2 [43] – A food web consisting of 97 species living in the Mangrove Estuary during the wet season.
  • FW3 [44] – A food web consisting of 128 species living in Florida Bay during the dry season.
  • C.elegans [45] – A neural network of the nematode worm C.elegans, in which an edge joins two neurons if they are connected by either a synapse or a gap junction.

Information networks.

We consider networks of documents, where a directed link from $i$ to $j$ means that document $i$ cites document $j$, and a network of weblogs, where a directed link stands for a hyperlink.

  • Small & Griffith and Descendants (SmaGri) [46] – Citations to Small & Griffith and Descendants.
  • Kohonen [46] – Articles with topic “self-organizing maps” or references to “Kohonen T”.
  • Scientometrics (SciMet) [46] – Articles from or citing Scientometrics.
  • Political Blogs (PB) [47] – A directed network of hyperlinks between weblogs on US political blogs.

Social networks.

All the following networks describe relationships between people.

  • Delicious [48] – Delicious.com, previously known as del.icio.us, allows individuals to tag bookmarks and follow other users. The studied who-follow-whom network was collected in May 2008.
  • Youtube [49] – YouTube is a platform where users can share videos with others. Active users who regularly upload videos maintain channel pages. Other users can follow those users, thus forming a social network. This data was collected in January 2007.
  • FriendFeed [50] – FriendFeed is an aggregator that consolidates updates from social media and social networking websites, social bookmarking websites, blogs, micro-blogging services, etc. Members can manage their social networking content with one FriendFeed account and follow others’ updates. This data set captures the who-follow-whom relationships.
  • Epinions [51] – Epinions.com is a who-trust-whom online social network of a general consumer review site. Members of this site can decide whether to “trust” each other.
  • Slashdot [52] – Slashdot.org is a technology-related news website known for its specific user community. This site allows individuals to tag each other as friends or foes.
  • Wikivote [53], [54] – Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. Active users can be nominated to be administrators. A public vote begins after some users are nominated. Other users can express a positive, negative or neutral opinion towards each candidate. The candidate with the most votes is promoted to admin status. This process implies a social network in which users are nodes and a vote cast by one user on another constitutes a directed link. This data is from 2794 elections on English Wikipedia.
  • Twitter [55] – Twitter is an online social networking service where users can post texts of up to 140 characters. It also allows users to “follow” other users, whereby a user can see updates from the users they follow on their Twitter page. In this network, a link from user A to user B means that user A is following user B. The data used here is a sample from the whole dataset in [55].

Acknowledgments

We thank An Zeng, Changsong Zhou and Xiao-Ke Xu for helpful discussions and illuminating ideas.

Author Contributions

Conceived and designed the experiments: QMZ LL TZ. Performed the experiments: QMZ WQW YXZ. Analyzed the data: QMZ LL WQW. Contributed reagents/materials/analysis tools: QMZ LL TZ. Wrote the paper: QMZ LL TZ.

References

  1. 1. Newman MEJ (2010) Networks: An Introduction. Oxford University Press, New York.
  2. 2. Barabási AL (2009) Scale-free networks: A decade and beyond. Science 325: 412–413.
  3. 3. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286: 509–512.
  4. 4. Zhou T, Medo M, Cimini G, Zhang ZK, Zhang YC (2011) Emergence of scale-free leadership structure in social recommender systems. PLoS ONE 6: e20648.
  5. 5. Perotti JI, Billoni OV, Tamarit FA, Chialvo DR, Cannas SA (2009) Emergent self-organized complex network topology out of stability constraints. Phys Rev Lett 103: 108701.
  6. 6. McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: Homophily in social networks. Annual Review of Sociology 27: 415–444.
  7. 7. Szabó G, Alava M, Kertész J (2004) Clustering in complex networks. Lecture Notes in Physics 650: 139–162.
  8. 8. Marvel SA, Strogatz SH, Kleinberg JM (2009) Energy landscape of social balance. Phys Rev Lett 103: 198701.
  9. 9. Backstrom L, Huttenlocher DP, Kleinberg JM, Lan X (2006) Group formation in large social networks: membership, growth, and evolution. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, KDD ‘06, 44–54.
  10. 10. Palla G, Barabási AL, Vicsek T (2007) Quantifying social group evolution. Nature 446: 664–667.
  11. 11. Kumpula JM, Onnela JP, Saramäki J, Kaski K, Kertész J (2007) Emergence of communities in weighted networks. Phys Rev Lett 99: 228701.
  12. 12. Holme P, Kim BJ (2002) Growing scale-free networks with tunable clustering. Phys Rev E 65: 026107.
  13. 13. Newman MEJ (2001) Clustering and preferential attachment in growing networks. Phys Rev E 64: 025102.
  14. 14. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393: 440–442.
  15. 15. Kossinets G, Watts DJ (2006) Empirical analysis of an evolving social network. Science 311: 88–90.
  16. 16. Yin D, Hong L, Xiong X, Davison BD (2011) Link formation analysis in microblog. In: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. New York, NY, USA: ACM, SIGIR ‘11, 1235–1236.
  17. 17. Leskovec J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, KDD ‘08, 462–470.
  18. 18. Cui AX, Fu Y, Shang MS, Chen DB, Zhou T (2011) Emergence of local structures in complex networks: Common neighborhood drives the network evolution. Acta Phys Sin 60: 038901.
  19. 19. Leskovec J, Horvitz E (2008) Planetary-scale views on a large instant-messaging network. In: Proceedings of the 17th international conference on World Wide Web. New York, NY, USA: ACM, WWW ‘08, 915–924.
  20. 20. Currarini S, Jackson MO, Pin P (2010) Identifying the roles of race-based choice and chance in high school friendship network formation. Proc Natl Acad Sci USA 107: 4857–4861.
  21. 21. Lewis K, Gonzalez M, Kaufman J (2012) Social selection and peer influence in an online social network. Proc Natl Acad Sci USA 109: 68–72.
  22. 22. Cheng XQ, Ren FX, Zhou S, Hu MB (2008) Triangular clustering in document networks. New J Phys 11: 033019.
  23. 23. Brzoowski MJ, Romero DM (2011) Who should I follow? Recommending people in directed social networks. In: Proceedings of the 5th International Conference on Weblogs and Social Media. The AAAI Press, 458–461.
  24. 24. Garlaschelli D, Loffredo MI (2004) Patterns of link reciprocity in directed network. Phys Rev Lett 93: 268701.
  25. Opsahl T, Hogan B (2010) Modeling the evolution of continuously-observed networks: Communication in a Facebook-like community. arXiv:1010.2141.
  26. Mislove A, Koppula HS, Gummadi KP, Druschel P, Bhattacharjee B (2008) Growth of the flickr social network. In: Proceedings of the first workshop on Online social networks. New York, NY, USA: ACM, WOSN ’08, 25–30.
  27. Gómez V, Kaltenbrunner A, López V (2008) Statistical analysis of the social network and discussion threads in slashdot. In: Proceedings of the 17th international conference on World Wide Web. New York, NY, USA: ACM, WWW ’08, 645–654.
  28. Pimm SL (2002) Food Webs. The University of Chicago Press, Chicago.
  29. Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, et al. (2002) Network motifs: simple building blocks of complex networks. Science 298: 824–827.
  30. Itzkovitz S, Milo R, Kashtan N, Ziv G, Alon U (2003) Subgraphs in random networks. Phys Rev E 68: 026127.
  31. Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, et al. (2004) Superfamilies of evolved and designed networks. Science 303: 1538–1542.
  32. Palla G, Farkas IJ, Pollner P, Derényi I, Vicsek T (2007) Directed network modules. New J Phys 9: 186.
  33. Bianconi G, Gulbahce N, Motter AE (2008) Local structure of directed networks. Phys Rev Lett 100: 118701.
  34. Lü L, Zhou T (2011) Link prediction in complex networks: A survey. Physica A 390: 1150–1170.
  35. Karrer B, Newman MEJ (2009) Random acyclic networks. Phys Rev Lett 102: 128701.
  36. Clauset A, Moore C, Newman MEJ (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453: 98–101.
  37. Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J Phys 11: 033015.
  38. Yu H, Gerstein M (2006) Genomic analysis of the hierarchical structure of regulatory networks. Proc Natl Acad Sci USA 103: 14724–14731.
  39. Mones E, Vicsek L, Vicsek T (2012) Hierarchy measure for complex networks. PLoS ONE 7: e33799.
  40. Lü L, Medo M, Yeung CH, Zhang YC, Zhang ZK, et al. (2012) Recommender systems. Physics Reports 519: 1–49.
  41. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143: 29–36.
  42. Ulanowicz RE, Heymans JJ, Egnotovich MS (2000) Network analysis of trophic dynamics in South Florida Ecosystems, FY 99: The Graminoid Ecosystem. Technical report, Technical Report TS-191-99, Maryland System Center for Environmental Science, Chesapeake Biological Laboratory, Maryland, USA.
  43. Baird D, Luczkovich J, Christian RR (1998) Assessment of spatial and temporal variability in ecosystem attributes of the St Marks National Wildlife Refuge, Apalachee Bay, Florida. Estuarine, Coastal and Shelf Science 47: 329–349.
  44. Ulanowicz RE, Bondavalli C, Egnotovich MS (1998) Network analysis of trophic dynamics in South Florida Ecosystem, FY 97: The Florida Bay Ecosystem. Technical report, Annual Report to the United States Geological Service Biological Resources Division, University of Miami Coral Gables, [UMCES] CBL 98–123, Maryland System Center for Environmental Science, Chesapeake Biological Laboratory, Maryland, USA.
  45. White JG, Southgate E, Thomson JN, Brenner S (1986) The structure of the nervous system of the nematode C. elegans. Philosophical transactions Royal Society London 314: 1–340.
  46. Batagelj V, Mrvar A (2006). Pajek datasets website. Available: http://vlado.fmf.uni-lj.si/pub/networks/data/. Accessed 2013 Jan 14.
  47. Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd international workshop on Link discovery. New York, NY, USA: ACM, LinkKDD ’05, 36–43.
  48. Lü L, Zhang YC, Yeung CH, Zhou T (2011) Leaders in social networks, the delicious case. PLoS ONE 6: e21202.
  49. Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement. New York, NY, USA: ACM, IMC ’07, 29–42.
  50. Celli F, Di Lascio FML, Magnani M, Pacelli B, Rossi L (2010) Social network data and practices: the case of FriendFeed. In: Proceedings of the Third international conference on Social Computing, Behavioral Modeling, and Prediction. Berlin, Heidelberg: Springer-Verlag, SBP’10, 346–353.
  51. Richardson M, Agrawal R, Domingos P (2003) Trust management for the semantic web. In: Proceedings of the 2nd International Semantic Web Conference. 351–368.
  52. Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics 6: 29–123.
  53. Leskovec J, Huttenlocher D, Kleinberg J (2010) Predicting positive and negative links in online social networks. In: Proceedings of the 19th international conference on World wide web. New York, NY, USA: ACM, WWW ’10, 641–650.
  54. Leskovec J, Huttenlocher D, Kleinberg J (2010) Signed networks in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM, CHI ’10, 1361–1370.
  55. Zafarani R, Liu H (2009). Social computing data repository at ASU website. Available: http://socialcomputing.asu.edu. Accessed 2013 Jan 14.
  56. Palmer CR, Gibbons PB, Faloutsos C (2002) ANF: A fast and scalable tool for data mining in massive graphs. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. New York, NY, USA: ACM, KDD ’02, 81–90.
  57. Fagiolo G (2007) Clustering in complex directed networks. Phys Rev E 76: 026107.