Competition between Homophily and Information Entropy Maximization in Social Networks

In social networks, it is conventionally thought that two individuals with more overlapped friends tend to establish a new friendship, which can be stated as homophily breeding new connections. Meanwhile, the recent hypothesis of maximum information entropy has been presented as a possible origin of effective navigation in small-world networks. Through both theoretical and experimental analysis, we find that there exists a competition between information entropy maximization and homophily in local structure. This competition means that a newly built relationship between two individuals with more common friends leads to less information entropy gain for them. We demonstrate that both assumptions coexist in the evolution of the social network. The rule of maximum information entropy produces weak ties in the network, while the law of homophily makes the network highly clustered locally and provides individuals with strong and trustworthy ties. A toy model is also presented to demonstrate the competition and evaluate the roles of different rules in the evolution of real networks. Our findings could shed light on social network modeling from a new perspective.


I. INTRODUCTION
The last decade has witnessed tremendous research interest in complex networks [4,22,36], including the evolution of social networks [13,20,23,24,30]. It has been found that in many social networks from different circumstances, the probability of having a friend at a distance $r$ is $p(r) \propto r^{-1}$, which is stated as the spatial scaling law [7]. Recent work [11] presents a possible origin that explains the emergence of this scaling law with the hypothesis of maximum information entropy under energy constraints. The authors assume that human social behavior is based on gathering maximum information through various activities, and that making friends is one of them. However, it is also conventionally found that homophily leads to connections in social networks [5,10,13,15,18,20,27,28,32]. Homophily is the principle that contact between similar individuals occurs at a higher rate than among dissimilar ones [20]. For instance, in social networks, two individuals with more common friends are more likely to get connected, where the number of overlapped friends represents the strength of homophily. Both of the above rules might drive the growth of the network in local structure simultaneously; however, to the best of our knowledge, little has been done to unveil the relationship between them. In this paper, we try to fill this gap from the perspective of network evolution in local structure.
A social network can be modeled as a simple undirected graph $G(V, E)$, where $V$ is the set of individuals (nodes) and $E$ is the set of friendships (ties) among them. As shown in FIG. 1a, node 1 may obtain information from nodes 2, 3, 4 and their friends 5, 7. Therefore, as defined in [11], the information sequence for node 1 is {2, 3, 4, 5, 7}, and the frequency with which each node appears in the sequence is $q_2 = q_3 = q_4 = q_5 = q_7 = 1/5$ for nodes 2, 3, 4, 5 and 7 respectively, while $q_6 = 0$ for node 6. Then the information entropy for node 1 can be obtained as
$$\epsilon(1) = -\sum_{i=2}^{7} q_i \log q_i = \log 5,$$
where terms with $q_i = 0$ contribute zero.

II. THEORETICAL ANALYSIS
Next, we assume the social network evolves to the one shown in FIG. 1b under the rule of homophily. For example, node 1 and node 5 may establish a new friendship because they share the common friend node 2. The updated information sequence for node 1 is then {2, 3, 4, 5, 5, 7, 2}, and the new frequency with which each node appears in the sequence is $q'_2 = q'_5 = 2/7$, $q'_3 = q'_4 = q'_7 = 1/7$, and $q'_6 = 0$. We recompute the information entropy of node 1 as above and obtain
$$\epsilon'(1) = -\sum_{i=2}^{7} q'_i \log q'_i = \log 7 - \frac{4}{7} \log 2.$$
It can be easily observed that $\Delta\epsilon(1) = \epsilon'(1) - \epsilon(1) = \log 7 - \frac{4}{7}\log 2 - \log 5 < 0$ after node 1 builds a new tie with node 5, which means that in the evolution dominated by homophily, the information entropy for node 1 decreases. This intuitively suggests that the rule of homophily is incompatible with the law of maximum information entropy; a general explanation follows.
Note that here we mainly discuss the network evolution in local structure, in which ties are newly built only with nodes two hops away. For this reason, and to simplify the analysis, the conditions of limited energy and of distances between nodes are not considered in the following analytical framework. Besides, the rapid development of online social networks has greatly facilitated daily social activity [1,21], so here the cost of establishing a new tie is assumed to be a constant that is independent of the distance in the social network.
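The toy example above can be checked numerically. The following Python sketch is illustrative only: the edge list is an assumption chosen so that node 1's information sequences match the ones quoted in the text (the actual edges of FIG. 1 are not given here), natural logarithms are assumed, and the helper names `build_adj`, `info_sequence` and `entropy` are hypothetical.

```python
import math
from collections import Counter


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def info_sequence(adj, i):
    """Friends of i, followed by the friends of each friend (i itself excluded)."""
    seq = list(adj[i])
    for f in adj[i]:
        seq.extend(q for q in adj[f] if q != i)
    return seq


def entropy(adj, i):
    """Shannon entropy of the node frequencies in i's information sequence."""
    counts = Counter(info_sequence(adj, i))
    s = sum(counts.values())
    return -sum(n / s * math.log(n / s) for n in counts.values())


# Assumed edges, chosen only to reproduce the sequences quoted in the text.
edges_a = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 7), (6, 7)]
adj_a = build_adj(edges_a)             # before the new tie
adj_b = build_adj(edges_a + [(1, 5)])  # after node 1 befriends node 5

eps_before = entropy(adj_a, 1)
eps_after = entropy(adj_b, 1)
delta = eps_after - eps_before

print(sorted(info_sequence(adj_a, 1)))
print(sorted(info_sequence(adj_b, 1)))
print(eps_before, eps_after, delta)
```

Under these assumptions the entropy of node 1 drops after the tie with node 5 is added, in line with the sign of the entropy change discussed above.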
We define $n(i)$ as the set of individual $i$'s initial friends and $k_i$ as $i$'s degree, i.e., the number of its friends. Then the set of overlapped friends between $i$ and $j$ is $c(i, j) = n(i) \cap n(j)$, and $c_{ij} = |c(i, j)|$ is the number of their common friends. We define $U = \bigcup_{q \in n(i)} n(q) \cup n(i)$. We also define $\Psi = \{j\} \cup c(i, j)$, where $j$ is a random individual appearing in $i$'s information sequence $s(i)$ with $j \notin n(i)$. Based on the definition of information entropy in [11], the information entropy for node $i$ is
$$\epsilon(i) = -\sum_{q \in U} \frac{n_q}{s_i} \log \frac{n_q}{s_i},$$
where $n_q$ is the number of times $q$ appears in $s(i)$ and $s_i$ is the length of $s(i)$. Since we mainly investigate the evolution in local structure, only friends of $i$ and friends of its friends are considered in the computation of the entropy. Then we assume that a new friendship is established between $i$ and $j$ and denote the resulting entropy for $i$ by $\epsilon'(i)_j$. The length of the updated information sequence is $s'_i = s_i + k_j + 1$, where $k_j$ is the initial degree of $j$. The change of entropy for $i$ caused by the new tie with $j$, i.e., $\Delta\epsilon(i)_j = \epsilon'(i)_j - \epsilon(i)_j$, can then be expanded and simplified (for details, see Appendix). Suppose that $k_j$ is fixed; it then follows that $\Delta\epsilon(i)_j$ decreases as $c_{ij}$ grows. Since the network is undirected, this conclusion also holds for $j$. We can therefore conclude that if we build a new tie between $i$ and $j$, the information entropy gain $\Delta\epsilon(i, j) = \Delta\epsilon(i)_j + \Delta\epsilon(j)_i$ produced by this new friendship for the two nodes decreases as $c_{ij}$ increases. In other words, for nodes with more common friends, establishing a new tie between them produces less information entropy gain. In brief, there is a competition between homophily and information entropy in breeding a new connection. Note that the decline of $\Delta\epsilon(i, j)$ with $c_{ij}$ might be very slow, because $s'_i$ is generally much greater than $c_{ij}$. In fact, the information entropy for $i$ represents the diversity of its information sources.
If we create ties between $i$ and other nodes that have overlapped friends with it, these nodes will appear more frequently in its information sequence and may even become the dominant sources of information. The diversity of the information source is then weakened, and the gain of the information entropy decays accordingly.
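The dependence of the entropy gain on the number of common friends can be illustrated on a small constructed example. The sketch below is not the analytical derivation of this section; it assumes a hypothetical family of graphs (`pair_graph`) in which node 0 plays the role of $i$ with $m$ friends, node 100 plays the role of $j$ with a fixed degree $k$, and exactly $c$ of $j$'s friends are shared with $i$. Natural logarithms and all function names are assumptions.

```python
import math
from collections import Counter


def info_sequence(adj, i):
    """Friends of i, followed by the friends of each friend (i itself excluded)."""
    seq = list(adj[i])
    for f in adj[i]:
        seq.extend(q for q in adj[f] if q != i)
    return seq


def entropy(adj, i):
    """Shannon entropy of the node frequencies in i's information sequence."""
    counts = Counter(info_sequence(adj, i))
    s = sum(counts.values())
    return -sum(n / s * math.log(n / s) for n in counts.values())


def pair_graph(c, k=3, m=5):
    """Assumed construction: node 0 (i) has friends 1..m; node 100 (j) has k
    friends, c of them shared with i and k - c of them fresh nodes 200, 201, ..."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for f in range(1, m + 1):
        link(0, f)
    for f in range(1, c + 1):
        link(100, f)
    for x in range(k - c):
        link(100, 200 + x)
    return adj


def entropy_gain(adj, i, j):
    """Total gain for both ends when the tie (i, j) is added; adj is restored."""
    before = entropy(adj, i) + entropy(adj, j)
    adj[i].add(j)
    adj[j].add(i)
    after = entropy(adj, i) + entropy(adj, j)
    adj[i].discard(j)
    adj[j].discard(i)
    return after - before


# Fixed k_j = 3, increasing number of common friends c_ij.
gains = {c: entropy_gain(pair_graph(c), 0, 100) for c in (1, 2, 3)}
for c, g in gains.items():
    print(c, round(g, 4))
```

In this constructed family the gain shrinks as the number of common friends grows while $k_j$ stays fixed, which matches the qualitative statement above; it is a single example and not a proof.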

III. EMPIRICAL ANALYSIS
In order to validate the above analysis, we employ several data sets, including both synthetic and real-world networks, for further empirical study. The synthetic data sets are generated by the BA [2], Small World [31] and CNNR [32] models. BA is a classic model that generates scale-free networks through the mechanism of preferential attachment. We denote the data set it generates as BA(N, m), where N is the size of the network and m is the number of ties created when a new node is added. The Small World model is a random model that rewires ties with probability p to produce long-range ties; it is denoted as SW(N, K, p), where 2K is the average degree. The CNNR model is modified from CNN [28] for generating social networks, especially online social networks. We denote it as CNNR(N, u, r), where u(1 − r) is the probability of converting potential edges into real ties. The average degree of the network it generates is approximately 2/(1 − u). The real-world data sets come from different fields. For example, CA-HepPh is a collaboration network from the e-print arXiv and covers scientific collaborations between authors of papers submitted to High Energy Physics [17]. NewOrleans is the Facebook network in New Orleans [29]. Email-Enron is an email communication network that covers all the email communication within a data set of around half a million emails [16]. The basic properties of the data sets used in the following experiments are listed in Tab. I.
As discussed before, establishing a new friendship may affect the entropy of both ends.
In the above networks, we characterize the relation between $c_{ij}$ and $\Delta\epsilon(i, j)$ in the following steps. For each tie between $i$ and $j$, we first obtain $\epsilon'(i)_j + \epsilon'(j)_i$ in the original network; secondly, we delete this tie and get $\epsilon(i)_j + \epsilon(j)_i$; thirdly, the tie is restored. The difference between the two sums gives $\Delta\epsilon(i, j)$. For the different values of $\Delta\epsilon(i, j)$ that share the same $c_{ij}$, we record the maximum, mean and minimum, respectively.
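The three steps above can be sketched in Python as follows. This is a minimal illustration and not the code used for the experiments: the adjacency-map representation, natural logarithms, integer node labels and the function names are all assumptions, and the small demo graph is hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def info_sequence(adj, i):
    """Friends of i, followed by the friends of each friend (i itself excluded)."""
    seq = list(adj[i])
    for f in adj[i]:
        seq.extend(q for q in adj[f] if q != i)
    return seq


def entropy(adj, i):
    """Shannon entropy of the node frequencies in i's information sequence."""
    counts = Counter(info_sequence(adj, i))
    s = sum(counts.values())
    return -sum(n / s * math.log(n / s) for n in counts.values())


def gain_by_common_friends(adj):
    """Return {c_ij: (max, mean, min)} of the entropy gain over all existing ties."""
    groups = defaultdict(list)
    ties = [(i, j) for i in adj for j in adj[i] if i < j]  # each tie once
    for i, j in ties:
        with_tie = entropy(adj, i) + entropy(adj, j)     # step 1: original network
        adj[i].discard(j)                                # step 2: delete the tie
        adj[j].discard(i)
        without_tie = entropy(adj, i) + entropy(adj, j)
        c_ij = len(adj[i] & adj[j])                      # number of common friends
        adj[i].add(j)                                    # step 3: restore the tie
        adj[j].add(i)
        groups[c_ij].append(with_tie - without_tie)
    return {c: (max(g), sum(g) / len(g), min(g)) for c, g in sorted(groups.items())}


# Hypothetical demo graph: a triangle (1, 2, 3) with a pendant node 4.
demo = build_adj([(1, 2), (1, 3), (2, 3), (3, 4)])
stats = gain_by_common_friends(demo)
for c, (mx, mean, mn) in stats.items():
    print(c, round(mx, 4), round(mean, 4), round(mn, 4))
```

Collecting the list of ties before the loop avoids modifying a neighbor set while it is being iterated, and restoring each tie immediately keeps every measurement independent of the others.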
The change of entropy for other nodes in the network is not considered here, because we assume that the establishment of a tie between $i$ and $j$ is a personal activity based solely on local information. As shown in FIG. 2, in all networks $\Delta\epsilon(i, j)$ decreases as $c_{ij}$ grows, which is consistent with our above analysis, especially for the small world network in FIG. 2b. Initially, the gap between the maximum and the mean of $\Delta\epsilon(i, j)$ is large; it then shrinks quickly as $c_{ij}$ increases. It is also observed that for nodes with a very large number of common friends, building a new friendship between them may even lead to entropy loss.
To sum up, the empirical results further support our statement that an increase of homophily reduces the information entropy gain, which indicates a competition between the two rules. Under the rule of maximum information entropy, a tie is established to gain more entropy for both ends. Therefore, we define a tie that increases the entropy of its ends as a positive tie, and one that leads to entropy loss as a negative tie. We then define the positiveness of the social network as the fraction of positive ties, denoted as $\tau$. A larger $\tau$ means that more ties in the network were established to increase the entropy of their ends. Tab. II lists $\tau$ for the real-world networks, where $c$ is the clustering of the network. Interestingly, networks with higher $c$ generally have lower $\tau$. We also investigate this finding on networks with various clusterings generated by the BA and Small World models. For the BA model, we employ the method of tuning the clustering while keeping the degree distribution stable [14,19].
With respect to the rule of homophily, a new tie added preferentially between nodes with overlapped friends also creates new triangles in the local structure.
That is to say, the clustering of the network, $c$, increases when its evolution is driven by homophily. Because of this, homophily-dominated evolution leads to a decrease of $\tau$. With respect to information entropy maximization, however, a new tie is established to increase the diversity of the information source and gain more entropy, which raises $\tau$ by introducing more positive ties.
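The two quantities compared above, the positiveness $\tau$ and the clustering $c$, can be computed on small graphs as in the sketch below. Several points are assumptions rather than statements from the text: $c$ is taken to be the average local clustering coefficient, a tie counts as positive only when its gain is strictly greater than zero, natural logarithms are used, and both demo graphs and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def entropy(adj, i):
    """Entropy of i's information sequence (friends plus friends of friends)."""
    seq = list(adj[i])
    for f in adj[i]:
        seq.extend(q for q in adj[f] if q != i)
    counts = Counter(seq)
    s = sum(counts.values())
    return -sum(n / s * math.log(n / s) for n in counts.values())


def tie_gain(adj, i, j):
    """Entropy gain of both ends attributable to the existing tie (i, j)."""
    with_tie = entropy(adj, i) + entropy(adj, j)
    adj[i].discard(j)
    adj[j].discard(i)
    without_tie = entropy(adj, i) + entropy(adj, j)
    adj[i].add(j)
    adj[j].add(i)
    return with_tie - without_tie


def positiveness(adj):
    """tau: fraction of ties whose gain is strictly positive (assumed threshold)."""
    ties = [(i, j) for i in adj for j in adj[i] if i < j]
    positive = sum(1 for i, j in ties if tie_gain(adj, i, j) > 0)
    return positive / len(ties)


def average_clustering(adj):
    """Average local clustering coefficient (assumed meaning of c)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Hypothetical demo graphs: a path (no triangles) and two hubs sharing both
# a direct tie and two common friends, each common friend with ten leaves.
path = build_adj([(1, 2), (2, 3), (3, 4)])
clustered_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3)]
clustered_edges += [(2, 10 + x) for x in range(10)]
clustered_edges += [(3, 20 + x) for x in range(10)]
clustered = build_adj(clustered_edges)

for name, g in (("path", path), ("clustered", clustered)):
    print(name, positiveness(g), average_clustering(g))
```

In the second graph the tie between nodes 0 and 1 closes two triangles and only reinforces sources that both ends already have, so it is the kind of tie that lowers $\tau$ while raising $c$.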
The strength of a social tie can be defined through the number of overlapped friends between its ends. For example, the strength $w_{ij}$ of a tie between $i$ and $j$ can be defined as in [3,12,35], where lower $w_{ij}$ stands for a weak tie. Clearly, if $i$ and $j$ share many common friends, the tie between them is strong.
Conventionally, it is thought that weak ties are helpful in obtaining new information [9], while strong ties indicate trusted relationships [21]. Therefore, based on the above discussion, the evolution driven by homophily should generate strong ties in the network, because it renders the network highly clustered. To validate this, we observe the cumulative distribution function (CDF) of $w_{ij}$ over all ties in the network. As shown in FIG. 4, as $c$ of the network decreases, the CDF curve moves to the left, which indicates an increasing fraction of weak ties [33]. This validates our conjecture that, in both synthetic and real-world data sets, highly clustered networks caused by homophily contain more strong ties, while those with lower clustering contain more weak ties, which are produced by the law of maximum information entropy.
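A CDF of tie strengths of the kind described above can be computed as in the following sketch. The formula for $w_{ij}$ is not reproduced in the text, so the code assumes the neighborhood overlap $w_{ij} = c_{ij} / (k_i - 1 + k_j - 1 - c_{ij})$, which is one common choice and may differ from the definition in [3,12,35]; the function names and the demo graph are likewise hypothetical.

```python
def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def tie_strength(adj, i, j):
    """Assumed definition: neighborhood overlap of the two ends of the tie."""
    c_ij = len(adj[i] & adj[j])
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - c_ij
    # An isolated tie has no other neighbors; treat its strength as 0.
    return c_ij / denom if denom > 0 else 0.0


def empirical_cdf(values):
    """Return [(x, F(x))] with F(x) the fraction of values <= x."""
    xs = sorted(values)
    n = len(xs)
    # Keep only the last occurrence of each distinct value.
    return [(x, (idx + 1) / n)
            for idx, x in enumerate(xs)
            if idx == n - 1 or xs[idx + 1] != x]


# Hypothetical demo graph: a triangle (1, 2, 3) with a pendant node 4.
demo = build_adj([(1, 2), (1, 3), (2, 3), (3, 4)])
strengths = [tie_strength(demo, i, j) for i in demo for j in demo[i] if i < j]
cdf = empirical_cdf(strengths)
for x, fx in cdf:
    print(x, fx)
```

A curve that rises early, i.e. reaches a large F(x) at small x, corresponds to a network dominated by weak ties under this assumed definition.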

V. CONCLUSION AND FUTURE WORK
In summary, both theoretical analysis and experimental results show that the rule of homophily is competing with the law of information entropy maximization in social networks.
Moreover, evolution driven by the rule of homophily makes the network highly clustered and increases the certainty of the information source for a node. In contrast, the rule of maximum entropy leads to diversity of information sources. Based on the definition of weak ties, we can conclude that the rule of maximum information entropy leads to the generation of weak ties in the network, while homophily produces strong ties between nodes with overlapped friends. Given that weak and strong ties coexist in the network, we conjecture that both evolving rules coexist in the growth of social networks. Therefore, from the viewpoint of maximum information entropy, the social network is not efficient; however, it contains many strong ties that may deliver trusted information. Our findings could provide insights for modeling social network evolution as a competition of different rules.
Given the tremendous development of online social networks, the cost of social activity in the era of the Internet continues to decrease [1,21]. Because of this, we neglect the cost of establishing ties of different strengths in order to simplify the analytical framework in this paper. In the real world, however, social activity is constrained by personal cognitive limits and social cost [26], and Dunbar's number [6] still holds in online social networks [1,8,34]. Hence, in future work, we will take the cost of establishing different ties into consideration and build an evolution model of social networks based on the competition between strong and weak ties.