The weight of links in a network is often related to the similarity of the nodes. Here, we introduce a simple tunable measure for analyzing the similarity of nodes across different link weights. In particular, we use the measure to analyze homophily in a group of 659 freshman students at a large university. Our analysis is based on data obtained using smartphones equipped with custom data collection software, complemented by questionnaire-based data. The network of social contacts is represented as a weighted multilayer network constructed from different channels of telecommunication as well as data on face-to-face contacts. We find that even strongly connected individuals are not more similar with respect to basic personality traits than randomly chosen pairs of individuals. In contrast, several socio-demographic variables have a significant degree of similarity. We further observe that similarity might be present in one layer of the multilayer network and simultaneously be absent in the other layers. For a variable such as gender, our measure reveals a transition from similarity between nodes connected with links of relatively low weight to dissimilarity for the nodes connected by the strongest links. We finally analyze the overlap between layers in the network for different levels of acquaintanceships.
Citation: Mollgaard A, Zettler I, Dammeyer J, Jensen MH, Lehmann S, Mathiesen J (2016) Measure of Node Similarity in Multilayer Networks. PLoS ONE 11(6): e0157436. https://doi.org/10.1371/journal.pone.0157436
Editor: Alain Barrat, Centre de Physique Théorique, FRANCE
Received: March 2, 2016; Accepted: May 31, 2016; Published: June 14, 2016
Copyright: © 2016 Mollgaard et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: All authors received funding through the University of Copenhagen UCPH 2016 Excellence Programme for Interdisciplinary Research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Are two connected individuals more similar than a pair of strangers? Over the last decades, advances in data collection methods have provided new opportunities for research on human behavior [1], including the topic of homophily, i.e., whether a pair of connected individuals tends to be more similar than pairs of randomly selected individuals. For instance, it is now possible to observe social interaction across multiple channels, e.g., by combining data describing face-to-face contacts with data from online social organizations or smartphone data [2–5]. Multiple networks formed from the simultaneous interaction in different channels are often called multiplex or multilayer networks [6]. Homophily has been observed with regard to many different variables. Examples span socio-demographic variables (e.g., age, gender, ethnicity), variables describing behavioral patterns (e.g., drinking behavior, smoking behavior, physical activity), variables representing attitudes, beliefs, or opinions (e.g., about politics and sport), and personality traits such as extraversion [7–12]. It is an open question, though, whether homophily becomes more pronounced between more strongly connected individuals. Here, we introduce an extended similarity measure with a tunable parameter, which allows us to check for homophily across links with a broad spectrum of weights. Based on the measure, we find a moderate degree of homophily with respect to behavioral patterns but no significant homophily with regard to the basic personality traits conscientiousness, agreeableness, and neuroticism.
Most commonly, homophily is investigated via likeability ratings about strangers, via a comparison of personality reports from a dyad, triplet, etc. of acquaintances, or via network analyses. Recent studies based on personality reports by well-acquainted persons did find overlap between acquaintances concerning the levels of some of the basic personality traits [13–15]. Network studies focusing on observable variables such as gender or cigarette use have suggested that similarity in this regard is important for friendly acquaintanceship. Overall, research so far suggests similarity between pairs of friends or acquaintances, but the detailed conclusions concerning homophily tend to differ depending on the methodology. In addition, the similarity of nodes, as we shall see below, is strongly related to the strength of the link connecting them.
For an accurate understanding of homophily, long-term and detailed monitoring of social networks is needed. In order to reveal a complete picture of homophily, it is essential to gain insights into the similarity at all levels, from best friends and acquaintances to people in the network one hardly likes or spends time with. These distinctions are possible in weighted network analyses.
Here, we investigate the similarity of connected individuals in a multilayer social network, with connections based on phone calls, text messages, and physical proximity (Bluetooth). We estimate the similarity between connected persons within a specified network with regard to socio-demographic variables (sex, age, body mass index), behavioral patterns (physical activity, alcohol drinking, and smoking behavior), attitudes concerning politics and religion, and, ultimately, basic personality traits in terms of the Big Five, i.e., conscientiousness (e.g., being organized, precise, thorough), agreeableness (e.g., being kind, sympathetic, warm), neuroticism (e.g., being anxious, moody, touchy), openness to experience (e.g., being creative, philosophical, unconventional), and extraversion (e.g., being active, sociable, talkative). We focus on the Big Five as personality traits since they reflect an ‘integrative descriptive taxonomy for personality research’ [17].
This work rests on a unique dataset. We have mapped out the social network among 659 freshman students who started at the Technical University of Denmark in 2013, with data collection running over 24 months. Using state-of-the-art smartphones equipped with custom data collection software, we have collected the communication patterns within this densely connected population across a number of channels. Specifically, we measure telecommunication networks (phone calls, text messages), online social networks (Facebook connections and interactions), and networks based on physical proximity. The physical proximity networks are measured via the Bluetooth signal strength and can be used as a proxy for face-to-face meetings. As a complement to the network data, we also collect information on geo-spatial mobility using GPS, as well as a number of more technical probes.
In addition to the automated data collection, we have also acquired extensive questionnaire-based data on participants’ personality and behavior, comprising the following questionnaires: Big Five Inventory [17], Rosenberg Self Esteem Scale [20], Narcissistic Admiration and Rivalry Questionnaire [21], Satisfaction With Life Scale [22], Rotter’s Locus of Control Scale [23], UCLA Loneliness Scale [24], Self-efficacy [25], Perceived Stress Scale [26], Major Depression Inventory [27], The Copenhagen Social Relations Questionnaire [28], and Positive and Negative Affect Schedule [29], as well as several general health-, attitudes- and behavior-related questions.
Here, we consider three different types of social interaction networks based on calls, text messages, and physical proximity, respectively. We introduce a tunable link weight based on the strength of the interactions. To explain our definition of a link weight, let us start by considering the call network. The weight of a directed link from person i to person j is given by
$$w_{ij} = \frac{n_{ij}^{\alpha}}{\sum_{k} n_{ik}^{\alpha}},$$
where nij represents the total number of accepted calls from person i to person j. Links therefore take a value in the interval wij ∈ [0, 1], and the sum of weights of outgoing links from any person equals unity. The power α is used to test whether homophily is more pronounced between individuals who interact frequently than between individuals who interact less often. The case α = 0 corresponds to a network where all links have equal weight. For intermediate α values, we predominantly test for similarity on the strongest links and, ultimately, for large values, e.g., for α ≈ 2, we essentially only consider the strongest outgoing link of each individual. The network of text messages (SMS network) is constructed in a similar fashion, but with nij representing the number of text messages sent from person i to person j. From the data on physical proximity, we can determine the time a pair of individuals has spent together. We say that a person i has spent an amount of time Δt together with person j if two consecutive Bluetooth scans are separated by a time Δt and, in addition, both scans estimate person j to be within approximately three meters distance. The link weight between i and j is
$$w_{ij} = \frac{T_{ij}^{\alpha}}{\sum_{k} T_{ik}^{\alpha}},$$
where Tij is the total time that j has been within the three-meter limit of i. In general, the proximity data contains information about a large number of more or less random encounters during lectures and classes. To prevent these encounters from dominating our data, we make use of proximity data sampled only on weekends or from 6pm to 12am on weekdays.
We place no such restrictions on the call and SMS data. Finally, we construct a symmetric weight from the two directed weights by taking the average weight of the two directed links. From these three types of interaction, we construct the corresponding networks, see Fig 1. Here the size and color of the nodes are determined by the sum of link weights connecting to the node, while the width of a link is given by the square root of the link weight. The visual representation reveals that the networks tend to be dominated by a relatively small set of links with strong weights.
The size of a node is determined by the sum of the weights of in-going links. The width and the darkness of a link are given by the square root of the link weight. For visual clarity, we show the nodes in the same positions in each panel and we do not show the weakest links (wij < 0.01).
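The construction of the tunable link weights can be illustrated with a minimal Python sketch. It assumes that each directed weight is the interaction count raised to the power α and normalized over the outgoing links of the sender, as implied by the requirement that outgoing weights sum to unity. The function names `directed_weights` and `symmetric_weights` and the dictionary layout are hypothetical choices made for this illustration, not part of the study's software.

```python
from collections import defaultdict


def directed_weights(counts, alpha):
    """Turn interaction counts {(i, j): n_ij} into weights {(i, j): w_ij}.

    Assumed form: w_ij = n_ij**alpha / sum_k n_ik**alpha, so the outgoing
    weights of every sender i sum to one. Pairs with a zero count are
    skipped, since they do not form a link.
    """
    active = {pair: n for pair, n in counts.items() if n > 0}
    totals = defaultdict(float)
    for (i, _j), n in active.items():
        totals[i] += n ** alpha
    return {(i, j): (n ** alpha) / totals[i] for (i, j), n in active.items()}


def symmetric_weights(weights):
    """Average the two directed weights of each pair.

    A missing direction is treated as weight zero (an assumption of this
    sketch). Keys of the result are sorted node pairs.
    """
    sym = defaultdict(float)
    for (i, j), w in weights.items():
        key = tuple(sorted((i, j)))
        sym[key] += 0.5 * w
    return dict(sym)


# Toy example with three people: "a" calls "b" three times and "c" once.
calls = {("a", "b"): 3, ("a", "c"): 1, ("b", "a"): 2}
w_alpha1 = directed_weights(calls, alpha=1.0)
w_alpha0 = directed_weights(calls, alpha=0.0)
sym = symmetric_weights(w_alpha1)
```

With α = 0 the two outgoing links of "a" receive equal weight, while larger α shifts the weight toward the more active link.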
In order to analyze homophily, we construct vectors (xi, xj, wij) for each link in the network, where xi represents a variable (e.g., a personality trait) associated with person i. The degree of homophily is estimated by a generalization of the intraclass correlation coefficient (ICC). The ICC quantifies the similarity of the variables xi and xj for the connected persons i and j in the network. Similarly to the Pearson correlation coefficient, the ICC is a measure of the tendency for xi and xj to assume similar values relative to their average value. Normally, the ICC is computed under the assumption that persons are either connected or not. Here we modify the ICC by including the weight of interactions wij between persons. The weighted ICC, here denoted by r, is then computed for a network, (xi, xj, wij), from two auxiliary variables, s and t. The auxiliary variable s measures the variance within the sample, including both variables xi and xj, and the variable t is a measure of the covariance of xi and xj. Note that the contribution of each link to the variance is weighted by wij. In general, the weighted correlation coefficient provides a basic measure of the importance of homophily in social interactions. In Fig 2, we show the ICC where all weights are proportional to the activity on the link, i.e., α = 1. The error bars are estimated using bootstrapping: for each value of α and for each network layer (call, SMS, and Bluetooth), we generate 10,000 reference networks by randomly reshuffling the links. We then measure the correlation coefficient in these reference networks. The fraction of reference networks with an ICC larger than that of the true network provides us with an estimate of the p-value.
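As an illustration of a weighted ICC of this kind, the following Python sketch uses one plausible form: the classical pairwise ICC with every link's contribution multiplied by its weight. The exact expressions used in the study are not reproduced in this text, so the normalization below, the function name `weighted_icc`, and the input layout are assumptions of the sketch.

```python
def weighted_icc(links, x):
    """Weighted pairwise intraclass correlation (assumed form).

    links: list of (i, j, w_ij) tuples with w_ij > 0
    x:     dict mapping node id -> value of the variable

    With W the sum of all weights and m the weighted mean,
        m   = sum_ij w_ij (x_i + x_j) / (2 W)
        s^2 = sum_ij w_ij [(x_i - m)^2 + (x_j - m)^2] / (2 W)
        t   = sum_ij w_ij (x_i - m)(x_j - m) / W
        r   = t / s^2
    """
    total = sum(w for _i, _j, w in links)
    mean = sum(w * (x[i] + x[j]) for i, j, w in links) / (2.0 * total)
    s2 = sum(
        w * ((x[i] - mean) ** 2 + (x[j] - mean) ** 2) for i, j, w in links
    ) / (2.0 * total)
    t = sum(w * (x[i] - mean) * (x[j] - mean) for i, j, w in links) / total
    if s2 == 0:
        raise ValueError("variable has zero variance on the linked nodes")
    return t / s2


# Links that only join equal values give r = 1, links that only join
# opposite values give r = -1.
same = weighted_icc([(0, 1, 1.0), (2, 3, 1.0)], {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0})
opposite = weighted_icc([(0, 1, 1.0), (2, 3, 3.0)], {0: 1.0, 1: -1.0, 2: -1.0, 3: 1.0})
```

With all weights equal, this reduces to the unweighted pairwise ICC, which corresponds to the case α = 0.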
The bars show the intraclass correlation coefficient for the different variables and for the networks formed from call activity, SMS activity and proximity data. The lines have a range of one standard deviation. We find no similarity with regard to the personality traits conscientiousness, agreeableness and neuroticism but we find a weak similarity with regard to extraversion and dissimilarity with respect to openness. In general a stronger similarity is found for socio-demographic, behavior- and attitudes-related variables.
We observe in Fig 2 that there is no pronounced homophily for the personality traits conscientiousness, agreeableness and neuroticism, even when we consider only the strongest links. In Fig 3, we test the importance of link strength by varying the parameter α, i.e., we test for homophily by considering all social interactions equally important (α = 0) or by weighting frequent interactions higher (α > 0). We see that for the Big Five personality traits, only extraversion has ICCs that are significantly different (p < 0.05) from zero in all layers. The p-values for the computed ICCs are listed in S3 Text, and a description of how the p-values are computed can be found in Materials and Methods. In the SMS layer, the ICC for extraversion ranges from values around zero when all links have equal weights to values around 0.2 for α = 2.
The computed similarity is shown as a function of the parameter α, which tunes the weight of the individual links in the networks. The case α = 0 corresponds to the case of links having equal weight, whereas increasing values of α enhance the contribution from the stronger links in the calculation, e.g., for α = 2 the strongest links of each individual dominate. The envelope is the estimate of the standard deviation (see text). In S3 Text, a table is included of p-values for the individual traits and for different α-values. In general, the p-values are large (>0.05) for the traits conscientiousness, agreeableness, neuroticism and openness. The estimates for extraversion attain lower p-values, e.g., p ≈ 0.01 for the Bluetooth layer, and for SMS the p-value varies in the interval 0.02–0.08 for different α-values.
For both the extraversion and openness traits, the proximity and call layers result in ICCs that are lower than the ICC of the text message layer. The ICCs for agreeableness, conscientiousness and neuroticism are for almost all values of α not significantly different from zero and are bounded above by approximately 0.12.
Homophily is pronounced in the phone call network for the variables capturing smoking and drinking behaviors. Here the ICCs are significantly different (p < 0.05) from zero and achieve values larger than 0.3 in the call layer and values up to 0.2 in the SMS layer. This is in contrast to the other variables in our study, where homophily is most pronounced in the SMS layer. The variables representing attitudes concerning politics and religion show a weak or no correlation. Less surprisingly, we see an over-representation of social interaction between individuals of the same sex for calls and text messages when α = 0; the ICCs attain values around 0.2. Moreover, we observe that for increasing values of α, the stronger links in the text message network more frequently connect individuals of different sex, i.e., we see a transition from a positive ICC to a negative ICC as α is increased. Interestingly, although the correlation is slightly smaller, the proximity data at the same time shows that individuals with frequent face-to-face encounters tend to be of the same sex.
Multilayer networks – The overlap between the three layers in the multilayer network can be estimated from the pairwise Pearson correlation coefficients rp, kℓ of the link weights in two layers Lk and Lℓ,
$$r_{p,k\ell} = \frac{\sum_{ij} \left(w^{k}_{ij} - \bar{w}^{k}\right)\left(w^{\ell}_{ij} - \bar{w}^{\ell}\right)}{\sqrt{\sum_{ij} \left(w^{k}_{ij} - \bar{w}^{k}\right)^{2}}\,\sqrt{\sum_{ij} \left(w^{\ell}_{ij} - \bar{w}^{\ell}\right)^{2}}}, \qquad (1)$$
where $w^{k}_{ij}$ denotes the weight of the link between i and j in layer Lk and $\bar{w}^{k}$ is the mean link weight in that layer. We find that for α = 1 the correlation coefficient is 0.75 between the call and SMS layers, 0.53 between the call and proximity layers, and 0.47 between the SMS and proximity layers. A similar approach has previously been suggested in Ref. [30] where, instead of the link weights, the degree of the nodes in the individual layers was considered. Using the link weights, we can now, by tuning the parameter α, test the overlap between the layers for different levels of acquaintanceships. In Fig 4, we show the pairwise correlation between the three layers for different values of α. As expected, there is a significant overlap between the layers, but they certainly also differ enough to be treated as more than a fluctuation of a single network. Interestingly, the overlap changes with the factor α, which opens a fundamental question in the analysis of multiplex networks: which weights are the right ones to use? The unweighted case α = 0 certainly leads to a correlation different from those of larger α values. In fact, strong links might not necessarily be present or strong in all layers, e.g., two persons who communicate frequently might prefer phone calls to text messages. In the intermediate range, interaction could be more equally distributed across the channels or layers. In other words, the degree of multiplexity in our network is tunable and depends on the perspective, i.e., whether strong or weak links should be favored. This observed sensitivity in overlap could have implications for community detection algorithms on multiplex networks [31, 32] or for the structural reducibility of overlapping layers [33].
Based on the correlation between the link weights, we can check the overlap between the individual layers of call, SMS, and proximity for varying values of α. We see that the multilayer features of the network change with the link-weight exponent α and that the overlap between the layers is maximal for intermediate values of α.
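The layer overlap can be sketched in Python as a Pearson correlation of link weights. The sketch assumes that the correlation is taken over all links present in at least one of the two layers and that a link missing from a layer has weight zero there; the text does not state how missing links are treated, so this choice, like the function name `layer_overlap`, is an assumption.

```python
import math


def layer_overlap(layer_k, layer_l):
    """Pearson correlation of link weights between two layers.

    layer_k, layer_l: dicts mapping a node pair (i, j) -> link weight.
    Links absent from a layer are given weight 0.0 (assumption).
    """
    links = sorted(set(layer_k) | set(layer_l))
    a = [layer_k.get(link, 0.0) for link in links]
    b = [layer_l.get(link, 0.0) for link in links]
    n = len(links)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Toy layers: identical weights give full overlap, disjoint links give -1.
call = {("a", "b"): 0.8, ("a", "c"): 0.2}
sms_disjoint = {("b", "c"): 1.0}
full = layer_overlap(call, call)
disjoint = layer_overlap({("a", "b"): 1.0}, {("a", "c"): 1.0})
```

Recomputing the weights of both layers for a range of α values and calling `layer_overlap` at each value would give a curve of overlap against α of the kind described in the caption above.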
We further note that the proximity (Bluetooth) layer is more densely connected than the other layers, in particular because the participants in the study meet at more informal gatherings at the university campus or have encounters which could either be spontaneous or of less personal character such as study groups. This could be one reason for the weaker similarity seen for most of the variables in the proximity data in Figs 2 and 3.
Here, we have performed an extensive mapping of similarity in a large social network based on detailed records of social interactions over a time span of nearly two years. From the frequency of interactions, all links in the network are assigned a weight, which we have been able to tune in order to look for homophily across varying levels of acquaintanceships. We show that tuning the weights can reveal new features of the node similarity. For the variables describing alcohol use, cigarette use, and extraversion, we see that individuals are more similar when they interact strongly. In contrast, if the weights are disregarded, we see little or no similarity. Interestingly, the similarity of individuals is not monotonically increasing with the frequency of interaction for all variables, e.g., the intraclass correlation coefficient with regard to gender transitions from positive to negative values. The analysis of our data does not provide any evidence that the basic personality traits agreeableness, conscientiousness, neuroticism and, to some degree, openness are an important factor in the formation of social networks. In fact, we find a small or non-existent correlation between these personality traits and social interaction, even when we only consider individuals that interact very frequently. Finally, the measure we have introduced shows that the degree of multiplexity in our network is tunable as we vary the balance between weak and strong links.
Materials and Methods
In constructing the multilayer network, we include links from participants that meet minimum requirements with respect to the total time window in which they are active and their level of activity. In particular, we require that the data recording period is longer than 3 months and associated with at least 170 calls, 950 text messages and 200 hours of Bluetooth interaction. These numbers correspond to the typical social activity of a person during a 3-month period, which, we believe, is a reasonable time scale for the resolution of social behavior. These requirements reduce the dataset to 659 participants and are introduced to avoid the addition of noisy links in the network. The average user in the study has been active for 530 days, has been part of 952 phone calls, and has exchanged 5313 text messages. The average number of hours that a user has been in the proximity of others is 1073. The proximity network is based on asynchronous Bluetooth scans by each smartphone every 5 minutes, which are collected into 5-minute time bins and symmetrized. Many of the recorded interactions are with people outside the study and can therefore not be included in the analysis of homophily. In the call and SMS data, the total weight of a single individual therefore depends on the fraction of calls or text messages that are directed to other participants in the study.
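The inclusion criteria can be written as a small Python check. This is only a sketch: the thresholds are taken from the paragraph above, while the function name, the argument names, and the reading of "3 months" as 90 days are assumptions made for the example.

```python
MIN_DAYS = 90               # assumption: "3 months" read as 90 days
MIN_CALLS = 170
MIN_TEXTS = 950
MIN_BLUETOOTH_HOURS = 200


def meets_requirements(days_recorded, calls, texts, bluetooth_hours):
    """Return True if a participant passes all four activity thresholds.

    The recording period must be strictly longer than the minimum, the
    three activity counts must reach at least their minimum.
    """
    return (
        days_recorded > MIN_DAYS
        and calls >= MIN_CALLS
        and texts >= MIN_TEXTS
        and bluetooth_hours >= MIN_BLUETOOTH_HOURS
    )


# The average user reported above passes the filter.
average_user_ok = meets_requirements(530, 952, 5313, 1073)
too_few_calls = meets_requirements(530, 100, 5313, 1073)
```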
The significance of our estimated ICCs has been computed in the following way. For each value of α and each layer in the network, we generate 10,000 reference layers (i.e., networks) by shuffling the links within a layer. We then measure the ICC in these reference layers. The fraction of reference layers with an intraclass correlation coefficient larger than that of the original network layer provides us with an estimate of the p-value. A table of all computed p-values has been included in S3 Text.
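A minimal Python sketch of such a reshuffling test is given below. It makes two assumptions that go beyond the text: the reference layers are generated by permuting the variable values among the linked nodes while keeping the links and weights fixed (one possible reading of "shuffling the links"), and the ICC takes the weighted pairwise form used in `weighted_icc`. The function names and the `seed` parameter are hypothetical.

```python
import random


def weighted_icc(links, x):
    """Weighted pairwise ICC (assumed form) for links [(i, j, w_ij), ...]."""
    total = sum(w for _i, _j, w in links)
    mean = sum(w * (x[i] + x[j]) for i, j, w in links) / (2.0 * total)
    s2 = sum(
        w * ((x[i] - mean) ** 2 + (x[j] - mean) ** 2) for i, j, w in links
    ) / (2.0 * total)
    t = sum(w * (x[i] - mean) * (x[j] - mean) for i, j, w in links) / total
    if s2 == 0:
        raise ValueError("variable has zero variance on the linked nodes")
    return t / s2


def permutation_p_value(links, x, n_ref=10_000, seed=0):
    """Fraction of reference layers whose ICC exceeds the observed ICC.

    Reference layers keep links and weights but permute the values of x
    among the nodes that appear in at least one link (assumption).
    """
    rng = random.Random(seed)
    observed = weighted_icc(links, x)
    nodes = sorted({n for i, j, _w in links for n in (i, j)})
    values = [x[n] for n in nodes]
    larger = 0
    for _ in range(n_ref):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        if weighted_icc(links, shuffled) > observed:
            larger += 1
    return larger / n_ref


# Two triangles whose members share the same value: the observed ICC is 1,
# so no reshuffled layer can exceed it.
triangles = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0)]
trait = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0, 4: 1.0, 5: 1.0}
p = permutation_p_value(triangles, trait, n_ref=200, seed=1)
```

Fixing the seed makes the estimate reproducible; with 10,000 reference layers the smallest nonzero p-value that can be resolved is 0.0001.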
This study was reviewed and approved by the appropriate Danish authority, the Danish Data Protection Agency (Reference number: 2012-41-0664). The Data Protection Agency guarantees that the project abides by Danish law and also considers potential ethical implications. All subjects in the study provided written informed consent.
S1 Text. Description of data.
Short description of the format of the data.
S2 Text. Data file.
Data used in the analysis in the manuscript.
Conceived and designed the experiments: AM IZ JD MHJ SL JM. Performed the experiments: IZ JD SL. Analyzed the data: AM. Wrote the paper: AM IZ JD SL JM.
- 1. Borgatti SP, Mehra A, Brass DJ, Labianca G. Network analysis in the social sciences. Science. 2009;323(5916):892–895. pmid:19213908
- 2. Eagle N, Pentland A. Reality mining: sensing complex social systems. Personal and ubiquitous computing. 2006;10(4):255–268.
- 3. Cattuto C, Van den Broeck W, Barrat A, Colizza V, Pinton J, Vespignani A. Dynamics of Person-to-Person Interactions from Distributed RFID Sensor Networks. PLoS ONE. 2010;5(7):e11596. pmid:20657651
- 4. Aharony N, Pan W, Ip C, Khayal I, Pentland A. Social fMRI: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Computing. 2011;7(6):643–659.
- 5. Stopczynski A, Sekara V, Sapiezynski P, Cuttone A, Madsen MM, Larsen JE, et al. Measuring Large-Scale Social Networks with High Resolution. PLoS ONE. 2014;9(4):e95978. pmid:24770359
- 6. Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA. Multilayer networks. Journal of Complex Networks. 2014;2(3):203–271.
- 7. Crandall CS, Schiffhauer KL, Harvey R. Friendship pair similarity as a measure of group value. Group Dynamics. 1997;1(2):133–143.
- 8. Currarini S, Jackson MO, Pin P. Identifying the roles of race-based choice and chance in high school friendship network formation. Proceedings of the National Academy of Sciences. 2010;107(11):4857–4861.
- 9. Kandel DB. Similarity in real-life adolescent friendship pairs. Journal of Personality and Social Psychology. 1978;36(3):306–312.
- 10. Rushton JP, Bons TA. Mate Choice and Friendship in Twins: Evidence for Genetic Similarity. Psychological Science. 2005;16(7):555–559. pmid:16008789
- 11. Kurtz JE, Sherker JL. Relationship quality, trait similarity, and self-other agreement on personality ratings in college roommates. Journal of personality. 2003;71(1):21–48. pmid:12597236
- 12. Stehlé J, Charbonnier F, Picard T, Cattuto C, Barrat A. Gender homophily from spatial behavior in a primary school: a sociometric study. Social Networks. 2013;35(4):604–613.
- 13. Cohen TR, Panter AT, Turan N, Morse L, Kim Y. Agreement and similarity in self-other perceptions of moral character. Journal of Research in Personality. 2013;47(6):816–830.
- 14. Lee K, Ashton MC, Pozzebon JA, Visser BA, Bourdage JS, Ogunfowora B. Similarity and Assumed Similarity in Personality Reports of Well-Acquainted Persons. Journal of Personality and Social Psychology. 2009;96(2):460–472. pmid:19159143
- 15. Paunonen SV, Hong RY. The many faces of assumed similarity in perceptions of personality. Journal of Research in Personality. 2013;47(6):800–815.
- 16. Delay D, Laursen B, Kiuru N, Salmela-Aro K, Nurmi JE. Selecting and retaining friends on the basis of cigarette smoking similarity. Journal of Research on Adolescence. 2013;23(3):464–473.
- 17. John OP, Naumann LP, Soto CJ. Paradigm shift to the integrative big five trait taxonomy. Handbook of personality: Theory and research. 2008;3:114–158.
- 18. Mollgaard A, Mathiesen J. The Dynamics of Initiative in Communication Networks. PLoS ONE. 2016;11(4):e0154442. pmid:27124493
- 19. Sekara V, Lehmann S. The strength of friendship ties in proximity sensor data. PLoS ONE. 2014;9(7):e100915. pmid:24999984
- 20. Rosenberg M. Society and the adolescent self-image (rev). Wesleyan University Press; 1989.
- 21. Back MD, Küfner AC, Dufner M, Gerlach TM, Rauthmann JF, Denissen JJ. Narcissistic admiration and rivalry: Disentangling the bright and dark sides of narcissism. Journal of Personality and Social Psychology. 2013;105(6):1013. pmid:24128186
- 22. Diener E, Emmons RA, Larsen RJ, Griffin S. The satisfaction with life scale. Journal of personality assessment. 1985;49(1):71–75. pmid:16367493
- 23. Rotter JB. Generalized expectancies for internal versus external control of reinforcement. Psychological monographs: General and applied. 1966;80(1):1.
- 24. Russell DW. UCLA Loneliness Scale (Version 3): Reliability, validity, and factor structure. Journal of personality assessment. 1996;66(1):20–40. pmid:8576833
- 25. Sherer M, Maddux JE, Mercandante B, Prentice-Dunn S, Jacobs B, Rogers RW. The self-efficacy scale: Construction and validation. Psychological reports. 1982;51(2):663–671.
- 26. Cohen S, Kamarck T, Mermelstein R. A global measure of perceived stress. Journal of health and social behavior. 1983;p. 385–396. pmid:6668417
- 27. Bech P, Rasmussen NA, Olsen LR, Noerholm V, Abildgaard W. The sensitivity and specificity of the Major Depression Inventory, using the Present State Examination as the index of diagnostic validity. Journal of affective disorders. 2001;66(2):159–164. pmid:11578668
- 28. Lund R, Nielsen LS, Henriksen PW, Schmidt L, Avlund K, Christensen U. Content Validity and Reliability of the Copenhagen Social Relations Questionnaire. Journal of aging and health. 2014;26(1):128–150. pmid:24584264
- 29. Watson D, Clark LA, Tellegen A. Development and validation of brief measures of positive and negative affect: the PANAS scales. Journal of personality and social psychology. 1988;54(6):1063. pmid:3397865
- 30. Nicosia V, Latora V. Measuring and modeling correlations in multiplex networks. Physical Review E. 2015;92(3):032805.
- 31. Mucha PJ, Richardson T, Macon K, Porter MA, Onnela JP. Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science. 2010;328(5980):876–878. Available from: http://science.sciencemag.org/content/328/5980/876. pmid:20466926
- 32. De Domenico M, Lancichinetti A, Arenas A, Rosvall M. Identifying Modular Flows on Multilayer Networks Reveals Highly Overlapping Organization in Interconnected Systems. Phys Rev X. 2015 Mar;5:011027. Available from: http://link.aps.org/doi/10.1103/PhysRevX.5.011027.
- 33. De Domenico M, Nicosia V, Arenas A, Latora V. Structural reducibility of multilayer networks. Nature communications. 2015;6.