Understanding the process of social network evolution: Online-offline integrated analysis of social tie formation

It is important to consider the interweaving nature of online and offline social networks when we examine social network evolution. However, it is difficult to find any research that examines the process of social tie formation from an integrated perspective. In our study, we quantitatively measure offline interactions and examine the corresponding evolution of online social network in order to understand the significance of interrelationship between online and offline social factors in generating social ties. We analyze the radio signal strength indicator sensor data from a series of social events to understand offline interactions among the participants and measure the structural attributes of their existing online Facebook social networks. By monitoring the changes in their online social networks before and after offline interactions in a series of social events, we verify that the ability to develop an offline interaction into an online friendship is tied to the number of social connections that participants previously had, while the presence of shared mutual friends between a pair of participants disrupts potential new connections within the pre-designed offline social events. Thus, while our integrative approach enables us to confirm the theory of preferential attachment in the process of network formation, the common neighbor theory is not supported. Our dual-dimensional network analysis allows us to observe the actual process of social network evolution rather than to make predictions based on the assumption of self-organizing networks.


Introduction
In an age of perpetual digital connectedness, our ways of bonding and maintaining relationships heavily depend on online social networking. More than one billion users around the world are actively networking through Facebook, one of the most prominent social networking services, to promote their online social existence and connections [1]. Social networking services (SNS) provide powerful tools for generating social capital, as they allow users to develop new connections and expand their personal networks [2]. Here, possessing a durable social network can provide network participants with otherwise unattainable resources, such as access to information, financial gains and psychological well-being [3,4,5]. As social a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Thus, to observe the actual process of building new links, we study not only factors from existing online network structures but also offline interactions that promote social linkages.
In this study, we analyze the offline interaction data from a series of 7 monthly wine parties (offline social events) which measured the interaction between the participants using radio signal sensors. We also collected information on online network factors before and after the events to reveal the process of social network evolution. By observing the link formation process through both aspects of our social networks, we verify the effects of prominent prediction models: the preferential attachment model explaining new link formations via the degree of participants and the common neighbor effect that explains link formations via mutual friendships.

Theoretical background
Dual-dimensional online and offline social network model In our research, we examine a multilayered online-offline network [14,15] to confirm the effect of online predictors with offline social interaction (Fig 1). As the embedded resources in a network can be accessed for social benefits [16], we examine how the social structural resources from existing online networks affect interaction behaviors at offline events and the final resulting online social network. These embedded resources within a network structure are measured through degree centrality [17], which captures the effects of preferential attachment, and mutual friendships between a pair of social participants, which capture the effects of common neighbors. The number of online connections among individuals, which measures their degree centralities and presence of mutual friends on Facebook between the pair at a social event, acting as a bridge between the two individuals, may affect the connection between the pair [8].

Preferential attachment model
Previous studies have suggested that the more connected a network node is or the greater degree centrality a node has, the more likely it is to receive new connections [18]. This theory, i.e., the preferential attachment model, is one of the most prominent models that incorporate the growth of a network and preferential or biased network connections based on a node's attributes. If a social network participant were to connect with another participant randomly, the probability of connecting with a participant would be proportional to his/her degree [19]. The preferential attachment explains a positive feedback loop where initial differences in degree centrality are automatically reinforced, eventually magnifying the differences. Online network link formation can be predicted using prior online factors such as degree and the presence of mutual friends. These factors also affect offline social interaction at a social event, which in turn will directly affect the online link formation. https://doi.org/10.1371/journal.pone.0177729.g001 Although there have been ample discussions of preferential attachment at the level of online social networks [18,20,21], the underlying mechanism explaining how this model develops at the offline level has not been well-explored. It is expected that the offline interaction between participants will significantly affect the formation of a new online social tie between them, and that it is less likely for online social ties to form without this offline interaction. In other words, we expect that the evolution of online social networks significantly depends on the offline interaction that is not reflected in the online network layer. Many studies indicate that online networking sites are used to maintain and reinforce offline relationships, as opposed to establishing new relationships online [5,22]. There is a tendency for individuals to group when interpersonal relationship are positive, which indicates that frequent positive interactions lead to the formation of a network [23,24]. Furthermore, Sykes' research suggests that previous acquaintances have a strong effect on the frequency of interaction, demonstrating that prior social connections can influence new connections [25]. Offline interaction may be a significant underlying mechanism of preferential attachment, which we empirically explore through randomized controlled experiments of offline social events.

Effect of common neighbors
In addition to preferential attachment, people tend to make social connections by introducing pairs of their friends to one another, thus completing triangles in a network and increasing the clustering coefficient [26]. Newman constructed a growth model of a social network in which one preferentially makes links between pairs of individuals who have one or more common neighbor, or mutual friends. The network shows clear clusters of nodes that share many connections internally, and fewer shared outside the group. The common neighbor analysis reflects a mechanism for local community formation in social networks in which network participants introducing pairs of their friends to one another induce cliquishness and social groupings.
This common neighbor theory also significantly explains the evolution of social networks but explores the issue from a one-dimensional network perspective [21]. By incorporating offline social interaction, we can understand the process of how common neighbor effects drive the formation of social ties between once-unknown participants.
We believe that the presence of mutual friends is crucial in terms of introducing and bringing together participants who are not yet connected. However, in a social setting where all participants have chance to meet and interact with one another regardless of having common acquaintances, the effect of common neighbors can provide contradictory results. On the one hand, the presence of mutual friends may positively affect the interaction between participants not yet connected, as mutual friends can mediate and induce interaction between them. On the other hand, if the role of mutual friends is solely providing a chance for unlinked strangers to meet each other, then the presence of mutual friends may have no significant effect in an offline social setting where all participants have a chance to meet one another on their own. Moreover, the presence of mutual friends may provide distraction for the pair of interacting participants [27,28]. Mutual friends may indirectly hinder the interaction between participants who are not yet linked by competing for interaction time within the limited duration of our social events.
Introducing intermediary offline social interactions into the network evolution will illuminate the underlying mechanisms for these online social predictors. By taking offline social behavior into account, we can verify and explain the process of these online predictors affecting online social tie formation in a multilayered on-offline social network. We will conduct a regression analysis with new tie formation between a pair of interacting social participants as the dependent variable to understand how online predictors and offline interaction affect network evolution.

Data structure
Our research is primarily based on the 7 monthly social networking events hosted by AtDusk, a social event organizing company, from November 2011 to May 2012 on the first Fridays of the months. AtDusk promoted the social events through Facebook advertisements in both Korean and English. Each event lasted approximately 3 hours and had approximately 30 participants in attendance, including many local business owners, professionals, and undergraduate and graduate students in Daejeon, Korea ( Table 1). The series of social events consisted of participants each carrying a radio signal strength indicator device (RSSI), compatible with IEEE 802.15.4 Protocol (Zigbee), which detects the strength of radio signals from other participants' devices to measure the proximity and duration of interactions (Fig 2). The RSSI devices were used to detect social proximity because they can cover almost 180 degrees of forward-facing direction to detect all interacting social participants, including those talking side by side. Moreover, radio signal strength decreases when blocked by a human body, which we used to our advantage, to detect only forward-facing interactions when sensors were worn as necklaces. In this way, the radio signal strength indication can show stronger signal transmission between conversation members than with non-members at the same distance.
Four transmission receiving stations were installed in each corner of the event location to control the interaction proximity. The measured values were partitioned into various conversational groups using the K-means clustering method. The k centroids were selected among the nodes of individual attendees, and because the number of groups (k) had to be predetermined, k was updated proportionally to the total number of attendees at the event.
We were able to detect which two or more nodes were within the same cluster, signifying the social interaction among individuals within the same groups. When compared to the ground truth results from the manual annotation of two months of event video recordings with no blind angles, our data showed a random index value of 0.872 [29], which indicates that the sensor-based method demonstrates high accuracy in measuring offline interaction for our research purposes.
We have not only acquired the participants' offline social interaction measurements from the 7 social networking events, as mentioned above, but also followed the changes in their online networks using a crawler scripted in Sikuli, a Python-based compiler that uses images to recognize graphical user interface, to crawl publicly available personal and friendship information on Facebook [30]. Although Facebook provides publicly available demographic data of all users through their Facebook API, they do not provide information on the date that the friendship was made. Our Sikuli script generates URLs of Facebook friendship pages between all possible pairs of participants who have attended the social events using their Facebook ID. The crawler goes through all possible friendship pages to detect pairs' friendships and dates of friendships by recognizing and recording the names and dates of friendships when available. The names of the participants, Facebook IDs and the generated URLs are automatically deleted after the data collection, and undisclosed to the researchers to protect participants' privacy. With the collected data, we modeled existing online networks prior to each event and online networks after each event. Within the bounds of the Facebook privacy settings of the participants, we were able to collect online samples from 244 unique participants with 3703 possible interaction pair occurrences (unbalanced panel data) throughout 7 social events.

Privacy considerations
All of the social event participants were informed in advance by AtDusk, the social event organizing company, through promotional materials, written informational document and oral explanation in Korean or English that interaction sensing technology would be deployed at the event and their publicly available Facebook data would be gathered, to provide event participants their personalized interaction statistics via AtDusk Partylytics Facebook application [31]. The participants were able to enter the social events after they have given the verbal consent and registered into Partylytics, using their Facebook ID and password. In June 2016, this interaction data was disclosed to us, researchers for researching purposes with the consent from Atdusk and the event participants at the selected social events. Thus, the given data was purely an observational data, collected on the real social event settings organized by the company, AtDusk. Therefore, there were no instructions for the participants to perform specific tasks regarding the experiment, and the company has only monitored real interaction behavior in real life situations. Thus, the social event participants were not experiment participants, as they came to the events to enjoy themselves [29]. Their relevant demographic information, such as gender, spoken language, and friendship ties only between the social event participants, was collected through AtDusk and publicly available data on Facebook, which does not violate participants' privacy. The names, Facebook IDs of the participants, and the generated URLs, although publicly available on Facebook, were automatically deleted through our scripted crawler after recognizing their social ties on Facebook. All of the data used in this research complies with the terms of services of Facebook and AtDusk. We did not have any access to the participants' private information which may allow us to verify individuals who have participated in the series of social events.

Variables
Dependent variables. Our dependent variable is the newly made social linkage, Made Friends, of the 3703 pairs. This binary variable is measured via Facebook friendship data. If a friendship is created within one month after the date of an event between a pair who participated, the value is recorded as 1; otherwise, it is 0. There were 240 occurrences of Made Friends between 3703 possible pairs over 7 social events (Table 1).
Independent variables. Our independent variables are the online Degree of a pair of event participants in the online social network prior to the social event, the presence of online Mutual Friends shared by a pair, and the offline Interaction between a pair of nodes in an offline social event. These variables best represent major factors of online and offline social networks. The Degree variable between a pair is calculated by measuring the combined degree centralities of 2 nodes of the pair while subtracting the number of mutual friends shared by the pair which is counted twice, and excluding the connection between the pair if it is present. The Degree of a pair measures all participants connected to the pair through the online social network prior to the social event. Among our studied 3703 pairs over 7 social events, there are 12.59 participants on average linked to a given pair at a social event.
The presence of online mutual friends at an attended social event, Mutual Friends at the social event, is a binary variable. When at least one mutual Facebook friend is present at an event between a pair of participants, this value is 1; otherwise, it is 0. Because of an excess of 0 mutual friends shared by observed pairs, normalization of the distribution was not possible, and thus, the value was converted to a binary variable. There were 1502 cases of the presence of at least one mutual friend between 3703 pairs.
The Interaction variable between pairs was calculated utilizing data from RSSI sensors as mentioned above. This variable measures the total duration of interaction between a pair of participants. The interaction duration was measured every 10 seconds to prevent 'pass-bys' without actual interaction affecting our data. We also used log scale to prevent skewed distribution, and the Interaction variable was 90.78 (approximately 908 seconds) on average and in log scale, 1.751 for 3703 possible pairs. Control variables. Since the Made Friends variable cannot be 1 when there is an already existing friendship between a pair, the Already Friends binary variable was used to control the regression outcome. The Already Friends variable between a pair of participants is 1 when a Facebook friendship has been made prior to the pair's event attendance; otherwise, it is 0. A total of 805 cases of friendship existing prior to social events were observed in 3703 pairs.
As there were 7 consecutive but separate social events with some overlapping attendees and some new participants, the Party ID dummy variable was used to normalize our panel data. A total of 561, 630, 351, 666, 595, 435 and 465 cases of pair occurrences were observed. Demographic dummy variables Gender and Language were used to control the regression analyses. Gender was controlled as the interaction and Facebook friendship status may be affected by gender due to potential social status and romantic relationships. Language was controlled as the series of events were conducted in Daejeon, Korea targeting Korean and non-Korean international participants, and there may be possible effects of language barrier hindering participants from communicating with one another. Gender was categorized in 3 groups: femalefemale pair (FF), male-female pair (MF) and male-male pair (MM). A total of 413, 1662 and 1628 cases, respectively, were observed. Language used between a pair was categorized in 6 groups: Korean-Korean pair (KK), Korean-English pair (KE), Korean-bilingual pair (KI), bilingual-bilingual pair (II), bilingual-English pair (IE) and English-English pair (EE). There were 648, 1409, 400, 56, 450 and 740 cases, respectively (Table 1).

Analysis models
To test our hypotheses, we used panel logit regression model to understand the effects of predictors from prior online network structure and offline interactions on the formation of social ties in the resulting online network.
Our main regression analysis utilizes panel logit regression with Made Friends as the dependent variable.
L ij,t is the Made Friends binary variable between a pair i and j at social event t. B 0 is the intercept estimate of the model. B 1 is the coefficient of the Degree variable, which measures the degree centrality of a pair i and j (k ij,t-1 ), measured prior to the social event t. y ij,t is the variable for offline Interaction, and B 2 is its coefficient. B 3 is the coefficient of the binary Mutual Friend variable, x ij,t . B 4 is the coefficient of all control variables, Already Friends, Gender, Language and Party ID (z ij,t ). E ij,t is the error term of the regression. The analysis results will explain the influence of online predictors and offline interaction on new social tie formation from the pair perspective.
Since there may be causal effects of prior online network structure on offline interaction, we also perform panel linear regression with offline Interaction as the dependent variable to further verify the influence of online factors on offline behaviors.
y ij,t is the Interaction between a pair i and j at social event t, as mentioned above. a 0 is the intercept estimate of the model. a 1 is the coefficient of Degree centrality of a pair i and j (k ij,t-1 ). a 2 is the coefficient of Mutual Friends for the social event variable, x ij,t . a 3 is the coefficient of the control variables, Already Friends, Gender, Language and Party ID (q ij,t ). v ij,t is the error term for this regression. Based on the influence of online factors on the offline interaction, we can further investigate the reasons for our results from the primary regression.

Correlation analysis
According to the correlation coefficients, there is no significant correlation problem. There are signs of slight collinearity between online Degree, Mutual Friends at the social event, and Already Friends variables (Table 2). This occurs, because the pure chance of having mutual friends and a pair already being friends at a given social event increase as the number of connections to Facebook nodes is more readily available, which accounts for the Degree variable. Model 7 has the highest log-likelihood, indicating the best fit of the 7 models. This result verifies that the online degree centralities of the prior network can be used to predict social network formation while disproving the readily accepted notion of common neighbors within a network structure having a positive influence on new social link formation. According to the results based on our control variables, the Gender variable heavily affects new link formation, where a female-female pair is less likely to become friends than malefemale and male-male pairs. There was no serious effect based on Language, as the results only showed a higher likelihood for Korean-bilingual pairs making friends.

Panel linear regression analysis (DV: Offline interaction)
To understand the process of a prior online network influencing offline behavior, which in turn affects the final network formation, a panel linear regression with the dependent variable as the log scale Interaction was performed ( Table 4). As our data was gathered through 7 different social events at different time periods, we used Party ID dummy variables to produce a time-fixed model. There were 3 regression models: Model 1 and 2, which incorporate one of our two independent variables, online Degree of a pair and Mutual Friends at the social event, and Model 3, which incorporates both independent variables. Model 1 tests for the direct relationship between the online Degree of a pair and their offline Interaction. There was a slight negative and significant relationship between the amount of Interaction and the Degree of the observed pair at -0.0123 (p<0.01), indicating that the number of connections that a pair has is negatively associated with the amount of interaction between the pair. Model 2 tests for the relationship between the presence of Mutual Friends at the social event and offline Interaction, which showed no significant relationship between the presence of mutual friends and interaction. Model 3 analyzes both online Degree and the presence of Mutual Friends in relation to offline Interaction between the pair of participants. The Degree variable retains its negative significance at -0.0130 with p<0.01, while Mutual Friends at the social event is nonsignificant, as in Model 1. Factors from the prior social network structure affecting offline behavior were selective. The adjusted R 2 value of the Model 3 is the highest, which indicates that Model 3 is the best-fitting model. One of the studied variables, Already Friend, shows that interaction occurs more between pairs who are already friends on Facebook, with positive significance at 0.359 (p<0.01). Participants who are already friends tend to interact more with each other with those who are not.
The control variables, Gender and Language, showed significant effects throughout the analysis. The Gender dummy variable showed that female-female pairs had much less interaction, while male-female and male-male pairs had greater interaction. The Language variable showed cultural differences between international and Korean participants, as English-English speaking participant (EE) pairs showed the most interaction, while Korean-Korean pairs (KK) showed the least, possibly because of cultural reasons. The results for fluent bilingual participants (ethnically Korean) show that they interacted with English-speaking participants as did those in English-English pairs, showing relatively high interaction, but interacted less with Korean and other bilingual participants (KI and II). In general, the Language variable did not reveal less interaction due to language barriers, as we predicted; rather, it showed social behavior differences between Korean and non-Korean cultures.

Discussion
The main goal of this study was to articulate the effects of online predictors with offline interaction for social tie formation. According to the outcomes of our study, social predictors from existing online networks can be used to predict network evolution through the conventional preferential attachment model, which relies on the degree centralities of network participants [19]. This finding indicates that the degree centrality of a pair contains social resources from the existing network that drive the network formation and are not accounted for by offline interactions at social events.
However, the common neighbor effect, which was believed to promote social connections by Newman [8], actually showed opposite tendencies in the formation of new social connections (Table 3). Here, the presence of mutual friends may be a factor that brings two participants together, but under the circumstances of offline social events, this effect is nullified because it is the social event that brings participants together, not mutual friends. The common neighbor effect may only be visible in natural friendship over a longitudinal observation period in which mutual friends introduce friends to each other rather than through predesigned social events that eliminate the necessity of mutual friends to bring two unconnected people together.
Furthermore, the presence of mutual friends has negative effects on making new social ties. This result may be due to the unique triangular network structure formed by two interacting participants who are not yet connected, joined by their shared mutual friend. The Already Friends variable in the regression with offline Interaction as the dependent variable (Table 4) promotes interaction between friends. If a mutual friend is present between a pair, then each node of the pair interacts more with the common mutual friend than with one another. As participants are already friends with the common mutual friend and tend to interact more with the common mutual friends than with each other, this leads to relatively less interaction between the observed pair given the limited duration of the social events. The presence of mutual friends may be a distraction to the pair when interacting with each other.
As our study verifies the predicative power of conventional network prediction models when offline interactions are quantified, we provide balanced grounds for evaluating the qualities of online and offline networks by observing both throughout the process of social network formation. The addition of real human behaviors to the network prediction model leads to the dual online and offline social network model that conceptualizes the formation of social ties via existing online networks and offline social events. Therefore, our newly established dual-dimensional model can be applied to components of intra/inter-organizational social networks as well as non-social networks with online and offline layers.
Moreover, our research framework is unique in that we designed social experiments in the form of offline networking events to observe network evolution firsthand. Utilizing our interaction measurement method, we have quantified the actual offline interaction between individuals. While conventional models predict growth of a network according to characteristics of the network itself, our framework allows two-dimensional analysis from both online and offline perspectives, as aforementioned. We expect future social network studies to utilize our distinct framework to understand the causal effect of catalytic factors in network evolutions.
Our study establishes the initial step towards the systematic representation of online and offline social factors on an integrated basis. Today, many social networking platforms are seeking ways to facilitate offline interaction among users. Many platforms already utilize geolocation and demographic data from users' mobile devices to assist in offline meetups with nearby friends and with people with similar interests in real time. These online platform services can design new services to integrate online and offline domains ubiquitously based on our understanding of the social network formation process.
Although we quantified social network formation from existing online networks within the bounds of all event participants to understand offline interactions, our data was limited to event participants only, due to Facebook privacy issues. To obtain the most accurate data from existing social networks, it would be best to observe not only the network of our event participants but also the entire networks to which they belong. Future research could design closed on-offline network settings to monitor the formation of networks from the beginning and control unseen effects of outside networks. As explained above, our research participants were not in a closed network. They were free to interact at their will outside of social event settings. Our research accounted only for interactions measured through the social events and could not control for possible extra interactions outside of the measured boundaries. We expect further studies to combine network analysis techniques with our multi-dimensional model to comprehend a wider range of social contexts in diverse dimensions. These research results will augment the human ability to manage on-offline social networks, enhancing social and business strategies and eventually sociological and political policies.