Suicide Ideation of Individuals in Online Social Networks

Suicide accounts for the largest number of deaths among Japanese people in their twenties and thirties. Suicide is also a major cause of death for adolescents in many other countries. Although social isolation has been implicated in the tendency toward suicidal behavior, the impact of social isolation on suicide in the context of explicit social networks of individuals has scarcely been explored. To address this question, we examined a large data set obtained from a social networking service dominant in Japan. The social network is composed of a set of friendship ties between pairs of users created by mutual endorsement. We carried out logistic regression to identify users' characteristics, both related and unrelated to social networks, that contribute to suicide ideation. We defined suicide ideation of a user as membership in at least one active user-defined community related to suicide. We found that the number of communities to which a user belongs, the intransitivity (i.e., paucity of triangles including the user), and the fraction of suicidal neighbors in the social network contributed the most to suicide ideation, in this order. Other characteristics, including age and gender, contributed little to suicide ideation. We also found qualitatively the same results for depressive symptoms.


Introduction
Suicide is a major cause of death in many countries. Japan had the highest suicide rate among the OECD countries in 2009 [1]. In fact, suicide accounts for the largest number of deaths among Japanese people in their twenties and thirties [1]. Suicide is also a major cause of death for youths in other countries including the United States [2].
Since the seminal sociological study by Durkheim in the late nineteenth century [3], suicides have been studied for both sociological interest and public health reasons. In particular, Durkheim and later scholars pointed out that social isolation, also referred to as the lack of social integration, is a significant contributor to suicidal behavior [3–6]. Roles of social isolation in inducing other physical and mental illnesses have also been examined [7]. Conceptual models that inherit Durkheim's idea also claim that social networks affect general health conditions, including the tendency toward suicide [8–11].
Social network analysis provides a pragmatic method to quantify social isolation [12,13]. In their seminal work, Bearman and Moody explicitly studied the relationship between suicidal behavior and egocentric social networks for American adolescents using data obtained from a national survey (National Longitudinal Study of Adolescent Health) [14]. They showed that, among many independent variables including those unrelated to social networks, a small number of friends and a small fraction of triangles to which an individual belongs significantly contribute to suicide ideation and attempts. A small number of friends is an intuitive indicator of social isolation. Another study, based on self-reports from Chinese adolescents, also supports this idea in a quantitative manner [15]. The paucity of triangles, or intransitivity [12], also characterizes social isolation [14]. Individuals without triangles are considered to lack membership in social groups even if they have many friends [16]; social groups are often approximated by overlapping triangles [17,18].
Nevertheless, the structure of the Bearman-Moody study [14] implies that our understanding of relationships between social networks and suicide is still limited. First, in the survey, a respondent was allowed to list up to five best friends of each gender. However, many respondents would generally have more friends. The imposed upper limit may distort network-related personal quantities such as the number of friends and triangles. Second, their study was confined to each school in the sense that only in-school names were matched. If a respondent X named two out-of-school friends who were actually friends of each other, the triangle composed of these three individuals was omitted from the analysis. Therefore, the accuracy of the triangle counts in their study may be limited, such that the relationship between intransitivity and suicidal behavior remains elusive.
In the present study, we examine the relationship between social networks and suicide ideation using a data set obtained from a dominant social networking service (SNS) in Japan, named mixi. Our approach addresses limitations in the previous study [14]. First, an entire social network of users is available, where a link between two users represents explicit bidirectional friendship endorsed by both users. Some users have quite a large number of friends, as in general social networks [13]. Second, for the same reason, we can accurately calculate the number of triangles for each user. An additional feature of the present data set is that the sample is relatively diverse because anybody can register for free. In contrast, the respondents in the Bearman-Moody study were 7th to 12th graders in schools.
A function of mixi relevant to this study is user-defined communities. A community is a group of users that get together under a common interest, such as a hobby, affiliation, or creed. A user-defined community of mixi is often composed of users that did not know each other beforehand. Although some SNSs have user-defined communities, and their dynamics have been studied [19], major SNSs including Facebook do not offer this type of user-defined community. We define suicide ideation by the membership of a user in at least one community related to suicide. Then, we statistically compare users with and without suicide ideation in terms of users' properties including those related to egocentric networks.

Multivariate Logistic Regression
We defined the group of users with suicide ideation and the control group of users, as described in Methods. Table 1 indicates that the difference in the mean of each independent variable (see Methods for the definition of the independent variables) between the suicide and control groups is significant ($p < 0.001$, Student's t-test). We also verified that the distribution of each independent variable differs significantly between the two groups ($p < 0.0001$, Kolmogorov-Smirnov test).
The results obtained from the multivariate logistic regression are summarized in Table 2. The VIF values (see Methods) are much less than 5 for all the independent variables. The three types of correlation coefficients between pairs of the independent variables are also sufficiently small (Table 3). On these bases, we justify the application of the multivariate logistic regression to our data.
The odds ratio (OR) values shown in Table 2 suggest the following. A one-year older user is 1.00463 times more likely to belong to the suicide group than the control group on average. Likewise, being female, membership in one additional community, having one additional friend, an increase in $C_i$ by 0.01, an increase in the fraction of friends in the suicide group (i.e., homophily variable) by 0.01, and one additional day of the registration period make a user 0.821, 1.00733, 0.99790, $0.0093^{0.01} \approx 0.95$, $(2.22 \times 10^{12})^{0.01} \approx 1.33$, and 0.999383 times more likely to belong to the suicide group, respectively. For all the independent variables, the 95% confidence intervals of the ORs do not contain unity, and the p-values are small. Therefore, all the independent variables significantly contribute to the regression. In addition, because the AUC (see Methods) is large (i.e., 0.873), the estimated multivariate logistic model captures much of the variation in the user's behavior, i.e., whether to belong to the suicide group or not.
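The two ORs quoted for an increase of 0.01 follow from the logistic model: because the OR for a unit increase is the exponential of the regression coefficient, the OR for an increase of $\Delta$ is the per-unit OR raised to the power $\Delta$. A minimal Python sketch of this arithmetic is given here; the function name is ours and is only illustrative.

```python
import math


def scale_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for an increase of `delta`, given the OR for an increase of 1.

    In a logistic model OR = exp(beta), so an increase of `delta`
    corresponds to exp(beta * delta) = OR ** delta.
    """
    return math.exp(math.log(or_per_unit) * delta)


# Per-unit ORs quoted in the text for C_i and for the homophily variable.
or_clustering = scale_odds_ratio(0.0093, 0.01)
or_homophily = scale_odds_ratio(2.22e12, 0.01)
print(round(or_clustering, 2), round(or_homophily, 2))  # 0.95 1.33
```

The very small and very large per-unit ORs arise because both variables are confined to the interval between 0 and 1, so that a unit increase spans their whole range.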

Univariate Logistic Regression
All the independent variables significantly contribute to the multivariate regression probably because of the large sample size of our data set. Therefore, we carried out the univariate logistic regression between the dependent variable (i.e., membership to the suicide versus control group) and each independent variable to better clarify the contribution of each independent variable.
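As an illustration of a univariate logistic regression, the following minimal Python sketch fits a model with an intercept and a single independent variable by Newton's method and reports the OR as the exponential of the slope. The function name, the number of iterations, and the toy data are assumptions made for illustration; they are not the software or the data used in our analysis.

```python
import math


def fit_univariate_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1 * x by Newton's method; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += yi - p
            g1 += (yi - p) * xi
            # Entries of the 2x2 information matrix X^T W X.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + (X^T W X)^{-1} * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Toy data (invented): a binary variable x and group membership y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_univariate_logistic(x, y)
odds_ratio = math.exp(b1)
print(odds_ratio)
```

For this toy data the odds of membership are 1/3 when x = 0 and 3 when x = 1, so the fitted OR should be close to 9. The sketch does not guard against perfectly separated data, for which the maximum likelihood estimate does not exist.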
The results obtained from the univariate logistic regression are shown in Table 4. Although the p-value for each independent variable is small, the AUC value varies considerably between the independent variables. The community number, local clustering coefficient, and homophily variable yield the largest AUC values under the univariate regression. The community number makes by far the largest contribution among the seven independent variables. Its AUC value obtained from the univariate regression (0.867) is close to that obtained from the multivariate regression (0.873). The independent variable with the second largest explanatory power is the local clustering coefficient (AUC = 0.690). This result is consistent with the previous one [14]. We stress that we reach this conclusion using a data set whose full social network is available.
Table 3. Correlation coefficients between pairs of independent variables for the suicide, depression, and control groups.
The homophily variable makes the third largest contribution (AUC = 0.643). Although we refer to this independent variable as homophily (see Methods), the effect of this variable can in fact be interpreted as either homophily or contagion [20,21]. Nevertheless, the result is consistent with previous claims that suicide is contagious (for recent accounts, see [6,22–26]; but see [27] for a critical review) and that other related states such as depressive symptoms are contagious [28,29] (but see [30,31]).
The effects of age, gender, and degree (i.e., number of friends) on suicide ideation are small, yielding small AUC values close to the minimum value of 0.5 (Table 4). In addition, the ORs for these variables are inconsistent between the multivariate and univariate regressions. For example, a female user is more likely to belong to the suicide group according to the univariate regression and less likely according to the multivariate regression. Therefore, we conclude that these three independent variables do not explain suicide ideation.
The registration period also yields a small AUC value (i.e., 0.545). Therefore, the dependence of suicide ideation on the community number, local clustering coefficient, and homophily variable is not explained by a common dependence of these variables on the registration period.

Depressive Symptoms
Our data set allows us to investigate relationships between users' other characteristics and the independent variables if the characteristics have corresponding user-defined communities in the SNS. We repeated the same series of analyses for depressive symptoms, which are suggested to be implicated in suicidal behavior [5,22,32]. A user is defined to have depressive symptoms when the user belongs to at least one of the seven depression-related communities (Methods).
The statistics of the independent variables for the depression group are compared with those for the control group in Figures 1, 2, 3, and Table 5. Each independent variable differs significantly between the depression and control groups in terms of the mean ($p < 0.0001$, Student's t-test; see Table 5) and the distribution ($p < 0.0001$, Kolmogorov-Smirnov test).
We applied the multivariate and univariate logistic regressions to identify independent variables that contribute to depressive symptoms (i.e., membership in the depression group). The control group is the same as that used for the analysis of suicide ideation. The results are shown in Tables 6 and 7. The VIF values shown in Table 6 and the correlation coefficient values shown in Table 3 justify the use of the multivariate logistic regression. The results are qualitatively the same as those for the suicide case.

Discussion
We investigated relationships between suicide ideation and personal characteristics including social network variables using the data obtained from a major SNS in Japan. We found that an increase in the community number (i.e., the number of user-defined communities to which a user belongs), a decrease in the local clustering coefficient (i.e., local density of triangles, or transitivity), and an increase in the homophily variable (i.e., fraction of neighboring users with suicide ideation) contribute to suicide ideation by the largest amounts, in this order. In addition, the results are qualitatively the same when we replace suicide ideation by depressive symptoms. Remarkably, the three most significant variables represent online social behavior of users rather than demographic properties such as age and gender.
Our result that age and gender have little influence on suicide ideation is inconsistent with previous findings [6]. The weak age effect in our result may be because the majority of registered users are young; the mean age of the users in the control group is 27.7 years (Table 1). Nevertheless, we stress that suicide is a problem particularly among young generations, to which a majority of the users belong.
We concluded that the node degree explains little of suicide ideation. In contrast, previous studies showed that suicidal behavior is less observed for individuals with more friends [14,15]. It has also been a long-standing claim that social isolation elicits suicidal behavior [3–6]. As compared to typical users, some users may spend a lot of time online to gain many ties with other users and belong to many communities on the SNS. Such a user may be active exclusively online and, for example, feel lonely and thus be prone to suicide ideation. Although this is a mere conjecture, such a mechanism would also explain the strong contribution of the community number to suicide ideation revealed in our analysis. In contrast, many people nowadays, especially the young, regularly devote much time to online activities including SNSs [33]. Therefore, the data obtained from SNSs may capture a significant part of users' real lives.
Because mixi enjoys a large number of users and implements the user-defined community as a main function, its user-defined communities cover virtually all major topics. Therefore, applying the present methods to other psychiatric illnesses and symptoms, such as schizophrenia, bipolar disorder, and alcohol abuse, as well as positive symptoms, may be profitable.
Figure 1. Distribution of the community number (i.e., number of communities to which a user belongs) for the suicide, depression, and control groups. We set the bin width for generating the histogram to 50. The abrupt increase in the distribution at 1000 communities for the suicide and depression groups is owing to the restriction that a user can belong to at most 1000 communities. doi:10.1371/journal.pone.0062262.g001
Our study is limited in some aspects. First, we identified suicide ideation with the membership in a relevant community, but not with suicide attempts or committed suicides. Second, membership in a relevant community may not even imply suicide ideation. Users may enter the suicide group because they have encountered suicide among their friends or family. Third, our data are a specific sample of individuals from a general population. This criticism applies to any work that relies on SNS data. However, it is particularly pertinent when one focuses on individuals' characteristics (e.g., personality and attitudes) rather than collective phenomena online (e.g., contagion on SNSs). Although it is beyond the scope of the current study, quantifying the extent to which our sample accurately represents general populations remains a future challenge.

Data
Mixi is a major SNS in Japan. It started to operate in March 2004 and had more than $2.7 \times 10^7$ registered users as of March 2012. Similar to other known SNSs, users of mixi can participate in various activities such as forming friendships with other users, writing microblogs, sending instant messages to others, uploading photos, and playing online games. Registration is free. See [34] for a previous study of the mixi social network.
In mixi, there were more than $4.5 \times 10^6$ user-defined communities on various topics as of April 2012. Users can join a user-defined community if the owner personally permits or the owner allows anybody to join it.
We identified suicide ideation with the membership of a user in at least one suicidal community. To define suicidal communities that are sufficiently active, we first selected communities satisfying the following five criteria: (1) the name included the word "suicide" ("jisatsu" in Japanese), (2) there were at least 1000 members on November 2, 2011, (3) there were at least 100 comments posted in October 2011 that were directed to other comments or topics, (4) there were at least three independent topics on which comments were made in October 2011, and (5) admission was open to the public. Seven communities met these criteria. Then, we excluded one community whose name indicated that it concentrated on methods of committing suicide and two communities whose names indicated that they encouraged members to live with hope (one contained the word "want to live", and the other contained the word "have a fun" in their names; translations by the authors). As a result, four communities qualified as suicidal communities. The user statistics of these communities are shown in Table 8. A user that belongs to at least one suicidal community is defined to possess suicide ideation. To exclude inactive users, we restricted ourselves to the set of active users. An active user was defined as a user that existed as of January 23, 2012 and logged on to mixi on more than 20 days per month on average from August through December 2011. A similar definition was used in a previous study of the Facebook social network [35]. We also discarded users with zero or one friend on mixi because the triangle count described below was undefined for such users. Despite this exclusion, the remaining data allowed us to examine the effect of social isolation in terms of the degree, i.e., number of neighbors, because the degree was widely distributed between 2 and 1000. There were 9990 active users with suicide ideation (suicide group).
We statistically compared the users in the suicide group with users without suicide ideation. Because the number of users was huge, we randomly selected 228949 active users that possessed at least two friends and belonged to none of the seven candidate suicidal communities defined above or the ten candidate depression-related communities defined below. We call this set of users the control group.
The employees of mixi deleted private information irrelevant to the present study and encrypted the relevant private information before we analyzed the data. In addition, we conducted all the analyses in the central office of mixi located in Tokyo using a computer that was not connected to the Internet.
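The five selection criteria for candidate communities can be expressed as a simple filter. The following Python sketch is only an illustration under stated assumptions: the record layout, the field names, and the toy records are hypothetical, because the format of the mixi data is not described here.

```python
from dataclasses import dataclass


@dataclass
class Community:
    # All field names are hypothetical; they merely mirror the five criteria.
    name: str
    members: int          # number of members on November 2, 2011
    comments: int         # comments in October 2011 directed to other comments or topics
    topics: int           # independent topics commented on in October 2011
    open_admission: bool  # True if admission is open to the public


def is_candidate(community: Community, keyword: str) -> bool:
    """Return True if the community satisfies all five criteria."""
    return (
        keyword in community.name
        and community.members >= 1000
        and community.comments >= 100
        and community.topics >= 3
        and community.open_admission
    )


# Toy records, invented for illustration only.
toy = [
    Community("jisatsu talk", 1500, 240, 5, True),
    Community("jisatsu small", 300, 500, 9, True),
    Community("cooking", 20000, 900, 30, True),
]
candidates = [c.name for c in toy if is_candidate(c, "jisatsu")]
print(candidates)  # ['jisatsu talk']
```

The subsequent exclusion of candidate communities on the basis of their names was a manual judgment and is not captured by this filter.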

Statistical Models
The dependent variable that represents the level of suicide ideation is binary, i.e., whether a user belongs to a suicidal community or not. Therefore, we used univariate and multivariate logistic regressions. To check the multicollinearity between independent variables and thereby justify the use of the multivariate logistic regression, we carried out two subsidiary analyses. First, we measured the variance inflation factor (VIF) for each independent variable (see [36,37] and references therein). The VIF is the reciprocal of the fraction of the variance of the independent variable that is not explained by linear combinations of the other independent variables. It is recommended that the VIF value for each independent variable be smaller than 10 (preferably smaller than 5) for the multivariate logistic regression to be valid. Second, we measured the Pearson, Spearman, and Kendall correlation coefficients between the independent variables.
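The definition of the VIF can be made concrete as follows. For the $j$th independent variable, one regresses it linearly (with an intercept) on the other independent variables and computes $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination of that regression. The Python sketch below is a minimal illustration using ordinary least squares via the normal equations; the function names and the toy data are ours and are not the software used in the analysis.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def vif(columns, j):
    """VIF of columns[j], i.e., 1 / (1 - R^2) from regressing it on the others."""
    y = columns[j]
    others = [col for k, col in enumerate(columns) if k != j]
    n = len(y)
    # Each design row holds an intercept followed by the other variables.
    rows = [[1.0] + [col[i] for col in others] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bk * rk for bk, rk in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return ss_tot / ss_res  # identical to 1 / (1 - R^2)


# Toy data (invented): two uncorrelated variables give a VIF of 1.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 1.0, -1.0, -1.0]
print(vif([x1, x2], 0))
```

If an independent variable is an exact linear combination of the others, the residual sum of squares is zero and the VIF diverges; the sketch does not handle that case.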
To quantify the explanatory power of the logistic model, we measured the area under the receiver operating characteristic curve (AUC) for each fit (e.g., [37]). The receiver operating characteristic curve is the trajectory of the false positive (i.e., fraction of users in the control group that are mistakenly classified into the suicide group on the basis of the linear combination of the independent variables) and the true positive (i.e., fraction of users in the suicide group correctly classified into the suicide group), when the threshold for classification is varied. The AUC value falls between 0.5 and 1. A large AUC value indicates that the logistic regression fits well to the data in the sense that users are accurately classified into suicide and control groups.
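The AUC can also be computed without tracing the curve explicitly, because it equals the probability that a randomly chosen user in the suicide group receives a higher score than a randomly chosen user in the control group, with ties counted as one half. The Python sketch below illustrates this equivalence; the function name and the toy scores are ours, and the quadratic-time loop is meant for clarity rather than for data of the present size.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve from pairwise comparisons of scores.

    scores_positive: model scores of users in the suicide group
    scores_negative: model scores of users in the control group
    """
    wins = 0.0
    for sp in scores_positive:
        for sn in scores_negative:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Toy scores (invented): three of the four pairs are ordered correctly.
print(auc([0.8, 0.3], [0.5, 0.1]))  # 0.75
```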

Independent Variables
We considered seven independent variables. Their univariate statistics for the suicide and control groups are shown in Table 1.
Demographics. Demographic independent variables include age and gender. Our analysis does not include ethnic components because most users are Japanese-speaking Japanese; mixi provides services in Japanese. Other demographic, socioeconomic, and personal characteristic variables such as residence area, occupation, company/school, and hobby were not used because they were unreliable. In fact, many users leave them blank or fill them in inconsistently, probably because they do not want to disclose them.
Community number. The number of user-defined communities that a user belongs to was adopted as an independent variable. We refer to this quantity as the community number. The community number obeys a long-tailed distribution for both suicide and control groups (Figure 1). The mean is quite different between the two groups (Table 1).
Degree. When a user sends a request to another user and the recipient accepts the request, the pair of users form an undirected social tie, called Friends. A web of Friends defines a social network of mixi. We adopted degree as the most basic network-related independent variable. The degree is the number of neighbors (i.e., Friends) and is denoted by $k_i$ for user $i$. The system of mixi allows a user to have a degree of at most 1000. Consistent with the previous analysis of a much smaller data set of mixi [34], the degree distributions for both groups are long tailed (Figure 2). A small degree is an indicator of social isolation.
Local clustering coefficient. We quantified transitivity, or the density of triangles around a user, by the local clustering coefficient, denoted by $C_i$ for user $i$. A directed-link version of the same quantity was used in the Bearman-Moody study. For user $i$ having degree $k_i$, there can be at most $k_i(k_i-1)/2$ triangles that include user $i$. We defined $C_i$ as the actual number of triangles that included $i$ divided by $k_i(k_i-1)/2$. Examples are shown in Figure 4. By definition, $0 \le C_i \le 1$. We discarded the users with $k_i \le 1$ because $C_i$ was defined only for users with $k_i \ge 2$. $C_i$ quantifies the extent to which neighbors of user $i$ are adjacent to each other [13,38]. If $C_i$ is large, the user is probably embedded in close-knit social groups [12,13,38]. A small $C_i$ value is an indicator of social isolation. As in many networks [13], $C_i$ decreases with $k_i$ in both suicide and control groups (Figure 3). The results are consistent with those in the previous study in which the average $C_i$ obtained without categorizing users is roughly proportional to $k_i^{-0.6}$ [34]. Therefore, we carefully distinguished the influence of $k_i$ and $C_i$ on suicide ideation by combining univariate and multivariate regressions.
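The definition of the local clustering coefficient translates directly into a count over pairs of neighbors. The following Python sketch assumes that the undirected network is stored as a dictionary mapping each user to the set of its neighbors; this representation, the function name, and the toy network are illustrative assumptions.

```python
def local_clustering(adjacency, i):
    """Local clustering coefficient C_i of user i in an undirected network.

    adjacency maps each user to the set of its neighbors (Friends).
    """
    neighbors = sorted(adjacency[i])
    k = len(neighbors)
    if k < 2:
        raise ValueError("C_i is defined only for users with degree >= 2")
    triangles = 0
    for index, u in enumerate(neighbors):
        for v in neighbors[index + 1:]:
            # Each pair of mutually adjacent neighbors closes one triangle.
            if v in adjacency[u]:
                triangles += 1
    return triangles / (k * (k - 1) / 2)


# Toy network (invented): user 1 has neighbors 2, 3, and 4; only 2 and 3 are adjacent.
adjacency = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(local_clustering(adjacency, 1))  # one triangle out of three possible
print(local_clustering(adjacency, 2))  # 1.0
```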
Homophily. Suicide may be a contagious phenomenon (e.g., [6,22–26]). If so, a user is inclined to suicide ideation when a neighbor in the social network is. Therefore, we adopted the fraction of neighbors with suicide ideation as an independent variable. It should be noted that, even if a user with suicide ideation has relatively many friends with suicide ideation, it does not necessarily imply that suicide is contagious. Homophily may be a cause of such assortativity. In this study, we did not attempt to distinguish between the effects of imitation and homophily. The differentiation would require analysis of temporal data [20,21]. Nevertheless, for notational convenience, we refer to the fraction of neighbors with suicide ideation as the homophily variable.
Registration period. A user that registered with mixi a long time ago may be more active and have more resources in mixi than new users. Such an experienced user may tend to simultaneously have, for example, a large community number, a large degree, and perhaps high activity in various communities including suicidal ones. To control for this factor, we measured the registration period, defined as the number of days between the registration date and January 23, 2012.
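As a concrete illustration, the homophily variable of a user is the number of its neighbors that belong to the suicide group divided by its degree. The Python sketch below assumes the same neighbor-set representation of the network as is common for undirected graphs; the function name and the toy data are ours.

```python
def homophily_variable(adjacency, suicide_group, user):
    """Fraction of the neighbors of `user` that belong to the suicide group."""
    neighbors = adjacency[user]
    if not neighbors:
        raise ValueError("the fraction is undefined for a user without neighbors")
    flagged = sum(1 for v in neighbors if v in suicide_group)
    return flagged / len(neighbors)


# Toy data (invented): user "a" has four neighbors, one of which is in the suicide group.
adjacency = {
    "a": {"b", "c", "d", "e"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": {"a"},
}
suicide_group = {"b"}
print(homophily_variable(adjacency, suicide_group, "a"))  # 0.25
```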
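The registration period is a plain difference of calendar dates. A minimal Python sketch using the standard library is shown here; the function name is illustrative, and the example dates are invented.

```python
from datetime import date

# Reference date stated in the text.
REFERENCE_DATE = date(2012, 1, 23)


def registration_period(registered_on: date) -> int:
    """Number of days between the registration date and January 23, 2012."""
    return (REFERENCE_DATE - registered_on).days


# A user registered exactly one year earlier (2011 is not a leap year).
print(registration_period(date(2011, 1, 23)))  # 365
```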

Analysis of Depressive Symptoms
To define depression-related communities, we identified the communities satisfying the same five criteria as in the case of suicidal communities, but with the term suicide in the community name replaced by depression ("utsu" in Japanese). There were ten such communities. We excluded three of them because their names included positive words (let's overcome, resume one's place in society, cure; translations by the authors). We defined the remaining seven communities, summarized in Table 9, to represent depressive symptoms of users. The depression group is the set of active users that belong to at least one depression-related community listed in Table 9. The depression group contains 24410 users.

Ethics Statement
Mixi approved the provision of the data.