Social Network Analysis Predicts Health Behaviours and Self-Reported Health in African Villages

The provision of healthcare in rural African communities is a highly complex and largely unsolved problem. Two main difficulties are the identification of individuals that are most likely affected by disease and the prediction of responses to health interventions. Social networks have been shown to capture health outcomes in a variety of contexts. Yet, it is an open question as to what extent social network analysis can identify and distinguish among households that are most likely to report poor health and those most likely to respond to positive behavioural influences. We use data from seven highly remote, post-conflict villages in Liberia and compare two prominent network measures: in-degree and betweenness. We define in-degree as the frequency with which members from one household are named by another household as a friend. Betweenness is defined as the proportion of shortest friendship paths between any two households in a network that traverse a particular household. We find that in-degree explains the number of ill family members, whereas betweenness explains engagement in preventative health. In-degree and betweenness independently explained self-reported health and behaviour, respectively. Further, we find that betweenness predicts susceptibility to, instead of influence over, good health behaviours. The results suggest that targeting households based on network measures rather than health status may be effective for promoting the uptake of health interventions in rural poor villages.


Introduction
Infectious diseases remain a leading cause of morbidity and mortality in developing countries. Over one billion people, mostly in rural areas, are currently afflicted with one or more communicable diseases. [1][2][3][4][5] Many of these maladies are preventable with access to safe water, sanitation, and healthcare. [6] However, identification of people that are ill is a strenuous task in rural poor areas where access to formal medical care is scarce and infectious diseases are chronic. [7] Furthermore, behavioural modification to improve preventative health, such as the persuasion of households to use protected instead of open water sources and to use pit latrines instead of engaging in open defecation, proves challenging in practice. [8] Monitoring such behaviours is difficult, in particular for open defecation, as many of these behaviours are conducted in private. Therefore, indirect indicators, such as social popularity and influence, may offer a useful alternative to identify who is ill and whom to target for behavioural health interventions [9].
Social networks have been widely studied for understanding peer effects and the spread of behaviours in a variety of contexts. [10][11][12][13][14][15] The study of complex networks aims to provide insight into the connectivity of physical, natural, or social systems. [16] Two commonly examined network properties are degree and betweenness. The degree of an individual, or node, is the number of incoming and outgoing connections. Of primary focus for social networks is in-degree, the number of incoming connections. [17] In-degree conveys the popularity of individuals, by counting the number of people that have named that person as a friend. Betweenness is the proportion of shortest paths (here, a set of lines connecting two households) in a network that traverses the node of interest. [17] In contrast to in-degree, betweenness is a global measure of the network. High betweenness commonly is viewed as an indicator of information spreading in social networks. [18] However, it is an open question as to what extent betweenness and in-degree are important factors in determining self-reported physical health and actual health behaviours. These centrality indicators are particularly important in distinguishing how social placement affects health, as households that traverse the most paths connecting other households (high betweenness) need not have many friends (high in-degree) [19].
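The two measures can be written compactly. The notation below is introduced here for illustration and does not appear in the original text: $a_{uv} = 1$ if household $u$ names a member of household $v$ as a friend (0 otherwise), $\sigma_{st}$ is the number of shortest directed paths from household $s$ to household $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through household $v$. Pairs with no connecting path are omitted from the sum. The normalizing constant $(N-1)(N-2)$ is one common convention for directed networks with $N$ nodes and is an assumption here; the software used in the study may scale betweenness differently.

```latex
k^{\mathrm{in}}_v = \sum_{u \neq v} a_{uv},
\qquad
b_v = \frac{1}{(N-1)(N-2)}
      \sum_{\substack{s \neq t \\ s,\,t \neq v}}
      \frac{\sigma_{st}(v)}{\sigma_{st}}
```

In-degree depends only on the ties pointing directly at $v$, whereas $b_v$ depends on every shortest path in the network, which is why a household can score high on one measure and low on the other.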
Existing studies [20][21][22][23] on social networks and health identified the dependencies between dyads of people in single noninteracting networks and showed that individuals were connected with other individuals of similar physical health. For example, Christakis et al. [22] and Liljeros et al. [23] found that high degree nodes were more likely to be sick or infected than low degree nodes. We expand upon this literature and study a set of remote villages in rural Liberia. Liberia is recovering from a civil war, which resulted in high levels of mortality and morbidity and broke down the delivery of basic health services. While important steps have been taken to rebuild the country, Liberia still ranks amongst the lowest in the world with respect to health indicators. Under-five mortality is 110 out of every 1000, the childhood malaria rate is 32%, reduced growth in children (stunting) is 42%, and access to formal health services is below 40% in remote areas. [24] We demonstrate the ability of social network indicators to predict health outcomes and to explain the susceptibility of a household to partake in health interventions in one of the most impoverished and under-researched developing country contexts. In particular, we find that in-degree predicts household physical health status and betweenness identifies individuals susceptible to positive behavioural influence and likely to respond to health interventions.

Ethical approval
We obtained verbal informed consent from the respondents. Research assistants went through an explanation of the research and asked respondents if they understood and agreed to go ahead. Respondents were informed that they were not obliged to answer questions if they did not want to and were free to stop the interview at any time. In rural Liberia very few people can read or write. The research assistant recorded the answer of the respondent and any remarks made on the survey form. We obtained Institutional Review Board (IRB) approval for this consent procedure and a Gola Forest study that encompasses the trans-boundary area of Sierra Leone and Liberia under the IRB of the University of Chicago, number H10076.

Data sample and network construction
During February-March 2012, we surveyed friendship networks in seven highly remote villages in post-conflict Liberia (Figure 1). One week in advance of survey activities, runners were sent to each village with a letter of invitation for the village chief. The runner reviewed the letter in detail with the chief, explained the proposed date of arrival, number of enumerators to be hosted, and a summary of proposed activities. Permission was requested from the chief for activities to be undertaken as planned. If permission was denied, the runner sought to determine whether an alternative date was available. If the chief refused to allow the research team to visit at all, the runner informed the project leader immediately and a new village was selected from within the same geographical quadrant; however, refusal to conduct research in the village never occurred. Scheduled activities were undertaken only with permission from the chief. After securing approval to carry out the program, a project leader walked around the village and created a numbered household list, which contained names of all household heads.
The full household list was used in a public lottery to randomly select households. Paper slips with household numbers were placed in an opaque bag. During a public meeting of all households in the village, a child was invited to draw household numbers from the bag one by one. Participation rates were high; yet, if a selected household refused to participate, another number was drawn until a willing household agreed to participate. The procedure continued until at least 15 households were selected (30 households in one village, 26016). For villages smaller than 15 households, all households were included. In all study villages, the final sample ranged from 9 to 30 households.
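The lottery can be sketched as drawing slips without replacement until enough willing households have been found. This is a minimal illustration, not the field protocol itself: the function name `draw_households`, the `willing` callback, and the `seed` parameter are hypothetical names introduced here.

```python
import random


def draw_households(household_ids, willing, target=15, seed=None):
    """Draw household numbers one by one, without replacement.

    household_ids: numbers on the village household list.
    willing: callable returning True if a drawn household agrees to take part
             (hypothetical stand-in for the household's real decision).
    target: stop once this many willing households have been selected.
    """
    rng = random.Random(seed)
    bag = list(household_ids)
    rng.shuffle(bag)  # the shuffled order is the order in which slips are drawn
    selected = []
    for household in bag:
        if len(selected) >= target:
            break
        if willing(household):  # a refusal simply triggers the next draw
            selected.append(household)
    return selected


# A village of 40 households in which some households decline.
sample = draw_households(range(1, 41), lambda hh: hh % 7 != 0, target=15, seed=1)
# A village smaller than the target: every willing household ends up included.
small_village = draw_households(range(1, 10), lambda hh: True, target=15, seed=1)
```

When the village has fewer households than the target, the loop exhausts the bag, which matches the rule that all households in small villages were included.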
The study catchment consisted of small villages surrounding the Gola Rainforest, which is located in the post-conflict region bordering Sierra Leone and Liberia (Figures S1-S2 in File S1). A summary of the demographic characteristics and health variables of the study area is provided in Tables S1-S2 in File S1. Friendship ties were elicited by asking household heads to "Please give the full names of eight close friends that do not live in your household and that you would feel comfortable to either turn to for advice, ask for an interest-free loan, or ask for help with harvest without paying (only feeding). Please indicate if the person named lives in the village." Directed edges between pairs of households were generated if any member of the receiving household was named. When a household head named more than one friend from a particular household, these multiple edges were treated as one edge. Friends named within the same household as the respondent (self-loops) were ignored in the analysis.
To allow for reciprocation of a connection, only the households interviewed were used to construct the village networks. Furthermore, only households that either named a connection within our sample, or were named by a household within our sample were included in the network. Network summary statistics are presented in Table S3 in File S1. The nodes (N = 83) represent households and the edges (N = 124) represent friendship connections between them (Figure 1).
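The construction rules described above (one directed edge per named household, duplicate nominations collapsed, self-loops dropped, only interviewed households retained, isolated households excluded) can be sketched as follows. The function name and the input layout, a list of (naming household, named household) pairs, are assumptions made for illustration; the study data may be organised differently.

```python
def build_village_network(nominations, interviewed):
    """Build a directed household network from friendship nominations.

    nominations: iterable of (naming_household, named_household) pairs,
                 one pair per friend named (hypothetical input layout).
    interviewed: households that were surveyed in the village.
    Returns (nodes, edges); edges is a set of (source, target) tuples.
    """
    interviewed = set(interviewed)
    edges = set()
    for source, target in nominations:
        if source == target:
            continue  # friends within the respondent's own household are ignored
        if source in interviewed and target in interviewed:
            edges.add((source, target))  # a set collapses repeated nominations
    # Keep only households that named, or were named by, a sampled household.
    nodes = {household for edge in edges for household in edge}
    return nodes, edges


nominations = [
    ("A", "B"), ("A", "B"),  # two friends in household B count as one edge
    ("A", "A"),              # self-loop, dropped
    ("B", "C"),
    ("C", "X"), ("D", "X"),  # X was not interviewed, so these ties are dropped
]
nodes, edges = build_village_network(nominations, {"A", "B", "C", "D"})
```

Household D was interviewed but has no tie inside the sample, so it does not appear among the nodes.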
Interviewed households included in the derived networks were compared to interviewed households that were not used to construct the networks and were found to be balanced across a wide range of socio-economic variables except for the number of years the household has been in the village, agricultural occupation, social status, and belonging to village 26007 (Tables S4-S5 in File S1). We account for this variation in sample selection by including these differences in the extended models presented below to show that our findings hold. To access the study data and code, please refer to the following supplementary information: Study Data S1 and Regressions S1.

In-degree and betweenness variables
The directed network parameters of betweenness and in-degree were calculated and visualized using the Network Analyzer Plugin from Cytoscape version 2.8.3. We analyze subnetworks where the namers and named fall within the same sample. Both in-degree and betweenness depend on the number of nodes included in the derived network. The subnetworks result in a small number of nodes with many connections (high in-degree) and a small number of nodes with high connectivity (high betweenness). We examined these properties, which are found in complete named networks [16], to identify trends across seven villages. Further, there is not a clearly defined one-unit increase of betweenness that is comparable to a one-unit increase of in-degree, which is the addition of one friend. Accordingly, we focused on the significance and direction of the coefficients of betweenness and in-degree to explain the associations found in this paper. We find that in-degree and betweenness are correlated, but are not perfectly collinear (Table S6 in File S1), which further supports the view that each centrality indicator independently explains different health outcomes.
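For readers without access to Cytoscape, both measures can be computed directly. The sketch below uses Brandes' shortest-path counting on an unweighted directed graph; it is an illustrative substitute, not the plugin's implementation, and the normalization by (N-1)(N-2) is an assumed convention that may differ from the values reported in the paper. The toy village is invented for the example.

```python
from collections import defaultdict, deque


def in_degree(nodes, edges):
    """Number of incoming friendship ties per household."""
    counts = {node: 0 for node in nodes}
    for _, target in edges:
        counts[target] += 1
    return counts


def betweenness(nodes, edges, normalized=True):
    """Directed, unweighted betweenness via Brandes' algorithm."""
    successors = defaultdict(list)
    for source, target in edges:
        successors[source].append(target)
    score = {node: 0.0 for node in nodes}
    for source in nodes:
        dist = {source: 0}
        sigma = defaultdict(float)  # number of shortest paths from source
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:  # breadth-first search from the source household
            v = queue.popleft()
            order.append(v)
            for w in successors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # accumulate path dependencies back to source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(nodes)
    if normalized and n > 2:
        # Assumed convention: divide by the number of ordered pairs excluding v.
        score = {k: v / ((n - 1) * (n - 2)) for k, v in score.items()}
    return score


# Toy village: P is popular (three incoming ties) but bridges nothing;
# G is named only once yet lies on every path from C to H and I.
toy_nodes = {"A", "B", "C", "P", "G", "H", "I"}
toy_edges = {("A", "P"), ("B", "P"), ("C", "P"),
             ("C", "G"), ("G", "H"), ("G", "I")}
toy_in_degree = in_degree(toy_nodes, toy_edges)
toy_raw = betweenness(toy_nodes, toy_edges, normalized=False)
toy_norm = betweenness(toy_nodes, toy_edges)
```

In the toy village, the most popular household has zero betweenness while the household with the highest betweenness has an in-degree of one, illustrating why the two indicators need not move together.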

Statistical Analysis
Statistical analyses are conducted using Stata version 12.1. The regressions presented in Tables 1-2 and Figure 2 use general linear models with iterated least squares maximization and village-level fixed effects. Below, we discuss the relative importance of in-degree and betweenness for various health outcomes in a comparative analysis of regression models containing both variables.
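To make the estimation approach concrete, the sketch below fits a count regression by iteratively reweighted least squares. It assumes a Poisson family with a log link, which is suggested by the "log of the expected count" and incidence rate ratio language in the results but is not stated explicitly in the text. It is a pure-Python illustration on invented data, not the Stata routine used in the study; village fixed effects would enter as additional dummy columns of the design matrix.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def poisson_irls(X, y, iterations=50):
    """Poisson regression (log link) by iteratively reweighted least squares.

    X: rows of the design matrix; the first column must be the intercept (1.0).
    y: observed counts, with a positive mean.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the intercept-only fit
    for _ in range(iterations):
        eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Working response; for the Poisson family the weights equal mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[j] * row[k] for m, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        xtwz = [sum(m * row[j] * zi for m, row, zi in zip(mu, X, z))
                for j in range(p)]
        beta = solve(xtwx, xtwz)
    return beta


# Invented data: predictor values (e.g. in-degree) and a count outcome.
x_values = [0, 1, 2, 3, 4]
counts = [1, 1, 2, 2, 4]
design = [[1.0, float(x)] for x in x_values]
beta_hat = poisson_irls(design, counts)
fitted = [math.exp(beta_hat[0] + beta_hat[1] * x) for x in x_values]
```

At convergence the residuals are orthogonal to each column of the design matrix, which is the defining property of the maximum-likelihood fit and a convenient check on the iteration.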

In-degree as a self-reported health indicator
In-degree positively predicts the number of people sick in a household (Table 1 Panel 1A and Figure 2A). Households with high in-degree (many close friends) report a higher number of total people sick in their home during the previous month than households with few or no incoming friendship links. The difference in the logs of the expected number of sick people changes by 0.308 (p<0.05) with each additional incoming friendship connection (Table 1 Panel 1A). In terms of the incidence rate ratio, with each additional connection (in-degree), the expected number of people sick increases by a factor of 1.360 (p<0.05, Table 1 Panel 1B). This finding is expected, as in-degree is an indicator of popularity, and the more popular an individual is, the more likely they are to be ill. [22] By contrast, we do not observe a significant relation between betweenness and the total people sick in a household (coefficient (coeff.) -0.581, p>0.05, Table 1

Betweenness as a health behaviour measure
Households with high betweenness avoid poor health behaviours and actively spend income on medical care. Poor health behaviours were measured by asking household heads if they conduct one of the following six behaviours at least once per week: drinking alcohol, smoking, burning cooking fuel indoors, entering freshwater bodies, open defecation, and sleeping outside. Betweenness results in a decrease (coeff. -5.107, p<0.01; coeff. is per unit betweenness, see Methods) of the log of the expected count of poor preventative health behaviours that a household head engages in (Table 1 Panel 2 and Figure 2B). By contrast, in-degree is only weakly and insignificantly related to poor preventative health (coeff. 0.034, p>0.05). As the household head is the decision-maker of the family, their behaviours set an example for other household members. In rural Liberia, behavioural influence [25] is important for disease control as only 15.85% (13/82) of the study households had access to private latrines.
Households of high betweenness are not only cautious with respect to preventative health behaviours, but also actively seek formal medical care (Table 1 Panel 3 and Figure 2C). Medical care expenditure is defined as the total amount spent on drugs and hospital stays over the year preceding the study. We examine medical care separately from poor health behaviours, as expenditure on formal care is a distinct type of behaviour when compared to preventative health (Tables S7-S8 in File S1). These two behavioural indicators are neither collinear (VIF 1.00 for poor health behaviour and VIF 1.23 for medical care) nor correlated (Spearman coeff. 0.044, p>0.05). Betweenness is associated with a health expenditure increase (coeff. 23,085 Liberian Dollars (LD) or 296.34 USD per unit betweenness, p<0.05) in the yearly purchases of drugs and hospital payments. Again, in-degree is weakly and insignificantly related to not only the behavioural variable of poor preventative health, but also medical care expenditures (coeff. 399 LD, p>0.05).
Further, in Table 2, we control for the total number of people sick in a household. An additional person ill in the household increases yearly medical expenditures (coeff. 1,192 LD, p<0.01). Including the total people ill in a household does not affect our main result. Betweenness remains significant and positive (coeff. 22,890 LD, p<0.01). As expenditures on medical care have zero or positive values, we additionally construct Tobit models to corroborate the robustness of the betweenness and medical care relationship (Tables S9-S10 in File S1).
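For reference, the standard Tobit model for an outcome censored from below at zero can be written as follows. The notation is introduced here and is not taken from the original text: $y_i$ is the observed medical expenditure of household $i$, $x_i$ collects the regressors (betweenness, in-degree, and fixed effects), and $\Phi$ and $\phi$ are the standard normal distribution and density functions. Whether the supplementary models use exactly this specification is an assumption.

```latex
y_i^{*} = x_i^{\top}\beta + \varepsilon_i, \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^{2}), \qquad
y_i = \max(0,\, y_i^{*})

\ell(\beta, \sigma) =
  \sum_{i:\, y_i = 0} \log \Phi\!\left(-\frac{x_i^{\top}\beta}{\sigma}\right)
  + \sum_{i:\, y_i > 0} \left[
      \log \phi\!\left(\frac{y_i - x_i^{\top}\beta}{\sigma}\right) - \log \sigma
    \right]
```

The first sum accounts for households reporting no expenditure, so that the pile-up of zeros does not bias the estimated slope in the way a plain linear fit could.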

Betweenness as an indicator for susceptibility to good health behaviours
In our study, betweenness explains susceptibility to positive health behaviours rather than influence, as found in other social network contexts. [19] In our survey, we ask, "Is your decision to seek treatment or to engage in preventative health care behaviours affected by what other villagers are doing?" As can be seen in Panel 4 of Table 1 and Figure 2D, betweenness increases the probability of susceptibility to the health behaviours of other people in the village (coeff. 7.117, p<0.05) whereas in-degree does not (coeff. Importantly, receptiveness to the actions of other villagers is an indicator of engaging in good health behaviours. Susceptibility is positively correlated with good health behaviours, with a Spearman coefficient of 0.283 (p<0.01, N = 83), and negatively, but not significantly, related to poor preventative health, with a weak Spearman coefficient of -0.018 (p>0.05, N = 83). Good health behaviour is defined as a count variable of the number of good behaviours that a household head engages in on a weekly basis, including: hand-washing, use of public latrine, use of private latrine, boiling water, sleeping under a bed net, and taking packaged medicine.

Extended models of network centrality and health with socio-economic covariates
Thus far, we presented models of in-degree and betweenness for self-reported physical health and actual health behaviours using only a minimal set of regressors. Understanding that a large number of variables for a small number of observations limits the degrees of freedom and model leverage, we also construct extended models (Table 3). The additional covariates are not perfectly collinear (Table S11 in File S1). In addition to the village-level fixed effects, the extended models incorporate agricultural occupation, years of residence in a village, and social status. Further, we include a wealth indicator of home quality to control for the impact of household wealth on the health of household members.
Figure 1 (caption fragment). See Table S3 in File S1 for network construction statistics. doi:10.1371/journal.pone.0103500.g001
The additional covariates are defined as follows. Agricultural occupation is a binary variable indicating whether the main occupation of the household head was in agriculture. Social status is a binary variable and is positive if at least one household member has at least one important leadership position in the community including: elder council member, youth leader, women's leader, societal head, village chief, tribal authority, or a mining chairman. Years in village is a count variable of the total years since the household has settled in the current village. The home quality score is a count indicator with range 0 to 18. Households were asked to indicate all materials used for any part of the home, such as, "Does your home have a tarpaulin roof?" Binary indicators were recorded and converted to scores. The floor, walls, and roof of the home are rated from 1-3 and added together to form the home quality score. The materials are provided in order of low to high quality. The floor materials are earth, wood, and concrete. The wall materials are mud and sticks, zinc, and cement. The roof materials are straw/thatch, tarpaulin, and zinc. For descriptive statistics of the covariates, see Tables S3-S4 in File S1.
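One reading of the home quality score that reproduces the stated 0 to 18 range is to sum the rank of every material a household reports, since reporting all three materials for floor, walls, and roof gives 3 x (1 + 2 + 3) = 18. The sketch below follows that reading; it is an interpretation of the description above, and the names `MATERIAL_RANKS` and `home_quality_score` are hypothetical.

```python
# Ranks follow the low-to-high ordering given in the text.
MATERIAL_RANKS = {
    "floor": {"earth": 1, "wood": 2, "concrete": 3},
    "walls": {"mud and sticks": 1, "zinc": 2, "cement": 3},
    "roof": {"straw/thatch": 1, "tarpaulin": 2, "zinc": 3},
}


def home_quality_score(materials):
    """Sum the ranks of all materials reported for each part of the home.

    materials: dict mapping "floor", "walls", or "roof" to the materials
               the household reported for that part (assumed input layout).
    """
    score = 0
    for part, reported in materials.items():
        ranks = MATERIAL_RANKS[part]
        score += sum(ranks[material] for material in set(reported))
    return score


typical_home = {"floor": ["earth"], "walls": ["mud and sticks"], "roof": ["tarpaulin"]}
every_material = {part: list(ranks) for part, ranks in MATERIAL_RANKS.items()}

typical_score = home_quality_score(typical_home)
maximum_score = home_quality_score(every_material)
minimum_score = home_quality_score({})
```

Under this reading, a household with an earth floor, mud-and-stick walls, and a tarpaulin roof scores 1 + 1 + 2 = 4.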
With the additional covariates, we show that the network parameters of in-degree and betweenness still significantly explain self-reported health and health behaviour. In Panel 1A of Table 3, the log of the expected number of people ill changes by 0.297 (p<0.05) with each connection gained by a household when named by another household as a close friend. In other words, the expected number of people ill in a household increases by a factor of 1.346 (the incidence rate ratio, p<0.05) for each new household connection (Table 3 Panel 1B). We do not observe a significant relation between betweenness and total people sick in the household (coeff. 0.840, p>0.05). In Panel 2 (Table 3), a one-unit increase in betweenness results in a 4.897 (p<0.05) decrease in the log of the number of poor preventative health behaviours. By contrast, in-degree is only weakly and insignificantly related to poor preventative health (coeff. 0.032, p>0.05). Additionally, belonging to village 13247 increases the log of the expected number of poor preventative health behaviours by 0.702 (p<0.05). Furthermore, in Panel 3 (Table 3), a one-unit increase in the betweenness of a household is associated with an expenditure increase of 23,317.54 LD (p<0.01) for medical care. Also, having a household member with high social status increases medical expenditures by 2,975.19 LD (p<0.01). Lastly, in Panel 4 (Table 3), a one-unit increase in betweenness increases the probability that an individual is influenced by the health actions of other villagers by a factor of 6.813 (p<0.05). In-degree is not significantly related to being influenced by the health behaviour of other individuals (coeff. -0.062, p>0.05). Individuals belonging to village 13111 (coeff. -1.539, p<0.05) and village 13245 (coeff. -1.621, p<0.05) are less likely to be influenced by other village members.
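The link between the log-scale coefficients and the incidence rate ratios quoted for in-degree is a simple exponentiation. With $\beta$ denoting the in-degree coefficient (a symbol introduced here), each additional incoming tie multiplies the expected count of sick household members by $e^{\beta}$:

```latex
\mathrm{IRR} = e^{\beta}, \qquad
e^{0.308} \approx 1.36 \quad \text{(minimal model)}, \qquad
e^{0.297} \approx 1.346 \quad \text{(extended model)}
```

Small differences in the third decimal place relative to the reported ratios are consistent with the coefficients having been rounded before publication.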

Discussion
Social networks exhibit similar structures across developed and developing countries. [26] In this study, we ask whether social network analysis can inform healthcare provision in the developing world, and more specifically whether network centrality indicators can be used to identify self-reported health, the presence of good or bad health behaviours, and the response to positive social influence. Using data from highly remote and isolated villages in Liberia, we show that health status and health behaviour of an individual can be explained by the in-degree and betweenness of households in village friendship networks. We find that in-degree is a significant predictor of self-reported physical health, complementing past studies that show a positive correlation between in-degree and infection rates in a developed country context. [22] We also find that the heads of households with high betweenness avoid poor health behaviours and are more susceptible to social influence for good preventative health. In-degree explains exposure to environmental factors whereas betweenness explains the susceptibility to influence. In rural areas of the developing world, access to healthcare is scarce; therefore, it is important to be able to identify those who need medical help the most, to encourage good health behaviours, and to utilize positive social influence in order to reduce morbidity in these areas. Our findings show that social network analysis can offer an alternative set of observable indicators and provide a better understanding as to why these challenges persist. In the context of our study villages in Liberia, social networks reveal that the sickest households are neither likely to engage in preventative health nor likely to be susceptible to positive health influences from the community.

Data Availability Statement
All authors provide full access to the data used to perform the analysis in "Social network analysis predicts health behaviors and self-reported health in African villages." Due to the sensitive nature of the data, i.e. the involvement of human participants, identification numbers in place of household names are provided to comply with ethical requirements. Further, as social status and household variables are included and can be backtracked to participants, village names were replaced with identifiers. The data are available within the Supporting Information files. The first sheet of the Excel file contains all the regression data. The second sheet of the Excel file contains the household linkages in long form. All predictor variable names are the same as those used in the main paper; these definitions can be found in the manuscript text. The dependent variable names are shortened versions of the descriptions used in the main text. Accordingly, explanations of dependent variables are also provided in the script used for regressions. The annotated code used to execute the regressions in the statistical analysis software, Stata, is provided as a plain text file.

Supporting Information
File S1 Appendix containing two figures and eleven tables. Figure S1, Map of study villages. Figure S2, Zoomed in map of study villages. Table S1, Summary statistics of study population for count and continuous variables. Table S2, Summary statistics of study population for binary variables. Table S3, Network summary statistics. Table S4, Two-tailed, two-sample t-tests for selection biases of households included in networks. Table S5, Chi-squared tests for sample biases of households included in networks. Table S6, Collinearity diagnostics of in-degree and betweenness. Table S7, Collinearity diagnostics of dependent health variables. Table S8, Spearman correlation of dependent health variables. Table S9, Medical care and network centrality. Table S10, Medical care and network centrality with self-reported health covariate. Table S11, Collinearity tests for covariates in extended models. (DOCX)
Regressions S1 Stata regression code. (TXT)
Study Data S1 Excel file of data used in this paper. (XLS)