Social Network Analysis and Mining to Monitor and Identify Problems with Large-Scale Information and Communication Technology Interventions

The published literature reveals several arguments concerning the strategic importance of information and communication technology (ICT) interventions for developing countries where the digital divide is a challenge. Large-scale ICT interventions can be an option for countries whose regions, both urban and rural, present a high number of digitally excluded people. Our goal was to monitor and identify problems in interventions aimed at certification for a large number of participants in different geographical regions. Our case study is the training at the Telecentros.BR, a program created in Brazil to install telecenters and certify individuals to use ICT resources. We propose an approach that applies social network analysis and mining techniques to data collected from the Telecentros.BR dataset and from the socioeconomic and telecommunications infrastructure indicators of the participants’ municipalities. We found that (i) the analysis of interactions in different time periods reflects the objectives of each phase of training, highlighting the increased density in the phase in which participants develop and disseminate their projects; (ii) analysis according to the roles of participants (i.e., tutors or community members) reveals that the interactions were influenced by the center (or region) to which the participant belongs (that is, a community contained mainly members of the same region and always with the presence of tutors, contradicting expectations of the training project, which aimed for intense collaboration of the participants, regardless of the geographic region); (iii) the social network of participants influences the success of the training: that is, given evidence that the degree of the community member is in the highest range, the probability of this individual concluding the training is 0.689; (iv) the North region presented the lowest probability of participant certification, whereas the Northeast, which served municipalities with similar characteristics, presented a high probability of certification, associated with the highest degree in the social networking platform.


Introduction
Information and communication technologies (ICTs) influence how individuals, companies, and society perform their functions. There are several arguments in favor of the use of ICTs, including their strategic importance for increasing the competitiveness of business organizations [1,2]; e-commerce, e-business, and new business models [3,4]; improvement of social services and governance [5,6]; changes in public policies [7]; modernization of public management [8,9]; improvement in health care [10,11] and education systems [12]; and expansion of democratic participation [13,14].
Developing countries have a growing interest in implementing ICT intervention in urban and rural areas, either to increase the development of the country or to decrease internal inequalities. ICT intervention helps to reduce the digital divide, which is a challenge to be overcome both domestically (in a given country or region) and internationally (to address gaps between regions, countries, or continents) [15].
In spite of many reports on ICT interventions [12,16-18] in several segments of society (e.g., among elders, students, teachers, young entrepreneurs, civil servants, and residents of rural and isolated areas), without proper investigation of the impact of these initiatives, it is difficult to determine whether the intervention has been successful. Thus, monitoring and evaluating ICT interventions, and even measuring the digital divide, are fundamental for helping managers, researchers, and professionals to make decisions for overcoming the challenges often posed by endemic problems in a region or country. There are many indications that, in developing countries, these endemic problems hinder both the completion of technological innovations and the realization of expected benefits [1,19].
In the literature, we can find successful investigations that evaluate the impact of ICT interventions at the micro-level; that is, they were performed with reference to the beneficiaries of the actions in their local context [1,12,16,18,20-22]. These investigations [12,16,20,22] were all carried out by means of questionnaires and/or interviews and were applied to a small number of respondents, thus favoring a detailed view of the impact of ICT interventions within a particular social context. Such an approach, which is costly in terms of physical and human resources, and which would require a long time to complete, is inappropriate for interventions in an extensive geographical territory. However, large-scale assessments are often standardized and do not consider the different conditions of access or the socioeconomic and cultural levels of the participants. In short, a gap exists in the strategies, models, or frameworks necessary for evaluating the impact and monitoring of large-scale ICT interventions. Our research aims to fill the gap in the monitoring of large-scale training programs for digital inclusion.
The objective of the present study is to monitor and identify problems in a large-scale training project throughout its execution. An approach based on Social Network Analysis (SNA) is applied to the data from the training at Telecentros.BR [23], which certifies individuals in using ICT resources. Our analysis relates the influence of the social network of participants to the success of the training, considering local indicators of the telecommunications infrastructure and the socio-economic conditions.

Setting and Methods
The Brazilian government funded the Telecentros.BR Program [23] at the national level as part of a public policy for digital inclusion in Brazil in different regions of the country (North, Northeast, South, Southeast, and Central-West). This program is focused on installing telecenters and training individuals to disseminate and use ICT in these areas. Community members who can assume the role of digital inclusion agents use the telecenters, assisting others in learning to use ICT resources as a means to improve the social conditions of their communities. For this, Telecentros.BR provides training that uses a learning platform along with online social networks. Tutors train community members to act as digital inclusion agents in a model originally designed to qualify approximately 16,000 individuals. This program is ideal for our case study because of its large scale in relation to the geographic scope and number of participants involved, with each region having a training facility, known as a "center," which is responsible for training in its region.
The Telecentros.BR training is structured in phases, with the goals for each phase as follows: in phase 1, to become familiar with the learning platform; in phase 2, to understand and discuss relevant issues (e.g. basic computing, digital inclusion in communities, communication and sharing in networks, digital culture, e-waste, open and free software); and in phase 3, to formulate, carry out, and achieve visibility for projects involving the communities around the telecenters.
To monitor this intervention, the following stakeholders were identified: community members, who are the beneficiaries of the intervention, and tutors, the agents who diffuse the use of ICT resources among community members (e.g., use of e-gov services, digital content, online social networks). ICT interventions occur in locales where community members live (e.g., neighborhood, city, state, country).

Data sources
We used the Telecentros.BR dataset to collect information on the participants' training. This dataset was obtained directly from the implementing organization as an anonymized set of records over an eleven-month period between 2011 and 2012, and full permission was granted by the Federal University of Pará (Universidade Federal do Pará) for its use in our analysis. To ensure privacy, participants' individual data were anonymized by the implementing organization. In the anonymized dataset, a hashed ID identifies each participant. The dataset contains information with respect to the following: (i) exchange of instant messages in the learning platform; (ii) attributes of participants; (iii) participant evaluations; (iv) telecenter where the community member acts.
The dataset contains information about the exchange of instant messages in the learning platform but does not contain the content of the text messages. Thus, each record represents an online interaction in the following format: <id_sender, id_receiver>. Looking at both the id_sender and id_receiver allows us to study the relationships in the social network, which is constructed by placing an edge between the nodes that represent two participants whenever those participants exchange messages. The social network has 4,382 nodes and 104,831 edges.
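The construction of the network from interaction records can be illustrated with a minimal sketch. It assumes that records are available as (id_sender, id_receiver) pairs, that repeated messages between the same pair collapse into one undirected edge, and that self-messages are ignored; the function names and the sample IDs are hypothetical and are not part of the Telecentros.BR dataset.

```python
def build_network(records):
    """Return an adjacency map (node -> set of neighbours) from message records."""
    adjacency = {}
    for sender, receiver in records:
        if sender == receiver:
            continue  # assumption: a message to oneself creates no edge
        adjacency.setdefault(sender, set()).add(receiver)
        adjacency.setdefault(receiver, set()).add(sender)
    return adjacency


def count_edges(adjacency):
    """Each undirected edge appears in two neighbour sets, so halve the total."""
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


# Hypothetical hashed IDs, for illustration only.
records = [("a1", "b2"), ("b2", "a1"), ("a1", "c3"), ("c3", "d4"), ("a1", "b2")]
network = build_network(records)
print(len(network), "nodes;", count_edges(network), "edges")
```

Storing neighbours in sets makes the repeated a1-b2 messages count as a single edge, which matches the description of an edge being placed whenever two participants exchange messages.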
The Telecentros.BR dataset also contains certain attributes of the participants as described in Table 1.
Additionally, we used anonymized information from the evaluation system of the Telecentros.BR program. This system contains descriptions from the tutor with respect to the performance of a community member at different phases of training.
Because of the lack of data sources for indicators by district, we considered the municipalities as the "locale." Thus, for each municipality in the Telecentros.BR dataset, we collected education and income indicators from the Atlas of Human Development in Brazil (Atlas do Desenvolvimento Humano no Brasil; http://www.atlasbrasil.org.br). These indicators are components of the Municipal Human Development Index (MHDI) and are classified by the Institute for Applied Economic Research (Instituto de Pesquisa Econômica Aplicada) [24]. In addition, we obtained information about low-cost Internet access with a speed of at least 1 megabit per second in Brazilian municipalities from the Brazilian Communication Ministry (Ministério das Comunicações do Brasil; http://www.mc.gov.br/DSCOM/view/Principal.php). The information on households that have computers with Internet access in Brazilian municipalities was collected from the Demographic Census of the Brazilian Institute of Geography and Statistics (Instituto Brasileiro de Geografia e Estatística-IBGE; http://loja.ibge.gov.br/populacao/universo.html). Table 2 presents the categories and ranges of values for these indicators.
We also used the georeferenced cartographic database of the Brazilian municipalities that is freely available online in SHP format (shapefile) from the Brazilian Institute of Geography and Statistics (http://downloads.ibge.gov.br/downloads_geociencias.htm).

Designing the analysis
Our approach applies the techniques of social network analysis to monitoring and analyzing the outcomes of ICT intervention. The links between individuals are the foundation of our approach because these individuals are in the initial stage of technological appropriation. Thus, we applied our approach to the records of online interaction between participants of the Telecentros.BR training (i) to the time facet, to obtain indicators of the evolution of participation during the training in different moments; (ii) to the space and role facets, to analyze the interactions according to geographic region and the roles of the participants; and (iii) to the individual facet, to favor the identification of actors engaged in training or the identification of variables associated with low participation. Fig 1 shows an overview of our SNA-based approach.
At initialization, the data were prepared and integrated, including the social network of the participants. At this point, we attempted to identify those actors whose number of relationships was not sufficient for a meaningful analysis. In fact, a significant number of community members did not complete the first phase of the training, which was to become familiar with the learning platform. For this, we used the degree centrality metric (or simply degree) as a measure of network activity for a node. Degree is defined as the number of links incident upon a node, that is, the number of ties that a node has [25]. We proposed setting a minimum degree by which the network under analysis could be reduced so that the actors discarded in this step might be directed to the microanalysis in the step "select actors for context analysis." For the time facet, we defined the moments of observation (times) according to the three different phases of the training. Thus, one subnetwork was extracted for each time period, and the analysis was performed on the evolution of the structural measurements of degree centrality [26] and density, expressed as a proportion of the maximum possible number of lines [27,28]. The purpose was to verify whether the training, over time, tends to reduce or to intensify the interactions between participants.
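The two structural measurements and the minimum-degree reduction can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run in the study: the network is an undirected adjacency map, degree centrality is normalized by n - 1, and the reduction is a single pass (nodes are not removed repeatedly until the threshold stabilizes). All function names and the toy network are hypothetical.

```python
def degree_centrality(adjacency):
    """Normalized degree: number of ties divided by the n - 1 possible ties."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbs) / (n - 1) for node, nbs in adjacency.items()}


def density(adjacency):
    """Number of lines as a proportion of the maximum possible number of lines."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    lines = sum(len(nbs) for nbs in adjacency.values()) / 2
    return lines / (n * (n - 1) / 2)


def reduce_by_min_degree(adjacency, min_degree):
    """Keep only nodes whose degree reaches min_degree (single pass, an assumption)."""
    kept = {node for node, nbs in adjacency.items() if len(nbs) >= min_degree}
    return {node: adjacency[node] & kept for node in kept}


# Toy network: "e" never interacted and is set aside for context analysis.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}
reduced = reduce_by_min_degree(toy, min_degree=1)
print(density(toy), density(reduced), degree_centrality(toy)["c"])
```

In the toy network, discarding the isolated node raises the density because the number of possible lines shrinks while the number of observed lines stays the same.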
Table 1. Categories of attributes and descriptions.
Participant's function: Represented by the attribute role, which identifies the participant's role in the learning platform and can assume one of the following values: tutor, community member, or not identified.
Regional training center: Represented by the attribute center; it identifies the center (and region) where the participant performs the training. It can assume one of these values: North, South, Central-West, Northeast, Ceará, Southeast, and São Paulo. Training managers belong to the Coordination center, which does not represent any specific region of the country.
Locale: Represented by the attribute municipality, which identifies the municipality where the participant lives.
Certification: Represented by the attribute certified. It identifies whether the participant has successfully concluded the training and can assume the values "yes" or "no."

Table 2. Categories and ranges of values for the indicators.
National Broadband Plan: Represented by the attribute national_broadband_plan with respect to satisfying the goal of the government for popular broadband, providing Internet access at a low cost and a speed of at least 1 megabit per second. Values: "Yes," if the municipality is attended; "No," if the municipality is not attended.
Households with a computer with Internet access: Represented by the attribute households_internet, this value is the percentage of households that have a computer with Internet access in the municipality. This value is determined by X / Y * 100, where X is the number of households in the municipality that answered "Yes" to the 2010 census question on "owning a personal computer and Internet access," and Y is the total number of households in the municipality.

Because the role of the participant in training can influence the social network, we used the role attribute to define subgroups for identifying and measuring the interactions between tutors and community members. This classification or clustering of the vertices in the network is done so that each vertex is assigned to exactly one cluster [28], and subnetworks for each time period can be extracted from them. This approach allows extraction of structural measurements only within a particular role for comparison with measurements extracted from another role. In addition to the density and degree centrality, we used the closeness centrality [28], which focuses on the closeness of the participant to other participants. The closeness centrality of a vertex is based on the total distance between one vertex and all other vertices, whereby larger distances yield lower closeness centrality scores [28]. The closer a vertex is to all other vertices, the more easily information may reach it and the higher its centrality. In addition to structural measurements, we used the Watts-Strogatz (WS) clustering coefficient [29] to quantify the connectivity of actors, considering the tutor and community member subgroups. Subgroups based on the attribute center were also defined for supporting visual representations of the networks of tutors and community members. Visualization of the networks according to centers contributes to identifying actors with more or fewer links to other regions.
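Closeness centrality and the WS clustering coefficient can be illustrated with a short sketch. It assumes an undirected, unweighted adjacency map. Two conventions are our own assumptions and may differ from the software used in the study: closeness is computed over the vertices reachable from the source, and a vertex with fewer than two neighbours receives a local clustering value of 0. The function names and the toy network are hypothetical.

```python
from collections import deque


def closeness_centrality(adjacency, source):
    """Reachable vertices divided by the sum of shortest-path distances to them."""
    distances = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest paths in unweighted networks
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb not in distances:
                distances[nb] = distances[node] + 1
                queue.append(nb)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


def local_clustering(adjacency, node):
    """Share of pairs of neighbours of `node` that are themselves linked."""
    nbs = list(adjacency[node])
    k = len(nbs)
    if k < 2:
        return 0.0  # assumption: undefined cases count as zero
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbs[j] in adjacency[nbs[i]]
    )
    return 2 * links / (k * (k - 1))


def ws_clustering_coefficient(adjacency):
    """Average of the local clustering values over all vertices."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Toy network: a triangle a-b-c with a pendant d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(closeness_centrality(toy, "c"), closeness_centrality(toy, "d"))
print(ws_clustering_coefficient(toy))
```

In this toy network, vertex c is one step from every other vertex and therefore has the highest closeness, while the pendant d lowers the average clustering because it has no pair of neighbours.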
One of the goals of the Telecentros.BR training is that as community members appropriate ICT resources, they will be able to build social networks by sharing day-to-day problems and solutions in the telecenters. In this sense, analyses of the centers are used to understand whether interactions between community members are constructed independent of geographical regions. For this purpose, we used a community detection algorithm based on a modularity quality function known as the Louvain method [30,31], which is a heuristic algorithm based on modularity optimization [32].
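To make the modularity quality function concrete, the sketch below computes the modularity of a given partition of an undirected, unweighted network with a resolution of 1. It does not implement the Louvain heuristic itself, which searches for a partition that increases this value; the function name and the toy network are hypothetical.

```python
def modularity(adjacency, membership):
    """Q = sum over communities of (L_c / m) - (d_c / 2m)^2, with resolution 1.

    L_c is the number of lines inside community c, d_c the sum of the degrees
    of its vertices, and m the total number of lines.
    """
    m = sum(len(nbs) for nbs in adjacency.values()) / 2
    if m == 0:
        return 0.0
    inside = {}
    degree_sum = {}
    for node, nbs in adjacency.items():
        community = membership[node]
        degree_sum[community] = degree_sum.get(community, 0) + len(nbs)
        for nb in nbs:
            if membership[nb] == community:
                # every inside line is visited from both of its endpoints
                inside[community] = inside.get(community, 0) + 0.5
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items()
    )


# Toy network: two triangles joined by the single line 3-4.
toy = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
by_triangle = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
single_group = {node: "A" for node in toy}
print(modularity(toy, by_triangle), modularity(toy, single_group))
```

Splitting the toy network into its two triangles yields a positive modularity, whereas placing every vertex in one community yields zero, which is why a modularity-optimizing method prefers the split.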
Community detection is well studied in the literature; many different community detection algorithms have been presented in social network analysis literature, and a good survey of these algorithms can be found in Fortunato [33]. In our approach, we compared the automatically detected communities with the clusters pre-assigned by center. We used two types of statistical indices to analyze the association between clusters found by the Louvain method and the clusters pre-assigned by center: Cramer's V and Rajski's information index [28]. We also used visual representations and dynamic measurements to quantify the connectivity of actors in clusters. Reduction methods were used to understand the network structure and to identify the roles of actors within the communities found.
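The comparison of two classifications of the same vertices can be sketched as follows. Cramer's V is computed from the chi-square statistic of the cross-tabulation. For Rajski's index we assume one common information-theoretic formulation, the mutual information divided by the joint entropy (the symmetric variant); the exact variant reported by the software used in the study is not stated in the text, so this choice is an assumption. All names and labels below are hypothetical.

```python
from collections import Counter
from math import log2, sqrt


def cramers_v(labels_a, labels_b):
    """Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    rows, cols = Counter(labels_a), Counter(labels_b)
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


def entropy(counts, n):
    """Shannon entropy (in bits) of a frequency distribution."""
    return -sum((c / n) * log2(c / n) for c in counts if c)


def rajski_symmetric(labels_a, labels_b):
    """Assumed symmetric variant: mutual information / joint entropy."""
    n = len(labels_a)
    h_a = entropy(Counter(labels_a).values(), n)
    h_b = entropy(Counter(labels_b).values(), n)
    h_ab = entropy(Counter(zip(labels_a, labels_b)).values(), n)
    if h_ab == 0:
        return 0.0  # degenerate case: both classifications are constant
    return (h_a + h_b - h_ab) / h_ab


# Hypothetical labels: center per vertex versus detected cluster per vertex.
center = ["North", "North", "North", "Northeast", "Northeast", "Northeast"]
detected = [1, 1, 1, 2, 2, 2]
print(cramers_v(center, detected), rajski_symmetric(center, detected))
```

Both indices equal 1 when one classification fully determines the other and 0 when the two are independent. Cramer's V relies on expected cell counts, which is consistent with the concern that sparse cross-tabulations make it less reliable.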
In view of the context analysis, ranges of values for the extracted measurements were defined, and actors were classified according to these ranges. Community members and centers were placed within the top range of degree centrality and of the simplest measure of prestige, the input degree (indegree), also known as a measure of popularity [34], that is, a count of the number of ties directed to the node. This strategy can be used to support the selection of new tutors; however, as the degree may be influenced by regional characteristics or the center's performance in using the social networking platform, we performed the same classification process for the actors within each center.
For the set of actors with a degree equal to zero and for actors who stopped interactions during the course, we tested some qualitative hypotheses, such as the presence of inappropriate telecommunications infrastructure and sustainability in telecenters, lack of personal interest, and weak training from the center in using the learning platform. Given that we had no access to the participants, the anonymized information from the Telecentros.BR evaluation system served to support our tests of some of these hypotheses.
The training was conducted in different municipalities with certain indicators (for example, socio-economic and infrastructural) that could influence variables in participation (degree) and success in training (certified). Thus, to study the association between these variables, we advanced from the SNA-based approach to applying the Bayesian networks technique, also referred to as causal networks or graphic models of probabilistic dependence. Bayesian networks are models that encode probabilistic relationships among variables that represent a certain domain. These models include both a qualitative and a quantitative structure. The qualitative structure represents dependencies between nodes (variables), while the quantitative structure represents the conditional probabilities of these nodes. The idea is to evaluate the nodes in probabilistic terms [35,36] and to provide a compact and easy-to-use representation of the probabilistic information from the data. We used the K2 heuristic search algorithm [37] to find the most probable Bayesian network structure within the search space. This network structure is an effective way to communicate dependencies among the domain variables.
Based on the attributes of municipalities where the interventions occur (mhdi_income, mhdi_education, national_broadband_plan, households_internet), degree centrality (degree), region (center), and certification of community members (certified), we created a dataset as an input file for the Bayesian analysis. Once the network was established, the posterior distribution of the parameters was estimated by statistical inference.
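The kind of conditional probability that such a network reports can be illustrated with a minimal sketch that estimates it by relative frequency from a table of cases. This is neither the K2 structure search nor the posterior estimation used in the study; it only shows what a statement such as "the probability that certified = yes given evidence about degree" means. The helper name and the rows are invented for illustration and are not taken from the dataset.

```python
def conditional_probability(rows, target, target_value, evidence):
    """Relative-frequency estimate of P(target = target_value | evidence)."""
    matching = [
        row for row in rows
        if all(row[name] == value for name, value in evidence.items())
    ]
    if not matching:
        return None  # no case agrees with the evidence
    hits = sum(1 for row in matching if row[target] == target_value)
    return hits / len(matching)


# Invented cases, one per community member (illustration only).
rows = [
    {"center": "Northeast", "degree": "high", "certified": "yes"},
    {"center": "Northeast", "degree": "high", "certified": "yes"},
    {"center": "Northeast", "degree": "low", "certified": "no"},
    {"center": "North", "degree": "high", "certified": "yes"},
    {"center": "North", "degree": "high", "certified": "no"},
    {"center": "North", "degree": "low", "certified": "no"},
    {"center": "North", "degree": "low", "certified": "no"},
]
print(conditional_probability(rows, "certified", "yes", {"degree": "high"}))
print(conditional_probability(rows, "certified", "yes", {"center": "North"}))
```

A learned Bayesian network answers the same type of query, but it does so through the conditional probability tables attached to its structure, which allows evidence on several variables to be combined even when few cases match all of them at once.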
To support the discussion, we used QGIS software version 2.6.0-Brighton (http://qgis.org) to show the geographic distribution for the percentage of households that have computers with Internet access in the Brazilian municipalities.

Initialization
The integration of data sources revealed that the social network contained 4,382 participants for the duration of the training program (11 months). Next, to make the analysis more relevant, we considered only the nodes having a minimum degree centrality equal to one (minimum node degree = 1); pendants were deleted. Thus, the analysis of the reduced network began with 2,303 nodes.

Time facet
We determined three different time periods: months 1-2 (#1), months 3-6 (#2), and months 7-11 (#3) after the beginning of the training. These time periods are determined by training phases, such that #1 corresponds to phase 1, #2 corresponds to phase 2 and #3 corresponds to phase 3. For these moments, we computed the density and degree centrality indices (Table 3).
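The extraction of one subnetwork per time period can be sketched as follows. The sketch assumes that each record carries the month of training in which the message was sent, a field we introduce for illustration because the record format described above lists only sender and receiver. It also assumes that the density of a period is computed over the participants active in that period. The month boundaries follow the three periods defined in the text; all names and records are hypothetical.

```python
# Months of training covered by each period (#1, #2, #3), as defined in the text.
PHASES = {1: range(1, 3), 2: range(3, 7), 3: range(7, 12)}


def phase_of(month):
    for phase, months in PHASES.items():
        if month in months:
            return phase
    raise ValueError(f"month {month} is outside the 11-month training")


def phase_edges(timed_records):
    """Group undirected edges by the period in which the messages were sent."""
    edges = {phase: set() for phase in PHASES}
    for month, sender, receiver in timed_records:
        if sender != receiver:
            edges[phase_of(month)].add(frozenset((sender, receiver)))
    return edges


def density(edge_set):
    """Density over the participants that appear in the period (an assumption)."""
    nodes = set().union(*edge_set)
    n = len(nodes)
    return len(edge_set) / (n * (n - 1) / 2) if n > 1 else 0.0


def percent_change(old, new):
    return (new - old) / old * 100


# Hypothetical (month, id_sender, id_receiver) records.
timed_records = [
    (1, "a", "b"), (2, "c", "d"),
    (4, "a", "b"), (4, "a", "c"), (5, "b", "c"), (6, "c", "d"),
    (8, "a", "b"), (9, "a", "c"), (9, "b", "c"),
    (10, "c", "d"), (11, "a", "d"), (11, "b", "d"),
]
edges_by_phase = phase_edges(timed_records)
densities = {phase: density(edges) for phase, edges in edges_by_phase.items()}
print(densities)
print(percent_change(densities[1], densities[3]))
```

The percentage change relative to the first period is the form in which density differences between periods are usually reported, for example as one network being a given percentage denser than another.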
In structural terms, the network in #2 is 98% denser than the network in #1, and the network in #3 is 153% denser than the network at the beginning of the training (#1). The increases in density and centrality can be explained by the promotion of physical meetings that stimulated spontaneous interactions through instant messages. This was a desirable development in terms of training progress, given the structure and objectives of each phase of the course.

Space and role facets
Subgroups were defined to analyze the interactions according to the roles played by the participants (i.e., tutor or community member) or their center, which also indicated the geographical regions of the country. For role subgroups, we generated structural and dynamic indices for the three different moments (Table 4).
The creation of clusters for individuals according to their role allowed us to analyze the interactions between peers, excluding interactions in the hierarchical relationship between tutors and community members. During the training process, community members developed projects involving communities around the telecenters. Interactions among participants were encouraged to promote development, with the aim being to map existing problems and to articulate solutions that would enhance the use of resources and increase community participation.
In structural terms, in the three analyzed time periods, the network of community members proved to be less dense than the networks of both the tutors and all the participants. The high density of the network of tutors is likely because they met physically more often than did community members, who were involved in the online training most of the time. Nevertheless, the network of community members increased more in #3 in relation to #1 when compared to the network of tutors during the same time period. This result was expected because phase 3 was intended to encourage interactions in order to give visibility to the project developed by the participants.
The closeness centrality (average), which focuses on the closeness of a participant in relation to other participants, is lower in the network of community members than in the network of tutors; this result was expected because many of the tutors were acquainted with each other, unlike the community members. However, when we considered the evolution of this index between #1 and #3, we verified that it increased in the network of community members while decreasing in the network of tutors. This was a desirable result because the training was intended to bring the community members closer to each other to help solve the day-to-day problems of telecenters.
Among the dynamic measures, the WS clustering coefficient for the network of tutors is higher than that for community members. These higher values in the network of tutors reflect a high degree of transitivity that can be explained by the existing relationships among tutors of the same center. The groupings of tutors and community members in the same region are visualized in Figs 2 and 3, respectively. The individuals are grouped by centers (clusters), which are identified by different colors.
Analyses employing the center are used to understand whether the interactions between community members are constructed independently of geographical regions. For this, we used a community detection algorithm based on a modularity quality function known as the Louvain method.
From the visualization, we used an automatic community detection method to identify whether the interactions between participants were constructed independently of geographical regions. For the network with all participants during the 11 months of training, we applied the Louvain method with a resolution parameter equal to 1, finding 30 clusters (ID-1 to ID-30) with modularity equal to 0.8094. Using Cramer's V statistical index to analyze the association between the output of the Louvain method (30 clusters) and the clusters pre-assigned by center (9 clusters), we obtained the value 0.7375, which indicates a strong association. However, because the cross-tabulation contains many cells that are (nearly) empty, this index is not very reliable. Thus, we used Rajski's information index to measure the degree of association between the two classifications. The strongest correlation (0.7413) of Rajski's index indicated that the classification of the cluster by center could be predicted by classifying with the Louvain method. For example, in cluster ID-1, found by the Louvain method, 96.8% of the nodes are from the North, and in cluster ID-3, 72% are from the Northeast. Table 5 presents the percentage of nodes from the clusters pre-assigned by center (9 clusters) that coincides with the output of the Louvain method (30 clusters).
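Percentages such as "96.8% of the nodes in cluster ID-1 are from the North" come from a cross-tabulation of the two classifications. The sketch below computes, for each detected cluster, the share of its nodes that belongs to each center. It assumes two mappings from node to label; the function name, node IDs, and labels are hypothetical, and the direction of the percentages (within each detected cluster) is our reading of the example in the text.

```python
from collections import Counter, defaultdict


def cluster_composition(center_of, cluster_of):
    """For each detected cluster, the percentage of its nodes from each center."""
    counts = defaultdict(Counter)
    for node, cluster in cluster_of.items():
        counts[cluster][center_of[node]] += 1
    return {
        cluster: {
            center: 100 * count / sum(centers.values())
            for center, count in centers.items()
        }
        for cluster, centers in counts.items()
    }


# Hypothetical classifications of six nodes.
center_of = {
    "n1": "North", "n2": "North", "n3": "North",
    "n4": "Northeast", "n5": "Northeast", "n6": "Northeast",
}
cluster_of = {"n1": 1, "n2": 1, "n3": 1, "n4": 1, "n5": 3, "n6": 3}
composition = cluster_composition(center_of, cluster_of)
print(composition)
```

A detected cluster dominated by a single center indicates that interactions followed regional boundaries, which is the pattern the cross-tabulation is meant to reveal.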
The largest cluster (ID-3), with 414 nodes, presented the lowest WS clustering coefficient (0.2335). Among the 14 clusters with a WS clustering coefficient above the average (0.6178), we found 12 clusters with a predominance of members from the Northeast, one with a predominance of members from Ceará, and one with a predominance of members from São Paulo.
We next used some reduction methods to understand the network structure and to identify the roles of actors within the clusters found by the Louvain method. The subnetwork extraction for each cluster offers a local view that includes the roles and centers of actors. Using this local view, we shrank the community members into one new vertex, which favors the analysis of the relation between tutors and community members. We also observed that tutors are present in the networks extracted from clusters ID-1 to ID-28.
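Shrinking a class of vertices into one new vertex can be sketched as follows. The sketch assumes an undirected adjacency map and a mapping from vertex to role. Lines among the shrunk vertices would become loops and are dropped, which is our assumption. The function name, the label of the new vertex, and the toy cluster are hypothetical.

```python
def shrink_role(adjacency, role_of, role, new_vertex):
    """Replace every vertex with the given role by a single new vertex."""

    def rename(node):
        return new_vertex if role_of[node] == role else node

    shrunk = {}
    for node, nbs in adjacency.items():
        source = rename(node)
        shrunk.setdefault(source, set())
        for nb in nbs:
            target = rename(nb)
            if target != source:  # assumption: loops created by shrinking are dropped
                shrunk[source].add(target)
    return shrunk


# Toy cluster: two tutors (t1, t2) and three community members (m1, m2, m3).
cluster = {
    "t1": {"m1", "m2", "t2"},
    "t2": {"t1", "m3"},
    "m1": {"t1", "m2"},
    "m2": {"t1", "m1"},
    "m3": {"t2"},
}
role_of = {"t1": "tutor", "t2": "tutor", "m1": "member", "m2": "member", "m3": "member"}
local_view = shrink_role(cluster, role_of, role="member", new_vertex="MEMBERS")
print(local_view)
```

After shrinking, the remaining lines show directly which tutors are connected to the body of community members in the cluster and which tutors are connected to each other.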

Individual facet
Considering the beneficiaries of the action (selected role = "community members") with the highest level of participation, we focused on actors with the highest value of degree centrality and degree prestige. Table 6 presents the ranges of degree centrality and degree prestige, respectively. These ranges were defined according to the frequency of states of the values of these indices. The states of the variables represent the possible values that a variable can assume.
Thus, to analyze the participants with the highest degree centrality, we considered only the top range (25%) with 524 actors. The same process was applied to degree prestige, that is, considering the 526 actors in the top range (25%). When both lists were combined and the repeated entries removed, the final list included 572 actors. This experiment was very useful for identifying actors who would be able to act as tutors in new training courses. It also served as a strategy to clarify the importance of interactions in large-scale digital inclusion training programs to parties who invest in ICT interventions based on telecenters.
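The selection of actors in the top range of either index can be sketched as follows. The sketch assumes directed records of the form (id_sender, id_receiver), takes degree as the number of distinct partners and prestige as the number of distinct senders directed to a node, and defines the top range as the top 25% by rank with ties broken by ID. The ranges in the study were defined from the frequency of the values, so this ranking rule is a simplifying assumption, and all names and records are hypothetical.

```python
from math import ceil


def degree_and_prestige(records):
    """Degree: distinct partners. Prestige: distinct senders directed to the node."""
    partners, senders = {}, {}
    for sender, receiver in records:
        if sender == receiver:
            continue
        partners.setdefault(sender, set()).add(receiver)
        partners.setdefault(receiver, set()).add(sender)
        senders.setdefault(receiver, set()).add(sender)
        senders.setdefault(sender, set())  # make sure every node has an entry
    degree = {node: len(p) for node, p in partners.items()}
    prestige = {node: len(s) for node, s in senders.items()}
    return degree, prestige


def top_range(scores, fraction=0.25):
    """Nodes in the top `fraction` by score (ties broken by ID, an assumption)."""
    size = ceil(len(scores) * fraction)
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return set(ranked[:size])


# Hypothetical directed records among eight participants.
records = [
    ("a", "b"), ("c", "b"), ("d", "b"), ("a", "c"),
    ("a", "d"), ("e", "a"), ("f", "g"), ("h", "g"),
]
degree, prestige = degree_and_prestige(records)
top_degree = top_range(degree)
top_prestige = top_range(prestige)
candidates = top_degree | top_prestige  # union removes the repeated entries
print(sorted(candidates))
```

Because the two lists overlap, their union is smaller than the sum of their sizes, which is the same effect that reduces the combined list in the text to fewer actors than the two top ranges added together.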
Among these 572 participants, we found 416 community members from the Northeast but a low number of members from the other regions. This result may be influenced by regional characteristics or by different training strategies in the social networking platform. Thus, we considered the ranges of values for each center. This strategy promotes analysis by training center, aiming to minimize the influence of inequality between regions, because it delimits the scope of the geographic area in question. Indeed, for the North we found other ranges of values (Table 7) and 71 actors in the top range of degree centrality or degree prestige. The strategy of using clusters to identify actors in the regions was shown to be relevant because it allows us to reduce the scope of the analysis. We think that this strategy may lend itself to even greater refinement of the analysis if we add clusters by municipalities or districts.
As described in the initialization section, the number of actors without any interaction was significant. Additionally, we identified 152 nodes that did not create new interactions from time #1 to time #2 and 256 nodes from time #2 to time #3. For these cases, we offer some hypothetical reasons for lack of interaction: inappropriate telecommunications infrastructure in the telecenter; problems of sustainability, such as telecenters that cannot maintain their infrastructure (energy, water, and maintenance); weak training provided by the center in the learning platform; low sociocultural identification with the program; lack of interest in the community or from the managers, resulting in partial or total closure of the telecenter; lack of personal interest; difficulty in delivery and replacement of equipment, especially in isolated communities in rural areas or on the banks of rivers, such as those of Amazonian peoples. To test some of these hypotheses, we resorted to the analyses of participant performance as conducted by tutors. We found reports about community members who, although registered in the learning platform, did not initiate training because the telecenter did not receive the equipment or because, though it received equipment, the latter was not operational. Such was the case for the telecenter in the town of Pacajá (population 41,000 and 600 km away from Belém, the capital of the State of Pará, North of Brazil). It took months to deliver the machines because of weather conditions, problems with transport, and theft of machines by pirates (i.e., Amazon River thieves). In fact, owing to these challenges, the original goal of installing telecenters was not achieved in several municipalities, as revealed in a report prepared by members of civil society: from a total of 8,083 expected telecenters, only 1,193 came into operation, and 2,800 received the equipment without its being installed [38].

Table 5. Cross-tabulation (%) between the clusters (Louvain method and pre-assigned by center). Rows: cluster ID (pre-assigned by center); columns: cluster ID (Louvain method).

We also found 206 reports about low participation of community members because of unstable connections or low Internet speed in telecenters: 96 in the North; 40, Northeast; 34, Ceará; 13, Central-West; 10, São Paulo; 8, South; and 5, Southeast. As one example, the telecenter in the indigenous Zoró, which is located in the countryside of the state of Rondônia in northwestern Brazil, does not have Internet connections. Therefore, to access the online learning platform, the community member went to a neighboring town to use an Internet cafe.
The tutors' analyses of participants' performance also showed a significant number of reports (149) that revealed a lack of interest on the part of the participants, with 72 in the North; because we had no access to the participants, we could not investigate the reasons for this. However, as the training was performed in different municipalities, the variables of infrastructure and socioeconomic characteristics may have been influential, which is treated in the analysis of scenarios presented in the next section.

Analysis of scenarios
To set up the learning of the Bayesian network structure, we used the dataset composed of the variables mhdi_education, mhdi_income, center, households_internet, degree, national_broadband_plan, and certified. The associations allowed us to measure, in probabilistic terms, the influence of the degree of the participant on finishing the training: given evidence that the degree of a community member is in the highest range (over 0.0166), the probability that this individual will finish the training is 0.689 (Fig 6). However, this probability falls to 0.166 when there is evidence that the degree is in the lower range (degree < 0.0015).
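The kind of query described above can be illustrated with a minimal sketch. It is not the procedure used in the study: it estimates P(certified = yes | degree range) by plain relative frequency, whereas a learned Bayesian network computes the posterior through its factorized structure and may yield different values. The two cutoffs are those quoted in the text; the single "middle" bin, the function names, and the toy records are assumptions introduced only for illustration.

```python
from collections import Counter

# Thresholds quoted in the text; collapsing everything in between into one
# "middle" bin is an assumption made only for this illustration.
LOW_CUTOFF = 0.0015
HIGH_CUTOFF = 0.0166


def degree_range(degree):
    """Map a normalized degree value to a coarse range label."""
    if degree < LOW_CUTOFF:
        return "low"
    if degree > HIGH_CUTOFF:
        return "high"
    return "middle"


def p_certified_given_range(rows, range_label):
    """Relative-frequency estimate of P(certified = yes | degree range)."""
    outcomes = [certified for degree, certified in rows
                if degree_range(degree) == range_label]
    if not outcomes:
        raise ValueError(f"no records in range {range_label!r}")
    return Counter(outcomes)["yes"] / len(outcomes)


# Hypothetical toy records (degree, certified); NOT data from the study.
toy_rows = [
    (0.0200, "yes"), (0.0180, "yes"), (0.0170, "no"),
    (0.0010, "no"), (0.0005, "no"), (0.0012, "yes"), (0.0000, "no"),
    (0.0050, "yes"),
]

p_high = p_certified_given_range(toy_rows, "high")  # 2 of 3 toy records
p_low = p_certified_given_range(toy_rows, "low")    # 1 of 4 toy records
print(f"P(certified=yes | high degree) = {p_high:.3f}")
print(f"P(certified=yes | low degree)  = {p_low:.3f}")
```

With real data, the same comparison between the highest and lower degree ranges would be read from the network's conditional probability tables rather than from raw counts.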
When we consider the socioeconomic (mhdi_education and mhdi_income) and infrastructural (households_internet and national_broadband_plan) indicators of municipalities, we observe that in the North and Northeast regions, the most frequent scenario is for municipalities not to have the National Broadband Plan; these regions also have the lowest percentage of households with a computer that has Internet access, as well as the lowest MHDI income and MHDI education. For example, given the evidence that the region is the Northeast, the probability that a municipality in this region will have a high percentage of households that have a computer with Internet access (households_internet > 31.57%) is 0.099; the probability that national_broadband_plan = no is 0.871; the probability that mhdi_education = low or mhdi_education = very low is 0.805; and the probability that mhdi_income ≤ 0.699 (low, very low, medium) is 0.872. However, when the evidence indicates that the municipality belongs to the North region, the probability of the highest percentage of households having a computer with Internet access is 0.0; the probability that national_broadband_plan = no increases to 0.917; the probability that mhdi_education = low or mhdi_education = very low is 0.574; and the probability that mhdi_income ≤ 0.699 (low, very low, medium) is 0.715. For the highest range of degree, the probability associated with an individual from the Northeast (0.739) is greater than that for a participant from the North region (0.067). In addition, the Northeast has a higher probability that certified = yes (0.419) than does the North (0.273). We think this result is evidence that the social networking platform is most widely used in the Northeast center, contributing to the probability that a greater number of individuals will be certified in training.
When the evidence is that a community member belongs to the Southeast region, the probability of completion of the training is 0.609, higher than in all other regions. However, this probability falls to 0.441 when there is evidence that the degree is in the lower range (degree < 0.0015). For municipalities in the Southeast, the probabilities are as follows: 0.615 for the highest percentage of households with a computer with Internet access; 0.626 for national_broadband_plan = yes; 0.732 for a high or very high MHDI income (mhdi_income ≥ 0.700); and 0.590 for a high MHDI education.
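Scenarios that combine evidence on several variables, such as region together with degree range, can be sketched in the same spirit. The helper below is a hypothetical illustration, not the inference engine used in the study: it conditions on any set of evidence variables by filtering records and counting, and the record layout and toy values are assumptions.

```python
def query(rows, target, evidence):
    """Relative-frequency estimate of P(target | evidence).

    `target` and `evidence` are dicts mapping variable names to values.
    Returns None when no record is consistent with the evidence.
    """
    consistent = [r for r in rows
                  if all(r[k] == v for k, v in evidence.items())]
    if not consistent:
        return None
    hits = [r for r in consistent
            if all(r[k] == v for k, v in target.items())]
    return len(hits) / len(consistent)


# Hypothetical toy records; NOT data from the study.
toy_rows = [
    {"center": "Southeast", "degree": "high", "certified": "yes"},
    {"center": "Southeast", "degree": "high", "certified": "yes"},
    {"center": "Southeast", "degree": "high", "certified": "no"},
    {"center": "Southeast", "degree": "low", "certified": "yes"},
    {"center": "Southeast", "degree": "low", "certified": "no"},
    {"center": "North", "degree": "low", "certified": "no"},
    {"center": "North", "degree": "low", "certified": "no"},
    {"center": "North", "degree": "high", "certified": "yes"},
]

# Evidence on the region only, then on the region plus a low degree range.
p_region = query(toy_rows, {"certified": "yes"}, {"center": "Southeast"})
p_region_low = query(toy_rows, {"certified": "yes"},
                     {"center": "Southeast", "degree": "low"})
print(p_region, p_region_low)
```

Adding a second evidence variable narrows the set of consistent records, which is why the estimate for the same region can drop once a low degree range is also observed.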
The North and Northeast regions are those having the most municipalities without access to the National Broadband Plan and where the percentage of households that have a computer with Internet access is lower than 7% (Fig 7). However, according to our study, the Northeast is also the center with the highest probability that a community member has a higher degree.

Conclusions
In this paper, we applied an SNA-based approach to monitor and identify problems in ICT interventions. The approach we presented, based on social network analysis and mining, is applicable to training that targets and certifies a large number of participants in different geographical regions. Our SNA-based approach proved to be complementary to approaches that consider only individual access to technology, as the two approaches together take into account both the perspective of the beneficiaries and that of the experts and funders of the program.
Our study reveals that a significant number of participants had low or no interaction; likewise, the tutors reported weak infrastructure for implementing training for 206 of the community members. For these community members, we observed that their municipalities have small populations and are often located in the North, Northeast, and Central-West of the country. Although these cases represent a very small part of the target audience of the intervention, they reveal scenarios wherein online training is still unthinkable until infrastructure challenges can be overcome. Our results show that increasing the success of ICT interventions in these locales depends fundamentally on reducing the inequalities in the country. These results corroborate the arguments from the literature [1,39]. Despite the high concentration of digitally excluded individuals in the North region of the country, we observed that this region presented the lowest probability of certified participants in the Telecentros.BR training. By way of contrast, the Northeast, which served municipalities with similar characteristics, presented a high probability of certification, associated with the highest degree in the social networking platform.
For developing countries such as Brazil, where almost half of the population, in both urban and rural areas, has never accessed the Internet, there remains interest in large-scale ICT projects to combat the digital divide, but even these initiatives are criticized. Moreover, with the expanding use of social networks, SNA-based approaches to monitoring ICT interventions seem promising. However, there are many challenges in adopting an approach for large-scale interventions because many aspects must be considered. It is not always possible, for example, to perform analyses for a better understanding of social phenomena in their context because of a lack of data or human resources. More studies are needed to better understand the influence of these interventions on the communities involved. This understanding can assist in the decision to maintain investments in localities where the impact of innovation and its consequent social transformation in the communities is significant.
Finally, our study has demonstrated that large-scale ICT interventions may provide promising scenarios for the study of failure in developing countries [1,19,40].