A complex network approach to political analysis: application to the Brazilian Chamber of Deputies

In this paper, we introduce a network-based methodology to study how clusters represented by political entities evolve over time. We constructed networks of voting data from the Brazilian Chamber of Deputies, where deputies are nodes and edges represent voting similarity among deputies. The Brazilian Chamber of Deputies is characterized by a multi-party political system. Thus, we would expect a broad spectrum of ideas to be represented. Our results, however, revealed that plurality of ideas is not present at all: the effective number of communities representing ideas based on agreement/disagreement in propositions is about 3 over the entire studied time span. The obtained results also revealed different patterns of coalitions between distinct parties. Finally, we also found signs of early party isolation before presidential impeachment proceedings effectively started. We believe that the proposed framework could be used to complement the study of political dynamics and even applied in similar social networks where individuals are organized in a complex manner.


I. INTRODUCTION
In recent years, the availability of information in the form of open datasets together with the capabilities to store and process data have been promoting the rise of many new studies in a variety of disciplines, including Biology [1], Social Sciences [2], Linguistics [3][4][5][6] and Physics [7,8]. Many of these systems can be regarded as being too complex for traditional methodologies in which each of their components is isolated and studied individually. This includes the political dynamics of a country, which may depend on many aspects such as economic factors, culture, mass media, social media, etc. In a democratic system, these aspects are reflected (or should be) in the decisions made by people's representatives and in how they are organized (such as partisanship or alliances between parties). In this context, the relationships between politicians could be drawn indirectly from how they vote in propositions [9][10][11], i.e., their agreement or disagreement among a set of voted proposals.
Politicians with similar voting patterns can be understood as having similar views and interests, thus can be connected in a political network. Being a complex system, it makes sense to explore such a system by using methods borrowed from network science [12]. The Brazilian political environment becomes an interesting system to be studied under this approach since it is a multi-party system, thus, supposedly presenting a diverse spectrum of political ideas. Also, it is a system that has undergone many changes over the past few years. Moreover, data for decisions and political organization are openly available.
In this paper, we propose a framework based on complex networks to study the evolution of the Brazilian political system in terms of how politicians vote in proposals along time.
For that, we collected and processed voting data from 1991 to 2018, spanning 6 terms in the lower chamber, and built vote correlation networks. Because the obtained networks were weighted and dense, we applied a filtering method to preserve the community structure.
Using concepts borrowed from network science, we defined relevant political concepts, including coalition, fragmentation and isolation. We also devised a measure to quantify the effective number of groups, regardless of the number of political parties.
Several interesting results were obtained in the Brazilian political scenario. We found in most years a low correspondence between political parties and topological communities, meaning that the majority of deputies tend to share votes with different parties. Most importantly, despite the large number of political parties in the Brazilian political scenario (29 parties in 2019), effectively only roughly 3 groups of deputies were identified. The quantification and visualization of shared coalitions and isolation were consistent with the political scenario unfolded along the considered period. We also found different behavior of coalitions: while some parties tend to be aligned to the president's party, other parties are always in distinct network communities. Finally, we identified an isolation pattern that could be an early indication of the start of an impeachment process. While we focused on the analysis of the Brazilian case, we advocate that the methods are robust enough to be applied in other democratic systems and even in other social networks.
The paper is structured as follows. In Section II, we review some recent works using a graph representation to study political systems. In Section III A, we detail the creation of networks from voting data. The procedure employed to prune the obtained weighted, dense networks is presented in Section III B. Section III C presents a discussion on how to effectively measure the quality and quantity of clusters identified in networks. In Section III D, we introduce some measurements to quantify the concepts of political coalition, isolation and fragmentation. Finally, results and future works are discussed in Sections V and VI, respectively.

II. RELATED WORKS
Some studies used the concept of complex networks to model some political concepts, including the concept of political coalition and cohesion [10,11,13,14]. Both concepts are quantified via networks extracted from parliamentary votes.
In [10], network edges are established according to the similarity of votes between parliamentarians. Votes were considered to be chosen among three options. Deputies can vote for and against a specific proposition. In addition, they may abstain from voting. All data were extracted from the Italian parliament in 2013. A topological analysis to quantify the concepts of cohesion, polarization and government influence was created in terms of network parameters. The cohesion of political parties was defined using two density indexes, the internal density $\rho_{\text{int}}(C)$ and the external density $\rho_{\text{ext}}(C)$ of a cluster (community) $C$ (equations 1 and 2). Both are computed from $n_c$, the number of nodes in $C$, and $w_{ij}$, the weight of the edge linking nodes $i$ and $j$. A given community $C$ is considered cohesive if the internal density $\rho_{\text{int}}(C)$ is significantly higher than the average internal density of the whole network ($\rho_{\text{int}}$).
In a similar fashion, $C$ may also be considered cohesive whenever $\rho_{\text{ext}}(C) > \rho_{\text{ext}}$.
In [10], network communities and relevant parliamentarians are identified by measuring the effect of removing and inserting nodes in distinct network communities. More specifically, the authors measured the variation in the modularity dQ [15] when parliamentarians with distinct political positions are placed in their respective opposite political communities.
The degree of polarization was then defined as proportional to the absolute variation |dQ| resulting from this procedure.
The cohesion defined in equations 1 and 2 revealed that some parties are well defined, while some parliamentarians display a voting pattern that is most similar to the patterns observed in other parties. Upon analyzing the modularity of the networks, the authors advocate that it may not be the best alternative to characterize the obtained clustered networks. Their analysis also showed the emergence of three main clusters, where two groups represent parliamentarians voting according to the government preferences, while the remaining cluster was found to represent opposition parties. Finally, a temporal analysis revealed patterns of voting behavior along time. Such a temporal analysis was useful to identify, e.g., parties that do not display a consistent voting behavior.
In [13], a dataset of propositions analyzed between 2011 and 2016 was used to analyze votes in the Brazilian political scenario. Nodes were defined as deputies, which are linked by the number of identical votes in the considered period. Different variations in the network construction were considered because abstentions may lead to different interpretations on whether two deputies agree when at least one of them abstains from voting. This study also defined political concepts such as polarization in social networks, using correlation clustering [16] and its symmetric relaxed version to quantify polarization.
The analysis carried out in [13] revealed that several parties do not comply with their respective original ideas as expressed during the campaign period. In addition, this study showed that, in recent years, the group of deputies allied to the Brazilian government diminished. Even after presidential reelection, allied deputies turned out to display a voting pattern in disagreement with government propositions. Finally, this study reports that the distance between centre- and right-wing deputies decreased in recent years.
The research conducted in [14] studied the network community organization of parliamentarians, focusing on their temporal evolution. The data were obtained from both the Brazilian Chamber of Deputies and the US Parliament from 2003 to 2017. Temporal networks were created without overlapping. In the network representation, parliamentarians are nodes and edge weights are established according to the vote similarity. In the Brazilian case, the authors considered two additional possibilities for each vote: (i) abstention; and (ii) obstruction. The latter may also be considered as a vote against the proposition under discussion. The authors proposed a measurement that depends on the community structure of the network.
The Partisan Discipline, computed for a given parliamentarian $m$, was defined as
$$\mathrm{PD}(m) = \frac{1}{N} \sum_{i=1}^{N} I(m, p_m, i), \tag{3}$$
where $N$ is the total number of propositions analyzed and $I(m, p_m, i) = 1$ iff $m$ and the respective group $p_m$ (i.e. network community or party) to which he/she belongs voted equally in the $i$-th proposition. In addition, this study showed that deputies oftentimes do not stay in the same dense, well-defined community for long periods.
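As an illustration of the definition above, the following minimal sketch computes the fraction of propositions in which a parliamentarian voted like his/her group. The function name, the vote labels and the way the group position is supplied (one precomputed group vote per proposition) are assumptions made for this example and are not taken from [14].

```python
def partisan_discipline(member_votes, group_votes):
    """Fraction of propositions in which the member voted like the group.

    Both arguments are sequences with one entry per proposition. How the
    group position is obtained (e.g. by majority) is left to the caller.
    """
    if len(member_votes) != len(group_votes):
        raise ValueError("vote sequences must have the same length")
    n = len(member_votes)
    if n == 0:
        raise ValueError("at least one proposition is required")
    # The indicator is 1 when member and group agree, 0 otherwise.
    matches = sum(1 for m, g in zip(member_votes, group_votes) if m == g)
    return matches / n


# Hypothetical toy data: four propositions.
member = ["yes", "no", "yes", "abstain"]
group = ["yes", "yes", "yes", "abstain"]
discipline = partisan_discipline(member, group)
```

In this toy case the member follows the group in three of the four propositions.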
Some studies used political networks to identify the ideology of political parties. While many studies focused on voting data, the study reported in [17] focused on party-switching affiliations from one election to another to generate networks. In such a network, nodes are parties and two parties are linked if there was a migration of deputies between these parties.
The community structure of the obtained networks revealed a tendency of deputies to switch between parties with similar ideology. A model was also created to quantify the ideology of political parties on the left-right scale. The methodology was found to be robust as it does not depend on the level of representation of parties, since only affiliation data is required to create political networks.
The European Parliament was studied using networks in the work conducted in [11].
A dataset with roll-call voting data from the European Parliament was used to create a network where parliamentarians are nodes and edges are weighted according to the similarity of voting patterns. Twitter [18] was also used to create a network where parliamentarians are connected if there is a "retweet" relationship between them. The main results revealed the emergence of political coalitions, characterized by cooperation of opposite political wing groups. In addition, they found a tendency of cooperation among small and large political groups. Finally, this study also reports that political alignments in Twitter are consistent with voting patterns.
Unlike the above-mentioned studies, we propose metrics based on shortest path lengths, which can be understood as the connection strength between parliament members. We also use time series to show our results, thus enabling the visualization of cluster dynamics.
Differently from other works, we compare the traditional structure of political parties with the actual structure observed from voting behavior. Our focus is on the characterization of groups, which encompasses the definition of political fragmentation, isolation and diversity of parties.

III. METHODOLOGY
The proposed framework to analyze the relationship among deputies according to their voting patterns can be summarized in the following steps:
1. Network construction: a network is created from voting patterns in a given period. Two deputies are linked if they vote in a similar fashion in several propositions.
2. Network backbone extraction: this step involves the removal of the weakest edges. This is essential to create a network that can be treated by most of the traditional community detection methods. Most importantly, the removal of the weakest edges does not affect the community structure of the obtained networks.
3. Cluster detection and analysis: community detection methods are employed to identify groups of deputies displaying similar voting patterns.

A. Network construction
The pairwise level of agreement between two deputies $i$ and $j$ is defined as
$$w_{ij} = v^{(i)} \cdot v^{(j)}, \tag{4}$$
where $v^{(i)} \cdot v^{(j)}$ is the dot product between the vote vectors $v^{(i)}$ and $v^{(j)}$ of the two deputies. Note that, when deputies have the same opinion about the proposition being voted, the level of agreement is increased by 1. Conversely, when views are opposite, $w_{ij}$ is decreased by 1. Whenever at least one of the deputies is absent from voting in a proposition, that proposition is not considered when computing $w_{ij}$.
Because edge weights are given by $w_{ij}$, all obtained networks are undirected and weighted. As we shall show, the networks may be constructed using a particular time interval, and also considering sliding (overlapping) windows of fixed length.
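The construction described in this section can be sketched as follows. This is a minimal illustration, assuming that each vote is encoded as +1 (for), -1 (against) or 0 (did not vote), so that the dot product adds 1 for each agreement, subtracts 1 for each disagreement and ignores propositions in which at least one deputy is absent. The function name, the encoding and the toy data are hypothetical.

```python
from itertools import combinations


def agreement_weights(votes):
    """Return {(i, j): w_ij} for every pair of deputies.

    votes maps a deputy identifier to a list with one entry per
    proposition: +1 (for), -1 (against) or 0 (did not vote). This
    encoding is an assumption made for the example.
    """
    weights = {}
    for i, j in combinations(sorted(votes), 2):
        # Equal votes add 1, opposite votes subtract 1, and a 0 entry
        # removes the proposition from the sum.
        weights[(i, j)] = sum(a * b for a, b in zip(votes[i], votes[j]))
    return weights


# Hypothetical toy data: three deputies, four propositions.
votes = {
    "A": [1, 1, -1, 0],
    "B": [1, -1, -1, 1],
    "C": [-1, -1, 1, 1],
}
weights = agreement_weights(votes)

# Keep only agreement (positive weights) as network edges.
edges = {pair: w for pair, w in weights.items() if w > 0}
```

For sliding windows, the same function could be applied to the subset of propositions voted within each window.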

B. Network backbone extraction
Even considering only agreement between deputies as connections, the proposed approach can still lead to densely connected networks. While such networks can still be regarded as weighted, it is usually desirable to remove noise from data, such as weak connections that may correspond to spurious links. For that reason, we included a preprocessing step in our analysis pipeline in which we deal with the problem of keeping only important connections in a network. Such a task is known as edge pruning [19].
A simple way to accomplish edge pruning is keeping only edges with weights larger than a fixed threshold. Such a threshold can be determined, for instance, in terms of percentiles to reach a certain network density. The main problem with that approach is that it may introduce some bias toward highly connected nodes which can lead to substantial undesired effects on the network topology.
A more sophisticated approach to the edge pruning task is the backbone extraction method [20]. This technique is based on the idea that when determining the importance of an edge, we must take into account how its endpoints are connected to the network.
More specifically, the authors proposed that the importance of an edge can be determined in terms of a disparity filter, so that a value $\alpha_{ij}$ for an edge can be drawn from a p-value determined in terms of a null model. This null model considers the probability of a node having an edge with a certain weight based on its other connections. In practice, one can compute $\alpha_{ij}$ for a certain edge $e_{ij}$ existing in the network as
$$\alpha_{ij} = 1 - (k_i - 1) \int_0^{p_{ij}} (1 - x)^{k_i - 2} \, dx, \tag{5}$$
where $k_i$ is the number of connections of node $i$ and $p_{ij}$ is a probability defined as
$$p_{ij} = \frac{w_{ij}}{\sum_{u} w_{iu}}. \tag{6}$$
The resulting network is obtained by removing all edges with $\alpha_{ij}$ higher than a certain threshold $\alpha$. Because the networks under analysis are undirected, we consider, for each edge $e_{ij}$, the lowest between $\alpha_{ij}$ and $\alpha_{ji}$. Here, we opted not to use a fixed threshold for $\alpha$. We instead used the smallest $\alpha$ such that each network has at least 80% of the nodes belonging to the giant component. This results in more consistent networks along time and allows us to analyze the data based on the major component. This is important in network analysis because some measurements, such as shortest path distances, are only measured in a connected component.
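The pruning procedure above can be sketched in a few lines. The sketch below is only illustrative: it uses the closed form of the disparity integral, $(1 - p_{ij})^{k_i - 1}$, assumes positive weights stored in a dictionary keyed by node pairs, and assigns $\alpha = 1$ to the side of an edge whose node has a single connection (a convention assumed here). All function names are hypothetical.

```python
from collections import defaultdict


def disparity_alphas(weights):
    """Return {(i, j): min(alpha_ij, alpha_ji)} for an undirected network."""
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (i, j), w in weights.items():
        for node in (i, j):
            strength[node] += w
            degree[node] += 1

    def alpha(node, w):
        k = degree[node]
        if k < 2:
            return 1.0  # assumed convention for nodes with one connection
        # Closed form of 1 - (k - 1) * integral_0^p (1 - x)^(k - 2) dx.
        return (1.0 - w / strength[node]) ** (k - 1)

    return {e: min(alpha(e[0], w), alpha(e[1], w)) for e, w in weights.items()}


def giant_fraction(nodes, edges):
    """Fraction of nodes in the largest connected component."""
    adjacency = defaultdict(set)
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best / len(nodes)


def smallest_alpha(weights, min_fraction=0.8):
    """Smallest threshold whose backbone keeps min_fraction of the nodes
    in the giant component, together with the retained edges."""
    alphas = disparity_alphas(weights)
    nodes = {n for e in weights for n in e}
    for threshold in sorted(set(alphas.values())):
        kept = [e for e, a in alphas.items() if a <= threshold]
        if giant_fraction(nodes, kept) >= min_fraction:
            return threshold, kept
    return None, []


# Hypothetical toy network: one strong and two weak edges around node "a".
toy = {("a", "b"): 8.0, ("a", "c"): 1.0, ("a", "d"): 1.0}
alphas = disparity_alphas(toy)
threshold, backbone = smallest_alpha(toy)
```

In the toy network, keeping only the strong edge leaves half of the nodes outside the giant component, so the threshold has to be raised until the two weak edges are retained as well.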
In order to illustrate the edge pruning process, we show in Figure 2 a visualization of a network before and after the pruning process is applied. Interestingly, there is a clear separation in communities of deputies with very distinct voting patterns. The emergence of several communities is not as clear when the filtering procedure is not applied.

C. Cluster detection and analysis
Once the network is obtained, the clustering of nodes is necessary to understand how deputies are clustered by taking into account voting patterns. The identification of communities is also important to analyze if the organization of deputies in parties is consistent with the natural organization emerging from their voting behavior.
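Several community detection methods rely on the modularity to measure the quality of the obtained clusters in complex networks [22][23][24]. This measurement quantifies how much the number of links that occur inside communities exceeds the number expected in a null model. Mathematically, the network modularity $Q$ is given by
$$Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \tag{7}$$
where $A_{ij}$ is the weight of the edge between nodes $i$ and $j$, $k_i = \sum_j A_{ij}$ is the strength of node $i$, $m = \frac{1}{2} \sum_{ij} A_{ij}$ is the total edge weight, $c_i$ is the community of node $i$, and $\delta$ is the Kronecker delta.
In this particular study, we used the Leiden method [25], which is an improved version of the Louvain algorithm [22]. While we report our results based on the Leiden method, a preliminary study revealed that the choice of community detection method has no influence on the results reported in this paper.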
The Louvain algorithm is based on the iterative repetition of two steps. First, each single node is considered as a single community in the network. Then, for each node, the method evaluates the gain in modularity when it is removed from its current community and placed in a neighboring community. The change of community membership is effectively performed if such a change corresponds to the highest gain in modularity among all possible changes in the current iteration. If the highest gain is negative (i.e. the change of community membership leads to a decrease in modularity), then the node stays in its original community. This process is repeated until no gain in modularity is achieved. The efficiency of the method stems from the fact that the gain of modularity can be computed in an efficient way. The algorithm proposed in [25] guarantees that the obtained communities are connected.
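To make the quantity optimized by these methods concrete, the sketch below evaluates the modularity of a given partition of a weighted, undirected network without self-loops. It is not an implementation of the Louvain or Leiden algorithms, only of the quality function they optimize, written in its equivalent per-community form. The function name and the toy network are assumptions.

```python
from collections import defaultdict


def modularity(weights, membership):
    """Modularity of a partition of a weighted, undirected network.

    weights: {(i, j): w} with one entry per edge (no self-loops assumed).
    membership: {node: community label}.
    """
    m = sum(weights.values())  # total edge weight
    internal = defaultdict(float)  # weight of edges inside each community
    strength = defaultdict(float)  # sum of node strengths per community
    for (i, j), w in weights.items():
        ci, cj = membership[i], membership[j]
        strength[ci] += w
        strength[cj] += w
        if ci == cj:
            internal[ci] += w
    # Per-community form: sum_c [ W_in(c) / m - (S(c) / 2m)^2 ].
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


# Hypothetical toy network: two triangles joined by a single edge.
toy = {
    (0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
    (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
    (2, 3): 1.0,
}
two_groups = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_group = {node: "x" for node in range(6)}
q_two = modularity(toy, two_groups)
q_one = modularity(toy, one_group)
```

Splitting the toy network into its two triangles yields a positive modularity, whereas placing all nodes in a single community yields zero.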
One particular feature that we are interested in when studying political networks is the correspondence between political parties and network communities. Such an analysis can be performed by comparing the total number of clusters obtained from these two partitioning strategies. A more refined notion of group diversity (i.e. total number of groups) in a partition, obtained via political parties or community detection, may be derived from the concept of true diversity [26]. This quantity measures the effective number of groups in a partition by considering the size of groups. Let $S = \{s_1, s_2, \ldots, s_R\}$ be the sizes of the groups in a partition comprising $R$ groups. The computation of the true diversity requires that the sizes in $S$ are normalized, which yields the set of normalized sizes $\Pi = \{\pi_1, \pi_2, \ldots, \pi_R\}$, where $\pi_i = s_i / \sum_j s_j$. The true diversity $D^{(q)}$ of the distribution $\Pi$ is defined as
$$D^{(q)} = \left( \sum_{i=1}^{R} \pi_i^{q} \right)^{1/(1-q)}, \tag{8}$$
where $q$ is the exponent given to the weighted generalized mean of the $\pi_i$'s. In the limit $q \to 1$, the true diversity can be computed as
$$D^{(1)} = \exp\left( -\sum_{i=1}^{R} \pi_i \ln \pi_i \right), \tag{9}$$
which corresponds to the exponential of the entropy of $\Pi$. The true diversity has been employed in other contexts to measure the effective number of species [27]. In network theory, the accessibility of nodes in complex networks is measured in terms of the true diversity and quantifies the effective number of neighbors [28][29][30][31][32]. Because we are measuring the diversity of cluster sizes, hereafter we refer to the true diversity as the effective number of political parties and communities.
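The effective number of groups can be computed directly from the group sizes. The following sketch assumes the sizes are given as a list of counts; the function name and the toy sizes are hypothetical.

```python
import math


def true_diversity(sizes, q=1.0):
    """Effective number of groups for a list of group sizes."""
    total = sum(sizes)
    shares = [s / total for s in sizes if s > 0]  # normalized sizes
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        entropy = -sum(p * math.log(p) for p in shares)
        return math.exp(entropy)
    return sum(p ** q for p in shares) ** (1.0 / (1.0 - q))


# Three groups of equal size behave as three effective groups.
balanced = true_diversity([10, 10, 10])
# One dominant group and two tiny ones behave almost as a single group.
skewed = true_diversity([98, 1, 1])
```

This illustrates why a chamber with many parties may still have a small effective number of groups: groups with few members contribute little to the diversity.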

D. Network characterization
In this section, we introduce the measurements that are used to analyze the networks obtained from voting patterns. We are especially interested in the analysis of distances among parties along time. The first important measurement that can be extracted from political networks is the topological distance $d(A, B)$ between two groups $A$ and $B$, defined as the average distance among their elements:
$$d(A, B) = \frac{1}{|A| \, |B|} \sum_{a \in A} \sum_{b \in B} l(a, b), \tag{10}$$
where $l(a, b)$ is the shortest path length between $a \in A$ and $b \in B$. When groups are political parties, the distance $d(A, B)$ can be used to quantify the level of coalition between $A$ and $B$. In this case, low values of distances correspond to strong coalitions.
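A minimal sketch of the group distance is given below. It assumes that edge lengths are already dissimilarities (not the similarity weights) and that the network is connected, so that every shortest path exists. Shortest paths are obtained with Dijkstra's algorithm from the standard library heap; function names and the toy network are hypothetical.

```python
import heapq
from collections import defaultdict


def build_adjacency(lengths):
    """lengths: {(i, j): non-negative edge length}, undirected."""
    adjacency = defaultdict(list)
    for (i, j), d in lengths.items():
        adjacency[i].append((j, d))
        adjacency[j].append((i, d))
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Dijkstra's algorithm: shortest path length from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated heap entry
        for v, step in adjacency[u]:
            candidate = d + step
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def group_distance(lengths, group_a, group_b):
    """Average shortest path length between members of two groups."""
    adjacency = build_adjacency(lengths)
    total = 0.0
    for a in group_a:
        dist = shortest_path_lengths(adjacency, a)
        total += sum(dist[b] for b in group_b)  # assumes b is reachable
    return total / (len(group_a) * len(group_b))


# Hypothetical toy network: a path p1 - p2 - q1 - q2.
lengths = {("p1", "p2"): 1.0, ("p2", "q1"): 2.0, ("q1", "q2"): 1.0}
d_ab = group_distance(lengths, ["p1", "p2"], ["q1", "q2"])
```

In the toy path, the four pairwise distances between the two groups are 3, 4, 2 and 3, so their average is 3.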
Another important quantity is the average distance $d_G$ between all nodes in the network. It is defined as:
$$d_G = \frac{1}{n(n-1)} \sum_{i \neq j} l(i, j), \tag{11}$$
where $n$ is the number of nodes in the network. Note that the distance $l$ cannot be computed directly from the network obtained as described in Section III B, because edge weights represent a similarity relationship. In order to map the similarity $w_{ij}$ defined in equation 4 into a dissimilarity index, we used the transformation given in equation 12. A more detailed description of network transformations such as this one can be found in [33].
The political isolation of group $A$, $I(A)$, is defined as the distance between group $A$ and all other groups in a given division of the network. It is defined as
$$I(A) = \frac{\sum_{X \neq A} |X| \, d(A, X)}{\sum_{X \neq A} |X|}, \tag{13}$$
which represents the average distance (as defined in equation 10) between $A$ and all other groups. Note that the distances are weighted by the size of each group $X \neq A$. According to equation 13, the isolation reflects how distant nodes belonging to the group of reference are from the other nodes in the network. Interestingly, there is an analogy between the isolation of groups and the ability (complexity) of unsupervised classification [34]. In other words, isolated groups are analogous to outlier clusters in unsupervised data analysis.
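The isolation can be sketched as a size-weighted average of group distances. The example below assumes that all pairwise shortest path lengths are already available in a nested dictionary; here they are taken, purely for illustration, as distances between hypothetical positions on a line. Function and group names are assumptions.

```python
def group_distance(dist, group_a, group_b):
    """Average pairwise distance between members of two groups."""
    total = sum(dist[a][b] for a in group_a for b in group_b)
    return total / (len(group_a) * len(group_b))


def isolation(dist, groups, name):
    """Size-weighted average distance between one group and all others."""
    others = [members for g, members in groups.items() if g != name]
    weighted = sum(
        len(members) * group_distance(dist, groups[name], members)
        for members in others
    )
    return weighted / sum(len(members) for members in others)


# Hypothetical toy data: nodes placed on a line, distance = gap between them.
position = {"a1": 0, "a2": 1, "b1": 3, "c1": 6, "c2": 7}
dist = {u: {v: abs(position[u] - position[v]) for v in position} for u in position}
groups = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1", "c2"]}

isolation_a = isolation(dist, groups, "A")
isolation_b = isolation(dist, groups, "B")
isolation_c = isolation(dist, groups, "C")
```

Because the weights are the group sizes, this quantity is the average distance from a member of the reference group to any node outside it. In the toy data, the central group B is the least isolated and the peripheral group C the most isolated.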
The concept of political fragmentation for group $A$, $F(A)$, can be defined in terms of intra-group dispersion, i.e.
$$F(A) = d(A, A). \tag{14}$$
Therefore, a group is highly fragmented whenever the nodes in that group are distant from each other, according to the topology of the network. Note that the quantities defined in
The response to the above questions is important to understand the dynamics of deputies' relationships, both at the microscopic and mesoscopic level. Such a better understanding may point to better ways to improve the democratic system of a country. In the same line, the extraction of information from unstructured data might assist the population in deciding whom they choose as representatives based on the voting history of deputies and parties alike.
In order to address the above research questions, we divide our analysis into two parts.
In Section V A, we analyze the global properties of the networks and compare the groups defined via topology and political parties. In Section V B, we analyze the network at the mesoscopic (party) level.

A. Global analysis
The global evolution of the obtained political networks is depicted in Figure 5.
The divergence between topological clusters and political parties can be further analyzed via Normalized Mutual Information (NMI) [37], which measures the quality of the obtained partitions (topological communities) based on a set of node membership labels (political parties). The evolution of the NMI index is shown in Figure 8. Overall, the correspondence between political parties and topological partitions varies along time, but a major rupture seems to occur just before the first PSDB term (1994) and just before impeachment (2016).
Notably, the highest discrepancy (lowest value of NMI) occurred in 1996, which is in accordance with the visualization provided in Figure 5. This result confirms that, in the Brazilian political scenario, in most years the natural organization of deputies in communities has a low correspondence with party organization. The discrepancy between political parties and topological clusters is also evident when one studies the actual and effective quantity of groups. The effective number of parties and communities was computed using the true diversity of the distribution of cluster sizes, as described in Section III C. The results are depicted in Figure 9.
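The comparison between party labels and community labels via NMI can be sketched as follows. The normalization used here, twice the mutual information divided by the sum of the two entropies, is one common choice and is an assumption; the variant adopted in [37] may differ. Party and community labels in the example are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural logarithm) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denominator = entropy(labels_a) + entropy(labels_b)
    # Two trivial (single-group) partitions are treated as identical.
    return 1.0 if denominator == 0 else 2.0 * mutual / denominator


# Hypothetical labels for four deputies.
parties = ["P1", "P1", "P2", "P2"]
matching_communities = [0, 0, 1, 1]  # communities coincide with parties
crossing_communities = [0, 1, 0, 1]  # communities cut across parties

nmi_match = nmi(parties, matching_communities)
nmi_cross = nmi(parties, crossing_communities)
```

A value close to 1 indicates that communities reproduce the party structure, while a value close to 0 indicates that knowing the party of a deputy says little about his/her community.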

B. Mesoscopic analysis
The obtained networks can also shed light on the political dynamics at the mesoscopic level. An interesting analysis concerns the quantification of the pairwise distance between political parties along time. This analysis is important, e.g., to quantify the level of coalition (or opposition) between parties and party isolation. Such concepts are oftentimes used in politics because abrupt or recurrent changes in these quantities may precede a relevant change in the political scenario.
In Figure 10, we show the temporal evolution of the distances between PSDB and other parties. The distances are computed using equation 10. In this figure, we also show presidential terms in different colors, including the transition post-impeachment led by MDB presidency (2016-2018, green background). Interestingly, along the years of PSDB presidency, PT was found to be the most distant party from PSDB. This is consistent with the traditional sentiment of polarization between PT and PSDB in these years [38][39]. During PT presidency, we note that PT and PSDB were still distant; however, two additional parties (MDB and PP) were found to increase (and keep) a larger distance from PSDB. This is consistent with the real political scenario, since MDB and PT shared a coalition along the period characterized by PT presidency. These observations are consistent with the visualization provided in Figure 5.
Another interesting pattern that can be extracted from Figure 10
PSDB fragmentation corresponds to the blue curve. Note that the typical distance between PSDB deputies is compatible with the distance between PSDB and DEM along virtually all considered years and, markedly, before 2002. Such a proximity between these two parties is also evident from the visualization of the network structure. A strong concordance with MDB and PP is also apparent before 2002 and after 2016.
In Figure 11, we show the distances between MDB and other parties. In this case, the transition in government seems to play a prominent role in the behavior of this party (see also Figure 5). The rapprochement between these parties took place only in 2016, in the post-impeachment transition. The MDB fragmentation along time is roughly constant; however, apart from 2012-2015, the fragmentation is always similar to the typical distance between MDB and the closest parties. Finally, it is interesting to note that MDB and PP display a similar evolution of distances, according to Figures 11 and 12, respectively.
Note that the behavior of the distances is similar to the one found for MDB (see Figure 11).
The distance between PT and other political parties is depicted in Figure 13. It is clear from the data that PT and PSDB have been continuously distant from each other since 1995. During PSDB presidency, the isolation of PT is clear: no other party is closer to PT than PT itself.
In the first years of PT presidency, PT became closer to other parties, including MDB and PP. Some years before a new transition in governments, however, PT became distant from all others, especially after 2015. This isolation scenario remained consistent even after the presidential impeachment (2016).
In Figure 14, two presidencies are shown in gray background color. Apart from PT, all parties have similar values and patterns of isolation along time. While PT isolation was compatible with the behavior displayed by many other parties, its isolation increased (relative to other parties) months before impeachment. During the first months of the interim presidential term, PT isolation increased relative to other parties. After 2017, PT turned out to be the most isolated political party. PT isolation is also clear in the visualization provided in Figure 5, especially in 2017. A sudden change in relative isolation months before impeachment could indicate the vulnerability of the party, and potentially a signal of a major change in the political scenario.

VI. CONCLUSION
In a democratic nation, the political organization can reflect its cultural context, economic and social interactions. The understanding of the role of deputies and parties alike is essential to improve any democratic system. Here we proposed a framework to assist the understanding of the dynamics of political parties. The relationship between deputies in the lower chamber was represented as a network, where two deputies are linked if they voted in a similar way in different propositions. Using concepts borrowed from network science, we defined political concepts such as isolation, fragmentation and coalition, which were measured in terms of topological distances. Several interesting results were found when the proposed framework was applied to analyze the Brazilian Chamber of Deputies. During PSDB term, PT was found to be the most distant party from PSDB and vice-versa. The coalition shared by parties was easily identified from both distance metrics and the visualization of the obtained networks. The networks displayed a modular topology; however, we found a low degree of consistency when comparing topological groups and political parties in most years. Surprisingly, we found that even though the Brazilian political scenario is highly fragmented, only a few topological groups emerge. While in 2019 the lower chamber comprised 29 distinct parties, we found that effectively there are roughly only 3 clusters of deputies, according to the diversity measurement. Finally, a detailed analysis in the period between 2015 and 2019 revealed that PT isolation increased a few months before the presidential impeachment of Dilma Rousseff (PT). This could be an early sign before impeachment proceedings against former president Rousseff effectively started.
As future work, we intend to use additional information to analyze and possibly predict the results of votes based on the history of deputies and parties. This could be accomplished by using textual data obtained from the propositions and transcripts from deputies' discourses.
While the focus of this manuscript was in identifying and quantifying relevant political quantities via network science, we advocate that this framework could also be used in other scenarios. An analogous approach could be used to analyze other types of social networks with similar characteristics, i.e. entities with pre-defined labels that interact in a complex