Bayesian Inference on Proportional Elections

Gabriel Hideki Vatanabe Brunello; Eduardo Yoshio Nakano

doi:10.1371/journal.pone.0116924

Abstract

Polls for majoritarian voting systems usually report estimates of the percentage of votes for each candidate. Proportional systems, however, do not guarantee that the candidate with the highest percentage of votes will be elected, so the traditional methods used for majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform Bayesian inference on proportional elections under the Brazilian system of seat distribution. More specifically, a methodology was developed to estimate the probability that a given party will have representation in the chamber of deputies. Inferences were made in a Bayesian framework using Monte Carlo simulation, and the methodology was applied to data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies. A performance rate was also proposed to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.

Citation: Brunello GHV, Nakano EY (2015) Bayesian Inference on Proportional Elections. PLoS ONE 10(3): e0116924. https://doi.org/10.1371/journal.pone.0116924

Academic Editor: Zhong-Ke Gao, Tianjin University, CHINA

Received: August 4, 2014; Accepted: December 16, 2014; Published: March 18, 2015

Copyright: © 2015 Brunello, Nakano. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the paper and its Supporting Information files.

Funding: The authors are grateful to the National Council for Scientific and Technological Development (CNPq) and to Decanato de Pesquisa e Pós-graduação (DPP) of the University of Brasilia (UnB) for the financial support. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

INTRODUCTION

In Brazil, elections for president, governors and mayors use the majority system, in which the candidate with an absolute majority of the votes is elected. In a proportional system, however, an absolute majority of the votes does not guarantee the election of a candidate. The proportional system is used to elect deputies (federal, state and district) and members of city councils. A difficulty with proportional elections is determining the precise number of seats (vacancies) won by each party. Since there is no guarantee that the ratio between the number of votes and the number of seats is an integer, a rounding and redistribution procedure is required. Brazil defines the electoral quotient as the number of valid votes divided by the number of seats. The votes of each party are divided by the electoral quotient to obtain the party quotient, and the integer part of this quotient is the number of seats reserved for the party. The remaining seats are then allocated using the D’Hondt method.

These peculiarities of proportional elections make classical statistical inference impractical. The same inference, however, can be carried out easily using Bayesian inference combined with Monte Carlo simulation methods. In this context, the purpose of this paper was to perform Bayesian inference on proportional elections under the Brazilian system of seat distribution. More specifically, a methodology was developed to estimate the probability that a given party will have representation (at least one seat) in the chamber of deputies.

Inferences were made in a Bayesian framework using Monte Carlo simulation, and calculations and simulations were carried out using the R software. The methodology was applied to data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies.

METHODS

Brazilian Proportional Election System

The proportional election is an electoral system in which the proportion of seats taken by each party is determined by the proportion of votes obtained. It is intended to ensure the participation of different segments of society because, unlike the majority system, it does not guarantee that the candidate with the greatest number of votes will be elected. In Brazil, elections for Federal Deputies, Members of the Legislative Assembly and Councilors use the proportional system.

The seats are distributed using the electoral quotient, with the D’Hondt method applied to the remaining seats [1,2]. The electoral quotient is the sum of all valid votes (nominal votes + party votes, which is equivalent to the total number of votes minus the blank and null votes) divided by the number of available seats. Only parties (or coalitions) with a total of valid votes greater than the electoral quotient participate in the D’Hondt method.

Initially, each party with a total of votes greater than the quotient earns a number of seats equal to its votes divided by the quotient, rounded down when the result is not an integer. The seats that remain after this step are distributed using the D’Hondt method: the party with the greatest number of adjusted votes (the party’s votes divided by the number of seats it has earned plus 1) earns one more seat, after which its adjusted votes are recomputed. This procedure is repeated until there are no empty seats.

Seats division method

The algorithm used for the division of seats on the Brazilian proportional electoral system is presented below [1,2].

Step 0: Obtain the names of the parties, the number of votes for each party and the number of available seats;
Step 1: Sum the valid votes (total votes excluding null and blank votes) and divide by the number of seats. The result is the electoral quotient;
Note: If no party receives more votes than the electoral quotient, the election is cancelled (no party earns any seats);
Step 2: Divide the votes of each party by the electoral quotient and give each party a number of seats equal to the result rounded down;
Step 3: If no seats remain after the division by the quotient, the distribution is complete; report the number of seats that each party (or coalition) earned;
Step 4: If seats remain after Step 2, distribute them using the D’Hondt method:
Step 4.1: Identify the party with the greatest number of adjusted votes, where
$$\text{adjusted votes} = \frac{\text{votes of the party}}{\text{seats earned by the party} + 1};$$
Note: In case of a tie between two or more parties in the number of adjusted votes, the one with the smallest number of earned seats gets the seat.
Step 4.2: Add a seat to the party with the greatest number of adjusted votes in Step 4.1;
Step 4.3: If the number of remaining seats is greater than 0, return to Step 4.1; otherwise, the distribution is complete.
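The steps above can be sketched in a few lines of code. The sketch below uses Python rather than the R code used in the paper, and the function name `allocate_seats`, the dictionary input and the example votes are illustrative assumptions. It keeps the electoral quotient exact and requires strictly more votes than the quotient, as the steps state; any further rounding rule in the electoral code is not modelled.

```python
from fractions import Fraction


def allocate_seats(votes, seats):
    """Distribute `seats` among parties following Steps 1-4.

    votes: dict mapping party/coalition name -> number of valid votes.
    Returns a dict mapping each name -> number of seats earned.
    """
    # Step 1: electoral quotient (kept exact; no rounding is applied here).
    quotient = Fraction(sum(votes.values()), seats)

    # Only parties with more votes than the quotient take part.
    eligible = {p: v for p, v in votes.items() if v > quotient}
    won = {p: 0 for p in votes}
    if not eligible:
        return won  # Note to Step 1: no party earns any seats.

    # Step 2: integer part of votes / quotient.
    for p, v in eligible.items():
        won[p] = int(v // quotient)

    # Steps 3-4: hand out any remaining seats one at a time (D'Hondt).
    for _ in range(seats - sum(won.values())):
        # Step 4.1: greatest adjusted votes; ties go to the party with fewer seats.
        winner = max(
            eligible,
            key=lambda p: (Fraction(eligible[p], won[p] + 1), -won[p]),
        )
        won[winner] += 1  # Step 4.2
    return won


# Hypothetical example: 100 valid votes, 5 seats, quotient = 20.
print(allocate_seats({"A": 50, "B": 30, "C": 15, "D": 5}, 5))
# {'A': 3, 'B': 2, 'C': 0, 'D': 0}
```

In the example, A and B receive 2 and 1 seats from the quotient step; the two remaining seats go first to A (adjusted votes 50/3) and then to B (adjusted votes 30/2 against 50/4 for A).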

Bayesian Inference

Initially, a Bayesian analysis was carried out on the proportion of votes received by each party/coalition. This analysis relied on Dirichlet-Multinomial conjugacy [3].

Dirichlet-Multinomial Conjugacy. Let X₁,…,X_n be a random sample of size n, where X_j = (X_1j, …, X_kj), j = 1,…,n, has a Multinomial distribution with parameter vector (θ₁, …,θ_k), 0≤θ_i≤1, i = 1,…,k, and θ₁+…+θ_k = 1. Assume that the prior distribution of (θ₁, …,θ_k) is a Dirichlet with known hyperparameters (a₁, …,a_k), a_i>0, ∀ i = 1,…,k. Then the posterior distribution of (θ₁, …,θ_k) given X_j = x_j, j = 1,…,n, is a Dirichlet with parameter vector (a₁+y₁, …,a_k+y_k), where y_i = x_i1+…+x_in, i = 1,…,k.

Assume that the opinions of the electors are independent and that, at a given moment, each elector may opt for one of the k parties/coalitions, opt for a blank/null vote, or be undecided. We assume that undecided voters are not informative and exclude them from the sample (note that this differs from assuming that they may opt for any of the k parties with equal probability). Let Y_j be the number of voters in favor of party j, j = 1,2,…,k, and Y_k+1 the number of voters who intend to vote blank/null. Once a sample is selected, the likelihood function of the data is given by:

$$L(\theta_1,\ldots,\theta_{k+1} \mid y_1,\ldots,y_{k+1}) = \frac{n!}{\prod_{j=1}^{k+1} y_j!} \prod_{j=1}^{k+1} \theta_j^{y_j},$$

where $n = \sum_{j=1}^{k+1} y_j$ is the number of voters in the sample; θ_j is the true proportion of voters in favor of party j, j = 1,2,…,k, and θ_k+1 is the true proportion of voters who intend to vote blank/null. By the Dirichlet-Multinomial conjugacy, if a Dirichlet distribution with parameter vector (a₁,…,a_k,a_k+1) is adopted as the prior distribution, the posterior distribution of (θ₁,…,θ_k,θ_k+1) given (Y₁,…,Y_k,Y_k+1) is a Dirichlet with parameter vector (a₁+y₁,…,a_k+y_k,a_k+1+y_k+1), i.e.,

$$\pi(\theta_1,\ldots,\theta_{k+1} \mid y_1,\ldots,y_{k+1}) = \frac{\Gamma\left(\sum_{j=1}^{k+1} (a_j + y_j)\right)}{\prod_{j=1}^{k+1} \Gamma(a_j + y_j)} \prod_{j=1}^{k+1} \theta_j^{a_j + y_j - 1}, \qquad (1)$$

where $\sum_{j=1}^{k+1} \theta_j = 1$ and Γ(·) is the gamma function.
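As a small numerical illustration of the conjugate update, the posterior parameters are obtained by adding the observed counts to the prior parameters. The sketch below is in Python rather than the R used in the paper; the helper names and the sample counts are hypothetical and do not come from the paper's data.

```python
def dirichlet_posterior(prior, counts):
    """Posterior Dirichlet parameters: a_j + y_j for each category."""
    if len(prior) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + y for a, y in zip(prior, counts)]


def dirichlet_mean(params):
    """Mean of a Dirichlet distribution: each parameter over their sum."""
    total = sum(params)
    return [p / total for p in params]


# Hypothetical sample: k = 3 parties plus the blank/null category (last entry).
counts = [400, 300, 97, 203]
prior = [1, 1, 1, 1]  # Dirichlet(1, ..., 1) prior
posterior = dirichlet_posterior(prior, counts)
print(posterior)  # [401, 301, 98, 204]
```

The posterior mean of each proportion, `dirichlet_mean(posterior)`, is then close to the sample proportion when the sample is large relative to the prior parameters.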

In a Bayesian framework, the number of seats that each party earns is a multidimensional random variable, and all information about this random variable is contained in its posterior distribution, whose analytic expression is unknown. However, it is not necessary to know the analytic form of the distribution of the seats, because it can be easily approximated through Monte Carlo simulation methods [4]. The procedure consists of generating a large number of artificial elections from the posterior distribution of the proportion of votes (1) and applying, to each of them, the seat distribution method described in the preceding section. The probability of a given party earning c seats is then estimated by the number of simulations in which the party won c seats divided by the total number of simulations.
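The simulation procedure just described can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function names, the default number of simulations and the choice to apply the seat distribution directly to the simulated vote shares of the parties (dropping the blank/null share) are assumptions made for the example.

```python
import random
from collections import Counter


def sample_dirichlet(params, rng):
    """Draw one vector from Dirichlet(params) by normalising gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]


def allocate_seats(shares, seats):
    """Electoral quotient step followed by D'Hondt for the remaining seats."""
    quotient = sum(shares) / seats
    eligible = [i for i, v in enumerate(shares) if v > quotient]
    won = [0] * len(shares)
    if not eligible:
        return won
    for i in eligible:
        won[i] = int(shares[i] // quotient)
    for _ in range(seats - sum(won)):
        # Greatest adjusted share; ties go to the party with fewer seats.
        best = max(eligible, key=lambda i: (shares[i] / (won[i] + 1), -won[i]))
        won[best] += 1
    return won


def seat_probabilities(party_counts, blank_count, seats,
                       n_sims=10_000, prior=1.0, seed=0):
    """Estimate P(party i earns c seats) as a relative frequency over n_sims."""
    rng = random.Random(seed)
    params = [prior + y for y in party_counts] + [prior + blank_count]
    tallies = [Counter() for _ in party_counts]
    for _ in range(n_sims):
        theta = sample_dirichlet(params, rng)
        # Seats depend on valid votes only, so the blank/null share is dropped.
        for i, c in enumerate(allocate_seats(theta[:-1], seats)):
            tallies[i][c] += 1
    return [{c: n / n_sims for c, n in sorted(t.items())} for t in tallies]
```

A call such as `seat_probabilities([400, 300, 97], 203, seats=24)` returns one dictionary per party, mapping each number of seats to its estimated posterior probability; the counts in this call are hypothetical.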

Performance Rate

To evaluate the efficiency of the methodology, a performance rate was developed. This rate ranges from 0 to 1, where 1 is a perfect score, meaning that every party/coalition was assigned probability 1 for the number of seats it earned in the real election, and 0 is the opposite result, in which every party was assigned probability 0 for the number of seats it earned in the real election. The performance rate is calculated as the sum, over parties/coalitions, of the probability of each party earning the number of seats it obtained in the real election, divided by the number of parties/coalitions.
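The definition above translates directly into code. The following Python sketch assumes, for illustration only, that the estimated probabilities are stored as one dictionary per party mapping a number of seats to its probability; the function name and the example values are hypothetical.

```python
def performance_rate(seat_probs, actual_seats):
    """Mean, over parties, of the probability given to the seats actually won.

    seat_probs: one dict per party mapping number of seats -> estimated probability.
    actual_seats: seats each party earned in the real election, in the same order.
    """
    if len(seat_probs) != len(actual_seats):
        raise ValueError("one probability table per party is required")
    hits = [probs.get(c, 0.0) for probs, c in zip(seat_probs, actual_seats)]
    return sum(hits) / len(hits)


# Hypothetical probabilities for three parties and their real results.
seat_probs = [{2: 0.7, 3: 0.3}, {1: 0.9, 0: 0.1}, {0: 1.0}]
actual = [2, 1, 0]
print(round(performance_rate(seat_probs, actual), 3))  # 0.867
```

Here the rate is (0.7 + 0.9 + 1.0) / 3, since those are the probabilities assigned to the seat counts that actually occurred.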

RESULTS

Election of MLA (Members of the Legislative Assembly)

The parties contesting the election of the Members of the Legislative Assembly (MLA) in the Federal District of Brazil in 2010 were: DEM (Democratas; Democrats), PCB (Partido Comunista Brasileiro; Brazilian Communist Party), PCO (Partido da Causa Operária; Workers Cause Party), PDT (Partido Democrático Trabalhista; Democratic Labor Party), PMDB (Partido do Movimento Democrático Brasileiro; Brazilian Democratic Movement Party), PP (Partido Progressista; Progressive Party), PMN (Partido da Mobilização Nacional; Party of National Mobilization), PPS (Partido Popular Socialista; Popular Socialist Party), PHS (Partido Humanista da Solidariedade; Humanist Party of Solidarity), PR (Partido da República; Party of the Republic), PRB (Partido Republicano Brasileiro; Brazilian Republican Party), PTB (Partido Trabalhista Brasileiro; Brazilian Labor Party), PSB (Partido Socialista Brasileiro; Brazilian Socialist Party), PC do B (Partido Comunista do Brasil; Communist Party of Brazil), PSC (Partido Social Cristão; Social Christian Party), PRTB (Partido Renovador Trabalhista Brasileiro; Brazilian Labor Renewal Party), PSDB (Partido da Social Democracia Brasileira; Brazilian Social Democracy Party), PSDC (Partido Social Democrata Cristão; Christian Social Democratic Party), PT do B (Partido Trabalhista do Brasil; Labor Party of Brazil), PSL (Partido Social Liberal; Liberal Social Party), PTN (Partido Trabalhista Nacional; National Labor Party), PSOL (Partido Socialismo e Liberdade; Socialism and Freedom Party), PSTU (Partido Socialista dos Trabalhadores Unificados; Unified Socialist Workers’ Party), PT (Partido dos Trabalhadores; Workers’ Party), PTC (Partido Trabalhista Cristão; Christian Labor Party), PRP (Partido Republicano Progressista; Progressive Republican Party) and PV (Partido Verde; Green Party), grouped into the 19 parties/coalitions presented in Table 1.

Table 1. Number of valid votes (nominal votes + party votes) and number of seats earned by the party/coalition in the election of the MLA in Federal District of Brazil, 2010.

https://doi.org/10.1371/journal.pone.0116924.t001

The 2010 MLA election in the Federal District of Brazil had 1,425,661 valid votes from 1,833,942 effective voters, and 24 seats were contested by the parties/coalitions [5]. Inferences were made using a sample of size n = 1000, randomly selected from the effective voters.

The sample was selected from the votes and parties shown in Table 1, including blank/null/missing votes, using the free R software [6]. The sampling method was simple random sampling without replacement.
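Drawing such a sample from aggregated vote totals can be sketched as below. The example is in Python rather than R, and the function name `draw_poll` and the population counts are hypothetical; the real counts are those of Table 1. It relies on the `counts` argument of `random.sample`, available from Python 3.9.

```python
import random
from collections import Counter


def draw_poll(population_counts, n, seed=None):
    """Simple random sample of n voters without replacement.

    population_counts: dict mapping each option (party/coalition or
    'blank/null/missing') -> number of voters in the population.
    Returns a Counter with the number of sampled voters per option.
    """
    rng = random.Random(seed)
    options = list(population_counts)
    # `counts=` (Python 3.9+) samples from the expanded population
    # without building a list with one entry per voter.
    sampled = rng.sample(options, n, counts=[population_counts[o] for o in options])
    return Counter(sampled)


# Hypothetical population of 10,000 voters.
population = {"A": 5000, "B": 3000, "C": 1000, "blank/null/missing": 1000}
poll = draw_poll(population, n=1000, seed=42)
print(sum(poll.values()))  # 1000
```

Because the draw is without replacement, no option can appear in the sample more often than it appears in the population.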

The probability of each party obtaining each possible number of seats was estimated using a non-informative Dirichlet(1,1,…,1) prior and 1,000,000 Monte Carlo simulations.

Table 2 presents the estimated probabilities (highlighting the real number of seats received by each party), the number of votes each party earned in the sample and the number of votes each party should earn in case of a perfect sample (a sample that describes the population perfectly).

Table 2. Estimated probabilities of obtaining seats for each party/coalition using a sample of 1,000 voters.

https://doi.org/10.1371/journal.pone.0116924.t002

A performance rate of 0.715 was obtained from a sample of 1,000 voters, of which 203 were blank/null votes (as if the sample had only 797 voters). The right number of seats (the number of seats with the greatest probability equals the real result) was forecast for 15 of 19 parties, which is a good performance (Table 2). The methodology therefore seems efficient, since for most parties the greatest probability fell on the number of seats earned in the real election. The results for some parties diverged from the real ones because the sample was randomly selected and may not represent the population well. An example is the coalition PSDC/PT do B, which was overrepresented in the selected sample. Nevertheless, the wrong predictions diverged from the real results by only one seat. With the perfect sample, the performance rate was 0.741 and the right number of seats was forecast for 17 of 19 parties.

Fig. 1 displays the performance rate of the methodology for perfect samples of different sizes, that is, samples that represent the population perfectly. Perfect samples, although unlikely in real situations, are the best way to evaluate the performance of the proposed methodology. Sample sizes from 0 to 2,000 were used; for a sample of size 0, the probability was assumed to be uniformly distributed over the number of seats, resulting in a performance rate of 0.042. As expected, the methodology becomes more efficient as the sample size increases.

Fig 1. Performance rate for different sizes of perfect samples.

https://doi.org/10.1371/journal.pone.0116924.g001
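The paper does not state how the counts of a perfect sample of a given size were obtained from the population totals. One possible construction, shown here only as an assumption, scales the population counts to the sample size and assigns the leftover units to the largest remainders so that the counts stay integer and sum to n. The Python function name and the example population are hypothetical.

```python
def perfect_sample(population_counts, n):
    """Integer counts summing to n, proportional to the population counts.

    Assumption: leftover units go to the largest remainders; the paper
    does not specify a rounding rule.
    """
    total = sum(population_counts.values())
    counts = {o: n * c // total for o, c in population_counts.items()}
    remainders = {o: n * c % total for o, c in population_counts.items()}
    leftover = n - sum(counts.values())
    for o in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[o] += 1
    return counts


# Hypothetical population of 10,000 voters scaled to a sample of 1,000.
population = {"A": 5000, "B": 3000, "C": 1000, "blank/null": 1000}
print(perfect_sample(population, 1000))
# {'A': 500, 'B': 300, 'C': 100, 'blank/null': 100}
```

Since the Dirichlet posterior also accepts non-integer parameters, using the unrounded proportional counts would be an equally plausible reading.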

Using data from the MLA elections of 2010, simulations were run for each Brazilian state to verify the performance of the methodology in other states and electoral situations. Table 3 and Fig. 2 present the values obtained in each simulation. A simple random sample and a proportionally perfect sample of size 1,000 were used for each state.

Fig 2. Performance rate for each state using samples of 1,000 voters.

Data from the MLA election, Brazil 2010.

https://doi.org/10.1371/journal.pone.0116924.g002

Table 3. Performance of the methodology for each state using samples of 1,000 voters.

https://doi.org/10.1371/journal.pone.0116924.t003

Performance rates were above 60% for perfect samples and above 50% in most cases for normal samples. One limitation of the performance rate is that it is penalized when the probability is spread over several seat counts for a party, even when the greatest probability corresponds to the real result, because the rate only measures the share of the total probability that matches the real result of the election. The column “Right Predictions” in Table 3 shows the proportion of parties/coalitions for which the number of seats with the greatest probability was the same as in the real election. Even states with a low performance rate present high proportions of right predictions. For perfect samples, the proportion of right predictions demonstrates the efficiency of the methodology, being above 90% in most cases.
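The difference between the two measures can be seen in a small example. The Python sketch below uses hypothetical probabilities and function names; it shows a case in which every party's most probable seat count is correct while the performance rate remains below 0.5 because the probability is spread over neighbouring seat counts.

```python
def performance_rate(seat_probs, actual_seats):
    """Mean probability assigned to the number of seats actually earned."""
    hits = [probs.get(c, 0.0) for probs, c in zip(seat_probs, actual_seats)]
    return sum(hits) / len(hits)


def right_predictions(seat_probs, actual_seats):
    """Share of parties whose most probable seat count equals the real one."""
    hits = sum(
        max(probs, key=probs.get) == c
        for probs, c in zip(seat_probs, actual_seats)
    )
    return hits / len(actual_seats)


# Hypothetical posterior probabilities for two parties.
seat_probs = [{3: 0.4, 2: 0.3, 4: 0.3}, {1: 0.5, 0: 0.25, 2: 0.25}]
actual = [3, 1]
print(right_predictions(seat_probs, actual))           # 1.0
print(round(performance_rate(seat_probs, actual), 2))  # 0.45
```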

An interesting observation is that the performance rate shows a negative association with the number of parties (Pearson correlation = −0.592; p = 0.001), the number of seats (Pearson correlation = −0.775; p<0.001) and the number of votes (Pearson correlation = −0.642; p<0.001). These results were obtained with the perfect sample and suggest that scenarios with a large number of parties, seats and/or votes need a larger sample size to achieve the same performance. Furthermore, we observed no significant correlation (p>0.05) between the proportion of right predictions and the number of parties, number of seats or number of votes.

The results for Minas Gerais state (MG) are interesting: the normal sample gave a performance rate of 0.441 and 33% of right predictions, whereas the perfect sample gave a performance rate of 0.616 and 100% of right predictions. The difference was caused by an unrepresentative sample. Table 4 shows the results from the normal sample and from the perfect sample.

Table 4. Samples obtained from the election of the MLA in Minas Gerais State, 2010.

https://doi.org/10.1371/journal.pone.0116924.t004

In Mato Grosso do Sul state (MS), the performance rate of the normal sample was better than that of the perfect sample. This was due to the extra information in the normal sample, which had fewer blank votes than the perfect sample (Table 5).

Table 5. Samples obtained from the election of the MLA in Mato Grosso do Sul State, 2010.

https://doi.org/10.1371/journal.pone.0116924.t005

Unlike the Minas Gerais (MG) sample, which also contained fewer null votes than the perfect sample, the Mato Grosso do Sul (MS) sample did not overestimate or underestimate any party/coalition; the remaining votes were divided proportionally.

Election of Federal Chamber of Deputies

Results for each state in the elections for the Federal Chamber of Deputies in Brazil are presented in Table 6.

Table 6. Performance of the methodology for each state using samples of 1,000 voters. Data from the election of Federal Chamber of Deputies, Brazil 2010.

https://doi.org/10.1371/journal.pone.0116924.t006

As expected, the performance rates and the proportions of right predictions for the election for the Federal Chamber of Deputies were better than those for the MLA elections. As previously mentioned, this is because elections for the Federal Chamber of Deputies have fewer parties and seats than MLA elections. The election for the Federal Chamber of Deputies presented the same situations as the MLA elections, such as normal samples with a better performance rate than the perfect sample and low performance rates of normal samples caused by unrepresentative samples. Fig. 3 compares the performance of normal samples with that of perfect samples by state.

Fig 3. Performance rate for each state using samples of 1,000 voters.

Data from the election of Federal Chamber of Deputies, Brazil 2010.

https://doi.org/10.1371/journal.pone.0116924.g003

Moreover, the performance rate shows a negative association with the number of seats (Pearson correlation = −0.618; p<0.001) and the number of votes (Pearson correlation = −0.547; p = 0.003) in the election for the Federal Chamber of Deputies. Unlike in the MLA election, we observed no significant association between the number of parties and the performance rate (Pearson correlation = −0.301; p = 0.127). Furthermore, we observed no significant correlation (p>0.05) between the proportion of right predictions and the number of parties, number of seats or number of votes.

CONCLUSIONS

Polls for majoritarian voting systems usually report estimates of the percentage of votes for each candidate. In proportional systems, estimates of the percentage of votes of each party/coalition do not by themselves forecast the number of seats each party/coalition will receive. Thus, classical methods used in majoritarian elections cannot be applied to proportional elections. This paper presented a Bayesian inference on proportional elections under the Brazilian system of seat distribution, estimating the probability that a given party will have representation in the Chamber of Deputies. Results based on data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies show that, for most parties, the greatest probability was concentrated on the number of seats actually earned. Deviations from the real result were mostly due to the sample used, which might not have represented the real population well. This becomes clear in the comparison with the perfect samples, which estimated the number of seats of each party/coalition with good precision, with more than 80% of right predictions in all results for both elections. In this context, the success of the inference depends on a sample that represents the population well.

The proposed methodology is conservative with respect to undecided voters. By the partition property of the Dirichlet distribution, the undecided voters do not participate in the analysis. A sample of 1,000 voters of which 200 are undecided is probabilistically equivalent to a sample of 800 voters with no undecided voters. This differs from, for example, distributing (uniformly or proportionally) the undecided voters among the parties/coalitions.
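One way to make the partition argument explicit is the following standard property of the Dirichlet distribution, stated here as an illustration rather than as the authors' own derivation. The symbols α_j and the use of category k+1 for voters who do not choose any party are assumptions of this sketch.

```latex
% Assumption: \alpha_j = a_j + y_j are the posterior parameters of equation (1),
% and category k+1 collects the voters who do not choose any party.
(\theta_1,\ldots,\theta_{k+1}) \sim \mathrm{Dirichlet}(\alpha_1,\ldots,\alpha_{k+1})
\quad\Longrightarrow\quad
\left(\frac{\theta_1}{1-\theta_{k+1}},\ldots,\frac{\theta_k}{1-\theta_{k+1}}\right)
\sim \mathrm{Dirichlet}(\alpha_1,\ldots,\alpha_k),
\quad \text{independently of } \theta_{k+1}.
```

Because the seat distribution depends only on the relative shares of the parties, the value of α_{k+1} does not affect the seat probabilities, which is consistent with a sample of 1,000 voters with 200 in the last category behaving like a sample of 800.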

The methodology proved to be consistent, since it becomes more efficient as the sample size increases. However, states with many parties, voters or seats need a larger sample size to achieve the same performance. A suggestion for future work is a simulation study to determine the sample size required to obtain, for example, a performance of 90% in all states.

This paper can encourage the use of Bayesian methodology in proportional elections. Providing a simple, consistent and easily implementable methodology may shorten the distance between Bayesian inference and political research.

Supporting Information

S1 Data. Data – Election 2010 – Brazil.

https://doi.org/10.1371/journal.pone.0116924.s001

(XLSX)

Acknowledgments

The authors are grateful to the National Council for Scientific and Technological Development (CNPq) and to Decanato de Pesquisa e Pós-graduação (DPP) of the University of Brasilia (UnB) for the financial support.

Author Contributions

Conceived and designed the experiments: GHVB EYN. Performed the experiments: GHVB. Analyzed the data: GHVB EYN. Wrote the paper: GHVB EYN.

References

1. Distribuição de vagas nas eleições proporcionais. Artigos 106 e 113 do Código Eleitoral. Available: http://www.justicaeleitoral.jus.br/arquivos/tre-pb-distribuicao-de-vagas-nas-eleicoes-proporcionais. Accessed 2014 Dec 20.
2. Nicolau JM (1993) Sistema Eleitoral e Reforma Política. Rio de Janeiro: Foglio. 120 p.
3. Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian Data Analysis. 2 ed. New York: Chapman and Hall/CRC. 696 p.
4. Hammersley JM, Handscomb DC (1964) Monte Carlo methods. London: Methuen. 178 p.
5. Estatísticas do Tribunal Superior Eleitoral / Resultado da eleição. Available: http://www.tse.jus.br/hotSites/estatistica2010/Est_resultados/resultado_eleicao.html. Accessed 2014 Dec 20.
6. R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria. Available: http://www.R-project.org. Accessed 2014 Dec 20.
