Bayesian Inference on Proportional Elections

Polls for majoritarian voting systems usually report estimates of the percentage of votes for each candidate. However, proportional voting systems do not guarantee that the candidate with the highest percentage of votes will be elected. Thus, traditional methods used for majoritarian elections cannot be applied to proportional elections. In this context, the purpose of this paper was to perform Bayesian inference on proportional elections under the Brazilian system of seat distribution. More specifically, a methodology was developed to estimate the probability that a given party will have representation in the Chamber of Deputies. Inferences were made in a Bayesian framework using Monte Carlo simulation, and the methodology was applied to data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies. A performance rate was also proposed to evaluate the efficiency of the methodology. Calculations and simulations were carried out using the free R statistical software.


INTRODUCTION
In Brazil, elections for president, governors and mayors use the majority system, in which the candidate with an absolute majority of the votes is elected. In a proportional system, however, an absolute majority of the votes does not guarantee the election of a candidate. The proportional system is used to elect deputies (federal, state and district) as well as members of the city councils. A problem with proportional elections is the difficulty of determining the precise number of seats (vacancies) that each party wins. Since there is no guarantee that the ratio between the number of votes and the number of seats is an integer, an approximation and redistribution system must be used. Brazil defines the electoral quotient as the number of valid votes divided by the number of seats. Each party has its votes divided by the electoral quotient to obtain the party quotient, and the integer part of this quotient corresponds to the number of seats reserved for the party. The remaining seats are then allocated using the D'Hondt method.
These peculiarities of proportional elections make classical statistical inference impractical. However, the same inference can be carried out easily using Bayesian inference combined with Monte Carlo simulation methods. In this context, the purpose of this paper was to perform Bayesian inference on proportional elections under the Brazilian system of seat distribution. More specifically, a methodology was developed to estimate the probability that a given party will have representation (at least one seat) in the Chamber of Deputies. Inferences were made in a Bayesian framework using Monte Carlo simulation, and calculations and simulations were carried out using the R software. The methodology was applied to data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies.

Brazilian Proportional Election System
The proportional election is an electoral system in which the proportion of seats taken by each party is determined by the proportion of votes it obtains. It is used with the intention of ensuring the participation of different segments of society because, unlike the majority system, proportional elections do not guarantee that the candidate with the greatest number of votes will be elected. In Brazil, elections for Federal Deputies, Members of the Legislative Assembly and Councilors use the proportional system.
The seat distribution is carried out using the electoral quotient, with the D'Hondt method used to distribute the remaining seats [1,2]. The electoral quotient is the sum of all valid votes (nominal votes + party votes, which is equivalent to the total number of votes minus the blank and null votes) divided by the number of available seats. Only parties (or coalitions) with a total of valid votes greater than the electoral quotient take part in the D'Hondt method.
Initially, each party with a total of votes greater than the quotient earns a number of seats equal to its number of votes divided by the quotient, rounded down. After this first distribution, the remaining seats are distributed using the D'Hondt method: the party with the greatest number of adjusted votes (the party's votes divided by its number of earned seats plus 1) earns one more seat, and its adjusted votes are recalculated. This procedure is repeated until there are no empty seats.

Seats division method
The algorithm used for the division of seats on the Brazilian proportional electoral system is presented below [1,2].
Step 0: Get the parties' names, the number of votes for each party and the number of available seats;
Step 1: Sum the number of valid votes (total votes discarding null and blank votes) and divide by the number of seats. This result is the electoral quotient;
Note: If no party receives more votes than the electoral quotient, the election is cancelled (no party earns any seats);
Step 2: Divide each party's number of votes by the electoral quotient and, for each party, add a number of seats equal to the result rounded down;
Step 3: If there are no remaining seats after the division by the quotient, the distribution is done; display the number of seats that each party (or coalition) earned;
Step 4: If there are remaining seats after Step 2, distribute them using the D'Hondt method:
Step 4.1: Identify the party with the most adjusted votes, where
$$\text{Adjusted votes} = \frac{\text{party valid votes}}{\text{earned seats} + 1}$$
Note: In case of a tie between two or more parties in the number of adjusted votes, the one with the smallest number of earned seats gets the seat.
Step 4.2: Add a seat to the party with the greatest number of adjusted votes in Step 4.1;
Step 4.3: If the number of remaining seats is greater than 0, return to Step 4.1; otherwise, the distribution is complete.
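The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' R implementation: the function name `allocate_seats`, the party labels and the vote counts are hypothetical, and the electoral quotient is left unrounded, as in the description above.

```python
def allocate_seats(votes, n_seats):
    """Distribute n_seats among parties; `votes` maps party name -> valid votes."""
    total = sum(votes.values())  # Step 1: valid votes; quotient = total / n_seats
    seats = {party: 0 for party in votes}
    # Only parties with more votes than the electoral quotient take part.
    # v * n_seats > total is the same test as v > total / n_seats, without division.
    eligible = [p for p, v in votes.items() if v * n_seats > total]
    if not eligible:
        return seats  # Note to Step 1: no party earns any seat
    # Step 2: integer part of votes / quotient.
    for p in eligible:
        seats[p] = int(votes[p] * n_seats // total)
    # Step 4: D'Hondt method for the remaining seats.
    for _ in range(n_seats - sum(seats.values())):
        # Greatest adjusted votes wins; ties go to the party with fewer seats.
        winner = max(eligible, key=lambda p: (votes[p] / (seats[p] + 1), -seats[p]))
        seats[winner] += 1
    return seats


# Hypothetical example: 100,000 valid votes, 10 seats, quotient = 10,000.
example = allocate_seats({"A": 47000, "B": 31000, "C": 16000, "D": 6000}, 10)
print(example)  # {'A': 5, 'B': 3, 'C': 2, 'D': 0}
```

In this example party D stays below the quotient and is excluded; the first distribution gives 4, 3 and 1 seats, and the two remaining seats go to A (adjusted votes 9,400) and then to C (adjusted votes 8,000).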

Bayesian Inference
Initially, a Bayesian analysis was performed on the proportion of votes received by each party/coalition. This analysis was based on Dirichlet-Multinomial conjugation [3].
Dirichlet-Multinomial Conjugation. Let $X_1, \ldots, X_n$ be a random sample of size $n$, where $X_j = (X_{1j}, \ldots, X_{kj})$, $j = 1, \ldots, n$, has a Multinomial distribution with parameter vector $(\theta_1, \ldots, \theta_k)$, $0 \le \theta_i \le 1$, and $\sum_{i=1}^{k} \theta_i = 1$. Assume that the prior distribution of $(\theta_1, \ldots, \theta_k)$ is a Dirichlet with known hyperparameters $(a_1, \ldots, a_k)$, $a_i > 0$, $\forall i = 1, \ldots, k$. Then the posterior distribution of $(\theta_1, \ldots, \theta_k)$ given $X_j = x_j$, $j = 1, \ldots, n$, is a Dirichlet with parameter vector $(a_1 + y_1, \ldots, a_k + y_k)$, where $y_i = \sum_{j=1}^{n} x_{ij}$, $i = 1, \ldots, k$.
Assume that the opinion of each voter is independent and that, at a given moment, each voter may: opt for one of the $k$ parties/coalitions; opt for a blank/null vote; or be undecided. We assume that undecided voters are not informative and exclude them from the sample (notice that this procedure is different from assuming that they may opt for any of the $k$ parties with the same probability). Let $Y_j$ be the number of voters favorable to party $j$, $j = 1, 2, \ldots, k$, and $Y_{k+1}$ the number of voters who intend to vote blank/null. Once a sample is selected, the likelihood function of the data is given by
$$L(Y_1, \ldots, Y_k, Y_{k+1} \mid \theta_1, \ldots, \theta_k, \theta_{k+1}) \propto \prod_{j=1}^{k+1} \theta_j^{y_j},$$
where $n = \sum_{j=1}^{k+1} y_j$ is the number of voters in the sample; $\theta_j$ is the true proportion of voters favorable to party $j$, $j = 1, 2, \ldots, k$; and $\theta_{k+1}$ is the true proportion of voters who intend to vote blank/null. By the Dirichlet-Multinomial conjugation, if a Dirichlet distribution with parameter vector $(a_1, \ldots, a_k, a_{k+1})$ is adopted as the prior distribution, the posterior distribution of $(\theta_1, \ldots, \theta_k, \theta_{k+1})$ given $(Y_1, \ldots, Y_k, Y_{k+1})$ is a Dirichlet with parameter vector $(a_1 + y_1, \ldots, a_k + y_k, a_{k+1} + y_{k+1})$, i.e.,
$$\pi(\theta_1, \ldots, \theta_k, \theta_{k+1} \mid Y_1, \ldots, Y_k, Y_{k+1}) \propto \prod_{j=1}^{k+1} \theta_j^{a_j + y_j - 1}. \qquad (1)$$
In a Bayesian scenario, the number of seats that each party earns is a multidimensional random variable, and all information about this random variable is contained in its posterior density, whose analytic expression is unknown. However, it is not necessary to know the analytical form of the density of the seats, because its posterior can be obtained easily through Monte Carlo simulation methods [4]. The procedure consists of generating, from the posterior distribution of the proportion of votes (1), a large number of artificial elections and, in each one of them, applying the seat distribution method described in the preceding section. The probability of a given party earning $c$ seats is then estimated by the number of simulations in which this party won $c$ seats divided by the total number of simulations.
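The Monte Carlo procedure just described can be illustrated with a minimal Python sketch (the paper itself used R). Everything named here is an assumption made for illustration: the function names, the sample counts and the number of simulations are hypothetical, and Dirichlet draws are obtained by normalizing independent Gamma variates, a standard construction. The seat allocation is a compact version of the algorithm in the preceding section.

```python
import random
from collections import Counter


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw, built by normalizing independent Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def allocate_seats(shares, n_seats):
    """Electoral quotient plus D'Hondt remainders; `shares` are valid-vote shares."""
    total = sum(shares)
    eligible = [i for i, s in enumerate(shares) if s * n_seats > total]
    seats = [0] * len(shares)
    if not eligible:
        return seats
    for i in eligible:
        seats[i] = int(shares[i] * n_seats // total)
    for _ in range(n_seats - sum(seats)):
        w = max(eligible, key=lambda i: (shares[i] / (seats[i] + 1), -seats[i]))
        seats[w] += 1
    return seats


def seat_probabilities(party_counts, blank_null_count, n_seats,
                       n_sims=5000, prior=1.0, seed=0):
    """Estimate P(party i earns c seats) from posterior draws of the vote shares."""
    rng = random.Random(seed)
    # Posterior parameters a_j + y_j; the last component is the blank/null category.
    alphas = [prior + y for y in party_counts] + [prior + blank_null_count]
    tallies = [Counter() for _ in party_counts]
    for _ in range(n_sims):
        theta = sample_dirichlet(alphas, rng)
        # Blank/null votes are not valid votes, so only party shares are allocated.
        for i, c in enumerate(allocate_seats(theta[:-1], n_seats)):
            tallies[i][c] += 1
    return [{c: n / n_sims for c, n in sorted(t.items())} for t in tallies]


# Hypothetical poll: four parties, 160 blank/null answers, 8 seats in dispute.
for party, dist in zip("ABCD", seat_probabilities([420, 250, 130, 40], 160, 8)):
    print(party, dist)
```

The default of 5,000 simulations only keeps the sketch fast; the application below uses 1,000,000 simulations.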

Performance Rate
To evaluate the efficiency of the methodology, a performance rate was developed. This rate ranges from 0 to 1, where 1 is a perfect score, meaning that every party/coalition was assigned probability 1 for the number of seats it earned in the real election, and 0 is the opposite result, in which the probability of each party earning the number of seats it earned in the real election is 0. The performance rate is calculated as the sum, over all parties/coalitions, of the probability of each party earning the number of seats it obtained in the real election, divided by the number of parties/coalitions.
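As a small illustration of this definition, the performance rate can be computed as follows. The function name `performance_rate` and the probabilities below are hypothetical and are not taken from the paper's tables.

```python
def performance_rate(seat_probs, real_seats):
    """Mean, over parties, of the probability assigned to the real number of seats.

    seat_probs: party -> {number of seats: estimated probability}
    real_seats: party -> number of seats earned in the real election
    """
    parties = list(real_seats)
    # Seat counts that never occurred in the simulations have probability 0.
    return sum(seat_probs[p].get(real_seats[p], 0.0) for p in parties) / len(parties)


# Hypothetical estimated probabilities for three parties.
probs = {
    "A": {4: 0.3, 5: 0.6, 6: 0.1},
    "B": {2: 0.2, 3: 0.8},
    "C": {0: 0.5, 1: 0.5},
}
real = {"A": 5, "B": 3, "C": 0}
print(round(performance_rate(probs, real), 3))  # (0.6 + 0.8 + 0.5) / 3 = 0.633
```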

Election of MLA (Members of the Legislative Assembly)
Table 1 shows the parties/coalitions and the votes each received.
The 2010 MLA election in the Federal District of Brazil had 1,425,661 valid votes from 1,833,942 effective voters, and 24 seats were contested by the parties/coalitions [5]. Inferences were made using a sample of size n = 1000, randomly selected among the effective voters.
The sample was drawn from the votes and parties shown in Table 1, including blank/null/missing votes, using the free R software [6]. The sampling method was simple random sampling without replacement.
The probability of each party obtaining a given number of seats was estimated by adopting a non-informative Dirichlet(1,1,...,1) prior and 1,000,000 Monte Carlo simulations. Table 2 presents the estimated probabilities (highlighting the real number of seats received by each party), the number of votes each party earned in the sample and the number of votes each party should earn in a perfect sample (a sample that describes the population perfectly).
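The sampling step can be mimicked with a short Python sketch (the authors used R). It assumes Python 3.9 or later, where `random.sample` accepts a `counts` argument, which makes the draw a simple random sample without replacement from the electorate. The population counts below are hypothetical placeholders and are not the values in Table 1.

```python
import random
from collections import Counter


def draw_poll_sample(population_counts, n, seed=0):
    """Simple random sample of n voters, without replacement, from vote counts."""
    rng = random.Random(seed)
    labels = list(population_counts)
    # counts=... treats each label as repeated that many times in the population.
    picks = rng.sample(labels, n, counts=[population_counts[label] for label in labels])
    tally = Counter(picks)
    return {label: tally.get(label, 0) for label in labels}


# Hypothetical electorate (NOT the Table 1 figures).
population = {
    "Party A": 600_000,
    "Party B": 450_000,
    "Party C": 250_000,
    "blank/null": 300_000,
}
sample = draw_poll_sample(population, 1000)
print(sample)
```

The resulting counts play the role of $y_1, \ldots, y_{k+1}$ in the posterior Dirichlet parameters.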
A performance rate of 0.715 was obtained from a sample of 1,000 voters, of which 203 were blank/null votes (as if the sample had only 797 voters). The methodology forecast the right number of seats (the number of seats with the greatest probability is the same as the real result) for 15 of 19 parties, which is a good performance (Table 2). These results suggest that the methodology is efficient, since for most parties the greatest probability fell on the number of seats obtained in the real election. Results for some parties diverged from the real outcome because the samples were randomly selected and may not be a good representation of the population. An example is the coalition PSDC/PT do B, which was overrepresented in the selected sample. Nevertheless, the wrong predictions diverged from the real results by only one seat. In the perfect sample, the performance rate was 0.741, with the right number of seats forecast for 17 of 19 parties.
Note to Table 2: The estimated probabilities of each party/coalition earning more than 7 seats are zero, so these probabilities were omitted from the table.
Fig. 1 displays the performance rate of the methodology for different sizes of samples that represent the population perfectly. Perfect samples, despite being unlikely in real situations, are the best way to evaluate the performance of the proposed methodology. Sample sizes from 0 to 2,000 were used; for a sample of size 0, the probability was assumed to be uniformly distributed among the possible numbers of seats, resulting in a performance rate of 0.042. As expected, the methodology becomes more efficient as the sample size increases.
Using data from the 2010 MLA elections, simulations were run for each Brazilian state to verify the performance of the methodology in other states and electoral situations. Table 3 and Fig. 2 present the values obtained in each simulation. A simple random sample and a proportionally perfect sample of size 1,000 were used for each state.
Performance rates were above 60% for perfect samples and above 50% in most cases for normal samples. One drawback of the performance rate is that it penalizes results in which the probability is spread over several seat counts for a party, even when the greatest probability corresponds to the real result, because the rate only measures the proportion of the total probability that matches the real result of the election. The column "Right Predictions" in Table 3 shows the proportion of parties/coalitions for which the number of seats with the greatest probability was the same as in the real election. Even states with a low performance rate show high right-prediction scores. For perfect samples, the proportion of right predictions shows the efficiency of the methodology, being above 90% in most cases. An interesting observation is that the performance rate shows a negative association with the number of parties (Pearson correlation = −0.592; p = 0.001), the number of seats (Pearson correlation = −0.775; p<0.001) and the number of votes (Pearson correlation = −0.642; p<0.001). These results were obtained for the perfect sample and suggest that scenarios with a large number of parties, a large number of seats and/or a large number of votes need a larger sample size to reach the same performance. Furthermore, we observed no significant correlation (p>0.05) between the right prediction index and the number of parties, number of seats or number of votes.
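The difference between the two measures can be seen in a toy example, sketched here in Python with hypothetical probabilities (not values from Table 3). The function names are illustrative assumptions. Both parties below have their most likely seat count equal to the real one, yet the probability is spread over neighboring seat counts.

```python
def performance_rate(seat_probs, real_seats):
    """Mean probability assigned to the real number of seats."""
    return sum(seat_probs[p].get(s, 0.0) for p, s in real_seats.items()) / len(real_seats)


def right_prediction_rate(seat_probs, real_seats):
    """Proportion of parties whose most probable seat count equals the real one."""
    hits = 0
    for party, real in real_seats.items():
        probs = seat_probs[party]
        # If several seat counts share the top probability, the first one listed is used.
        most_likely = max(probs, key=probs.get)
        hits += most_likely == real
    return hits / len(real_seats)


# Hypothetical probabilities, spread around the real result.
probs = {
    "A": {3: 0.30, 4: 0.40, 5: 0.30},
    "B": {1: 0.35, 2: 0.45, 3: 0.20},
}
real = {"A": 4, "B": 2}
print(right_prediction_rate(probs, real))       # 1.0
print(round(performance_rate(probs, real), 3))  # 0.425
```

Every prediction is right, but the performance rate is only 0.425, which is the devaluation effect described above.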
The results for Minas Gerais state (MG) are interesting: it had a performance rate of 0.441 and 33% of right predictions for the normal sample, and a performance rate of 0.616 and 100% of right predictions for the perfect sample. This happened because of a bad sample that influenced the results. Table 4 shows the results from the normal sample and from the perfect sample.
Note to Table 3: Data from the MLA election, Brazil 2010. "Right prediction" means the proportion of parties/coalitions for which the number of seats with the greatest probability is the same as the real result. doi:10.1371/journal.pone.0116924.t003
In Mato Grosso do Sul state (MS), the performance rate of the normal sample was better than that of the perfect sample. This happened because the normal sample carried extra information: it contained fewer blank votes than the perfect sample (Table 5).
Unlike the Minas Gerais (MG) sample, which also contained fewer null votes than the perfect sample, the Mato Grosso do Sul (MS) sample did not overestimate or underestimate any party/coalition; it divided the remaining votes proportionally.

Election of Federal Chamber of Deputies
Results for each state in the elections for the Federal Chamber of Deputies in Brazil are presented in Table 6. As expected, the performance rates and the proportion of right predictions for the Federal Chamber of Deputies elections were better than for the MLA elections. As previously mentioned, this happened because elections for the Federal Chamber of Deputies have fewer parties and seats than MLA elections. The elections for the Federal Chamber of Deputies showed the same situations as the MLA elections, such as normal samples with a better performance rate than the perfect sample, and poor performance rates for normal samples caused by bad samples. Fig. 3 compares the performance of normal samples with perfect samples by state.
Moreover, the performance rate shows a negative association with the number of seats (Pearson correlation = −0.618; p<0.001) and the number of votes (Pearson correlation = −0.547; p<0.003) in the elections for the Federal Chamber of Deputies. Unlike in the MLA elections, we observed no significant association between the number of parties and the performance rate (Pearson correlation = −0.301; p<0.127). Furthermore, we observed no significant correlation (p>0.05) between the right prediction index and the number of parties, number of seats or number of votes.

CONCLUSIONS
Polls for majoritarian voting systems usually report estimates of the percentage of votes for each candidate. In proportional systems, estimates of the percentage of votes for each party/coalition are not enough to forecast the number of seats each party/coalition will receive. Thus, classical methods used for majoritarian elections cannot be applied to proportional elections. This paper presented a Bayesian inference on proportional elections under the Brazilian system of seat distribution, estimating the probability that a given party will have representation in the Chamber of Deputies. Results based on data from the 2010 Brazilian elections for Members of the Legislative Assembly and the Federal Chamber of Deputies show that, for most parties, the greatest probability was concentrated on the number of seats obtained in the real result. Deviations from the real result were mostly due to the sample used, which might not have been a good representation of the real population. This becomes clear in the comparison with the perfect sample, which estimated the number of seats each party/coalition would receive with good precision, with more than 80% of right predictions in all results for both elections. In this context, the success of the inference depends on a sample that is a good representation of the population.
The proposed methodology is conservative with respect to undecided voters. By the partition property of the Dirichlet distribution, undecided voters do not participate in the analysis. A sample of 1,000 voters of which 200 are undecided is probabilistically equivalent to a sample of 800 voters with no undecided voters. This is different from, for example, distributing the undecided voters (uniformly or proportionally) among the parties/coalitions.
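One way to read this property is sketched below. The symbols $\theta_u$, $y_u$ and $a_u$ (true proportion, sample count and prior parameter for undecided voters) are notation assumed here for illustration and do not appear in the original. If undecided voters were kept as an extra category, the standard renormalization property of the Dirichlet distribution would give:

```latex
% Assumed notation: \theta_u, y_u, a_u refer to the undecided category.
(\theta_1, \ldots, \theta_{k+1}, \theta_u) \mid \text{data}
  \sim \mathrm{Dirichlet}(a_1 + y_1, \ldots, a_{k+1} + y_{k+1},\; a_u + y_u)
\;\Longrightarrow\;
\left( \frac{\theta_1}{1 - \theta_u}, \ldots, \frac{\theta_{k+1}}{1 - \theta_u} \right)
  \sim \mathrm{Dirichlet}(a_1 + y_1, \ldots, a_{k+1} + y_{k+1}),
\quad \text{independently of } \theta_u .
```

The renormalized proportions do not depend on $y_u$, so in the example of 1,000 voters with 200 undecided, only the 800 remaining answers enter the posterior used to simulate the elections.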
The methodology proved to be consistent, since it becomes more efficient as the sample size increases. However, states with many parties, voters or seats need a larger sample size to reach the same performance. A suggestion for future work is a simulation study to define the ideal sample size to obtain, for example, a performance of 90% for all states.
This paper may encourage the use of Bayesian methodology in proportional elections. Providing a simple, consistent and easily implementable methodology may shorten the distance between Bayesian inference and political research.