The price of a vote: Diseconomy in proportional elections

The increasing cost of electoral campaigns raises the need for effective campaign planning and for a precise understanding of the return on such investment. Interestingly, despite the strong impact of elections on our daily lives, how this investment is translated into votes is still unknown. By performing data analysis and modeling, we show that top candidates spend more money per vote than less successful and poorer candidates, a relation that discloses a diseconomy of scale. We demonstrate that this electoral diseconomy arises from the competition between candidates, which leads to inefficient campaign expenditure. Our approach succeeds in two important tests. First, it reveals that the statistical pattern in the vote distribution of candidates can be explained in terms of the independently conceived, but similarly skewed, distribution of campaign money. Second, using a heuristic argument, we are able to explain the observed turnout of approximately 63% on average for a given election. This result is in good agreement with the average turnout rate obtained from real data. Due to its generality, we expect that our approach can be applied to a wide range of problems concerning the adoption process in marketing campaigns.


I. THE DATA
A. Data Description
In the main text, we investigate the effect of campaign investment using the available data on the total donations received and the expenses of each candidate. We analyze Brazilian elections for two kinds of legislators: state deputies and federal deputies. State deputies are local representatives elected for a four-year term by a proportional system; their function is to legislate in the unicameral legislature of each Brazilian state. Federal deputies are representatives in the Chamber of Deputies of the National Congress. They are also elected for a four-year term by a proportional system, and the number of federal deputies elected in each of the 26 states is proportional to its population. The data are available at the website of the Brazilian Federal Electoral Court [S1]. By law, each candidate must provide a detailed description of his or her campaign expenditure, with specific information such as the value, date, and type of each expense. All this information is publicly accessible; however, to obtain the total cost of the campaign and the number of votes of each candidate, the database must be processed computationally. Tables I and II give a detailed description of the data for each state.
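The processing step mentioned above amounts to grouping itemized expenses by candidate and relating the totals to the votes received. The sketch below illustrates that aggregation in Python under loud assumptions: the record layout `(candidate_id, expense_value)` and the vote dictionary are hypothetical simplifications, not the actual schema of the electoral court files, and the numbers are made up.

```python
from collections import defaultdict

# Hypothetical, simplified expense records: (candidate_id, expense value).
# Real records also carry date, type of expense, etc.; this layout is an assumption.
expenses = [
    ("A", 1200.0), ("A", 800.0), ("B", 150.0), ("C", 40000.0), ("C", 10000.0),
]
# Hypothetical vote counts per candidate.
votes = {"A": 1000, "B": 300, "C": 20000}

def total_cost_per_candidate(records):
    """Sum every expense entry that belongs to each candidate."""
    totals = defaultdict(float)
    for candidate, value in records:
        totals[candidate] += value
    return dict(totals)

def price_per_vote(totals, votes):
    """Money spent per vote received; candidates with zero votes are skipped."""
    return {c: totals[c] / votes[c] for c in totals if votes.get(c, 0) > 0}

totals = total_cost_per_candidate(expenses)
prices = price_per_vote(totals, votes)
for c in sorted(prices):
    print(c, totals[c], votes[c], round(prices[c], 2))
```

With real data, the same two steps (sum per candidate, then divide by votes) give the price per vote whose dependence on the budget is discussed in the main text.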

B. Results for all States
Here we summarize the results of our model for the 2014 elections of state and federal deputies in each Brazilian state. Fig S1 shows the data for the state deputy elections and Fig S2 shows the data for the federal deputy elections.

II. ANALYTICAL SOLUTION
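A. Calculation of the expected turnout rate T
Following from Eq (1) in the main text and summing over i, we find a differential equation for the number of decided voters S,
$$ \frac{dS}{dt} = r(t)\left(1 - \frac{S}{n}\right), \tag{S1} $$
where n is the total number of voters and r(t) = Σ_i [m_i(t) > 0] is the number of candidates who still have money at instant t, which depends solely on the distribution of money. Integrating Eq (S1) with S(0) = 0, we find that
$$ S(t) = n\left[1 - \exp\left(-\frac{1}{n}\int_0^t r(t')\,dt'\right)\right]. \tag{S2} $$
This equation enables us to compute the expected turnout rate T of the election as a function of the average price of a vote ∆m, the total money M, and n. To compute T, we first take the limit t → ∞, where S saturates, and define
$$ T = \lim_{t\to\infty}\frac{S(t)}{n} = 1 - \exp\left(-\frac{1}{n}\int_0^\infty r(t)\,dt\right). \tag{S3} $$
To evaluate the integral in Eq (S3), we write r(t) explicitly,
$$ \int_0^\infty r(t)\,dt = \int_0^\infty \sum_{i=1}^{N_c} [m_i(t) > 0]\,dt, \tag{S4} $$
where N_c is the total number of candidates. Commuting the summation with the integral, recalling from the main text that dt = −dm/∆m (each budget m_i(t) decreases from m_i to zero), and integrating the Iverson bracket over m, which restricts each integral to 0 < m ≤ m_i, we find that
$$ \int_0^\infty r(t)\,dt = -\frac{1}{\Delta m}\sum_{i=1}^{N_c}\int_{m_i}^{0} dm = \frac{1}{\Delta m}\sum_{i=1}^{N_c} m_i = \frac{M}{\Delta m}, \tag{S5} $$
which leads to
$$ T = 1 - e^{-M/(n\Delta m)}. \tag{S6} $$
Fig S4A shows the turnout rate T as a function of ∆m computed from Eq (S6) for the model with competition, together with the turnout for the model without competition, T_linear. The number of votes (or money) lost to competition can be evaluated from the difference between T_linear and T, and this loss is maximal when ∆m = M/n.

B. Calculation of the expected number of votes v
By integrating Eq (1) from the main text and changing variables from t to the amount of money spent by each campaigning candidate, m = ∆m t (so that dt = dm/∆m), we can write v_i as a function of m_i:
$$ v_i = \frac{1}{\Delta m}\int_0^{m_i}\left(1 - \frac{S(m)}{n}\right) dm. \tag{S7} $$
Using Eq (S2), we can rewrite Eq (S7) as
$$ v_i = \frac{1}{\Delta m}\int_0^{m_i}\exp\left(-\frac{1}{n\Delta m}\int_0^{m} r(m')\,dm'\right) dm, \tag{S8} $$
where r(m) is the number of candidates whose budget exceeds m. To find an analytical expression for v, we sort the budgets in ascending order, m_1 ≤ m_2 ≤ … ≤ m_{N_c}, set m_0 = 0, and decompose the external integral as
$$ \int_0^{m_i} dm = \sum_{j=1}^{i}\int_{m_{j-1}}^{m_j} dm, \tag{S9} $$
so that Eq (S8) can be rewritten as
$$ v_i = \frac{1}{\Delta m}\sum_{j=1}^{i}\int_{m_{j-1}}^{m_j}\exp\left(-\frac{1}{n\Delta m}\int_0^{m} r(m')\,dm'\right) dm. \tag{S10} $$
The result of the inner integral depends on the limits of the external integral. Since r(m) = N_c − j + 1 on the external interval m ∈ [m_{j−1}, m_j], we find that
$$ \int_0^{m} r(m')\,dm' = A_{j-1} + (N_c - j + 1)(m - m_{j-1}), \qquad A_j = \sum_{k=1}^{j}(N_c - k + 1)(m_k - m_{k-1}). \tag{S11} $$
Solving the integrals, we finally find that the number of votes v_i is given by
$$ v_i = n\sum_{j=1}^{i}\frac{e^{-A_{j-1}/(n\Delta m)} - e^{-A_j/(n\Delta m)}}{N_c - j + 1}. \tag{S12} $$
As Eq (S12) shows, the number of votes v_i of candidate i is not only a function of his or her budget m_i, but also depends on the whole distribution P(m). Fig S4B shows how v(m) changes with ∆m. As ∆m decreases, a large fraction of the voters becomes decided (i.e., T → 1), and v(m) saturates for large values of m, resulting in the diseconomy of scale due to the competition between candidates.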
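The mean-field results of this section can be checked numerically. The Python sketch below rests on stated assumptions: it takes Eq (1) of the main text to be dv_i/dt = [m_i(t) > 0](1 − S/n), which is what the summation leading to Eq (S1) implies; it caps the no-competition turnout at T_linear = min(1, M/(n∆m)), our own assumption, chosen because it places the maximum of T_linear − T at n∆m/M = 1 as stated for Fig S4A; and it evaluates v_i piece by piece over the intervals on which r(m) is constant. The budgets, n and ∆m are hypothetical.

```python
import math

def turnout(M, n, dm):
    """Expected turnout with competition, T = 1 - exp(-M / (n * dm))."""
    return 1.0 - math.exp(-M / (n * dm))

def turnout_linear(M, n, dm):
    """Turnout without competition, M / (n * dm), capped at 1 (assumption)."""
    return min(1.0, M / (n * dm))

def expected_votes(budgets, n, dm):
    """Mean-field expected votes v_i, returned with the budgets sorted ascending.

    On the interval [m_{j-1}, m_j] of money spent, r(m) = N_c - j + 1 candidates
    are still campaigning, so the integrals can be done piece by piece.
    """
    m = sorted(budgets)
    n_c = len(m)
    votes = []
    v = 0.0        # running v_i: sum of the interval contributions up to j
    a_prev = 0.0   # A_{j-1}: integral of r(m') from 0 to m_{j-1}
    m_prev = 0.0   # m_{j-1}, with m_0 = 0
    for j, m_j in enumerate(m, start=1):
        r_j = n_c - j + 1
        a_j = a_prev + r_j * (m_j - m_prev)
        v += (n / r_j) * (math.exp(-a_prev / (n * dm)) - math.exp(-a_j / (n * dm)))
        votes.append(v)
        a_prev, m_prev = a_j, m_j
    return m, votes

# Hypothetical budgets (arbitrary money units), electorate size and vote price.
budgets = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
n, dm = 100, 1.0
M = sum(budgets)

m_sorted, v = expected_votes(budgets, n, dm)
print("T =", round(turnout(M, n, dm), 4), " sum(v)/n =", round(sum(v) / n, 4))
for mi, vi in zip(m_sorted, v):
    print(f"m = {mi:6.1f}  v = {vi:7.3f}  price per vote = {mi / vi:.3f}")

# Loss to competition, T_linear - T, on a grid of x = n*dm/M (M = n = 1, dm = x).
xs = [k / 100 for k in range(10, 501)]
x_max = max(xs, key=lambda x: turnout_linear(1.0, 1.0, x) - turnout(1.0, 1.0, x))
print("loss is largest at n*dm/M =", x_max)
```

Two consistency checks follow from the derivation: summing v_i over all candidates reproduces nT, since every decided voter is attributed to exactly one candidate, and the price per vote m_i/v_i grows with m_i, which is the diseconomy of scale. For very large n the competition term vanishes and v_i approaches m_i/∆m. Setting n∆m/M = 1 gives T = 1 − e^{−1} ≈ 0.632, numerically close to the 63% turnout quoted in the abstract; whether this is the heuristic used in the main text is not stated in this section.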

III. STATISTICAL COMPARISON OF MODELS
In order to compare our model with the simpler case without competition, we use Akaike's Information Criterion (AIC) [S2]. The AIC is a model-selection method, grounded in information theory, that estimates the relative amount of information lost when a given model is used to represent the process that generated the data. Here, we use the AIC to measure the relative quality of our model when compared with the linear non-competitive model. Suppose that a model with P parameters is fitted to a data set with N points. Then, the AIC is defined as
$$ \mathrm{AIC} = N\ln\left(\frac{\mathrm{RSS}}{N}\right) + 2P, \tag{S13} $$
where RSS is the residual sum of squares,
$$ \mathrm{RSS} = \sum_{i=1}^{N}(x_i - X_i)^2. \tag{S14} $$
Here, x_i is the i-th value of the variable to be predicted and X_i is the predicted value of x_i. We calculate the AIC for each model using Eq (S13); by Akaike's criterion, the preferred model is the one with the minimum AIC value. We label the model without competition as WOC and the more complex model, with competition, as WC. The difference in AIC is then defined as ∆AIC = AIC_WC − AIC_WOC. From this difference, we calculate the probability that model WC minimizes the information loss,
$$ P_{\mathrm{WC}} = \frac{e^{-\Delta\mathrm{AIC}/2}}{1 + e^{-\Delta\mathrm{AIC}/2}}. \tag{S15} $$
The probability that model WOC minimizes the information loss is therefore P_WOC = 1 − P_WC. We define the ratio P_WC/P_WOC = e^{−∆AIC/2} as the evidence ratio, which indicates how many times more likely model WC is to minimize the information loss. We performed this analysis for the 2014 elections of federal and state deputies in all 26 Brazilian states. Both models were fitted to the logarithm of the data (Tables III and IV) and to the untransformed data (Tables V and VI). The AIC shows that the model with competition explains the data better than the linear model in all studied cases.
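The model comparison above reduces to a few lines of arithmetic. The sketch below computes the least-squares AIC, the probabilities P_WC and P_WOC, and the evidence ratio for two sets of predictions. The observations, predictions and parameter counts are hypothetical toy values, not the fits reported in Tables III to VI; the two branches in `compare` are algebraically equivalent forms of P_WC, chosen only to avoid floating-point overflow for large |∆AIC|.

```python
import math

def rss(observed, predicted):
    """Residual sum of squares between observations x_i and predictions X_i."""
    return sum((x - X) ** 2 for x, X in zip(observed, predicted))

def aic(observed, predicted, n_params):
    """Least-squares form of the AIC: N ln(RSS / N) + 2P."""
    N = len(observed)
    return N * math.log(rss(observed, predicted) / N) + 2 * n_params

def compare(aic_wc, aic_woc):
    """Akaike weights for two models: returns (P_WC, P_WOC, evidence ratio)."""
    delta = aic_wc - aic_woc  # Delta AIC = AIC_WC - AIC_WOC
    # Two equivalent expressions for P_WC, picked so that exp() cannot overflow.
    if delta >= 0:
        w = math.exp(-delta / 2.0)
        p_wc = w / (1.0 + w)
    else:
        p_wc = 1.0 / (1.0 + math.exp(delta / 2.0))
    p_woc = 1.0 - p_wc
    try:
        ratio = math.exp(-delta / 2.0)  # P_WC / P_WOC
    except OverflowError:
        ratio = math.inf
    return p_wc, p_woc, ratio

# Toy observations and two sets of hypothetical model predictions.
x_obs = [1.0, 2.0, 3.0, 4.0]
pred_wc = [1.1, 1.9, 3.05, 3.95]   # hypothetical "with competition" fit, P = 2
pred_woc = [1.5, 2.5, 2.5, 3.5]    # hypothetical "without competition" fit, P = 1

a_wc = aic(x_obs, pred_wc, n_params=2)
a_woc = aic(x_obs, pred_woc, n_params=1)
p_wc, p_woc, ratio = compare(a_wc, a_woc)
print(f"AIC_WC={a_wc:.2f} AIC_WOC={a_woc:.2f} P_WC={p_wc:.4f} ratio={ratio:.3g}")
```

Note that the extra parameter of the more complex model is penalized through the 2P term, so a lower AIC for WC means its better fit outweighs that penalty.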

IV. SIMULATION ON A COMPLEX NETWORK
To solve the model analytically, we use a mean-field approximation in which the network is a fully connected graph. To check whether our solution still holds for a more complex topology, we performed simulations on Erdős-Rényi networks with three different values of the average degree: ⟨k⟩ = 2, 6, and 10. For computational reasons, we chose the state of Espírito Santo to perform the simulations. We used the candidates' budgets in the 2014 election as input for the distribution of money P(m). The network size is the number of registered voters in Espírito Santo, N = 2653536, as presented in Tables I and II. Each candidate starts the simulation with a single decided voter, which is the initial seed of the candidate's marketing campaign. As shown in Fig S3A and B, for federal and state deputies, respectively, the simulations for ⟨k⟩ = 6 and 10 agree well with the analytical solution (black line) and with the real data (grey circles). The overall underestimation of the number of votes for ⟨k⟩ = 2 can be understood by noting that a significant fraction of the network consists of nodes outside the largest connected cluster; therefore, for candidates whose seeds lie in the largest cluster, the network is effectively smaller.
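The size of the effect for ⟨k⟩ = 2 can be estimated with the standard large-N result for Erdős-Rényi graphs: the fraction S of nodes in the giant component solves S = 1 − exp(−⟨k⟩S). The sketch below solves this by fixed-point iteration. It is an asymptotic estimate, not the simulation used for Fig S3; with N = 2653536 the asymptotic value should be a good approximation, but that is an assumption.

```python
import math

def giant_component_fraction(mean_degree, tol=1e-12, max_iter=100_000):
    """Asymptotic fraction of nodes in the giant component of an Erdős-Rényi graph.

    Solves S = 1 - exp(-<k> S) by fixed-point iteration, starting from S = 1.
    Valid in the large-N limit; there is no giant component for <k> <= 1.
    """
    if mean_degree <= 1.0:
        return 0.0
    s = 1.0
    for _ in range(max_iter):
        s_new = 1.0 - math.exp(-mean_degree * s)
        if abs(s_new - s) < tol:
            return s_new
        s = s_new
    return s

for k in (2, 6, 10):
    s = giant_component_fraction(k)
    print(f"<k> = {k:2d}: giant component {s:.4f}, outside {1 - s:.4f}")
```

For ⟨k⟩ = 2 roughly a fifth of the nodes fall outside the giant component, whereas for ⟨k⟩ = 6 and 10 almost every node belongs to it, which is consistent with the disagreement being confined to the sparsest network.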

V. FREQUENCY DISTRIBUTION OF VOTES
Here, we compare the empirical distribution of votes for the states of Rio de Janeiro (Fig S5A) and Minas Gerais (Fig S5B) with the distribution obtained from our model. Again, the model correctly reproduces the empirical distribution of votes among candidates, P(v).
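This section does not state how P(v) was estimated. Because vote counts are strongly skewed, a histogram with logarithmically spaced bins (the same kind of binning the Fig S3 caption uses for the money axis) is one common choice. The sketch below is such an estimator under that assumption, not the authors' procedure; the log-normal sample is hypothetical and only exercises the function.

```python
import math
import random

def log_binned_density(values, bins_per_decade=5):
    """Empirical density on logarithmically spaced bins (positive values only).

    Returns (left_edge, right_edge, density) for each non-empty bin, where
    density = count / (total * bin_width), so sum(width * density) == 1.
    """
    positive = [v for v in values if v > 0]
    lo = math.floor(math.log10(min(positive)))
    hi = math.ceil(math.log10(max(positive)))
    n_bins = max(1, (hi - lo) * bins_per_decade)
    edges = [10 ** (lo + k / bins_per_decade) for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in positive:
        k = int((math.log10(v) - lo) * bins_per_decade)
        counts[min(k, n_bins - 1)] += 1  # a value on the top edge goes in the last bin
    total = len(positive)
    return [
        (edges[k], edges[k + 1], c / (total * (edges[k + 1] - edges[k])))
        for k, c in enumerate(counts)
        if c > 0
    ]

# Hypothetical heavy-tailed "vote counts", only to exercise the function.
rng = random.Random(0)
votes = [rng.lognormvariate(6.0, 1.5) for _ in range(5000)]
for left, right, dens in log_binned_density(votes):
    print(f"[{left:10.1f}, {right:10.1f})  P(v) = {dens:.3e}")
```

Dividing each count by the bin width is what makes the result a density; without it, the wider bins at large v would artificially inflate the tail.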

VI. STUDY OF THE DISPERSION
Our model allows us to calculate the mean, or expected value, of the number of votes. However, to fully describe the election, we also need to study the statistical dispersion, which is given by the conditional probability distribution p(v|m). We can use the concept of the maximum entropy probability distribution (MaxEnt) from information theory to infer the p(v|m) that maximizes Shannon's entropy [S3]. Imposing only a constraint on the mean ⟨v⟩, the maximum entropy continuous distribution on the non-negative reals is the exponential distribution,
$$ p(v|m) = \frac{1}{\langle v(m)\rangle}\exp\left(-\frac{v}{\langle v(m)\rangle}\right), \tag{S16} $$
whose mean and standard deviation are equal. Fig S6A shows that our data display a close linear relationship with approximately unit slope, σ ≈ ⟨v⟩, which strongly indicates that Eq (S16) accounts for all the random variation of v around the expected value ⟨v(m)⟩ calculated by our model. In the inset of Fig 4F of the main text, we combine these two elements in a simulation of the election of state deputies for the state of São Paulo, the largest electorate in Brazil. Fig S6B shows that adding this random dispersion to our model produces a remarkable resemblance to real election data.
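The MaxEnt step above can be illustrated by drawing one synthetic election: each candidate's vote count is sampled from an exponential distribution whose mean is the model's expected value ⟨v(m)⟩. In the sketch below the expected values are hypothetical (all candidates share ⟨v⟩ = 500, which makes the σ ≈ ⟨v⟩ check easy to read), and draws are left continuous rather than rounded to integer votes; both are simplifications of what Fig S6 shows.

```python
import random
import statistics

def random_election(expected_votes, seed=None):
    """One synthetic election under the MaxEnt (exponential) hypothesis.

    Each candidate's vote count is exponential with mean equal to the model's
    expected value <v(m)>; continuous draws are returned (no rounding).
    """
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for mean in expected_votes]

def mean_and_std(samples):
    """Sample mean and population standard deviation, for the sigma ~ <v> test."""
    return statistics.fmean(samples), statistics.pstdev(samples)

# Many candidates sharing the same hypothetical expected value <v> = 500:
draws = random_election([500.0] * 100_000, seed=42)
mu, sigma = mean_and_std(draws)
print(f"mean = {mu:.1f}, std = {sigma:.1f}, ratio = {sigma / mu:.3f}")
```

With real data, the analogous test groups candidates with similar budgets m and compares σ with ⟨v⟩ within each group, as in Fig S6A.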
[S1] http://www.

TABLE III. Statistical comparison between the models. We use Akaike's information criterion (AIC) to compare the two models: WOC (without competition) and WC (with competition). The AIC lets us determine which model is more likely to describe the data correctly, and to quantify this by calculating the probabilities and an evidence ratio. The probability column shows the probability that each model is the one that minimizes the information loss. The evidence ratio is P_WC/P_WOC, i.e., how many times more likely model WC is to be correct than model WOC. Here, the AIC was applied to the logarithm of the data.

TABLE IV. Statistical comparison between the models. We used Akaike's information criterion (AIC) to compare the two models: WOC (without competition) and WC (with competition). The AIC lets us determine which model is more likely to describe the data correctly, and to quantify this by calculating the probabilities and an evidence ratio. The probability column shows the probability that each model is the one that minimizes the information loss. The evidence ratio is P_WC/P_WOC, i.e., how many times more likely model WC is to be correct than model WOC. Here, the AIC was applied to the logarithm of the data.

TABLE V. Statistical comparison between the models. We used Akaike's information criterion (AIC) to compare the two models: WOC (without competition) and WC (with competition). The AIC lets us determine which model is more likely to describe the data correctly, and to quantify this by calculating the probabilities and an evidence ratio. The probability column shows the probability that each model is the one that minimizes the information loss. The evidence ratio is P_WC/P_WOC, i.e., how many times more likely model WC is to be correct than model WOC.

TABLE VI. Statistical comparison between the models. We used Akaike's information criterion (AIC) to compare the two models: WOC (without competition) and WC (with competition). The AIC lets us determine which model is more likely to describe the data correctly, and to quantify this by calculating the probabilities and an evidence ratio. The probability column shows the probability that each model is the one that minimizes the information loss. The evidence ratio is P_WC/P_WOC, i.e., how many times more likely model WC is to be correct than model WOC.

FIG. S3. Each gray circle represents the data for one candidate. We used three different values of the average degree: ⟨k⟩ = 2 (black diamonds), ⟨k⟩ = 6 (blue squares), and ⟨k⟩ = 10 (red circles). Each symbol is the result of logarithmic binning of the money (m) axis over the simulation. As the average degree increases, the simulation agrees better with the analytical solution; nevertheless, the analytical solution captures the overall behavior for all networks tested. The apparent disagreement for ⟨k⟩ = 2 is a consequence of the smaller effective size of the network, since a significant fraction of nodes is not connected to the largest cluster.
FIG. S4. Dependence on ∆m. The solution of the mean-field model enables us to calculate the turnout rate T as a function of the dimensionless parameter n∆m/M. In (A), we compare the turnout for the linear case, in which competition between candidates is excluded, T_linear, with the turnout with competition, T. Competition creates an exponential saturation, which increases the waste of money as candidates seek new voters. The difference T_linear − T shows that this inefficiency is maximal when n∆m/M = 1.0. In (B), we show that as ∆m decreases the values of v(m) generally increase, as expected from the definition of ∆m. However, a saturation appears as the total number of votes approaches the size of the system, resulting in a diseconomy of scale due to the competition between candidates.

FIG. S6. Test of statistical dispersion. The exponential distribution has the property that its mean and standard deviation are equal. We use this property to test whether the dispersion around the mean follows an exponential distribution, as predicted by the MaxEnt hypothesis. (a) For state deputies of the eight largest states in the 2014 election, the data are in close agreement with σ = ⟨v⟩ (dashed line). (b) We use the exponential distribution with mean equal to the expected number of votes calculated by our model to generate a random election. For the state of São Paulo, adding random noise to our model (squares) yields a cloud that closely resembles the actual data (circles).