The authors have declared that no competing interests exist.
Do conservative econometric models that comply with the Golden Rule of Forecasting provide more accurate forecasts?
To test the effects on forecast accuracy, we applied three evidence-based guidelines to 19 published regression models used for forecasting 154 elections in Australia, Canada, Italy, Japan, the Netherlands, Portugal, Spain, Turkey, the U.K., and the U.S. The guidelines direct forecasters using causal models to be conservative to account for uncertainty by (I) modifying effect estimates to reflect uncertainty, either by damping coefficients towards no effect or by equalizing coefficients, (II) combining forecasts from diverse models, and (III) incorporating more knowledge by including more variables with known important effects.
Modifying the econometric models to make them more conservative reduced forecast errors compared to forecasts from the original models: (I) Damping coefficients by 10% reduced error by 2% on average, although further damping generally harmed accuracy; equalizing coefficients consistently reduced errors, with average error reductions between 2% and 8% depending on the level of equalizing. Averaging the original regression model forecast with an equal-weights model forecast reduced error by 7%. (II) Combining forecasts from two Australian models and from eight U.S. models reduced error by 14% and 36%, respectively. (III) Using more knowledge by including all six unique variables from the Australian models and all 24 unique variables from the U.S. models in equal-weights “knowledge models” reduced error by 10% and 43%, respectively.
This paper provides the first test of applying guidelines for conservative forecasting to established election forecasting models.
Election forecasters can substantially improve the accuracy of forecasts from econometric models by following simple guidelines for conservative forecasting. Decision-makers can make better decisions when they are provided with models that are more realistic and forecasts that are more accurate.
The evidence-based forecasting principle known as the Golden Rule of Forecasting advises forecasters to adhere closely to cumulative prior knowledge about the situation. We test whether following this principle of conservatism can help to
This paper tests the effect of following conservative guidelines on the accuracy of forecasts from published models originally estimated using multiple regression analysis. In particular, we test three of the guidelines on 19 regression models used to forecast vote shares in 154 elections in ten countries.
The development of causal models for forecasting voting in elections has become an important subdiscipline of political science. As of September 2018, about 2,000 results were identified by a Google Scholar search for the two terms “election forecasting” and “model.” Evidence on the models’ predictive validity should be of interest to researchers whose theories of voting behavior are represented by the models, and to decision-makers whose plans vary depending on their expectations of who will win an election.
Causal theories to which the modelers subscribe identify influences on voting behavior; election forecasting models include variables that represent these influences. Most election forecasting models represent the theory of retrospective voting, which views an election as a referendum on the incumbent government’s performance, often based on the country’s economic performance. Thus, retrospective voting theory assumes that voters reward the incumbent party for good performance and punish it otherwise. Causal models typically represent this theory by using changes in one or more macroeconomic variables, such as GDP, unemployment, or prices, to measure performance. The models often include variables based on popularity polls as proxies for voters’ satisfaction with the government’s handling of both economic and non-economic issues.
Many of the models include variables that represent aspects of the country’s electoral system affecting voting behavior or historical patterns of voting behavior. For example, the time the incumbent party has held power can be used to allow for the observation that, historically, leaders have often enjoyed a “honeymoon” period of popularity following their first election, with the effect fading through a leader’s tenure as the electorate’s desire for change increases.
In the U.S., political economy models have been established in presidential election seasons since the late 1970s [
The dominant method for estimating political economy models is multiple regression analysis. Multiple regression analysis estimates the variable weights that minimize the sum of squared errors for a given sample of data. The resulting variable weights are then applied to new values of the causal variables to make forecasts.
We used three criteria for including a model in our analysis. The model (1) was estimated with multiple linear ordinary least squares (OLS) regression analysis, (2) predicted national election results, and (3) was published in an academic journal. However, the authors of some models did not publish their data and did not respond to, or declined, our request for their data; these models were excluded from the analysis.
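The estimate-then-apply procedure can be illustrated with a minimal pure-Python sketch that solves the OLS normal equations and then applies the fitted weights to new values of the causal variables. The function names (`solve`, `fit_ols`, `forecast`) and the toy data are hypothetical and are not taken from any of the published models.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [a, b1, ..., bk] minimizing the sum of squared errors (normal equations)."""
    design = [[1.0, *map(float, row)] for row in rows]  # constant column for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def forecast(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Apply the estimated weights to new values of the causal variables."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Toy data generated from V = 50 + 2*X1 - 1*X2 (hypothetical, for illustration only).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
votes = [50, 53, 52, 55, 55]
coefs = fit_ols(rows, votes)      # approximately [50.0, 2.0, -1.0]
print(forecast(coefs, (6, 2)))    # approximately 60.0
```

Solving the normal equations directly keeps the sketch dependency-free; for real data, a numerically more robust routine such as NumPy's `numpy.linalg.lstsq` would be the more usual choice.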
Nineteen models from ten countries met our criteria. While those models are not exhaustive of the election forecasting literature, we believe that they do provide a representative sample of the models that have been developed for different countries.
Country / Model  Dependent variable  Number of elections  Number of variables
Australia
Cameron & Crosby  Incumbent vote  40  5
Jackman  Incumbent vote  22  3
Canada
Bélanger & Godbout  Incumbent vote  19  4
Nadeau & Blais  Liberal vote  13  4
Italy
Bellucci  Incumbent vote  9  3
Japan
Lewis-Beck & Tien  LDP (percent seats)  17  3
Netherlands
Dassonneville, Lewis-Beck & Mongrain  Incumbent vote  20  3
Portugal
Magalhães & Aguiar-Conraria  Incumbent vote  11  3
Spain
Magalhães, Aguiar-Conraria & Lewis-Beck  Liberal vote  14  4
Turkey
Toros  Incumbent vote change  11  3
U.K.
Lewis-Beck, Nadeau & Bélanger  Incumbent vote  12  3
U.S.
Fair  Incumbent vote  25  7
Cuzán  Incumbent vote  25  5
Abramowitz  Incumbent vote  17  3
Campbell  Incumbent vote  17  2
Lewis-Beck & Tien  Incumbent vote  16  4
Holbrook  Incumbent vote  16  3
Erikson & Wlezien  Incumbent vote  16  2
Lockerbie  Incumbent vote  15  2
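Given the attention that election forecasting attracts in the U.S., models for forecasting U.S. presidential elections form the largest group, with eight models. Australian and Canadian general elections have two models each, while there is one model each for Italy, Japan, the Netherlands, Portugal, Spain, Turkey, and the U.K.
In general, the models can be written as:
$$V = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k,$$
where $V$ is the dependent variable (typically the incumbent's vote share), $X_1, \dots, X_k$ are the causal variables, and $a$ and $b_1, \dots, b_k$ are the intercept and the variable weights estimated by OLS regression.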
When estimating variable weights, multiple regression analysis cannot account for uncertainty arising from sources including biases in the data, use of proxy variables, omission of important variables, inclusion of irrelevant variables, lack of variation in variable values in the estimation sample, and error in predicting or controlling causal variables in the future. As a result, multiple regression models are insufficiently conservative for forecasting as they tend to
The Golden Rule of Forecasting provides four conservative guidelines for causal models [
Regression reduces the estimated effect of a variable in response to unexplained variation in the estimation data. It does not, however, compensate for all sources of uncertainty. Damping and equalizing causal variable coefficient estimates are conservative strategies that can be used to compensate for some of the residual uncertainty.
Damping refers to the general idea of reducing the size of an estimated effect toward no effect. Damping has been used with extrapolation models by reducing the magnitude of an estimated trend, resulting in reductions in forecast errors of about 12% [
Moreover, damping is a conditional guideline. It is not expected to work if the estimated coefficient is lower than what one would expect based on prior knowledge. If, on the other hand, the forecaster is uncertain whether future values of the causal variables will be more extreme than those in the estimation data, the case for damping would seem stronger.
In contrast to the evidence for extrapolation, Armstrong, Green and Graefe were unable to find evidence on whether damping regression coefficients towards no effect improves the accuracy of
Damping coefficients is not a new idea. For example, an early study tested “ridge regression” (a sophisticated approach to damping) using simulated data. Ridge regression model forecasts were more accurate than OLS model forecasts, which in turn were more accurate than equal-weights model forecasts [
A simple strategy for damping is to multiply the estimated weights by a factor
The factor
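A minimal sketch of damping, under stated assumptions: the predictors are standardized (so the intercept equals the mean of the dependent variable in the estimation sample), and a damping level of, for example, 10% is taken to mean multiplying every slope coefficient by 0.9 while leaving the intercept unchanged. The names `damp` and `forecast` and the coefficient values are hypothetical.

```python
from typing import List, Sequence


def damp(coefs: Sequence[float], level: float) -> List[float]:
    """Shrink slopes toward zero (no effect) by `level` (0 = no damping, 1 = full).

    coefs is [intercept, b1, ..., bk]; the intercept is left unchanged.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie between 0 and 1")
    factor = 1.0 - level
    return [coefs[0]] + [factor * b for b in coefs[1:]]


def forecast(coefs: Sequence[float], z_row: Sequence[float]) -> float:
    """Forecast from standardized values of the causal variables."""
    return coefs[0] + sum(b * z for b, z in zip(coefs[1:], z_row))


original = [50.0, 2.0, -1.0]      # hypothetical intercept and two standardized slopes
damped = damp(original, 0.10)     # approximately [50.0, 1.8, -0.9]
z_new = [1.0, 1.0]
print(forecast(original, z_new), forecast(damped, z_new))
```

At 100% damping every slope becomes zero, so each forecast collapses to the sample mean of the vote.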
Equalizing is useful if there is uncertainty about the relative importance of the causal variables; the greater the uncertainty, the more one should adjust the coefficients towards equality. When relative effect sizes are highly uncertain, one should consider the most extreme case of equalizing and assign equal weights to all variables expressed as differences from their mean divided by their standard deviation (i.e., standardized).
To equalize, standardize the variables, estimate the model using multiple regression analysis, and adjust the estimated coefficients toward equality. The adjusted vote equation can be written as:
One review looked at comparative studies on equal weights published since the 1970s in a variety of areas, and concluded that equal-weights models often provide
For election forecasting, one study found that equal-weights versions of two published regression models provided out-of-sample election forecasts that were at least as accurate as those from the original regression models [
Hundreds of studies have shown that combining forecasts that incorporate diverse data and information is an effective way to use additional knowledge and thereby improve forecast accuracy [
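Because the adjusted vote equation is not reproduced here, the following sketch of equalizing rests on an explicit assumption: a level between 0 and 1 moves each standardized slope linearly toward the average of the estimated slopes, so that a level of 1 gives every variable the same weight. Under any linear adjustment of this kind with a shared intercept, 50% equalizing gives exactly the average of the regression forecast and the equal-weights forecast. The names and numbers are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def equalize(coefs: Sequence[float], level: float) -> List[float]:
    """Move standardized slopes toward their average (0 = OLS weights, 1 = equal weights).

    ASSUMPTION: the common weight is the mean of the estimated slopes; intercept is kept.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must lie between 0 and 1")
    slopes = coefs[1:]
    target = mean(slopes)
    return [coefs[0]] + [b + level * (target - b) for b in slopes]


def forecast(coefs: Sequence[float], z_row: Sequence[float]) -> float:
    """Forecast from standardized values of the causal variables."""
    return coefs[0] + sum(b * z for b, z in zip(coefs[1:], z_row))


ols = [50.0, 3.0, 1.0]        # hypothetical intercept and standardized slopes
z_new = [1.0, -0.5]
regression_fc = forecast(ols, z_new)              # 52.5
equal_fc = forecast(equalize(ols, 1.0), z_new)    # 51.0
half_fc = forecast(equalize(ols, 0.5), z_new)     # 51.75, the average of the two
```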
Reviews of studies on combining forecasts conclude that simple unweighted averages provide the most accurate forecasts, except in rare situations where strong evidence suggests that some models consistently provide more accurate forecasts than others [
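A minimal sketch of the simple unweighted average, with made-up vote shares for three elections and two hypothetical models; it also computes each forecast's mean absolute error (MAE) so the combined forecast can be compared with the individual ones. In this toy case the two models err in opposite directions, so averaging cancels much of the error; the numbers are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Sequence


def combine(forecasts_by_model: Dict[str, Sequence[float]]) -> List[float]:
    """Equal-weights (unweighted) average of all models' forecasts, election by election."""
    return [mean(values) for values in zip(*forecasts_by_model.values())]


def mae(forecasts: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean absolute error of a set of forecasts."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


actual = [52.0, 48.0, 55.0]                    # hypothetical vote shares
forecasts = {
    "model_a": [54.0, 47.0, 53.0],
    "model_b": [51.0, 50.0, 56.5],
}
combined = combine(forecasts)                  # [52.5, 48.5, 54.75]
for name, fc in forecasts.items():
    print(name, round(mae(fc, actual), 3))
print("combined", round(mae(combined, actual), 3))
```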
Include all known important variables in a model. The guideline is difficult to implement with multiple regression analysis because the practical limit of the method is a handful of variables at best [
One way to avoid the practical limits that regression places on the number of variables in a model is to use
The major advantage of this approach is that variables are included on the basis of prior knowledge about their importance (i.e., substantive effect) and direction, and not on the basis of a given set of data alone. Consequently, one does not need to estimate a coefficient for each variable from the data and the number of variables that can be included in a model is unlimited.
Franklin suggested differential weighting of variables. Forecasters, however, often lack adequate prior knowledge about the relative importance of the variables. Given the evidence on the relative accuracy of equal and regression weights outlined above, equal variable weights are a reasonable starting point for causal models. As the number of variables in a model increases, the magnitudes of individual variable effects become less important for predictive validity, as an early paper showed mathematically [
Franklin’s approach was intended for rating alternatives, but when the dependent variable is a scalar and data are available, the scores for alternatives can be used as the independent variable in single regression analysis. One study tested that approach by assigning equal weights to all 27 (standardized) variables that were included in nine established models for forecasting U.S. presidential elections. The resulting model was used to generate
The present study uses a similar approach: it sums the standardized values of all variables used in the different models that predict the same target variable to calculate an index variable. The resulting vote equation is:
$$V = a + b\,I, \qquad I = \sum_{j=1}^{m} Z_j,$$
where $Z_j$ is the standardized value of the $j$-th of the $m$ unique variables, and the intercept $a$ and slope $b$ are estimated by single regression analysis.
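A minimal sketch of the index approach: standardize each unique variable, sum the standardized values into an index, and estimate the vote equation by single (one-predictor) least-squares regression on that index. It assumes each variable has already been oriented to correlate positively with the vote and uses the sample standard deviation; the variable names and values are hypothetical. Keying the variables by name means a variable shared by several models enters the index only once.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Differences from the mean divided by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def knowledge_index(variables: Dict[str, Sequence[float]]) -> List[float]:
    """Equal-weights index: per election, the sum of every unique variable's z-score."""
    z_columns = [standardize(values) for values in variables.values()]
    return [sum(z_values) for z_values in zip(*z_columns)]


def fit_single(index: Sequence[float], votes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept a and slope b of V = a + b * I."""
    mi, mv = mean(index), mean(votes)
    b = sum((i - mi) * (v - mv) for i, v in zip(index, votes)) / sum((i - mi) ** 2 for i in index)
    return mv - b * mi, b


variables = {"growth": [1.0, 2.0, 3.0], "approval": [2.0, 4.0, 6.0]}  # hypothetical
votes = [48.0, 50.0, 52.0]
index = knowledge_index(variables)   # [-2.0, 0.0, 2.0]
a, b = fit_single(index, votes)      # a = 50.0, b = 1.0
```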
All data and calculations are available at the Harvard Dataverse:
For each of the 19 models, we standardized the original data and transformed variables to ensure that all predictor variables correlated positively with the dependent variable. Standardization was performed by subtracting each variable's mean from its values and dividing by its standard deviation. Variables that correlated negatively with the dependent variable were transformed by multiplying their values by −1.
We analyzed the accuracy of forecasts across all observations available for each model. All forecasts were out-of-sample, using an N−1 cross-validation procedure, an approach that is also known as jackknifing. In other words, to forecast an election outcome we estimated models using the data on all other elections in the data set. This method allows for a powerful test of predictive validity because it maximizes both the size of the estimation sample and the number of out-of-sample forecasts.
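The two transformations can be sketched as follows. Which standard deviation (sample or population) the authors used is not stated here, so the sketch assumes the sample version, `statistics.stdev`; the helper name `orient_positively` and the data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Differences from the mean divided by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def orient_positively(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Standardize x and multiply it by -1 if it is negatively correlated with y."""
    z = standardize(x)
    y_bar = mean(y)
    # The sign of the covariance equals the sign of the correlation coefficient.
    covariance_sign = sum(zi * (yi - y_bar) for zi, yi in zip(z, y))
    return [-zi for zi in z] if covariance_sign < 0 else z


# Hypothetical unemployment-like variable that falls as the incumbent vote rises.
unemployment = [8.0, 6.0, 4.0]
incumbent_vote = [47.0, 50.0, 53.0]
oriented = orient_positively(unemployment, incumbent_vote)  # now rises with the vote
```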
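A minimal sketch of N−1 (leave-one-out) cross-validation, in which each election is forecast from a model estimated on all the other elections. For brevity it uses a one-predictor least-squares model; the same loop would wrap any fitting routine. Function names and data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def fit_simple(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def jackknife_forecasts(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Out-of-sample forecast for each election, from a model estimated without it."""
    forecasts = []
    for i in range(len(y)):
        x_train = list(x[:i]) + list(x[i + 1:])   # drop election i
        y_train = list(y[:i]) + list(y[i + 1:])
        a, b = fit_simple(x_train, y_train)
        forecasts.append(a + b * x[i])
    return forecasts


x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 5.0]
print(jackknife_forecasts(x, y))  # [-3.0, 2.5, 2.0]
```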
All data and calculations are based on the models’ specifications as published in the respective journal articles. Often, however, these versions differed from the original specifications that were used to predict a particular election. For example, Ray Fair changed his model equation in 1992 and has kept it unchanged since [
In sum, N−1 cross-validation favors regression analysis because it produces forecasts that use more information than would have been available at the time the prediction was made. Hence, any accuracy gains from applying the conservative guidelines obtained in the present study should be regarded as a lower bound.
We report the relative absolute error (RAE) of the forecasts that result from the application of each guideline [
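The RAE is not spelled out in this section. The figures reported later (for example, a combined-forecast MAE of 1.48 percentage points against 1.76 for Abramowitz’s model, and an RAE of 0.84) are consistent with defining it as the MAE of the modified forecasts divided by the MAE of the original model’s forecasts for the same elections; the sketch below assumes that definition. An RAE below 1 means the guideline reduced error, and 1 − RAE is the proportional error reduction. The data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mae(forecasts: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean absolute error of a set of forecasts."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


def rae(modified: Sequence[float], original: Sequence[float], actuals: Sequence[float]) -> float:
    """ASSUMED definition: MAE of modified forecasts / MAE of original model forecasts."""
    return mae(modified, actuals) / mae(original, actuals)


actual = [50.0, 52.0]
original_fc = [53.0, 49.0]   # absolute errors 3 and 3 -> MAE 3.0
modified_fc = [51.0, 50.0]   # absolute errors 1 and 2 -> MAE 1.5
ratio = rae(modified_fc, original_fc, actual)
print(f"RAE = {ratio:.2f}, error reduction = {1 - ratio:.0%}")  # RAE = 0.50, error reduction = 50%
```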
Across all 19 models, only damping of 20% or less reduced errors for most models and on average, and the error reductions were small. For example, damping model coefficients by 10% reduced error for 14 of the 19 models (74%), with an average error reduction of 2% (= 1 − 0.98). Damping by more than 20% harmed accuracy.
Level of damping or equalizing (%)  Damping: mean RAE  Damping: % of models with RAE < 1  Equalizing: mean RAE  Equalizing: % of models with RAE < 1
10  0.98  74  0.97  100 
20  0.99  63  0.96  95 
30  1.02  47  0.95  89 
40  1.07  37  0.94  89 
50  1.15  32  0.93  89 
60  1.23  32  0.92  89 
70  1.32  32  0.92  89 
80  1.41  26  0.93  89 
90  1.52  26  0.93  84 
100  1.62  26  0.94  79 
All levels of equalizing reduced forecast error on average, with error reductions ranging from 3% to 8%. Moreover, at every level of equalizing, equalizing reduced the forecast errors of at least 15 of the 19 models. The most extreme equalizing, in which all predictor variables are assigned equal weights, provided forecasts with a mean RAE of 0.94. In other words, equal-weights models reduced forecast error by 6% on average compared to forecasts from the original models.
Error reductions were more or less maximized at 50% equalizing: with further equalizing, both the mean RAE and the percentage of models with RAEs below one changed little and then deteriorated. In sum, the results suggest that 50% equalizing is a sensible compromise, providing an efficient tradeoff between the average error reduction (mean RAE) and the chance of error reduction (% of models with RAE < 1). Moreover, this 50–50 rule is easy to understand and easy to apply: simply average the forecast from the original regression model and the forecast from an equal-weights version of the model.
The benefits of combining forecasts can be tested for elections for which (a) more than one model is available and (b) the models predict the same dependent variable. This was the case for the eight models that forecast U.S. presidential elections and the two models that forecast Australian general elections. (Note that although two models were available for predicting Canadian federal elections, those models predict a different outcome—incumbent party vote for one, and Liberal party vote for the other—and thus their forecasts could not be combined.)
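Model  MAE of model forecasts (percentage points)  RAE of combined forecast relative to model
Australia (22 elections, 1951–2004)
Cameron & Crosby  2.68  0.84
Jackman  2.54  0.89
U.S. (15 elections, 1956–2012)
Abramowitz  1.76  0.84
Campbell  1.99  0.74
Cuzán  2.07  0.72
Erikson & Wlezien  2.54  0.58
Fair  2.49  0.60
Holbrook  2.55  0.58
Lewis-Beck & Tien  2.29  0.65
Lockerbie  2.73  0.54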

For Australian elections, model forecasts were combined across the 22 elections from 1951 to 2004 for which forecasts from both models were available. The MAE of the combined forecast was 2.26 percentage points, lower than that of either individual model. Compared to the average model forecast error (2.61 percentage points), combining reduced error by 14%.
For U.S. elections, model forecasts were combined across the 15 elections from 1956 to 2012 for which forecasts from all eight models were available. The MAE of the combined forecast was 1.48 percentage points and was thus smaller than the average errors of each of the eight individual models, which ranged from 1.76 to 2.73 percentage points. Compared to the error of the typical model, which was 2.30 percentage points, combining reduced error by 36% (
Compared to the error of forecasts from Abramowitz’s model, the RAE of the combined forecast was 0.84, which means that combining reduced error by 16% compared to the single model that performed best in retrospect. Thus, even if one had known in advance which model would perform best, it would have been better to use the combined forecast.
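The U.S. combining figures can be re-derived from the MAEs in the table above: the typical-model error is the unweighted mean of the eight models’ MAEs, and the RAE against the best model is the combined MAE divided by Abramowitz’s MAE. Because the published MAEs are rounded to two decimals, some recomputed ratios (for example, for Fair’s model) differ from the table in the second decimal place.

```python
from statistics import mean

# MAEs (percentage points) of the eight U.S. models, as reported in the table above.
us_model_mae = {
    "Abramowitz": 1.76, "Campbell": 1.99, "Cuzán": 2.07, "Erikson & Wlezien": 2.54,
    "Fair": 2.49, "Holbrook": 2.55, "Lewis-Beck & Tien": 2.29, "Lockerbie": 2.73,
}
combined_mae = 1.48  # combined forecast, 15 elections, 1956-2012

typical_mae = mean(us_model_mae.values())
reduction_vs_typical = 1 - combined_mae / typical_mae
rae_vs_best = combined_mae / min(us_model_mae.values())

print(f"typical model MAE: {typical_mae:.2f}")                     # 2.30
print(f"error reduction vs. typical: {reduction_vs_typical:.0%}")  # 36%
print(f"RAE vs. best model: {rae_vs_best:.2f}")                   # 0.84
```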
Similar to the tests of combining forecasts, the benefits from using more important variables in one model could be tested only for U.S. and Australian elections. While the conservative guideline is to include
Model  MAE of model forecasts (percentage points)  RAE of knowledge-model forecast relative to model
Australia
Cameron & Crosby  2.68  0.88
Jackman  2.54  0.92
U.S.
Abramowitz  1.76  0.75
Campbell  1.99  0.66
Cuzán  2.07  0.64
Erikson & Wlezien  2.54  0.52
Fair  2.49  0.53
Holbrook  2.55  0.52
Lewis-Beck & Tien  2.29  0.58
Lockerbie  2.73  0.48

In the U.S. case, the all-variables model included 24 variables. While the total number of variables used in the eight models is 28, four variables were excluded: The models of Fair [
In this paper, we applied three conservative forecasting guidelines to 19 published regression models for forecasting election results. The guidelines were: (I) modify effect estimates to reflect uncertainty, (II) combine forecasts from dissimilar models, and (III) include all important variables in the model.
For the first guideline, we tested two approaches to modifying effect estimates to make them more conservative: damping and equalizing. Small levels of damping yielded 2%
Armstrong, Green and Graefe suggested that the “optimal approach most likely lies in between… statistically optimal and equal, and so averaging the forecasts from an equal-weights model and a regression model is a sensible strategy” [
Applying the second guideline—combining forecasts—to eight U.S. models, and to two Australian models, produced forecasts that were more accurate than those from the individual model that provided the most accurate forecasts in each case. Compared to the typical individual model forecast, error was reduced by 36% in the U.S. case and 14% in the Australian case. The results are thus consistent with the average of 22% error reduction for five comparative studies from different areas—including forecasts of economic variables—that examined combining across dissimilar causal models [
The third guideline recommends an alternative approach to incorporating more information into a forecast: using all important variables in a single “knowledge model”. As with combining, knowledge models provided forecasts that were more accurate than even the best individual model. Compared to the typical forecast, a knowledge model that assigned equal weights to all unique variables from the original published models reduced forecast error by 10% in the case of the six-variable Australian model and 43% in the case of the 24-variable U.S. model. As expected, including more variables that have an important causal relationship with the variable being forecast improved forecast accuracy.
Our tests found that the strongest implementation of the conservative guidelines, in the form of knowledge models, provided the greatest improvement in
Implementing the conservative guidelines offers more than simply improved
The gains from combining forecasts and from using more of the important variables were achieved for election forecasting models that, for the most part, used similar variables. We expect that further gains in accuracy and model realism could be achieved by incorporating variables that measure other important effects on voting, such as candidates’ prior experience [
Many forecasters are wary of incorporating a large number of variables into a model, regarding parsimony as an important quality of a forecasting model [
The strict assumptions of regression analysis are seldom met in practice. As a consequence, the question of which method should be used for developing a forecasting model cannot be settled by asserting the superior statistical properties of an optimal regression model. Setting aside damping, for which the results were mixed, the error reductions of between 3% and 43% found in this study support the contention that, for practical forecasting problems, models developed by following conservative forecasting guidelines are likely to provide more accurate forecasts than the original econometric models.
Forecasters who value forecast accuracy should endeavor to include all important variables in a model. The variables should be assumed to be equally important in the absence of prior experimental evidence.
The gains in accuracy reported in this paper were achieved for election forecasting, a problem that involves little uncertainty and only modest complexity. Larger gains in forecast accuracy might be possible when the Golden Rule of Forecasting guidelines are applied to complex problems that involve much uncertainty. Such problems include forecasting election outcomes in more volatile political jurisdictions, but also less-structured problems, such as forecasting the onset of political conflicts, the costs and benefits of government policies, and the long-term economic growth of nations. Further empirical studies on the value of applying the Golden Rule of Forecasting to such problems would help to assess the conditions under which the guidelines improve accuracy.
We thank Paul Goodwin, Randy Jones, and Keith Ord for helpful reviews. Amy Dai, Hester Green, and Lynn Selhat edited the paper. We also received helpful suggestions when presenting an early version of the paper at the 2014 APSA Annual Meeting in Washington, DC.
In producing this paper, we endeavored to conform to the Criteria for Science Checklist at GuidelinesforScience.com. At least one of the authors read each of the papers we cited. We were able to contact the authors of 20 of the 24 papers that we cite to ask if we had correctly represented their work. We received replies from the authors of 13 of those papers, which led to changes to our descriptions in two instances. Each of the references in this paper is linked to a full-text version, making it easy to confirm that the description of findings in our paper agrees with that provided in the original version.