
Decomposition of the mean absolute error (MAE) into systematic and unsystematic components

  • Scott M. Robeson,

    Roles Conceptualization, Formal analysis, Methodology, Project administration, Visualization, Writing – original draft, Writing – review & editing

    srobeson@indiana.edu

    Affiliation Department of Geography, Indiana University, Bloomington, Indiana, United States of America

  • Cort J. Willmott

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Geography, University of Delaware, Newark, Delaware, United States of America

Abstract

When evaluating the performance of quantitative models, dimensioned errors often are characterized by sums-of-squares measures such as the mean squared error (MSE) or its square root, the root mean squared error (RMSE). In terms of quantifying average error, however, absolute-value-based measures such as the mean absolute error (MAE) are more interpretable than MSE or RMSE. Part of that historical preference for sums-of-squares measures is that they are mathematically amenable to decomposition and one can then form ratios, such as those based on separating MSE into its systematic and unsystematic components. Here, we develop and illustrate a decomposition of MAE into three useful submeasures: (1) bias error, (2) proportionality error, and (3) unsystematic error. This three-part decomposition of MAE is preferable to comparable decompositions of MSE because it provides more straightforward information on the nature of the model-error distribution. We illustrate the properties of our new three-part decomposition using a long-term reconstruction of streamflow for the Upper Colorado River.

Introduction

Across the sciences, model-estimation and -prediction errors are often summarized and analyzed using dimensioned [1] and dimensionless [2] measures. While dimensionless error measures have received considerable attention [3–5], dimensioned measures are better suited to summarizing the magnitude of model error in meaningful units. When given a set of model predictions (Pi, i = 1, 2, …, n), where each Pi corresponds to a reliable observation (Oi), the mean squared error (MSE) and the root mean squared error (RMSE):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2 \tag{1}$$

$$\mathrm{RMSE} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2\right]^{1/2} \tag{2}$$

are routinely reported [6]. The mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - O_i\right| \tag{3}$$

is reported less often, even though it has a clearer interpretation than RMSE because MAE is the average error [7].
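For concreteness, a minimal Python/NumPy sketch of Eqs 1–3 is given below; the function and array names are illustrative and are not part of the original formulation:

```python
import numpy as np

def mse(pred, obs):
    """Mean squared error (Eq 1)."""
    return np.mean((pred - obs) ** 2)

def rmse(pred, obs):
    """Root mean squared error (Eq 2)."""
    return np.sqrt(mse(pred, obs))

def mae(pred, obs):
    """Mean absolute error (Eq 3)."""
    return np.mean(np.abs(pred - obs))
```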

We and others have shown elsewhere that error statistics based on sums-of-squares have a number of issues that make them less interpretable than those based on absolute values [1, 4, 7–12]. This is especially the case when they are used as measures of average model error. An additional drawback to using MSE is that its squared dimensional units are difficult to interpret. As a result, MAE is the preferred measure of average model error. Even so, sums-of-squares measures continue to be assessed and reported, partially due to inertia, but also due to their amenability to mathematical decomposition into additive variance-based measures. In the context of evaluating model error, this property was used by Willmott in 1981 [13] to decompose MSE into systematic (MSEs) and unsystematic (MSEu) components:

$$\mathrm{MSE}_s = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{P}_i - O_i\right)^2 \tag{4}$$

$$\mathrm{MSE}_u = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - \hat{P}_i\right)^2 \tag{5}$$

such that

$$\mathrm{MSE} = \mathrm{MSE}_s + \mathrm{MSE}_u \tag{6}$$

For both MSEs and MSEu, ordinary least-squares (OLS) regression of the model predictions on the observations typically is used to obtain the fitted values $\hat{P}_i$ (i.e., a linear fit of P on O).
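A brief sketch of this two-part decomposition (Eqs 4–6), assuming NumPy's polyfit for the OLS fit of P on O, is:

```python
import numpy as np

def mse_decomposition(pred, obs):
    """Split MSE into systematic and unsystematic components (Eqs 4-6)
    using an OLS regression of the predictions on the observations."""
    slope, intercept = np.polyfit(obs, pred, 1)   # linear fit of P on O
    pred_hat = intercept + slope * obs            # OLS-fitted values P-hat
    mse_s = np.mean((pred_hat - obs) ** 2)        # systematic component (Eq 4)
    mse_u = np.mean((pred - pred_hat) ** 2)       # unsystematic component (Eq 5)
    return mse_s, mse_u                           # mse_s + mse_u equals MSE (Eq 6)
```

Ratios such as mse_s / (mse_s + mse_u) then give the systematic fraction of MSE discussed below.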

As computed above, MSEs is typically interpreted as consistent over- and/or under-prediction of the observations by the model (i.e., the model has non-zero mean bias and/or the regression slope is not one). The unsystematic component provides an estimate of the model’s random error or scatter about the regression line. Forming the ratios MSEs/MSE and MSEu/MSE gives estimates of the fraction of total error (as estimated by MSE) that is identified as systematic or unsystematic. This decomposition has served as a relatively insightful summary of model error (e.g., [14]) and has been used as a guide to model improvement because a model with a large amount of systematic error usually can be respecified to reduce the consistent over- or under-prediction.

While decomposing MSE into its constituent components has been a useful approach, MSE is a flawed measure of average model error. Using MSE to identify systematic and unsystematic components of error, therefore, can produce misleading summaries of the types of errors that various models contain. Even more importantly, models may be inappropriately adjusted to reduce systematic error that has been misidentified by the MSE-based approach, e.g., when the impacts of outliers are overemphasized. As a result, our goal here is to develop and present a more rational approach for error decomposition that uses MAE as the baseline for average model error.

Decomposition of MAE into three components

Although our goal is to partition MAE into components that represent systematic and unsystematic error patterns, we also want to move beyond the traditional two-part decomposition to further divide systematic errors into two separate components. One can be used to indicate the amount of bias in a model and the other to represent the extent to which the model predictions systematically under- or over-estimate observations falling below and above the observed mean (the regression slope is not one). The latter is referred to as proportionality error and is distinct from model bias represented in the under- or overestimation of the observed mean.

For each of these three types of error—bias, proportionality, and unsystematic—we develop a weighting function that can be used to partition MAE into its three components. We offer a diagram (Fig 1) using a small, synthetic dataset (Table 1) to illustrate the estimation of the three components.

Fig 1. Representations of model-prediction errors showing aspects of the decomposition into systematic and unsystematic components (using data from Table 1).

Predictions (in the upper left panel) can be decomposed in the traditional way using MSE, as shown in the upper-right panel where the lengths of the red and blue dotted vertical lines determine the partitioning of the errors. The bottom-left panel shows the predictions after bias is removed (i.e., $P'_i = P_i - \mathrm{MBE}$), while the bottom-right panel shows the magnitudes of the bias, proportionality, and unsystematic components of our three-part, weight-based decomposition of MAE.

https://doi.org/10.1371/journal.pone.0279774.g001

Table 1. Data and error components for the example in Fig 1.

For the components whose sum (Σ) is given in the far-right column, dividing by 6 gives the mean value (i.e., $\bar{O}$, $\bar{P}$, MSE, MSEs, MSEu, and MAE). For the other three rows, which contain absolute values and are used to form the b, pi, and ui weights that, in turn, determine MAEb, MAEp, and MAEu (see Eqs 14–16), the sums are not relevant and, therefore, are not given.

https://doi.org/10.1371/journal.pone.0279774.t001

Bias error

Here, we define bias as the component of systematic error that is contained in the over- or under-prediction of the observed mean. This is often referred to as the mean bias error (MBE):

$$\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right) \tag{7}$$

In addition to indicating average over- or under-prediction, MBE can be used to develop a corresponding (to Pi) set of unbiased predicted values:

$$P'_i = P_i - \mathrm{MBE} \tag{8}$$

The magnitude (absolute value) of MBE can additionally serve as the weight that determines the relative importance of bias to the overall MAE:

$$b = \left|\mathrm{MBE}\right| \tag{9}$$
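A short sketch of Eqs 7–9 (illustrative names only):

```python
import numpy as np

def bias_weight(pred, obs):
    """Mean bias error (Eq 7), unbiased predictions (Eq 8),
    and the bias weight b (Eq 9)."""
    mbe = np.mean(pred - obs)       # Eq 7: average over- or under-prediction
    pred_unbiased = pred - mbe      # Eq 8: corresponding unbiased predictions
    b = abs(mbe)                    # Eq 9: magnitude of MBE as the bias weight
    return mbe, pred_unbiased, b
```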

The magnitude of bias for our example dataset (Table 1) can be seen in the bottom left panel in Fig 1, where the model predictions systematically underestimate the mean of the observations by 1.

Proportionality error

In addition to bias error (caused by an incorrect estimation of the observed mean), there is another systematic error related to consistent under- or over-prediction. This error is reflected in the slope of the regression estimate of $P'_i$ on Oi (note that if the regressions are estimated using OLS, then this slope estimate is the same as that of P on O). If the slope of this relationship is anything other than unity, there is proportionality error in the model predictions. A slope of less than one indicates that the model systematically overestimates values below $\bar{O}$ and underestimates those above. Conversely, a slope greater than one indicates that the model systematically underestimates values below $\bar{O}$ and overestimates those above. To estimate proportionality error, we use the unbiased predicted regression values:

$$\hat{P}'_i = a' + b'\,O_i \tag{10}$$

where a′ and b′ are the OLS intercept and slope from the regression of $P'_i$ on Oi.

Given that the OLS solution for $\hat{P}_i$ is constrained to pass through $(\bar{O}, \bar{P})$, $\hat{P}'_i$ passes through $(\bar{O}, \bar{O})$ and, therefore, is unbiased. Weights for the relative importance of proportionality error (for each Oi) are determined using the difference between the unbiased predicted regression values and the observations (the red lines in Fig 1d):

$$p_i = \left|\hat{P}'_i - O_i\right| \tag{11}$$
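In code form, Eqs 10 and 11 can be sketched as follows (again assuming NumPy's polyfit for the OLS fit):

```python
import numpy as np

def proportionality_weights(pred_unbiased, obs):
    """Unbiased regression values (Eq 10) and proportionality
    weights p_i (Eq 11)."""
    slope, intercept = np.polyfit(obs, pred_unbiased, 1)
    pred_hat_unbiased = intercept + slope * obs     # Eq 10: regression of P' on O
    p = np.abs(pred_hat_unbiased - obs)             # Eq 11: proportionality weights
    return pred_hat_unbiased, p
```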

Unsystematic error

After accounting for bias and proportionality errors, the remaining error is related to scatter about the regression line $\hat{P}'_i$. Analogous to the way that the individual components of MSEu are formed, weights for the relative importance of each prediction’s unsystematic error are determined using the difference between the unbiased predictions and the unbiased regression values:

$$u_i = \left|P'_i - \hat{P}'_i\right| \tag{12}$$

Once again, if OLS regression is used for $\hat{P}'_i$, then the biased predictions and regression values produce the same weights:

$$u_i = \left|P_i - \hat{P}_i\right| \tag{13}$$
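Correspondingly, a one-line sketch of the unsystematic weights (Eq 12) is:

```python
import numpy as np

def unsystematic_weights(pred_unbiased, pred_hat_unbiased):
    """Unsystematic weights u_i: scatter of the unbiased predictions
    about the unbiased regression line (Eq 12; equivalent to Eq 13 under OLS)."""
    return np.abs(pred_unbiased - pred_hat_unbiased)
```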

Three-component decomposition of MAE

The three weights for bias, proportionality, and unsystematic error developed above now can be used to scale the individual components of absolute error:

$$\mathrm{MAE}_b = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{b}{b + p_i + u_i}\right)\left|P_i - O_i\right| \tag{14}$$

$$\mathrm{MAE}_p = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{p_i}{b + p_i + u_i}\right)\left|P_i - O_i\right| \tag{15}$$

$$\mathrm{MAE}_u = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{u_i}{b + p_i + u_i}\right)\left|P_i - O_i\right| \tag{16}$$

A clear advantage of this weight-based decomposition of average error is that it uses MAE rather than MSE as the baseline. Another advantage is that predictions that have no error do not contribute to the components. This was not the case with the MSE-based decomposition, where predictions that have no error can substantially influence the values of MSEs and MSEu (e.g., the point that lies directly on the 1:1 line in the upper right panel of Fig 1 contributes substantially to both MSEs and MSEu despite being error-free).

It is possible for the denominator within these summations (i.e., b + pi + ui) to be zero, but that can only occur when a model has no bias and the regression line passes through a predicted value that has no error (i.e., when b = 0 and Pi = Oi = $\hat{P}_i$). If that rare model-prediction event occurs, those elements with b + pi + ui = 0 can simply be excluded from the summation.

Given the definitions in Eqs 14–16, MAEb, MAEp, and MAEu sum to MAE:

$$\mathrm{MAE} = \mathrm{MAE}_b + \mathrm{MAE}_p + \mathrm{MAE}_u \tag{17}$$

As with MSEs and MSEu, it is instructive to form ratios (i.e., MAEb/MAE, MAEp/MAE, and MAEu/MAE) to identify the proportion of total error contributed by each component. The constraints within the weighted decomposition of MAE diminish MAEb relative to the magnitude of MBE. MBE, therefore, remains a useful metric to be reported when analyzing model error. R and Matlab functions for these calculations are provided in the S1 File.
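To show how the pieces fit together, a self-contained Python/NumPy sketch of the full decomposition (Eqs 7–17) and the associated ratios is given below; the function and variable names are ours and are not taken from the S1 File:

```python
import numpy as np

def mae_decomposition(pred, obs):
    """Three-part decomposition of MAE into bias, proportionality,
    and unsystematic components (Eqs 14-16), built from the weights
    of Eqs 7-13. Returns (MAE_b, MAE_p, MAE_u), which sum to MAE (Eq 17)."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    n = len(obs)
    abs_err = np.abs(pred - obs)
    mbe = np.mean(pred - obs)                        # Eq 7
    b = abs(mbe)                                     # Eq 9
    pred_unbiased = pred - mbe                       # Eq 8
    slope, intercept = np.polyfit(obs, pred_unbiased, 1)
    pred_hat_unbiased = intercept + slope * obs      # Eq 10
    p = np.abs(pred_hat_unbiased - obs)              # Eq 11
    u = np.abs(pred_unbiased - pred_hat_unbiased)    # Eq 12
    denom = b + p + u
    keep = denom > 0        # error-free points (b = 0, p_i = u_i = 0) drop out
    mae_b = np.sum(b / denom[keep] * abs_err[keep]) / n         # Eq 14
    mae_p = np.sum(p[keep] / denom[keep] * abs_err[keep]) / n   # Eq 15
    mae_u = np.sum(u[keep] / denom[keep] * abs_err[keep]) / n   # Eq 16
    return mae_b, mae_p, mae_u

# Hypothetical usage with placeholder arrays pred and obs:
# mae_b, mae_p, mae_u = mae_decomposition(pred, obs)
# mae_total = mae_b + mae_p + mae_u                  # equals MAE (Eq 17)
# fractions = np.array([mae_b, mae_p, mae_u]) / mae_total
```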

An example of model-estimation errors

To illustrate the properties of our newly derived measures of model error, we use a tree-ring-based reconstruction model developed by Meko et al. [15]. This reconstruction provides over 1200 years of annually resolved flow predictions for the Upper Colorado River at Lee’s Ferry. The large majority of the annual flow in the Colorado River comes from upstream of Lee’s Ferry [16], so the reconstruction is an essential indicator of historical water availability for the Colorado River. The observed and reconstructed flows are water-year totals for a large river and, therefore, are reported in billions of cubic meters per year. Both the observed data and reconstructed values are based on estimates of “naturalized” streamflow, which corrects for the anthropogenic alterations of flow (i.e., reservoirs, irrigation, etc.). In a recent article [14], the model was bias-corrected so that its empirical probability distribution better matched that of the observations (e.g., compare Fig 2a and 2b). Here, we employ our new decomposition of model errors to compare the three sources of error in the original and bias-corrected reconstruction.

Fig 2. Model-estimation errors before and after bias correction.

Model-estimation errors for (a) the reconstruction of annual Upper Colorado River flow (in billions of cubic meters) from [15] and (b) the same reconstruction after applying the bias-correction procedure of [14].

https://doi.org/10.1371/journal.pone.0279774.g002

Prior to bias correction (Fig 2a), the Upper Colorado River reconstruction has low overall error, with an MAE of 2.12 billion m3 (i.e., when compared to the observed mean flow ($\bar{O}$) of 18.53 billion m3). The small value of MAEb (0.08 billion m3) also shows that the reconstruction model faithfully reproduces the observed mean. From the scatterplot and the substantial amount (34%) of error in MAEp, however, it is clear that high flow years are underestimated (and, to a lesser extent, low flows are overestimated). Even with these substantial proportionality errors, the majority (62%) of the mean absolute error is in MAEu, which is desirable (i.e., the majority of error is unsystematic). At the same time, the traditional decomposition into MSEs and MSEu masks the distinction between bias and proportionality error while also underestimating these combined systematic errors, because it inflates its representation of the unsystematic error (MSEu) by squaring the model-predicted deviations from the regression line. As a result, the MSE-based measures suggest that there is little room for improvement when, in fact, there is.

Bias correction (Fig 2b) produces a reconstruction model with an MAE (2.05 billion m3) similar to that of the original model. But bias correction has produced much lower error in the two systematic terms of MAE, reducing MAEb to 3% and MAEp to 19% of MAE. From the slope of the regression line, however, it is clear that some proportionality error remains that the bias-correction procedure has not entirely removed. The MSE-based measures present a rosier picture of the reduction in systematic error, again due to the inflation of the unsystematic error produced by squaring the deviations around the regression line. Overall, the MAE-based approach shows that there is more room for improvement in the original reconstruction (Fig 2a) and in the bias-correction procedure (Fig 2b) than is evident from the MSE-based measures. In particular, the additional systematic component introduced here, MAEp, suggests that high flows still need to be adjusted upward.

Conclusions

Traditional decomposition of sums-of-squared errors into systematic (MSEs) and unsystematic (MSEu) components has been a popular approach for characterizing the different components of model error. These sums-of-squares-based measures, however, have been shown to be imprecise and, at times, deceptive indicators of average error and its constituents. As a result, evaluations of model estimates and predictions should increasingly use absolute-value-based error measures such as MAE. To fill the need for a decomposition of MAE into its constituent components, we present new measures that are formed as weighted averages of the absolute error. With this approach, MAE can now be decomposed into three components that represent bias (MAEb), proportionality (MAEp), and unsystematic (MAEu) errors. These measures provide a more interpretable standard for evaluating model errors while also pointing to more specific types of error that may be reduced.

Acknowledgments

The authors appreciate the informative and constructive comments of the two reviewers.

References

1. Willmott CJ, Robeson SM, Matsuura K. Climate and other models may be more accurate than reported. EOS. 2017; 98.
2. Willmott CJ, Robeson SM, Matsuura K, Ficklin DL. Assessment of three dimensionless measures of model performance. Environ Mod Softw. 2015; 73: 167–174.
3. Nash JE, Sutcliffe JV. River flow forecasting through conceptual models part I—A discussion of principles. J Hydrol. 1970; 10: 282–290.
4. Legates DR, McCabe GJ Jr. Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Wat Resour Res. 1999; 35: 233–241.
5. Willmott CJ, Robeson SM, Matsuura K. A refined index of model performance. Intl J Climatol. 2012; 32: 2088–2094.
6. Jackson EK, Roberts W, Nelsen B, Williams GP, Nelson EJ, Ames DP. Introductory overview: Error metrics for hydrologic modelling–A review of common practices and an open source library to facilitate use and adoption. Environ Mod Softw. 2019; 119: 32–48.
7. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res. 2005; 30: 79–82.
8. Gao J. Bias-variance decomposition of absolute errors for diagnosing regression models of continuous data. Patterns. 2021; 2: 100309. pmid:34430928
9. Mielke PW, Berry K. Permutation methods: a distance function approach. Springer; 2007.
10. Pontius RG Jr, Thontteh O, Chen H. Components of information for multiple resolution comparison between maps that share a real variable. Environ Ecol Stat. 2008; 15: 111–142.
11. Willmott CJ, Matsuura K, Robeson SM. Ambiguities inherent in sums-of-squares-based error statistics. Atmos Environ. 2009; 43: 749–752.
12. Pontius RG Jr. Metrics That Make a Difference: How to Analyze Change and Error. Advances in Geographic Information Science. Springer Nature; 2022. https://doi.org/10.1007/978-3-030-70765-1
13. Willmott CJ. On the validation of models. Phys Geog. 1981; 2(2): 184–194.
14. Robeson SM, Maxwell JT, Ficklin DL. Bias correction of paleoclimatic reconstructions: A new look at 1,200+ years of Upper Colorado River flow. Geophys Res Lett. 2020; e2019GL086689.
15. Meko DM, Woodhouse CA, Baisan CA, Knight T, Lukas JJ, Hughes MK, Salzer MW. Medieval drought in the upper Colorado River Basin. Geophys Res Lett. 2007; L10705.
16. Christensen NS, Lettenmaier DP. A multimodel ensemble approach to assessment of climate change impacts on the hydrology and water resources of the Colorado River Basin. Hydrol Earth Sys Sci. 2007; 11: 1417–1434.