Statistical Basis for Predicting Technological Progress

Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation.


Introduction
Innovation is by definition new and unexpected, and might therefore seem inherently unpredictable. But if there is a degree of predictability in technological innovation, understanding it could have profound implications. Such knowledge could result in better theories of economic growth, and enable more effective strategies for engineering design, public policy design, and private investment. In the area of climate change mitigation, the estimated cost of achieving a given greenhouse gas concentration stabilization target is highly sensitive to assumptions about future technological progress [1].
There are many hypotheses about technological progress, but are they any good? Which, if any, hypothesis provides good forecasts? In this paper, we present the first statistically rigorous comparison of competing proposals.
When we think about progress in technologies, the first product that comes to mind for many is a computer, or more generally, an information technology. The following quote by Bill Gates captures a commonly held view: ''Exponential improvement - that is rare - we've all been spoiled and deeply confused by the IT model'' [2]. But as we demonstrate here, information technologies are not special in terms of the functional form that describes their improvement over time. Information technologies show rapid rates of improvement, but many other technologies also show exponential improvement. In fact, all the technologies we study here behave roughly similarly: Information technologies closely follow patterns of improvement originally postulated by Wright for airplanes [3-8], and technologies such as beer production or offshore gas pipelines follow Moore's law [9,10], but with a slower rate of improvement [8,11-15].
It is not possible to quantify the performance of a technology with a single number [16]. A computer, for example, is characterized by speed, storage capacity, size and cost, as well as other intangible characteristics such as aesthetics. One automobile may be faster, while another is less expensive. For this study, we focus on one common measure of performance: the inflation-adjusted cost of one ''unit''. This metric is suitable in that it can be used to describe many different technologies. However, the nature of a unit may change over time. For example, a transistor in a modern integrated circuit today may have quite different performance characteristics than its discrete counterpart in the past. Furthermore, the degree to which cost is emphasized over other performance measures may change with time [17]. We nonetheless use the changes in the unit cost as our measure of progress, in order to compare competing models using a sizable dataset. The crudeness of this approach only increases the difficulty of forecasting and makes it particularly surprising that we nonetheless observe common trends.

Analysis
We test six different hypotheses that have appeared in the literature [3,9,18-20], corresponding to the following six functional forms:
$$
\begin{aligned}
\text{Moore:} \quad & \log y_t = a t + b + n(t) \\
\text{Wright:} \quad & \log y_t = a \log x_t + b + n(t) \\
\text{lagged Wright:} \quad & \log y_t = a \log (x_t - q_t) + b + n(t) \\
\text{Goddard:} \quad & \log y_t = a \log q_t + b + n(t) \\
\text{SKC:} \quad & \log y_t = a \log q_t + c \log (x_t - q_t) + b + n(t) \\
\text{Nordhaus:} \quad & \log y_t = a t + c \log x_t + b + n(t)
\end{aligned}
\tag{1}
$$
The dependent variable $y_t$ is the unit cost of the technology measured in inflation-adjusted dollars. The independent variables are the time $t$ (measured in years), the annual production $q_t$, and the cumulative production $x_t = \sum_{i=1}^{t} q_i$. The noise term $n(t)$, the constants $a$, $b$ and $c$, and the predictor variables differ for each hypothesis.
Moore's law here refers to the generalized statement that the cost $y$ of a given technology decreases exponentially with time:
$$
y_t = B \exp(-m t), \tag{2}
$$
where $m > 0$ and $B > 0$ are constants [9,12]. (We assume throughout that $t > 0$, and we have renamed $a = -m$ and $b = \log B$ in Eq. (1).) Moore's law postulates that technological progress is inexorable, i.e. it depends on time rather than controllable factors such as research and development.
Wright's law, in contrast, postulates that cost decreases at a rate that depends on cumulative production:
$$
y_t = B x_t^{-w}, \tag{3}
$$
where $w > 0$ and $B > 0$ are constants, and we have renamed $a = -w$ and $b = \log B$ in Eq. (1). Wright's law is often interpreted to imply ''learning by doing'' [5,21]. The basic idea is that cumulative production is a proxy for the level of effort invested, so that the more we make the more we learn, and knowledge accumulates without loss. Another hypothesis is due to Goddard [18], who argues that progress is driven purely by economies of scale, and postulates that:
$$
y_t = B q_t^{-s}, \tag{4}
$$
where $s > 0$ and $B > 0$ are constants, and we have renamed $a = -s$ and $b = \log B$ in Eq. (1). We also consider the three multi-variable hypotheses in Eq. (1): Nordhaus [20] combines Wright's law and Moore's law, and Sinclair, Klepper, and Cohen (SKC) [19] combine Wright's law and Goddard's law. For completeness, we also test Wright's law lagged by one year. Note that these methods forecast different things: Moore's law forecasts the cost at a given time, Wright's law at a given cumulative production, and Goddard's law at a given annual production.
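As an illustration of how the single-variable forms can be fitted, the following is a minimal Python sketch that regresses log cost on time (Moore), log cumulative production (Wright), and log annual production (Goddard) using ordinary least squares. The function names, the use of natural logarithms, and the synthetic noise-free series are assumptions made for this example; they are not taken from the database or from the authors' code.

```python
import math

def ols_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def cumulative(production):
    """Cumulative production x_t = q_1 + ... + q_t."""
    total, out = 0.0, []
    for q in production:
        total += q
        out.append(total)
    return out

def fit_single_variable_laws(years, cost, production):
    """Return (a, b) of log y = a * predictor + b for three of the laws."""
    log_cost = [math.log(y) for y in cost]
    x = cumulative(production)
    return {
        "moore": ols_line(years, log_cost),
        "wright": ols_line([math.log(v) for v in x], log_cost),
        "goddard": ols_line([math.log(q) for q in production], log_cost),
    }

# Synthetic illustration only (hypothetical numbers, not data from the paper):
# production grows at 30% per year, cost falls at 10% per year.
years = list(range(1, 11))
production = [100 * math.exp(0.3 * t) for t in years]
cost = [50 * math.exp(-0.1 * t) for t in years]
fits = fit_single_variable_laws(years, cost, production)
```

In this synthetic case the fitted Moore slope recovers $-m$ and the Goddard slope equals the ratio of the two exponential rates, which previews why the laws are hard to tell apart when production grows exponentially.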
We test these hypotheses on historical data consisting of 62 different technologies that can be broadly grouped into four categories: Chemical, Hardware, Energy, and Other. All data can be found in the online Performance Curve Database at pcdb.santafe.edu. The data are sampled at annual intervals with timespans ranging from 10 to 39 years. The choice of these particular technologies was driven by availability: we included all available data, with minimal constraints applied, to assemble the largest database of its kind.
The data were collected from research articles, government reports, market research publications, and other published sources. Data on technological improvement were used in the analysis if they satisfied the following constraints: they retained a functional unit over the time period sampled, and they included both a performance metric (price or cost per unit of production) and production data for a period of at least 10 years, with no missing years in between. This inclusive approach to data gathering was required to construct a large dataset, which was necessary to obtain statistically significant results. The resulting 62 datasets are described in detail in File S1.
These datasets almost certainly contain significant measurement and estimation errors, which cannot be directly quantified and are likely to increase the error in forecasts. Including many independent data sets helps to ensure that any biases in the database as a whole are random rather than systematic, minimizing their effects on the results of our analysis of the pooled data.
To compare the performance of each hypothesis we use hindcasting, which is a form of cross-validation. We pretend to be at time $i$ and make a forecast $\hat{y}_j^{(f,d,i)}$ for time $j$ using hypothesis (functional form) $f$ and data set $d$, where $j > i$. The parameters for each functional form are fitted using ordinary least squares based on all data prior to time $i$, and forecasts are made based on the resulting regression. We score the quality of forecasts based on the logarithmic forecasting error:
$$
e_{fdij} = \left| \log y_j^{(d)} - \log \hat{y}_j^{(f,d,i)} \right|. \tag{5}
$$
The quality of forecasts is examined for all datasets and all hypotheses (and visualized as a three-dimensional error mountain, as shown in File S1). For Wright's law, an illustration of the growth of forecasting errors as a function of the forecasting horizon is given in Fig. 1.
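The hindcasting procedure can be sketched in a few lines of Python for any single-predictor form. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, "all data prior to time i" is read literally as observations strictly before the origin, and the forecast for target j uses the realized value of the predictor at j (time for Moore, cumulative production for Wright).

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hindcast_errors(predictor, log_cost, min_points=2):
    """Absolute log forecasting error for every origin i and target j > i.

    predictor[k] and log_cost[k] belong to time step k. For each origin i the
    line is fitted to steps 0..i-1 only (assumption: strictly prior data).
    Returns a dict mapping (i, j) to the error at horizon j - i.
    """
    errors = {}
    n = len(log_cost)
    for i in range(min_points, n):
        slope, intercept = ols_line(predictor[:i], log_cost[:i])
        for j in range(i + 1, n):
            forecast = slope * predictor[j] + intercept
            errors[(i, j)] = abs(log_cost[j] - forecast)
    return errors

# Hypothetical, perfectly log-linear series: every hindcast should be exact.
steps = list(range(10))
log_cost = [3.0 - 0.1 * t for t in steps]
errors = hindcast_errors(steps, log_cost)
```

The dictionary keyed by origin and target is the two-dimensional grid of errors that, for one dataset and one hypothesis, makes up a single error mountain.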
An alternative to our approach is to adjust the intercepts to match the last point. For example, for Moore's law this corresponds to using a log random walk of the form $\log y_{t+1} = \log y_t - m + n(t)$, where $n(t)$ is an IID noise term (see File S1). We have not done this here to be consistent with the way these hypotheses have been presented historically. The method we have used also results in more stable errors.
Developing a statistical model to compare the competing hypotheses is complicated by the fact that errors observed at longer horizons tend to be larger than those at shorter horizons, and errors are correlated across time and across functional forms. After comparing many different possibilities (as discussed in detail in File S1), we settled on the following approach. Based on a search of the family of power transformations, which is known for its ability to accommodate a range of variance structures, we take as a response the square root transformation of the logarithmic error. This response was chosen to maximize likelihood when modeled as a linear function of the hindcasting horizon = target - origin = $j - i$, using a linear mixed effects model.
Specifically, we use the following functional form to model the response:
$$
r_{fdij} = (a_f + a_d) + (b_f + b_d)(j - i), \tag{6}
$$
where $r_{fdij}$ is the expected root error. The parameters $a_f$ and $b_f$ depend on the functional form and are called fixed effects because they are the same for all datasets. $a_f$ is the intercept and $b_f$ is the slope parameter. The parameters $a_d$ and $b_d$ depend on the dataset, and are called random effects because they are not fitted independently but are instead treated as dataset-specific random fluctuations from the pooled data. The quantities $a_d$ and $b_d$ are additive adjustments to the average intercept and slope parameters $a_f$ and $b_f$, respectively, to take into account the peculiarities of each dataset $d$.
Finally, we add an $\epsilon_{fdij}$ random field term to take into account the deviations from the trend. This is assumed to be a Gaussian random field whose variance, specified in Eq. (7), increases with the size of the error. We also define an exponential correlation structure within each error mountain (corresponding to each combination of dataset and hypothesis, see File S1), as a function of the differences of the two time coordinates with a positive range parameter $\rho$ and another small positive nugget parameter $\eta$ quantifying the extent of these correlations:
$$
\mathrm{Corr}\left[\epsilon_{fdij}, \epsilon_{f'd'i'j'}\right] = \delta_{ff'}\,\delta_{dd'}\,(1 - \eta)\exp\left\{-\frac{|i - i'| + |j - j'|}{\rho}\right\}, \tag{8}
$$
where the two Kronecker $\delta$ functions ensure that each error mountain is treated as a separate entity. Equations (7) and (8) were chosen to deal with the observed heteroscedasticity (increasing variance with increasing logarithmic forecasting error) and the serial correlations along the time coordinates $i$ (hindcasting origin) and $j$ (hindcasting target). Based on the likelihood, an exponential correlation function provided the best fit. Note that instead of a Euclidean distance (root sum of the squares of differences), the Manhattan measure was used (the sum of the absolute differences), because it provided a better fit in terms of the likelihood.
Using this statistical model, we compared five different hypotheses. (We removed the Nordhaus model from the sample because of poor forecasting performance [20]. This model gave good in-sample fits but generated large and inconsistent errors when predicting out-of-sample, a signature of over-fitting. This points to the difficulty in separating learning from exogenous sources of change [20].) Rather than the $62 \times 5 \times 2 = 620$ parameters needed to fit each of the 62 datasets separately for each of the five functional forms, there are only 16 free parameters: $5 \times 2 = 10$ parameters $a_f$ and $b_f$, three parameters for the covariance matrix of the bivariate random vector $(a_d, b_d)$, and three parameters for the variance and autocorrelation of the residuals $\epsilon_{fdij}$.

Results and Discussion
We fit the error model to the 37,745 different $r_{fdij}$ data points using the method of maximum likelihood. In Fig. 2 we plot the expected root error $r_{fij} = a_f + b_f (j - i)$ for the five hypotheses as a function of the hindcasting horizon. While there are differences in the performance of these five hypotheses, they are not dramatic. The intercept is tightly clustered in a range $0.16 < a_f < 0.19$ and the slope $0.024 < b_f < 0.028$. Thus all the hypotheses show a large initial error, followed by a growth in the root error of roughly 2.5% per year. This is a central tendency for the pooled data.
The error model allows us to compare each hypothesis pairwise to determine whether it is possible to reject one in favor of another at statistically significant levels. The comparisons are based on the intercept and slope of the error model of Eq. (6). The parameter estimates are listed in Tables S1 and S3 in File S1 and the corresponding p-values in Tables S2 and S4 in File S1. For example, at the 5% level, the intercept of Goddard is significantly higher than any of the others and the slope of SKC is significantly greater than that of Wright, lagged Wright and Goddard. With respect to slope, Moore is at the boundary of being rejected in favor of Wright. Fig. 2 makes the basic pattern clear: Goddard does a poorer job of forecasting at short times, whereas SKC, and to a lesser extent Moore, do a poorer job at long times.
We thus have the surprising result that most of the methods are quite similar in their performance. Although the difference is not large, the fact that we can eliminate Goddard for short term forecasts indicates that there is information in the cumulative production not contained in the annual production, and suggests that there is a learning effect in addition to economies of scale. But the fact that Goddard is not that much worse indicates that much of the predictability comes from annual production, suggesting that economies of scale are important. (In our database, technologies rarely decrease significantly in annual production; examples of this would provide a better test of Goddard's theory.) We believe the SKC model performs worse at long times because it has an extra parameter, making it prone to overfitting.
Although Moore performs slightly worse than Wright, given the clear difference in their economic interpretation, it is surprising that their performance is so similar. A simple explanation for Wright's law in terms of Moore's law was originally put forward by Sahal [22]. He noted that if cumulative production grows exponentially:
$$
x_t = A \exp(g t), \tag{9}
$$
then eliminating $t$ between Eqs. (2) and (9) results in Wright's law, Eq. (3), with $w = m/g$. Indeed, when we look at production vs. time we find that in almost every case the cumulative production increases roughly exponentially with time. (Note that if production grows exponentially, cumulative production also grows exponentially with the same exponent.) This is illustrated in Fig. 3, where we show three representative examples for production and cost plotted as a function of time. Fig. 3 also shows histograms of $R^2$ values for fitting $g$ and $m$ for the 62 datasets. The agreement with exponential behavior ranges from very good to rather poor, but of course these are short time series and some of them are very noisy. We test this in Fig. 4 by plotting the measured value of $w_d$ against the derived value $\hat{w}_d = m/g$ for each data set $d$. The values cluster tightly along the identity line, indicating that Sahal's conjecture is correct.
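The elimination of $t$ behind Sahal's observation can be written out explicitly. The short derivation below assumes the exponential forms $y_t = B e^{-mt}$ for cost and $x_t = A e^{gt}$ for cumulative production; the prefactor name $A$ and the combined constant $B'$ are labels introduced here for illustration.

```latex
\begin{align*}
y_t &= B e^{-m t}, \qquad x_t = A e^{g t}
  \;\Longrightarrow\; t = \frac{1}{g}\log\frac{x_t}{A}, \\
y_t &= B \exp\!\left(-\frac{m}{g}\log\frac{x_t}{A}\right)
     = B A^{m/g}\, x_t^{-m/g}
     = B' x_t^{-w},
\qquad w = \frac{m}{g}, \quad B' = B A^{m/g}.
\end{align*}
```

The resulting power law in $x_t$ has the form of Wright's law, so a technology with a constant cost decline rate $m$ and a constant production growth rate $g$ satisfies both laws at once.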
The differences in the data sets can be visualized by plotting $a_d$ and $b_d$ as shown in Fig. 5. All but one of the data sets lie inside the 95% confidence ellipsoid, indicating that the estimated distribution of $(a_d, b_d)$ is consistent with the bivariate normal assumption. The intercepts vary in a range roughly $-0.10 < a_d < 0.17$ and the slopes $-0.018 < b_d < 0.015$. Thus the variation in the corresponding logarithmic forecasting error for the different datasets is comparable to the average error for all datasets (Fig. 5) and about an order of magnitude larger than the difference between the hypothesized laws (Fig. 2).
To illustrate the practical usefulness of our approach we make a forecast of the cost of electricity for residential scale photovoltaic solar systems (PV). Fig. 6 shows the best forecast (solid line) as well as the expected error (dashed lines). These are not confidence limits, but rather projected absolute log deviations from the best forecast, calculated from Eq. (6) using $a_{\text{Moore}}$, $b_{\text{Moore}}$, $a_{\text{Photovoltaics2}}$, and $b_{\text{Photovoltaics2}}$. The sharp drop in the one year forecast relative to the last observed data point comes from the fact that forecasts are based on the average trend line, and because this data series is particularly long. PV costs rose in recent years due to increased material costs and other effects, but industry experts expect this to be a short-lived aberration from the long-term cost trend.
The expected PV cost in 2020, shown in Fig. 6, is 6 cents/kWh with a range (3, 12). In 2030 the expected cost is 2 cents/kWh, with a range (0.4, 11). This does not include the additional cost of energy storage technologies. The current cost of the cheapest alternative, coal-fired electricity, is roughly 5 cents/kWh. This is the wholesale cost at the plant (busbar), which may be most directly comparable to industrial scale PV (rather than the residential scale shown in Fig. 6). Industrial scale PV is typically about two-thirds the cost of electricity from the residential scale systems. In contrast to PV, coal-fired electricity is not expected to decrease in cost, and will likely increase if there are future penalties for CO2 emissions [23].
The costs of other technologies can be forecasted in a similar way, using historical data on the cost evolution to project future performance. The expected error in this forecast is calculated using our error model (Eq. (6)). The error is determined for each future year j from the present year i based on parameters specific to the technology of interest, as well as insight gained from examining data on many technologies. This approach allows us to forecast both the expected error and the expected cost. The method outlined is suited to Moore's functional form. Forecasting future performance based on production levels requires an additional step of forecasting future production over time.
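The forecasting steps described above can be sketched for Moore's functional form as follows. This is a rough illustration, not the authors' procedure in detail: the function name is hypothetical, natural logarithms are assumed, the expected log error is approximated by squaring the expected root error of Eq. (6), and the numerical inputs are made-up values chosen to lie within the reported ranges for the pooled intercept and slope (with the dataset-specific adjustments set to zero).

```python
import math

def ols_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def moore_forecast_with_band(years, costs, target_year, err_intercept, err_slope):
    """Extrapolate the exponential cost trend and attach the expected error.

    err_intercept and err_slope stand for (a_f + a_d) and (b_f + b_d) in
    Eq. (6). Returns (low, best, high) in the same units as costs.
    """
    log_costs = [math.log(c) for c in costs]
    trend_slope, trend_intercept = ols_line(years, log_costs)
    log_forecast = trend_slope * target_year + trend_intercept
    horizon = target_year - years[-1]            # j - i
    root_error = err_intercept + err_slope * horizon
    log_error = root_error ** 2                  # assumption: square of expected root error
    return (math.exp(log_forecast - log_error),
            math.exp(log_forecast),
            math.exp(log_forecast + log_error))

# Hypothetical cost series falling 5% per year from 10 units in 2000.
years = list(range(2000, 2010))
costs = [10.0 * math.exp(-0.05 * (y - 2000)) for y in years]
low, best, high = moore_forecast_with_band(years, costs, 2020, 0.18, 0.025)
```

Because the band is symmetric in log space, it is asymmetric in cost units and widens with the horizon, which matches the qualitative shape of the dashed lines described for the PV forecast.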
Our primary goal in this paper is to compare the performance of proposed models in the literature for describing the cost evolution of technologies. Our objective is not to construct the best possible forecasting model. Nonetheless we outline above the steps one would take in making a forecast in order to demonstrate the utility of the general approach we develop, which centers on analyzing a large, pooled database, and estimating the expected, time horizon-dependent error associated with a given forecasting model. This approach can be applied to other forecasting models in the future.
The key postulate that we have made in this paper is that the processes generating the costs of technologies through time are generic except for technology-specific differences in parameters. This hypothesis is powerful in allowing us to view any given technology as being drawn from an ensemble. This means that we can pool data from different technologies to make better forecasts, and most importantly, make error estimates. This is particularly useful for studying technology trends, where available data is limited. Of course we must add the usual caveats about making forecasts: as Niels Bohr reputedly said, prediction is very difficult, especially of the future.
Our analysis reveals that decreasing costs and increasing production are closely related, and that the hypotheses of Wright and Moore are more similar than they might appear. We should stress, though, that they are not the same. For example, consider a scenario in which the exponential rate of growth of PV production suddenly increased, which would decrease the current production doubling time of roughly 3 years. In this case, Wright predicts that the rate at which costs fall would increase, whereas Moore predicts that it would be unaffected. Distinguishing between the two hypotheses requires a sufficient number of examples where production does not increase exponentially, which our current database does not contain. The historical data shows a strong tendency, across different types of technologies, toward constant exponential growth rates. Recent work, however, has demonstrated super-exponential improvement for information technologies over long time spans [24], suggesting that Moore's law is a reasonable approximation only over short time spans. This evidence from information technologies [24], and the results presented here, suggest that Moore may perform significantly worse than Wright over longer time horizons.

Supporting Information
File S1 Supporting Information (PDF)