Do speed cameras reduce road traffic collisions?

This paper quantifies the effect of speed cameras on road traffic collisions using an approximate Bayesian doubly-robust (DR) causal inference estimation method. Previous empirical work on this topic, which shows a diverse range of estimated effects, is based largely on outcome regression (OR) models using the Empirical Bayes approach or on simple before-and-after comparisons. Issues of causality and confounding have received little formal attention. A causal DR approach combines propensity score (PS) and OR models to give an average treatment effect (ATE) estimator that is consistent and asymptotically normal under correct specification of either of the two component models. We develop this approach within a novel approximate Bayesian framework to derive posterior predictive distributions for the ATE of speed cameras on road traffic collisions. Our results for England indicate significant reductions in the number of collisions at speed camera sites (mean ATE = -15%). Our proposed method offers a promising approach for evaluation of transport safety interventions.


Introduction
Fixed speed limit enforcement cameras are a common intervention used to encourage drivers to comply with maximum legal speed limits. The cameras are installed at sites on selected links in order to detect speed limit violations, which can subsequently be punished with monetary fines, driver licence disqualification points, or prosecution. Since the introduction of speed cameras (SCs) there has been considerable debate about their effects on road traffic collisions (RTCs). At various times claims have been made that SCs serve to reduce RTCs, that they have no effect, or even that they increase RTCs by encouraging more erratic driving behaviour.
The paper is structured as follows. Section two outlines broad trends in road traffic casualties for Britain and then sets out a formal causal modelling framework to estimate the effects of SCs on RTCs. Section three describes our approximate Bayesian DR approach and presents some simulations that demonstrate its properties. Section four describes the data available for estimation and outlines our chosen model specifications. Results are then presented in section five and conclusions are drawn in the final section.

Speed cameras, road traffic collisions and causality
A number of academic studies of the effect of speed cameras on RTCs have been undertaken [1]. Most studies find that speed cameras have led to a reduction in RTCs, but the range of estimated effects is large (from 0% to -55%). Variation in estimates is to be expected given that study results pertain to diverse empirical contexts, but it is also the case that a number of different methods have been applied, and the choice of method can have a critical influence on the results obtained. In particular, since SCs are not randomly assigned, it is essential that any adopted method recognises that the observed relationship between SCs and RTCs may be subject to confounding. Confounding arises when the characteristics that influence treatment assignment (i.e. whether a site is 'treated' or 'untreated' with an SC) also matter for outcomes (i.e. RTCs). Regression to the mean (RTM), for instance, is a well known manifestation of confounding that arises via 'selection bias'.
The extent to which confounding has been recognised and addressed in existing studies varies considerably. Some studies have simply ignored it, using simple before-and-after methods with control groups [2][3][4][5][6][7][8][9]. Others have used the empirical Bayes (EB) method as suggested by [10], largely to adjust for effects of confounding that arise via RTM [11][12][13][14][15][16]. Finally, there are a small number of studies that have used time-series methods, either interrupted time-series analyses with control groups or ARIMA, to test for changes in outcome rates [17][18][19].
Where studies have attempted to address confounding this has been done via the inclusion of covariates in outcome regression (OR) models, typically using Poisson or negative binomial Generalised Linear Models (GLMs).
In a previous paper we adopted a propensity score (PS) matching approach to evaluate the effectiveness of speed cameras [1]. A key advantage of the PS over OR approach is that it provides an effective way of isolating a valid control group by ensuring that the distribution of pre-treatment covariates matches those of the treated group and that genuine overlap in the support of the covariates exists between the two groups. However, as with the OR approach, valid inference from PS models crucially depends on the unknown PS model being correctly specified.
In this paper we build on our previous work by developing and applying an estimation approach which we believe has much to offer in evaluating the effectiveness of road safety interventions. Our approach uses the principle of doubly-robust (DR) estimation, which provides robustness to model misspecification by combining both OR and PS models to derive an average treatment effect (ATE) estimator which is consistent and asymptotically normal under correct specification of just one of the two component models. The DR property offers a significant methodological advance for traffic safety analyses because it allows us to evaluate interventions using combined inference from two key modelling standpoints: via a model for factors influencing assignment of road safety measures and via a model for the determinants of RTCs. A good specification of just one of these two models will yield valid inference.
The DR approach is attractive for our application because the PS and OR models we can construct make different assumptions about the nature of confounding. For the PS model, we are able to faithfully represent via measured covariates the formal criteria that exist for the assignment of speed cameras to sites. For the OR model, we can difference our response variable before and after treatment to allow for the existence of site level time-invariant unobserved effects in addition to measured confounders.
To avoid common sources of misspecification error, we estimate both of our component models using semiparametric Generalized Additive Mixed Models (GAMMs) which make minimal a priori assumptions on the functional form of the relationships under study. We also use a matching algorithm prior to forming the DR model to establish a valid control group.
Thus, in our approach, potential biases from confounding are addressed by combining three compatible modelling tools: matching, to achieve comparability between treated and control sites, and model-based adjustment for valid ATE estimation through both the regression model for RTCs and the PS model for the treatment assignment mechanism.
DR estimators have been studied and applied extensively in the frequentist setting [20][21][22][23][24][25][26]. A further contribution of the paper is that we develop our binary DR estimator within the Bayesian paradigm. A Bayesian representation of the DR model has proven difficult to formulate in previous work because DR estimators are typically constructed as solutions to estimating equations based on a set of moment restrictions that do not imply fully specified likelihood functions. We choose the Bayesian paradigm for three main reasons. First, DR estimation of the ATE involves prediction and extrapolation over covariate distributions with underlying uncertainty in parameter estimates. Bayesian inference provides a suitable framework for prediction that explicitly addresses such uncertainty in the sense that both the predicted observations, and the relevant parameters for prediction, have the same random status. Second, by deriving a posterior predictive distribution for the ATE, rather than a fixed value, we can make probability statements about the causal quantity of interest allowing us to discuss findings in relation to specific hypotheses or in terms of credible intervals which can offer a more intuitive understanding of the effects of SCs for public policy formulation. Finally, we develop an approximate Bayesian approach that can utilise prior information about the parameters of interest, which could be useful in evaluating safety interventions when historical data or training data from other regions are available.

Road traffic casualties in Britain
For the year spanning October 2015 to September 2016 the UK DfT recorded a total of 182,747 casualties on British roads of which 25,420 were classified as killed or seriously injured (KSIs) and 1,800 as fatalities [27]. Since 2010 the annual numbers of fatalities and KSIs have not changed significantly, following several years in which road safety was improving. The average number of fatal road traffic incidents over the period 2010 to 2016 is approximately 1,800. Since the volume of road traffic has continued to grow over this period, however, the number of fatalities per vehicle mile driven has been falling [28].
The DfT argue that there is good evidence to suggest that while the absolute number of fatalities on British roads now appears to be relatively static, overall absolute casualty numbers are continuing to fall. In short, levels of safety appear to be improving in relative terms and not deteriorating in absolute terms. Given the changes that have occurred in vehicle technology, medical care, and road safety interventions, however, the DfT also note that a comprehensive causal understanding of the factors underpinning casualty trends is currently out of reach. In this paper we attempt to contribute to such an understanding by quantifying the causal impact of one type of safety intervention: speed cameras (SCs).

ATE estimation within the potential outcomes framework
Our sample comprises $n$ links on the road network, indexed $i = 1, \ldots, n$. Some links have a SC, others do not. We define $D_i \in \{0, 1\}$ as a binary random variable indicating the presence or otherwise of a SC and we refer to this as the treatment variable. We are interested in the effect of the treatment on an outcome $Y_i$, which measures collision frequency. We define $Y_i(1)$ and $Y_i(0)$ as the potential outcomes for unit $i$ under treated and control status respectively. Recognising that SCs are not assigned randomly, we also define $X_i$ as a random vector of pre-treatment covariates that capture characteristics of links that are relevant to whether a SC was assigned or not, and are also relevant for outcomes. Thus, the data we observe for each link is a random vector $(y_i, d_i, x_i)$, where $y_i$ is the response or outcome, $d_i$ is treatment status, and $x_i$ a vector of pre-treatment (or baseline) covariates.
Ideally, we would assess the effects of SCs on each link by calculating the individual causal effect (ICE), $\tau_i = Y_i(1) - Y_i(0)$, but the data reveal only outcomes that have actually occurred, not potential outcomes. Thus, the data reveal the random variable $Y_i = Y_i(1) I_1(D_i) + Y_i(0)\{1 - I_1(D_i)\}$, where $I_1(D_i)$ is the indicator function for receiving the treatment. But they do not reveal the joint density, $f(Y_i(0), Y_i(1))$, since a SC cannot be both present and absent on a link simultaneously. Consequently, we focus inference on estimation of the ATE, defined as $\tau = E[Y_i(1)] - E[Y_i(0)]$, which measures the difference in expected outcomes under treatment and control status. Note that sometimes in the safety literature the average effect of the treatment on the treated (ATT), i.e. $E[Y_i(1) - Y_i(0) \mid D_i = 1]$, is estimated, but since in our case all matched units could potentially be exposed to the treatment, the ATE is a more appropriate estimand than the ATT. We can estimate the ATE without observing all potential outcomes if three key assumptions hold.
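1. Conditional independence: the potential outcomes for each unit must be conditionally independent of treatment assignment given observed covariates, $(Y_i(0), Y_i(1)) \perp D_i \mid X_i$.
2. Common support: the support of the conditional distribution of assignment to treatment given covariates must overlap with that of the conditional distribution of assignment to control. This requires that $0 < \Pr(D_i = 1 \mid X_i = x) < 1$ for all $x$.
3. Stable Unit Treatment Value Assumption (SUTVA): observed and potential outcomes must satisfy the SUTVA [29][30][31][32], which requires that observed outcomes under a given treatment allocation are equivalent to potential outcomes under that same treatment allocation: $Y_i = I_1(D_i) Y_i(1) + \{1 - I_1(D_i)\} Y_i(0)$.
These three assumptions are sometimes referred to collectively as strong ignorability, and they allow the ATE to be estimated from observational data as follows.
\begin{align}
\tau &= E[Y_i(1)] - E[Y_i(0)] \notag \\
&= E_X\{E[Y_i(1) \mid X_i] - E[Y_i(0) \mid X_i]\} \tag{2} \\
&= E_X\{E[Y_i(1) \mid D_i = 1, X_i] - E[Y_i(0) \mid D_i = 0, X_i]\} \tag{3} \\
&= E_X\{E[Y_i \mid D_i = 1, X_i] - E[Y_i \mid D_i = 0, X_i]\} \tag{4}
\end{align}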
The equality of (2) and (3) is justified via conditional independence, the substitution of observed and potential outcomes implied by the SUTVA gives (4), and common support ensures that there are both treated and control units and thus that the population ATE in (4) is estimable.
Thus, if strong ignorability holds, the potential outcomes approach offers a route to obtaining valid causal estimates of the ATE of SCs. To proceed we need to estimate the relevant expectations in (4) above.

Causal estimators
Following [33], we can express our observed data as joint densities of the form $f(y, d, x) = f_{Y|D,X}(y|d,x)\, f_{D|X}(d|x)\, f_X(x)$. If strong ignorability holds, one of the following estimators can be used to estimate the ATE of SCs.
1. Outcome regression (OR) model: $f_{D|X}(d|x)$ and $f_X(x)$ are unspecified and instead we construct a model for $E[Y_i \mid D_i, X_i]$, the mean of the conditional density of the outcome given treatment and covariates. This is done via the OR model $\Psi^{-1}\{m(D_i, X_i; \beta)\}$, for given link function $\Psi$, regression function $m()$, and unknown parameter vector $\beta$. Correct specification of the OR model for the mean response provides consistent estimates of the ATE using
$$\hat{\tau}_{OR} = \frac{1}{n} \sum_{i=1}^{n} \left[ \Psi^{-1}\{m(1, X_i; \hat{\beta})\} - \Psi^{-1}\{m(0, X_i; \hat{\beta})\} \right].$$
2. Propensity score (PS) model: under the PS approach we assume a model for $f_{D|X}(d|x)$, the conditional probability of assignment to treatment given covariates, and leave $f_{Y|D,X}(y|d,x)$ and $f_X(x)$ unspecified. The PS model, which we write $\pi(D_i|X_i; \alpha)$, can form the basis of several different nonparametric estimators, but of primary interest here is the weighting estimator proposed by [34],
$$\hat{\tau}_{IPW} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{Y_i\, I_1(D_i)}{\pi(D_i|X_i; \hat{\alpha})} - \frac{Y_i\, \{1 - I_1(D_i)\}}{1 - \pi(D_i|X_i; \hat{\alpha})} \right],$$
which is consistent under correct PS model specification due to the fact that $E[Y_i(1)] = E\{[Y_i(1) \cdot I_1(D_i)] / \pi(D_i|X_i; \alpha)\}$, and similarly for control treatment status.
3. Doubly-robust (DR) model: specify both an OR and PS model and use them in a combined estimation to yield a DR estimator. This can be achieved by forming a function of the inverse estimated PSs and using that function to weight or augment the OR model. Here, we estimate the weighted model $E[Y_i \mid D_i, X_i] = \Psi^{-1}\{m(D_i, X_i; \xi)\}$, where the unknown parameter vector $\xi$ is obtained by weighting the model with
$$\hat{\kappa}_i(d_i|x_i; \hat{\alpha}) = \frac{I_1(d_i)}{\pi(d_i|x_i; \hat{\alpha})} + \frac{1 - I_1(d_i)}{1 - \pi(d_i|x_i; \hat{\alpha})}.$$
This model will provide consistent estimates of the ATE if the OR model specification is correct, since although weighting may reduce efficiency, it does not adversely affect the consistency and asymptotic normality of the OR model. If the OR model specification is incorrect, but not the PS, the DR model will still be consistent because weighting yields estimating equations of the form
$$\sum_{i=1}^{n} \hat{\kappa}_i(d_i|x_i; \hat{\alpha})\, \frac{\partial \Psi^{-1}\{m(D_i, X_i; \xi)\}}{\partial \xi}\, \phi_i^{-1} \left[ Y_i - \Psi^{-1}\{m(D_i, X_i; \xi)\} \right] = 0,$$
in which $\phi_i \equiv \phi(D_i, X_i)$ is a working conditional variance for $Y_i$ given $(D_i, X_i)$. These effectively correct for the bias in approximating $E[Y_i \mid D_i, X_i]$ using $\Psi^{-1}\{m(D_i, X_i; \beta)\}$ [24].
We use estimates of $\xi$ to form the DR estimator
$$\hat{\tau}_{DR} = \frac{1}{n} \sum_{i=1}^{n} \left[ \Psi^{-1}\{m(1, X_i; \hat{\xi})\} - \Psi^{-1}\{m(0, X_i; \hat{\xi})\} \right].$$

Approximate Bayesian doubly-robust estimation
So far our discussion of DR estimation has been conducted within the frequentist semiparametric paradigm. As alluded to previously, however, there are good reasons why a Bayesian inferential approach may be particularly beneficial for estimation of road safety interventions. But inference for DR estimators is less straightforward in the Bayesian than in the frequentist setting because DR models are based on moment restrictions that do not yield fully specified likelihood functions. Here, we make some improvements to the approach proposed by [35] in the context of continuous treatment. In contrast to that paper we focus on binary treatments, using PS weighting rather than augmentation to achieve the DR model, and we implement ways of incorporating prior information into the posterior distribution of the ATE. The basic theory underpinning approximate Bayesian inference in this context is covered comprehensively in [35] and so we provide only a brief summary here.
The Bayesian bootstrap was first introduced by [36] and applied in weighted likelihood models by [37]. The basic idea is to create new datasets by repeatedly re-weighting the original data in order to obtain the posterior distribution for some parameter of interest. If we treat our observed data, $z_i$ say, as effectively coming from a multinomial distribution with distinct values $a_k$, $k = 1, \ldots, K$, and attach a probability to each distinct value, $\theta = (\theta_1, \ldots, \theta_K)$, then by placing an improper Dirichlet prior on $\theta$ the posterior density also has a Dirichlet distribution with parameters $n_k$, where $n_k$ is the number of observations equal to $a_k$. This posterior can be simulated via the weighted likelihood
$$\tilde{L}(\theta) = \prod_{i=1}^{n} f(z_i; \theta)^{w_i},$$
in which the weights $w = (w_1, \ldots, w_n)$ are distributed according to the uniform Dirichlet distribution and simulated as $n$ independent standard exponential (i.e. gamma(1,1)) variates and standardised.
The weighted likelihood reduces to
$$\tilde{L}(\theta) = \prod_{k=1}^{K} \theta_k^{n \gamma_k},$$
say, where $n\gamma_k$ is the sum of the weights $w_i$ for which $z_i = a_k$. Since the vector $\gamma = (\gamma_1, \ldots, \gamma_K)$ has a Dirichlet distribution with parameters $(n_1, \ldots, n_K)$, and since the maximiser of $\tilde{L}(\theta)$ is $\tilde{\theta} = \gamma$, the solutions to the maximised weighted likelihood function with repeatedly sampled uniform Dirichlet weights $w^{(l)}$ represent a sample from the posterior of $\theta$ under the improper prior $\prod_k \theta_k^{-1}$.
To apply the Bayesian bootstrap to our DR model we maximise the weighted likelihood of the OR model, $\tilde{L}(\xi)$, with weights $w_i \times \hat{\kappa}_i(d_i|x_i; \hat{\alpha})$, the product of the random bootstrap weights and the inverse PS weights. The maximiser of $\tilde{L}(\xi)$, which we denote $\tilde{\xi}$, implies a solution to
$$\sum_{i=1}^{n} w_i\, \hat{\kappa}_i(d_i|x_i; \hat{\alpha})\, \frac{\partial \Psi^{-1}\{m(d_i, x_i; \xi)\}}{\partial \xi}\, \phi_i^{-1} \left[ y_i - \Psi^{-1}\{m(d_i, x_i; \xi)\} \right] = 0, \tag{19}$$
which as noted above has the DR property. We repeatedly draw sets of random weights $\{w_i^{(l)}\}_{i=1}^{n}$ as $n$ standardised independent standard exponential variates and solve (19) to build up an empirical posterior density of $\tilde{\xi}$, denoted $p_n(\xi)$, from which the sampled values $\tilde{\xi}^{(l)}$ are consistent with the DR estimating equations.
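The re-weighting idea behind the Bayesian bootstrap can be illustrated with a minimal sketch. This is not the estimation code used in the paper: the function name `bayesian_bootstrap_mean`, the simulated data and the choice of a simple mean as the parameter of interest are all assumptions made for illustration. Each draw standardises independent standard exponential variates into uniform Dirichlet weights, and the weighted mean under those weights is treated as one draw from the approximate posterior.

```python
import numpy as np

def bayesian_bootstrap_mean(z, n_draws=2000, seed=0):
    """Approximate posterior draws for the mean of z via the Bayesian bootstrap."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    n = z.size
    # n independent standard exponential variates per draw ...
    e = rng.exponential(scale=1.0, size=(n_draws, n))
    # ... standardised to sum to one, which gives uniform Dirichlet weights.
    w = e / e.sum(axis=1, keepdims=True)
    # Each weighted mean is one draw from the approximate posterior.
    return w @ z

# Hypothetical data, used only to exercise the function.
rng = np.random.default_rng(1)
z = rng.normal(loc=3.0, scale=2.0, size=500)
draws = bayesian_bootstrap_mean(z)
lo, hi = np.quantile(draws, [0.025, 0.975])
print(f"posterior mean {draws.mean():.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

The same weights can be passed to any weighted estimating equation in place of the weighted mean, which is the route taken for the DR model.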
[37] apply sampling-importance resampling (SIR) to improve accuracy of the weighted bootstrap approach, but this improvement requires a fully specified likelihood function. Instead, for our restricted moment model, we use the resampling scheme proposed by [38] which extends Rubin's bootstrap in a general Bayesian nonparametric context. Two attractive features of Muliere and Secchi's approach for causal modelling are that it ensures that predictive distributions are not constrained to be concentrated on observed values and it allows us to take prior opinions into account. The posterior predictive distribution of the ATE, incorporating prior information, is obtained in the following way.
i. Estimate the PS model $\pi(D_i|X_i; \alpha)$, and form the inverse PS weights $\kappa_i(d_i|x_i; \hat{\alpha})$.
ii. Draw a single set of random weights $\{w_i^{(l)}\}_{i=1}^{n}$, form the combined weights $w_i^{(l)} \times \kappa_i(d_i|x_i; \hat{\alpha})$, and estimate the weighted OR model.
iii. Repeatedly compute (ii) using new weights $\{w_i^{(l)}\}_{i=1}^{n}$ to obtain the empirical posterior distribution $p_n(\xi)$.
ix. Form a sampled value of the ATE random variable, $\tau^{(m)}$.
x. Repeat this procedure $M$ times, $m = 1, \ldots, M$, to obtain the posterior predictive distribution.
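Steps i to iii can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a linear working OR model with identity link fitted by weighted least squares, a hand-rolled Newton-Raphson logistic regression for the PS (the paper uses GAMMs), no prior information, and a hypothetical data generating process with a true ATE of 5. The function names `fit_logit`, `weighted_ls` and `bayes_boot_dr` are invented for this sketch.

```python
import numpy as np

def fit_logit(X, d, n_iter=25):
    """Logistic regression by Newton-Raphson (X includes an intercept column)."""
    alpha = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ alpha))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        alpha = alpha + np.linalg.solve(hess, grad)
    return alpha

def weighted_ls(A, y, w):
    """Solve the weighted normal equations sum_i w_i a_i (y_i - a_i' xi) = 0."""
    Aw = A * w[:, None]
    return np.linalg.solve(Aw.T @ A, Aw.T @ y)

def bayes_boot_dr(y, d, x, n_draws=500, seed=0):
    """Posterior draws of the ATE from steps i to iii (no prior information)."""
    rng = np.random.default_rng(seed)
    n = y.size
    X_ps = np.column_stack([np.ones(n), x])
    # Step i: fit the PS model and form inverse PS weights kappa_i.
    ps = 1.0 / (1.0 + np.exp(-X_ps @ fit_logit(X_ps, d)))
    kappa = d / ps + (1.0 - d) / (1.0 - ps)
    # Working OR model: intercept and treatment only (x deliberately omitted).
    A = np.column_stack([np.ones(n), d])
    draws = np.empty(n_draws)
    for l in range(n_draws):
        # Step ii: standardised exponential weights times inverse PS weights.
        w = rng.exponential(size=n)
        w *= n / w.sum()
        xi = weighted_ls(A, y, w * kappa)
        # With an identity link the ATE is the coefficient on d.
        draws[l] = xi[1]
    # Step iii: the collected draws approximate the posterior of the ATE.
    return draws

# Hypothetical data (an assumption of this sketch): true ATE = 5, confounded by x.
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
d = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * x))).astype(float)
y = 10.0 + 5.0 * d + 2.0 * x + rng.normal(size=n)

draws = bayes_boot_dr(y, d, x)
naive = y[d == 1].mean() - y[d == 0].mean()
lo, hi = np.quantile(draws, [0.025, 0.975])
print(f"naive {naive:.2f}; DR posterior mean {draws.mean():.2f} ({lo:.2f}, {hi:.2f})")
```

Because the working OR model omits the confounder, any bias correction in this sketch comes from the PS weights alone. Holding the PS weights fixed across draws is a simplification that ignores uncertainty in the PS model itself.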

Simulations
In this subsection we present some simulations to demonstrate the DR properties of our approximate Bayesian approach. The simulations are based on the following data generating process: a binary treatment D is assigned as a function of covariate X, with parameters (α 0 , α 1 ), and the outcome of interest Y depends on both treatment D and covariate X, with parameters (β 0 , β 1 , β 2 ), where $X \sim \text{Normal}(0, 10)$, α 0 = 2, α 1 = 0.2, β 0 = 10, β 1 = 5, β 2 = 0.2. The true ATE is given by parameter β 1 , that is τ = 5.0. The following models are tested:
1. $\hat{\tau}_{BOR1}$: an approximate Bayesian OR model based on the correctly specified model, which includes both D and X. The point estimate reported in the simulations is the mean value of the ATE posterior predictive distribution, i.e. $\hat{\tau} = M^{-1} \sum_{m=1}^{M} \tau^{(m)}$.
2. $\hat{\tau}_{BOR2}$: same as 1 except based on an incorrectly specified OR model with covariate X excluded.
3. $\hat{\tau}_{PS1}$: an approximate Bayesian inverse PS weighted model based on the correctly specified PS model.
4. $\hat{\tau}_{PS2}$: an approximate Bayesian inverse PS weighted model based on an incorrectly specified PS model, in which the PS is generated randomly from the continuous uniform distribution: $\hat{\pi}(D|X) \sim \text{Uniform}(0, 1)$.
5. $\hat{\tau}_{BDR1}$: an approximate Bayesian DR model based on an incorrectly specified OR model (X excluded) but with weights based on the correct PS model.
6. $\hat{\tau}_{BDR2}$: an approximate Bayesian DR model based on a correctly specified OR model but with weights based on the incorrect PS model.
7. $\hat{\tau}_{BDR3}$: an approximate Bayesian DR model based on the incorrectly specified OR model with weights based on the incorrect PS model.
The simulations are based on 1000 runs on generated datasets of size 1,000. In each case, we place a Normal prior on the treatment coefficient β 1 , with mean equal to the true value (5 in this case). We set the measure of faith k to be relatively low so as not to overly affect the results. Table 1 shows our simulation results. Mean values and variances of the point estimates obtained (i.e. means and variances of the ATE distributions) and the mean squared error (MSE) are reported.
The mean of the posterior distribution for the ATE from the correctly specified OR model, $\hat{\tau}_{BOR1}$, provides a good approximation to the true value of τ. The incorrectly specified OR model, BOR2, fails to address confounding and consequently $\hat{\tau}_{BOR2}$ provides a poor approximation to the true ATE. A good estimate of τ is achieved via the correctly specified PS model ($\hat{\tau}_{PS1}$), but when the PS model is misspecified ($\hat{\tau}_{PS2}$) the estimate of the ATE is far away from the true value. In our simulations the PS model is severely misspecified, or simply wrong, having been generated randomly. This tendency of the inverse PS model to fail quite considerably under severe misspecification is well known in the literature [26]. Weighting the incorrectly specified OR model with weights $\kappa(D, X)$, based on a correctly specified PS model, as in the BDR1 model, corrects for misspecification bias, giving an average point estimate very close to the true value but slightly larger posterior variances relative to the correctly specified OR model. The BDR2 model simulation also produces valid point estimates because weighting by weights based on an incorrectly specified PS model does not induce bias, but it does increase variance. Finally, if both the OR and PS models are wrongly specified, as in BDR3, the model fails to produce a good point estimate of the mean ATE.
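The pattern described above can be reproduced qualitatively with a small Monte Carlo sketch. It is not the simulation reported in Table 1. The assumptions are: a hypothetical data generating process with logistic treatment assignment and a linear Gaussian outcome, frequentist point estimates rather than posterior means, the true PS standing in for the "correct" PS model, and a random PS drawn from Uniform(0.05, 0.95) rather than Uniform(0, 1) so that weights stay bounded. The names `wls_ate` and `one_run` are invented for this sketch.

```python
import numpy as np

def wls_ate(y, d, x, w, include_x):
    """Coefficient on d from weighted least squares of y on (1, d[, x])."""
    cols = [np.ones_like(y), d] + ([x] if include_x else [])
    A = np.column_stack(cols)
    Aw = A * w[:, None]
    return np.linalg.solve(Aw.T @ A, Aw.T @ y)[1]

def one_run(rng, n=1000, tau=5.0):
    # Hypothetical DGP: x confounds both treatment and outcome.
    x = rng.normal(0.0, 1.0, n)
    ps_true = 1.0 / (1.0 + np.exp(-0.5 * x))
    d = rng.binomial(1, ps_true).astype(float)
    y = 10.0 + tau * d + 2.0 * x + rng.normal(0.0, 1.0, n)
    # "Incorrect" PS: random values unrelated to x, kept away from 0 and 1.
    ps_bad = rng.uniform(0.05, 0.95, n)
    k_good = d / ps_true + (1.0 - d) / (1.0 - ps_true)
    k_bad = d / ps_bad + (1.0 - d) / (1.0 - ps_bad)
    ones = np.ones(n)
    return {
        "OR1": wls_ate(y, d, x, ones, True),     # correct OR, unweighted
        "OR2": wls_ate(y, d, x, ones, False),    # OR without x
        "DR1": wls_ate(y, d, x, k_good, False),  # wrong OR, correct PS
        "DR2": wls_ate(y, d, x, k_bad, True),    # correct OR, wrong PS
        "DR3": wls_ate(y, d, x, k_bad, False),   # both wrong
    }

rng = np.random.default_rng(0)
runs = [one_run(rng) for _ in range(200)]
summary = {
    k: (np.mean([r[k] for r in runs]), np.mean([(r[k] - 5.0) ** 2 for r in runs]))
    for k in runs[0]
}
for name, (mean, mse) in summary.items():
    print(f"{name}: mean {mean:.3f}, MSE {mse:.4f}")
```

In this sketch the estimate stays close to the true value whenever at least one of the two component models is right, and drifts away from it only when both are wrong.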

Treatment and outcome variable
We have data on the location of fixed speed cameras for 771 camera sites in the following English administrative districts: Cheshire, Dorset, Greater Manchester, Lancashire, Leicester, Merseyside, Sussex and the West Midlands. These sites form our group of treated units. To select potential control sites we randomly sampled a total of 4,787 points on the network within our eight administrative districts. The large ratio of potential control to treated units is adopted to ensure that we have a sufficient number of control units after we apply a matching algorithm. Our outcome variable is the number of personal injury collisions (PICs) per kilometre as recorded from the location of the speed cameras, or in the case of control groups, from the randomly selected point. The PIC data are taken from records completed by police officers each time that an incident is reported to them. The individual police records are collated and processed by the UK Department for Transport as the 'STATS 19' data. The location of each PIC is recorded using the British National Grid coordinate system and can be located on a map using Geographical Information System (GIS) software. Because the installation dates of speed cameras vary from 2002 to 2004, the period of analysis is from 1999 to 2007 to ensure the availability of collision data for the years before and after the camera installation for every camera site.

Covariates
To adjust for confounding we require a set of measured covariates that adequately represent the characteristics of units that simultaneously determine treatment assignment and outcome. For the UK there exists a formal set of site selection guidelines for fixed speed cameras [5].
4. 85th percentile speed at collision hot spots: 85th percentile speed at least 10% above speed limit.
5. Percentage over the speed limit: at least 20% of drivers are exceeding the speed limit.
Criteria one to three are primary guidelines for site selection and criteria four and five are of secondary importance. Some sites that do not meet the above criteria are still selected as enforcement sites, mainly for reasons such as community concern and engineering factors.
Selection of the speed camera sites was primarily based on collision history. Collision data can be obtained from the STATS 19 database and located on the map using GIS. However, secondary criteria such as the 85th percentile speed and percentages of vehicles over the speed limit are not normally available for every site on UK roads. If speed distributions differ between the treated and untreated groups, then the failure to include the speed data could bias the estimation, an issue discussed in previous research [5,15]. For untreated sites with the speed limit of 30 mph and 40 mph, the national average mean speed and percentages of speeding are similar to the data for the camera sites. The focus groups for this study are sites with the speed limit of 30 mph and 40 mph throughout the UK. It is reasonable to assume that there is no significant difference in the speed distribution between the treated and untreated groups and hence exclusion of the speed data will not affect the accuracy of the propensity score model.
It is also possible that drivers may choose alternative routes to avoid speed camera sites. Collision reduction at camera sites may therefore include the effect induced by reduced traffic flow, and the benefits of speed cameras will be overestimated without controlling for the change in traffic flow. The annual average daily flow (AADF) is available for both treated and untreated roads and the effect due to traffic flow is controlled for in this study by including the AADF in the propensity score model.
In addition to the criteria that strongly influence the treatment assignment, factors that affect the outcomes should also be taken into account when the propensity score model is specified. We further include road characteristics such as: road types, speed limit, and the number of minor junctions within site length, which are suggested as important factors when estimating the safety impact of speed cameras [2,6].

Component model specifications
The outcome variable of interest is the number of collisions per site. For the OR model the response is specified in differenced form, i.e. the number of collisions in the post-treatment period minus the number of collisions in the pre-treatment period. Differencing allows for the existence of unit level time-invariant effects, which could be random or fixed. The PS model is estimated using a logit Generalized Additive Mixed Model (GAMM) specification. Matching and overlap are achieved using nearest neighbour matching via the MatchIt package in R. The weighted OR model is then estimated on the trimmed dataset, which satisfies matching and overlap conditions, using a Gaussian GAMM specification. We use GAMMs for both models to avoid making a priori assumptions on the functional form of the relationships under study.
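To make the matching step concrete, the following is a rough Python analogue of greedy one-to-one nearest neighbour matching on an estimated PS. It is not the MatchIt routine used in the paper. The function `greedy_nn_match`, the optional caliper, the choice to process treated units from the largest PS downwards, and the toy PS values are all assumptions of this sketch.

```python
import numpy as np

def greedy_nn_match(ps, d, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on the PS, without replacement.

    Returns a list of (treated_index, control_index) pairs.
    """
    ps = np.asarray(ps, dtype=float)
    d = np.asarray(d)
    treated = np.flatnonzero(d == 1)
    available = list(np.flatnonzero(d == 0))
    pairs = []
    # Process treated units from the largest PS downwards (an assumption).
    for t in treated[np.argsort(-ps[treated])]:
        if not available:
            break
        dist = np.abs(ps[available] - ps[t])
        j = int(np.argmin(dist))
        # Keep the pair only if it lies within the caliper, when one is set.
        if caliper is None or dist[j] <= caliper:
            pairs.append((int(t), int(available.pop(j))))
    return pairs

# Toy PS values for two treated and four control sites.
ps = np.array([0.90, 0.80, 0.85, 0.30, 0.20, 0.75])
d = np.array([1, 1, 0, 0, 0, 0])
pairs = greedy_nn_match(ps, d)
print(pairs)
```

Treated sites without a sufficiently close control are dropped when a caliper is set, which is one way of enforcing overlap before the weighted OR model is fitted on the matched sample.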
As mentioned in the introduction, the DR approach is particularly attractive for our application because of the differences inherent in our PS and OR model specifications. Due to the existence of formal criteria for SC assignment we have a high degree of confidence in the ability of our covariates to eliminate confounding via the PS model. For the OR model, differencing of the response variable before and after treatment allows for the existence of site level time-invariant unobserved effects in addition to measured confounders. The DR model comprises the differenced OR model weighted by the estimated PS model. Thus, there are subtle differences in the way we model the ATE via the PS or OR approaches. A degree of robustness is offered using a DR approach since we will obtain a consistent estimate of the ATE if just one of the component models is well specified.

Results
The objective of our application is to estimate the marginal effect of SCs on RTCs, having adjusted for baseline confounders. We estimate the following models: an OR model, an inverse PS weighted (IPW) model, a DR model comprising an OR model weighted by the inverse PS, and a naïve model which is simply the OR model without covariates. For the naïve model we report results using the matched and full samples. All models are repeatedly estimated using the approximate Bayesian approach outlined above. In addition to the posterior predictive distribution for the ATE we report point estimates at the mean of the posterior. For comparison, we also report frequentist results.
The results are shown in Table 2, including means and credible intervals of the ATE distributions. Our causal models (OR, IPW and DR) indicate that the presence of speed cameras corresponds with an average change in the number of RTCs of -14.4% to -15.5%. Note that the approximate Bayesian and frequentist point estimates are very similar, which is what we would expect for linear models with uninformative priors. In comparison, the naïve model, which does not adjust for confounding, finds a larger estimated reduction of -17.6% using the matched sample and -33.6% using the unmatched sample. Fig 1 shows the posterior predictive distribution derived from the DR model. Thus, it would appear that correcting for potential sources of confounding serves to reduce the magnitude of our ATE estimates, but we still find a substantial reduction in RTCs associated with the presence of speed cameras. The difference in estimated ATE between the naïve and causal models makes sense given that the formal criteria used to assign SCs favour sites that have exhibited high rates of collisions in the past. Crucially, our causal models imply that SCs do make a real difference to RTCs over and above the modelled effect of confounding from non-random assignment.

Conclusions
In this paper we have quantified the causal effect of speed cameras on road traffic collisions via an approximate Bayesian doubly robust approach. This is the first time such an approach has been applied to study road safety outcomes. The method we propose could be used more generally for estimation of crash modification factor (CMF) distributions. Simulations demonstrate that the approach is doubly-robust for average treatment effect estimation. Our results indicate that speed cameras do cause a significant reduction in road traffic collisions, of approximately 15% on average for treated sites. This is an important result that could help inform public policy debates on appropriate measures to reduce RTCs. The adoption of evidence based approaches by public authorities, based on clear principles of causal inference, could vastly improve their ability to evaluate different courses of action and better understand the consequences of intervention.
There are thus two important implications of our study that could ultimately improve highway safety. First, such inference could be employed to achieve a more effective assignment of SCs and a consequent reduction of RTCs. Second, the approach outlined above could be used to continually monitor SC effectiveness as baseline conditions (e.g. related to road traffic and wider demographic and social characteristics) change, thus providing a means of monitoring the effectiveness of road safety interventions dynamically.