Effectiveness of Low Emission Zones: Large Scale Analysis of Changes in Environmental NO2, NO and NOx Concentrations in 17 German Cities

Background
Low Emission Zones (LEZs) are areas that the most polluting vehicles are restricted from entering. Whether LEZs are effective in lowering ambient exposures is under debate. This study focused on LEZs that barred cars of Euro 1 standard without appropriate retrofitting systems from entering, and estimated LEZ effects on NO2, NO, and NOx (= NO2 + NO).

Methods
Continuous half-hour and diffuse-sampler 4-week average NO2, NO, and NOx concentrations measured inside and outside LEZs in 17 German cities in 6 federal states (2005–2009) were analysed as matched quadruplets by multiple linear and log-linear fixed-effects regression modelling (covariables included, e.g., wind velocity, amount of precipitation, height of inversion base, school holidays, and truck-free periods). Each quadruplet consists of two pairs of simultaneously measured values, an index value inside the LEZ and a reference value outside it; one pair was measured before and one after the LEZ introduction, with time differences equal to multiples of 364 days. Additionally, the continuous half-hour data were collapsed into 4-week averages and pooled with the diffuse-sampler data for joint analysis.

Results
More than 3,000,000 quadruplets of continuous measurements (half-hour averages) were identified at 38 index and 45 reference stations. Pooling with diffuse-sampler data from 15 index and 10 reference stations led to more than 4,000 quadruplets for joint analyses of 4-week averages. Mean LEZ effects on NO2, NO, and NOx concentrations (reductions) were estimated to be at most −2 µg/m3 (or −4%). After LEZ introduction, the 4-week averages of NO2 concentrations at index stations were 55 µg/m3 (median and mean) and 82 µg/m3 (95th percentile).

Conclusions
This is the first study to comprehensively investigate the effectiveness of LEZs in reducing NO2, NO, and NOx concentrations while controlling for the most relevant potential confounders.
Our analyses indicate that there is a statistically significant, but rather small reduction of NO2, NO, and NOx concentrations associated with LEZs.


Introduction
Low Emission Zones (LEZs) are areas or roads that the most polluting vehicles are restricted from entering. They are currently in place in 13 European countries [1]. In Europe, vehicle emissions are classified by the so-called "Euro Standards", currently ranging from Euro 1 to Euro 6; the technical requirements of each standard are laid down in several EU Directives for passenger cars and heavy-duty trucks [e.g., 2]. LEZs therefore restrict vehicles according to their Euro emission standard. LEZ configurations are highly heterogeneous across Europe; in Italy, for example, the entry standards, the accompanying regulations and the daily hours during which LEZ restrictions apply differ substantially from town to town. However, most LEZs in Europe operate 24 hours a day, 365 days a year [see 1].
One of the most developed applications is found in Germany. LEZs have been introduced in Germany in several stages since 2008, resulting in 48 LEZs with restrictions for pollutant groups 2 or 3 in 11 Federal states by the end of January 2014 [3]. In this study we analysed the effect of introducing the "LEZ of pollutant group 1", which bars from entry diesel cars below the Euro 2 emission standard without a particulate reduction system and gasoline cars below the Euro 1 standard without an appropriate catalytic converter [3].
Traffic emissions are considered an important source of air pollution [4], and LEZs are believed to be the most effective measure that cities can take to reduce vehicle-induced air pollution in their area [5][6][7]. LEZs mainly target emissions of fine particles such as PM10 and smaller fractions [8][9][10][11][12]. The effectiveness of LEZs in reducing traffic-related exposures is still under debate [13], and the "outcome" and cost-benefit ratio of LEZs are openly discussed in public [14][15][16]. Most of the published information refers to particulate matter.
In addition, nitrogen dioxide is regarded as a major traffic-related pollutant as well as an epidemiological marker of air quality and related adverse health effects [17][18][19][20][21]. On the other hand, a systematic literature review found only moderate evidence for adverse health effects of long-term exposure below an annual mean of 40 µg/m3 NO2 [22].
Under EU rules [23,24], limit values were additionally imposed for NO2; they have been enforced in Germany since 2010: 200 µg/m3 as a 1-hour average (not to be exceeded more than 18 times per year) and 40 µg/m3 as an annual average. These limits were, and still are, exceeded: about 69% of all stations near traffic in Germany showed annual averages above 40 µg/m3 [7,25]. This non-compliance is not restricted to Germany; the European limit value for NO2 is exceeded in many European cities [26][27][28]. The LEZ concept was extended on the assumption that LEZs are an effective measure not only to lower PM10 levels but also to reduce NO2 concentrations [6,29]. There are indications that LEZs may indeed reduce NOx concentrations effectively [30][31][32], but ozone has to be considered a confounder in NO2 measurements [e.g., 33], and the gases NO and NO2 also rapidly interconvert [34]. Furthermore, national emission ceilings were defined for NOx, i.e., the sum of NO2 and NO [35]. Thus, there is also interest in the impact of LEZs on NO and NOx concentrations [36].
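The two NO2 limit values combine a count criterion (hourly values) with a mean criterion (annual average). The following is a minimal sketch of such a check for one station-year, assuming the annual mean is simply the mean of the supplied hourly values; the function name `no2_limit_check` is hypothetical, and the data-capture requirements of the actual EU rules are deliberately ignored.

```python
def no2_limit_check(hourly_ug_m3, hourly_limit=200.0, max_exceedances=18,
                    annual_limit=40.0):
    """Check one station-year of hourly NO2 values (µg/m3) against the
    1-hour and annual limit values.

    Simplification (assumption): the annual mean is the plain mean of the
    supplied hourly values; minimum data-capture rules are ignored.
    """
    if not hourly_ug_m3:
        raise ValueError("need at least one hourly value")
    # Hours strictly above the 1-hour limit count as exceedances
    exceedances = sum(1 for v in hourly_ug_m3 if v > hourly_limit)
    annual_mean = sum(hourly_ug_m3) / len(hourly_ug_m3)
    return {
        "annual_mean": annual_mean,
        "hourly_exceedances": exceedances,
        "hourly_ok": exceedances <= max_exceedances,
        "annual_ok": annual_mean <= annual_limit,
    }


# 19 hours above 200 µg/m3 violate the 1-hour limit even though the
# annual mean stays well below 40 µg/m3.
year = [30.0] * 8741 + [250.0] * 19
print(no2_limit_check(year))
```

The two criteria are reported separately because a station can fail either one independently, as in the example above.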
However, scientific proof of the LEZ concept for NO2, NO, and NOx is still missing. To test the view of legislators and researchers that LEZs are effective measures to reduce nitrogen oxide concentrations [29,37], this study focused on the potential effects of LEZs on ambient NO2, NO, and NOx concentrations in the LEZ areas of 17 German cities.
We reported on the effect of LEZs on PM10 concentrations elsewhere [32].

Target parameters
The aim of the study was to analyse the effectiveness of German LEZs (as many as eligible) in lowering NO2, NO, and NOx (= NO2 + NO) concentrations. First, a series of analyses of NO2, NO, and NOx was based on continuous half-hour measurements of NO2 and NO. Second, NO2 and NO concentrations collected by diffuse samplers over longer sampling periods were available; these data were allocated to 4-week periods. Third, we collapsed the half-hour measurement data to 4-week averages and pooled these collapsed continuous data with the diffuse-sampler data to perform joint analyses over 4-week periods. The original NO2 and NO measurements were performed by the Environmental State Institutions in Germany (Landesumweltämter). A federal database [38] documents the applied measurement procedures.

Measuring procedure
Two measuring procedures were applied: continuous measurement devices (chemiluminescence), with data stored as half-hour averages, and diffuse samplers (Palmes tubes, chromatography), with data stored as long-term averages over weeks. The chemiluminescence method relies on the reaction of NO with O3: NO + O3 → NO2* + O2. Chemiluminescence in the range of 600 nm to 3,000 nm is emitted when the excited NO2* molecules return to the ground state, and the light intensity is proportional to the concentration of NO molecules. A reduction converter is used to convert NO2 to NO. Thus, the NO2 concentration is determined as the difference between the NOx concentration, measured when the sample gas is directed through the converter, and the NO concentration, measured when the gas bypasses the converter. The diffuse samplers were Palmes-type tubes modified with a glass frit as a turbulence barrier. In these passive samplers, molecules diffuse along a concentration gradient through an intake opening with a defined cross-section and along a fixed diffusion path to a sampling medium, by which they are adsorbed. This process is described by Fick's first law of diffusion. The chemical analysis is done by chromatography. More details on both methods may be found in [39][40][41][42][43].
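Both measuring principles reduce to simple arithmetic. The sketch below shows (a) NO2 obtained as the NOx minus NO difference of a chemiluminescence analyser, and (b) the mean concentration of an idealised Palmes tube from Fick's first law: the collected mass is m = D·A·C·t/L, so C = m·L/(D·A·t). The default tube length (7.1 cm), cross-section (0.95 cm²) and NO2 diffusion coefficient (0.154 cm²/s) are assumed, illustrative values; the frit-modified samplers used in the study would have their own calibrated uptake rate. Function names are hypothetical.

```python
def no2_by_difference(nox_ppb, no_ppb):
    """Chemiluminescence: NOx is measured with the converter in line, NO
    without it; NO2 is the difference (volume mixing ratio units)."""
    return nox_ppb - no_ppb


def palmes_concentration(mass_ug, exposure_s, diff_coeff_cm2_s=0.154,
                         tube_length_cm=7.1, area_cm2=0.95):
    """Mean concentration (µg/m3) collected by an idealised Palmes tube.

    Fick's first law gives the collected mass m = D * A * C * t / L,
    hence C = m * L / (D * A * t). Geometry and D are assumed defaults.
    """
    c_ug_per_cm3 = mass_ug * tube_length_cm / (
        diff_coeff_cm2_s * area_cm2 * exposure_s)
    return c_ug_per_cm3 * 1e6  # 1 m3 = 1e6 cm3


four_weeks_s = 28 * 24 * 3600
print(no2_by_difference(60.0, 25.0))                 # NO2 in ppb
print(round(palmes_concentration(2.0, four_weeks_s), 2))  # µg/m3
```

With these assumed values, about 2 µg of NO2 collected over a 4-week exposure corresponds to a mean concentration of roughly 40 µg/m3.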

Low Emission Zones
There were 34 active LEZs in Germany by the end of 2009, and 774 monitoring stations in use. The main effect of introducing these LEZs was that only diesel vehicles with an exhaust emission standard better than Euro 1 (with sticker) were allowed to enter the zone. In principle, the German "LEZ of pollutant group 1" bars from entry:
- diesel passenger cars, trucks and buses below the Euro 2 emission standard without a particulate reduction system, and
- gasoline passenger cars, trucks and buses below the Euro 1 emission standard without an appropriate catalytic converter.
Local authorities can grant exception permits, especially for light-duty vehicles, trucks and buses, according to local needs [3].
According to the protocol, LEZs were included in the study if and only if
- monitoring stations existed that operated before and after the LEZ introduction and measured inside the LEZ area (index stations),
- monitoring stations existed that operated before and after the LEZ introduction and measured outside the LEZ area, within a circle of about 25 km radius around the centre and, if outside the city area, then not in another LEZ (reference stations), and
- these monitoring stations measured NO2 or NO (continuous measurements or diffuse samplers).
(For the terminology and the use of index and reference values in comparisons of exposure levels, see Rothman et al. [44].) Seventeen cities with LEZs in 6 German Federal states could be included in the study (Baden-Württemberg: Herrenberg, Ludwigsburg, Mannheim, Reutlingen, Stuttgart, Tübingen; Bavaria: Augsburg, Munich; Berlin: Berlin; Hesse: Frankfurt; Lower Saxony: Hannover; North Rhine-Westphalia: Dortmund, Duisburg, Düsseldorf, Essen, Cologne, Wuppertal). Figure 1 shows all 34 active German LEZs in December 2009 and the 17 LEZs included in the study. File S1 contains maps of all LEZs eligible for study with all index and reference stations marked (Figures S1 to S19 in File S1).
In total, the 17 LEZs eligible for study contained 108 eligible monitoring stations: 53 index stations and 55 reference stations. The database constructed from the transferred data encompassed a total of 9,517,911 data lines, which were used as input to the analysis. An overview is given in Table 1.

Data analysis
The data set structured for analysis consisted of matched quadruplets. A matched quadruplet comprises four pairwise corresponding measurement values: two index values and two reference values. One index value and the simultaneously measured reference value were obtained during the active LEZ period; the other pair of values was obtained before the LEZ was introduced. The two pairs were 364 days, or a multiple of 364 days, apart. Because 364 days are exactly 52 weeks, the day of the week and the time of day are identical within a quadruplet, and the season is kept approximately constant. Reference stations were allocated to index stations pairwise, i.e., quadruplets were constructed from the data of one index station by allocating to it all appropriate reference stations with their data, without prior collapsing ("collapsing" is a technical term widely used in statistics for summarising data into marginal summary statistics, http://www.stata.com/manuals13/dcollapse.pdf). The method has been described in detail before [45] and is a refinement of other analytical strategies [46]. The analysis plan was critically reviewed by a chair of statistics.
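The matching rule can be sketched as follows, assuming each station's data are held as a mapping from timestamp to concentration. The function `build_quadruplets` and its `max_lags` cap on how many multiples of 364 days are searched are hypothetical illustration choices, not taken from the study's Stata code. Each follow-up index value is paired with an index value k·364 days earlier that lies before the LEZ start, and with simultaneous values at every reference station that has both time points.

```python
from datetime import datetime, timedelta

LAG = timedelta(days=364)  # 52 weeks: same weekday and time of day


def build_quadruplets(index_series, reference_stations, lez_start, max_lags=4):
    """Build matched quadruplets for one index station.

    index_series: dict datetime -> concentration at the index station
    reference_stations: dict name -> (dict datetime -> concentration)
    max_lags: hypothetical cap on the multiples of 364 days searched
    Returns tuples (ref_name, t0, t1, x0, x1, r0, r1).
    """
    quads = []
    for t1, x1 in sorted(index_series.items()):
        if t1 < lez_start:
            continue  # follow-up value must lie in the active LEZ period
        for k in range(1, max_lags + 1):
            t0 = t1 - k * LAG
            if t0 >= lez_start or t0 not in index_series:
                continue  # baseline value must precede LEZ introduction
            # No prior collapsing: every eligible reference station is kept
            for name, ref in reference_stations.items():
                if t0 in ref and t1 in ref:
                    quads.append((name, t0, t1, index_series[t0], x1,
                                  ref[t0], ref[t1]))
    return quads


start = datetime(2008, 3, 1)
t1 = datetime(2008, 3, 5, 12, 0)
t0 = t1 - LAG
print(build_quadruplets({t0: 50.0, t1: 48.0},
                        {"R1": {t0: 30.0, t1: 29.5}, "R2": {t1: 27.0}},
                        start))
```

In this toy example, reference station R2 contributes no quadruplet because it lacks a baseline value at the matched time.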
The quadruplets were analysed with the "difference score method in the two-period case" [47]: differences in index values were regressed on differences in reference values, with other data taken into account as covariates in fixed-effects regression analyses. Two types of models were fitted: a linear (additive) model and a log-linear (multiplicative) model. In the linear model, the difference of the index concentrations was used as the response variable. In the log-linear model, the log of this response variable was used after adding an appropriate positive offset calculated from the data [48]. The two model types differ in their assumption of how covariables influence the index station concentrations: on an additive or on a multiplicative scale [49,50].
The following covariables were taken into account in the basic fixed-effects regression analyses: differences at reference stations in µg/m3 (to control, e.g., for large-scale meteorological changes and seasonal effects), baseline data at reference stations in µg/m3 (to control for time-dependent effects of reference data; Allison [47]), and baseline data at index stations in µg/m3 (to control for "regression to the mean" [51]). This structure defines the basic regression approach. In the log-linear (multiplicative) models, the covariables were entered as logs after adding an appropriate offset where indicated [48].
The following equation describes the analysis of matched quadruplets in the basic fixed-effects linear ("additive") regression model [47]:

$$\Delta x_{mdh} = E + \sum_{k=1}^{Z} E_k \cdot z_k + b_x \cdot x_{0mdh,\mathrm{cent}} + b_{\Delta r} \cdot \Delta r_{zdh} + b_r \cdot r_{z0dh,\mathrm{cent}} + e$$

$\Delta x_{mdh}$ is the difference of the index station data at monitoring station m between days d and d − 364 (= day d + 1 in the previous year), always at time (hour) h, i.e., $x_{1mdh} - x_{0mdh}$ (compare Figure 2). $x_{0mdh,\mathrm{cent}}$ denotes the baseline value at station m on day d at time h, centred at the mean of all baseline values at station m. The terms $\Delta r_{zdh}$ and $r_{z0dh,\mathrm{cent}}$ are the corresponding reference station data. The coefficient of major interest is the intercept of the regression model because it estimates the LEZ effect: E measures the mean effect across all LEZs, and $E + E_k$ the mean effect in zone k, $1 \le k \le Z$. The coefficient $b_x$ accounts for "regression to the mean", $b_{\Delta r}$ for the bias in annual levels (e.g., changed meteorological conditions), and $b_r$ for a time-dependent effect of reference values; e is the residual error of the concentration difference at the index stations. The second model type had the same structure but used logs of the terms ("log-linear", "multiplicative"); an appropriate small offset was added to avoid undefined logarithms [48]. The equation of the basic fixed-effects linear ("additive") regression model can be justified as follows (to keep the notation simple we suppress the time indices and write, e.g., $x_{z0}$ instead of $x_{z0dh}$).
Let us start by assuming an ideal hypothetical situation: measurements are free of distortions and random errors, and no covariates are operating. In that case we measure the index concentration at time point 0, before the LEZ was introduced, as a constant value $c_0$ at all index stations of the LEZ. After the introduction of the LEZ we measure at time point 1 at the same index stations the constant value $c_1$. The effectiveness of the zone is simply $E = c_1 - c_0$.
But even without biases, random errors or covariates, we do not expect to see the same E for all LEZs. The effectiveness may depend on characteristics of zone k such as the area of the LEZ, $A_k$ (e.g., we may expect a larger effect if the LEZ area is larger). The concentration at time point 1 may then be written more appropriately as $c_1 + f \cdot A_k$, with a multiplicative coefficient f mapping the effect of the area onto the concentration scale. The effect of zone k can be described as $E = c_1 + f \cdot A_k - c_0$. Note that $A_k$ operates as an effect modifier.
We can take account of differences between the zones without referring to a specific characteristic of LEZ k such as its area: we may describe the effect of zone k more abstractly as $E + E_k$ [where $E + E_k$ stands for $E + E_k \cdot z_k$ with an indicator $z_k$ that takes the value 1 for zone k and 0 otherwise, $1 \le k \le Z$]. $E_k$ is the specific effect offset of LEZ k relative to the overall mean E of the LEZ effects. It is simple to extend the notation to cover different baseline concentrations for the different LEZs. Thus, $E + E_k = c_1 + (c_{z1} - c_1) - [c_0 + (c_{z0} - c_0)] = c_{z1} - c_{z0}$, i.e., the effect of zone k is the difference between the zone-specific measurement values after ($c_{z1}$) and before ($c_{z0}$) the introduction of LEZ k ($c_1$ and $c_0$ now denote the averages of the concentrations across all LEZs under study).
Still, this approach is not very realistic. We should take into account background variations of the concentrations resulting, e.g., from large-area changes. These large-area variations are reflected in the values $r_{z0}$ and $r_{z1}$ at the reference stations. Despite all efforts to measure the concentrations as precisely as possible, we will always have random errors $e_1$ and $e_0$. In this extended approach, the measurement values for zone k are $x_{z0} = c_0 + (c_{z0} - c_0) + g \cdot r_{z0} + e_0$ before and $x_{z1} = c_1 + (c_{z1} - c_1) + g \cdot r_{z1} + e_1$ after the introduction of the LEZ. The factor g measures how strongly the reference values influence the index values. It follows that $x_{z1} - x_{z0} = E + E_k + g \cdot (r_{z1} - r_{z0}) + (e_1 - e_0)$. With $\Delta x_z = x_{z1} - x_{z0}$, $\Delta r_z = r_{z1} - r_{z0}$, $e = e_1 - e_0$ and $b_{\Delta r} = g$, we obtain the major part of the equation of the basic fixed-effects linear ("additive") regression model. Note that we replaced z by m, which means that we apply the approach in a refined way to every index monitoring station m. We have demonstrated above that a potential confounder/adjuster, such as the concentration at a reference station, enters the equation as the difference of its values across time (e.g., $\Delta r_z = r_{z1} - r_{z0}$). We have also seen that the model can be extended by potential modifiers of the LEZ effect ("interaction terms"), such as the area, by adding terms like $f \cdot A_k$ (in contrast to adjusters, not as a difference in time). We now explain in more detail why we additionally included the effect-modifying variables $x_{z0}$ and $r_{z0}$.
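The derivation can be checked numerically. The sketch below simulates a single zone under exactly the model just stated, $x_{z0} = c_{z0} + g \cdot r_{z0} + e_0$ and $x_{z1} = c_{z1} + g \cdot r_{z1} + e_1$, and regresses $\Delta x$ on $\Delta r$ by ordinary least squares. The intercept then recovers $E + E_k = c_{z1} - c_{z0}$ and the slope recovers g. The concentration levels, error sizes and function names are assumptions chosen for illustration; the study's full model additionally contains the baseline terms, zone indicators and robust variance estimation.

```python
import random
import statistics


def ols_intercept_slope(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def simulate_zone(n, c_z0, c_z1, g, seed=1):
    """Simulate n quadruplets for one zone:
    x0 = c_z0 + g*r0 + e0 (before), x1 = c_z1 + g*r1 + e1 (after).
    Returns the differences (delta_r, delta_x)."""
    rng = random.Random(seed)
    delta_r, delta_x = [], []
    for _ in range(n):
        r0, r1 = rng.gauss(27.0, 5.0), rng.gauss(27.0, 5.0)  # reference values
        e0, e1 = rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)    # random errors
        x0 = c_z0 + g * r0 + e0
        x1 = c_z1 + g * r1 + e1
        delta_r.append(r1 - r0)
        delta_x.append(x1 - x0)
    return delta_r, delta_x


# Assumed truth: E + E_k = c_z1 - c_z0 = -1.0 µg/m3 and g = 0.8
dr, dx = simulate_zone(5000, c_z0=30.0, c_z1=29.0, g=0.8)
E_hat, g_hat = ols_intercept_slope(dr, dx)
print(f"estimated effect {E_hat:.2f}, estimated b_dr {g_hat:.2f}")
```

Because the reference term enters as a difference over time, the level of the reference values (here 27 µg/m3) drops out and does not bias the intercept.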
Altman and Bland [52] and Bland and Altman [53] suggested including the mean value $x_{zm}$ of the concentrations at index station m as another covariate: this allows the difference $\Delta x_{zm}$ to depend on the average concentration at the station. This inclusion of $x_{zm}$ counteracts a distortion due to "regression to the mean" [54]. This phenomenon inevitably complicates longitudinal comparisons: baseline values that are very high due to random errors will probably not be reproduced, and lower values will be measured instead, even if the null hypothesis of no effect is true [51,55,56,57]. A better strategy to correct for this potential distortion is to include $x_{z0m,\mathrm{cent}}$, i.e., the centred baseline values at the index station [58,59]. Additionally including $r_{z0,\mathrm{cent}}$ follows Allison [47], p. 10. This approach allows a flexible adjustment for the annual-level bias because it removes the assumption of a time-invariant effect of the reference station values on the index station values. The covariates $x_{z0m,\mathrm{cent}}$ and $r_{z0,\mathrm{cent}}$ are centred on the mean of the values of each measuring station, so that the terms E and $E_k$ can be interpreted without further transformation.
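A short simulation illustrates why the baseline term is needed. In this sketch there is no LEZ effect at all: every observation shares the same assumed true level of 50 µg/m3 and differs only by random measurement error. Observations with the highest baselines nevertheless show a clear average decrease, and the slope of the change on the centred baseline is close to −1, which is the kind of artefact the coefficient $b_x$ is meant to absorb. The function name and parameters are hypothetical; with real station-to-station differences in true levels the slope would be flatter than −1.

```python
import random
import statistics


def rtm_demo(n=20000, true_level=50.0, error_sd=5.0, seed=7):
    """Regression to the mean under the null hypothesis of no effect.

    Both periods share the same true level; only measurement error differs.
    Returns (mean change among top-decile baselines,
             OLS slope of change on centred baseline).
    """
    rng = random.Random(seed)
    x0 = [true_level + rng.gauss(0.0, error_sd) for _ in range(n)]
    x1 = [true_level + rng.gauss(0.0, error_sd) for _ in range(n)]
    dx = [b - a for a, b in zip(x0, x1)]
    cutoff = sorted(x0)[int(0.9 * n)]          # 90th percentile of baselines
    top_change = statistics.fmean(d for a, d in zip(x0, dx) if a >= cutoff)
    m0 = statistics.fmean(x0)
    centred = [a - m0 for a in x0]              # sums to zero by construction
    slope = sum(c * d for c, d in zip(centred, dx)) / sum(c * c for c in centred)
    return top_change, slope


top_change, slope = rtm_demo()
print(f"top-decile change {top_change:.1f} µg/m3, slope {slope:.2f}")
```

The apparent "reduction" among high-baseline observations arises purely from measurement error, which is why single high-baseline stations must be interpreted with caution.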
Since meteorological conditions have a highly relevant impact [e.g., 60], the following data were collected and used, in addition to the reference station data, to control for distortions due to meteorological changes: the height of the inversion base H in m, the wind velocity V in m/s, and the amount of precipitation P in mm/h, all taken from the PAREST project for all investigated measurement stations and half-hours in the follow-up period [61][62][63][64][65].
We extended the basic regression models to the regression model 1 approach by additionally adjusting for the change of the three meteorological variables at the index stations. Following the box model of meteorology [66], the differences were calculated on the additive scale after transforming the variables into 1/H, 1/(V + 0.1 m/s), and 1/(P + 0.1 mm/h). The smallest unit of scale was 0.1 throughout, hence this value was used as an offset to avoid division by zero [48]. On the multiplicative scale, differences were determined after taking logs of these terms [48]. We also adjusted for the time span (in years) between the measurements within a quadruplet in order to account for trends in concentration levels before the LEZ was introduced. In the multiplicative models, the log of the time span was used after applying an appropriate offset [48].
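The covariate construction for the meteorological variables can be sketched as below. The transforms 1/H, 1/(V + 0.1) and 1/(P + 0.1) and the use of log differences on the multiplicative scale follow the text; the function names and the tuple layout (H, V, P) are hypothetical, and H is assumed to be strictly positive.

```python
import math

OFFSET = 0.1  # smallest unit of scale, used to avoid division by zero


def box_model_terms(H_m, V_m_s, P_mm_h):
    """Inverse terms of inversion height (m), wind speed (m/s) and
    precipitation (mm/h); larger values mean less dilution/washout.
    Assumes H_m > 0 (no offset is applied to H)."""
    return (1.0 / H_m, 1.0 / (V_m_s + OFFSET), 1.0 / (P_mm_h + OFFSET))


def met_differences(before, after, multiplicative=False):
    """Follow-up minus baseline difference of the transformed terms.

    before/after: (H, V, P) tuples at the index station. On the
    multiplicative scale, logs of the transformed terms are differenced.
    """
    t0, t1 = box_model_terms(*before), box_model_terms(*after)
    if multiplicative:
        return tuple(math.log(b) - math.log(a) for a, b in zip(t0, t1))
    return tuple(b - a for a, b in zip(t0, t1))


# Halved inversion height and weaker wind at follow-up, no rain in either period
print(met_differences((500.0, 2.0, 0.0), (250.0, 0.9, 0.0)))
print(met_differences((500.0, 2.0, 0.0), (250.0, 0.9, 0.0), multiplicative=True))
```

Positive differences correspond to poorer dispersion at follow-up, which under the box model should raise index concentrations.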
In the regression model 2 approach, the following time-dependent binary indicators were additionally adjusted for: period of school holidays (yes/no), period of environmental bonus payments (yes/no), and periods when trucks were not allowed to enter the area where the measurement station was located (yes/no). In Germany, a bonus was paid to car owners between January 14, 2009 and November 2, 2009 if they bought a new car with reduced exhaust emissions (http://www.bafa.de/bafa/de/wirtschaftsfoerderung/umweltpraemie/index.html). These binary indicators were also entered into the extended log-linear (multiplicative) models.
This statistical approach was successfully validated prior to the study in an analysis of simulated data from FU Berlin [67]. The simulated data were produced by the PAREST project [61,63, www.parest.de]. The major aim of this project was to identify emission-reducing strategies by simulation. Transport and distribution models were developed and applied, the so-called REM-CALGRID approach [68][69][70]. The model was applied to the city of Munich, and simulated half-hour PM10 data were generated for each of the five index and three reference stations (see Figure S10 in File S1). Data for the year 2005 were simulated twice, with and without an added LEZ effect (the size of the imposed effect was unknown to the analysing working group). 280,320 data lines were transferred. The simulated PM10 […]

Additive and multiplicative regression models were fitted to subsets of the data to perform sensitivity analyses: continuous measurement data; continuous measurement data collapsed to 4-week averages; diffuse-sampler data; pooled diffuse-sampler and collapsed continuous data, always with and without excluding times with truck traffic restrictions; and quadruplets produced by index traffic stations only. The pooled continuous and diffuse-sampler data for 4-week periods were of major interest in this study because the annual average is the most critical endpoint (see Introduction section) and these data cover both types of measurement. Because annual data are generally too coarse for estimating LEZ effects, we used averages over about one month. The additive and multiplicative regression models analysing these data sets were specified with the three sets of covariables described above. The basic models, the models evaluating continuous data collapsed to 4-week averages, and the models fitted to single index stations were not used for statistical testing of the LEZ effects.
Tests for effects measured by NO2, NO, and NOx quadruplets were not considered independent. According to this structure, we evaluated 2 × 3 × 2 × 2 × 2 = 48 statistical tests for each of the three endpoints. Because of this multiple-testing scenario, we applied an adapted significance level of 5%/50 = 0.1%, a Bonferroni-type correction with the number of tests rounded up from 48 to 50 ["family-wise error rate", 71].
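The adapted significance level is plain Bonferroni arithmetic. The sketch below reproduces the test count and shows that dividing by 50 instead of 48 gives a slightly stricter, and therefore conservative, per-test level. Function names are hypothetical.

```python
def family_wise_alpha(alpha=0.05, n_tests=50):
    """Bonferroni-type per-test level that bounds the family-wise error rate."""
    return alpha / n_tests


def is_significant(p_value, alpha_per_test=0.001):
    """Decision at the adapted per-test level (0.1% in the study)."""
    return p_value < alpha_per_test


n_tests = 2 * 3 * 2 * 2 * 2  # design factors as counted in the text
print(n_tests)                            # 48 tests per endpoint
print(family_wise_alpha(0.05, 50))        # level used in the study
print(family_wise_alpha(0.05, n_tests))   # exact Bonferroni level for 48 tests
```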
We additionally fitted exploratory models that estimated the size of the LEZ effect at each enrolled index station. In addition, we estimated mean LEZ effects for each Federal state. The results of these exploratory analyses were mainly used for internal discussions of the project steering committee (see Acknowledgements).
All regression models used robust estimators of coefficient variances. All data analyses were performed using Stata 11 [72] on a 64-bit PC.

NO2 - continuous measurements
The basic data consisted of 6,412,864 data lines, leading to 3,038,781 quadruplets of continuous NO2 measurements (half-hour averages) from 6 Federal states and 17 LEZs with 38 index stations and 45 reference stations. Table 2 gives an overview of the observed distributions: on average, NO2 concentrations were between 50 µg/m3 and 52 µg/m3 at the index stations and between 26 µg/m3 and 27 µg/m3 at the reference stations. The differences at the stations varied substantially, over a range of hundreds of µg/m3 upwards and downwards. A comparison of mean and median differences at index and reference stations indicated a crude LEZ effect estimate of about −1 µg/m3. In linear model 1 the absolute effect estimate was similar: −1.11 µg/m3 (Table 3). The model 1 results showed a time-dependent impact of reference station data, a pronounced "regression to the mean", a clear influence of the three meteorological variables (independent of the crude adjustment by reference station data), and a downward trend of concentrations before the LEZs were introduced. The direction of impact of the meteorological variables was as expected: the smaller H, V, or P, the larger the index NO2 concentrations. In linear model 2 the LEZ effect estimate was slightly more pronounced: −1.85 µg/m3. In log-linear model 1 (multiplicative approach), the relative effect estimate was 0.979, i.e., a reduction of 2.1% (Table 4). The estimated impact of the covariables agreed with the findings of the corresponding linear model. When regression model 2 was applied, the relative LEZ effect estimate was 0.961, i.e., a reduction of 3.9%.

[…] (Table 5). Using the linear model 1 approach, the absolute effect estimate was −0.826 µg/m3 (Table 6). The meteorological variables showed no substantial impact due to the long averaging period. Model 2 estimated the LEZ effect as −1.73 µg/m3.
The log-linear modelling led to a relative effect of 0.980 (Table 7, model 1) or 0.961 (model 2). Table S1 in File S1 provides a detailed overview of the results of the series of models fitted to the NO2 measurements. LEZ effect estimates were about −1 µg/m3 to −2 µg/m3 (additive models) or −2% to −4% (multiplicative models).
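The crude LEZ effect estimates quoted in these results compare the change at index stations with the change at reference stations within the quadruplets, ignoring all covariables. A minimal sketch of that comparison, by mean and by median, assuming quadruplets are given as (x0, x1, r0, r1) tuples; the function name and toy numbers are hypothetical.

```python
import statistics


def crude_lez_effect(quadruplets):
    """Crude LEZ effect without covariables.

    quadruplets: iterable of (x0, x1, r0, r1), i.e. index and reference
    values before (0) and after (1) LEZ introduction.
    Returns the index change minus the reference change, summarised by
    mean and by median.
    """
    quads = list(quadruplets)
    dx = [x1 - x0 for x0, x1, _, _ in quads]
    dr = [r1 - r0 for _, _, r0, r1 in quads]
    return {
        "mean": statistics.fmean(dx) - statistics.fmean(dr),
        "median": statistics.median(dx) - statistics.median(dr),
    }


toy = [(50.0, 49.0, 27.0, 27.0), (52.0, 50.0, 26.0, 26.5), (51.0, 50.0, 27.0, 26.5)]
print(crude_lez_effect(toy))
```

Negative values indicate that concentrations fell more (or rose less) inside the LEZ than outside; the regression models then refine this comparison by adjusting for baselines, meteorology and time trends.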

NO - pooled continuous and diffuse sampler measurement data
A total of 5,790 data lines from 17 LEZs with 46 index stations and 54 reference stations were available for analysing the pooled continuous and diffuse-sampler NO data. A descriptive analysis of the 4,005 quadruplets indicated an LEZ effect of about 0 µg/m3 to −1 µg/m3 (Table 8). Using the additive approach, the absolute effect estimate was −1.13 µg/m3 in model 1 (Table 9). When model specification 2 was applied, the LEZ effect estimate changed sign: +0.38 µg/m3, i.e., no reduction was indicated in this extended model type. Log-linear regression model 1 yielded a relative effect estimate of 0.968 (Table S2 in File S1). The direction of the estimated relative effect changed when model 2 was applied: +1.20.

NOx - pooled continuous and diffuse sampler measurement data
The analysis of pooled continuous and diffuse-sampler NOx data was performed using 4,005 quadruplets that originated from a set of 5,790 data lines generated by 46 index stations and 54 reference stations in 17 LEZs. According to the distributions of differences, a crudely estimated LEZ effect (based on means or medians) of about −0.2 µg/m3 to −1.3 µg/m3 was present (Table S3 in File S1). Adjusting for covariables in linear model 1 returned an absolute effect estimate of −1.74 µg/m3 (Table S4 in File S1). Adjusting for further covariables (regression model 2) led to an effect estimate of −0.89 µg/m3. With log-linear model 1 (Table S5 in File S1), a relative LEZ effect of 0.976 was found. Adjusting for additional covariates (model 2) led to a change in direction: the relative effect was estimated as 1.048.
Summary of Results for NO2, NO, and NOx
Table 10 gives an overview of the findings for NO2, NO, and NOx. The mean concentration levels at the index stations were about 50 µg/m3 for NO2 and for NO, and thus about 100 µg/m3 for NOx. Model 1 analyses showed reductions of the concentrations after the LEZs were introduced. Although small, all of these effect estimates were statistically significant at the 0.1% level. Model 1 estimates based on an additive structure were compatible with those of the log-linear multiplicative approach (e.g., 2% of 50 µg/m3 = 1 µg/m3). The model 1 LEZ effect estimates were similar to, but slightly more pronounced than, crude LEZ effect estimates based on direct comparisons of the measurement differences at index and reference stations within the quadruplets that ignore the impact of covariables (compare Tables 2, 5, and 8 and Table S1 in S1). All analyses point to the conclusion that, across all investigated index stations, the average concentration-reducing effect of LEZs was smaller than 2 µg/m3 for each of the three components NO2, NO, and NOx, i.e., not higher than about 4%. However, breaking down the analyses by Federal state or LEZ yielded heterogeneous effect estimates.
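The compatibility check between the additive and multiplicative models can be written out as two conversions: a relative effect (ratio) becomes a percent change, and multiplying the relative change by the mean concentration level gives the implied absolute change. A minimal sketch with hypothetical function names:

```python
def relative_to_percent(ratio):
    """Percent change implied by a log-linear relative effect (ratio)."""
    return (ratio - 1.0) * 100.0


def absolute_equivalent(ratio, mean_level_ug_m3):
    """Absolute change (µg/m3) implied by a relative effect at a mean level."""
    return (ratio - 1.0) * mean_level_ug_m3


print(relative_to_percent(0.979))            # about -2.1 %
print(relative_to_percent(0.961))            # about -3.9 %
print(absolute_equivalent(0.98, 50.0))       # about -1 µg/m3
print(absolute_equivalent(0.979, 51.0))      # about -1.07 µg/m3
```

For example, the continuous-data NO2 ratio of 0.979 at a mean index level of about 51 µg/m3 implies roughly −1.07 µg/m3, close to the additive model 1 estimate of −1.11 µg/m3.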
The NO2 analysis was based on 192 comparisons of index vs reference stations; in 31 of them the index station was characterized as "background", in one as "industry", and in 160 as "traffic". We performed a sensitivity analysis by restricting the evaluation to the stations close to traffic. The additive linear type 2 model estimated an effect of −1.73 µg/m3 at all index stations (see last line of Table S1 in S1). When the analysis was restricted to the traffic stations, we obtained a slightly more pronounced LEZ effect estimate of −2.26 µg/m3 (3,406 quadruplets, pooled data: 4-week averages). An analysis of the continuous data yielded almost the same result: −2.35 µg/m3 (2,105,702 quadruplets, half-hour averages).

Discussion
In this study we analysed the effect of introducing the "Low Emission Zone (LEZ) of pollutant group 1" (which bars from entry diesel cars below the Euro 2 emission standard without particulate reduction) on NO2, NO, and NOx concentrations in Germany. We included as many LEZs as possible (17 of the 34 active in 2009 met our inclusion criteria) in a homogeneous analysis of nitrogen oxide data measured before and after the introduction of LEZs of pollutant group 1 until the end of 2009. We used matched quadruplets of index and reference station values and analysed the changes in concentrations with fixed-effects regression models while adjusting for important covariables. We performed sensitivity analyses by applying two model structures (additive and multiplicative) with varying sets of covariables to different subsets of the data. We based our study on precisely matched quadruplets to avoid distortions and to increase validity. A potential downside of this increased validity is a loss in precision due to the reduced data set eligible for analysis. However, the loss in power was negligible in this application because P-values were small even when multiple testing was taken into account [73]. The statistical approach was successfully validated prior to the study in an analysis of simulated data from FU Berlin [67]. We checked whether adjustment in a single model that analysed all LEZs simultaneously, assuming unknown but identical covariate coefficients, was appropriate for all LEZs. To do so, we evaluated each LEZ separately and performed a meta-analysis of the findings. The precision-weighted mean of the effect estimates […] Table S1 in S1.

Table note: Regression coefficient, robust standard error of coefficient, t-statistic, two-sided P-value, and 95% confidence interval of coefficient. The absolute LEZ effect estimate is given by the coefficient E in µg/m3 (<0: concentration is lowered by LEZ).
We conclude that the single fitted model that evaluated all LEZs simultaneously was appropriate and did not suffer from insufficient adjustment.
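The meta-analytic check can be illustrated with standard fixed-effect, inverse-variance pooling, which we assume is what "precision-weighted" refers to here: each per-LEZ estimate is weighted by the reciprocal of its squared standard error. The same weights yield Cochran's Q and the I² statistic, common summaries of between-LEZ heterogeneity. The function name and the example inputs are hypothetical; single-LEZ results are deliberately not reported in this study.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect (precision-weighted) mean of per-LEZ effect estimates.

    Returns the pooled estimate, its standard error, Cochran's Q and I^2.
    """
    if len(estimates) != len(std_errors) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]  # precision = 1 / variance
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-LEZ heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical per-LEZ estimates (µg/m3) and standard errors, not study data.
    pooled, se, q, i2 = inverse_variance_pool([-1.0, -3.0], [1.0, 1.0])
    print(pooled, round(se, 4), q, i2)  # -2.0 0.7071 2.0 0.5
```

If the pooled mean of separately adjusted LEZs lies close to the estimate of the joint model, the shared-coefficient assumption of the joint model is not contradicted, which is the logic of the check described above.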
As an overall finding, the average effect of LEZ introduction on nitrogen oxide concentrations (NO2, NO, and NOx = NO2 + NO) was not larger than 2 µg/m3 at all index stations, i.e., not larger than about 4%. The effect was only slightly larger when we restricted the analyses to stations close to traffic. In the main analyses the coefficients describing the reductions were statistically significant at the 0.1% level, i.e., even after taking multiple testing into account. We note, however, that the calculated P-values are potentially too small because autocorrelations in the data were not taken into account.
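The caveat about autocorrelation can be quantified under a simplifying assumption that is ours, not the study's: if the residuals behaved like a stationary AR(1) process with lag-1 autocorrelation ρ, the variance of a mean would be inflated by roughly (1 + ρ)/(1 − ρ) relative to the independent-data formula, so naive t-statistics would shrink by the square root of that factor and P-values would grow. The function names and the values of ρ and t below are hypothetical.

```python
import math


def ar1_variance_inflation(rho):
    """Large-sample factor by which lag-1 autocorrelation rho inflates the
    variance of a sample mean, assuming an AR(1) process."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (1.0 + rho) / (1.0 - rho)


def adjusted_t(t_naive, rho):
    """Deflate a t-statistic computed as if observations were independent."""
    return t_naive / math.sqrt(ar1_variance_inflation(rho))


def two_sided_p(t):
    """Two-sided P-value of a t-statistic under a normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    t = 6.0  # hypothetical naive t-statistic
    for rho in (0.0, 0.5, 0.9):
        t_adj = adjusted_t(t, rho)
        print(rho, round(t_adj, 2), f"{two_sided_p(t_adj):.2g}")
```

With ρ = 0.5 the variance triples, so a naive t of 6 drops to about 3.5; strong autocorrelation in half-hour data can therefore turn very small P-values into much less extreme ones.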
We detected substantial heterogeneity of effects across the investigated LEZs and Federal states. However, this finding is not surprising because:
- the realisation of LEZs differed between and within states (e.g., date of introduction, covered population and area of LEZs differ (compare Table 1), and some operate together with an additional restriction of van traffic);
- the degree of representativeness of monitoring stations inside the LEZs differs across LEZs (index stations: distances from the centre/border of the LEZ differ, stations are used as background or hot spot stations and are sometimes placed in street canyons);
- the degree of representativeness of monitoring stations outside the LEZs differs across LEZs (reference stations: distances from the LEZ differ, traffic conditions differ);
- the applied measuring systems differ (continuous chemiluminescence procedure vs diffuse long-term sampling with chromatography).
The large variation of LEZ effect estimates across the LEZs should be put into perspective by considering the phenomenon of "regression to the mean" [51]. Due to this phenomenon, we expect single observations with high baseline values to show decreasing trends and those with low baseline values to show increasing trends. This holds even under the null hypothesis of no causal LEZ effect on nitrogen oxide concentrations. Regression to the mean proved to be rather pronounced in this study. Thus, the interpretation of effect estimates for single LEZs is clearly limited, and, in agreement with the state institutions that performed the measurements (see Acknowledgements), we do not report any details. Models of type 2 showed more instability and returned positive effect estimates in some situations (see Table 10). Regression model 2 included as additional variables time-dependent binary indicators for periods of school holidays, periods in which the environmental bonus was paid, and periods when trucks were not allowed to enter the area where the measurement station was located. In some LEZs these variables were highly correlated with the active LEZ periods, so unstable findings due to collinearity can be expected. Such collinearity can introduce a bias away from the null and may generate exaggerated negative or positive model coefficients even if the true effects are close to zero [74]. Log-linear models proved more sensitive to these distortions. This may indicate that assuming multiplicative effects of covariates models the data less appropriately.
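Regression to the mean can be reproduced with a small simulation under the null hypothesis: each hypothetical station has a constant true level, and its "before" and "after" values differ only by independent measurement noise. Stations selected for high baseline values then show decreases on average, and those with low baseline values show increases, although nothing changed. The level of 50 µg/m3, the spread parameters and the function names are arbitrary assumptions, not study values.

```python
import random
from statistics import mean


def simulate_null_stations(n=1000, level_sd=10.0, noise_sd=8.0, seed=1):
    """Pre/post values for stations whose true level never changes (no LEZ effect)."""
    rng = random.Random(seed)
    baseline, change = [], []
    for _ in range(n):
        level = rng.gauss(50.0, level_sd)        # stable station-specific mean
        pre = level + rng.gauss(0.0, noise_sd)   # noisy "before" observation
        post = level + rng.gauss(0.0, noise_sd)  # noisy "after" observation
        baseline.append(pre)
        change.append(post - pre)
    return baseline, change


def mean_change_by_baseline(baseline, change, fraction=0.25):
    """Mean change among the lowest and the highest baseline fraction."""
    order = sorted(range(len(baseline)), key=baseline.__getitem__)
    k = max(1, int(len(order) * fraction))
    low = mean(change[i] for i in order[:k])
    high = mean(change[i] for i in order[-k:])
    return low, high


if __name__ == "__main__":
    low, high = mean_change_by_baseline(*simulate_null_stations())
    print(f"lowest quarter: {low:+.1f}, highest quarter: {high:+.1f}")
```

The overall mean change stays near zero; only the selection by baseline value creates apparent trends, which is why single-LEZ estimates are hard to interpret while a pooled index-versus-reference contrast is less affected.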
The German Federal Environment Agency has summarized available evaluations of potential LEZ effects on NO2 concentrations [7]: a total NO2 reduction of 5% and a local traffic-related NO2 reduction of 12% may be reached with an "LEZ of pollutant group 3", under which only cars with a green sticker (Diesel vehicles of Euro 6, 5, 4 or Euro 3 with particle filter, gasoline cars with catalytic converter) are allowed to enter the LEZ [3]. This statement is based mainly on preliminary evaluations of the Berlin LEZ data by Rauterberg-Wulff and Lutz [29]. For the Frankfurt LEZ and an observation period until the end of 2011, Puls and Jäger-Ambrozewicz [75] reported effects of less than 3%, which is closer to our present findings, although their period also covers an "LEZ of pollutant group 2" after Jan 1, 2010. Only cars with a yellow sticker (Diesel vehicles of Euro 3 or 4 standard or Euro 2 with particle filter, gasoline cars with catalytic converter) were allowed to enter the Frankfurt LEZ after Jan 1, 2010 [3]. Bruckmann et al. [6] reported reductions of the annual average NO2 concentrations of up to 2% associated with the introduction of "LEZs of pollutant group 1" in North Rhine-Westphalia, and an absolute LEZ effect of about −1.2 µg/m3. In Hannover no NO2 reduction could be shown after introducing an "LEZ of pollutant group 1" [76]. All of these statements, however, were based on crude comparisons without sufficient adjustment for important covariates such as weather conditions and traffic restrictions. Only Puls and Jäger-Ambrozewicz [75] applied a more sophisticated approach: they performed a time-series analysis and fitted regression models for the Frankfurt LEZ. These models, however, were not correctly specified, as they included only the absolute values of the covariables and not their differences; thus they could not control for potential confounding effects, although this was intended by the authors.

All publications cited above reported only on individual LEZs or certain Federal states in Germany, not on the LEZ effect at the national level. Generalisations from these data are problematic because of the heterogeneous configurations of LEZs. A realistic estimate should be based on a homogeneous analytical approach covering as many LEZs and Federal states simultaneously as possible, as performed in this study. Table 11 presents an overview of other results published in the peer-reviewed literature on forecasted or measured LEZ effects on NO2 concentrations.
Our results are in good accordance with the prognosis study PAREST of FU Berlin [61,63]. An extensive description of the project is available [77]. The PAREST prognoses are comparable with our estimates at all index stations because PAREST worked with a spatial resolution of grid squares of about 1 km × 1 km and, thus, cannot estimate changes at single stations. Duyzer et al. [78] studied whether monitoring station data are representative of the population living in the area and concluded that background station data are more appropriate than hot spot traffic station data to describe the impact on citizens. We conclude that the findings of PAREST and our results on the effect at all index stations should be preferred in an evaluation (rather than the effect estimates restricted to traffic stations). PAREST predicted LEZ effects on NO2 levels assuming that only cars with green stickers are allowed to enter (LEZ of pollutant group 3). For the Berlin LEZ the authors calculated a reduction of about 1 µg/m3 to 1.3 µg/m3 in the city centre (relative: 3% to 5%), for the Munich LEZ a reduction of 1 µg/m3 in the city centre (relative: up to 5%), and for the Ruhr area a reduction of 1 µg/m3 to 1.7 µg/m3 (relative: 3% to 4%). Designating the whole Ruhr area as an LEZ of pollutant group 3 led to a predicted reduction in NO2 concentrations of 1 µg/m3 to 2 µg/m3 (relative: 3% to 6%). It needs to be taken into account that these PAREST prognoses are based on the pollutant group 3 LEZ scenario. We therefore do not expect our findings to change substantially if the LEZs are extended to cover larger areas or if stricter traffic restrictions are applied.
A very large LEZ was introduced in London as a congestion charging zone. However, only prognoses of the potential LEZ effect on nitrogen oxide concentrations are available. NOx reductions along roadways, from 3.8% in 2008 up to 7.3% in 2012, were predicted in a modelling scenario for the London LEZ in which vehicles and buses were required to meet Euro 4 standards […]. The INTARESE project [80] modelled NO2 concentration changes for both LEZs in Rome and confirmed this finding of only small additional gains from stricter traffic restrictions. The main reductions were expected to be achieved already by excluding Euro 0 cars: −2.3 µg/m3 or −3.0 µg/m3. If only Euro 4 cars were allowed to enter the LEZs, the reductions were expected to increase only slightly, to −3.0 µg/m3 or −4.1 µg/m3 [81].
The "Stockholm Trial" involved a road pricing system to improve air quality and reduce traffic congestion. The trial period ran from January 3, 2006 to July 31, 2006. Vehicles travelling into and out of the charge cordon were charged for every passage on weekdays. Annual mean contributions of road traffic emissions to total nitrogen oxide levels were estimated with and without charges according to the Stockholm Trial. NOx concentrations were lower in periods with charges, but the study showed only a small decrease: −0.23 µg/m3 (Greater Stockholm) and −0.81 µg/m3 (inner city) [82]. No multivariable modelling was attempted.
Boogaard et al. [83] analysed measurements of NO2 and NOx conducted simultaneously at eight streets, six urban background locations and four suburban background locations before (2008) and two years after implementation of an LEZ (2010) in five cities of the Netherlands (8 index stations, 4 reference stations). Index concentrations were lower in 2010 than in 2008 (NO2: −4.5 µg/m3, NOx: −6.1 µg/m3), but the differences were not statistically significant. The study performed only crude comparisons and did not apply regression techniques to adjust for covariables.
The present study can be regarded as one of the most comprehensive approaches so far to assessing LEZ effects from measurement data on nitrogen oxide concentrations. The reduction effect of LEZs of pollutant group 1 on nitrogen oxides (NO2, NO, and NOx) was estimated to be no larger than 2 µg/m3 at all index stations and at index traffic stations, i.e., no larger than about 4%. This estimate, based on measurement data, can be rated as the most thorough currently available. This result also needs to be interpreted in the light of the existing EU limit values, because LEZs are often assumed to be the most effective measure that cities can take to reduce air pollution problems in their area [84]. The respective NO2 concentration limit [24], enforced in Germany since 2010, is 40 µg/m3 (annual average). This limit is frequently exceeded: about 69% of all German traffic stations showed annual averages higher than 40 µg/m3 [25]. The 4-week averages of NO2 concentrations at index stations after LEZ introduction were found to be 55 µg/m3 (median and mean) or 82 µg/m3 (95th percentile). It follows that the estimated reduction of NO2 concentrations, in the range of 2 µg/m3, appears negligible when current concentration levels are to be lowered to the EU limit. The same judgement seems to apply at the EU level, where NO2 concentrations were reported to show a pronounced excess in many cities [26].
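A back-of-the-envelope calculation puts the 2 µg/m3 reduction next to the 40 µg/m3 limit. It treats the post-LEZ 4-week averages as if they were comparable with an annual mean, which is an approximation, and uses the post-LEZ level as the denominator of the relative effect (another assumption; a pre-LEZ level would be slightly higher). At the median and mean of 55 µg/m3, a 2 µg/m3 reduction is roughly 3.6% of the level and removes about 13% of the 15 µg/m3 exceedance; at the 95th percentile of 82 µg/m3, it removes about 5% of the 42 µg/m3 exceedance. The function name is hypothetical.

```python
def reduction_context(level, reduction, limit):
    """Relate an absolute reduction (µg/m3) to a concentration level and a limit.

    Returns the relative reduction, the exceedance of the limit, and the share
    of that exceedance the reduction would remove (None if there is none).
    """
    relative = reduction / level
    exceedance = level - limit
    share = reduction / exceedance if exceedance > 0 else None
    return relative, exceedance, share


if __name__ == "__main__":
    for label, level in (("median/mean", 55.0), ("95th percentile", 82.0)):
        rel, exc, share = reduction_context(level, reduction=2.0, limit=40.0)
        print(f"{label}: {rel:.1%} of level, exceedance {exc:.0f} µg/m3, "
              f"{share:.0%} of exceedance removed")
```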
According to the HBEFA information on real driving conditions in Germany, Austria and Switzerland for vehicles that meet Euro 5 and 6 emission standards, no noteworthy reductions of ambient NO2 and NOx concentrations are to be expected until a considerable share of vehicles with NOx after-treatment systems (Euro 5 for heavy-duty trucks and Euro 6 for passenger cars) is on the road [85].
The Handbook of Emission Factors for Road Transport (HBEFA) was originally developed on behalf of the environmental protection agencies of Germany, Switzerland and Austria. Meanwhile, further countries (Sweden, Norway, France) as well as the JRC (Joint Research Centre of the European Commission) support HBEFA. HBEFA provides emission factors, i.e., the specific emission in g/km, for all current vehicle categories (PC, LDV, HDV, buses and motorcycles), each divided into different subcategories, for a wide variety of traffic situations (http://www.hbefa.net/e/index.html).
Notably, large discrepancies in NOx and NO2 emissions from passenger cars and light-duty vehicles have been documented when low test-cycle emissions were compared with the considerably higher NOx/NO2 concentrations measured along roadsides [86,87].
In a companion analysis [32], we additionally analysed PM10 concentrations from 19 German LEZs. From about 2005 until the end of 2009, continuous half-hour measurement values as well as gravimetrically determined daily measurements of PM10 were collected. Two continuous procedures were used to measure mean PM10 concentrations per half-hour interval [38,88]:
- Absorption of β-radiation (BA): the particulate matter is deposited on a filter tape and the change in β-ray transmission is measured.
- Tapered Element Oscillating Microbalance (TEOM): an inertial balance directly measures the mass collected on an exchangeable filter cartridge by monitoring the corresponding frequency changes of a tapered element.
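The two continuous measurement principles can be written as short formulas. For β-attenuation, the Beer-Lambert law gives the deposited mass per spot area as ln(I0/I)/μ, where μ is a mass absorption coefficient; for the TEOM, the tapered element behaves like a spring-mass oscillator, so mass is proportional to 1/f² and a drop in frequency from f0 to f1 corresponds to a mass gain K0(1/f1² − 1/f0²). Dividing the collected mass by the sampled air volume gives the concentration. The sketch assumes idealised instruments; the calibration constants, units, example numbers and function names are hypothetical and do not describe the specific monitors used in the German networks.

```python
import math


def beta_attenuation_mass(i0, i, mu, spot_area):
    """Mass (µg) on a filter spot from beta-ray transmission (Beer-Lambert).

    i0, i: beta count rates through the clean and the loaded filter spot
    mu: mass absorption coefficient in cm^2/µg (instrument-specific, assumed)
    spot_area: collection spot area in cm^2
    """
    return spot_area / mu * math.log(i0 / i)


def teom_mass_gain(f0, f1, k0):
    """Mass gain (µg) from the frequency drop f0 -> f1 of a tapered element.

    Idealised spring-mass oscillator: m = k0 / f^2, with k0 a calibration
    constant in µg*Hz^2 (assumed). A loaded filter oscillates more slowly.
    """
    return k0 * (1.0 / f1 ** 2 - 1.0 / f0 ** 2)


def mass_concentration(mass_ug, flow_m3_per_h, hours):
    """Concentration in µg/m3 = collected mass / sampled air volume."""
    return mass_ug / (flow_m3_per_h * hours)


if __name__ == "__main__":
    # Hypothetical numbers chosen for round results, not instrument data.
    m = teom_mass_gain(f0=2.0, f1=1.0, k0=4.0)                  # 3.0 µg
    print(mass_concentration(m, flow_m3_per_h=1.0, hours=0.5))  # 6.0
```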
In addition, gravimetric samplers were used to measure daily averages of PM10 concentrations [49,88,89]. In total, 2,110,803 quadruplets of continuous PM10 data and 15,735 gravimetric quadruplets were identified, leading to 61,169 quadruplets based on daily PM10 averages. The analyses showed that the best LEZ effect estimates were ≤0.2 µg/m3 at all index stations, i.e., a relative PM10 reduction of ≤1%. The best estimates at all index stations near traffic (excluding urban background and industry index stations) were below 1 µg/m3 (i.e., less than 5%). Effects were smaller than predicted prior to the introduction of LEZs. Limited data (1,750 quadruplets of monthly averages) were also available to estimate the effects on soot parameters (elemental carbon, organic carbon and total carbon). The average total carbon concentration was estimated as 13 µg/m3, and LEZ effect estimates were about −0.55 µg/m3 or −4.2%. For PM2.5, only 650 quadruplets based on half-hour data and 99 quadruplets of daily concentration averages could be analysed. The mean PM2.5 concentration was 17 µg/m3. All LEZ effect estimates for PM2.5 were positive, i.e., no indication of reduced concentrations after the introduction of the LEZs was found.
Given the marginal reduction of nitrogen oxide concentrations (NO, NO2, NOx) demonstrated here, the LEZ as a regulatory action cannot be seen as an efficient measure to substantially reduce ambient nitrogen oxide exposures in cities. This result is also consistent with the small effect of LEZs on PM10 concentrations [32]. As predicted [33], long-term compliance problems with ambient NO2 concentrations should be expected even if LEZs are introduced or enlarged for the purpose of reducing NO2 in cities.
The approach can be extended to account for other variables considered relevant [45]. Such data can only be used if they are homogeneously available at all index and reference stations, both before and after the introduction of LEZs. Traffic density and car fleet properties are variables of interest that do not meet these inclusion criteria: almost no data are available in Germany to describe differences in traffic flow and car fleet properties between index and reference stations and across time. To put this into perspective, we note first that changes in traffic density and car fleet properties are themselves potentially affected by LEZs. It follows that traffic density and fleet properties should be considered as potential outcomes of LEZ introduction and not only as confounders of LEZ effects. This means that these data must not be included as covariables in the regression models, even if they were available in a form meeting the inclusion criteria. In any case, authors who described changes in traffic flow in Berlin argued against the interpretation that LEZs caused a displacement of traffic from inside the LEZ to the reference stations [31]. Second, we note that the missing information on traffic density and fleet properties can be used to argue for biases in both directions. On the one hand, traffic could be displaced from the LEZ area to the reference stations outside, so that concentrations are lowered inside but raised outside the LEZ, causing a potential overestimate of the LEZ effect. On the other hand, if the car fleet is renewed not only inside the LEZ but also outside at the reference stations, this may lead to a potential underestimate of the LEZ effect. We therefore cannot conclude on the direction of the potential bias.
The data analysed in this study are the only available longitudinal measurement data for investigating the development of nitrogen oxide concentrations before and after the introduction of LEZs in Germany. We conclude that the material used can be considered "best available data". Interpretations are limited, however, because the spatial representativeness of the measuring sites can be disputed. It is unknown whether these data can be used to reliably estimate the exposures of citizens living in the LEZs. Since this is not only a problem of German measuring networks but an issue at the European level, a research project was started to investigate the representativeness of measurement sites [78]. The authors concluded that measurements at background stations are of greater importance than data collected at hot spots (traffic stations). Further limitations of hot spot data result from the fact that citizens living in the LEZ area spend most of their time indoors and that indoor pollution levels differ from hot spot outdoor concentrations [90,91].

Conclusions
This is the first comprehensive approach to assessing the effects of LEZs on NO2, NO and NOx concentrations using measurement data at the Federal level in Germany. Reductions due to introducing LEZs of pollutant group 1 were estimated to be at most 2 µg/m3 (or 4%). The 4-week averages of NO2 concentrations at index stations after LEZ introduction were found to be 55 µg/m3 (median and mean) or 82 µg/m3 (95th percentile). The NO2 concentration limit [24] enforced in Germany since 2010 is 40 µg/m3 (annual average). Considering the regulatory and enforcement effort required to introduce and operate LEZs in cities, the demonstrated impact of LEZs on ambient NO2 concentrations, at most 4% in the first phase, is very small.

Supporting Information
File S1. Contains Tables S1-S4 and Figures S1-S19. Table S1: Detailed results of NO2 quadruplet analyses by linear (additive) and log-linear (multiplicative) regression models. Table S2: NO: Log-linear (multiplicative) model 1 evaluating the quadruplets of pooled continuous and diffuse sampler NO measurements. Regression coefficient, robust standard errors of coefficient, t-statistic, two-sided P-value, and 95% confidence interval of coefficient. The relative LEZ effect estimate is given by the coefficient E (<1: concentration is lowered by LEZ). Table S3: NOx: Quadruplets of pooled continuous and diffuse sampler NOx measurements: index stations (Ind), reference stations (Ref) before (pre) and after (post) introduction of LEZ. Ind.diff and Ref.diff denote differences between index measurements and between reference measurements (negative post-pre differences indicate lower values after introduction of LEZ). Table S4: NOx: Linear (additive) model 1 evaluating the quadruplets of pooled continuous and diffuse sampler NOx measurements. Regression coefficient, robust standard errors of coefficient, t-statistic, two-sided P-value, and 95% confidence interval of coefficient. The absolute LEZ effect estimate is given by the coefficient E in µg/m3 (<0: concentration is lowered by LEZ). Table S5: NOx: Log-linear (multiplicative) model 1 evaluating the quadruplets of pooled continuous and diffuse sampler NOx measurements. Regression […] Gathe. Two reference stations outside the low emission zone: 3) DENW029 Hattingen-Blankenstein (not included in the figure since located approx. 13 km north of the low emission zone); 4) DENW080 Solingen-Wald (not included in the figure since located approx. 5 km southwest of the low emission zone). (DOCX)