Estimating the Counterfactual Impact of Conservation Programs on Land Cover Outcomes: The Role of Matching and Panel Regression Techniques

Deforestation and conversion of native habitats continues to be the leading driver of biodiversity and ecosystem service loss. A number of conservation policies and programs are implemented—from protected areas to payments for ecosystem services (PES)—to deter these losses. Currently, empirical evidence on whether these approaches stop or slow land cover change is lacking, but there is increasing interest in conducting rigorous, counterfactual impact evaluations, especially for many new conservation approaches, such as PES and REDD, which emphasize additionality. In addition, several new, globally available and free high-resolution remote sensing datasets have increased the ease of carrying out an impact evaluation on land cover change outcomes. While the number of conservation evaluations utilizing ‘matching’ to construct a valid control group is increasing, the majority of these studies use simple differences in means or linear cross-sectional regression to estimate the impact of the conservation program using this matched sample, with relatively few utilizing fixed effects panel methods—an alternative estimation method that relies on temporal variation in the data. In this paper we compare the advantages and limitations of (1) matching to construct the control group combined with differences in means and cross-sectional regression, which control for observable forms of bias in program evaluation, to (2) fixed effects panel methods, which control for observable and time-invariant unobservable forms of bias, with and without matching to create the control group. We then use these four approaches to estimate forest cover outcomes for two conservation programs: a PES program in Northeastern Ecuador and strict protected areas in European Russia. In the Russia case we find statistically significant differences across estimators—due to the presence of unobservable bias—that lead to differences in conclusions about effectiveness. The Ecuador case illustrates that if time-invariant unobservables are not present, matching combined with differences in means or cross-sectional regression leads to similar estimates of program effectiveness as matching combined with fixed effects panel regression. These results highlight the importance of considering observable and unobservable forms of bias and the methodological assumptions across estimators when designing an impact evaluation of conservation programs.


Introduction
Land cover change continues to be a leading cause of biodiversity and ecosystem service loss. There is relatively weak empirical evidence on how well conservation policies and programssuch as protected areas or payments for ecosystem services (PES)-slow or halt these land cover changes. However, the conservation field has started to emphasize the need for more rigorous evaluation of conservation approaches [1,2]. Evaluation is fairly common in the medical sciences, and increasingly in rural development and agricultural programs, and the attention in conservation is especially relevant for the new wave of conditional, incentive-based approaches that fall under the term PES, as well as the UN's proposed Reducing Emissions from Deforestation and Degradation (REDD) program, which both emphasize additionality of land cover outcomes. While there are alternative approaches to evaluation, the use of quasi-experimental designs that establish a counterfactual outcome in order to evaluate effectiveness of a program or policy have arguably become some of the most promoted methods [1,3,4].
Quasi-experimental approaches are designed to correct for the fact that conservation programs are not randomly allocated across the landscape. This non-random allocation of where a conservation program is targeted and who enrolls in a conservation program influences the estimate of changes in land cover outcomes; this means that conventional methods of comparing changes, such as simple linear cross-sectional regression or differences in means tests, can be biased [5]. A clear example of this non-random targeting is illustrated by the placement of protected areas: most protected areas are located in places unsuitable for other economic activities, so much so that they are often given the nickname 'rock and ice' [6]. This remoteness reduces the impact that most protected areas have on preventing deforestation because they are much less likely to have forest cover change in the first place; of course, this does not account for any future benefits that protecting that forest today might have. One of the first studies to highlight the magnitude of the bias that arises when the non-random placement of conservation approaches is ignored was a study of the impact of Costa Rica's protected areas on deforestation; the researchers found that cross-sectional regressions that ignore the non-random placement of protected areas overestimated the impact of parks by as much as 65% [7].
To counter these biases, quasi-experimental methods rely on the construction of a valid 'control' group to estimate the impact of the conservation program or policy (the 'treatment'), where the control group is made up of observations that did not receive the conservation program. While some quasi-experimental designs utilize program rules to create the control group -for example, regression discontinuity designs and instrumental variables-the more common approaches in the conservation literature are matching and fixed effects panel methods. Matching is a statistical approach that constructs a counterfactual group based on observable variables thought to influence receiving the treatment and the outcome of interest [3]. The researcher constructs a control group that is as similar as possible to the treatment group-similarity is based on what data can be collected and is tested by comparing average values of covariates. With the matched sample of treatment and control observations, the researcher can then estimate the impact of the program using a variety of estimators, but the most common methods include a simple t-test on difference in means, or cross-sectional regression. Since the evaluation of protected area effectiveness in Costa Rica [7], the combination of matching with difference in means or cross-sectional regression has been used in a number of conservation evaluations of protected area effectiveness , and less extensively, in evaluations of PES, decentralization, land tenure, land zoning, and integrated conservation and development programs on land cover outcomes [28][29][30][31][32][33][34][35][36][37][38]. Reviews of conservation evaluations that use matching can be found in [39][40][41].
A second quasi-experimental approach to impact evaluation is to use fixed effects panel methods. This approach assumes that where a conservation program is targeted or who enrolls in it, as well as the outcome of interest, is based on observable and unobservable variables [3]. Panel data-data with multiple years of observation on the same cross-sectional units-must be available before and after the conservation program is implemented for the method to be used. By collecting data over time on the same observation, any time-invariant unobservable covariate is controlled through the use of fixed effects for each cross-sectional unit; the ability to control for limited forms of unobservable bias is the main advantage over cross-sectional methods. Fixed effects panel methods are a generalization of the DID method in program evaluation, where the latter use aggregate data [42]. In program evaluation, this method is sometimes referred to as a 'before-after-control-intervention' design. Fixed effects estimation, or DID, can be used with the full sample of treatment and control observations, or after treatment observations are 'matched' to a more similar control group.
In addition to controlling for time-invariant unobservables, fixed effects can be advantageous to estimation if data on observable time-invariant covariates are difficult to obtain. For example, it can be difficult to find local datasets on soil quality or rainfall that are of the same resolution as land cover data-even though coarser globally available datasets may exit. If these characteristics influence the likelihood of a conservation program or land cover change outcome, then in a fixed effects panel model they would be implicitly controlled for through the fixed effects for each cross-sectional unit. Parcel fixed effects capture all time-invariant parcel factors-observed and unobserved-that affect land cover change. In a cross-sectional study, all unobserved time-invariant parcel variables would be omitted from estimation. Similarly, if data correspond to a specific landowner, the fixed effects would control for any time-invariant household motivations to participate in the conservation program or to deforest the land. If these unobservable variables are not important (i.e. correlated with conservation variables), then cross-sectional and fixed effects methods will provide similar estimates of the impact of conservation programs on land cover outcomes. But, if they are important, then cross-sectional methods will lead to biased results, and fixed effects will move us closer to the 'true' estimate of impact. A comparison of experimental with quasi-experimental impact estimates for a water conservation program found that combining matching with fixed effects panel regression comes closest to replicating experimental results [43], suggesting that this might be the most robust way to estimate the impact of a conservation program when randomization is not possible. Yet, only a handful of studies that we know of use matching combined with DID or fixed effects panel methods to estimate the impacts of conservation programs on land cover outcomes [27,28,30,31,33,34].
The purpose of this paper is to highlight the potential of using fixed effects panel methods to control for time-invariant unobservables in impact evaluations of conservation programs on land cover change outcomes. There are a number of good overview papers on using quasiexperimental impact evaluation designs and the use of matching to construct a valid control group in the conservation field [1,3,[44][45][46], but none of these emphasize the use of fixed effects methods or compare and contrast panel methods with cross-sectional analysis. Our focus on how to use fixed effects to evaluate conservation programs is particularly important given the development of new global panel datasets on forest loss; for example, the development of a global database that tracks forest status on 30-meter by 30-meter pixels annually beginning in 2000 [47]. These data would provide access to the outcome variable of interest in many conservation evaluations-land cover change. Paired with the fact that many places where we want to do impact evaluations are data constrained, the use of fixed effects methods could be a relatively easy and more robust method of conducting an evaluation of the impact of conservation policies and programs on land cover change outcomes versus cross-sectional methods, since they can control for observable and time-invariant unobservable sources of bias.
After briefly reviewing quasi-experimental impact evaluation methods in Section 2, we compare impact estimates using matching to construct a control group combined with difference in means and cross-sectional regression to fixed effects regression with and without matching to create the control group for two conservation programs: a national PES program in Northeastern Ecuador and strict protected areas in European Russia. We discuss whether these methods lead to differences in statistical significance or the magnitude of the conservation outcome -conclusions that affect the policy implications of these conservation programs on land cover change. Our overall goal is to provide additional guidance on how the choice of quasi-experimental design can affect estimated program outcomes and suggest when attention to observable versus unobservable variables is important in estimating treatment effects.

Quasi-Experimental Impact Evaluation Methods Matching
A major emphasis in the conservation evaluation literature has been the non-random placement of many conservation interventions, which lead to selection bias in estimation [44]. As a result, the use of matching to construct a control group that is 'as similar as possible' to the observations that receive the conservation program has been emphasized. Using observable data, treatment observations are 'matched' to their most similar control observations and then this new sample is used to estimate the effect of the conservation policy or program on the land cover outcome [5,44,[48][49][50]. The process of finding the most similar control observations and then estimating the treatment effect with this new sample reduces the bias present in estimating the treatment effect using all the possible control observations, many of which are very different from the places or households that receive conservation programs. There are a number of matching algorithms and estimators available [51][52][53]. Some of the more popular matching algorithms include propensity score matching and covariate matching. In propensity score matching, the researcher estimates a probability of receiving the treatment for each observation, and then matches units in the treatment group to those outside the program that have the closest propensity score [54]. In covariate matching, treatment observations are matched to those that did not receive the program based on individual covariates using a multivariate distance metric [53]. Many matching algorithms can be implemented using pre-programmed codes in Stata or R. The new, matched, sample of treatment and control observations can then be used to calculate the treatment effect with a variety of estimators. The simplest method would be to take the average of the differences in means across the matched treatment and control observations.

Combining Matching with Cross-Sectional Regression
It is highly recommended that matching to produce the control group be combined with postmatching cross-sectional regression to estimate the treatment effect [5,50,54]. This is because, in many cases, the matching algorithm will not have balanced all of the observable covariates across treatment and control observations, and so additional balancing is necessary. Some estimators have built-in bias-adjustment options that automatically implement a post-matching cross-sectional adjustment [53].
The major assumption made when using a difference in means test or cross-sectional regression after matching the sample is that there are no unobserved characteristics associated both with potential outcomes and treatment that could bias the estimation; this is known as the unconfoundedness assumption [5]. While there are options to test the sensitivity of results from matching estimators to hidden bias-through tests such as Rosenbuam bounds-these tests cannot explicitly tell whether hidden bias is actually present or not [54].

Fixed Effects Panel Regression
As applied to the land-cover change setting, fixed effects panel regression uses repeated temporal observations of land cover change on the same plot or parcel of land. Rather than relying on the statistical construction of a control group as in matching, panel techniques use the temporal dynamics of the data-observing the treatment and control observations before and after treatment-along with cross-sectional variation in treatment status across plots to construct the counterfactual outcomes. The key dependent variable indicates land cover change (LCC) for plot i in time t, denoted LCC it . For example, suppose plot i was deforested in time t = 4 over a 5-year panel. Then, LCC i4 = 1 while LCC it = 0 for t = 1, 2, 3, and the plot is dropped in t = 5 since the parcel is no longer forested (assuming binary measure of forest/non-forest). A generic fixed effect panel regression equation is constructed as follows: Where X it denotes time-varying covariates of land cover change for plot i (e.g. distance to forest edge), G i denotes time-invariant covariates of land cover change (e.g. slope), C it indicates the time-varying conservation status of plot i, Y t indicates time fixed effects, and a i + ε it is the composite unobservable. In panel data analyses, the model unobservable is decomposed into a time-invariant component (a i ) and a time-varying component (ε it ). The primary goal of estimation is to obtain a consistent estimate of the treatment effect of conservation, β 3 . If all observable covariates (X it , G i , C it ) are independent of the composite unobservable (a i + ε it ), then cross-sectional regression will generate a consistent estimate of β 3 . Omitted variable bias arises most directly if conservation status C it is correlated with (a i + ε it ). For example, suppose conservation is targeted at forest types with low timber value but data do not exist on forest type. Then since forest types are likely time-invariant over reasonably short panels, and since forest types of low timber value are less likely to get harvested than types of high timber value, C it would be correlated with a i and cross-sectional estimation of β 3 would be biased. The conservation status of plot i is confounded by the unobserved type of forest stand that exists on plot i. If C it is not correlated with (a i + ε it ), then omitted variable bias could still arise indirectly if either of the observable covariates (X it , G i ) are correlated with both the composite unobservable (a i + ε it ) and with the primary conservation treatment variable C it .
Estimating the time-invariant component (a i ) in Eq 1 as a fixed effect aims to reduce the degree of omitted variable bias described in the previous paragraph by exploiting temporal variation in conservation status C it within plots of land. In particular, fixed effects estimation breaks any correlation between observable time-varying variables (X it , C it ) and the time-invariant unobservable (a i ) by transforming the data in two steps. First, for each plot i, average Eq (1) over time: where LCC i ¼ T À1 S T t¼1 LCC it , and so on for the other variables. Notice that the time-average of G i and a i remains fixed since these variables are time-invariant. Second, subtracting Eq (2) from Eq (1) yields the classic "within" estimator in panel regression: Notice two main components of Eq (3). First, the parameters on the time-varying covariates (β 1 , β 3 , β 4 ) are preserved in the differencing, and so least-squares regression of the "withintransformed" data in Eq (3) enables estimation of the primary treatment effect β 3 . Second, the time-invariant observable (G i ) and the time-invariant unobservable (a i ) are differenced out of Eq (3), and so consistent estimation on the within-transformed data in Eq (3) does not require the researcher to assume any independence between the time-varying covariates (X it , C it ) and the time-invariant unobservable (a i ). Identical estimates are produced by estimating Eq (1) with a dummy variable for each plot, though the within estimator in Eq (3) is preferred as a means of avoiding estimation of potentially many thousands of dummy variable parameters. Time fixed effects, Y t , on the other hand, are estimated as dummy variable parameters. Their inclusion controls for unobservables that differ across time but not observations, for example, national policy changes or global commodity prices.
For fixed effects to be a valid estimation strategy, parallel paths, or trends, must exist prior to treatment [42]. While this assumption cannot be directly tested, a graph of the temporal trends in outcomes for treatment and control observations before and after treatment can provide some visual assurances: we want to observe land cover outcomes on similar-or paralleltrajectories across treatment and control groups prior to the program. A more robust check is to statistically test whether there is a difference in pre-treatment outcomes by introducing an interaction term between C it and Y t . If the coefficient is not statistically significant in years prior to treatment then the parallel paths condition is more likely to hold.

Combining Matching with Fixed Effects Panel Regression
While fixed effects panel methods can be used on the full sample of treatment and control observations, using temporal variation to construct the control group, it can also be combined with matching. In this case matching is first used to "pre-process" the data to find the best set, or most similar set, of treatment and control observations before fixed effects regression is used to estimate the treatment effect. Estimating fixed effects regression on a matched sample can reduce omitted variables bias in the following sense. Suppose an element of the time-varying covariate vector X it is correlated with both a plot's conservation status C it and with the timevarying unobservable ε it . Then the regression estimate of the treatment effect β 3 will be biased because C it will be indirectly correlated with ε it through X it . Fixed effects estimation will not help reduce this bias since none of these time varying components (X it , C it , ε it ) are differenced out. A matched sample helps reduce bias in this situation because matching reduces correlation between the observables C it and X it by making treated plots (C it = 1) similar to control plots (C it = 0) in their observed values of X it . Thus, fixed effects regression on the matched sample is less sensitive to correct specification of X it in the model. Even with a dataset where there are no time-varying observables (X it ) other than conservation treatment (C it ), matching can still improve estimates from fixed effects regression. One reason is that the linear fixed effects model assumes that all landowners have a common response to the conservation treatment (β 3 )-and this assumption is more plausible when treated and untreated landowners have similar levels of fixed characteristics rather than very different levels [43]. Table 1 summarizes the two conservation programs we analyze in our comparison of the impact evaluation methods discussed above: (1) PES in Northeastern Ecuador and (2) strict protected areas in European Russia. These two programs cover the more common conservation approaches for which impact evaluation is being used and illustrate these methods across both continuous (Ecuador-PES example) and binary measures (Russia-Protected Areas example) of forest cover outcomes. No specific permissions were required to conduct these analyses and all data were acquired from publicly available data sources.

Conservation Programs
Ecuador-PES. In Ecuador, the study region is about 600 sq km around the northwestern border of Cuyabeno Faunal Wildlife Reserve, a protected area in the northeastern Ecuadorian Amazon. The Cuyabeno Reserve was officially created in 1979 and deforestation by smallholders living adjacent to the boundaries of the reserve has been a continuous conservation threat. Tenure insecurity is commonly cited as a driver of deforestation in the region [55]. In 2009, a nationally sponsored land titling effort resulted in the acquisition of private individual titles for smallholders that were living adjacent to the reserve; in all cases, these households occupied their land since the late 1970s or early 1980s but had not received formal title. In 2008, Ecuador implemented a national PES program, known as Socio Bosque [56]. The program targets forested lands that include a combination of multiple ecosystem service benefits, risk of deforestation, and populations with high degrees of social marginality. To be eligible to participate, a property must have a formal land title, meaning that smallholders around Cuyabeño Reserve became eligible for the program after they received their formal land titles in 2009; 63 households enrolled in the program in 2010.
Since PES is a household-level program, the ideal unit of analysis is the household parcel [44]; spatial boundaries of household parcels are available for this study area from a cadastral survey conducted by the Government of Ecuador. If this information were not available then it would be difficult to rigorously evaluate the impact without collecting property boundaries using GPS, even though some evaluations of PES have used pixels, or grid cells, for evaluation  [36], these do not represent the decision-making unit. In addition to parcel-level information for households that enrolled in PES, we also know which households did not enroll, providing us with a group of control observations that are likely to share at least some characteristics with the treatment observations given the small study area. To further refine this control group, we limit the pool of potential controls to titled smallholder parcels that are within the same precooperatives as the households that enrolled in PES. Pre-cooperatives are self-organized groups of smallholders that precede the land titling campaign, and represent a spatially contiguous group of households that migrated to the area in a similar time period. This gives us 450 households as potential controls. Defining control observations as parcels that are near enrolled households is similar to the strategy used in [31], but a stricter definition of PES controls would be to use households that applied but were rejected from the PES program, or households that applied in later years [29,30]; neither of these options exist for these data.
We measure the impact of this PES program on reduced deforestation rates. We use a globally available forest/non-forest product with resolution of 30 meter by 30 meter to measure deforestation [46]. While this dataset provides pixel-level information on forest cover, we aggregate it to household parcels, providing a continuous measure of forest cover loss per parcel between 2004 and 2013. While this land cover product provides data as early as 2001, we excluded these early years because our test of parallel trends detected differences in 2003 across our two groups. Thus, we can estimate a treatment effect for the impact of PES on reducing deforestation between 2011-2013, controlling for pre-treatment deforestation rates in 2004-2010.
PES is a voluntary program, and household-level characteristics, such as age, income, and prior conservation motivations, as well as parcel-level characteristics, such as size, agricultural suitability, and accessibility, are expected to influence a smallholder's decision to participate in the program [57,58]. These characteristics would also influence the probability of deforestation with or without the program. Past evaluations of PES programs on land cover change have controlled for parcel size, accessibility, agricultural productivity, baseline deforestation rates, forest type, past participation in forestry programs, poverty level, and tenure [29][30][31]34,36,38]. In our study area there are a number of available datasets measuring parcel-level characteristics related to accessibility and parcel size, but we do not have household-level information on past participation in conservation programs or poverty. Additionally, we do not have data directly related to agricultural suitability. Thus, the observable covariates that we can control for include parcel-level information on pre-treatment deforestation rates (between the years 2004-2010), parcel size, and distance of the parcel to roads, population centers, navigable rivers, and oil wells, which control for remoteness and opportunity costs in this region. Our potential unobservable variables are related to household-level characteristics that would lead a household to enroll in the program or to deforest, such as environmental motivations, and parcellevel agricultural productivity. However, since our unit of analysis corresponds with the decision-making unit, if these variables are time-invariant unobservables, they will be controlled for with the fixed effects methods.
Russia-Protected Areas. In Russia, our analysis focuses on the effect of four 'strict' protected areas-known as zapovedniks in Russia-in temperate European Russia. This is a subset of the data used to compare the effectiveness of different types of protected areas in Russia before and after the collapse of the Soviet Union [27]. Strict protected areas are equivalent to an IUCN designation of Category I protected areas and logging and other extractive activities are prohibited [59]. Russia established a number of new protected areas following the collapse of the Soviet Union in 1991 [60], and the four parks in this analysis were established between the years 1992 and 1994. Their total area is 515 km 2 .
Within a protected area, a remote sensing pixel is a common unit of analysis [44]. Since each protected area contains thousands of pixels, referred to as plots in the remainder of the paper, we must sample to get a manageable number of observations. We sample 1% of all plots within parks that were forested in our baseline year-1990. For control observations we randomly sample forested plots outside of protection. In Russia, all forested land is publically owned, so when sampling forest outside of parks the decision-making unit should also be the Russian Forest Service, although they do lease this land to private timber concessions. We sample four-times the number of treatment plots from areas outside of protected areas to generate the potential control group.
The outcome of interest in this case is forest disturbance for harvesting timber [61]-which varies from deforestation in the Ecuador case, since the land will eventually revert back to forest. Measures of forest disturbance come from a primary Landsat classification of forest cover change over 5-year increments from 1985 to 2010 [62]; we exclude the 1985-1990 period since forest disturbance follows different temporal trends during this period, likely due to political unrest. This primary analysis provides 30-meter resolution data on forest cover change with average accuracies greater than 90%. Forest disturbance is measured as a binary outcome, and so for each observation a value of "0" is recorded if there is no change within a 5-year period or a "1" if there is a change; a plot is removed from the dataset once change occurs so that forest disturbed on this plot-the outcome variable-is not double counted. Given the year of designation of the protected areas, pre-treatment forest disturbance is defined as the 1990-1995 time period and forest disturbance that occurs anytime between 1995 and 2010 is our outcome variable.
The majority of published evaluations of protected area effectiveness focus on tropical regions-where land cover change is driven by agriculture. In these studies, data is collected on locational and biophysical characteristics of the pixel or grid cell, such as slope, elevation, accessibility, agricultural suitability, and forest type [7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22][23]. Forest disturbance in European Russia is increasingly correlated with profit-maximization behaviors that factor in transportation costs and opportunity costs of the land for timber harvesting [63]. Thus, for both protected area designation and forest disturbance in Russia we control for biophysical characteristics of the plot and its location. The observable covariates include elevation, slope, and distance of the plot to the forest edge, closest town, Moscow, and closest road. There are important differences between the type of observable data that can be controlled for with binary and continuous outcome measures of land cover change worth highlighting here. With continuous measures a baseline measure of land cover change can be controlled for (e.g., in Ecuador, baseline deforestation rates). With binary data baseline rates cannot be included, but measures such as distance to forest edge, which would measure accessibility, can be included (e.g., Russian covariates).
Since the Forest Service manages forested plots in Russia, potential unobservable covariates are less likely to be related to differences in decision-making units, unless "controls" are in areas of private timber concessions. The more likely source of omitted time-invariant covariates, however, is if there are unobservable biophysical characteristics that influence protected area status and or forest disturbance. For example, soil quality, rainfall or type of tree species might influence conservation decisions if protection is targeted at specific ecological species, or might influence forest disturbance rates if some species are preferred for timber. In Russia, these datasets are not available and globally available datasets are typically at 1-square kilometer resolution.

Empirical Methods
As discussed above, matching is often used to construct a valid control group in program evaluation, and is then combined with several post-matching estimators to get the treatment effect.
Fixed effects panel methods can be used with or without matching to construct the control group. Effectively, this gives six combinations across the sample selection process and the estimation process for program evaluation (Fig 1). However, a number of sources have illustrated why 'no matching' combined with simple differences in means or cross-sectional regression are biased in conservation program evaluation [7,43,45]. The more common methods seen today in the conservation evaluation literature include combining 'matching' with either simple differences in means or cross-sectional regression. Thus, to compare these more common methods to fixed effects panel methods, we estimate treatment effects for the two conservation programs described above using the following approaches: (1) matching plus differences in means, (2) matching plus cross-sectional regression, (3) 'no matching' plus fixed effects panel regression, and (4) matching plus fixed effects panel regression.
Our analysis of the Ecuador-PES program uses a limited continuous variable which is never negative and includes a non-trivial 37% of parcels with a zero deforestation rate, while our analysis of Russian parks models a binary dependent variable for each plot which equals one if the plot is deforested and zero otherwise. Alternative non-linear maximum likelihood estimators for such limited dependent variables include Tobit and Probit models. We choose to use linear ordinary least squares regression rather than non-linear estimation approaches for the following reason: our primary interest is in obtaining a consistent estimate of the treatment effect of conservation-known as a marginal or partial effect-rather than predicting the level of deforestation. Thus, under similar identifying assumptions, linear ordinary least squares typically generates similar marginal effects as non-linear methods and has the advantage of being able to include fixed effects in a far simpler fashion-and with fewer assumptions-than nonlinear methods [42]. Fixed effects cannot typically be used in most non-linear methods due to the incidental parameters problem [64].
Matching. We match treatment to control observations using propensity score matching [54]. The propensity score is a measure of the probability of an observation receiving the conservation program. Since both conservation programs have binary treatments, the probability of receiving the program is estimated with a Logit model specified on the set of observable pretreatment covariates thought to influence treatment assignment and the outcome of interest identified in Table 1 (regression output in Table A in S1 and S2 Tables). Using the estimated propensity scores, each treatment observation is then paired with the 'best' control observation based on one-to-one nearest neighbor matching without replacement following [65]. To ensure 'good' matches the distance between propensity scores is limited by using a caliper size of a quarter of the standard deviation of the estimated propensity score as recommended by [54]. In the case of propensity score matching, the probability of receiving treatment varies between '0' and '1', and the caliper limits the distance that the algorithm searches to find an appropriate control observation for the treatment unit (e.g., a caliper of 0.2 would limit the search to twotenths). Our choice of matching algorithm is motivated by its popularity in the literature and the ease at which the matched sample can be exported for use in cross-sectional or fixed effects panel regression. While alternative matching algorithms might yield a slightly different sample of treatment and control observations, post-matching estimations of treatment effects with this sample would yield similar relative differences across findings to what we present below since it is the presence of time-invariant unobservables that potentially lead to bias in cross-sectional estimation.
With our matched sample, treatment effects are estimated under three estimators. The difference in means computes the average difference in outcomes between each treatment and control observation. For cross-sectional estimation, the matched sample is used in a linear ordinary least squares regression, controlling for the same observable covariates included in the match. The fixed effects panel estimator is described below.
Fixed Effects Panel Regression. First, we estimate a fixed effects panel regression for each conservation program using all treatment and control observations, i.e., 'no matching'. We could include the full list of observable covariates in Table 1, but since most observable covariates are time-invariant-with the exception of distance to forest edge in Russia-they fall out of the differencing equation, that is, they are not explicitly estimated in the regression output. However, since we have land cover change data before and after the conservation programs, β 3 in Eq 1 can be estimated.
Second, we use matching to pre-process the data as described above, and then use this trimmed sample to estimate β 3 in Eq 1. When matching the Ecuador-PES data, baseline deforestation is defined as years 2004-2006; these years are then omitted from the fixed effects regression.
Standard Errors. For cross-sectional and fixed effects regressions we use cluster robust standard errors to control for spatial autocorrelation. Clustering standard errors relaxes the assumption of no correlation across observations within the spatial unit used for clustering. For Ecuador we cluster at the informal administrative unit known as pre-cooperatives. In Russia we cluster at a political unit similar to counties in the United States. In addition to being robust against any form of spatial correlation within clusters, cluster robust standard errors also control for general forms of serial correlation and heteroskedasticity in panel regressions [66].
Similarities and Differences across Estimators. These methods ensure two things that aid in comparison across estimators. First, across three of the estimators, propensity score matching is used for sample selection, ensuring that the same treatment and control observations are compared. The only difference is that post-matching, one estimator controls for limited forms of omitted variables bias (i.e., fixed effects) and two do not (i.e., t-tests and crosssectional regression). While fixed effects without matching necessarily uses all of the treatment and control observations, the analysis of these data is exactly the same as that in fixed effects regression following matching, allowing us to compare the effect of using matching to pre-process the data on estimated impacts. Secondly, we are able to estimate similar standard errorscluster robust standard errors-for each method of analysis except for matching with differences in means. This ensures that our standard errors are robust to many known problems associated with land cover panel data (i.e., heteroskedasticity, spatial correlation within clusters).
Goodness of Fit Checks. We check whether there is improvement in covariate distribution after matching treatment to control observations and whether there is overlap of propensity score values for treatment and control observations. For the former, a t-test can be used, but differences can be skewed by sample size. The normalized difference in means is preferable over the t-statistic when there are large differences in sample size [5]. We report both since there is no statistical test of significance for normalized difference in means; although a general rule of thumb is that differences larger than 0.25 is indicative of sensitivity of estimates to functional form in linear regression [5]. Overlap of propensity scores can be plotted using a histogram.
For fixed effects regression we check the parallel paths assumption. We do this by graphing trends and by introducing an interaction term to Eq 1 and estimating whether there is any difference in forest outcomes prior to treatment.

Ecuador-PES
Households that enrolled in Ecuador's national PES program differ across observable characteristics from households that did not enroll (Table 2). On average, they have larger parcels of land, are farther from roads and towns, and have lower rates of deforestation before enrolling in the program. Propensity score matching does significantly improve the covariate balance between these two groups as indicated by the t-tests and normalized differences in means following matching. There is good overlap in estimated propensity score values between Ecuador-PES and Ecuador-non-PES households despite these differences in covariates (Fig 2). The graph of parallel trends using the full sample also illustrates the different pre-treatment deforestation rates of these two groups; despite a lower baseline deforestation rate, the temporal patterns of both groups appear to trend in similar patterns before treatment suggesting that fixed effects estimation is appropriate to use. When statistically tested using fixed effects regression, we find no differences in pre-treatment forest outcomes across these two groups between 2004 and 2010.
Using the matched sample, differences in means and cross-sectional regression both estimate a treatment effect of -0.4. This is statistically significant at the 99% level and can be interpreted as a four-tenths of a percentage point reduction in the average annual deforestation rate between 2011-2013 for parcels that enrolled in PES compared to parcels that did not enroll (Table 3). To put this in perspective, post-treatment deforestation rates on all parcels not enrolled in the PES program were an average of 0.7% per year (Table 2). For no matching and fixed effects regression, the estimated treatment effect is slightly smaller, at -0.3, but still statistically significant. Finally, matching combined with fixed effects regression yields a treatment effect of -0.4, statistically significant at the 99% level. Full regression output can be found in Tables B-D in S1 Table. We plot the coefficients in Table 3 with their 95% confidence intervals to detect the level of overlap (Fig 3). In all cases, the confidence intervals of the estimated treatment effects overlap substantially.
To better understand the conservation implications across these estimators, the relative reduction in forest cover change at the average estimated treatment effect is presented in Fig 3. This relative reduction in forest cover is calculated by dividing the estimated treatment effects in Table 3 by the rate of forest cover change in the matched control observations after the conservation program began (i.e., post-treatment), or in the case of 'no matching' combined with fixed effects regression, all control observations. In the Ecuador-PES program, the matched control observations have an annual deforestation rate of 0.55% in 2011-2013; if you divide the estimated treatment effect from propensity score matching plus difference in means (-0.40) by 0.55 you get 0.72. Thus, the average annual impact of PES is an approximately 72% reduction in the deforestation rate when estimated using matching with difference in means; this represents a decrease in average annual deforestation on a parcel from 0.55 to 0.16. For fixed effects regression without matching, -0.31 is divided by post-treatment deforestation in the full sample, which is 0.73, resulting in a relative effect of 42%. Table 4 shows summary statistics for the sample of plots within and outside of strict protected areas in European Russia. Protected plots appear more remote than unprotected plots-they are farther from the forest edge and closest road. However, they are closer to towns and the Standard deviations in parentheses. T-tests test for differences in means assuming unequal variances. "Before matching" uses the full sample; "After matching" is based on 1-to-1 propensity score matching without replacement and limiting the maximum distance between matches with a caliper.

Russia-Protected Areas
Normalized differences in means are calculated as recommended by [5]. There is no test for statistical significance associated with normalized differences in means but a rule of thumb is that sizes larger than 0.25 can bias simple ordinary least squares regression. doi:10.1371/journal.pone.0141380.t002 capital city, Moscow. The differences in means tests before matching find very large differences, and even after matching statistically significant differences remain. However, calculations of normalized differences in means after matching find that the differences are less than 0.1; well below the 0.25 rule of thumb. There is also overlap in propensity score values (Fig 2).
In the graph of parallel trends there does appear to be similar trends in the five years prior to protected area designation (i.e., 1990-1995) (Fig 2). A formal test using regression analysis finds no statistical difference in forest disturbance in the 1990-1995 time period. There also appear to be similar trends after protected area status indicating that the treatment effect, if any, is not likely to be large. It is important to point out the relatively small change in forest disturbance over the five-year periods in this sample-rates of change are less than one-tenth of a percentage point. The estimated treatment effects for Russia range between -0.01 and -0.03 (Table 3). The interpretation is that protected areas reduce the probability that a forested plot is disturbed by as much as -0.03 percentage points over each of the three five-year periods between 1995 (the Propensity Score Values Overlap in propensity scores is calculated in Stata 13; it shows the distribution of propensity scores across treatment and control observations. Parallel trends are graphed for both treatment and control groups before matching. A test for statistical differences in trends was estimated using fixed effects regression. After matching, trends become more similar, and are not shown here. year after creation of the protected areas) and 2010 (the last year of data). For context, the average five-year probability that a potential control plot was disturbed after 1995 was 0.08% (Table 4). Whether or not the effect of protected areas on forest disturbance is statistically significant or not varies across estimators. When matching with differences in means or cross-sectional regression is estimated, the treatment effect is statistically significant at the 95% level or higher and closer to -0.03 percentage points. With no matching plus fixed effects the estimated treatment effect is smaller, at about -0.02 percentage points, as is the statistical significance, at the 90% level. When matching is used to pre-process the data before fixed effects regression, the treatment effect is not statistically significant. Full regression output for this case can be found in Tables B-D in S2 Table. In Fig 3 we plot the 95% confidence intervals for these treatment effects; again, there is considerable overlap across each estimator.
For calculating the relative effect of Russia-Protected Areas on forest disturbance we divide the average treatment effect (Table 3) by the average forest disturbance in the control sample. For the three estimators that use matching, average forest disturbance in the matched control plots is 0.07 (not reported in the table). Thus, for matching plus difference in means we divide 0.03 by 0.07 to get a relative reduction in forest disturbance of 36% (Fig 3). When fixed effects regression without matching is used, average forest disturbance for all areas outside of parks is 0.08 (Table 4). Dividing the average treatment effect of 0.02 by 0.08 generates a relative reduction in forest disturbance of 23% (Fig 3).

Comparing Estimators
The two cases we evaluate are similar to many published conservation evaluations of PES and protected areas. In our Ecuador-PES case we have parcel, or household-level sampling units, even though we do not have household-level data, and are able to control for many covariates included in past studies. In our Russia-Protected Areas case we have pixel level information and sample inside and outside of protected areas, controlling for many common observables that control for location and accessibility. Across these two conservation programs, the different sampling and estimation strategies (Fig 1) result in different conclusions about the effectiveness of the conservation policy. One must be cautious, however, in interpreting the reason  for the differences in average treatment effects in Table 3 or relative impacts presented in Fig 3. Some estimators differ in more than one aspect, and so it is not clear which aspect explains the difference in estimates. The most telling comparisons are those that differ in only one key aspect, these include (1) matching with differences in means to matching combined with crosssectional regression; (2) no matching with fixed effects to matching combined with fixed effects; and (3) matching with cross-sectional regression to matching combined with fixed effects.
Considering the first comparison, cross-sectional regression following matching can provide additional control over differences in means if observable covariates are not completely  Table 3). For estimators that use matching, the relative reduction in forest cover change is calculated as the estimated average treatment effect (Table 3) divided by the deforestation rate (Ecuador-PES) or the probability of forest cover change (Russia-Protected Areas) in the matched control observations after treatment. When matching is not used, the deforestation rate or probability of forest cover change in the full set of control observations is used. As an example, the relative effect for Ecuador-PES under propensity score matching and differences in means is calculated as 0. 40 balanced in matching. For both the Ecuador-PES and Russia-Protected Areas cases, the adjustment on observables after matching has little to no effect on the interpretation of conservation impacts. In Ecuador-PES, the statistical significance of the results does not change; in Russia-Protected Areas the statistical significance decreases from the 99% to 95% level. Of course, in other examples and contexts, this additional adjustment on observables could lead to changes in treatment effects, and is generally recommended [5,50,54].
In the second comparison, pre-processing the data with propensity score matching changes the sample used in estimation. Trimming the sample to ensure treatment and control parcels are similar in observable characteristics matters when there is some misspecification of how the observed independent variables drive land cover change, and such misspecification is a common challenge for applied work. Further, since panel fixed effects estimation typically assumes that treatment and control households have a common response to the conservation treatment, this assumption is more plausible in a matched sample than in a sample with very different treatment and control households. Our results suggest that trimming the sample to include only good matches is especially important for Ecuador-PES, increasing the relative impact by 30% in Fig 3 when the average treatment effect increases from 0.31 to 0.42 (Table 3). For protected areas in Russia, the relative impacts in Fig 3 are within 3% of one another, but it decreases enough when the matched sample is used that the estimate changes from 90% significance level to no longer statistically significant. In general, when treatment observations are quite different from controls, matching provides an additional assurance in getting the two groups as similar as possible before panel regression analysis, similar to what has been shown in the case of cross-sectional regression analysis [48][49][50][51][52]. Standard deviations in parentheses. T-tests test for differences in means assuming unequal variances. "Before matching" uses the full sample; "After matching" is based on 1-to-1 propensity score matching without replacement and limiting the maximum distance between matches with a caliper.
Normalized differences in means are calculated as recommended by [5]. There is no test for statistical significance associated with normalized differences in means but a rule of thumb is that sizes larger than 0.25 can bias simple ordinary least squares regression. a After matching, pre-treatment forest disturbance was "0" for both protected areas and areas outside of protected areas; thus, differences in means could not be calculated. In the comparison of matching combined with cross-sectional versus matching combined with fixed effects regression, the same sample of treatment and control observations are used but the estimators differ in the control of time-invariant unobservables. Thus, we see that explicit control of fixed effects in a panel setting is responsible for differences in conclusions about a conservation programs' relative effect on forest cover change in Fig 3 ranging from between an increase of 4% (Ecuador-PES) to a decrease of 16% (Russia-protected areas) when time-invariant unobservables are controlled. In the case of Ecuador, the policy implications would be similar between these two estimators, as both estimators come to similar conclusions about statistical significance and size. Thus, in this case the observable covariates do a good job of controlling for confounding factors. In the case of Russia, however, when time-invariant unobservables are controlled the program does not have a statistically significant effect on the probability of forest disturbance. In the Russia case, these time-invariant unobservables are likely related to the hypothesized missing measures for timber value-when the low timber values of protected areas are controlled for with fixed effects the impact estimate is reduced.
The explicit control of fixed effects is the primary advantage to using panel data. The control of fixed effects matters when there is some correlation between the time-invariant unobserved determinants of land cover change, and the location of the conservation program. If timeinvariant unobservables are not a concern, estimation under cross-sectional or fixed effects methods will yield similar results. These two conservation programs illustrate that the importance of hidden bias in conservation evaluation will vary. In the Ecuador-PES case, when matching with cross-sectional or matching with fixed effects panel regression is used, the estimated treatment effects are almost identical. This indicates that time-invariant unobservables are not a large concern. In contrast, for the Russia-Protected Areas case, the conclusion of statistically significant conservation impacts does not hold when time-invariant unobservables are controlled (Table 3). In other datasets and examples, the direction of bias from time-invariant unobservables and their importance will also vary. The researcher will have to critically evaluate the likelihood of time invariant unobservables in each case, and the validity of the assumption that treatment effects can be estimated on observable data alone, as is the case with cross-sectional estimation strategies. It is important to re-emphasize as well, that no quasiexperimental design can control for potential time-varying unobservable bias.

Conclusion
The conservation community is increasingly evaluating if conservation tools are effective and incorporating this information into decision-making. For conservation policies and programs that have a goal of affecting land cover change, the ability to conduct a rigorous quasi-experimental evaluation is bolstered by the release of the Landsat archives [67], the advancement of remote sensing techniques to provide temporally-rich land cover change classifications [68,69], and the publication of free, global datasets of forest cover [46,70]. The number of studies using remote sensing products and quasi-experimental techniques to conduct an impact evaluation is increasing rapidly and the majority of these rely on matching estimators combined with differences in means or cross-sectional regression to estimate treatment effects exclusively on observable covariates. The results from our comparison of estimators across two conservation programs illustrate that estimated treatment effects can vary across estimators, and that in some cases time-invariant unobservables can lead to differences in statistical significance and differences in magnitude of relative impacts. These estimated differences could affect policy lessons and actions such as who should get paid in incentive-based programs, allocation of funding for protected areas, and whether a program should be scaled up or stopped. However, if time-invariant unobservables are not a concern, then estimates from matching combined with cross-sectional regression result in similar policy lessons.
While bias from time-invariant unobservables could be reduced in cross-sectional analyses by collecting more data on plot or parcel characteristics or instrumenting for the conservation program, such data is not always easily available or well measured, and conservation instruments are often far from obvious. An easier solution is often to build better temporal variation with spatial data on land cover change that can reduce the number of assumptions required for identification of conservation program effectiveness. This allows the researcher to control for many of the observable variables already commonly included in evaluation studies-since they do not vary over time-as well as hidden sources of time-invariant bias through the fixed effect, as long as the unit of analysis corresponds with the appropriate decision-making unit. Combining matching to pre-process the dataset with fixed effects regression to control for time-invariant unobservables has also been shown to be one of the most robust strategies for replicating experimental evidence of conservation effectiveness [43]. The identification of conservation impacts with fixed effects relies on two conditions: (1) repeated remote sensing landscape images over time, and (2) temporal variation in the location of conservation programs within the time frame of the estimation sample.
If these two conditions do not hold for a particular conservation policy or program, then matching combined with t-tests or cross-sectional estimation approaches will be the only alternative. Our findings suggest that when these methods are used, they are likely still in range of the average treatment effect estimate that would be obtained from fixed effects regression, as indicated by the overlap in confidence intervals in our two examples in Fig 3. Across both cross-sectional and panel program evaluation methods, researchers should take greater care to emphasize the uncertainty associated with the estimated average treatment effect and report the range of values using the 95% confidence interval, in addition to just reporting the mean effect. Additionally, if researchers are able to hypothesize on the potential sources of bias from time-invariant and time-varying unobservables they can discuss the direction of that bias, and how this would affect estimated treatment effects. These best practices can help move the field of conservation evaluation to more rigorous, and reliable, estimates of the impact of conservation policies and programs on land cover change outcomes, and facilitate synthesis of impacts across multiple studies in systematic reviews and meta-analysis [71].
Supporting Information S1