Soy moratorium impacts on soybean and deforestation dynamics in Mato Grosso, Brazil

Previous research has established the usefulness of remotely sensed vegetation index (VI) data from the Moderate Resolution Imaging Spectroradiometer (MODIS) to characterize the spatial dynamics of agriculture in the state of Mato Grosso (MT), Brazil. With these data it has become possible to track MT agriculture, which accounts for ~85% of Brazilian Amazon soy production, across periods of several years. Annual land cover (LC) maps support investigation of the spatiotemporal dynamics of agriculture as they relate to forest cover and governance and policy efforts to lower deforestation rates. We use a unique, spatially extensive 9-year (2005–2013) ground reference dataset to classify, with approximately 80% accuracy, MODIS VI data, merging the results with carefully processed annual forest and sugarcane coverages developed by Brazil’s National Institute for Space Research to produce LC maps for MT for the 2001–2014 crop years. We apply the maps to an evaluation of forest and agricultural intensification dynamics before and after the Soy Moratorium (SoyM), a governance effort enacted in July 2006 to halt deforestation for the purpose of soy production in the Brazilian Amazon. We find the pre-SoyM deforestation rate to be more than five times the post-SoyM rate, while simultaneously observing the pre-SoyM forest-to-soy conversion rate to be more than twice the post-SoyM rate. These observations support the hypothesis that SoyM has played a role in reducing both deforestation and subsequent use for soy production. Additional analyses explore the land use tendencies of deforested areas and the conceptual framework of horizontal and vertical agricultural intensification, which distinguishes production increases attributable to cropland expansion into newly deforested areas as opposed to implementation of multi-cropping systems on existing cropland. During the 14-year study period, soy production was found to shift from predominantly single-crop systems to majority double-crop systems.

The relatively few samples not fitting into one of these bins were discarded, such as rice and single crop corn. According to IBGE statistics, the discarded data were from areally minor cover types that did not fit into our class structure. An 85-15 filter [1] was used to purify the ground reference crop data, whereby all samples with at least 12 of 23 annual time series NDVI values falling outside the 15-85% class-specific and period-specific NDVI value percentile interval (i.e. the 70% data band) were excluded from the ground reference dataset. The excluded data represented severe statistical outliers presumed to be unreliable entries in our endmember ground reference dataset that was based on farmer records and recollections [1,2]. This statistical filter was not applied to pasture/cerrado samples from the two field campaigns, which were so small in number that a supplemental pasture/cerrado ground reference dataset was needed to increase sample size and to round out statewide representability of the pasture/cerrado data in support of the RF modeling effort.
Ideally our ground reference data, which we use for model development and accuracy evaluation, would include samples from all four of MT's main growing regions as identified in [3]. This would be more consistent with the principals of probability sampling with respect to spatial and areal representativeness of our mapped classes, which is an important consideration when evaluating map accuracy [4]. In reality, however, time and resource availability constrained our opportunistic, farmer interview-based ground data collection efforts, especially considering the immense size of our study area. With our available resources, it simply was not possible to reach all the growing regions during the two field campaigns. Though the state is 3 developing quickly, there are still only a relatively small number of reliable and easily trafficable roads.
Nevertheless, we know from interviews conducted in 2013 with representatives of major growers associations (APROSOJA, MT's Soy and Corn Growers Association; and FAMATO, MT's Agriculture and Ranching Federation) that our agricultural ground reference data are representative of the major classes and class proportions found throughout all of MT's main growing regions, with respect to growing season timing as well as the types of cropland management practices evident across the landscape. This assessment is corroborated by the favorable results obtained from the roadside data evaluation in the main text, which uses a substantially more spatially extensive set of reference points than the farmer interview-based ground reference data that were used for classification model construction.

C. Supplemental pasture/cerrado data acquisition
High resolution imagery was used to facilitate identification of additional pasture/cerrado ground reference samples from 82 locations around MT for CY2005-2009. This occurred in two separate efforts. The purpose of the first pasture/cerrado data supplementation was to increase sample size and mitigate underrepresentation of the pasture/cerrado data in the ground reference dataset from the field campaigns. The purpose of the second pasture/cerrado data supplementation was to improve the statewide representativeness of the NDVI profile distribution for this diverse, widespread and prominent cover type.
Locations for supplemental pasture/cerrado data samples were identified using the ESRI World Imagery layer (which at the time featured imagery from 2005) and Google Earth (2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011). This dataset provided needed areal and distributional enrichment of the pasture/cerrado ground reference data used for RF model development. Similar to [3], we employed 4 interpretation of aerial imagery to determine locations for supplemental pasture/cerrado ground reference data. When using visual inspection to obtain ground reference data, it is important to use imagery with higher spatial resolution than that of the finished classified map product [4]. To this we would add that using visual inspection to obtain ground reference data should only be done when it can be performed with high confidence, which requires that the cover type in question be visually (spectrally) distinct in the higher resolution imagery.
Areas covered by PRODES forest or IBGE water and urban layers were avoided. A visual assessment was conducted to identify areas where no particular managed land use (such as cultivated agriculture) was evident. Large areas that were homogeneous in appearance and visually representative of pasture/cerrado land cover were identified around the state. From these areas, maximally interior points were placed and NDVI profiles extracted [5]. Identification and remediation of visually evident high misclassification areas is a recommended practice for quality map construction; obvious errors in preliminary outputs should be identified and rectified if possible [4]. After we inspected a preliminary map series 5 developed using the ground reference data and the supplemental pasture/cerrado samples just described, we determined that managed pasture locations were frequently misclassified as soysingle. Managed pastures have a distinct appearance in high resolution imagery, and they should fall in the pasture/cerrado class. To mitigate this problem, 22 managed pasture points, along with 15 additional cerrado points from other visually evident high misclassification areas around the state, were added to improve statewide pasture/cerrado representation in the ground reference dataset. This process was completed with the same protocol used for the first supplemental pasture/cerrado extraction. With data extracted from CY2005-2009 for these new sites, this resulted in the addition of 110 managed pasture and 75 cerrado samples. Thus, in total across both supplemental pasture/cerrado data selection efforts, 410 pasture/cerrado samples were added to the ground reference dataset.
Pasture/cerrado sample count values shown in Table 1 reflect the full pasture/cerrado ground reference dataset, which includes the samples from both supplementation efforts in addition to the samples obtained during the two field campaigns.

D. PRODES forest/deforestation data preparation and processing
The PRODES data consist of dated (year and day-of-year, or DOY) and classified polygons, of which four classes were used in this study: (1)  for RES records we only know that this area was deforested at some previous time that may or may not be close to the associated Landsat scene acquisition date. Despite this uncertainty, we treated RES records identically to DEF records, reasoning that because RES polygons were known to be forest at some previous time during the PRODES period of record, the best guess for the deforestation date is the provided date. Later we discuss the potential impact of this assumption on the analysis.  2) RES records consisted of 29,325 polygons totaling 334.5 Kha. Recall that these polygons were identified as being deforested at some previous, unspecified point in time during the PRODES period of record, but not necessarily near the associated Landsat scene acquisition date. We decided to treat these polygons identically to the DEF polygons and associated them with their Landsat scene date, with this treatment expected to be more reasonable than leaving them as either forest or non-forest throughout the study period. 3) The PRODES dataset harbors a number of 'stich line' data gaps predominantly occurring in Landsat tile overlap zones (the PRODES coverage was developed primarily using Landsat data). In total, upon conversion to the 240-m MODIS grid, there were 5418 'NoData' MODIS pixels. For each 'NoData' pixel and each year, if the majority (>50%) of its (typically) eight-pixel set of immediately adjacent or diagonal neighbors were FOR, then the missing pixel was classified as FOR. For complete application of this strategy, the process was repeated until no more pixels were changed, which required four iterations. With the RF classification providing the default assignment for the 'NoData' pixels, a total of 1085 unique pixels were affected by the neighborhood analysis, resulting 9 in 13,662 FOR assignments during CY2001-2014, for an average of 12.6 recoded years per pixel out of a possible 14. This high per-pixel rate of recode suggests that most of the adjusted pixels were interior to stable forest regions. An example location before and after treatment is shown in the upper panels of S2 Fig.   4) Visual inspection of preliminary MT land cover maps revealed a number of 'stitch line' forest artifacts in non-forest areas. These bogus FOR ('FORerr') pixels were found to occur exclusively within central portions of Landsat tile overlap zones. Though strips of offending PRODES polygons generally were apparent to the eye, their distribution was not sufficiently unique in size, shape, or attribution to allow for clean isolation from the source data. Instead, a spatial filtering approach was developed and applied to each year of the annual LULC map set to mitigate the problem. First, visual inspection was used to reduce the Landsat overlap region (described in S6) to minimize the search area while still including all visually apparent FORerr pixels. All FOR pixels interior to the mask area were examined, and those with at least seven non-FOR neighbors (from a possible eight) were reassigned to their RF class. With FOR a declining coverage over time, the number of adjustments generally increased over time, with 1193 FOR pixels overwritten in CY2001 and 1963 FOR pixels overwritten in CY2014. A second iteration was applied whereby reassignment required all eight neighboring pixels to be non-FOR, which resulted in a small number (ranging from 57 to 80) of additional FOR pixels being overwritten each year during CY2001-2014. In total, annual recoded area ranged from 7,229 ha in CY2001 to 11,866 ha in CY2014. An example location before and after treatment is shown in the lower panels of S2 Considering the relatively small (4% of the total cropland area) and spatially invariant (Fig 3) areas that were affected, this manual approach provided our best option with respect to reasonability (once established, sugarcane tends to be continuously cropped rather than rotated to other, non-sugarcane lands) and practicality (sugarcane often has an irregular, multi-year phenology pattern due to its management, which greatly complicates its mapping in the traditional annual profile endmember framework). This sugarcane backfill effort impacted just 3 of the first 4 study years, when sugarcane area in MT was the lowest during the study period (Table 3).

12
The CY2004 sugarcane coverage was initialized by including all areas that appeared in both CY2003 and CY2005 Canasat datasets. Next, the remaining sugarcane areas appearing in either CY2003 or CY2005 were examined using the Temporal Vegetation Analysis System MODIS visualization tool (SATVeg; www.satveg.cnptia.embrapa.br). Professional judgment was used to characterize some of these areas as sugarcane in CY2004, with areas not identified as sugarcane assigned to their RF class. NDVI profiles from CY2002 were examined for the CY2003 Canasat areas and handled in the same manner, with areas identified as sugarcane in CY2002 further examined to define the CY2001 sugarcane coverage. With this approach to filling in missing Canasat years, we could ensure both spatial consistency with the Canasat dataset while also exhibiting the increase in sugarcane area known to exist in MT during the study period.

F. Processing details for step #3 of map development
The following processing steps were applied to "Identify and repair PRODES data commission and omission anomalies" (step 3a in the text): RESULT: In total across all 14 years, 830,882 recodes were applied to 618,471 pixels (3.9% of the study area). On average, 0.4% of the study area (5.5% of the cropland area) was adjusted each year, with the fewest changes occurring in CY2004 (22,661 pixels, or 2.9% of the cropland area) and the most occurring in CY2011 (136,364 pixels, or 10.3% of the cropland area).
NOTE: Though we classified MODIS through CY2015 to use for CY2014 trajectory adjustments, we did not have complete ancillary data for CY2015, nor were we able to properly inspect classification results from CY2015 for land cover trajectory anomalies as described above. Consequently, mapping results from that year are not included in the study.