Combining global land cover datasets to quantify agricultural expansion into forests in Latin America: Limitations and challenges

While we know that deforestation in the tropics is increasingly driven by commercial agriculture, most tropical countries still lack recent and spatially-explicit assessments of the relative importance of pasture and cropland expansion in causing forest loss. Here we present a spatially explicit quantification of the extent to which cultivated land and grassland expanded at the expense of forests across Latin America in 2001–2011, by combining two “state-of-the-art” global datasets (Global Forest Change forest loss and GlobeLand30-2010 land cover). We further evaluate some of the limitations and challenges in doing this. We find that this approach does capture some of the major patterns of land cover following deforestation, with GlobeLand30-2010’s Grassland class (which we interpret as pasture) being the most common land cover replacing forests across Latin America. However, our analysis also reveals some major limitations to combining these land cover datasets for quantifying pasture and cropland expansion into forest. First, a simple one-to-one translation between GlobeLand30-2010’s Cultivated land and Grassland classes into cropland and pasture respectively, should not be made without caution, as GlobeLand30-2010 defines its Cultivated land to include some pastures. Comparisons with the TerraClass dataset over the Brazilian Amazon and with previous literature indicates that Cultivated land in GlobeLand30-2010 includes notable amounts of pasture and other vegetation (e.g. in Paraguay and the Brazilian Amazon). This further suggests that the approach taken here generally leads to an underestimation (of up to ~60%) of the role of pasture in replacing forest. Second, a large share (~33%) of the Global Forest Change forest loss is found to still be forest according to GlobeLand30-2010 and our analysis suggests that the accuracy of the combined datasets, especially for areas with heterogeneous land cover and/or small-scale forest loss, is still too poor for deriving accurate quantifications of land cover following forest loss.


GFC
The Global Forest Change (GFC) annual tree cover loss dataset is based on Landsat 7 ETM+ growing season data [9]. The commission error for GFC loss to 13% and 21% in the tropics and sub-tropics respectively, and omission errors of 17-21% in the tropics/subtropics [9]. In a more recent accuracy assessment (using multi-temporal Landsat and GoogleEarth imagery for 1200 30x30 m reference pixels) Tyukavina et al.
[10] found even lower commission and omission errors for Latin America.

PRODES
Deforestation in the Brazilian Legal Amazon (BLA) has been monitored within the PRODES program by the Brazilian National Institute for Space Research (INPE) since 1988 [11]. We used this dataset on annual deforestation for 2001 to 2014. PRODES is primarily based on Landsat data July-September [12].
The PRODES data includes attributes on the year that deforestation was detected, as well as on when the area was last observed (e.g. if the area was covered by clouds in the preceding years). For example, d2001_4 is used for deforestation detected in 2001, in an area which had not been observed for four years prior. In our analysis we ascribe the deforestation to the earliest year of these (e.g. for d2001_4, we treat this as deforestation in 1997), to ensure that only pixels deforested within the analysis time period were included in the main analysis.

TerraClass
TerraClass is based primarily on the same Landsat imagery as PRODES, focused on the time period July-September [12].

TerraClass and GlobeLand aggregated classes look-up table
For some of the analyses, we use aggregated classes for GlobeLand30 and TerraClass. Tables A and B detail how the original classes were aggregated.

Data access and processing
Data processing and analysis was primarily carried out using Safe Software's FME Desktop (2015.1.1 Build 15515). Some reprojections were done in QGIS 2.16.0, as was the creation of an (equal size/area, approx. 3 200 km 2 ) hexagonal grid (using the mmqis plugin). Most figures were created in R with ggplot, and Excel, except the maps, which were created in Environmental Systems Research Institute's (Esri) ArcMap 10.2.
As we here use data that were are available online, data were downloaded from their respective online locations and reprojected as needed.
The Global Forest Change data version 1.2 were downloaded (on 18 December 2015) via http://earthenginepartners.appspot.com/science-2013-global-forest. The part of the dataset used here was the "year of gross forest cover loss event (lossyear)", provided in tiff format. The forest loss data were masked to where canopy cover in the year 2000 exceeded 30% (based on GFC treecover2000, downloaded on 20170112 using the R package gfcanalysis, 2015 version (v.1.2) of data).
The GlobeLand30-2010 data set is provided by National Geomatics Center of China (DOI10.11769/GlobeLand30.2010.db) and was ordered via http://www.globallandcover.com and subsequently downloaded (between November 2015 and March 2016). In addition to the geotiff files containing the land cover classification, the data were supplemented by a shapefile with information on the source Landsat tile used, which we use to gain information on the actual year the land cover classification was valid for.
In a visual examination of the datasets, PRODES and TerraClass align well with each other, as do GFC and GlobeLand30 (Fig B), but PRODES / TerraClass seemed somewhat offset relative to GFC / GlobeLand30. Therefore, PRODES and TerraClass data were offset by -0.001 degrees in longitude and 0.001 degrees in latitude. This improved the correspondence between PRODES and GFC, seen in the number of pixels classified as deforested by both datasets. To combine the datasets pixel-by-pixel, all datasets were aligned to the GlobeLand30 grid. To reduce the consequences of shifting when aligning raster pixels for the pixel-by-pixel comparisons, the GlobeLand30 data were rescaled to a third of their input pixel size (i.e. from c. 30 m to c. 10 m; this does of course not improve the actual resolution of data). PRODES and TerraClass were then rasterized to directly match this grid, and the GFC data slightly resampled (all resampling done using nearest neighbour) and/or offset, and the analysis performed at this finer resolution. Pixel cell sizes were calculated in a Lambert Azimuthal Equal Area projection for each tile.
The map comparing overlap and differences between PRODES and GFC (S1 Fig) was created by reclassifying the pixels into three classes: PRODES only; GFC only; and "Both" for pixels labelled as deforested by both datasets at some point during the years 2001-2012. For this map, the comparison was made for the full time period, and any differences in timing between forest loss in the two datasets were not taken into account. For visualisation only, data were resampled to approx. 60-m resolution.

Shares of GFC forest loss that could be attributed a post-loss GlobeLand30 land cover
The amount and share of GFC forest loss that could be attributed a post-loss GlobeLand30 land cover per forest loss year are shown in Fig C and

PRODES -GFC inter-comparison
Many pixels are marked as deforested by only one of GFC and PRODES, but for the pixels classified as forest loss by both datasets, the timing is in many cases within a few years of each other (Fig E and S4 Table). As Richards et al. [13] note, this can in part be due to the different time of year that the datasets are based on. A quarter of the additional GFC forest loss was found by Richards et al. [13] to be in areas smaller than PRODES minimum mapping unit (6.25 ha).
Similar comparisons between PRODES and GFC have been done by Fanin et al. [14] and Richards et al. [13]. Fanin et al. [14] perform their analysis at a coarser resolution (500 m). The analysis by Richards et al. [13] differ slightly from what is done here in that they use a 90-m raster version of PRODES (whereas we use the vector version) and also do not include deforestation for 2001.

Intercomparison TerraClass and GlobeLand
Many of the differences between TerraClass and GlobeLand30 lie in TerraClass' Non-forest and Secondary vegetation classes. These do not have a direct correspondence to either of the GlobeLand30 classes, and are not necessarily inconsistencies.

Post-loss forest
Fig G presents the spatial variations in the proportion of forest loss that is followed by forest. In some countries (e.g. Nicaragua, Colombia, and Peru) forest comes up as the dominant post-loss land cover in general, while in others it dominant in certain places only (e.g. Brazil and Paraguay).

Boundary pixels
To examine whether pixels near class boundaries (in either of the datasets) are particularly likely to have a large proportion of forest as post-loss land cover, we calculate the proportion of forest loss pixels in each hexagon that are also on the boundary between classes in either of the (GlobeLand30 or GFC) datasets.
To detect class boundaries, we use R raster boundaries ("Rooks case", where boundaries are defined as pixels with more than one class in the four neighbouring pixels; based on the joint raster of GFC forest loss prior to GlobeLand30, resampled with FME to match better the input resolution), and summarised the resulting "boundary raster" per hexagon. We then use the ratio between boundaries and non-boundaries and compare this to the proportion of forest loss pixels followed by forest in the GlobeLand30 data.
If there were no correlation between frequency of boundaries and the share of post-loss forest, we would expect that the range of boundary ratios would be the same between hexagons with low and high proportions of "post-loss forest". However, as seen in a majority of the hexagons with large proportions of "post-loss forest" are also dominated by a large proportion of boundaries (Fig H). (Although there are also several outliers; in these hexagons, other sources of error is likely. This relationship could be used to identify locations of larger errors.) Hexagons with lower "post-loss forest" proportions, exhibit a greater range in boundary proportions. This is even clearer if we exclude hexagons with less than 5 kha forest loss. This is unlike the proportions of Grassland and Cultivated land, which exhibit the reverse pattern (Fig H).  Another factor is that GlobeLand30's definition of forest is quite broad, and includes sparse woodland with 10-30% canopy cover [1]. For post-loss forest in the BLA, the comparison with TerraClass showed that what is classified by GlobeLand30 as forest (following forest loss), is a mix of a third Non-forest/secondary vegetation, a third Pasture, and a quarter forest (S5 Table).
While one way of dealing with the higher error incidence at class boundaries is to exclude pixels on class boundaries and thus improve the overall accuracy of the remaining dataset (as is done by e.g. Serra et al. [15] and Brovelli et al. [3]), this would likely influence the resulting post-loss land cover, favouring results of large-scale forest loss in areas with homogenous land cover, and underestimate the importance of deforestation drivers that cause of smaller-scale forest loss and/or loss in fragmented landscapes.