A geographically weighted random forest approach for evaluate forest change drivers in the Northern Ecuadorian Amazon

The Tropical Andes region includes biodiversity hotspots of high conservation priority whose management strategies depend on the analysis of forest dynamics drivers (FDDs). These depend on complex social and ecological interactions that manifest on different space–time scales and are commonly evaluated through regression analysis of multivariate datasets. However, processing such datasets is challenging, especially when time series are used and inconsistencies in data collection complicate their integration. Moreover, regression analysis in FDD characterization has been criticized for failing to capture spatial variability; therefore, alternatives such as geographically weighted regression (GWR) have been proposed, but their sensitivity to multicollinearity has not yet been solved. In this scenario, we present an innovative methodology that combines techniques to: 1) derive remote sensing time series products; 2) improve census processing with dasymetric mapping; 3) combine GWR and random forest (RF) to derive local variables importance; and 4) report results based in a clustering and hypothesis testing. We applied this methodology in the northwestern Ecuadorian Amazon, a highly heterogeneous region characterized by different active fronts of deforestation and reforestation, within the time period 2000–2010. Our objective was to identify linkages between these processes and validate the potential of the proposed methodology. Our findings indicate that land-use intensity proxies can be extracted from remote sensing time series, while intercensal analysis can be facilitated by calculating population density maps. Moreover, our implementation of GWR with RF achieved accurate predictions above the 74% using the out-of-bag samples, demonstrating that derived RF features can be used to construct hypothesis and discuss forest change drivers with more detailed information. In the other hand, our analysis revealed contrasting effects between deforestation and reforestation for variables related to suitability to agriculture and accessibility to its facilities, which is also reflected according patch size, land cover and population dynamics patterns. This approach demonstrates the benefits of integrating remote sensing–derived products and socioeconomic data to understand coupled socioecological systems more from a local than a global scale.


Introduction
The Tropical Andes is a mountainous region at the base of the Andes ridge. Due to its altitudinal gradient, it is characterized by 23 ecoregions and 8 bioregions [1], and it provides important economic and ecological services to almost 40 million inhabitants [2]. The region is recognized as an endangered biodiversity hotspot of high conservation priority [3,4], where population growth and agriculture expansion [5,6] are the major driving forces of deforestation, contributing to potential impacts of climate change [7]. On the other hand, large-scale reforestation has been detected in some areas of Latin America [8], especially along old colonization fronts [9]. However, these areas are less studied or understood, and their role in forest recovery and restoration of important environmental services is ignored [10,11]. Therefore, analyzing forest dynamics drivers (FDDs), i.e., deforestation and reforestation, in the Tropical Andes is very important for conservation, climate change adaptation, and sustainability. This knowledge is decisive for countries like Ecuador, where most of the remaining native forests are located and deforestation rates have been the highest in South America for some years [12,13].
Forest dynamics are shaped by complex societal and ecological interactions, or drivers. Geist and Lambin [14] proposed a conceptual framework to facilitate the understanding of these drivers of land dynamics, classifying them as follows: 1. proximate causes (local level, direct agents); 2. underlying causes (different levels, socioeconomic processes); and 3. other causes (determined by environmental factors and social triggering events).
These drivers have been accepted by countries participating in Reducing Emissions from Deforestation and Forest Degradation (REDD+), but recent research has recognized that underlying causes are less frequently analyzed in Latin America [6,15]. Proximate causes are mostly identified through remote sensing-based techniques [16], while underlying causes can be more complex, as they rely on socioeconomic data. These data are frequently not available or reliable at the scale needed [17]. Moreover, impacts of globalization [18] and economic development [11,19] generate more complex scenarios.
In Ecuador, previous studies combined remote sensing products and socioeconomic data to identify FDDs. For instance, Southgate et al. [20] analyzed thematic cartography and census data in a regression analysis to highlight agricultural rents, spontaneous settlements, and land tenure insecurity as deforestation drivers in eastern Ecuador. Following a similar approach but adding survey data, Rudel et al. [9] discussed reforestation drivers observed among ethnic groups and their relationships between land-use practices, cultural backgrounds, and distance to roads in southern Ecuador. Later, Mena et al. [21] combined thematic cartography, census, and survey data in a spatial regression model to conclude that road accessibility and population density were the most important deforestation drivers in northern Ecuador. Similarly, Walsh et al. [22] identified that reforestation drivers were motivated by land security and distance to roads. More recently, Bonilla-Bedoya et al. [23] related deforestation processes to legal timber harvesting, road expansion, and poverty indices. From these studies, it can be observed that deforestation is not commonly associated with reforestation. In this paper, the evaluation of contrasting driving forces (e.g., population growth/decay, agricultural expansion/contraction) with regard to possible linkages defines the first research interest.
Processing of multivariate data for FDD analysis has made significant progress in recent years. For instance, advances in remote sensing and open access to satellite archives [24] have contributed to a better understanding of global land-cover and land-use changes [25]. As a result, products derived from time series (e.g., spectral trends, class-level metrics) has been increasingly applied to explain driving forces, making it possible to identify direct drivers [26] or better understand ecosystem fragmentation [27]. Furthermore, censuses are common sources of socioeconomic data, while processing of these data is not common in FDD analysis of underlying causes. Obstacles such as boundary changes [28] and scale effects [29] are probably the most challenging, and different approaches have been proposed to solve them, including areal interpolation [30,31] and statistical modeling [32][33][34]. Among them, areal interpolation with dasymetric mapping is perhaps the most popular [35], as it can combine land-cover maps and census data to model population distribution more precisely than other methods [36]. Other advances are related to capturing the spatial variability of FDDs. In this regard, geographically weighted regression (GWR) [37] has been demonstrated to satisfy this objective [38,39], but is sensitive to local collinearity and can produce unreliable results [40]. Nonparametric algorithms such as random forest (RF) [41] have interesting applications for high-dimensional problems with correlated variables [42]. Moreover, implementation of RF with GWR has recently proposed [43] but further applications explaining variables relationships are yet to be evaluated. The design of an innovative methodology for analyzing FDDs using these techniques constitutes the second research interest of this paper.
As the Tropical Andes constitutes a complex mosaic of landscapes, a workflow to analyze its FDDs is presented in this paper. We conducted our research in the Northwestern Ecuadorian Amazon (NEA), a study area located in an altitudinal gradient that includes different bioregions and colonization fronts with heterogeneous socioeconomic settings. Our main objective was to explore a set of variable groups to observe how they influenced deforestation and reforestation rates in the NEA in 2000-2010. This period is known as the beginning of the dollarization and economic stabilization in Ecuador [44]. For this purpose, we designed and implemented an experimental methodology for FDD analysis that benefits from the novel techniques mentioned above. Two research questions guided our work: • What are the theoretical and empirical implications of our experimental methodology in FDD analysis?
• What could be the linkages between the driving forces of deforestation and reforestation in the NEA during 2000-2010?
To answer these questions, we (i) explain how we calculated the forest change rates and time series-derived products, (ii) conduct dasymetric mapping for intercensal analysis, and (iii) briefly describe the variable groups before (iv) explaining our implementation of GWR and RF, together with a clustering and hypothesis test to summarize our results. The discussion considers the benefits and limitations of the proposed methodology and its contribution to the current knowledge of FDDs in the NEA.

Study area
The NEA covers 21,857 km 2 over an altitudinal gradient from 200 to 2,800 m.a.s.l. on the western slopes of the Andean Range (Fig 1). It includes 16 cantons (second-level administrative units in Ecuador), which are used in this research to identify specific zones in the NEA. According to Olson et al. [1], two ecoregions exist in the NEA: the Napo moist forests and the Eastern Cordillera real montane forest. The latter is of the highest conservation priority in Ecuador as it covers less than 33% of its original area [45]. Moreover, the NEA is characterized by extraordinary biodiversity, intense annual precipitation (1,500-4,500 mm), and a multitude of ecosystems [46]. Most of the soils are ferric, with low fertility and high aluminum toxicity, although volcanic and alluvial soils can be an exception [47,48]. Under these conditions, the agricultural limitations are well known; however, this does not prevent the native people from co-evolving with their natural environment [49]. Dramatic changes began in the 1970s with the exploration and extraction of oil, generating accelerated economic growth and industrialization [50]. Extensive road construction and the Agrarian and Colonization Reform of 1964 stimulated in-migration and rapid settlement over the whole Ecuadorian Amazon. According to Brown et al. [51], its population grew by 432% from 1950 to 1990, resulting in an urban system that followed the discovery of petroleum and the related economic opportunities. This led to disorganized and arbitrary colonization where land conflicts between the colonos (mestizo colonists) and native people were common and traditional land-use practices were replaced by extensive agriculture and cattle ranching [55].
Forest clearing in the Ecuadorian Amazon peaked during 1970-1990, when the deforestation rate was one of the highest in South America [12]. In the NEA, the forested areas experienced an 19.6% reduction (4130 km 2 ) by the end of 2014, principally due to pasture expansion for cattle ranching [56] (Fig 2). However, this was less intense than in the northeast NEA, where oil fields were located [57]. The declaration of protected areas, which accounted for 29% of the area and few oil discoveries [58], contributed to a reduced interest in colonization and to deforestation. Improved road connections between Quito and Nueva Loja and recent oil discoveries motivated further colonization of remote areas [59]. Despite this, reports indicate a drop of deforestation from 92,800 to 74,000 ha/year -1 in Ecuador since 1990 [60].
Later, financial instability led to a crisis that ended with the dollarization of the Ecuadorian economy in September of 2000. A reduction in the inflation rate from 96 to 7% was seen as an important sign of economic stabilization for the period 2000-2014 [44]. As consequence, Ecuador experienced an unprecedented wave of emigration, especially between 2000 and 2007 (around 483.000 migrants) [61]. Nevertheless, the effects of migration and remittances through land-use change have been associated with an increase of agriculture activities rather than land abandonment and forest transition in Ecuador [62].

Methods
This research was fully implemented using R programming language [63] and integrating specific libraries for spatial data [64,65], database management [66], parallel processing [67], and data visualization [68]. For mapmaking, we used QGIS 3.4.3 Madeira [69]. Fig 3 shows the workflow of the proposed methodology.

Annual forest change rates and remote sensing time series-derived products
We collected a set of land-cover and land-cover change maps generated biannually for the period 1990-2014 in the NEA. They were generated for previous research to monitor longterm forest dynamics with scarce data [70], reporting an overall accuracy above 70%. Specifically, this approach uses the Landsat surface reflectance time-series product [71] to reduce it into cloud-free biennial composites. Then, it trains and executes a supervised-classification algorithm to derive land-cover maps, classified into 4 classes: evergreen forest, bamboo forest (guadua spp.), bare soil/infrastructure, and pasture/cropland. This collection of maps is postprocessed and the classes are aggregated into forest and non-forest binaries to derive deforestation and reforestation areas. In the case of deforestation, the algorithm simply flags the date of conversion from forest to non-forest, while for reforestation it first considers a minimum time classified as forest after a disturbance (i.e., 4 years) to flag it as reforestation if the area remains as forest until the end of the time series. To derive the annual forest change rate, we first removed areas less than 1 ha from deforestation and reforestation maps, as they frequently represent misclassified pixels [26]. Then, the annual forest change rate q according to the UN Food and Agriculture Association (FAO) [72] was calculated, considering A 1 and A 2 as forest cover for time periods t1 and t2: To operate this, we created an analysis grid with a cell size of 400 ha and extracted for each cell the deforested and reforested areas between 2000 and 2010. These years were selected to match the census years used in this research (2001 and 2010) as well as the cell size to reduce processing time during calculation. The resulting deforestation and reforestation rates constituted the set of dependent variables analyzed (Fig 4), a summary of which is shown in Table 1.
Furthermore, historical land use influence ecological landscape functions and link causeconnection patterns [73]. For this, we derived land-cover frequencies of deforested and reforested areas to pasture/cropland and bare soil/infrastructure classes. This provided further information about LCLUC dynamics and helped us to determine the permanence or semipermanence of a specific land-cover class Z. For this, we stacked the land-cover maps used in deforestation and reforestation mapping to obtain time series M 1:n , which was split to match deforestation or reforestation date i. This gave us 2 segments, defining the conditions (1) after a change M a = M i:n and (2) before a change M b = M 1:(i−1) . For deforestation, the segment M a was used to determine the land-cover frequency f of class Z after a deforestation event D fz . For this, we summed up the class occurrences in M a and divided the sum by the extent of the segment: For reforestation, segment M b was used to determine the land-cover frequencies of class Z before a reforestation event R fz happened. Similarly, it was calculated by adding up their occurrences in M b and dividing the sum by the extent of the segment: These calculations gave us 4 layers in total, describing the land-cover frequencies for pasture/cropland and bare soil/infrastructure as percentages, for both deforestation ( Fig 5A) and reforestation.
A final time series-derived product was obtained from the Visible Infrared Imaging Radiometer Suite (VIIRS) and its Nighttime Lights Time Series cloud-free composites (nightlights 2000-2010) [74]. This dataset constitutes a measure of visible and near-infrared emission sources at night (e.g., cities, towns, gas flares, and other sources of persistent lighting), which can refer to access to electricity and human development factors such as: access to education [75], emissions of CO 2 [76] or socioeconomic trends [77]. The latter motivated its use in this research to enrich socioeconomic parameters. For this, we used Google Earth Engine [78] to calculate a pixel-wise linear trend map for 2000-2010 using the stable light band from this dataset. From the resulting pixel-wise linear model, we extracted its slope to determine its trend ( Fig 5B).

Dasymetric mapping and population change
We processed the 2001 and 2010 population censuses published by the National Institute of Statistics and Census of Ecuador (INEC) [79,80] to generate population density surfaces with dasymetric mapping. This technique allows redistribution of population counts from a set of areal units into a grid using land-cover maps. To implement it, we extracted the most detailed level of census information (census blocks) to avoid aggregation and bias effects [29]. Moreover, since rural population better explains conversion from forest to agricultural land [81], we used census blocks from rural areas with an average size of 4,995 ± 10,250 ha and filtered the population by working age (15-72 years). Following Mennis [82], we adapted his dasymetric mapping with areal weighting approach to operate with rural populations. We first binarized deforestation maps from 2000 and 2010 to produce 2 non-forest masks to represent areas where rural populations were mostly settled in a specific year. Since other features not related to rural population were also present in these masks (e.g., water bodies, cliffs, urban areas), we identified and removed them in order to be consistent with our focus on rural populations. As further details were required to locate rural populations, a road accessibly model [83] was added to these masks to facilitate their identification. We assumed that higher rural population density would occur in areas with better road accessibility [21,84]; therefore, we reclassified the road accessibly map into 3 travel time ranges (high: 0-1 h; medium: 1-3 h; low: >3 h) to obtain what is called rural density classes. We considered only 3 classes, as the algorithm proposed by Mennis [82] was not tested with more than 3. In the next step, we identified census blocks that were almost completely within each rural density class to calculate their densities ( Table 2).  We then averaged their population density fraction to obtain the next values: 0.570, 0.553, and 0.092, which correspond to high, medium, and low rural density classes, respectively. These values are dimensionless and are obtained by: where d uc is the population density fraction of rural class u in census block c, and p uc is the population density (persons/ha) of rural class u in census block c and is divided by the sum of all rural density classes (high h, medium m, and low l) and their census blocks. After calculating with all census blocks, the next step in the algorithm is to evaluate the area ratio. This operation divides the area of class u by 33.3% to adjust densities equally according to the difference in area of each rural density class within that census block. This can be expressed as: where a ub is the area ratio of rural class u and n ub is the number of grid cells of rural class u in census block b, and n b is the number of grid cells in census block b. The next step is to calculate the total fraction by multiplying d uc and a ub and dividing that result by the result of that same expression for all 3 rural classes (h, m, l) in that census block: where f ubc is the total fraction of rural class u in census block b of spatial unit c. The final step in the algorithm is to assign each rural class to grid cells within that census block. This was done by dividing the population assigned to the rural class evenly among the grid cells in the census block that has that rural class. This can be expressed as: where pop ubc is the population assigned to one grid cell of rural class u in census block b of spatial unit c. The result is a population density surface (Fig 6), which represents the number of persons by pixel area (which was set as 1 ha to avoid census block elimination during vectorto-raster conversion). We iterated the algorithm with 20 census variables (see Section 1.3.3) to obtain population density surfaces. In all cases, the pycnophylactic property [85] was verified by adding population density surface pixels and comparing them with their original values.
Only incomplete census blocks were observed as suspicious, because their geometry was modified during the study area extraction. Consequently, their counts were adjusted proportional to their original areas before calculation. In the final step, population density surfaces for 2001 and 2010 were subtracted for each variable to obtain their change.

Variable groups
With the processed censuses, we defined a set of variable groups related to demographic features and their change between 2001-2010. These were selected based on similar research [86,87] and included the categories: age composition (Age), literacy level (Education), gender distribution (Gender), household structure (Household), spoken languages (Language), and work sectors (Work). These constituted the Socioeconomic and Sociocultural macro levels ( Table 3). Another set of variable groups was defined following similar research [6] to include biophysical (Biophysical) and land cover (Land Cover) features, constituting the Landscape macro-level (Table 4). Moreover, cost-distance models of agricultural collection facilities (palm oil, coffee, cacao, fruit, and milk products) were used as proxies of commercial agricultural activities (Agriculture). Another variable group (Infrastructures) considered the Euclidean distance to human-built infrastructures such as oil wells and mining blocks established between 2000 and 2010. Additionally, we include here the linear trend map from the nightlights 2001-2010 dataset. These variable groups formed the Commodities macro level ( Table 4). A total of 34 variables were organized in 10 groups and 4 macro levels, with their values extracted into the analysis grid cells as averages.

Geographically weighted regression (GWR) and random forest (RF)
GWR is a statistical method to model spatial relationships under the assumption of spatial non-stationarity and location interdependency. It was conceived as an extension of linear regression analysis incorporating local estimates and surface representations of relationships among dependent and independent variables [37]. A GWR model can be specified as: . .,m} independent variables at i location and ε i is the error term of the estimation at i location. Since i is considered as an n × n (with n number of observations) diagonal matrix in GWR, its formulation for local parameter estimates at i location is more conveniently expressed as:β whereβðiÞ is a vector of spatially weighted estimates for the k-th independent variables at i location, X is a n × k matrix of independent variables, W(i) is the n × n diagonal weights matrix which ensure that observations near i have the largest weight values rather than those further away [90], and γ is a vector of k observations of the dependent variable. To define W(i) a weighting function is declared considering: (1) the type of distance between i and its neighbors, (2) a kernel function specifying the weighting scheme, and (3) the bandwidth distance to control the number of observations within the kernel. Commonly, the Euclidean distance and the exponential kernel function are used as the weighting scheme [91]. The latter is defined by: where W ij is the weight assigned to observation j for the estimation of i, d ij is the distance between j and i, and bw is the bandwidth. The latter defines GWR mapping sensibility, as large values result in global regression estimates, while small ones introduce randomness [92]. Moreover, bw can be set as a fixed (constant distance) or an adaptive kernel (constant number of local observations). The latter is recommended, as it ensures a sufficient flow of information for each local calibration, while its size can be determined through cross-validation [93]. We implemented an adaptive exponential kernel function using Euclidean distance but tested different values for bw = {100,200,. . .,800} to determine it. Then, for each i and its neighbors, we constructed an RF model instead of a linear regression. https://doi.org/10.1371/journal.pone.0226224.g006 2 Most commonly spoken language by colonos in the NEA. 3 Second most commonly spoken language and ethnicity in the NEA. 4 The language of Huaorani people.
https://doi.org/10.1371/journal.pone.0226224.t003 RF is an ensemble learning method for classification and regression that produces multiple decision trees using bagging to select subsets of training samples and random feature selection to split them [41]. It is easy to compute and is tolerant to missing and multicollinear data [94], moreover it provides error estimates without requiring a validation dataset. During the training phase, it randomly sample with replacement, about two-third of the training samples (referred to as in-bag samples) for a given training set T = {1,. . .,t} to grow a specified number of trees to the largest extent possible, selecting randomly a number of variables V = {1,. . .,v} at each node to determine their split. This gives an ensemble of classification or regression trees, if T and V are bagged repeatedly B times to grow trees with these samples. After training, it averages predictions from all individual regression trees or by taking the majority vote in classification. This can be summarized as: For b = {1,. . .,B}: 1. Sample randomly, with replacement, n training samples and variables from T, V. Set them as T b , V b .

Train a classification or regression tree on
End for 3. Average individual RF b results in regression or by taking the majority vote in classification and calculate model performance To monitor error, the remaining one-third samples (referred to as out-of-bag samples or OBB) are used in an internal cross-validation technique [95], which computes the number of correct predictions. Other accuracy metrics are also possible to derive from OBB (e.g. Kappa, R-square, etc.). In addition, variables predictive power (or importance) can be calculated through different approaches (e.g. Gini index, accuracy decrease, permutation) but permutation is mostly recommended [96].
While the vast majority of RF problems can be solved with a unique (or global) model; here, we followed the approach of Georganos et al.
[43] to combine GWR and RF to derive multiple spatially weighted (or local) RF models. This is possible if during the bagging step of RF, we assign to the neighbor observations of i a sampling probability based on the distance weights or W(i). For this, we can reshape step one from the previous RF workflow as follows: For b = {1,. . .,B}: 1. Apply W(i) probabilities in sampling, with replacement, for n training samples from T. Sample n random variables from V. Set them as Contrary to a global model, in this approach, an ensemble of spatially weighted (or local) RF models are obtained. Their features can be mapped and among them we can mention: 1) local variables importance (LVI), which shows variables predictive power locally (or spatially), for each variable in V; 2) prediction results, which can be also reported as probabilities; and 3) model performance. The latter includes specific metrics for classification (e.g. kappa index, confusion matrix) or regression (e.g. r-squared, mean absolute deviation). To compute these features, in the next section we explain how we implemented GWR and RF.

Implementation of GWR and RF
To operate this GWR and RF as an algorithm, we used the ranger [42] and GWmodel [97] packages. The first case, is a fast C++ and R implementation of RF that allows weights for sampling training observations. This parameter, called case weights in the ranger function, is used to define the spatial weights W(i) for observations near i. The second case, is a complete toolbox for the geographically weighted approach, including functions for: regression analysis, spatial metrics, weight-decay functions, among others. Since the proposed algorithm (named from now as GWRF) involves multiple steps, we summarized them as a pseudocode: Note that the algorithm inputs require to define the dependent variable and we set it as DEF and REF to refer to the annual deforestation and reforestation rates (See Table 1). As the algorithm assume that the rest of variables are independent, we can express them according to their variables groups names:

Vars ¼ fLandscape; Commodities; Socioeconomic; Socioculturalg
Where Vars refers to all variables described in Table 3 and Table 4. Now, we can represent the calibration of GWRF for DEF as: And similarly, for REF as: After reading and preparing models (step 1), a loop is defined for process each element in the spatial dataset Sp. This processing included operations such as: • Data cleaning (steps 4, 5, 8, 10 and 11), following Genuer et al. [98] for recommended practices in RF variables selection analysis; • GWR calculations (steps 2, 3 and 6), following Gao  • Accuracy assessment (step 12 and 13), calculating Kappa and prediction failure in classification; and R-squared and residual standard error in regression. This assessment is conducted with the OBB samples; • and Storing outputs (step 14 and 15). These included: LVI score for each variable in Vars, predictions and probabilities (YHAT), and models accuracies (ACC).
Since all above calculations were computing demanding, we implemented GWRF for parallel computing but processing time depended of kernel bandwidth K bw (See S1 Appendix). Furthermore, we tested GWRF for classification, reclassifying DEF and REF rates into 5 classes (see Fig 4), while for regression we maintained them as continuous values. To decide the best approach, we compared results of ACC for different values of K bw (see section 1.3.4).

DEF and REF linkeages assesment
Since LVI results were extensive, we first plotted LVI DEF|VARS and LVI REF|VARS in a radar plot [99] to observe how forest change rates were influenced by Vars. This facilitated identification of variables with opposite predictive power in DEF and REF, which were selected to map and observe with more detail. Moreover, we calculated variables correlation with forest change rates to further explore their similitude with LVI. For the next step, we followed Freitas et al.
[39] and clustered LVI DEF|VARS and LVI REF|VARS . For this, we used the expectation-maximization algorithm [100], as it allows continuous and categorical data. We determined 2 clusters based on the gap statistics [101] to later select the one with the highest rate. We assume that these areas represent active forest change fronts with similar variables importance, which distil their driving forces. We named them as CLUS DEF  We applied the Wilcoxon rank sum test [102], which computes P-values that test the H0 hypothesis that the two groups have the same distribution. If H0 was rejected (P-value > 0.05), we assumed a difference and subtracted the median values to observe if CLUS DEF|Vars was higher or lower than CLUS REF|Vars . This operation allows us to classify results for each variable in Vars according three categories: • DEF and REF were equal (H0 is accepted; similar DEF and REF medians); • DEF was greater (H0 is rejected; DEF is higher than REF median); • DEF was lower (H0 is rejected; DEF is lower than REF median).
Additionally, we calculated the Cliff's Delta [103] to observe the effect size of Vars in DEF and REF. We applied the thresholds provided in Romano [104] to classify this metric into 4 classes (i.e. negligible, small, medium and large) and complete our analysis.

Comparison between dasymetric mapping and censuses
We processed 21 census variables from 2001 to 2010 with dasymetric mapping (DAS) to compare their population densities with those derived from unprocessed censuses (CEN). For this, we calculated the population density for each variable and census year, using DAS and CEN data sources to plot them and highlight their differences (Fig 7A). DAS exceeded CEN for variables with larger values (e.g., L_spa, G_pom, G_pof) but fell behind for those with smaller values (e.g., L_wao, H_sma, E_lit). On average, DAS obtained 0.06 ± 0.04 and 0.15 ± 0.12 persons/ha for the population density in 2001 and 2010, while CEN obtained 0.02 ± 0.01 and 0.09 ± 0.07 persons/ha, respectively. This means that DAS exceeded CEN by 145% in 2001 and 58% in 2010. Furthermore, we subtracted population densities in 2001 and 2010 in both sources to obtain their change (Fig 7B). Similarly, it was observed that DAS exceeded CEN for variables with large values (e.g., L_spa, G_pom, E_sec) but was inferior for those with small values (e.g., H_lar, L_wao, E_lit). Averaging all variables, DAS obtained 0.08 ± 0.08 persons/ha for the population density change between 2001 and 2010, while it was 0.07 ± 0.08 persons/ha for CEN; i.e., an increase of 27% by DAS. Interestingly, by DAS, variables H_lar and L_wao showed decreases of -23.2% and -31.2%, respectively, for the population density change between 2001 and 2010.

Accuracy assessment of GWRF
We analyzed GWRF according to eight bw values and two approaches (classification and regression) to decide which one achieved the highest accuracy. It can be seen that the classification improved the Kappa as the size of bw increased, around 400 observations stabilized and variability was reduced. Moreover, results for REF were better than for DEF, showing in both cases a logarithmic curve with increasing bw (Fig 8A). This was shown by different regression results, as the R-squared diminished with bw > 100 for DEF but was not seen as relevant for REF (Fig 8B). This was interpreted as unexpected, as it is know that accuracy increases with larger values of bw [93] until its value is large enough to cover all the study area and become a global average. For these reasons, we decided to use a classification approach for bw = 400, as larger values did not significantly improve results, achieving a Kappa of 96 ± 2% for DEF and 97 ± 1% for REF. This indicated that the classification approach resulted in adequate predictions for all classes considered but not in regression. This was probably due to the imbalanced sampling introduced by the kernel during calculations. Finally, we mapped the Kappa for GWRF with bw = 400 to observe its spatial distribution (Fig 9). Here we could see that the lowest relative Kappa values (74-95%) covered areas with the largest rates (>2.5%) in DEF and REF. This implies that GWRF resulted in poor predictions in areas where rates varied (e.g., JS or La Joya de los Sachas in DEF, and TN or Tena in REF) than in areas with homogeneous rate intervals or where few high rate peaks were observed.

LVI comparison and visualization
After the RF classification achieved acceptable results, we created radar plots using the LVI results for the four macro levels considered. In the case of Landscape (Fig 10A), it can be seen that variables related to land cover were important for both DEF and REF but those related to Biophysical seem to be more in DEF. Among them, C_pas, C_bls, and B_alt were more important in REF, while B_fer was only in DEF. Commodities (Fig 10B) indicates that variables related to Infrastructures were more important in REF but those related to Agriculture were more important in DEF. Among them, variables I_min and I_ngt were more important in REF, while A_cao, A_fru and A_plm were more important in DEF. The Socioeconomic ( Fig  10C) indicate that Work was more important in REF but Education was more important in DEF with the exception of E_ilt. Furthermore, the variable group Age showed a similar result Following, we mapped variables: B_fer, I_ngt, E_sec and H_med; as they showed opposite predictive power. Fig 11 shows results for DEF and here it can be seen that high LVI values are spatially related to high DEF rates as well. In REF, this was different as its observed low to medium LVI values (Fig 12) except for variables E_sec and I_ngt whose values are higher in areas with also high REF rates. This is particularly interesting, as these variables seem to be good predictors in both DEF and REF when correlation exists. Following, results from correlations between LVI and forest change rates (see S2 Appendix) indicated that E_pri and W_agr were also good predictors (correlation > 0.131), while worse predictors were A_fru and L_wao (correlation < 0.061). It was observed, that the latter were associated to zero LVI values (see Fig 10) as this was the result of the cleaning routine of GWRF (see section 1.3.5). Furthermore, it was observed that variables such as G_chm, E_hgr, H_med, L_kcw, B_alt and G_chf generated the opposed effect in DEF and REF. These variables have inverse relationships and highlights the complex structure of these land cover change dynamics. To facilitate the analysis of these dynamics, in the next section, we report results from the clustering and the hypothesis testing.

Clustering and hypothesis testing
After clustering LVI into two groups, we tested our hypothesis. In the case of CLUS DEF|Vars we observed that its rate q achieved 2.0 ± 7.5% and included 1160 grid cells (464,000 ha), while in CLUS REF|Vars the rate achieved 4.1 ± 16.4% and included 722 grid cells (288,800 ha). The CLUS DEF|Vars was larger than CLUS REF|Vars with 438 grid cells (175200 ha) but also its rate was lesser by 2.1%. Regarding their locations, both groups matched cantons where higher rates were observed (Fig 13). Interestingly, a boundary between CLUS DEF|Vars and CLUS REF|Vars appears at~482 m.a.s.l. at cantons CH, LR, TN, and AJ (El Chaco, Loreto, Tena and Arajuno), indicating the limit between these regions and their governing forest change phenomena. Moreover, an overlap was seen between CLUS DEF|Vars and CLUS REF|Vars but it was marginal as it included only three grid cells (or 1200 ha).
Following, we describe the results of the hypothesis. We first show results for variables where CLUS DEF|Vars and CLUS REF|Vars medians were equal. This is summarized in the Table 5 and it can be noted that only variables from the Socioeconomic and Sociocultural macro levels were present. These similitudes were also evidentiated by the Cliff´s delta, as resulting effect sizes were negligible. We identified the next variable groups: Education (E_hgr), Gender (G_chm), Household (H_med, H_sma), Language (L_kcw, L_spa and L_wao) and Work (W_agr) and could observe that variables H_sma (small families) and W_agr (agriculture workers) were close to be significant but their Cliff´s delta indicates that their magnitude effects were still negligible.
The next section belongs to variables whose H0 was rejected. We first report variables whose median value was lower in CLUS DEF|Vars . This is shown in Table 6 and here its seen variables from Commodities, Landscape and Socioeconomic macro levels exclusively. They were represented by variables groups: Work (W_ind, W_ser), Agriculture (A_mlk, A_cao, A_fru), Infrastructure (I_oil) and Biophysical (B_alt, B_rfl). From them, variables with the largest difference meant that CLUS DEF|Vars was characterized by: higher accessibility to oil palm extraction facilities (A_plm), closer distance to oil wells (I_oil), lower altitudes (B_alt) and lower annual rainfall (B_rfl). Following, higher accessibility to fruit, coffee and cacao collection centers (A_fru and A_cao) was also seen in CLUS DEF|Vars but their differences with CLUS REF| Vars were smaller. The opposite picture can be inferred from the abovementioned to describe CLUS REF|Vars characteristics. On the other hand, the zero value of L_wao variable can be attributed to its absence in these regions. Finally, variables A_mlk, W_ind and W_ser seem to achieved negligible differences; therefore, we qualified them as not relevant for our analysis. Contrasting these results, we now report variables whose median value was higher in CLUS- DEF|Vars . This is shown in Table 7 and in this case, all macro levels were observed. Among identified variable groups, we can mention: Infrastructure (I_ngt, I_min), Biophysical (B_fer), Land cover (C_frac, C_pas, C_sze, C_bsl), Age (D_adt, D_ygr, D_old), Education (E_pri, E_ilt, E_sec), Gender (G_pof, G_pom, G_chf) and Household (H_lar). From them, the only variable that achieved a large difference was bare soil frequency (C_bsl). This indicated that CLUS DEF|Vars was more prone to experience land clearing after tree removal. This is reasonable if we consider that forest succession implies vegetation regrowth after land clearing. Following, variables with medium effect size meant that CLUS DEF|Vars was characterized by: larger patches sizes (C_sze) and higher pasture frequency (C_pas). In addition, other variables with small effect size indicated higher population density for: secondary education (E_sec), older adults (D_old), Illiterate (I_ilt), and female chief household (G_chf). Remaining also in small magnitude, larger distance to mining infrastructures (I_min) seems to also characterize CLUS DEF|Vars . For the remaining variables, i.e. I_ngt, B_fer, C_fra, D_adt, D_ygr, E_pri, G_pof, G_pom and H_lar, a negligible effect size is observed and were less informative to identify differences between CLUS DEF|Vars and CLUS REF|Vars .

Utility of remote sensing time series-related products for FDD analysis
Some studies have successfully identified proximate causes in FDD analysis by using remote sensing time series-related products [105,106]. In this research, we extended these applications through the use of (i) grid-based rate calculations, (ii) derivation of land-cover metrics, and (iii) trend analysis of Nighttime Lights Time Series to explore correlations with DEF and REF rates. Except for the (i), due its simplicity, the other two deserve further discussion as they represent innovative approaches which are not well documented to our current knowledge. Derivation of land-cover metrics indicates that it is possible to extract additional information from remote sensing time series that can be useful for determining the degree of land-use intensity from previous or posterior land-cover change events. This has been done using spectral trajectories [107,108], but here we show how they can be derived from land-cover maps, with a less sophisticated approach and comparatively limited results. With more dense optical and radar time series availability, it might be possible to more precisely detect land-cover classes that are usually not identifiable by their spectral features (e.g., coffee, cacao, forest plantations) but rather by their spectrotemporal signatures, as other studies have demonstrated [109][110][111]. This could help to improve forest monitoring, but also improve agriculture-related accessibility models, which are more difficult to derive and validate. Furthermore, trend analysis of Nighttime lights Time Series has shown that despite the low spatial resolution, they are still useful for investigating unknown patterns that strengthen model predictions (see Section 1.4.3). This was made possible thanks to free cloud-based platforms that allow processing of vast amounts of data from remote sensing time series and derivation of unprecedented products [25,112]. This opens new possibilities for future research and reformulating known limitations due to processing capabilities and data availability. This does not mean that field data, novel algorithms, and local knowledge can be shared to such platforms naively, as sensitive information could be exposed and distributed without any ethical concern [113].

DAS achievements and failures in intercensal analysis
Previous studies applying DAS have shown its effectiveness and improved performance for census processing in urban areas [114,115], specifically its capability to harmonize data and allow intercensal analysis. However, few studies have explored DAS in rural areas, as it was done in this study to enhance our comprehension of underlying causes in FDDs. This is because DAS reduces uncertainty in rural population mapping, as census blocks in rural areas are generally large in their extension, depending mostly on larger administrative units and have small population counts. Therefore, their population density calculations result in low figures that tend to obscure negative trends. Assuming that the data were collected properly, we saw this effect with CEN results, as it hid negative trends for the variables L_wao and H_lar (i.e., variables with small counts), contradicting DAS as well as other research observations in this region [116]. This is particularly important, as future research may consider more precise mapping approaches than choropleths to perform more reliable population density calculations. Furthermore, as we used a road accessibility model to enhance its location, some observations are worth mentioning. First, this input data incorporated restrictions on non-forest masks with regard to areas less likely to be inhabited. Therefore, their use is valid under the assumption that road accessibility and rural populations are related. However, other transportation sources (e.g., rivers, airfields) may attract rural populations, especially among indigenous groups in the Amazon [117]. This can generate a bias effect that forces allocation of populations to exclusively road-related intervention areas. Moreover, errors in non-forest masks (e.g., confusion with nonanthropic deforestation events, misclassified pixels) could add additional noise that may explain why populations were allocated to areas not known to be occupied (see eastern side of canton CH in Fig 6B, which is a ridge). While identifying and eliminating these artifacts are important tasks in this approach, our recommendation is that future research must improve the methods of rural population mapping before applying DAS, such as using products from Nightlights products with higher spatial resolution than the used in this research.

Advantages and limitations of GWR and RF
GWR is a proven methodology for capturing spatial nonstationary relationships, not as a global overview but as a local estimate [118]. However, the use of this approach depends on its calibration (especially for the bw parameter) and variable selection to reduce its sensitivity to multicollineary. While some studies have proposed different ways to do this [119,120,121], in this study we present a novel approach using the RF algorithm. Even though it was not possible to determine the impact of variables directly from LVI, we could analyze all proposed variables, no matter their multicollinearity, noise, or even type. This represents an advantage in overcoming multicollinearity in GWR, but also selection bias effects [122], which are more complex to control in multivariate problems. Moreover, LVI and its mapping showed the predictive power of selected variables that helped not only to identify those relevant for modelling but also their spatial extent. This subsequently facilitated to extract regions with similar physical and human impact characteristics, which allowed us to discuss with more detail where spatial determinants were relevant to DEF and REF. This strengthens the need for better strategies in land planning. Also future work is needed to explore more applications of GWRFC, as we do not discuss other additional results (e.g. prediction probabilities) or RF model interpretation approaches (e.g. partial dependence plots), which are also possible to derive using this methodology (See S3 Appendix). Furthermore, clustering of LVI spatial representations to later extract the impact of variables without any transformation of original values should be considered as another advantage. While Wilcoxon rank sum test allowed to identify a similitude or difference between rates and variables; it was the Cliff´s Delta test, which gave additional detail to quantify these findings. Critics of null hypothesis significance testing [123,124] and GWR [125], may have found this procedure more convenient that application of parametric approaches in GWR, as assumptions failures and bias effects are less relevant to non-parametric algorithms such as RF. Its further LVI clustering and identification of focus areas allowed to conduct exploratory analysis of original data and statistical tests to better discriminate specific variables interactions.
Nevertheless, some limitations of the proposed methodology are important to mention. Our approach does not constitute a GWR but rather a geographically weighted RF classification (GWRFC; see S3 Appendix). As seen in Section 1.4.2, RF regression achieved a relatively poor performance with respect to GWRFC, forcing our FDD analysis from a regression to a classification problem. That is why we discretized our independent variables into classes and up-sampled unbalanced cases during RF training. The latter operation allowed the GWRFC to obtain acceptable results, as other studies also found [126]; however, is unknown whether unbalanced sampling affected RF regression as well. Breiman [41] also warned the limited performance of RF regression that may be also applied to our results. This highlights the need for further research, and our recommendation is that experimentation with other nonparametric algorithms, especially for regression analysis (e.g., support vector machines, neural networks) should be considered, as novel studies has shown [127]. However, we must caution that GWR is more accepted as an exploratory or interpolation technique rather than a predictive tool [128,129], something already known in the literature but with only marginal discussion [130].

Linkages between DEF and REF in the NEA
Prior works on FDD analysis in Latin America have documented contrasting dynamics where population growth, socioeconomic development, and agricultural expansion affect DEF and REF differently [131]. In our research, we extend these findings to identify more specific and localized FDDs, which enrich the explanations from those already known in the NEA (see Section 1.1). With respect to their location, our results indicated two hotspots that highlighted the tendency for DEF and REF to be spatially clustered, which supports Fagua et al. [132], showing that forest change is not an accidental process, but rather is determined by geographical location and intensity. In this regard, DEF showed a relationship with intense land-use changes associated with oil extraction, increasing nightlight intensity, suitability for commercial agriculture [133], and accesibility to facilities (especially for palm oil, coffee and cacao). This landscape verifies the expansion of the oil industry, economic oportunities, and colonization of the northern and western Ecuadorian Amazon [21,57,134]. This is in contrast to REF, as its biophysical setting (>482 m.a.n.m.) indicated less suitability for commercial agriculture (except for coffeee and cacao) and diminished accesibility to their facilities. Moreover, an increasing distance from oil wells but less distance from mining blocks indicates other natural resource extraction interests [135]. Here, agroforestry systems with patches less than 4 ha combining secondary forests, cacao, and coffee plantations seem to dominate the landscape. This is similar to the traditional "chakra" land-use system described by Torres et al. [136] and to naturally regenerated forests as a consequence of the abandonment of degraded pastures due to nutritional limitations of soils in the region [137,138]. The latter may explain why accessibility to milk production facilities was better than other related agriculture products in REF, but also compares well with Rudel et al. [9] for the highest probability of REF at the shortest distances to roads. This was manifested especially in abandoned pastures where colonos experienced an important out-migration from the late 1980s, as is also described by Carr [81] as a regional trend in Latin America. Nevertheless, future research may consider incorporate migration censuses to corroborate these findings and identify where they manifest locally.
These differences between DEF and REF landscapes were also reflected in their demographic structure. Despite people of all ages (especially older adults, i.e. 45-72 y) from colonos and Kichwas groups related to agricultural activities and secondary education showing more of a link to DEF than REF, we found some variables that highlighted their particularities. In this respect, the diminishing trend of high fertility and large families, which favors more REF than DEF, is remarkable. This resembles the demographic and forest transition theories [11,139] that fit well considering the economic development after the discovery of oil in the Ecuadorian Amazon. Another finding in this direction is increasing education years that seems to have a positive effect on DEF and REF but the latter only when is related to higher education. This suggests environmental externalities produced by education, as has also been reported in other regions [140,141]. Furthermore, a slight DEF association was seen where the male population exceeded the female population, a phenomenon observed in other studies [142,143], but it was not true at all in our case, as chief female households were more strongly associated with DEF, similar to the results of Sellers [144]. However, we found that the number of chief male households exceeded female households for both DEF and REF indicated different proportions. This suggests that land-tenure and land-use decision-making is mostly dominated by males in the NEA, but this could be different among ethnicities, since a diminished effect in both DEF and REF was observed for the Kichwa group respect to colonos but little or no effect compared with the Huaorani group. This agrees with the results of Sierra et al. [145], who reported levels of DEF and REF of 42.7% and 35.7% in the Kichwa territory and 0.3% and 0.4% in the Huaorani territory for almost the same period of time (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008). This indicates that language (as a proxy of ethnicity) together with gender should be considered in future research to better characterize and discuss FDDs, as other studies have also suggested [146,147].

Conclusions
This research underlines the importance of downscaling global problems to the local scale and assessing individual drivers of land-use change in coupled socioecological systems. Applying an experimental methodology fusing remote sensing time series products, dasymetric mapping, and GWRFC, we were able to support the analysis of the spatial distribution of the population and forest dynamics in the Ecuadorian Amazon in more detail. Our findings reveal that at the local scale, key FDDs identified at the global scale can be better described. This was demonstrated in our study, as different groups played different roles in forest change, with varying impact in different regions in the NEA. Accessibility to agricultural collection centers and distance to infrastructure had an influence on both DEF and REF. However, biophysical and land-cover variable groups demonstrated that they could not be minimized, since they are ancillary sources that support and corroborate findings focused on them, i.e., describing suitable conditions for agriculture or natural resource extraction. Furthermore, socioeconomic and sociocultural variable groups had a strong influence on untangling population dynamics and their relationship with forest change, which made interpreting the results challenging and final statements fuzzier. Nevertheless, combining forest dynamics and population information in a geospatial environment underlines their variable complexity and extent. Combining aspects of livelihood patterns can be more meaningful than using proxies to represent individual aspects. The results of this study also highlight the roles of education, gender, and language in forest dynamics, which are more studied in social sciences but therefore show a strong relevance also for environmental studies. Interdisciplinary expertise and transdisciplinary exchange are needed to foster a better understanding of coupled socioecological systems from local to global scales. This can only be facilitated by inter-and transdisciplinary research.
Supporting information S1 Appendix. Processing time for multiple bandwidth sizes using a sample dataset of 1000 obs. with 34 variables.