Biodiversity Mapping in a Tropical West African Forest with Airborne Hyperspectral Data

Tropical forests are major repositories of biodiversity, but are fast disappearing as land is converted to agriculture. Decision-makers need to know which of the remaining forests to prioritize for conservation, but the only spatial information on forest biodiversity has, until recently, come from a sparse network of ground-based plots. Here we explore whether airborne hyperspectral imagery can be used to predict the alpha diversity of upper canopy trees in a West African forest. Tree species abundances were recorded in 64 plots (each 1250 m² in size) within a Sierra Leonean national park, and Shannon-Wiener biodiversity indices were calculated. An airborne spectrometer measured reflectances of 186 bands in the visible and near-infrared spectral range at 1 m² resolution. The standard deviations of these reflectance values and their first-order derivatives were calculated for each plot from the c. 1250 pixels of hyperspectral information within it. Shannon-Wiener indices were then predicted from these plot-based reflectance statistics using a machine-learning algorithm (Random Forest). The regression model fitted the data well (pseudo-R² = 84.9%), and we show that standard deviations of green-band reflectances and infra-red region derivatives had the strongest explanatory power. Our work shows that airborne hyperspectral sensing can be very effective at mapping canopy tree diversity, because its high spatial resolution allows within-plot heterogeneity in reflectance to be characterized, making it an effective tool for monitoring forest biodiversity over large geographic scales.


Introduction
Mapping biological diversity is a fundamental conservation priority [1] as threats from habitat loss, fragmentation and climate change [2] continue to increase, and international agreements to reduce biodiversity loss (e.g. the Aichi Biodiversity Targets, CBD 2010) require a basis for prioritizing their response [3]. The need is particularly great for tropical forests, because they are major repositories of plant diversity [4], [5], [2] and play a critical role in the global carbon cycle and climate change mitigation, as recognized in international processes such as REDD [6], [7]. However, effective large-scale mapping of biodiversity in tropical forests has proven challenging and spatial information about tropical forest biodiversity is scarce.
Airborne and spaceborne sensors are able to measure land cover characteristics over large scales, so have the potential to map plant biodiversity at these required scales [8], [9], perhaps because spectral variation of reflectance values is correlated with spatial variation in the environment by means of landscape structure and complexity [10], [11]. The diversity of vegetation was found to relate to the NDVI [12], [13], [14], [15] and both the richness and evenness of tropical tree species were found to correlate with Landsat TM reflectance [16], [17] and spaceborne hyperspectral imagery [18]. Both the alpha and beta diversity of temperate deciduous forest could be predicted using ASTER imagery [19]. In more specific applications, multi-temporal data have been used to discriminate areas occupied by native trees from those dominated by invasive alien species [20], [21]. Given that canopy-tree diversity is often a good proxy for diversity of other taxonomic groups, remote sensing may have potential for mapping biodiversity in general [22]. Nonetheless, results using spaceborne sensors have so far shown only moderate to poor predictive power, even when using high resolution imagery [23], possibly due to low spatial and radiometric resolutions. Furthermore, most sensors on satellites are unable to capture fine-scale variation in biodiversity [16], [12], [23], [24].
Airborne hyperspectral sensors enable mapping at the fine scales desired by land managers [25], [26]. Hyperspectral data may provide information on how chemical and structural properties of vascular plants vary within and across ecosystems [27], [28] and technological improvements now allow them to be used to monitor terrestrial ecosystem characteristics [29], [30]. Hyperspectral data allow individual tree species to be identified from their signatures collected at the forest scale when using airborne sensors [31], [32], [33]. The addition of co-registered LiDAR data further improves performance by identifying intra- and inter-canopy shadows which alter species signatures [34], [35]. [36] suggested that hyperspectral data reflect environmental conditions acting upon plants, such as soil pH, water availability, nitrogen availability and others, which are known to influence species distributions and community composition.
Few maps of tree biodiversity are available for Africa [37], [38], [39], [40], so our objective was to assess whether airborne hyperspectral data could be used for this purpose, focusing on canopy-tree biodiversity of a West African moist forest. We estimated alpha-diversity (Shannon-Wiener Index; [41]) in 64 permanently-marked plots in Gola National Park, Sierra Leone, which were also surveyed with a high-resolution airborne spectrometer. Among the wide range of modeling tools, we selected Random Forests [42], a machine learning algorithm which handles high dimensional input with ease and has been demonstrated to function robustly [33], [36], [43]. In previous studies using hyperspectral data, RF has been used by [33] to discriminate tropical tree species and by [36] to analyze the species richness of a temperate montane forest in Germany.

Study area and field data
The study area is located at the westernmost end of the West African Upper Guinean Forest Belt, in Sierra Leone, covering the central portion of the Gola Rainforest National Park (GRNP) and some of the southern portion (Fig. 1), and included in an area defined by UTM coordinates (WGS84, 29N) N307591, E858452 (northeast) and N253197, E807411 (southwest). GRNP is collaboratively managed by the Royal Society for Protection of Birds, the Conservation Society of Sierra Leone, and the Forestry Division of the Government of Sierra Leone; they provided the permits to collect the field data used in this study, and to fly over the area during airborne data acquisition. The field study did not involve endangered or protected species.
The region is characterized by lowland moist evergreen forests, with some drier types in places, dominated by the Fabaceae, Euphorbiaceae and Sterculiaceae families [44]. The GRNP area has been protected through conservation programs since 1989, but commercial logging, most intensive in the southern block, was carried out in 1963-1965 and 1975-1989. Recent land cover mapping highlighted the importance of the GRNP in conserving this forest from anthropogenic pressure in the surrounding areas [45]. The climate is moist tropical, with annual rainfall around 2500-3000 mm and a dry season from November to April coincident with leaf-off condition of some semi-deciduous tree species; altitude ranges from 70 to 410 m. Floristic information has been derived from a field survey carried out in 2006-2007 [46]. During that survey all trees with a Diameter at Breast Height (DBH) >30 cm were recorded in circular plots of 0.125 ha. We selected the plots covered by the hyperspectral airborne campaign, excluding those located less than 1 km from the park boundary and those affected by cloud shadow in the hyperspectral data, retaining a total of 64 ground truth plots.
The biodiversity of a particular group of organisms in a location can be quantified in terms of richness and evenness [47]. An abundance-based measurement of plant diversity, like the Shannon-Wiener Index, should reflect the structural variability of a landscape much better than species richness, because it captures differences in composition and dominance structure of a given plant community [16]. We calculated the Shannon-Wiener index for each plot, according to the formula $H' = -\sum_{i=1}^{R} p_i \ln p_i$, where $p_i$ is the proportion of individuals belonging to the ith species in the plot data (R = total number of species).
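As an illustration of the index calculation, the following minimal sketch computes the Shannon-Wiener index for a single plot from a list of tree identifications. It assumes the natural logarithm, and the function name and the example plot are hypothetical rather than taken from the field data.

```python
import math
from collections import Counter


def shannon_wiener(species_labels):
    """Return H' = -sum(p_i * ln(p_i)) for one plot.

    species_labels holds one entry per recorded tree, e.g. a species name.
    The natural logarithm is an assumption of this sketch.
    """
    counts = Counter(species_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty plot carries no diversity information
    h = 0.0
    for n in counts.values():
        p = n / total  # proportion of individuals of this species
        h -= p * math.log(p)
    return h


# Hypothetical plot: four trees belonging to three species.
example_plot = ["species_a", "species_a", "species_b", "species_c"]
print(shannon_wiener(example_plot))
```

A plot containing a single species returns 0, which corresponds to the lower bound of the index.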

Hyperspectral data
In March 2012 an airborne survey collected hyperspectral data over parts of the GRNP, using an AISA Eagle sensor with a field of view (FOV) of 39.7°, set to record 244 bands with 2.3 nm spectral resolution in the 400-1000 nm range and a spatial resolution of 1 m after radiometric correction and orthorectification (Fig. 2). Atmospheric correction of the hyperspectral image strips was performed using the Fast Line-of-Sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) algorithm [48]. Due to high noise levels, all the bands outside the 450-900 nm range and four bands in the 759-766 nm range were removed, reducing the total number of bands to 186. Minimum Noise Fraction (MNF) transformation [49] was used to reduce noise further in the dataset. For each image strip, 9 to 15 MNF components were selected by visual screening and used to compute the inverse MNF and to transform the whole set of bands back to the original data space.
For each of the 0.125-hectare permanent plots, we extracted hyperspectral information from about 1250 pixels, and summarised these data in three ways: (a) the minimum, maximum, mean, and standard deviation of reflectances were calculated for the 186 hyperspectral bands remaining after data cleaning (n = 744; 186 bands × 4 metrics); (b) first-order derivatives of the hyperspectral reflectance curves can be useful for data analysis, as they allow small variations of the spectral curve to be enhanced and background noise to be suppressed [50], [51], so these were generated by dividing the difference between successive reflectance values by the wavelength interval, and then applying a seven-point moving filter to smooth results [52], [53]; we calculated the minimum, maximum, mean, and standard deviation of the derivative values obtained for each plot (n = 716; 179 derivatives × 4 metrics); and (c) we calculated the Photochemical Reflectance Index [54], the Red Edge Normalized Difference Vegetation Index [55], the Atmospherically Resistant Vegetation Index [56], the Vogelmann Red Edge Index [57], the Red Green Ratio [58], the Simple Ratio [59], and the Anthocyanin Reflectance Index [60]. We refer to these three datasets as (a) reflectance-based metrics, (b) derivative-based metrics and (c) vegetation indices.
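The plot-level summaries in (a) and (b) can be sketched as follows. This is an illustrative outline only: the function names, the sample standard deviation, the unweighted moving average and the evenly spaced wavelengths are assumptions, and the edge handling of the smoothing filter is not stated in the text. Keeping only positions with a full seven-point window gives 186 - 1 - 6 = 179 values per spectrum, which is consistent with the 179 derivatives reported.

```python
import statistics


def band_metrics(pixels):
    """Per-band (min, max, mean, sd) over all pixels of one plot.

    pixels is a list of spectra, one list of reflectances per pixel.
    The sample standard deviation is an assumption of this sketch.
    """
    metrics = []
    for band_values in zip(*pixels):  # iterate band by band
        metrics.append((
            min(band_values),
            max(band_values),
            statistics.fmean(band_values),
            statistics.stdev(band_values),
        ))
    return metrics


def first_derivative(spectrum, wavelengths):
    """Difference of successive reflectances divided by the wavelength step."""
    return [
        (spectrum[i + 1] - spectrum[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(spectrum) - 1)
    ]


def moving_average(values, window=7):
    """Unweighted moving average, keeping only full windows (assumed)."""
    half = window // 2
    return [
        statistics.fmean(values[i - half:i + half + 1])
        for i in range(half, len(values) - half)
    ]


# Hypothetical wavelengths and one synthetic spectrum with 186 bands.
wavelengths = [450.0 + 2.4 * i for i in range(186)]
spectrum = [0.001 * w for w in wavelengths]
smoothed = moving_average(first_derivative(spectrum, wavelengths))
print(len(smoothed))
```

The derivative-based metrics in (b) would then follow by applying `band_metrics` to the smoothed derivative spectra of all pixels in a plot.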

Random Forests regression
We predicted the Shannon diversity index from spectral information contained in the three alternative datasets using Random Forests (RF), a machine learning algorithm employed in many different application domains [61], [42]. RF is a tree-based ensemble algorithm that generates hundreds or even thousands of alternative models (hence, 'forests'). In building a tree, instead of using the best split among all variables, the best split among a subset of randomly chosen variables is used (hence 'Random'). To incorporate the results from the hundreds of models, RF regression uses averaging. The importance of 'features' (i.e. explanatory variables) can be ranked in two ways. The first is the increase in OOB-MSE if a particular feature is removed. The second is the increase of purity among the splitting groups in the process of building a decision tree if a particular feature is used. We chose to use the first strategy to understand the relative importance of different spectral regions in correlating with biodiversity.
RF was selected after careful consideration of its advantages and shortcomings. An advantage of RF is that it has only two parameters to tune, the number of random features for each split (mtry) and the number of trees/models to build (ntree), and having few parameters makes the result highly repeatable. Unlike some other tools, it makes no assumption about the data distribution. The embedded Out-of-Bag (OOB) strategy, which sets about one-third of the samples aside for evaluation each time a model is built, provides unbiased internal error estimation and makes cross-validation unnecessary ([61]; http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr). The OOB strategy also makes feature (i.e. explanatory variable) ranking very straightforward. In our data set, there are only 64 plots, which represent a relatively small sample size considering the great variety of tree species and the vast areal coverage of the study area. Thus a tool using internal estimates is well-suited. However, RF does have some well-recognised limitations. Given that it is a nonlinear statistical modelling approach based on empirical data, models derived in one study region cannot be generalized to any new data sets. Additionally, different airborne data acquisition characteristics and preprocessing steps, such as atmospheric and radiometric corrections, further complicate direct reuse of a given model. We chose RF after careful consideration, but do not claim it is necessarily the best tool, nor has comparison been made with other regression methods to show that RF provides the most accurate results.
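The "one-third" figure for the OOB samples can be checked with a short calculation. The sketch below assumes that each tree is trained on a bootstrap sample of n plots drawn with replacement; under that assumption the expected share of plots left out of a tree is (1 - 1/n)^n, which is close to 1/e for n = 64. The function name and the seed are hypothetical.

```python
import random


def oob_fraction(n_samples, n_trees, seed=0):
    """Simulated mean share of samples left out of each bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Draw n_samples indices with replacement; the set keeps unique ones.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total += (n_samples - len(in_bag)) / n_samples
    return total / n_trees


# Analytical expectation for 64 plots, and a simulation with 200 trees.
expected = (1 - 1 / 64) ** 64
simulated = oob_fraction(n_samples=64, n_trees=200)
print(round(expected, 3), round(simulated, 3))
```

With 64 plots, roughly 23 plots would therefore be available on average to evaluate each tree.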
RF was implemented within the R statistics framework (randomForest package; [62]) using procedures followed in numerous other studies [63], [64]. [42] suggests mtry should be set at 1/3 of the number of input features, while ntree should not normally exceed 1000 [61]. We varied mtry but found 1/3 was a good setting, and varied ntree between 100 and 1000 before settling on 200 after examining the goodness-of-fit statistics. RF regression provides an estimate of the mean squared error of residuals, but this is calculated from the OOB strategy so is different from the MSE generated by least-squares regression. For this reason we call it OOB-MSE. We calculated a pseudo-R², equal to 1 - (OOB-MSE / variance of the observed response), which the package reports as the percentage of variability explained. Again, pseudo-R² is indicative, and cannot be compared directly with conventional R².
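The pseudo-R² can be sketched as below. This is an illustration, not the code used in the study: it assumes the common definition in which the OOB-MSE is divided by the population variance of the observed index values, and the function name and the example values are hypothetical.

```python
import statistics


def pseudo_r2(observed, oob_predicted):
    """Return 1 - OOB-MSE / Var(observed).

    oob_predicted holds, for each plot, the prediction averaged over the
    trees that did not see that plot. The population variance in the
    denominator is an assumption of this sketch.
    """
    n = len(observed)
    oob_mse = sum((o - p) ** 2 for o, p in zip(observed, oob_predicted)) / n
    return 1.0 - oob_mse / statistics.pvariance(observed)


# Hypothetical Shannon-Wiener values for three plots.
observed = [1.0, 2.0, 3.0]
print(pseudo_r2(observed, [1.0, 2.0, 3.0]))  # perfect OOB predictions
print(pseudo_r2(observed, [2.0, 2.0, 2.0]))  # no better than the mean
print(pseudo_r2(observed, [3.0, 2.0, 1.0]))  # worse than the mean
```

Under this definition the statistic becomes negative whenever the OOB predictions are worse than predicting the mean of the observed values for every plot.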

Forest plot data
The 64 plots contained a total of 133 species. In the cumulated sampled area (8.125 ha) the total number of recorded trees was 676. The 15 most common species (i.e. >10 individuals) comprised >50% of individuals (Table 1), with Caesalpinioideae being the most represented sub-family. The species-area curve showed that the sampled area was big enough to capture most of the large-tree diversity of the site [65] (Fig. 3). The Shannon-Wiener index ranged between 0 and 2.63, with a mean value of 1.68 and a standard deviation of 0.48.

Regression results
The RF models indicate that the Shannon-Wiener index can be predicted to a good level of accuracy using the plot-level statistics derived from hyperspectral bands (Figure 4 and Table 2). Models fitted using the reflectance-based metrics (i.e. calculated directly from the hyperspectral reflectances) had pseudo-R² = 84.9% and OOB-RMSE = 0.30. Models fitted using derivative-based metrics had lower explanatory power, with pseudo-R² = 71.4% and OOB-RMSE = 0.35. Vegetation indices were very poor predictors of diversity, giving rise to negative pseudo-R² values that indicate an inability of the models (on average) to explain any of the variability in biodiversity among plots. The mtry and ntree for the hyperspectral (HS) reflectance-based metrics were set at 340 and 200, respectively. The mtry and ntree for the HS first derivatives were 280 and 200, respectively.
The rank importance of 'features' (calculated from the percentage increase in OOB-MSE when features are removed one-by-one from the model) indicates that within-plot variation in hyperspectral reflectances is strongly correlated with the biodiversity index. Fig. 5 shows the ranking of hyperspectral reflectance-based metrics (maximum, minimum, mean, standard deviation of band reflectance) and Fig. 6 shows the ranking of the same metrics derived from the derivative-based dataset. When hyperspectral band metrics were used, the most important inputs were standard deviations from the green region, but contributions came from across the spectrum and from other metrics. When the derivative-based dataset was used, standard deviations from the near infrared region provided by far the highest ranking inputs, possibly due to the ability of the derivatives to suppress background signals that are prevalent in this region. In both of these models, the most important statistical metric was standard deviation, indicating that within-plot spectral variation is most informative in explaining diversity variation.

Discussion
The West African study is the latest in a series to show that airborne imaging spectroscopy can be effective at mapping tree diversity, particularly when recorded at high resolution [26]. The Random Forests algorithm found that within-plot variability in various hues of green was closely related to biodiversity (pseudo-R² = 84.9%). Spectral reflectances vary greatly within individual tree crowns and between tree crowns of the same species, and are influenced by viewing geometry, soil characteristics, forest vigor and the presence of lianas [31], [32], [66], [38], [67]. However, these statistical analyses seem to have picked up the same signal as the naked eye would: species-rich plots have a greater number of subtly different canopy colors than species-poor plots.
It is likely that the high resolution of our imagery (1 m pixel size vs >5 m diameter for a typical tree crown) was important to characterizing variability in spectral reflectances. In another study using high resolution imagery, [25] related vascular plant species richness in lowland forest in Hawaii to hyperspectral data from NASA's Airborne Visible/Infrared Imaging Spectrometer (pixel size of 3.6 m). They found that a regression model using derivative reflectances in regions associated with upper-canopy pigments, water and nitrogen content had a high goodness of fit (R² = 0.85). In contrast, [36] had less success with lower resolution imagery in German montane forests. Using HyMap hyperspectral imaging (VIS-SWIR with 7 m spatial resolution), they obtained a maximum R² of only 0.29 between species richness and reflectances, even when full waveform lidar data were included in the model. A better fit was obtained by the same sensors, this time at 5 m pixel size, when mapping the Shannon-Wiener index within a savanna ecosystem (R² of 0.41; [9]). Among the studies based on satellite data at lower spatial resolution, one successful result was obtained by [19], who retrieved the Shannon index with a coefficient of determination of 0.61 in a temperate forest using ASTER data. Another study directly estimating the Shannon-Wiener index in a tropical forest was carried out by [18] using the Hyperion sensor in Costa Rica with 30 m pixel size. Using wavelet decomposition followed by a stepwise regression, they found that the Shannon index could be predicted with an R² of 0.84; vegetation indices were poorer predictors than wavelet features. The selected bands were those from the shortwave infrared region and one from the visible region of the spectrum (621 nm). Our results are very similar to those obtained by [18] with respect to the ecosystem under analysis, the ability to retrieve the Shannon-Wiener index, and the poor results obtained with vegetation indices.
Together, these satellite- and airborne-based results suggest that spatial resolution is not the main key to successful mapping of biodiversity, and additional studies targeting different ecosystems are needed to clarify the relative importance of spatial and spectral resolution.
Derivative analysis might not be optimal for our aims and data, resulting in a lower R² (Fig. 4; Table 2), similar to what was found by [32] in a tropical tree classification study. A possible explanation is that differentiation is very sensitive to noise in the original spectrum. The residual noise is emphasized in the derivative spectra and this may vary according to the pixel location on the tree crown. In addition, environmental or stress factors such as moisture content and leaf age introduce subtle variations in crown reflectance that are enhanced by differentiation. Consequently, spectral variation within crowns can be unnecessarily boosted in the derivative domain, interfering with the identification of differences amongst crowns.
Species richness and the Shannon-Wiener index are both widely used as indices of diversity in the remote sensing literature, but we argue that the abundance-weighted index is more valuable from an ecological perspective. [25] estimated species richness and the Shannon-Wiener index in lowland Hawaii from AVIRIS, obtaining a better goodness-of-fit when using species richness. Similarly, [36] found that species richness was the better of the two response variables in terms of goodness-of-fit for German forests. The reason we recommend using the Shannon-Wiener index is that ecosystem processes, such as water balance and nutrient cycles, depend primarily on the functional characteristics of the most abundant species [68]. The Shannon-Wiener index is weighted in favor of abundant species, making it more useful for relating spectral signals to local ecological processes. However, a key open question in biodiversity studies is whether information on canopy biodiversity can be a surrogate for sub-canopy biodiversity; of interest in this respect is the research of [34], which estimated the diversity of foliar chemicals within the canopy as a whole using hyperspectral data, and related this to faunal and floral distributions.
There is currently great interest in using airborne remote sensing to go one step further, and map individual canopy species in tropical forests [26]. Biophysical and functional attributes of forest canopies can be used to distinguish among individual species [69], and help in understanding the relationships among spectral response, foliar biochemical components and canopy geometry. For instance, [20] used spaceborne hyperspectral data to map the spread of a nitrogen-fixing invasive tree in Hawaii, because the nitrogen-fixer was spectrally different from non-fixing trees. They also found that phenology is a key to distinguishing species, and suggested the need for intense multi-temporal monitoring to maximize species separability. [70] have also discussed the role of hyperspectral remote sensing in tracking plant invasions, highlighting that these data can inform predictive models of invasions and species habitat suitability analysis. Using AVIRIS data from Hawaii, [71] found that differences in canopy spectral signatures were linked to differences measured in leaf pigment (chlorophyll, carotenoids), nutrient (N, P), and structural (specific leaf area, SLA) properties, as well as to canopy leaf area index. In a study addressing how leaf spectroscopy scales to canopy level reflectances, [72] used a leaf optical radiative transfer model (PROSPECT-5) to explore the relationship linking classification accuracy at the leaf level to canopy biodiversity, and found that it showed an asymptotic trend which suggests the uniqueness of spectral signature for a significant proportion of the 188 studied tropical species. Detecting individual species from aircraft is more technically demanding than the analyses presented here, but the approaches hold great promise and may eventually dispense with the need for diversity-index mapping.

Conclusions
The present research demonstrates the ability of an airborne hyperspectral sensor to predict the canopy Shannon-Wiener index in African tropical forests, and is among the pioneering efforts that could open the way to improved biodiversity monitoring. Airborne hyperspectral sensors today represent an important and cost-effective tool to target areas with high biodiversity, high vulnerability to change (e.g., occurring on deforestation fronts) and/or with tree species that are of particular importance [66].
However, data acquisition in remote and biodiversity-rich study areas is still exceptionally challenging. Problems with data and ground truth gathering such as those we faced, including the time lag between field data collection and the airborne survey and the difficulties in obtaining accurate geo-referencing of field plots, might have affected the results and have to be carefully considered when planning hyperspectral-based biodiversity monitoring.
Our experience shows that the use of the standard deviation of reflectance provides satisfactory results, in agreement with the spectral variation hypothesis. We find RF an effective regression tool which is fairly easy to use, and the OOB feature ranking a valuable source of information on feature importance.
Overall, considering other available studies and results, there is a clear need for further research on the use of airborne and spaceborne hyperspectral imagery in different ecosystems, to enhance our understanding of the optimal techniques to map the distribution of life on earth. This should be accompanied by quality biodiversity field information collected with proper sampling strategies. When planning future studies, the addition of the SWIR spectral region should be considered, as well as of airborne laser scanner (ALS) data, recently reported as a valuable source of information for biodiversity [73], [35], [74].