Climatic and Catchment-Scale Predictors of Chinese Stream Insect Richness Differ between Taxonomic Groups

Little work has been done on large-scale patterns of stream insect richness in China. We explored the influence of climatic and catchment-scale factors on stream insect (Ephemeroptera, Plecoptera, Trichoptera; EPT) richness across mid-latitude China. We assessed the predictive ability of climatic, catchment land cover and physical structure variables on genus richness of EPT, both individually and combined, in 80 mid-latitude Chinese streams, spanning a 3899-m altitudinal gradient. We performed analyses using boosted regression trees and explored the nature of their influence on richness patterns. The relative importance of climate, land cover, and physical factors on stream insect richness varied considerably between the three orders, and while important for Ephemeroptera and Plecoptera, latitude did not improve model fit for any of the groups. EPT richness was linked with areas comprising high forest cover, elevation and slope, large catchments and low temperatures. Ephemeroptera favoured areas with high forest cover, medium-to-large catchment sizes, high temperature seasonality, and low potential evapotranspiration. Plecoptera richness was linked with low temperature seasonality and annual mean, and high slope, elevation and warm-season rainfall. Finally, Trichoptera favoured high elevation areas, with high forest cover, and low mean annual temperature, seasonality and aridity. Our findings highlight the variable role that catchment land cover, physical properties and climatic influences have on stream insect richness. This is one of the first studies of its kind in Chinese streams, thus we set the scene for more in-depth assessments of stream insect richness across broader spatial scales in China, but stress the importance of improving data availability and consistency through time.


Site selection
Eighty sites were included in the study from three locations (West: 10 sites; Central: 29 sites; East: 41 sites) in the mid-latitudes of China between 25°and 36°N latitude, 98°and 118°longitude (Fig 1), and with an altitudinal gradient of 3899 m (min: 65; max: 3964 m asl; S1 Fig), spanning from the western Chinese Province of Yunnan to the Anhui province in the east. All the sites were selected to be relatively unimpaired sites, in near-natural conditions, based on present-day conditions. Most were located in landscapes dominated by forest, with limited agricultural and urban land cover.
Where permission to sample was required, it was granted by Cangshan, Baimaxueshan and Gaoligongshan Nature Reserve Administrations (some western sites), and Hubei Shennongjia National Nature Reserve Bureau (some central sites). All other sites were outside of nature reserves and thus no specific permission was required to sample the streams, given no endangered or protected species were collected or involved in sampling.

Biological data
Sampling was performed between November 2006 and October 2011. Sampling approaches and times differed between regions (West: October 2011; Central: November 2006 and April 2008; East: November 2010). Some of these data have been used in previous publications [28,[30][31][32]. However, sampling was performed to reflect habitat composition adequately. Macroinvertebrate samples were collected from a 50 to 100-m reach of the selected stream. Sampling at most sites followed a standardized multihabitat sampling approach [33], where between 10 and 20 subsamples representing the current habitat composition (total sampling area 1.25 m 2 ) were taken with a 500-μm shovel sampler. Eleven of the 29 central sites were collected using three 0.1-m 2 Surber sampler, but EPT richness did not differ between the two approaches within this region (Mann-Whitney U = 96.5, P = 0.928), so we included these 11 sites in the analysis. A single composite sample was made and samples were preserved with 75-95% alcohol (or 10% formalin in the 11 aforementioned central sites) in the field and sorted and identified in the laboratory to genus level where possible (with the exception of very few difficult-to-identify taxa) using available keys [34][35][36][37][38].

Environmental variables
To predict richness of EPT we used three categories of environmental variables: bioclimatic (based on immediate location of sampling site), catchment land cover (based on percentage of upstream land cover categories), and physical properties (based on elevation and catchment slope and size). To do this, we created watersheds for each site with the 'watershed' tool in Arc-Map 10.0 (ESRI Inc., Redlands, CA, U.S.A.), using the 'flow direction' layer of 30 arc-seconds (~1 km) resolution from the HydroSHEDS database [39] (http://hydrosheds.cr.usgs.gov). The catchment size in km 2 was calculated by summing up the number of grids (1 × 1 km). Catchment slope was also calculated in ArcMap 10.0.
Land cover data (Global Land Cover Database; http://bioval.jrc.ec.europa.eu/products/ glc2000/products.php, accessed 16 August 2013) was extracted from the entire upstream catchment of each site ( Table 1). The percentage of present land cover was calculated for the following land cover types: broadleaf trees, needleleaf trees, shrub, herbaceous, cultivated, and water ( Table 1). The cumulative land cover of the upper subcatchment upstream from individual sites has been found to be a better predictor of species occurrence than the land use at the sampling site itself [9,42].

Statistical analyses
All statistical analyses were carried out in R 3.0.2 [43].
To characterise and compare insect richness patterns and environmental variables between each of the three regions, we tested for differences using a combination of linear and generalized linear modeling (GLM). First, we tested for differences in environmental variables between the three regions with one-way ANOVA using the 'aov' function (S1 Fig). Second, we tested for normality of richness of each of the groups and EPT combined using the Shapiro-Wilk test of normality with the 'shapiro.test' function, followed by visual inspection of qq-plots. As richness was not normally distributed, we used GLMs with Poisson error distribution and log link functions to test for differences between the three regions. We followed these up with posthoc pairwise Tukey's tests to test for differences between individual groups, using the 'glht' function in the 'multcomp' package [44].
Boosted regression trees. To explore the influence of the environmental variables on stream insect richness for both combined and individual EPT groups, we used boosted regression trees (BRT) [45][46][47] in the package 'dismo' [48]. BRT is a powerful and advanced form of regression, which combines traditional statistical techniques with machine learning. The boosting approach of BRT enhances predictive ability by combining a large number of relatively simple regression trees [47,49], as opposed to producing a single optimal (complex) model of traditional least squares regression. BRT excels in its ability to model complex non-linear relationships, and is thus ideal for identifying key predictor variables.
We aimed to extract the most important drivers of richness, thus we used the gbm.step procedure, which takes a stepwise approach to model selection, and we used the poisson family loss function. Each successive tree is fitted to the residuals of those already selected, and shrinking is used to reduce the contribution of each tree and averaged across the final selected set of trees. For selection of the optimal number of trees, we used 10-fold cross-validation. Cross validation here allows testing the developing model with held-out data, ensuring a final model that is able to predict non-training data. We set the bag fraction of models to 0.5, which introduces randomness into the model, by randomly selecting a fraction of the training set to propose for the next tree. We used a learning rate of 0.0005 to ensure at least 1000 trees were reached, which is a relatively slow learning rate ensuring reliable model estimates. We set tree complexity to 5, which controls the number of splits in individual trees. Given the lack of latitudinal range in this study and the fact the three regions were well separated, as well as the strong west to east gradient of many of the environmental variables such as elevation, slope and rainfall, we performed two groups of analyses separately: one including and one not including latitude as a predictor variable (eight models total). This allowed us to tease apart the importance of latitude and the underlying environmental gradients driving patterns in richness. Furthermore, because of the clear grouping of sites into three distinct regions, we also included a variable 'region' into the models, to account for any potentially unmeasured or historic influences unique to the three regions.
For each of these two approaches, we used the same settings for each of the four models (EPT, E, P, T) and included all environmental variables from each of the three categories (climate, land cover and physical) in each model (Table 1). While gradient boosting methods are robust towards collinear predictor variables, to aid interpretability, we removed highly correlated variables (r > 0.75). This resulted in 7 of the original 21 bioclimatic variables being used ( Table 1).
The relative influence of individual predictors can be assessed in BRT, based on how often the variable is selected, and how this selection improves the model. These relative influences are then scaled so the sum of influences equal 100%. We also investigated the shape of the fitted functions of the top five variables influencing each of the four models using partial dependence plots. These plots indicate the model outcome in relation to the given independent variable, after considering the average effect of other independent variables in the model. We estimated overall model performances with cross-validated percentage of deviance explained, and the cross-validated correlation coefficient between observed and model-fitted values. While we present both the training data and cross-validated results, the training data results should not be evaluated as they have likely overfit the data.

Boosted regression trees
Important environmental variables. Climatic, physical and land cover variables all influenced richness of the four groupings (Table 2 and Figs 3 and 4). The only land cover variable to have a strong input in the models was percentage of catchment in broadleaf tree cover (Figs 3  and 4). Slope, elevation and catchment size all contributed substantially to individual models (Figs 3 and 4). Despite significant variation in climatic variables between the three regions (S1 Fig), no climatic variables had consistently strong influences for the three taxonomic groups (Figs 3 and 4). However, temperature variables were consistently more important than precipitation (Figs 3 and 4). Inclusion of latitude in the models did not improve model fit for any of the models (Table 2). Furthermore, the factor 'region' had little influence on any of the models.
Models for each of the three orders were considerably different from each other ( Table 2 and Figs 3 and 4). The only climatic variables to appear in the top five influential variables for more than one group were mean annual temperature and temperature seasonality (Figs 3 and 4).
Prediction of EPT richness. The EPT model explained 47.3% of the cross-validated deviance (CV correlation between raw and fitted values = 0.691 ± 0.061) in richness ( Table 2). The most important variables in the EPT model were elevation and the percentage of catchment broadleaf tree cover ( Table 2 and Fig 3), which both had a positive effect on richness after accounting for the average effects of all other variables (Fig 4). In general, the first five most influential variables suggest EPT richness is promoted in areas with high broadleaf tree cover, high elevation and slope, large catchment size, and low mean temperature (BIO1) (Fig 4). Latitude had little influence on the EPT model where it was included ( Table 2 and S2 Fig).
Prediction of richness for individual orders. The Ephemeroptera model explained 25.9% of the cross-validated deviance (CV correlation = 0.559 ± 0.048) in richness ( Table 2). Broadleaf tree cover was the most important influence on mayfly richness, with a similar influence on the model as in the EPT model (Table 2 and Figs 3 and 4). On the whole, the first five most influential variables on the mayfly model suggest mayfly richness is promoted in areas with high broadleaf tree cover, low PET, medium-to-high catchment sizes, high temperature seasonality (BIO4), and mean temperature of the wettest quarter (BIO8) (Fig 4). Including latitude as a predictor did not improve model fit, but the relative influence of latitude was high (19.4%), leading to a reduction in importance of climatic variables (Table 2 and S2 Fig). The Plecoptera model was able to explain 30.3% of the cross-validated deviance (CV correlation = 0.600 ± 0.098) in richness ( Table 2). Temperature seasonality (BIO4) was the most important influence on the stonefly model, with a negative influence on the response after accounting for the average effects of other variables (Table 2 and Figs 3 and 4). The first five most influential variables indicate stonefly richness is highest at sites with low temperature seasonality and mean annual temperature (BIO1), high slope and elevation, and high rainfall in the warmest quarter (BIO18) (Fig 4). Latitude contributed 10.3% to the Plecoptera model where it was included ( Table 2 and  The Trichoptera model explained 63.1% of the cross-validated deviance (CV correlation = 0.806 ± 0.039) in richness ( Table 2). Elevation had the strongest influence on caddisfly richness after factoring out other variables (Table 2 and Figs 3 and 4). Based on the most important variables, caddisfly richness appears highest at high elevation sites, with high broadleaf tree cover, and low mean annual temperature (BIO1), temperature seasonality (BIO4) and AI. Latitude had little influence on the Trichoptera model (Table 2 and

Discussion
Environmental predictor variables of local richness differed strongly, such that no variables, either climatic, land cover or physical, were consistently important for each of the three orders, and EPT in general. Furthermore, the amount of deviance explained in richness patterns was much higher for caddisflies compared to mayflies and stoneflies. These discrepancies likely reflect differences in habitat preferences, tolerances and dispersal abilities. For instance, stoneflies are generally considered to favour colder, more oxygen rich streams [13,50], which often restricts them to headwaters, whereas the opposite may be true for mayflies and caddisflies in certain cases (e.g. [13]). While our study was influenced by several factors including clear Table 2. Results of boosted regression trees. Results of the eight boosted regression tree models predicting genus richness of Ephemeroptera (E), Plecoptera (P), Trichoptera (T) and combined EPT from climatic, catchment land cover and physical variables in 80 streams across mid-latitude China. The first four models do not include latitude as a predictor. N trees = number of trees; dev = deviance.

Model Summary Information
Training differences between regions in stream and catchment properties, as well as different sampling dates between the regions, models had good cross-validated predictive ability. Furthermore, where season differed between sampling sites in the central, there was no difference in EPT richness. Interestingly, while latitude had a reasonably strong influence on the Ephemeroptera and Plecoptera models, it did not improve the predictive performance of any of the models. Furthermore, latitude did not operate as expected, particularly with regard to Plecoptera. Stonefly genus richness is substantially lower in the Oriental zoogeographic zone than the Palaearctic, which the sites sampled here span the boundary of [51]. However, a reduction in richness was not evident from north to south in our study (Fig 2), and in fact the southernmost sites were the richest, reflecting more suitable habitat conditions in the southwestern sites. This latitudinal influence was less expected for mayflies and caddisflies given the minor difference in their generic richness between these two zoogeographic regions [52,53].

Land cover influences
Given we selected sites based on minimal anthropogenic influence, we expected land cover influences to be minor on the prediction of richness. However, the most or second most  Table 1.  important influence on EPT and mayfly (but not stonefly or caddisfly) richness was the percent of catchment broadleaf tree cover, which had a positive, and relatively linear, influence on the models. This is in line with a recent study of EPT richness in the contiguous USA [6], although their study differed from ours in that their sites covered a range of land covers including agricultural. Furthermore, across a much broader scale, Vinson & Hawkins [5] found richness of each EPT taxa was highest in broadleaf forest biomes. Forest cover in this instance likely represented a host of other factors that were associated with it, such as substrate characteristics, temperature and nutrients; a finding also suggested by Beche & Statzner [6].
As we do not expect anthropogenic pressure to be driving differences in these streams, the reason for the relatively depauperate fauna in the eastern region sites is unclear. One potential reason could be related to underlying geology. However, it may also be related to the smaller average size of these eastern sampling sites (S1 Fig). While headwaters harbor high levels of biodiversity, this is often at the beta diversity level through turnover between sites rather than at the alpha diversity level [54][55][56]. This may be a result of lower connectivity in headwaters than downstream sections, leading to a possible transition from species sorting in headwaters to mass effects downstream [57]. This should result in higher alpha diversity in downstream sections compared to headwaters and vice versa for beta diversity.

Physical influences
We expected elevation to be an important predictor of richness given the large range between the three regions (see S1 Fig). However, while elevation strongly influenced the BRT models for caddisflies and total EPT, it had a minor influence on mayflies and stoneflies. This was mostly evident as an increase from 0-1000 m asl. Previous studies on stream invertebrates have found decreasing richness with elevation [58,59], while in our study, elevation had a positive influence on the EPT and caddisfly richness models. While stonefly richness clearly increased from east to west, elevation did not have a strong influence on their prediction. Nonetheless, when assessing the link with elevation separately, richness also increased significantly for stoneflies (S3 Fig), which somewhat fits with their previously stated temperature preferences. There was also evidence of a mid-domain effect (i.e. peak at intermediate elevation) for mayflies (S3 Fig), although this may be an artifact of the data related to reduced richness in the eastern region streams, which were also lower in elevation than the other sites.
The lack of influence of elevation likely reflects the fact that temperature and slope, which covary strongly with elevation, are more important factors than elevation itself. Therefore, elevational changes often affect species through indirect changes to factors such as temperature. The benefits of the approach we used here (BRTs) is clearly emphasized with its ability to tease apart the key and potentially more direct influences on richness as opposed to the more indirect influence of elevation. For instance, slope was a good predictor of stonefly richness, which fits with expectations for the preferences for cool, high-oxygenated streams. Nevertheless, it is important to bear in mind that streams were clustered in three distinct regions, which differed in their characteristics, such as elevation and slope, as well as sampling approach, giving rise to spatial autocorrelation in some predictor variables.

Climatic influences
Of the climatic drivers, temperature mean and seasonality were the most consistently important for all groups, with organisms favouring less arid areas with lower mean temperatures and seasonality. This trend was usually apparent in the middle of the range of temperature related variables. We expected temperature to play a major role in predicting richness due to species richness being more closely correlated with energy availability in regions without water deficit [60,61]. However, important temperature variables suggested cooler and less extreme (i.e. lower PET, AI, BIO1, BIO4) areas supported higher richness and their influence was variable across the three groups.
Given productivity and disturbance can have strong and often interactive effects on biodiversity patterns, both in general [62,63] and in streams [64][65][66], we expected that variables associated with disturbance (e.g. precipitation seasonality), despite being seasonally predictable in monsoonal areas, and productivity (e.g. PET and temperature of wettest quarter) may have important influences on richness patterns. This proxy-based approach was used by Vinson & Hawkins [5], who found productivity and disturbance strongly explained global patterns in stream insect richness. However, climatic factors that were consistently more important in our study represented temperature, rather than rainfall, seasonality and extremes (e.g. temperature seasonality and PET), which likely reflects the smaller spatial scale of our study compared to Vinson & Hawkins [5]. Models at the catchment scale (< 2000 km 2 ) in Chinese streams have found that seasonal hydrological events triggered by monsoonal precipitation are important predictors of species distributions of stream macroinvertebrates [31].

Historical and scale-related influences
While contemporary ecological factors may be more influential on stream insect diversity patterns than historical biogeographical influences (e.g. [5]), it is important to consider that historical land use may influence the present day biodiversity pattern [8]. Despite selecting for low impact sites based on the present-day conditions, the long history of Chinese civilization suggests there may be a historical land use signature present in these streams through centuries of use (mainly rice and tea plantations in the central and eastern zones and grazing in the west) that we were unable to determine at the site selection phase.
The factor 'region', which can be thought of as somewhat representing historical conditions, had little influence on the outcome of the BRT models, although many environmental variables differed strongly between the three regions. For instance, streams in the east were smaller (based on catchment size) and many other physicochemical variables differed (S1 Fig). A previous study on macroinvertebrate communities across a large spatial scale in China found spatial variation between locations was the best predictor of communities, with communities grouping based on location rather than some other environmental gradient [28]. The influence of local and regional influences on richness in streams varies with scale [19], and clear evidence of this scale-dependence exists in China for woody plants, for example [67]. The inclusion of localscale variables would likely have increased the predictive ability of models, as these are essentially finer resolution reflections of larger-scale influences. For instance, a recent study in pristine New Zealand streams found high levels of variation in EPT richness could be explained from variables limited to the local scale (i.e. variables limited to the sampling reach, rather than catchment based) [68].
Traditionally, but somewhat anecdotally, stoneflies have been considered weaker dispersers than other groups of stream insects (e.g. [69]), thus one might expect historical influences to shape their distribution more significantly than the other groups. However, as China has experienced a relatively benign climatic history, not experiencing glaciation during the Last Glacial Maximum (LGM), with the exception of the Qinghai-Tibet-Plateau, we did not expect strong dispersal limitation to be a major factor shaping these stream assemblages. In fact, niche/environmental factors were found to exert more control on EPT richness in contiguous USA streams than spatial factors suggesting little dispersal limitation for these taxa [6]. Furthermore, Heino and Mykrä [70] found no difference in the spatial structuring of mayflies, stoneflies, caddisflies and midges, which are considered to differ substantially in dispersal ability.
Nonetheless, historical influences may have had an influence on patterns with regard to the higher stonefly richness in western sites. Others have found high richness in this Eastern Himalayas region [71], known as a biodiversity hotspot. Higher stonefly richness in this region may be related to their weaker dispersal ability leading to greater species accumulation than the other areas as a result of the dynamic historical rising processes of the Qinghai-Tibetan Plateau [72].

Conclusions
The relative importance of climate, land cover, and physical factors on stream insect richness varied considerably between the three orders across these mid-latitude Chinese streams. No specific variables were consistently strong predictors of richness of the three orders or combined EPT, with broadleaf tree cover, elevation, slope, temperature seasonality and mean temperature being the most important. Furthermore, while latitude was an important predictor of Ephemeroptera and Plecoptera richness, it did not improve overall prediction of any of the groups. Given the lack of large-scale assessments of stream insect richness across China, this research opens the door for more in-depth research across broader spatial scales. However, for this to be possible it is crucial that data be made available and collection methods are consistent across studies.