Fig 1.
A map showing the distribution of four river basins (i.e., New River, Illinois River, Brazos River, and Snake River) selected for this study in the contiguous United States.
All four rivers pass through multiple states. The IchthyMap database contains sufficient fish presence data in these four basins for developing and validating species distribution models (S2 File). Specifically, the number of presence records of non-game species used to develop species distribution models was 2,716 for the Brazos River Basin, 5,635 for the Illinois River Basin, 5,192 for the New River Basin, and 412 for the Snake River Basin.
Table 1.
The sources and descriptions of environmental variables used to develop species distribution models for the 76 native stream fish species in the United States.
Table 2.
A summary of the analysis of covariance (ANCOVA) [58].
Fig 2.
Comparison of the performance of Lasso logistic regression models and boosted regression tree (BRT) models in terms of the area under the receiver operating characteristic (ROC) curve in 5-fold cross-validation for 76 species in the four selected river basins (i.e., New River, Illinois River, Brazos River, and Snake River).
The results from the two sets of models were generally in agreement, with Pearson’s r over 0.9. For Mountain whitefish (Prosopium williamsoni) and Torrent sculpin (Cottus rhotheus) in the Snake River (marked as circles), where occurrence data were relatively sparse, the Lasso logistic models outperformed the BRT models.
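The two quantities in this caption, the area under the ROC curve for a held-out fold and Pearson's r between per-species scores from the two model families, can be sketched as follows. This is a minimal illustration only: the function names `auc` and `pearson_r` and every number in it are hypothetical and are not taken from the study.

```python
from math import sqrt

def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen presence scores higher than a randomly chosen absence (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical held-out fold: 1 = presence, 0 = absence.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
fold_auc = auc(labels, scores)

# Hypothetical per-species mean AUCs for the two model families.
lasso_auc = [0.70, 0.80, 0.90]
brt_auc = [0.72, 0.81, 0.93]
agreement = pearson_r(lasso_auc, brt_auc)
```

In a 5-fold design, `auc` would be evaluated once per held-out fold and averaged per species before the two sets of averages are correlated.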
Table 3.
A summary of Tukey's test [6,4] following the analysis of variance that evaluated the sources of effects on the performance of species distribution models.
Fig 3.
The effect of prevalence (i.e., the proportion of presences among all the observations) on the performance of species distribution models.
The total sample size (N) for the two rare species (R), Candy darter (Etheostoma osburni) and Spotfin shiner (Cyprinella spiloptera), was set at 100, whereas N was decreased from 300 to 100 for the two common species (C), Bigmouth chub (Nocomis platyrhynchus) and Northern hog sucker (Hypentelium nigricans), to evaluate the effect of sample size.
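Prevalence as defined here, and one way to draw a subsample with a fixed total size and a fixed share of presences, can be sketched as below. The subsampling procedure is an assumption made for illustration; the caption does not describe how the samples were drawn, and the helper `subsample` and the record pool are hypothetical.

```python
import random

def prevalence(labels):
    """Proportion of presences (1) among all observations."""
    return sum(labels) / len(labels)

def subsample(labels, n_total, target_prevalence, seed=0):
    """Return indices of n_total records with a fixed share of presences."""
    rng = random.Random(seed)
    presences = [i for i, y in enumerate(labels) if y == 1]
    absences = [i for i, y in enumerate(labels) if y == 0]
    n_pres = round(n_total * target_prevalence)
    n_abs = n_total - n_pres
    if n_pres > len(presences) or n_abs > len(absences):
        raise ValueError("not enough records for the requested design")
    # Sample without replacement within each class.
    return rng.sample(presences, n_pres) + rng.sample(absences, n_abs)

# Hypothetical pool of records: 60 presences and 340 absences.
pool = [1] * 60 + [0] * 340
idx = subsample(pool, n_total=100, target_prevalence=0.2, seed=1)
sub = [pool[i] for i in idx]
```

Holding `n_total` fixed while varying `target_prevalence`, or the reverse, separates the effect of prevalence from the effect of sample size.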
Fig 4.
Examples of using partial dependence curves to capture ecological thresholds in the spatial distribution of species.
For example, thresholds of mean watershed slope (degrees) and number of stream-road crossings were identified for Rainbow darter (Etheostoma caeruleum) in panels A and B. Thresholds of 20-year (1961–1980) average annual minimum temperature and mean annual flow velocity were identified for Mountain redbelly dace (Chrosomus oreas) in panels C and D.
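A partial dependence curve averages model predictions over the observed data while one predictor is forced to each value on a grid. The sketch below illustrates that calculation with a toy stand-in for a fitted model. The steepest-step rule for locating a threshold is an assumption made for illustration, not the method reported in the study, and `toy_predict`, the predictor names, and the data are hypothetical.

```python
def partial_dependence(predict, rows, feature, grid):
    """Mean prediction when `feature` is set to each grid value while all
    other predictors keep their observed values."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

def steepest_step(grid, curve):
    """Midpoint of the grid interval with the largest absolute change."""
    jumps = [abs(curve[i + 1] - curve[i]) for i in range(len(curve) - 1)]
    i = jumps.index(max(jumps))
    return (grid[i] + grid[i + 1]) / 2

def toy_predict(row):
    """Hypothetical stand-in for a fitted species distribution model."""
    base = 0.8 if row["slope"] >= 5 else 0.2
    return base - 0.01 * row["crossings"]

# Hypothetical stream segments with two predictors.
rows = [{"slope": 2, "crossings": 0}, {"slope": 8, "crossings": 10}]
grid = [0, 2, 4, 6, 8, 10]
curve = partial_dependence(toy_predict, rows, "slope", grid)
threshold = steepest_step(grid, curve)
```

With the toy model, the curve jumps between slope values 4 and 6, so the rule places the threshold at the midpoint of that interval.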