Body Size and Geographic Range Do Not Explain Long Term Variation in Fish Populations: A Bayesian Phylogenetic Approach to Testing Assembly Processes in Stream Fish Assemblages

  • Stephen J. Jacquemin
  • Jason C. Doll

Abstract

We combine evolutionary biology and community ecology to test whether two species traits, body size and geographic range, explain long term variation in local scale freshwater stream fish assemblages. Body size and geographic range are expected to influence several aspects of fish ecology, via relationships with niche breadth, dispersal, and abundance. These traits are expected to scale inversely with niche breadth or current abundance, and to scale directly with dispersal potential. However, their utility to explain long term temporal patterns in local scale abundance is not known. Comparative methods employing an existing molecular phylogeny were used to incorporate evolutionary relatedness in a test for covariation of body size and geographic range with long term (1983 – 2010) local scale population variation of fishes in West Fork White River (Indiana, USA). The Bayesian model incorporating phylogenetic uncertainty and correlated predictors indicated that neither body size nor geographic range explained significant variation in population fluctuations over a 28 year period. Phylogenetic signal data indicated that body size and geographic range were less similar among taxa than expected if trait evolution followed a purely random walk. We interpret this as evidence that local scale population variation may be influenced less by species-level traits such as body size or geographic range, and instead may be influenced more strongly by a taxon’s local scale habitat and biotic assemblages.

Introduction

Attributing stream fish assemblage dynamics to random or deterministic factors is a long standing theme of community ecology [1], [2]. A current paradigm is that assemblages are highly organized by a variety of abiotic and biotic variables dictated by geographic and evolutionary scale [3], [4]. Specifically, local assemblage variation is linked to local scale factors such as predation [5], competition [6], habitat quality [7], and regional scale factors such as watershed land use type and history [8], stream size [9], and geologic history [10]. Unexplained assemblage variation is typically attributed to random noise or other untested mechanisms. Ultimately, however, assemblage patterns or characteristics are an emergent product of variation at the population level [11].

In addition to biotic and abiotic scale dependent factors, body size and geographic range are not necessarily independent of assemblage variation [12]. An inverse relationship between body size and abundance is expected as a function of energetic constraints [13] in both terrestrial [14] and aquatic [15] assemblages/ecosystems. Furthermore, macroecological studies have demonstrated a relationship between body size and geographic range [16], [17]. The expectation is that larger sized individuals are more capable of long range movements and thus, exhibit increased range sizes.

However, the utility of body size and geographic range as model predictors to describe long term population dynamics is understudied. Conceptually, small bodied species are expected to exhibit greater population variation as a result of higher intrinsic rates of increase r [17]. Similarly, species with larger geographic ranges are expected to be generalists for environmental niches [18] and more likely to exhibit stable populations.

However, there are complications with testing the relationship between population variation and traits such as body size and geographic range. Traits are not independently distributed across species, due to varying lengths of shared evolutionary history among related species. Thus, comparative analyses account for the expected covariance structure across species, based on hypothesized evolutionary relationships. Tests for phylogenetic signal (e.g. Blomberg’s K [19]) indicate whether trait covariances among species are concordant with expected levels of evolutionary covariance (Brownian motion model of an evolutionary random walk), or alternatively, whether covariances are lower (indicating more diverging paths, or convergence of unrelated species) or higher (indicating more conserved traits). Furthermore, interpretation of phylogenetic signal values can facilitate conclusions regarding broad evolutionary processes of trait convergence or divergence [20].

A second complication involves quantitative issues associated with incorporating phylogeny into models that describe variation among taxa when predictors are collinear. Body size and geographic range are correlated [12], yet the relationship between these predictors and variation in abundance is of great interest [21], [22]. Incorporation of phylogeny into a model describing variation in abundance while accounting for collinearity among predictors is problematic with generalized least squares methods. However, Bayesian inference is an alternative statistical methodology which has been shown to result in more precise parameter estimates in phylogenetic models while accounting for collinearity among predictors [23], [24], [25].

The primary objective of this study was to test if body size and geographic range influence long term variation in local scale stream fish species abundance. Our secondary objective was to evaluate the phylogenetic signal of body size and geographic range associated with the stream fishes represented in our study. We hypothesized that body size and geographic range are negatively related to variation in long term population dynamics. We expected that taxa with small bodies and small geographic ranges would exhibit greater temporal variation in abundance as a result of energetic constraints, r vs. K selection mode, and small environmental niche.

Materials and Methods

Field collection

Fish were sampled yearly at six sites from 1983 to 2010 in the West Fork White River in East-Central Indiana (Indiana Department of Natural Resources Permit – JCD # 10-0098; see Table 1). Fish were collected following Simon and Dufour [26] and the Ohio Environmental Protection Agency (OEPA) methods for assessment of streams in the Eastern Corn Belt Plains ecoregion, in accordance with American Fisheries Society guidelines for the safe and ethical use of fishes in research. Sampling was completed at normal pool water levels while turbidity was less than 40 Nephelometric Turbidity Units. All sites were sampled with a boat mounted Smith-Root model 5.0 GPP electrofisher with a 5000-watt generator. Sampling proceeded on a linear reach for a distance of 15 times the wetted width with a minimum distance of 500 m. Fish were collected using a 3 mm stretch mesh net and placed into a live well for processing. All fish (see Table 1) were identified to species using regional keys [27], counted, and released at the site. Voucher specimens curated at the Bureau of Water Quality, Muncie, Indiana were also used for species identification. All sites were sampled as part of the Bureau of Water Quality's long-term fisheries monitoring program in White River.

Table 1. Species included in analysis with descriptions of CV (long term population variation among sites), maximum body size (cm), and geographic range (km2).

Data summary

Abundance per site was expressed as electrofishing catch per 1000 km. Body size and geographic range for each taxon were estimated using the Fish Traits database and standardized to their z-score ([27]). Z-scores were calculated as follows:

$$ z = \frac{x - \bar{x}}{s} $$

where $x$ is the observation, $\bar{x}$ is the mean value of the sample, and $s$ is the standard deviation of the sample. The Fish Traits database has been concatenated from numerous regional and local distribution and life history studies [28] and can be used in large taxonomic scale studies [29]. Taxonomic relationships used in comparative analyses (Figure 1) were from published molecular studies of Catostomidae [30], Cyprinidae [31], [32], [33], Centrarchidae [34], Percidae [35], and Ictaluridae [36]. Higher order relationships (e.g. family) were from Betancur-R et al. [37].
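The z-score standardization described above can be sketched in a few lines of Python. This is only an illustration: the function name `z_scores` and the body size values are hypothetical, and it assumes the sample standard deviation (n − 1 denominator), which the text does not state explicitly.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to z-scores: (x - sample mean) / sample SD."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD with an n - 1 denominator
    return [(x - m) / s for x in values]

# Hypothetical maximum body sizes (cm); illustrative values, not the study data.
body_size_cm = [6.5, 12.0, 35.0, 60.0, 155.0]
bs_z = z_scores(body_size_cm)
```

After standardization the values have mean 0 and standard deviation 1, so the intercept of a regression on these predictors refers to a taxon with average body size and average geographic range.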

Figure 1. Phylogeny of study taxa with body size (BS) and geographic range (GR) categories.

Darker bars indicate higher values. See Table 1 for raw data values.

Statistical analysis

Long term variation in species abundance was estimated as the coefficient of variation, cv, for each species at each site:

$$ cv_{ij} = \frac{s_{ij}}{\bar{x}_{ij}} $$

where $cv_{ij}$ is the coefficient of variation for species i at site j, $s_{ij}$ is the standard deviation of abundance of species i at site j, and $\bar{x}_{ij}$ is the mean abundance of species i at site j. Given the setup of the model, only taxa that were collected at least once at each site over the collection period could be included. This resulted in a single species by site matrix of cv values.
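As a rough illustration of the coefficient of variation calculation, the sketch below computes cv for each species by site series. The species and site labels and the catch values are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev

def coefficient_of_variation(abundances):
    """Return SD / mean for one species at one site across years."""
    m = mean(abundances)
    if m == 0:
        # A species never collected at a site has no defined cv, which is
        # why such taxa could not be included in the analysis.
        raise ValueError("mean abundance is zero; cv is undefined")
    return stdev(abundances) / m

# Hypothetical yearly catches keyed by (species, site); not the study data.
catch = {
    ("species_a", "site_1"): [10, 12, 8, 10],
    ("species_b", "site_1"): [1, 40, 2, 30],
}
cv = {key: coefficient_of_variation(series) for key, series in catch.items()}
```

In this toy example the second series fluctuates far more relative to its mean, so it receives the larger cv.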

We modeled $cv_{ij}$ as a linear function of body size and geographic range incorporating phylogenetic relationships following de Villemereuil et al. [23]. Here $cv_{ij}$ is modeled as a multivariate normal distribution where the mean is a linear function of body size, $bs_i$, and geographic range, $gr_i$, and the variance-covariance matrix, Σ, is proportional to the shared branch lengths from the root of the tree to the common ancestor of each pair of taxa (Figure 2).

Figure 2. Variance-covariance matrix of a generalized phylogenetic tree.

Variance is set to the branch length from the root to the tip and the covariance is the branch length from the root to the most recent common ancestor (adapted from de Villemereuil et al. 2012).

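The model can be written as:

$$ cv \sim \mathrm{mnorm}(\mu, \Sigma), \qquad \mu_i = \alpha + \beta_1 \, bs_i + \beta_2 \, gr_i $$

where $\mu_i$ is the mean of each species cv from the multivariate normal distribution (mnorm), $\alpha$ is the intercept and represents the hypothetical mean cv with a body size and geographic range of zero, and $\beta_1$ and $\beta_2$ are model coefficients representing the effect of body size and geographic range.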
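The variance-covariance structure described in Figure 2 can be sketched for a toy tree. The three-taxon tree, the node names, and the helper functions below are hypothetical and are not taken from the authors' JAGS code. All branch lengths are set to 1, mirroring the equal branch length assumption.

```python
def root_path(tip, parent):
    """Return the edges (named by their child node) from a tip up to the root."""
    path = []
    node = tip
    while node in parent:  # the root has no parent entry
        path.append(node)
        node = parent[node]
    return path

def phylo_vcv(tips, parent, length):
    """Variance = root-to-tip length; covariance = root-to-MRCA length."""
    paths = {t: set(root_path(t, parent)) for t in tips}
    return [
        [sum(length[edge] for edge in paths[a] & paths[b]) for b in tips]
        for a in tips
    ]

# Hypothetical tree ((A, B), C) with every branch length equal to 1.
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
length = {node: 1.0 for node in parent}
tips = ["A", "B", "C"]
vcv = phylo_vcv(tips, parent, length)
# vcv rows: A -> [2, 1, 0], B -> [1, 2, 0], C -> [0, 0, 1]
```

Taxa A and B share one branch below their common ancestor, so their covariance is 1, while C shares no history with either beyond the root and has covariance 0.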

We used published molecular hypotheses to represent the phylogenetic relationship between species and used this single tree with an inverse-Wishart prior as a prior for the variance-covariance matrix, Σ [23]. We assumed equal branch lengths. As a caveat to this analysis, if model parameters are identified as important, the robustness of the model to choice of variance-covariance structure could be evaluated by generating and using a distribution of random trees [38] as the prior for the variance-covariance matrix.

Since body size and geographic range are known to be correlated [12], we used a Bayesian Lasso approach to include both variables in the model. The Bayesian Lasso is a variable selection technique that uses a double-exponential prior on the coefficients [24], [25]. The Bayesian Lasso shrinks the weakest parameter toward 0, thus providing a variable selection method with correlated predictors.
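To illustrate why a double-exponential prior encourages shrinkage, the sketch below compares its log density with that of a normal prior of equal variance. The rate value is hypothetical, and the authors' actual prior specification is in Appendix S1, not reproduced here.

```python
import math

def laplace_logpdf(beta, rate):
    """Log density of a double-exponential (Laplace) prior centered on 0."""
    return math.log(rate / 2.0) - rate * abs(beta)

def normal_logpdf(beta, sd):
    """Log density of a normal prior centered on 0."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - beta ** 2 / (2.0 * sd ** 2)

rate = 1.0                   # hypothetical shrinkage rate
sd = math.sqrt(2.0) / rate   # normal prior with the same variance, 2 / rate**2

for beta in (0.0, 0.5, 2.0, 5.0):
    print(beta, round(laplace_logpdf(beta, rate), 3), round(normal_logpdf(beta, sd), 3))
```

Relative to the matched normal prior, the double-exponential prior places more density both at 0 and in the far tails. Weakly supported coefficients are therefore pulled toward 0, while strongly supported coefficients are penalized less.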

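We used Bayesian inference to estimate parameters of the model. Bayesian inference is based on Bayes’ Theorem:

$$ P(\theta \mid X) = \frac{P(X \mid \theta) \, P(\theta)}{\int P(X \mid \theta) \, P(\theta) \, d\theta} $$

where $P(\theta \mid X)$ is the posterior distribution of the parameters, θ, given the data, X; $P(X \mid \theta)$ is the likelihood function and represents the probability of the data, X, given the parameters, θ; $P(\theta)$ is the prior distribution of the parameters, θ; and the denominator is a normalizing constant.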
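As a minimal illustration of Bayes' Theorem, the sketch below approximates a posterior on a grid for a single parameter. It is not the MCMC procedure used in this study. The data values, the unit-variance normal likelihood, and the flat prior are all assumptions made for the example.

```python
import math

def grid_posterior(grid, log_likelihood, log_prior):
    """Normalize likelihood x prior over a grid of candidate parameter values."""
    log_post = [log_likelihood(t) + log_prior(t) for t in grid]
    peak = max(log_post)  # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)  # plays the role of the normalizing constant
    return [w / total for w in weights]

data = [0.2, -0.1, 0.4]                     # hypothetical observations
grid = [i / 100 for i in range(-200, 201)]  # candidate values of theta

def log_likelihood(theta):
    # Normal likelihood with known unit variance (assumption for this sketch).
    return -0.5 * sum((x - theta) ** 2 for x in data)

def log_prior(theta):
    # Flat (vague) prior over the grid.
    return 0.0

posterior = grid_posterior(grid, log_likelihood, log_prior)
mode = grid[posterior.index(max(posterior))]
```

With a flat prior the posterior mode falls at the grid point closest to the sample mean. MCMC samplers such as JAGS target the same posterior without evaluating the normalizing constant explicitly.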

We used vague (i.e., noninformative) priors for all model parameters except the variance-covariance matrix, Σ, to specify our prior uncertainty about the model parameters. The variance-covariance matrix, Σ, prior was constructed as the inverse of the single phylogenetic tree matrix specified above in a Wishart prior. We used the freely available JAGS 3.3 program [39] implemented in R 2.15.3 [40] using the rjags package [41]. Complete model specifications in the JAGS language can be found in Appendix S1 of the Supporting Information. We ran 3 MCMC chains for a total of 125,000 steps, discarding the first 25,000 steps as a burn-in period, and thinning every 5 steps. The burn-in period is necessary to reduce the effect of the starting values on the MCMC results [42]. Convergence of the MCMC algorithm was assessed using the Brooks-Gelman-Rubin (BGR) scale-reduction factor [43]. The BGR factor compares between chain variability to within chain variability. Convergence is obtained when the upper limit of the BGR factor is close to 1.00, indicating that there is not more variability between chains than within chains. Values below 1.10 are considered acceptable [42]. We additionally performed a posterior predictive check to evaluate model fit. This was conducted by calculating the posterior mean of the overall coefficient of variation for each species at each step in the Markov Chain. The 95% credible intervals from the estimated coefficient of variation were compared to the observed mean value for each species.
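The convergence diagnostic described above can be sketched as follows. This computes the basic potential scale reduction factor, not the corrected upper limit reported by standard software. The simulated chains and the helper names `gelman_rubin` and `burn_and_thin` are hypothetical.

```python
import math
import random
from statistics import mean, variance

def burn_and_thin(chain, burn_in, thin):
    """Drop the first burn_in draws, then keep every thin-th draw."""
    return chain[burn_in::thin]

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = mean([variance(c) for c in chains])         # W
    between = n * variance([mean(c) for c in chains])    # B
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)

rng = random.Random(1)
# Three well-mixed chains sampling the same distribution (simulated).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
# Three chains stuck around different values (simulated non-convergence).
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 3.0, 6.0)]

mixed_kept = [burn_and_thin(c, burn_in=500, thin=5) for c in mixed]
print(gelman_rubin(mixed_kept), gelman_rubin(stuck))
```

Chains that sample the same distribution give a factor near 1.00, while chains centered on different values give a factor well above the 1.10 threshold.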

Results

The analysis included 48,071 individuals representing 32 species collected from 6 sites along the West Fork White River (Muncie, IN, USA) spanning 1983 to 2010 (Table 1). Taxon body size ranged from 6.5 to 155 cm (mean 35 cm) and geographic ranges were from 481,459 to 8,850,545 km2 (mean 2,831,692 km2).

Bayesian hierarchical model

The BGR statistics for all parameters were less than 1.10, indicating the model converged after 100,000 iterations (33,333 steps per chain). The 95% credible interval estimates of the parameters for body size and geographic range overlapped 0 (Table 2), indicating there is no credible evidence to support a relationship with species coefficient of variation given the phylogenetic tree. When modeled separately with a normally distributed prior, the posterior distributions of the body size and geographic range coefficients did not overlap 0. All of the 95% credible intervals from the posterior predictive check of the cv overlapped the observed mean value (Figure 3). Species with high observed average cv corresponded with high credible interval values.
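The criterion of a 95% credible interval overlapping 0 can be illustrated with simulated posterior draws. The draws below are hypothetical and are not the posterior samples from this study. An equal-tailed interval is assumed.

```python
import math
import random

def credible_interval(samples, mass=0.95):
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper

def overlaps_zero(interval):
    """True when 0 lies inside the interval."""
    lower, upper = interval
    return lower <= 0.0 <= upper

rng = random.Random(42)
# Hypothetical posterior draws for two coefficients.
weak_effect = [rng.gauss(0.05, 0.2) for _ in range(5000)]
strong_effect = [rng.gauss(1.0, 0.2) for _ in range(5000)]

print(overlaps_zero(credible_interval(weak_effect)))
print(overlaps_zero(credible_interval(strong_effect)))
```

A coefficient whose interval contains 0 is treated as lacking credible support for an effect, as for body size and geographic range in the joint model.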

Figure 3. Results of the coefficient of variation posterior predictive check from the Bayesian hierarchical model.

Points are mean estimates from the 95% credible intervals, vertical bars are the bounds of the 95% credible intervals, and solid triangles are observed mean coefficient of variation values for each species.

Phylogenetic signal

Body size exhibited low phylogenetic signal (K = 0.57; P<0.001; Figure 1), indicating that body sizes among related taxa were less similar than expected under Brownian motion. Geographic range also had low phylogenetic signal (K = 0.33; P = 0.07; Figure 1).

Discussion

Long term variation in stream fish population abundances did not covary with body size or geographic range of taxa. This finding is contrary to our initial expectations; however, we do not interpret this as evidence that White River stream fish assemblages are random or stochastic. In a recent study of the same White River fish assemblage, Jacquemin and Doll [29] attributed a significant portion of the long term variation to differences in habitat and niche breadth (measured as association with particular substrate types, flow regime, woody debris, submerged vegetation, and distribution elevation) and responses to environmental variation among species. Specifically, Jacquemin and Doll [29] found that species with more general habitat niches showed smaller fluctuations in abundance through time. We interpret this as evidence that local scale stream fish assemblages are more closely aligned with environmental variation as a result of their respective niches than other traits such as body size or geographic range. However, while long term data provide a robust measure of local assemblage variation we suggest expanding spatial and taxonomic coverage through the addition of sites in other watersheds that may yield different results.

Ignoring multicollinearity in model parameters (e.g., body size and geographic range) can result in increased standard errors of the coefficients, which can result in variables being found non-significant in traditional analysis. Thus, the relationship between variation in abundance and body size or geographic range is often tested independently [16], [21], [22]. Typically, multicollinearity issues are addressed by increasing sample size or removing one of the intercorrelated variables. Increasing sample size is often not an option, particularly when analyzing long term data sets. Further, removal of a variable may not be an option when there is strong theoretical justification for including both. This study is the first to our knowledge that tests for a relationship between variation in abundance and both body size and geographic range in the same model. The methods used here permit the inclusion of correlated variables and provide a quantitative method of determining which variable is more important in driving variation in abundance when the correlated variables are considered important when tested individually.

Interestingly, our results for phylogenetic signal (low K values) of body size and geographic range imply less similarity among close relatives in the assemblage than expected under a Brownian model. Low K values are typically attributed to high levels of divergence, the opposite of niche conservatism [20]. One potential source of influence outside of divergence may also be our use of branch lengths in the analysis. Kraft et al. [44] interpret an assemblage level of highly ‘derived traits’ as evidence for habitat filtering influence on taxonomic assemblage variation. Further study of phylogenetic signal of ecologically relevant traits may improve understanding of assembly patterns in freshwater stream assemblages.

We suggest that our results are particularly relevant to conservation biology. Rabinowitz [45] and others [46], [47] identified utility in using life history traits to define rarity and extinction risk. Our results expand on these studies to indicate traits that may not covary with long term population dynamics. We suggest that while body size and geographic range did not contribute directly to long term variation at the population level, these species traits could explain variation at the assemblage level. Post hoc graphical observations of the dataset support a generally lower abundance among taxa that are larger and generally higher abundance among taxa that are smaller in the White River fish assemblage (as predicted in mammals [14], and for other North American fishes [16]). Ultimately, any information for long term covariates of threatened or endangered species could be incorporated into management plans. The inclusion of evolutionary relationships into community assembly studies can provide insight into species distribution patterns and population dynamics [48].

Acknowledgments

We are grateful to Mark Pyron and several anonymous reviewers for providing feedback on earlier drafts of this manuscript. We acknowledge the Bureau of Water Quality (Muncie, IN) and all past biologists for their field collection efforts.

Author Contributions

Conceived and designed the experiments: SJJ JCD. Performed the experiments: SJJ JCD. Analyzed the data: SJJ JCD. Contributed reagents/materials/analysis tools: SJJ JCD. Wrote the paper: SJJ JCD.

References

  1. Gorman OT, Karr JR (1978) Habitat structure and stream fish communities. Ecology 59: 507–515.
  2. Grossman GD, Moyle PB, Whitaker JO Jr (1982) Stochasticity in structural and functional characteristics of an Indiana stream fish assemblage: a test of community theory. Am Nat 120: 423–454.
  3. Smith CL, Powell CR (1971) The summer fish communities of Brier Creek, Marshall County, Oklahoma. Am Mus Novit 2458: 1–30.
  4. Jackson DA, Peres-Neto PR, Olden JD (2001) What controls who is where in freshwater fish communities – the roles of biotic, abiotic, and spatial factors. Can J Fish Aquat Sci 58: 157–170.
  5. Trumpickas J, Mandrak NE, Ricciardi A (2011) Nearshore fish assemblages associated with introduced predatory fishes in lakes. Aquatic Conser: Mar Freshw Ecosyst 21: 338–347.
  6. Grossman GD, Ratajczak Jr RE, Crawford M, Freeman MC (1998) Assemblage organization in stream fishes: effects of environmental variation and interspecific interactions. Ecol Monog 68: 395–420.
  7. Waite IR, Carpenter KD (2000) Associations among fish assemblage structure and environmental variables in Willamette Basin streams, Oregon. T Am Fish Soc 129: 754–770.
  8. Harding JS, Benfield EF, Bolstad PV, Helfman GS, Jones III EBD (1998) Stream biodiversity: The ghosts of land use past. Proc Natl Acad Sci USA 95: 14843–14847.
  9. Vannote RL, Minshall GW, Cummins KW, Sedell JR, Cushing CE (1980) The river continuum concept. Can J Fish Aquat Sci 37: 130–137.
  10. Jacquemin SJ, Pyron M (2011) Impacts of past glaciation events on contemporary fish assemblages of the Ohio River basin. J Biogeogr 38: 982–991.
  11. Strong DR Jr, Simberloff D, Abele LG, Thistle AB (1984) Ecological communities: conceptual issues and the evidence. Princeton Univ. Press.
  12. Gaston KJ, Blackburn TM (1996) Global scale macroecology: interactions between population size, geographic range size and body size in the Anseriformes. J Anim Ecol 65: 701–714.
  13. White EP, Morgan Ernest SK, Kerkhoff AJ, Enquist BJ (2007) Relationships between body size and abundance in ecology. Trends Ecol Evol 22: 323–330.
  14. Damuth J (1981) Population density and body size in mammals. Nature 290: 699–700.
  15. Jonsson T, Cohen JE, Carpenter SR (2005) Food webs, body size and species abundance in ecological community description. Adv Ecol Res 36: 1–84.
  16. Pyron M (1999) Relationships between geographic range size, body size, local abundance, and habitat breadth in North American suckers and sunfishes. J Biogeogr 26: 549–558.
  17. Gaston KJ, Lawton JH (1988) Patterns in body size, population dynamics, and regional distribution of bracken herbivores. Am Nat 132: 662–680.
  18. Slatyer RA, Hirst M, Sexton P (2013) Niche breadth predicts geographical range size: a general ecological pattern. Ecol Lett online early.
  19. Blomberg SP, Garland Jr T, Ives AR (2003) Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 160: 712–726.
  20. Losos JB (2008) Phylogenetic niche conservatism, phylogenetic signal and the relationship between phylogenetic relatedness and ecological similarity among species. Ecol Lett 11: 995–1007.
  21. Blackburn TM, Brown VK, Doube BM, Greenwood JJD, Lawton JH, et al. (1993) The relationship between abundance and body size in natural animal assemblages. J Anim Ecol 62: 519–528.
  22. Lawton JH (1993) Range, population abundance and conservation. Trends Ecol Evol 8: 409–413.
  23. de Villemereuil P, Wells JA, Edwards RD, Blomberg SP (2012) Bayesian models for comparative analysis integrating phylogenetic uncertainty. BMC Evol Biol 12: 102.
  24. Park P, Casella G (2008) The Bayesian Lasso. J Am Statist Assoc 103: 681–686.
  25. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Series B 58: 267–288.
  26. Simon TP, Dufour R (1997) Development of Index of Biotic Integrity expectations for the ecoregions of Indiana V. Eastern Cornbelt Plain. U.S. Environmental Protection Agency. Region V. Water Division Watershed and Nonpoint Source Branch. Chicago, IL. EPA 905/R-96/002.
  27. Trautman MB (1981) The fishes of Ohio. The Ohio State Univ. Press.
  28. Frimpong EA, Angermeier PL (2009) FishTraits: a database of ecological and life-history traits of freshwater fishes of the United States. Fisheries 34: 487–495.
  29. Jacquemin SJ, Doll JC (2013) Long-term fish assemblages respond to habitat and niche breadth in the West Fork White River, Indiana. Ecol Freshw Fish 22: 280–294.
  30. Doosey MH, Bart Jr HL, Saitoh K, Miya M (2010) Phylogenetic relationships of catostomid fishes (Actinopterygii: Cypriniformes) based on mitochondrial ND4/ND5 gene sequences. Mol Phylogenet Evol 54: 1028–1034.
  31. Simons AM, Berendzen PB, Mayden RL (2003) Molecular systematics of the North American phoxinin genera (Actinopterygii: Cyprinidae) inferred from mitochondrial 12S and 16S ribosomal RNA sequences. Zool J Linnean Soc 139: 63–80.
  32. Mayden RL, Simons AM, Wood RM, Harris PM, Kuhajda BR (2006) Molecular systematics and classification of North American Notropin shiners and minnows (Cypriniformes: Cyprinidae). In: Lourdes Lozano-Vilano MD, Contreras-Balderas AJ (eds.), Studies of North American Desert Fishes in Honor of EP (Phil) Pister, Conservationist. Universidad Autonoma de Nuevo Leon.
  33. Schönhuth S, Mayden RL (2010) Phylogenetic relationships in the genus Cyprinella (Actinopterygii: Cyprinidae) based on mitochondrial and nuclear gene sequences. Mol Phylogenet Evol 55: 77–98.
  34. Near TJ, Bolnick DI, Wainwright PC (2005) Fossil calibrations and molecular divergence time estimates in Centrarchid fishes (Teleostei: Centrarchidae). Evolution 59: 1768–1782.
  35. Near TJ, Bossu CM, Bradburd GS, Carlson RL, Harrington RC, et al. (2011) Phylogeny and temporal diversification of darters (Percidae: Etheostomatinae). Syst Biol 60: 565–595.
  36. Hardman M, Hardman LM (2008) The relative importance of body size and paleoclimatic change as explanatory variables influencing lineage diversification rate: an evolutionary analysis of bullhead catfishes (Siluriformes: Ictaluridae). Syst Biol 57: 116–130.
  37. Betancur-R R, Broughton RE, Wiley EO, Carpenter K, López JA, et al. (2013) The Tree of Life and a New Classification of Bony Fishes. PLOS Currents Tree of Life. Edition 1.
  38. Paradis E, Claude J, Strimmer K (2004) APE: analysis of phylogenetics and evolution in R language. Bioinformatics 20: 289–290.
  39. Plummer M (2003) JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. In: Hornik K. et al. (eds.). Proceedings of the 3rd International Workshop on Distributed Statistical Computing, March 20–22. Vienna, Austria: DSC.
  40. R Development Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
  41. Plummer M (2013) rjags: Bayesian graphical models using MCMC. R package version 3–10.
  42. Gelman A, Carlin JB, Stern HS, Rubin DB (2004) Bayesian data analysis. Chapman and Hall/CRC.
  43. Brooks SP, Gelman A (1998) General methods for monitoring convergence of iterative simulations. J Comput Graph Stat 7: 434–455.
  44. Kraft NJB, Cornwell WK, Webb CO, Ackerly DD (2007) Trait evolution, community assembly, and the phylogenetic structure of ecological communities. Am Nat 170: 271–283.
  45. Rabinowitz D (1981) Seven forms of rarity. In: Synge, H. (ed.), The biological aspects of rare plant conservation. Chichester, J. Wiley, pp. 205–217.
  46. Wiens JJ, Graham CH (2005) Niche conservatism: integrating evolution, ecology, and conservation biology. Annu Rev Ecol Evol Syst 36: 519–539.
  47. van Kleunen M, Richardson DM (2007) Invasion biology and conservation biology time to join forces to explore the links between species traits and extinction risk and invasiveness. Prog Phys Geog 31: 447–450.
  48. Johnson MTJ, Stinchcombe JR (2007) An emerging synthesis between community ecology and evolutionary biology. Trends Ecol Evol 22: 250–257.