DNA barcoding, as it is currently employed, enhances use of marine benthic macrofauna as environmental condition indicators by improving the speed and accuracy of the underlying taxonomic identifications. The next generation of barcoding applications, processing bulk environmental samples, will likely only provide presence information. However, macrofauna indices presently used to interpret these data are based on species abundances. To assess the importance of this difference, we evaluated the performance of the Southern California Benthic Response Index (BRI) and the AZTI Marine Biotic Index (AMBI) when species abundance data were removed from their calculation. Presence only versions of these two indices were created by eliminating abundance weighting while preserving species identity. Associations between the presence and abundance BRI, and the presence and abundance AMBI were highly significant, with correlation coefficients of 0.99 and 0.81, respectively. The presence versions validated almost equally to the abundance-based indices when applied to the spatial and the temporal monitoring data used to validate the original indices. Simulations in which taxa were systematically removed from calculation of the indices were also conducted to assess how large the barcode library must be for the indices to be effective. Correlation between the BRI-P and BRI remained above 0.9 with only 370 species in the library and reducing the number of species to 450 had almost no effect on correlation between the presence and abundance versions of the AMBI.
Citation: Ranasinghe JA, Stein ED, Miller PE, Weisberg SB (2012) Performance of Two Southern California Benthic Community Condition Indices Using Species Abundance and Presence-Only Data: Relevance to DNA Barcoding. PLoS ONE 7(8): e40875. https://doi.org/10.1371/journal.pone.0040875
Editor: Keith A. Crandall, Brigham Young University, United States of America
Received: September 1, 2011; Accepted: June 18, 2012; Published: August 7, 2012
Copyright: © Ranasinghe et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: No external funding. This work was supported by Southern California Coastal Water Research Project internal funds only. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Marine benthic macrofauna are frequently used as indicators of environmental condition because they reside in sediments where contaminants accumulate and their immobility allows them to integrate exposure at a site –. Benthic community composition is typically summarized using benthic indices that allow easy communication of complex biological information as a single number that ranks sites on a scale from good to bad. These index values allow managers to prioritize impacted sites, track trends over time, or correlate biological responses with stressor data.
Challenges in using benthos as indicators are the cost, time and error associated with identifying the biota. Organisms must be manually separated from the sediment, which can take more than a day and is subject to underestimation as some remain hidden among the debris . Every captured organism must then be identified, typically to species, which requires highly skilled taxonomists that provide expertise over the range of different taxonomic groups. This adds substantial cost and is subject to error, particularly when the specimens are damaged or immature life stages are present.
DNA barcoding has the potential to increase the speed, accuracy and resolution of species identification –. Barcoding involves identifying species based on a short gene sequence from a standardized portion in the genome, and for animals this is the mitochondrial cytochrome c oxidase 1 gene (CO1). Using standard molecular biology tools, DNA is extracted from the specimen tissue, and a 658 base pair region of the CO1 gene is amplified by polymerase chain reaction and sequenced . DNA from unknown specimens collected in benthic samples can be identified by comparing their barcode sequences to the Barcode of Life Data Systems (BOLD, http://www.boldsystems.org) reference library.
Traditional DNA barcoding is based on Sanger sequencing in which each specimen is processed individually. Next-generation sequencing has the potential to further reduce the time and cost for processing through bulk processing , , in which the entire sample is homogenized, tissue lysed, DNA extracted and the species composition of the entire sample determined using metagenetic analysis of one or more markers. While this promises greater speed and lower costs , it produces presence-only information and prevailing benthic condition indices require abundance for each species. Unlike analysis of prokaryotes where number of reads can be used as a surrogate for number of phylotypes, the same assumption cannot be made for next-generation sequencing of multicellular organisms. Another consideration is that barcoding provides identification of unknown specimens by matching them to species in the reference library. Therefore, the reference library must contain a sufficient number of species to allow environmental samples to be analyzed to a level where there is enough taxonomic resolution to apply commonly used benthic indices. Here we redevelop presence-only versions of two benthic condition indices used for assessments in southern California and compare their performance to their abundance-based counterparts. Second, we evaluate the minimum number of taxa needed in a reference library to calculate a credible presence-based index.
The two indices used to compare performance between abundance and presence-only derivations were the Benthic Response Index (BRI)  and the AZTI Marine Biotic Index (AMBI) . The BRI is based on the abundance-weighted average pollution tolerance of species in a sample. The pollution tolerance scores (“p-values”) are developed using ordination analysis to place sites along a pollution disturbance gradient, with pollution tolerance scores assigned to each species based on the position of its peak abundance along the gradient. Lower scores indicate sensitive species and higher scores indicate pollution tolerant species. The AMBI is also based on abundance-weighted pollution tolerances of species that are present, but tolerance is expressed categorically as one of five ecological groups. Species are assigned to ecological groups based on consensus expert judgment and assignments are transferrable among geographies.
For each sample, concentrations of eight metals are integrated as mean effects range median (ERM, ) quotients. For each metal, the ERM quotient measures the ratio of the observed concentration to the value at which biological effects are likely, and the mean ERM quotient for the eight metals is an integrated measure of the likelihood of contaminant effects.
The presence BRI (BRI-P) was calculated using the tolerance scores of Smith et al. , but the average tolerance score for the BRI-P was calculated as the sum of species tolerance scores divided by the number of taxa with tolerance scores, without abundance weighting. Assessment thresholds for the resulting index were then developed by regression against the BRI and selecting thresholds corresponding to the original BRI assessment categories: (i) natural benthic assemblages; (ii) the loss of biodiversity, above which 25% of the species pool occurring in reference samples no longer occurred; (iii) loss of community function, where echinoderms and arthropods were lost from the assemblage; and (iv) defaunation, where 90% of the species pool in reference samples no longer occurred.
The presence AMBI (AMBI-P) calculations followed Borja et al. , with the ecological group scores expressed as the percentage of species in each ecological group without abundance weighting. Assessment thresholds for the AMBI-P were developed by regression against the AMBI and selecting thresholds corresponding to the original AMBI assessment categories: (i) undisturbed, (ii) slightly disturbed, (iii) moderately disturbed, (iv) heavily disturbed, and (v) extremely disturbed.
Taxa were segregated into six categories along the disturbance gradient by BRI tolerance scores. The total number of taxa with tolerance scores in each category are presented above the bar indicating sequence availability. Barcodes were available for 163 of 462 taxa with tolerance scores.
Assessing performance of the Presence BRI and the Presence AMBI
Performance of the BRI-P and AMBI-P were assessed using the two southern California data sets used to validate the original BRI, both of which were independent of the calibration data used to develop the indices. The first tested whether the BRI-P and AMBI-P reproduced known temporal gradients of benthic conditions over several years near a southern California waste-water outfall using data from two Los Angeles County Sanitation Districts monitoring sites which were sampled monthly since 1972. The first site, Station 6C (located 2220 m from the outfall) was severely impacted in the early 1970 s and has improved since that time , . The second site, Station 0C (located 14,720 m from the outfall) was less affected than Station 6C, but has also improved. The hypothesis was that BRI-P index values should decrease over time at Stations 6C and 0C and that index values will be higher and decrease more at Station 6C than at Station 0C.
The second data set was used to test whether the presence indices reproduced a known spatial gradient between two stations on the 60 m isobath from the Orange County Sanitation Districts outfall. Previous studies have shown that Station 0 located near the outfall has altered species composition in comparison to reference Station Con, which is located 7840 m from the outfall .
The first data set was also used to confirm that the observed benthic index patterns were likely responses to pollution. Sediment contaminant effects were tested by comparing benthic index values to the combined effects of eight metals (arsenic, cadmium, chromium, copper, lead, mercury, silver, and zinc), and eutrophication effects were tested using organic nitrogen concentrations. The effects of the eight metals were combined by calculating mean effects range median (ERM ) quotients. For each metal, the ERM quotient measures the ratio of the observed concentration to the value at which biological effects are likely, and the mean ERM quotient for the eight metals is an integrated measure of the likelihood of contaminant effects. Pearson correlation coefficients were used to measure associations between annual means for the BRI-P and the AMBI-P benthic indices and mean ERM quotients and organic nitrogen concentrations. The hypothesis was that the benthic indices and one or both pollution measures would be significantly correlated.
Effect of Reducing the Number of Taxa
To assess how large the barcode library must be for an effective index, changes in Pearson correlation coefficients between the BRI-P and the BRI and the AMBI-P and AMBI were determined as taxa were systematically removed from calculation of the indices. To accomplish this, taxa were ranked according to their tolerance scores (BRI) or ecological groups and abundance (AMBI) and selected for removal at even intervals. This process was repeated with increasing percentages of taxa removed from the calculation.
There was a strong linear relationship between the BRI-P and BRI (Figure 1) with a Pearson correlation coefficient of 0.99 (p<0.0001). Application of the BRI-P assessment thresholds calculated by applying the equation from the calibration linear regression analysis (Table 1) resulted in 92% of the 493 calibration samples assigned to the same assessment category by both indices. Where assessment categories disagreed, the samples were always in adjacent categories, with no samples disagreeing by more than one category. The high agreement is reflected in the strong linear relationship in Figure 1.
The relationship between the AMBI-P and AMBI (Figure 2) was also significant, with a Pearson correlation coefficient of 0.81 (p<0.0001). Application of AMBI-P assessment thresholds (Table 2) resulted in 86.4% of the calibration samples assigned to the same assessment category by both indices.
Application of the BRI-P to the validation data resulted in patterns almost identical to the original abundance BRI results for both validation data sets, and AMBI-P patterns were similar to AMBI results. For the Los Angeles County outfall temporal data, the BRI-P (Figure 3) and the AMBI-P (Figure 4) accurately reflected the severe impacts at Los Angeles County Station 6C in the early 1970's, as well as the substantial improvement that occurred over time. Both the BRI-P and AMBI-P also correctly identified Station 0C, located 14.7 km from the outfall, as less affected than Station 6C with only slight biodiversity loss in the early 1970's and improvement to reference conditions in the 1990's.
The BRI-P (Figure 5) and AMBI-P (Figure 6) also correctly identified the spatial patterns near the Orange County outfall, with Station 0 located adjacent to the outfall having poorer benthic condition relative to Station Con, which is located 7.8 km away. Both indices also retained similar among-station differences across all seasons, as was observed in the original BRI validation.
The BRI-P and AMBI-P index values for the Los Angeles County outfall temporal data were significantly correlated with mean ERM quotients and sediment organic nitrogen concentrations (Table 3), indicating that index values were likely responding to anthropogenic pollution. Correlation coefficients for the presence based indices were comparable or stronger than the abundance based formulations. The time series of sediment metal and organic nitrogen concentration measurements at the two stations also resulted in patterns similar to the BRI-P and the AMBI-P time series (Figure 7).
Reducing the number of species included in BRI-P calculations resulted in a correlation greater than 0.9 relative to the abundance BRI even when 20% of the species with tolerance scores were removed (Table 4). For the AMBI-P, removing up to 30% of the species with AMBI ecological group assignments had negligible effects on the correlation between the AMBI-P and AMBI (Table 5).
The benthic community gradients used to validate the original BRI were reproduced closely by the BRI-P and AMBI-P. BRI-P and AMBI-P values at the two temporal validation stations were also highly correlated with sediment metal and organic nitrogen concentrations and patterns over time closely matched, indicating that the benthic indices were likely responding to anthropogenic pollution effects. There were only minor reductions in classification efficiency using presence-only information and no changes in the patterns or magnitudes of difference among sites. Reductions in index performance were small relative to previous attempts to reduce processing cost by identifying taxa only to genus or family level –. DNA barcoding provides the potential for improving index resolution by identifying cryptic species and clarifying species complexes that are inseparable by traditional morphological taxonomy methods , .
Similarity in performance between the BRI-P, AMBI-P and the traditional BRI results because the indices rely on composition of the whole community, not just the dominant taxa. Smith et al.  found that even removing the top ten most abundant taxa had minimal effect on the performance of the abundance BRI. Reliance on the entire community is enhanced in the BRI by use of a cube-root transformation which lessens the influence of abundant species, selected by Smith et al.  using an optimization algorithm to maximize index performance. This is similar to the findings of Warwick et al.  and Teixeira et al.  that performance of the AMBI is enhanced by transformations that reduce the influence of dominant taxa.
While the presence based approach worked well for the BRI and AMBI, it is unclear whether it will be equally successful in all situations. For instance, this study was conducted in euhaline water where there was an average of 67 taxa per sample; the presence-only index may be less sensitive in oligohaline waters where index development is more challenging because there are typically fewer than ten taxa per sample, even at reference sites , . It might also not work as well in locations with more severe benthic community effects, such as in Chesapeake Bay where hypoxia leads to substantial reductions in benthic abundance. The disturbance gradient effects in this study were mostly limited to replacement of pollution sensitive with pollution tolerant taxa and indices that rely more heavily on dominance and diversity measures, which require abundance information, may be necessary to capture the larger range of effects. Still, species sensitivity and tolerance to disturbance are more robust measures of benthic community condition than abundance, diversity, or other community measures in multimetric indices –.
Development of a locally relevant DNA reference library has been suggested as a potential impediment to incorporation of molecular methods in routine bioassessment, but our results suggest this should not be problematic. There are DNA barcode sequences for 159 southern California taxa with index tolerance scores already cataloged in BOLD and we found that reliable indices can be produced with a reference barcode library of less than 400 taxa. Moreover, the 159 taxa are relatively evenly distributed across the disturbance gradient (Figure 8). Of 225 species for which DNA barcoding was attempted, before February 2012, barcoding was successful for 61.8%, including 73.5%, 64.3%, and 44.1%, of arthropod, polychaete and mollusc species, respectively (Table 6). Many recent barcoding failures are due to molecular and biochemistry challenges that are in the process of being solved (for example, by improving primer sets). Success rates are high enough that we don't expect barcoding failure to be a major impediment and they don't affect our conclusions about performance of presence-based metrics.
There are several aspects of molecular methods that need to be investigated before they can be adopted for biological assessments, such as improved understanding of intraspecific genetic variation within groups currently considered as single species. There are also issues associated with sample handling. For instance, formalin, the presently used preservative for benthic studies, does not preserve DNA and the typical methods for preserving DNA are not conducive to field collections. However, the lack of quantification in species identification does not seem to be an impediment to adoption of molecular methods in biological assessments.
We are grateful to Dr. Angel Borja for assigning AMBI ecological groups to the 493 sample calibration data set, David E. Montagne for suggesting an approach to incorporating sediment chemistry data, Valerie Raco-Rands for preparation of the figures, and Bryan White for searching the Barcode of Life database for successful DNA barcode sequence catalogs for species with BRI tolerance scores. We also acknowledge the sediment chemistry data provided by the Los Angeles County and Orange County Sanitation Districts, and the cities of Los Angeles and San Diego.
Conceived and designed the experiments: SBW JAR PEM EDS. Analyzed the data: JAR. Wrote the paper: JAR EDS PEM SBW.
- 1. Diaz RJ, Solan M, Valente RM (2004) A review of approaches for classifying benthic habitats and evaluating habitat quality. Journal of Environmental Management 73: 165–181.
- 2. Borja Á, Ranasinghe JA, Weisberg SB (2009) Assessing ecological integrity in marine waters using multiple indices and ecosystem components: Challenges for the future. Marine Pollution Bulletin 59: 1–4.
- 3. Pinto RIC, Patricio J, Baeta A, Fath BD, Neto JM, et al. (2009) Review and evaluation of estuarine biotic indices to assess benthic condition. Ecological Indicators 9: 1–25.
- 4. Ranasinghe JA, Montagne DE, Weisberg SB, Bergen M, Velarde RG (2003) Variability in the identification and enumeration of marine benthic invertebrate samples and its effect on benthic assessment measures. Environmental Monitoring and Assessment 81: 199–206.
- 5. Stoeckle MY, Hebert PDN (2008) Barcode of life. Scientific American 299: 82–88.
- 6. Borisenko AV, Sones JE, Hebert PDN (2009) The front-end logistics of DNA barcoding: challenges and prospects. Molecular Ecology Resources 9: 27–34.
- 7. Janzen DH, Hallwachs W, Blandin P, Burns JM, Cadiou JM, et al. (2009) Integration of DNA barcoding into an ongoing inventory of complex tropical biodiversity. Molecular Ecology Resources 9: 1–26.
- 8. Hebert PDN, Cywinska A, Ball SL, DeWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B-Biological Sciences 270: 313–321.
- 9. Creer S, Fonseca VG, Porazinska DL, Giblin-Davis RM, Sung W, et al. (2010) Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and promises. Molecular Ecology 19: 4–20.
- 10. Fonseca VG, Carvalho GR, Sung W, Johnson HF, Power DM, et al. (2010) Second-generation environmental sequencing unmasks marine metazoan biodiversity. Nature Communications 1.
- 11. Hajibabaei M, Shokralla S, Zhou X, Singer GAC, Baird DJ (2011) Environmental barcoding: A next-generation sequencing approach for biomonitoring applications using river benthos. PLoS One 6: e17497.
- 12. Smith RW, Bergen M, Weisberg SB, Cadien DB, Dalkey A, et al. (2001) Benthic response index for assessing infaunal communities on the southern California mainland shelf. Ecological Applications 11: 1073–1087.
- 13. Borja Á, Franco J, Perez V (2000) A marine biotic index to establish the ecological quality of soft-bottom benthos within european estuarine and coastal environments. Marine Pollution Bulletin 40: 1100–1114.
- 14. Stull JK, Haydock CI, Smith RW, Montagne DE (1986) Long-term changes in the benthic community on the coastal shelf of Palos Verdes, Southern California. Marine Biology 91: 539–551.
- 15. Stull JK (1995) Two decades of marine biological monitoring, Palos Verdes, California, 1972 to 1992. Bulletin of the Southern California Academy of Sciences 94: 21–45.
- 16. County Sanitation Districts of Orange County (CSDOC) (1991) 1991 Annual Report. Fountain Valley, California, USA: County Sanitation Districts of Orange County.
- 17. Long ER, MacDonald DD, Smith SL, Calder FD (1995) Incidence of adverse biological effects within ranges of chemical concentrations in marine and estuarine sediments. Environmental Management 19: 81–97.
- 18. Ferraro SP, Cole FA (1995) Taxonomic level sufficient for assessing pollution impacts on the Southern California Bight macrobenthos – revisited. Environmental Toxicology and Chemistry 14: 1031–1040.
- 19. Dethier MN, Schoch GC (2005) The consequences of scale: assessing the distribution of benthic populations in a complex estuarine fjord. Estuarine, Coastal and Shelf Science 62: 253–270.
- 20. Bevilacqua S, Fraschetti S, Musco L, Terlizzi A (2009) Taxonomic sufficiency in the detection of natural and human-induced changes in marine assemblages: A comparison of habitats and taxonomic groups. Marine Pollution Bulletin 58: 1850–1859.
- 21. Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends in Genetics 23: 167–172.
- 22. Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes. 7: 355–364.
- 23. Warwick RM, Clarke KR, Somerfield PJ (2010) Exploring the marine biotic index (AMBI): variations on a theme by Ángel Borja. Marine Pollution Bulletin 60: 554–559.
- 24. Teixeira H, Weisberg SB, Borja A, Ranasinghe JA, Cadien DB, et al. (2012) Calibration and validation of the AZTIs Marine Biotic Index (AMBI) for Southern California marine bays. Ecological Indicators 12: 84–95.
- 25. Dauvin J-C, Ruellet T (2009) The estuarine quality paradox: Is it possible to define an ecological quality status for specific modified and naturally stressed estuarine ecosystems? Marine Pollution Bulletin 59: 38–47.
- 26. Thompson B, Weisberg SB, Melwani AR, Lowe S, Ranasinghe JA, et al. (2012) Low levels of agreement among experts using best professional judgment to assess benthic condition in the San Francisco estuary and delta. Ecological Indicators 12: 167–173.
- 27. Weisberg SB, Ranasinghe JA, Schaffner LC, Diaz RJ, Dauer DM, et al. (1997) An estuarine benthic index of biotic integrity (B-IBI) for Chesapeake Bay. Estuaries 20: 149–158.
- 28. Alden RW III, Dauer DM, Ranasinghe JA, Scott LC, Llansó RJ (2002) Statistical verification of the Chesapeake Bay benthic index of biotic integrity. Environmetrics 13: 473–498.
- 29. Borja A, Dauer DM, Diaz R, Llanso RJ, Muxika I, et al. (2008) Assessing estuarine benthic quality conditions in Chesapeake Bay: A comparison of three indices. Ecological Indicators 8: 395–403.