Relationship of taxonomic error to frequency of observation

Biological nomenclature is the entry point to a wealth of information related to or associated with living entities. When applied accurately and consistently, it enhances communication between and among researchers and investigators, leading to advancements in understanding and progress in research programs. Based on freshwater benthic macroinvertebrate taxonomic identifications, inter-laboratory comparisons of >900 samples taken from rivers, streams, and lakes across the U.S., including the Great Lakes, provided data on taxon-specific error rates. Using the error rates in combination with frequency of observation (FREQ; as a surrogate for rarity), six uncertainty/frequency classes (UFC) are proposed for approximately 1,000 taxa. The UFC, error rates, and FREQ are each potentially useful for additional analyses related to interpreting biological assessment results and/or stressor response relationships, as weighting factors for various aspects of ecological condition or biodiversity analyses, and for helping set direction for taxonomic research and refining identification tools.


Introduction
discuss biodiversity in terms of not only richness of genotypes, species, and ecosystems, but also evenness of spatial and temporal distribution, functional characteristics, and their interactions. The sheer magnitude of biological species richness is largely unknown, with estimates ranging from 3 to 100 million [2][3][4][5][6][7]. For almost 300 years, efforts to organize and understand that diversity have used nomenclature and classification to provide a direct pathway to actual and conceptual catalogues of information about the biota; it is a system that can conceptually and functionally be thought of as a card catalogue in a library. With growing acceptance of the reality of global change and degradation in climate and in both small- and large-scale habitat, along with diminishing taxonomic expertise, the task of censusing and recording biota seems ever more daunting and urgent. Increases in computing power, information technology, and molecular techniques are encouraging some optimism in biodiversity research [8][9][10][11][12][13]. Even with some of these advances, progress in understanding biological diversity is uneven across taxonomic groups representing different segments of the tree of life, the bias mostly reflecting differential research attention and uneven sampling for some taxa in selected geographic areas [5,14,15].
Routine biological monitoring and assessment is about gathering representative sample data from defined habitat and using them for quantitative inference of environmental conditions [16,17]. Though such monitoring is not about documenting biodiversity or even absolute richness, the two fields rely on identical basic data as input for indicator calculations, model building, and decision-making, that is, taxonomic identifications. The name of an entity or object, whether individually or as a group or class, associates it with information on observable characteristics, provides answers to questions, and potentially allows new lines of enquiry to be framed and pursued. It is as much a truism of biological taxonomy as it is of basic human language that inconsistency in terminology impedes understanding and progress. Historical development of biological nomenclature and classification has been considered by anthropologists as a fundamental component of language. Efforts to understand folk taxonomies have proceeded by debating the relative merits of intellectualism vs. utilitarianism [18][19][20], approximating the difference between, respectively, basic curiosity and material need. The greater the frequency with which an object is observed, the greater the reliability and consistency of its recognition, potentially leading to greater refinement of naming conventions and nomenclatural structure. In this context, it is important to define what is intended by labelling an object (or a taxon) as rare. From a theoretical perspective, rarity has been defined using niche- or phylogenetic-based concepts of abundance, distribution, rarity, or conservation priority-setting [21][22][23][24]. As an operational descriptor, rarity or relative commonness is frequency of encounter or observation.
The first principle and purpose of taxonomic identification and nomenclature is communication, and logically, objects that are more frequently observed will be recognized with increasing speed, reliability, and consistency. Biologist and ecologist perceptions of the relative rarity or commonness of taxa are a combination of life history and encounter frequency. As an example, reliability of botanical nomenclature used by the lay community in Chiapas, Mexico, was evaluated, and use of plant names was found to be strongly related to cultural significance [25]. Techniques for communicating about plants with low cultural significance, which receive little human attention, were imprecise, that is, under-differentiating. Those with moderate cultural significance had a folk taxonomy which came closer to biological taxon definitions; and, at the extreme, plants with a high cultural significance tended to be over-differentiated. There is a conceptual relationship between cultural significance and familiarity, the latter of which would be enhanced by a high frequency of encounter or observation.
[26] developed a system for distribution classes of benthic macroinvertebrates, based on frequency of occurrence in the Netherlands. Using a combination of species rarity or commonness in their national dataset and direct input from a group of selected taxonomists, they developed a system comprising six different classes (Table 1). One of the driving factors behind their analysis was to have a classification system that would contribute to decision-making relative to conservation of aquatic resources. Routine taxonomic quality control (QC) analyses used by the USEPA National Aquatic Resources Surveys (NARS) and several state, regional, and local monitoring programs for benthic macroinvertebrate samples are based on direct inter-laboratory comparisons. Randomly selected samples are identified by independent taxonomists, resulting in quantitative descriptors of data quality, error rates and potential causes, and information used for formulating corrective actions. A secondary use and added benefit of these analyses is that they produce taxon-specific error rates that can be used as direct indicators of taxon uncertainty, as weighting factors during calculation of quantitative indicators, and to help guide development of tools for biological monitoring, in general, and taxonomic identification, in particular. The purpose of this paper is to present the process used for deriving the uncertainty values using morphology-based taxonomic identifications, discuss and summarize the results, and provide recommendations for their application and next-step analyses.

Methods
Data used in this analysis are from freshwater benthic macroinvertebrate samples, collected from rivers, streams, and lakes across the U.S., including the Great Lakes. All taxonomic identifications were executed in laboratories using necessary sample/specimen preparation techniques, optical equipment, and appropriate technical literature. The level of effort expended by taxonomists for identifications is standardized for individual programs or projects, and is typically genus level, with occasionally more coarse targets for selected taxa. The taxonomic comparison process used for routine QC analysis is described in detail elsewhere [27][28][29] and involves blind sample reidentification by independent taxonomists in separate laboratories of a randomly selected 10% of each sample lot.
We compiled interlaboratory comparison data for 914 samples from 10 large programs or projects (Table 2), which are conducted at selected local, regional, State, and National scales. Samples used by each of the programs for QC analyses [27,30] were randomly selected from the full sample load of the program, typically at a rate of approximately 10%. Thus, results reported here can be considered as representative of more than 9,000 samples. There is a total of 1,003 taxa, primarily at genus level (Fig 1), but also including more coarse levels because the level of effort was limited by defined standard procedures and/or poor specimen condition. Following Genus at 79.9 percent, the most frequently used categories were Family (14.6 percent), and Order and Subfamily (1.9 and 1.6 percent, respectively); other levels represent <1 percent of the dataset. There are occasionally "slash taxa", such as Cricotopus/Orthocladius (Diptera: Chironomidae), and one genus-group taxon, Thienemannimyia genus group, which includes the chironomid genera Conchapelopia, Rheopelopia, Helopelopia, Telopelopia, Meropelopia, Hayesomyia, and Thienemannimyia. Truncatelloidea (Mollusca: Gastropoda) is used as a grouping for all Hydrobiidae. Two informal/undefined groupings were used: "Tubificoid Naididae" for those taxa formerly identified as Tubificidae (Oligochaeta: Haplotaxida); and Hydracarina for water mites that could not be taken to genus level.
Table 2 note (total: 914 samples): The number of samples generally represents approximately 10% of the entire sample load for each program during the indicated time period. https://doi.org/10.1371/journal.pone.0241933.t002
Two different taxon-specific characteristics are quantified: frequency of observation, or relative rarity, and relative percent difference (RPD). The total number of individuals (count) for a given taxon is the sum across all primary taxonomists (T1), from all samples in all projects. That count is derived in the same manner for the QC taxonomists (T2). Frequency of observation (FREQ; relative rarity or commonness) for a taxon is the percentage of samples for which a taxon was recorded, calculated as the number of samples in which the taxon was found relative to the total number of samples (n = 914). The number of samples for each taxon is the average between T1 and T2. We plotted numbers of taxa versus numbers of samples using logarithmic scales to illustrate the dominance of taxa observed in a single sample.
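The FREQ calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, and the example assumes that T1 and T2 each recorded the taxon in the same number of samples, which the source does not report per taxonomist.

```python
def freq_percent(n_samples_t1, n_samples_t2, total_samples=914):
    """Frequency of observation (FREQ) as a percentage of all samples.

    n_samples_t1, n_samples_t2: number of samples in which the primary (T1)
    and QC (T2) taxonomists recorded the taxon. The two are averaged, as
    described in the text. total_samples defaults to n = 914.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    mean_samples = (n_samples_t1 + n_samples_t2) / 2
    return 100.0 * mean_samples / total_samples


# Assumed inputs: a taxon recorded in 674 samples by both taxonomists,
# and a taxon recorded in a single sample by both.
print(round(freq_percent(674, 674), 1))  # 73.7
print(round(freq_percent(1, 1), 1))      # 0.1
```

When T1 and T2 disagree on the number of samples containing a taxon, the average can be a non-integer, so FREQ is best kept as a floating-point percentage.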
The proportional difference between two taxon-specific values is calculated using RPD [31] as an indication of the confidence with which a data user can rely on an identification result. It is calculated as follows:
$$\mathrm{RPD} = \frac{|A - B|}{(A + B)/2} \times 100$$
where A and B are the numbers of individuals counted for a taxon by T1 and T2, respectively, and pooled across all samples and projects. Values range from 0, indicating perfect agreement, to 200, or perfect disagreement. A general characteristic of RPD is that low values indicate better consistency of identifications between/among taxonomists, thus conveying greater certainty than high values. Caution is warranted in using RPD when taxon-specific counts are low. If either T1 or T2 recorded ≥1 specimen of a taxon, and the other found none (0), RPD would be 200%. Although the number itself (200) would not be informative, it would indicate that one of the taxonomists recognized individuals of a taxon where the other did not. This would be a clue that some morphological key character (and, thus, the taxon) is not being recognized, or that incorrect nomenclature is being applied. Other than these cautions, low values of RPD are reliable indicators of consistency. Thus, each taxon is represented by two data values, x = RPD and y = frequency of observation (FREQ) (S1 Appendix), as input for an x:y scatterplot. We used an R script to run a nonlinear regression model relating RPD to FREQ.
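As a hedged illustration of the RPD behaviour described above (0 for perfect agreement, 200 when one taxonomist records a taxon and the other does not), the following Python sketch uses the standard relative percent difference formula, the absolute difference divided by the mean of the two counts. The function name is hypothetical, and treating the case where both counts are zero as undefined is an assumption rather than something stated in the source.

```python
def rpd(a, b):
    """Relative percent difference between two pooled taxon counts.

    a, b: numbers of individuals counted for a taxon by T1 and T2.
    Returns a value from 0 (perfect agreement) to 200 (one count is zero).
    """
    if a < 0 or b < 0:
        raise ValueError("counts must be non-negative")
    if a == 0 and b == 0:
        # Assumption: a taxon recorded by neither taxonomist has no RPD.
        raise ValueError("RPD is undefined when both counts are zero")
    return 100.0 * abs(a - b) / ((a + b) / 2)


print(rpd(10, 10))   # 0.0   (perfect agreement)
print(rpd(90, 110))  # 20.0
print(rpd(3, 0))     # 200.0 (taxon recognized by only one taxonomist)
```

The last call shows the low-count caution from the text: any non-zero count against zero gives 200 regardless of how many specimens were involved.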

Results
The first data visualization was a logarithmic plot of numbers of taxa versus numbers of samples (Fig 2). There are 304 taxa that are observed in only 1-2 samples, whereas the 33 most common taxa are found in anywhere from 200-674 samples. Seventy-five percent (75%) of the taxa were documented in ≤20 samples. Overall distribution ranged from 200 taxa each being found once (in a single sample), to one taxon, Polypedilum (Diptera: Chironomidae: Chironominae: Chironomini), occurring in 674 samples.
Taxon-specific RPD plotted against FREQ (Fig 3) illustrates that most taxa have low taxonomic uncertainty (mostly identified consistently) and are relatively infrequently encountered. The best fit nonlinear regression model is given by the exponential decay equation: RPD = 22.673 + 200.498 × e^(-0.192 × FREQ), and all model terms were significant at p<0.001 (S1 Table). We delineated six uncertainty/frequency classes (UFC) based on graphic patterns (Figs 4 and 5), resulting in approximately 60% of taxa being considered rare and identified with a high degree of certainty, that is, low RPD. All taxa are listed with associated numbers of individuals by primary and QC taxonomists, RPD, the number and percentages of samples, and UFC (S1 Appendix). Most taxa fall within UFC3 and 5 (Table 3; Figs 5 and 6), with roughly similar proportions within major taxa (Fig 7). UFC6 should be considered anomalous due to its representation by a small number of taxa (n = 6); otherwise, the mean and median values of RPD generally increase, and those of FREQ generally decrease, from UFC1 to UFC5 (Table 4, Fig 8).
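To make the shape of the fitted curve concrete, the reported exponential decay equation can be evaluated directly. The Python sketch below simply plugs in the coefficients given above. The function and constant names are hypothetical, and it assumes FREQ is expressed as a percentage of samples; it does not reproduce the authors' R regression.

```python
import math

# Coefficients as reported for the fitted exponential decay model.
ASYMPTOTE = 22.673   # RPD approached as FREQ becomes large
AMPLITUDE = 200.498  # additional RPD predicted at FREQ = 0
DECAY_RATE = 0.192   # decay per unit of FREQ


def predicted_rpd(freq_pct):
    """Model-predicted RPD for a given FREQ (assumed to be percent of samples)."""
    return ASYMPTOTE + AMPLITUDE * math.exp(-DECAY_RATE * freq_pct)


for freq in (0.1, 5.0, 14.0, 74.0):
    print(f"FREQ = {freq:5.1f}  predicted RPD = {predicted_rpd(freq):7.2f}")
```

Under this equation the predicted RPD falls steeply over the first few percent of FREQ and then flattens toward the asymptote of 22.673.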
We selected several taxa from each UFC (Table 5) to illustrate representative, quantitative outcomes and characteristics. UFC1 is high confidence, common, with representative taxa such as Pisidium, Stenelmis, Caenis, and Hyalella; overall, taxa in this class are observed in 23-74 percent of samples. Other than Nais, with an RPD of 20.3, all taxa in this class have RPD<10. UFC2 is high confidence, moderately common; these taxa range in frequency of observation from 14-22 percent of samples and are also identified with low uncertainty (RPD, 0.3-17.6). Example taxa of this class include Stempellinella, Baetis, Arrenurus, and Hemerodromia. UFC3 (high confidence, rare) groups taxa that are identified with confidence while being relatively rare (low frequency of occurrence). Taxa range from being observed in only a single sample (0.1 percent of total n), such as Anchycteis, Susperatus, Marilia, and Armiger, to just under 14 percent, or 120-125 samples (Stenonema, Chimarra, Limnesia, Stictochironomus). UFC4 (moderate confidence, rare) groups taxa that are identified with increased uncertainty and are uncommon (Fig 4). RPD ranges from 55-82, and taxa are found in 0.1 percent of the samples (n = 1) to 5.7 percent (n = 52). Examples of UFC4 taxa range from Halesochila, Vacupernius, and Macrelmis, each from only a single sample, to Cernotina, Teloganopsis, and Micromenetus (n = 11, 15, and 52 samples, respectively). UFC5 (low confidence, rare) groups taxa that are simultaneously rare and identified with a high degree of uncertainty, with taxa being observed in 0.1-4.3 percent of samples and identification uncertainty (RPD) ranging from 85.7-200 (S1 Appendix). Example UFC5 taxa range from those of lowest observation frequency, Amphicosmoecus and Kogotus (n = 1 sample), to Placobdella and Sphaerium in 16 (1.8 percent) and 39 (4.3 percent) samples, respectively.
UFC6 taxa are outliers, mixed, not clearly falling in the other classes; there are six in this dataset, three of which are genus level (Conchapelopia, Thienemannimyia, and Dero), and three, family (Polycentropodidae, Libellulidae, and Naididae).

Discussion
Taxa with the highest RPD values, that is, with greater uncertainty, are documented in smaller numbers of sites (Table 7), corresponding with very rare and rare distribution classes of [23], and clearly illustrated by UFC1-2 versus UFC4-5 (Fig 7). In general, the rarer a taxon is, the greater the uncertainty associated with its identity; conversely, increasingly common taxa are better known and identified with elevated confidence. This observation is demonstrated by the near mirror images of error rate (RPD) and rarity (FREQ) for UFC1-5 (Fig 8) and reflects the outcome predicted by [25], i.e., familiarity is born of repeated encounters. This also speaks, in part, to the collective sense of our limited understanding of biological diversity, and of the most appropriate and effective ways of communicating about that diversity.
Higher level macroinvertebrate taxa in this analysis shown to have greater identification confidence and consistency are midges (Insecta: Diptera: Chironomidae), caddisflies (Insecta: Trichoptera), beetles (Insecta: Coleoptera), snails (Mollusca: Gastropoda), and stoneflies (Insecta: Plecoptera) (Fig 7), as they are mostly made up of finer level taxa within UFC1-3. Conversely, higher level taxa for which identification data seem to be more problematic (i.e., greater uncertainty) are bivalves (Mollusca: Bivalvia) and Crustacea (Arthropoda); these groups have a higher percentage of taxa in UFC4-5.
Several potential uses of UFC designations are relevant to informing data analysts and data users on the extent to which confidence can be placed in results. They include being used as taxon-specific weighting factors for calculating biological indicator values, such as indexes of biological integrity (IBI), River Invertebrate Prediction and Classification System (RIVPACS) models, various diversity calculations, species protection, or habitat prioritization. Testing is necessary to determine the effect on indicator values, but a weighted-average index could be formulated to elevate or restrict the importance of a taxon due to the relative potential of identification error. Similar to use of stressor tolerance values in the Hilsenhoff Biotic Index (HBI), UFC numbers could be used as taxon count modifiers. This approach would retain the inherent value and information content of organism identity, and simultaneously help objectively moderate the influence of those taxa on quantitative indicator outcomes.
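One way the count-modifier idea could look in practice is sketched below in Python. It follows the general weighted-average form of the HBI (sum of count × tolerance value divided by total count) and multiplies each taxon count by a weight tied to its UFC. Everything specific here is an assumption for illustration only: the function name, the weights assigned to each UFC, the tolerance values, and the counts are hypothetical and are not taken from the source or from any published index.

```python
# Hypothetical weights per UFC: full weight for high-confidence classes,
# reduced weight where identification uncertainty is greater.
UFC_WEIGHT = {1: 1.0, 2: 1.0, 3: 1.0, 4: 0.5, 5: 0.25, 6: 0.5}


def weighted_biotic_index(counts, tolerance, ufc, weights=UFC_WEIGHT):
    """HBI-style weighted average with UFC-based count modifiers.

    counts:    taxon -> number of individuals in the sample
    tolerance: taxon -> stressor tolerance value (illustrative values only)
    ufc:       taxon -> uncertainty/frequency class (1-6)
    """
    numerator = 0.0
    denominator = 0.0
    for taxon, n in counts.items():
        effective_n = n * weights[ufc[taxon]]  # count moderated by UFC weight
        numerator += effective_n * tolerance[taxon]
        denominator += effective_n
    if denominator == 0:
        raise ValueError("no weighted individuals in sample")
    return numerator / denominator


# Illustrative sample; taxon names come from the text, all numbers are invented.
counts = {"Caenis": 40, "Baetis": 30, "Sphaerium": 10}
tolerance = {"Caenis": 7, "Baetis": 5, "Sphaerium": 8}
ufc = {"Caenis": 1, "Baetis": 2, "Sphaerium": 5}

unweighted = weighted_biotic_index(counts, tolerance, ufc, {k: 1.0 for k in range(1, 7)})
weighted = weighted_biotic_index(counts, tolerance, ufc)
print(round(unweighted, 3), round(weighted, 3))
```

With all weights set to 1 the function reduces to the ordinary weighted-average form, so the effect of the UFC modifier on an indicator value can be tested by comparing the two results on the same sample.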
Taxa demonstrated as having elevated identification uncertainty could be targeted for basic focused research, including morphological re-description, dichotomous identification keys, genetic fingerprinting, or other tools. Commonness values (FREQ) for individual taxa would allow users of comprehensive identification manuals (such as, for example, [32]) to evaluate the relative rarity. The need for independent verification of an identification result would be emphasized for those with known elevated error rates (high RPD). The confidence (certainty) placed in taxonomic identifications is related to both frequency of observation (commonness) and the consistency of identification.
Table 3 note: 1 Percent is the percentage of taxa relative to the overall dataset. https://doi.org/10.1371/journal.pone.0241933.t003
Another potential use of these results would be in helping target individual taxa for determining causes, beyond lack of familiarity, of higher error rates. A common cause is known to be specimens in poor condition and/or of small body size (early life stages, or instars). An outcome of such an investigation might be to specify standard procedures for some taxa, including for sampling, handling, preservation, and identification. An example of this would be a requirement that all larval Chironomidae be slide-mounted for examination under a compound microscope. We do not necessarily advocate this, as slide-mounting is not consistently needed by all laboratories or taxonomists. Rather, we stress that the taxonomist use whatever method is needed to attain the target taxonomic level as defined by program or study goals. The goal in this case is not to require that all taxonomists (or taxonomic technicians) slide-mount all chironomid midges; rather, the goal is to acquire genus level data for the taxon. In some cases, slide-mounting might be needed; in others, it would not. Thus, the need for such actions would be determined on a case-by-case, taxon-by-taxon, or even taxonomist-by-taxonomist basis, but the goal of genus level data remains the same.
Figure caption: Percentages of taxa in "major" benthic macroinvertebrate groups in six uncertainty/frequency classes. 1, high confidence, common; 2, high confidence, moderately common; 3, high confidence, rare; 4, moderate confidence, rare; 5, low confidence, rare; 6, outliers, mixed.

Uncertainty of taxonomic identifications
Our interest is in seeing UFC values used as one tool to enhance biological assessments, whether as direct input to indicator calculations, as information to help formulate additional analytical questions, or to help set or justify interpretive procedures. This analysis was possible by having access to available output of inter-taxonomist comparisons and demonstrates added benefits of routine QC and operational data management routines.
Fig 8 caption: RPD is a measure of uncertainty associated with taxonomic identifications, thus lower values equate to increased confidence. 1, high confidence, common; 2, high confidence, moderately common; 3, high confidence, rare; 4, moderate confidence, rare; 5, low confidence, rare; 6, outliers, mixed. https://doi.org/10.1371/journal.pone.0241933.g008
Table 5 note: The full list of 1,003 taxa is presented in S1 Appendix. T1 and T2 are the summed counts across n samples. RPD is relative percent difference, and Pct. is the percentage of total samples (n = 914) used in this analysis. https://doi.org/10.1371/journal.pone.0241933.t005
Project administration: James B. Stribling.
Table 7. Relating relative percent difference (RPD) to distribution classes.