A Standardized Reference Data Set for Vertebrate Taxon Name Resolution

doi:10.1371/journal.pone.0146894

Fig 1.

Data workflow to create the Gold Standard set of vertebrate names.

Data from providers participating in VertNet were gathered either directly from self-hosted data sets (top left) or, for data sets that undergo data cleaning through VertNet migrators prior to publication, as provided before migration (top right). The table (center) shows how data were organized into distinct fields for assessment. The lower panel shows the composition and characteristics of the 1,000 name combinations in the data set, distinguishing the subsets resolved by each researcher (left and right) from the cross-analysis performed by both researchers at different levels of certification (check for format errors only, and complete double check). DwC: Darwin Core; Sciname: scientific name; con-rank and sn-rank: taxon ranks of the constructedscientificname and scientificnameplus fields, respectively.

Table 1.

Summary information on the 1,000 name combinations that form the basis of the validation set.

Fig 2.

Summary of issues encountered in the name combinations that form the basis of the validation set.

Numbers indicate name combinations that showed one or more types of issues. Total number of name combinations assessed for issues = 991; total number of those name combinations with issues = 532; total number of those name combinations with errors (misspelling, conceptual or format error) = 341.
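
The totals in this caption can be turned into proportions with a few lines of arithmetic. The sketch below uses only the three counts stated above; the variable names are illustrative, and it assumes the 341 name combinations with errors are a subset of the 532 with issues.

```python
# Counts taken from the Fig 2 caption.
assessed = 991      # name combinations assessed for issues
with_issues = 532   # name combinations with at least one issue
with_errors = 341   # name combinations with a misspelling, conceptual or format error

# Proportions relative to all assessed name combinations.
share_with_issues = with_issues / assessed
share_with_errors = with_errors / assessed

# Proportion of problematic name combinations that involve an error
# (assumes the error count is a subset of the issue count).
errors_among_issues = with_errors / with_issues

print(f"with issues: {share_with_issues:.1%}")
print(f"with errors: {share_with_errors:.1%}")
print(f"errors among those with issues: {errors_among_issues:.1%}")
```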

Fig 3.

Main effects of the predictors on the probability of occurrence of issues on the taxonomic name combinations.

Factors: basisOfRecord, Geographic region, Clade, Number of records shared by Institution and Year. Issues: Synonymy, Misspellings, Conceptual errors, Format errors. “Has issue” denotes the presence of at least one of the cited issues. Effects were calculated with a logit GLM with a binomial response. For definition of the “Fishes” clade, see text. NS: statistically not significant.
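
As a rough illustration of what a logit GLM with a binomial response estimates, the sketch below fits P(issue) = sigmoid(b0 + b1·x) for a single binary predictor by Newton-Raphson. It is not the authors' analysis: the function name `fit_logit`, the single-predictor setup, and the toy data (10 records per group, 3 and 6 with an issue) are all hypothetical and chosen only so the result can be checked by hand.

```python
import math

def fit_logit(x, y, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson (hypothetical helper)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # negative Hessian (information matrix)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical toy data, not from the paper:
# x = 0 for a reference group, x = 1 for a comparison group; y = 1 means "has issue".
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4

b0, b1 = fit_logit(x, y)
print("intercept (log-odds in reference group):", b0)
print("main effect (log odds ratio):", b1)
print("odds ratio:", math.exp(b1))
```

With one binary predictor, the fitted intercept equals the log-odds of an issue in the reference group and the slope equals the log odds ratio between groups, which is the quantity a main-effect plot of this kind summarizes on the logit scale.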
