Fast-Evolving Homoplastic Traits Are Best for Species Identification in a Group of Neotropical Wasps

Biological characters can be employed for both taxonomy and phylogenetics, but is conscripting characters for double duty a good idea? We explore the evolution of characters designed for taxonomic diagnosis in Costa Rican heterospiline wasps, a hyperdiverse lineage of parasitoid Braconidae, by mapping them to a robust multi-locus molecular phylogeny. We discover a strong positive relationship between the amount of evolutionary change a character undergoes and how broadly useful the character is in the context of an interactive identification key (e.g., how evenly the character states are distributed among taxa). The empirical finding that fast characters are the most useful for species identification supports the idea that characters designed for taxonomic diagnoses are likely to underperform, or be positively misleading, in phylogenetic analyses.


Introduction
Systematics as a biological discipline is broadly concerned with two topics: description of biological diversity, and inference of evolutionary history. While these topics have historically involved different theoretical underpinnings and modes of analysis, both rely on the common currency of characters. Characters are observable qualities of an organism (typically morphological, molecular, or behavioral) that are quantified to form basic units for all subsequent analyses.
Consider the simple example of a songbird's beak. The beak's shape is a character, and it might be described as having several states: long, short, or crossed. We can observe the state of individual birds, marking each beak as either long, short, or crossed, and compiling the data into a matrix. The data can then serve in a variety of studies, especially when combined with observations of additional characters. A taxonomic project, for example, might try to delimit species by searching for clusters of similar beak shapes. A phylogenetic project might use the data in conjunction with models of beak evolution to reconstruct evolutionary histories. In both scenarios the underlying character data is the same.
That the same characters can be employed for both taxonomic and phylogenetic studies does not imply they perform equally in both arenas, however. Systematists have long waged a largely philosophical argument over the supremacy of various character systems (especially molecular versus morphological data [1][2][3][4][5][6][7]), and even over the extent to which the efficacy of different characters can be judged, especially a priori. Winston [8] provides a clear description of what qualifies a character to be an ideal diagnostic (as opposed to phylogenetic) character: easily recognizable and tending to be constant within a taxon. Such characters need not reflect basic biological differences. On the other hand, phylogenetically useful characters need not be easy to observe with a specimen in hand, but do need to incorporate the notion of homology. To be maximally useful, character states should not be restricted to a single taxon [9]. Although many systematists recognize that different types of questions merit different types of data, the considerable labor involved in assembling data matrices continues to provide an impetus for conscripting the same data for multiple questions, even if the data were not collected for all possible downstream purposes. What effect might this have on the resulting phylogenetic analyses?
In the present study, we use a robust multi-locus molecular phylogeny of a hyperdiverse clade of braconid wasps as a framework for examining evolutionary patterns of discrete morphological characters developed for species-level taxonomy. The new ability of interactive identification software to quantitatively assess character utility allows us the novel approach of correlating character performance with evolutionary rate. Specifically, we employ the "Best" function in Lucid Player 3.4 [10] to rank characters according to their probability of eliminating taxa when states are selected. As each character is scored for both a utility rank and a measure of evolutionary change, we can provide an empirical estimate of the extent to which evolutionary change influences the usefulness of characters for species identifications.
Our focal organisms are a highly species-rich complex of doryctine braconid wasps associated with the genus Heterospilus Haliday. This genus is especially diverse in the Neotropics [11]. Although Heterospilus is among the most abundant wasps collected in passive biodiversity surveys in Mesoamerica ([12], P. Marsh & J. Whitfield pers. obs.), the vast majority of species remain undescribed [11] and little has been published about their biology. Where known, species are often ectoparasitic on Coleoptera [13][14][15], or other holometabolous insects [16] concealed within plant tissue (e.g. twigs, stems, twig nests).

Morphological characters
An ongoing species-level revision of Costa Rican Heterospilus provided 47 discrete morphological characters useful for diagnosing species. The characters (File S1) relate to color, sculpture, pilosity, and morphometric proportions, and when employed in conjunction with the interactive identification software Lucid Player 3.4 [10], they are sufficient to separate all of the approximately 350 species in the Costa Rican fauna (Marsh & Wild, pers. obs.).
The relative utility of each morphological character was assessed in the framework of Lucid Player 3.4 [10]. The Lucid identification process progresses by eliminating taxa as the user selects character states matching those in the specimen to be identified. A central feature of Lucid's software is a quantitative ranking algorithm, the "Best" function, that automatically directs users at any point in the process to characters with the highest probability of eliminating remaining taxa. In essence, the "Best" characters are those whose character states are most evenly distributed across taxa, such that choosing a state cleaves the candidate taxa into similar-sized groups. At any stage of the key, the "Best" function will select the character whose alternate states are most evenly balanced among the remaining taxa, as this maximizes the chance that choosing a character state will eliminate possible identifications. In our experience with Heterospilus, a specimen can be identified by selecting as few as four observed character states following the "Best" suggestion. As Lucid's "Best" algorithm is a quantitative assessment of character utility for taxonomy, we employ it here to rank order all 47 characters from most taxonomically useful (=more even state distribution among taxa) to least useful (=more skewed state distribution among taxa), with the full set of c. 350 species in the matrix.
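The idea of ranking characters by how evenly their states divide the taxa can be sketched in a few lines of Python. This is only an illustration, not Lucid's actual "Best" algorithm, whose internals are not described here. As an assumption, the sketch scores each character by the expected fraction of taxa that remain after one state is chosen, treating every taxon as equally likely to be the specimen in hand. The function names and the toy matrix are hypothetical.

```python
from collections import Counter


def expected_remaining_fraction(states):
    """Expected fraction of taxa left after one state is chosen.

    Assumption: each taxon is equally likely to be the specimen in hand,
    and each taxon shows exactly one state (no polymorphism, no missing data).
    A lower score means a more even split and a more useful character.
    """
    counts = Counter(states)
    total = len(states)
    return sum((n / total) ** 2 for n in counts.values())


def rank_characters(matrix):
    """Return character names ordered from most to least evenly split."""
    return sorted(matrix, key=lambda name: expected_remaining_fraction(matrix[name]))


# Hypothetical toy matrix: 8 taxa scored for three made-up characters.
toy_matrix = {
    "head_color": ["dark"] * 7 + ["pale"],                  # skewed 7:1 split
    "vertex_sculpture": ["smooth"] * 4 + ["striate"] * 4,   # even 4:4 split
    "ovipositor_length": ["short"] * 3 + ["medium"] * 3 + ["long"] * 2,
}

ranking = rank_characters(toy_matrix)
print(ranking)
```

In this toy example the three-state character with a 3:3:2 split ranks first and the skewed 7:1 character ranks last, which mirrors the "more even is more useful" ordering described above.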

Phylogenetic inference
We inferred the relationships of 95 species of doryctine wasps, focusing on Costa Rican Heterospilus and related genera, using 4.3 kb of sequence data from 5 loci: nuclear protein-coding gene fragments from Alpha Spectrin, RNA Polymerase II, and Carbamoyl Phosphate Synthetase (CAD), the nuclear ribosomal gene 28S, and the mitochondrial protein-coding gene COI. Specimen data and GenBank accessions are listed in Table 1, and primer sequences are provided in Table 2. This taxon sample is necessarily smaller than the 350 species in the Costa Rican fauna, as most species are known only from older collections with degraded DNA, while some are represented by single specimens. Thus, we limited our phylogenetic sample to all available specimens collected into ethanol within the past 8 years. In addition to Costa Rican collections, we included seven freshly collected specimens from Ecuador and two from warm temperate North America. Additional doryctine genera were included in the analysis because of potential paraphyly of Heterospilus [12].
Genomic DNA was non-destructively extracted from the intact mesothorax, metathorax, and mesosoma after removing the head and prothorax. Specimens were soaked for 4-12 hours in a proteinase K solution, and the DNA was isolated using a Qiagen DNeasy kit according to the manufacturer's protocol. Each sampled specimen was then scored for the Lucid key characters. The morphological matrix was >95% complete, as species sampled only from males and specimens with damaged antennae precluded observation of ovipositor and antennal characters for several taxa. Voucher specimens are deposited in the Illinois Natural History Survey (INHS) collections.
DNA was amplified in a polymerase chain reaction using Takara Ex Taq and the manufacturer's reagents under the recommended protocol. The nuclear protein-coding genes were amplified using a 2-stage nested PCR (described in [17]), with extension times adjusted to suit the length of the target fragment and annealing temperatures in accordance with primer Tm. PCR product was purified using Qiagen Qiaquick elution columns per the manufacturer's protocol, and the DNA was sequenced using Sanger sequencing on an ABI 3730XL capillary sequencer. Chromatograms were edited in BioEdit [18] and aligned in Mesquite 2.7 using Opal [19].
Phylogenies were inferred for individual loci and for the concatenated data using MrBayes 3.1 [20], with substitution models selected using MrModeltest [21]. We employed a 5-partition set for the final analyses as follows: 28S; COI codon positions 1 & 2; COI codon position 3; nuclear protein-coding genes codon positions 1 & 2; and nuclear protein-coding genes codon position 3. We replicated the final MrBayes analysis twice, for over 2.5 × 10^7 generations each time, and checked convergence among runs using AWTY [22] and among parameter estimates using Tracer [23]. We obtained an ultrametric tree by reanalyzing the matrix using the same models and partition scheme in BEAST [24] for 5 × 10^7 generations, using the MrBayes consensus as a starting tree and a relaxed molecular clock model. We did not specify any absolute age constraints.

Character evolution
Morphological characters were mapped to the ultrametric tree using Mesquite 2.7 [25]. Mesquite scored each character for the number of character state changes in a parsimony framework. These assessments of character change were then plotted against the Lucid "Best" rank, and the statistical

Ethics statement
Biological samples used in this study were collected and exported with the requisite permission of the governments of Costa Rica (via INBio) and Ecuador (collection N° 019-IC-FAU-DNBAP/MA and export 011-EXP-CIEN-FAU-DNBAPVS/MA to Lee Dyer). North American samples were collected with the express permission of the landowners; no specific permissions were required, as the locations are not protected in any way, nor did our collections involve endangered or protected species.

Phylogeny
The concatenated genetic data produced a well resolved tree, with 87% of nodes within the heterospiline clade showing posterior probabilities > 95% and 82% of nodes with a posterior probability of 100%. Topologies generated from individual loci were broadly congruent with each other and with previously published molecular phylogenies [26], differing largely at nodes with low levels of support. MrBayes and BEAST produced topologies identical to each other except for alternate resolutions of two internal nodes, one within Allorhogas and the other over the sister-taxon relationship of Heterospilus 15 and 71.
Heterospilus was not recovered as monophyletic in the concatenated analysis (Figure 1), nor in analyses of most of the individual loci. The paraphyly of Heterospilus is not unexpected, as it echoes results from the recent study by Zaldívar-Riverón et al. [12]. In our study, two specimens of Pioscelus and seven specimens from the phytophagous genus Allorhogas emerged basally within Heterospilus with high support. To maintain monophyly of the focal lineage, we coded the relevant characters in Pioscelus and Allorhogas and included them in the character evolution analysis.
Our small sample of fewer than 20% of Costa Rican doryctine genera reinforces earlier findings [26,27] that the internal relationships of Doryctinae remain poorly understood. In addition to the paraphyly of Heterospilus, the Neotropical genus Notiospathius consistently emerges in two disparate parts of the outgroup tree. Notiospathius is one of several paraphyletic genera in the analyses of Zaldívar-Riverón et al. [26]. The amount of systematic disarray in Doryctinae is perhaps not surprising considering the tremendous and largely undocumented diversity of parasitic Hymenoptera and the small number of taxonomists devoted to the group [28,29].

Character evolution
The vast majority of taxonomic characters (38 of 47) are reconstructed to reverse state at least once (Table 1). Figure 2 illustrates parsimony reconstructions of the evolution of two of these characters. Three characters did not change state at all, reflecting rare states in the full 350-taxon data set that were not picked up in the phylogenetic subsample. The mean number of changes per character inferred on the tree was 15. These rates of evolution are higher than even the 3rd nucleotide positions in the molecular matrix (Figure 3). The high rates of character change and reversals may explain the observed taxonomic confusion among doryctine genera.
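Counting state changes for a character on a tree under parsimony can be illustrated with a short Python sketch. It assumes unordered (Fitch) parsimony on a fully resolved, rooted tree with no missing data; whether the Mesquite analysis treated characters as ordered or unordered is not stated here, so the choice of Fitch parsimony is an assumption. The tree, taxon names, and character states below are hypothetical.

```python
def fitch_changes(tree, tip_states):
    """Minimum number of state changes for one unordered character.

    `tree` is a nested tuple of tip names, e.g. (("A", "B"), ("C", "D")),
    and must be strictly bifurcating. `tip_states` maps each tip name to
    its observed state. Missing data are not handled in this sketch.
    """
    changes = 0

    def state_set(node):
        nonlocal changes
        if not isinstance(node, tuple):   # a tip: its set is the observed state
            return {tip_states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:                        # children overlap: no change needed here
            return shared
        changes += 1                      # children disagree: count one change
        return left | right

    state_set(tree)
    return changes


# Hypothetical six-taxon tree and two made-up characters.
toy_tree = ((("sp1", "sp2"), ("sp3", "sp4")), ("sp5", "sp6"))

conserved = {"sp1": "smooth", "sp2": "smooth", "sp3": "smooth",
             "sp4": "smooth", "sp5": "striate", "sp6": "striate"}
homoplastic = {"sp1": "smooth", "sp2": "striate", "sp3": "smooth",
               "sp4": "striate", "sp5": "smooth", "sp6": "striate"}

print(fitch_changes(toy_tree, conserved))    # 1
print(fitch_changes(toy_tree, homoplastic))  # 3
```

Both toy characters have two states, but the second requires three changes on the same tree. Its states are also split evenly (3:3) among the taxa, which is the pattern linking frequent change to an even state distribution.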
We found a strong and significant correlation between the Lucid "Best" rank and the number of state changes (Figure 4, Spearman Rank Correlation coefficient = -.80, n = 47, 2-tailed test, P < .0001). This correlation is not an artifact of character state number; the relationship holds within characters of the same number of states (2-state characters, Spearman Rank Correlation coefficient = -.76, n = 19, 2-tailed test, P < .0002; 3-state characters, Spearman Rank Correlation coefficient = -.76, n = 17, 2-tailed test, P < .0005; 4-state characters, Spearman Rank Correlation coefficient = -.67, n = 10, 2-tailed test, P < .05; 6-state characters not tested as there were only three). Thus, characters that change state frequently are the most useful for species diagnosis in this group of wasps.
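For readers unfamiliar with the statistic, the following Python sketch computes a Spearman rank correlation coefficient as the Pearson correlation of the ranks, with tied values sharing their mean rank. The data are invented for illustration and are not the values from this study; the function names are hypothetical, and the sketch does not compute a P value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented example: rank 1 is the most useful character in the key.
best_rank = [1, 2, 3, 4, 5, 6]
state_changes = [30, 22, 25, 10, 8, 3]

rho = spearman_rho(best_rank, state_changes)
print(round(rho, 3))
```

Because rank 1 denotes the most useful character, a negative coefficient means that characters with more state changes tend to be ranked as more useful, which is the direction of the relationship reported above.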
The correlation is possibly due to the phenomenon whereby independent characters that evolve rapidly often find themselves in novel combinations with other characters, providing unique character combinations that allow for easy diagnosis. Thus, fast homoplastic characters may be best for species diagnosis. A logical next step in exploring this phenomenon would be to simulate characters of varying rates on a tree, record the recovered patterns of homoplasy, code the final states in an interactive key, and verify that the artificial characters produce the same correlation of rate to taxonomic utility.
We do not intend these results as commentary on the "molecules v. morphology" debate in phylogenetics, or as a general statement about morphology. Our characters are not a random sample of all possible morphological traits, but were developed specifically because they were useful for taxonomic identification. In fact, morphological characters freed from the demands of phylogenetic inference should encourage taxonomists to be bolder in developing new systems for diagnosis, as long as the characters are recognized as such.
The correlation between evolutionary rate and diagnostic utility illustrates a tension between the properties of characters that render them suitable for taxonomic questions and those that render them suitable for phylogenetic questions. Characters that evolve as quickly as those observed here will saturate in the deeper regions of the tree, providing little useful phylogenetic signal. Consequently, we recommend that data culled for taxonomic projects be used primarily for taxonomy. Although our study does not directly address the converse, it is likely that data collected for phylogenetic projects are likewise best used primarily for phylogenetics. Just because a character matrix exists does not mean it ought to be used to answer questions for which it was not designed.

The Best Taxonomic Characters Are Fast. PLOS ONE | www.plosone.org | September 2013 | Volume 8 | Issue 9 | e74837