Oligonucleotide Primers for Targeted Amplification of Single-Copy Nuclear Genes in Apocritan Hymenoptera

Background
Published nucleotide sequence data from the mega-diverse insect order Hymenoptera (sawflies, bees, wasps, and ants) are taxonomically scattered and still inadequate for reconstructing a well-supported phylogenetic tree for the order. The analysis of comprehensive multiple gene data sets obtained via targeted PCR could provide a cost-effective solution to this problem. However, oligonucleotide primers for PCR amplification of nuclear genes across a wide range of hymenopteran species are still scarce.
Findings
Here we present a suite of degenerate oligonucleotide primer pairs for PCR amplification of 154 single-copy nuclear protein-coding genes from Hymenoptera. These primers were inferred from genome sequence data from nine Hymenoptera (seven species of ants, the honeybee, and the parasitoid wasp Nasonia vitripennis). We empirically tested a randomly chosen subset of these primer pairs for amplifying target genes from six Hymenoptera, representing the families Chrysididae, Crabronidae, Gasteruptiidae, Leucospidae, Pompilidae, and Stephanidae. Based on our results, we estimate that these primers are suitable for studying a large number of nuclear genes across a wide range of apocritan Hymenoptera (i.e., all hymenopterans with a wasp-waist) and of aculeate Hymenoptera in particular (i.e., apocritan wasps with stingers).
Conclusions
The amplified nucleotide sequences are (a) with high probability from single-copy genes, (b) easily generated at low financial cost, especially when compared to phylogenomic approaches, (c) easily sequenced by means of an additionally provided set of sequencing primers, and (d) suitable to address a wide range of phylogenetic questions and to aid rapid species identification via barcoding, as many amplicons contain both exonic and fast-evolving intronic nucleotides.


Introduction
Targeted amplification of single-copy genes is still a cornerstone of molecular phylogenetics despite the emergence of phylogenomic approaches analyzing transcriptome data and entire genomes. PCR approaches have focused primarily on mitochondrial genes, rRNA, and a restricted number of nuclear genes [1][2][3][4]. The phylogenetic analysis of a set of these standard genes and the study of phylogenomic data both have their pros and cons: the few standard genes are comparatively easy to amplify across a wide range of species, but their phylogenetic signal may be insufficient to answer the research question(s) of interest. In contrast, phylogenomic approaches provide a plethora of nucleotide sequence data and facilitate addressing difficult phylogenetic questions. However, phylogenomic approaches are (still) expensive and may require specially treated sample material (e.g., for preservation of RNA), which means that material from most scientific collections cannot be used. Degenerate oligonucleotide PCR primers designed to amplify a large set of single-copy nuclear genes in species of interest could close the gap between the two approaches and could be a viable alternative to both of them. Here, we present such a suite of PCR primers for amplifying single-copy nuclear genes from Hymenoptera (sawflies, bees, wasps, and ants).
Hymenoptera are one of the mega-diverse insect orders and encompass more than 125,000 described species, many of which have key functions in ecosystems and are of fundamental economic, agricultural, and medical importance [5]. Given this importance, it is surprising how few molecular markers are currently in use for phylogenetic and evolutionary studies of Hymenoptera (e.g., [6][7][8][9]). Even the most recent comprehensive phylogenetic investigation of Hymenoptera used a PCR approach that targeted only four genes (18S, 28S, EF1a, COX1) [4]. Many important nodes in the resulting phylogeny are not robust, indicating that more nucleotide sequence data are required to answer these and other fundamental phylogenetic questions involving Hymenoptera. Additionally, only two phylogenomic studies analyzing EST data from Hymenoptera have been published, both with very limited taxon samples [10,11]. Peters and colleagues [12] combined all published sequence data of Hymenoptera for a comprehensive phylogenetic analysis. This study revealed that only about ten molecular markers are frequently used to tackle phylogenetic questions in the Hymenoptera. These markers have undoubtedly given important insights into the evolutionary history of this group. Nonetheless, their limited phylogenetic signal has also left many difficult and longstanding phylogenetic questions unresolved.
There are three major advantages of establishing molecular markers for phylogenetic analyses from sequenced genomes compared to traditional approaches and to the exploration of EST data: (a) the ability to reliably assess the orthology of genes; (b) the ability to assess the probability of obtaining undesired secondary PCR products; and (c) the availability of gene models that inform about the position and length of introns and exons. One-to-one orthologous (single-copy) protein-coding genes can be identified with high confidence using orthology assessment software such as OrthoMCL [22]. When restricting oligonucleotide primer design to single-copy genes, the risk of accidentally sequencing pseudogenes and other paralogous genes is greatly reduced. If the main interest of a study lies in amplifying fast evolving sites, for example to address relationships within species or among closely related species, it is possible to focus on PCR primer pairs that maximize the amount of intronic sites in the PCR product. This kind of information cannot be inferred from EST data.
We present a suite of new degenerate oligonucleotide primers that are expected to amplify single-copy nuclear protein-coding genes in a taxonomically wide array of apocritan Hymenoptera (i.e., Hymenoptera with a wasp-waist). This lineage of Hymenoptera comprises the vast majority (>95%) of hymenopteran species [5]. We provide detailed primer statistics and a PCR protocol for rapidly assessing the functionality of primer pairs, and we show results from empirically testing ten randomly selected primer pairs on DNA from six Hymenoptera species, representing the families Chrysididae, Crabronidae, Gasteruptiidae, Leucospidae, Pompilidae, and Stephanidae. The targeted molecular markers can be used to address a wide range of phylogenetic and/or comparative evolutionary questions, may prove valuable for rapid species identification via barcoding, and can be easily generated at low financial cost.
Orthology of proteins between the nine genomes was inferred using a graph-based approach as implemented in OrthoMCL 2.0 [22]. This approach has been shown to have reasonably low false positive and false negative rates among the available methods to estimate gene orthology [23]. We only used sequence pairs from the 'orthologs.txt' output file for Markov clustering. The inflation value was set to 1.5. Finally, we extracted sets of 1:1 orthologs from the final OrthoMCL output file with the aid of a custom-made Perl script. The amino acids in each set of 1:1 orthologous proteins were aligned with MAFFT 6.833b [24,25] using the 'L-INS-I' alignment strategy. Note that we replaced the amino acid code 'U', which stands for selenocysteine and is not recognized by MAFFT, with the ambiguity code 'X' prior to alignment. The alignment was subsequently refined with MUSCLE 3.7 [26] using the refinement option. Each amino acid alignment was then used as a blueprint to align the nucleotides of the corresponding coding sequences with a custom-made Perl script and the BioPerl tool kit [27]. All sets of 1:1 orthologs were annotated by generating profile hidden Markov models (pHMMs) from the protein alignments. The pHMMs were used to search the official gene set (OGS) of the fruit fly Drosophila melanogaster (FlyBase release 5.22) [28] for the most similar sequence (E value < 10⁻¹⁰) with the HMMER 3.0 [29,30] software package. We also estimated the average nucleotide sequence divergence among the nine reference genomes for each amplified region by calculating Hamming distances (i.e., uncorrected p-distances) using a custom-made Perl script.
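The two alignment-related steps described above can be illustrated with a minimal Python sketch. It is not the custom Perl code used in this study. The function names are hypothetical, the coding sequences are assumed to be in frame and to lack a terminal stop codon, and the handling of gaps and ambiguous sites in the distance calculation (they are skipped) is an assumption, since the text does not specify it.

```python
from itertools import combinations


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Project an ungapped coding sequence onto one gapped protein alignment row."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("number of codons does not match number of residues")
    codon_iter = iter(codons)
    # Every protein gap becomes a three-nucleotide gap; every residue takes the next codon.
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: differing sites divided by compared sites.

    Assumption: sites with a gap or 'N' in either sequence are not compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else float("nan")


def mean_p_distance(aligned_seqs) -> float:
    """Average p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

Applied to the amplified region of each alignment, `mean_p_distance` would give one divergence value per primer pair, comparable in spirit to the values reported in Table S1.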

Oligonucleotide Primer Design
All 4,145 multiple nucleotide alignments of 1:1 orthologous genes were searched for suitable primer binding sites using a custom-made Ruby script (Janus Borner, Christian Pick, Thorsten Burmester, unpublished). The script designs degenerate primers for PCR amplification of coding sequences from the nuclear genome. It searches for conserved regions in aligned protein-coding nucleotide sequences and checks whether candidate oligonucleotide primers that would bind at these conserved regions stay below a given degree of degeneration, exhibit a GC content within a given range, and possess no more than a given number of nucleotide repeats (Table 1). All primer pairs consistent with these criteria were searched for matches in the genomic nucleotide sequences of the nine reference species. This allowed estimating the actual length and the relative intron content of each amplicon. Primers that did not match because they bind at an exon/intron boundary or because they would amplify a region exceeding a pre-defined size (Table 1) were discarded. Approximate genomic matches were also considered to assess the probability of obtaining undesired secondary amplification products. To allow for direct sequencing of the PCR products using specific oligonucleotide sequencing primers, predesigned oligonucleotides were added to the 5′ end of each primer sequence (Table 2). Finally, we evaluated the melting temperatures and hybridization energies of homo- and heterodimers for each pair of primers with the aid of UNAFold 3.8 [31]. All primer design parameters are summarized in Table 1.
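The screening criteria named above (degree of degeneration, GC content, nucleotide repeats) can be sketched in Python as follows. This is an illustration only, not the unpublished Ruby script. All function names are hypothetical, "nucleotide repeats" is interpreted here as runs of identical symbols, and the default thresholds are placeholders: 192 is simply the largest degree of degeneration reported in the text, whereas the cut-offs actually used are those listed in Table 1.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def consensus_primer(window_rows):
    """Collapse a gap-free alignment window column by column into IUPAC codes."""
    return "".join(BASES_TO_CODE[frozenset(col)] for col in zip(*window_rows))


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    n = 1
    for code in primer:
        n *= len(IUPAC[code])
    return n


def gc_fraction(primer):
    """Expected GC fraction, averaging over the bases behind each code."""
    per_site = (sum(b in "GC" for b in IUPAC[c]) / len(IUPAC[c]) for c in primer)
    return sum(per_site) / len(primer)


def longest_run(primer):
    """Length of the longest run of identical symbols (assumed meaning of 'repeats')."""
    longest = run = 1
    for prev, cur in zip(primer, primer[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def passes_filters(primer, max_degeneracy=192, gc_range=(0.3, 0.7), max_run=4):
    """Placeholder thresholds; the real values are those of Table 1."""
    return (degeneracy(primer) <= max_degeneracy
            and gc_range[0] <= gc_fraction(primer) <= gc_range[1]
            and longest_run(primer) <= max_run)
```

For example, the three aligned rows `ATGGCA`, `ATGGCG`, and `ATGACA` collapse to the degenerate sequence `ATGRCR`, which represents four distinct oligonucleotides.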
DNA was extracted from thoracic muscle tissue using the QIAGEN DNeasy Blood & Tissue Kit and following the protocol for insects (QIAGEN GmbH, Hilden, Germany). DNA quality and quantity were assessed by running the extracted DNA on a 1.5% agarose gel and by analyzing the DNA with a NanoDrop 1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). Polymerase chain reactions (PCRs) were run in 20 µl volumes consisting of 0.5× QIAGEN Q-Solution, 1× QIAGEN Multiplex PCR Master Mix (QIAGEN GmbH, Hilden, Germany), 0.8 µM of each oligonucleotide primer, and 50 ng DNA.
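For bench use, the final concentrations given above translate into pipetting volumes via the dilution relation C1 × V1 = C2 × V2. The following Python sketch shows the arithmetic. The stock concentrations are assumptions that are not stated in the text (5× Q-Solution, 2× Master Mix, 10 µM primer stocks, and a hypothetical template stock of 25 ng/µl) and should be replaced by the values of the reagents actually at hand.

```python
def reaction_volumes(total_ul=20.0,
                     q_stock_x=5.0,             # assumed stock strength of Q-Solution
                     mm_stock_x=2.0,            # assumed stock strength of Master Mix
                     primer_stock_um=10.0,      # hypothetical primer stock (µM)
                     dna_stock_ng_per_ul=25.0): # hypothetical template stock (ng/µl)
    """Return pipetting volumes (µl) for one reaction with the stated final concentrations."""
    volumes = {
        "Q-Solution": total_ul * 0.5 / q_stock_x,        # 0.5x final
        "Master Mix": total_ul * 1.0 / mm_stock_x,       # 1x final
        "forward primer": total_ul * 0.8 / primer_stock_um,  # 0.8 µM final
        "reverse primer": total_ul * 0.8 / primer_stock_um,  # 0.8 µM final
        "template DNA": 50.0 / dna_stock_ng_per_ul,      # 50 ng per reaction
    }
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("components exceed the reaction volume; use more concentrated stocks")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for component, volume in reaction_volumes().items():
        print(f"{component:>15}: {volume:.1f} µl")
```

Highly degenerate primers may call for a higher final primer concentration, in which case the constant 0.8 in the sketch would need to be adjusted.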
The touchdown PCR temperature profile started with an initial denaturation and QIAGEN HotStarTaq DNA polymerase activation step at 95 °C for 15 min, followed by 16 cycles of 95 °C for 0.5 min, 60-45 °C for 0.5 min, and 72 °C for 1.5 min, followed by 20 cycles of 95 °C for 0.5 min, 65 °C for 0.5 min, and 72 °C for 1.5 min, followed by 10 min at 72 °C. Note that the annealing temperature (Ta) was decreased by 1 °C per cycle during the first 16 cycles. All PCRs were run on a Biometra Whatman T3000 Thermocycler (Biometra GmbH, Göttingen, Germany). PCR products were separated on a 1.5% agarose gel, together with a Fermentas 100 bp Plus DNA Ladder (Fermentas GmbH, Sankt Leon-Rot, Germany). PCR products chosen for sequencing (i.e., the amplicons of the five best-performing PCR primer pairs) were purified with the QIAquick PCR Purification Kit (QIAGEN GmbH, Hilden, Germany) and sent to Macrogen Inc. (Amsterdam, Netherlands) for direct Sanger sequencing with the sequencing primers HOG-Seq-A-F and HOG-Seq-A-R (Table 2). Forward and reverse DNA strands were assembled to contigs, trimmed (to exclude the binding sites of the PCR and sequencing oligonucleotide primers), and aligned with Geneious Pro 5.4.6 [32] to the sequences of the nine Hymenoptera from which the primer pairs were inferred.
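The temperature profile described above can be written out cycle by cycle, which makes it easy to check that 16 touchdown cycles with a decrement of 1 °C lead exactly from 60 °C to 45 °C. The following Python sketch is illustrative; the function names are hypothetical, and the run-time estimate ignores ramping between steps.

```python
def touchdown_schedule(ta_start=60, ta_end=45, ta_step=1,
                       plateau_cycles=20, plateau_ta=65):
    """Return one (denaturation, annealing, extension) tuple in °C per cycle."""
    if ta_step <= 0 or ta_start < ta_end:
        raise ValueError("annealing temperature must decrease from ta_start to ta_end")
    cycles = []
    ta = ta_start
    while ta >= ta_end:                      # touchdown phase: 60, 59, ..., 45
        cycles.append((95, ta, 72))
        ta -= ta_step
    # Second phase: constant annealing temperature as stated in the text.
    cycles.extend([(95, plateau_ta, 72)] * plateau_cycles)
    return cycles


def programmed_time_min(cycles, initial=15.0, final=10.0,
                        per_cycle=0.5 + 0.5 + 1.5):
    """Sum of programmed hold times in minutes (ramping not included)."""
    return initial + len(cycles) * per_cycle + final


if __name__ == "__main__":
    schedule = touchdown_schedule()
    for number, (denat, anneal, extend) in enumerate(schedule, start=1):
        print(f"cycle {number:2d}: {denat} °C / {anneal} °C / {extend} °C")
    print(f"programmed time: {programmed_time_min(schedule):.0f} min")
```

With the default values the schedule contains 36 cycles and a programmed hold time of 115 min.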
All new sequences generated in this study have been submitted to the European Nucleotide Archive (accession numbers HE612159-HE612181).

Gene Orthology and Oligonucleotide Primer Design
Analyzing the official gene sets of the nine hymenopterans with sequenced genomes, we identified a total of 4,145 single-copy orthologous genes that were present in every species. Studying the multiple nucleotide sequence alignments of these 4,145 orthologous genes, we inferred 304 oligonucleotide primer pairs for amplifying a total of 154 single-copy nuclear protein-coding genes. The length of the inferred primers ranges between 20 and 25 nucleotides (avg. 21), their estimated Tm (approximate melting temperature) ranges between 44.6 °C and 65.7 °C (avg. 53.5 °C), and their degree of degeneration ranges between 1 and 192 (avg. 31). For each of the 154 genes, we inferred between 1 and 11 (avg. 2) primer pairs with a maximum overlap between amplicons of 50%. The total degree of degeneration of the primer pairs ranges
Of the 304 inferred oligonucleotide primer pairs, we found 80 (referring to 71 different genes) to be compatible with sequencing primer pair HOG-Seq-A, 130 (referring to 107 different genes) to be compatible with sequencing primer pair HOG-Seq-B, 46 (referring to 46 different genes) to be compatible with sequencing primer pair HOG-Seq-C, and 73 (referring to 62 different genes) to be compatible with sequencing primer pair HOG-Seq-D (Table 2).
The complete list of inferred degenerate oligonucleotide primers, along with complementary information (e.g., annealing temperature, degree of degeneration, expected length of amplicons, compatibility with sequencing primers attached to the 5′ end of the PCR primers) is given in Table S1. Additional supplemen-
Figure 1. Hypothesized phylogenetic relationships of apocritan Hymenoptera studied in this investigation [4,12]. Taxa with sequenced genomes are highlighted in green; their genome sequences were analyzed to identify single-copy genes and to design degenerate oligonucleotide PCR primers. DNA of non-highlighted species was used to assess the functionality of the inferred PCR and sequencing primers. doi:10.1371/journal.pone.0039826.g001
Figure 2. Polymerase chain reaction (PCR) products separated on 1.5% agarose gel. The depicted gel shows the PCR products obtained from using the inferred oligonucleotide primer pair 7229_02_A (Table 3) to PCR amplify DNA of Stephanus serrator (Stephanidae, 1), Leucospis dorsigera (Leucospidae, 2), Gasteruption tournieri (Gasteruptiidae, 3), Chrysis mediata (Chrysididae, 4), Lestica alata (Crabronidae, 5), and Episyron albonotatum (Pompilidae, 6). All PCR products were suitable for direct sequencing with the sequencing oligonucleotide primers HOG-Seq-A-F/-R (

Empirical Evaluation of Oligonucleotide Primer Pairs
All tested PCR primers had the oligonucleotides of the sequencing primer pair HOG-Seq-A attached to their 5′ ends (Table 2 and Table 3). One pair of tested primers produced amplicons suitable for direct sequencing in all six species (Tables 4 and 5, Figure 2). An additional five primer pairs produced amplicons suitable for direct sequencing in at least four of the six studied species, with a tendency to less reliably produce a PCR product suitable for direct sequencing with increasing evolutionary distance from ants (Tables 4 and 5, Figure S1). Overall, the PCR success rate when using DNA from species of Aculeata (i.e., apocritan wasps with stingers) was approximately 80%. When considering all Apocrita, the PCR success rate was still approximately 60%.
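Per-pair success rates such as these can be converted into an expected number of amplifiable genes. The Python sketch below shows one simple way of doing so. It rests on two assumptions that are not stated in the text: every gene is covered by exactly two primer pairs (the reported average), and primer pairs targeting the same gene succeed or fail independently of each other. The function name is hypothetical.

```python
def expected_amplifiable_genes(n_genes: int, pair_success: float,
                               pairs_per_gene: int = 2) -> float:
    """Expected number of genes for which at least one primer pair works.

    Assumes independent primer pairs and the same number of pairs per gene.
    """
    if not 0.0 <= pair_success <= 1.0:
        raise ValueError("pair_success must be a probability between 0 and 1")
    p_all_fail = (1.0 - pair_success) ** pairs_per_gene
    return n_genes * (1.0 - p_all_fail)


if __name__ == "__main__":
    # 154 target genes, about 80% success per primer pair in Aculeata.
    print(round(expected_amplifiable_genes(154, 0.8)))
```

Under these assumptions, an 80% success rate per primer pair corresponds to about 148 of the 154 genes, which agrees with the upper estimate for aculeate Hymenoptera given in the Discussion.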

Discussion
We inferred 304 oligonucleotide primer pairs that can be used for PCR amplification of up to 154 different genes in apocritan Hymenoptera. The ten primer pairs that were empirically tested proved to be highly successful in amplifying the desired target DNA of Aculeata and showed a reasonable success rate when applied to DNA of other Apocrita. Extrapolating these results and considering that we provide on average two primer pairs for a given gene, we expect up to 148 genes of interest to be amplifiable in aculeate Hymenoptera and roughly 110 to be amplifiable in many other groups of Apocrita. The high success rate of our new PCR primers is most likely the result of the strict selection criteria that we applied during primer design (e.g., low potential for self-priming and the formation of hairpin loops, no alternative binding sites in the reference genomes). However, given that seven of the nine analyzed reference genomes are from ants, we expect fewer primers to amplify the desired product when they are applied to DNA of species that are distantly related to ants (e.g., non-aculeate Apocrita).
The ten degenerate oligonucleotide primers were tested with the respective binding sites for sequencing primer HOG-Seq-A (see Table 2) attached to the 5′ end and used to amplify ten target genes in six apocritan Hymenoptera.
Rating of the PCR products obtained from using the degenerate oligonucleotide primers shown in Table 3 to amplify ten target genes in six apocritan Hymenoptera. ++ = target PCR product in excess. + = target PCR product sufficient for direct sequencing. +/− = target PCR product insufficient for direct sequencing. − = no target PCR product. (?) = unclear whether or not PCR products include amplicon of target gene. *Secondary PCR amplification product likely hampering direct sequencing. doi:10.1371/journal.pone.0039826.t004
There are several options for improving the PCR success rate of the primers reported here. For example, while we used a touchdown temperature profile to rapidly assess the functionality of the ten evaluated primer pairs, one could instead use a PCR temperature profile with a constant annealing temperature that is close to the optimal annealing temperature of the specific primer pair (Table S1). Such a temperature profile could reduce the risk of obtaining secondary amplification products. Since we did not apply primer-specific PCR temperature profiles when empirically testing primer pairs, we expect their success rate to be slightly underestimated. Researchers using these new primers should also consider increasing the concentration of oligonucleotides in the PCR mix to counterbalance the high degree of degeneration of some of the oligonucleotides (Table S1).
We calculated the average nucleotide sequence divergence among the nine reference genomes for the amplified region plus the absolute number of intronic and exonic nucleotides in the expected amplicon for each primer pair (Table S1). Consequently, users are able to search for markers that are more or less conserved than others, and users are additionally able to select for primers that specifically amplify genes with or without introns. Intronic DNA could prove highly valuable for resolving genealogical relationships of recently diverged lineages. These nuclear markers may also prove to be very useful for DNA barcoding. Overall, the ability to select genes that seem particularly suitable to address a specific research question makes the plethora of PCR primers presented here a highly valuable toolbox for research in apocritan Hymenoptera. Finally, the inferred primers are compatible with pre-designed oligonucleotides (Table 2) attached to their 5′ end. This allows users to select a single oligonucleotide sequencing primer pair from a set of four for sequencing all PCR products.
Our approach for designing oligonucleotides for PCR amplification of orthologous genes in a wide range of species requires the availability of sequenced genomes. One group of insects, besides Hymenoptera, for which genomes of several taxa have been sequenced, and for which such an approach might prove fruitful, is Diptera. Genome sequences from more than 15 species of Diptera are currently available and those of many more are already in progress. As in Hymenoptera, however, there is a strong taxonomic bias: only genomes of fruit flies (Drosophila spp.) and of mosquitoes (Culicidae) have been published. As these two taxa belong to two distantly related lineages that split early in the evolution of Diptera, the available genomes might nonetheless already reflect a significant proportion of the molecular diversity in Diptera. With the i5K initiative [33], we expect the number of sequenced insect genomes to explode in the very near future. This will likely allow the inference of large numbers of phylogenetic markers for many more insect orders.

Supporting Information
Figure S1 Polymerase chain reaction (PCR) products separated on 1.5% agarose gels. The depicted gels show the PCR products obtained from using the inferred oligonucleotide primer pairs A. 3683_01_A, B. 4652_02_A, C. 4747_02_A, D. 5119_01_A, E. 5257_01_A, F. 5592_01_A, G. 5768_01_A, H. 6917_01_A, and I. 7036_02_A (see Table 3) to PCR amplify DNA of 1. Stephanus serrator (Stephanidae), 2. Leucospis dorsigera (Leucospidae), 3. Gasteruption tournieri (Gasteruptiidae), 4. Chrysis mediata (Chrysididae), 5. Lestica alata (Crabronidae), and 6. Episyron albonotatum (Pompilidae). − = negative control. L = 100 bp ladder (see also Figure 2). (TIF)
Table S1 Inferred degenerate oligonucleotide primers for studying single-copy nuclear genes in apocritan Hymenoptera. (XLS)