
A PCR primer design method for identifying spider mite species using k-mer counting

  • Tomoko Matsuda,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    tomokom@nbiodata.com

    Affiliation Research and Development Department, Nihon BioData Corporation, Kawasaki, Kanagawa, Japan

  • Hironori Sakamoto,

    Roles Investigation, Methodology, Resources, Validation

    Affiliation Biodiversity Division, National Institute for Environmental Studies, Tsukuba, Ibaraki, Japan

  • Takumi Kayukawa,

    Roles Investigation, Methodology, Resources, Validation

    Affiliation Institute of Agrobiological Sciences, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan

  • Yasuki Kitashima,

    Roles Investigation, Resources

    Affiliation Faculty of Agriculture, Ibaraki University, Ami, Ibaraki, Japan

  • Toshinori Kozaki,

    Roles Investigation, Resources

    Affiliation Faculty of Agriculture, Ibaraki University, Ami, Ibaraki, Japan

  • Tetsuo Gotoh

    Roles Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    Affiliations Faculty of Agriculture, Ibaraki University, Ami, Ibaraki, Japan, Faculty of Economics, Ryutsu Keizai University, Ryugasaki, Ibaraki, Japan

Abstract

Using PCR to distinguish closely related species can be difficult because they may have very similar genomes. Advances in bioinformatics make it possible to design PCR primers that are species-specific. In this study, we developed a bioinformatics method for extracting species-specific primer candidate sequences (i.e., unpaired primers that were specific to a single species) from RNA-Seq data sets of 19 species of spider mites (Acari, Tetranychidae). Using k-mer counting, we obtained between 257 and 48,621 species-specific unpaired primer candidates for the 19 species. We then manually obtained a second primer that was also species-specific. The primer pairs were then confirmed to work in the target species and not to work in the non-target species. Finally, species-specific primer pairs were obtained for 17 of the 19 species tested. Such species-specific primers may be used for practical species discrimination by optimizing multiplex PCR. Our primer design method is expected to be applicable to other taxa.

Introduction

Polymerase chain reaction (PCR) is a method for amplifying DNA fragments in vitro [1]. Because of its convenience and low cost, PCR has become a standard technique not only for biomedical, infectious disease, and forensic applications but also in other fields of biology, such as agricultural and ecological research. Among its many uses, PCR can be used to distinguish morphologically similar species. For example, PCR-RFLP (restriction fragment length polymorphism) distinguishes closely related species by using a single restriction enzyme to produce fragments of different lengths from DNA markers [2–4]. Another method is to design species-specific primers that yield a product only in the target species and to identify the species based on successful PCR amplification and correct product size [5–7]. In that case, designing PCR primers to be specific to the target is essential. Several web-based services and stand-alone software programs, such as Primer-BLAST [8], PrimerSNP [9], MultiMPrimer3 [10], and ssGeneFinder [11], are designed to optimize primer design for PCR. Furthermore, advances in bioinformatics make it possible to obtain more efficient primer candidates and reduce the amount of false-positive amplification [12–14].

One of the methods for designing primers from large sequence data uses k-mer counting. A k-mer is an oligonucleotide of length k and is utilized for various bioinformatics analyses, such as de Bruijn graph assembly [15–17], estimation of genome size [18–20], and estimation of species abundance in metagenomic samples [21–23]. Some studies have provided ways to design primers or probes to detect strains or species using k-mers from large-scale sequence data. PriMux software predicts degenerate primers and probes from a large target set with very low homology and poor alignment (full-length genomes of all serotypes of the dengue virus, 2,863 sequences) [24]. The method allows the selection of primers optimized for viral genomes of different serotypes. It does not require multiple sequence alignment, which involves computation time and memory space, and instead uses k-mer analysis. PathogenMIPer software is a program for designing unique and specific molecular inversion probe (MIP) oligonucleotides for pathogen identification and detection [25]. Probes generated by PathogenMIPer were able to detect 24 human papillomavirus (HPV) types in clinical samples with 100% sensitivity and no false positives. RUCS is a program that identifies PCR primer pairs and probes present in one genomic dataset but not another [26]. These studies have targeted viruses and pathogens with relatively small genomes. For eukaryotes, which have larger genomes than prokaryotes, more time is generally required for k-mer analyses. In addition, in the case of eukaryotes, whole genome sequencing is often performed only for common species. For example, in the case of the spider mite family Tetranychidae, only four assembled genomes (Tetranychus urticae Koch [27], Tetranychus cinnabarinus (Boisduval) [28], Panonychus citri (McGregor), Tetranychus truncatus Ehara) are available in the Sequence Read Archive (SRA).
Using whole genome data would likely increase the number of available targets and the number of sequences and non-transcribed regions considered. However, publicly available sequencing resources for spider mites (Acari, Tetranychidae) are limited, with only 13 species having whole genome short-read data deposited in the SRA, compared to 79 species with available RNA-Seq data [29] (according to the NCBI SRA database accessed on 24 December 2024). In a previous study, we performed RNA-Seq analysis on 72 species (73 strains) of spider mites. The resulting data have been made publicly available in DRA/SRA/ERA databases under the accession number DRA007145. Once a method for designing species-specific primers utilizing these RNA-Seq data is established, it could be applied to other taxa.

The purpose of the present study was to develop bioinformatics methods for extracting species-specific primer candidate sequences from the whole-transcriptome RNA-Seq data of spider mites. Spider mites of the family Tetranychidae include some agricultural pest species that cause severe economic losses worldwide [30,31]. Accurate species identification is instrumental for effective pest control, but closely related mite species are difficult to distinguish by morphological characters. In particular, some species of the genus Tetranychus are morphologically similar, differing only in the diameter of the aedeagal knob in males [32]. However, most specimens collected in the field are adult females because of a female-biased sex ratio [33]. To overcome this problem, several molecular methods have been used to distinguish spider mite species [34], such as DNA barcoding [35–40], PCR-RFLP [41,42], multiplex PCR [43], and real-time PCR [44]. A limitation with most of these techniques is that they do not work well for discriminating species across more than two genera [34]. In the present study, we developed a method for discriminating species in five genera in the family Tetranychidae. First, species-specific primer candidates were extracted from RNA-Seq assemblies using k-mer counting. We then manually obtained a second primer that was also species-specific (Fig 1). To validate the primers, we performed PCR amplification and confirmed the amplification in the target species and the absence of amplification in non-target species. Finally, species-specific primers were obtained for 17 of the 19 species tested. These species-specific primers may be used for species discrimination by optimizing multiplex PCR.

Fig 1. Primer design methods for the detection of spider mite species using k-mer counting.

https://doi.org/10.1371/journal.pone.0321199.g001

Materials and methods

Spider mites

Nineteen species of spider mites were used in this study (Table 1). The green and red forms of T. urticae were treated as separate species in this study. Mite samples were maintained on leaf discs of common bean (Phaseolus vulgaris L.) or the original host plants placed on a water-saturated polyurethane mat in a plastic dish (90 mm diameter, 20 mm depth) at 25°C under a 16L-8D photoperiod until analysis. Voucher specimens were prepared as described previously [45] and were preserved in the Laboratory of Applied Entomology and Zoology, Faculty of Agriculture, Ibaraki University.

Table 1. Classification and sources of tetranychid mites used in this study.

https://doi.org/10.1371/journal.pone.0321199.t001

Identification of species-specific primers

The protocol for identifying the species-specific primers and PCR validation used in this study has been deposited and published on protocols.io with DOI: dx.doi.org/10.17504/protocols.io.q26g7m78qgwz/v1.

RNA-Sequencing.

RNA-Seq datasets for 16 of the 19 species were reused from our previous work. The BioSample accession number of each RNA-Seq dataset is associated with the DRA accession number: DRA007145 (https://www.ncbi.nlm.nih.gov/sra/?term=DRA007145, S1 Table). RNA-Seq data for the remaining three species (Eotetranychus nomurai Ehara, Eotetranychus celtis Ehara, Oligonychus castaneae Ehara & Gotoh) were newly acquired. Total RNA was extracted from whole bodies of 100–200 adult females of the same population using an RNeasy Micro Kit (Qiagen, Valencia, CA, USA). Live female individuals for RNA samples and female individuals for voucher specimens were obtained from the same leaf discs and plants. The quantity and quality of the total RNA were evaluated using the Agilent RNA ScreenTape System (Agilent Technologies, Santa Clara, CA, USA). The cDNA libraries were prepared from the total RNA using the TruSeq RNA sample prep kit (Illumina, San Diego, CA, USA), and single-end sequencing was performed for 75 cycles on the NextSeq 500 sequencing platform (Illumina). All the reads were deposited in the DDBJ Sequence Read Archive. The BioSample accession number of each RNA-Seq dataset is associated with the DRA accession number: DRA018635 (https://www.ncbi.nlm.nih.gov/sra/?term=DRA018635, S1 Table).

de novo assembly.

The workflow of the subsequent bioinformatics analysis is shown in Fig 2. The sequence reads were trimmed by fastx_trimmer of the FASTX-Toolkit [46] (parameter: -f 15) and by fastq_quality_trimmer (parameters: -t 28 and -l 40), and then filtered by fastq_quality_filter (parameters -q 28 and -p 80). The processed sequence reads were assembled into contigs per species by Bridger [47] with the following command: Bridger --seqType fq --output [output_dir] --single [reads.fq] --CPU 16. Contigs with 95% or more similarity were judged as redundant and removed by CD-HIT [48]. The open reading frames (ORFs) were identified by TransDecoder [49].
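The read-processing steps can be illustrated with a minimal Python sketch. It is not the FASTX-Toolkit itself; it mirrors our reading of the options named above (-f 15 keeps each read from its 15th base onward, -t 28 and -l 40 trim low-quality bases from the 3’ end and discard reads shorter than 40 bases, and -q 28 with -p 80 keep reads in which at least 80% of bases have a quality of 28 or higher). The function names and the Phred+33 quality encoding are assumptions made for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality]


def preprocess_read(seq, qual, first_base=15, trim_q=28, min_len=40,
                    filter_q=28, min_percent=80.0):
    """Return the processed (seq, qual) pair, or None if the read is discarded."""
    # Step 1 (like fastx_trimmer -f 15): keep from the 15th base onward (1-based).
    seq, qual = seq[first_base - 1:], qual[first_base - 1:]

    # Step 2 (like fastq_quality_trimmer -t 28 -l 40): trim low-quality bases
    # from the 3' end, then discard reads that became shorter than min_len.
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < trim_q:
        end -= 1
    seq, qual, scores = seq[:end], qual[:end], scores[:end]
    if len(seq) < min_len:
        return None

    # Step 3 (like fastq_quality_filter -q 28 -p 80): keep the read only if
    # at least min_percent of its bases have quality >= filter_q.
    good = sum(1 for score in scores if score >= filter_q)
    if 100.0 * good / len(scores) < min_percent:
        return None
    return seq, qual
```

For a 75-base single-end read, the first step alone leaves 61 bases, so a read survives only if no more than 21 further bases are trimmed from its 3’ end.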

Fig 2. Workflow diagram for extracting species-specific primers from RNA-Seq data using k-mer counting.

https://doi.org/10.1371/journal.pone.0321199.g002

Identification of putative orthologous genes.

A problem with using RNA-Seq data for primer design is that they do not include introns, which are present in genomic DNA. Introns in the primer sequence itself or between the primers could prevent amplification. Therefore, we extracted species-specific primers from intronless genes, which were identified from the exon annotations of the T. urticae genome [27]. The ORFs of the 19 species were annotated by FATE [50] with the TBLASTN engine and other default parameters against the intronless genes of the T. urticae genome [27]. A total of 431 intronless genes were annotated, and orthologous ORFs were aligned by DIALIGN-TX [51] with the -L option. Despite recent advances, existing ortholog detection methods still produce false positives and false negatives. We therefore constructed a phylogenetic tree for each gene using RAxML [52] to differentiate between orthologous and paralogous genes. A gene was classified as a paralog if its phylogenetic relationship distinctly deviated from the established relationships among known spider mite species in the family Tetranychidae [29]. One hundred and sixteen genes that appeared to be paralogs were identified visually. After removing paralogous genes, 315 putative orthologous intronless genes were used for k-mer analysis.

k-mer analysis.

Contigs clustered with CD-HIT, which were assigned to orthologous intronless genes (315 genes), were used for the k-mer analysis. K-mer analysis with k set to 30 was performed by Jellyfish [53] for each species. The command in Jellyfish is: jellyfish count -m 30 -s 10000 -t 4 -o [output_file] [input_fasta].
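What k-mer counting produces can be illustrated with a small pure-Python sketch. It is a stand-in for illustration only and is far slower than Jellyfish; the function name count_kmers is hypothetical, and skipping windows that contain bases other than A, C, G, or T is an assumption of this sketch. As in the command above, which does not request canonical k-mers, each k-mer is counted as it appears on the given strand.

```python
from collections import Counter


def count_kmers(sequences, k=30):
    """Count every k-mer that occurs in the given sequences (one strand only)."""
    counts = Counter()
    valid = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of length k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Assumption: windows with ambiguous bases (e.g. N) are skipped.
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts
```

A contig of length n contributes at most n - k + 1 windows, so a 1,000 bp contig yields at most 971 30-mers.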

Extracting unique k-mers.

K-mers obtained from Jellyfish were filtered using the following requirements.

Primer specificity for species-specific detection:

  1. If a k-mer matched another k-mer within the same species or a k-mer of another species, the k-mer was removed.

Tm (primer melting temperature):

  1. If the Tm value of a k-mer was less than 67°C or greater than 82°C, the k-mer was removed.

GC Content: Preferably in the range of 40%–60%. Include 1–2 G or C bases at the 3’ end to improve binding stability; however, avoid excessive GC content at the 3’ end to prevent non-specific binding.

  1. If the number of G or C bases in a full-length k-mer (30-mer) was less than 11 or greater than 17, the k-mer was removed.
  2. If the number of G or C bases in either the first or the second half of a k-mer (15-mers) was less than four or greater than 10, the k-mer was removed.
  3. If both the 5’ and 3’ ends of a k-mer were A or T, the k-mer was removed.
  4. If both the 5’ and 3’ ends of a k-mer contained three or more G or C bases, the k-mer was removed.
  5. If both the 5’ and 3’ ends of a k-mer contained three or more A or T bases, the k-mer was removed.

Avoid Repeats and Runs: Prevent intra-primer and inter-primer complementarity to reduce primer-dimer formation.

  1. If a k-mer contained a run of four or more identical bases (e.g., AAAAA or GGGG), the k-mer was removed.
  2. If a k-mer contained more than three dinucleotide repeats (e.g., ATATAT or CGCGCG), the k-mer was removed.
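Several of the requirements listed above can be expressed as a short Python sketch. It covers only the rules that are fully specified in the text: uniqueness within and between species, the G/C counts of the full k-mer and of each half, the rule on both terminal bases being A or T, and the run and repeat rules. The Tm rule and the two rules on three or more G/C or A/T bases at the ends are left out because the Tm formula and the length of the terminal window are not stated here. "More than three dinucleotide repeats" is taken literally as four or more consecutive copies, and all function names are hypothetical.

```python
import re

# A run of four or more identical bases, e.g. AAAA or GGGGG.
HOMOPOLYMER = re.compile(r"([ACGT])\1{3,}")
# The same dinucleotide more than three times in a row, e.g. ATATATAT.
DINUC_REPEAT = re.compile(r"([ACGT]{2})\1{3,}")


def gc_count(seq):
    """Number of G or C bases in seq."""
    return sum(base in "GC" for base in seq)


def passes_composition_filters(kmer):
    """Apply the G/C-count, terminal-base, run, and repeat rules to one 30-mer."""
    kmer = kmer.upper()
    half = len(kmer) // 2
    if not 11 <= gc_count(kmer) <= 17:
        return False
    if not all(4 <= gc_count(part) <= 10 for part in (kmer[:half], kmer[half:])):
        return False
    if kmer[0] in "AT" and kmer[-1] in "AT":
        return False
    if HOMOPOLYMER.search(kmer) or DINUC_REPEAT.search(kmer):
        return False
    return True


def species_specific_kmers(counts_by_species):
    """Keep k-mers seen exactly once in their own species and in no other species.

    counts_by_species maps a species name to a {kmer: count} mapping.
    """
    result = {}
    for species, counts in counts_by_species.items():
        seen_elsewhere = set()
        for other, other_counts in counts_by_species.items():
            if other != species:
                seen_elsewhere.update(other_counts)
        result[species] = {
            kmer for kmer, count in counts.items()
            if count == 1 and kmer not in seen_elsewhere
        }
    return result
```

For a 30-mer, the limits of 11 and 17 G or C bases correspond to roughly 37% and 57% GC.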

Filtering k-mers.

The filtering of unique k-mers, which serve as candidates for species-specific primers, was performed using both Bowtie2 [54] and BLASTN [55]. Bowtie2 offers a significant advantage in computational speed, allowing for rapid initial filtering. In contrast, BLASTN permits more mismatches and supports finer matching criteria, providing greater flexibility and accuracy in sequence comparison. By first applying Bowtie2 for preliminary filtering and then refining the reduced set of k-mers with BLASTN, we optimize both computational performance and precision in the filtering process. Extracted unique k-mer sets of 19 species were mapped to contigs of the 19 assemblies using Bowtie2 software. When a k-mer was mapped multiple times to contigs of the same species or matched 26-mers or more to other species, the k-mer was removed. Then, BLASTN (blastn-short) searches were performed between the k-mers filtered by Bowtie2 (query) and contigs of the 19 assemblies after clustering with CD-HIT (database). When a k-mer aligned multiple times or 24–29 bp to a contig of the same species, the k-mer was removed. Furthermore, k-mers that matched contigs of other species by more than 25 bp were removed. The k-mers remaining after filtering by Bowtie2 and BLASTN were considered species-specific primer candidates.
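The cross-species part of this screen can be approximated with a much simpler check. The sketch below is a deliberately simplified stand-in for Bowtie2 and BLASTN, not a reproduction of their alignment or scoring: it flags a k-mer when any exact stretch of 26 or more of its bases occurs on either strand of a contig from another species. The 26-base threshold follows the Bowtie2 criterion above; the function names and the exact-substring matching are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def shares_long_match(kmer, contigs, min_match=26):
    """True if any exact stretch of min_match bases of kmer occurs in a contig."""
    kmer = kmer.upper()
    windows = {kmer[i:i + min_match] for i in range(len(kmer) - min_match + 1)}
    for contig in contigs:
        contig = contig.upper()
        # Check both strands, since a primer can bind to either one.
        for strand in (contig, reverse_complement(contig)):
            if any(window in strand for window in windows):
                return True
    return False


def filter_candidates(candidates, contigs_by_species, min_match=26):
    """Drop candidate k-mers that share a long exact match with another species.

    candidates maps species -> list of k-mers;
    contigs_by_species maps species -> list of contig sequences.
    """
    kept = {}
    for species, kmers in candidates.items():
        other_contigs = [
            contig
            for other, contigs in contigs_by_species.items() if other != species
            for contig in contigs
        ]
        kept[species] = [
            kmer for kmer in kmers
            if not shares_long_match(kmer, other_contigs, min_match)
        ]
    return kept
```

Unlike the aligners, this check does not tolerate mismatches inside the matching stretch, so it is less strict than the filtering described above and would pass some k-mers that Bowtie2 or BLASTN would remove.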

PCR validation

Finding potential primer pairs from alignment.

Computationally identified species-specific k-mers occasionally exhibited unintended sequence matches to other species during sequence alignment checks, reducing their specificity. For these reasons, PCR primers to be paired with species-specific primer candidates were designed from the alignment of the coding sequences (CDSs) and the 5’ and 3’ UTR sequences (Table 2, S2 Table, S1 File). CDSs were aligned using DIALIGN-TX, which considers amino acid sequences to improve alignment accuracy, as described in the section ‘Identification of putative orthologous genes.’ In contrast, 3’ and 5’ UTR sequences were aligned using MAFFT [56], a widely used alignment tool, as considering amino acid sequences is not necessary for these non-coding regions. Selection criteria included maintaining an appropriate distance for amplification and ensuring primer specificity within the transcriptome to minimize cross-amplification. This approach enabled the practical refinement of primer design beyond automated k-mer pairing. In some cases, the position and length of the species-specific primers were modified according to the alignment. To confirm the uniqueness of the species-specific primers, we conducted additional BLASTN searches to compare the species-specific primers with the contigs from the 19 assemblies after clustering with CD-HIT.
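The distance criterion can be checked in silico once a candidate pair has been chosen. The following minimal sketch, which assumes exact primer matches and a hypothetical function name, locates the forward primer on a contig, locates the binding site of the reverse primer (its reverse complement on the same strand) downstream of it, and returns the expected product size. It does not model mismatches or off-target binding, and it is not the manual procedure used in this study.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_product_size(template, forward, reverse):
    """Expected amplicon length in bp, or None if either primer has no exact site."""
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the top strand at its reverse complement.
    reverse_site = reverse_complement(reverse)
    start = template.find(forward)
    if start == -1:
        return None
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    # The product spans from the first base of the forward primer
    # to the last base of the reverse primer site.
    return end + len(reverse_site) - start
```

A pair whose predicted size falls outside the range that can be amplified and resolved on a gel would be redesigned before any bench work.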

Table 2. Species-specific primer sequences and results of PCR amplification.

https://doi.org/10.1371/journal.pone.0321199.t002

DNA extraction and PCR amplification.

Total DNA was extracted from the whole bodies of individual female spider mites using a QIAamp DNA Micro Kit (Qiagen, Valencia, CA, USA). To ensure the quality of the extracted DNA and to provide a positive control for the PCR reactions, we tested the DNA using primer sets known to amplify spider mite DNA. Specifically, we targeted the cytochrome c oxidase subunit I (COI) gene of mitochondrial DNA and the 28S nuclear ribosomal RNA (rRNA) gene [45]. The successful amplification of these markers confirmed that the extracted DNA was of sufficient quality for reliable PCR analysis. The PCR primers are shown in Table 2 and S2 Table. A pair of primers, designed to amplify a single species, was tested against all 19 species used in this study. When a gel band was observed for only a single species, the primer set was considered to be effective in detecting the species. PCR reactions were performed in a 20 μL mixture containing 1 μL of DNA template, 0.4 μL of KOD FX Neo (1 U/μL, Toyobo, Osaka, Japan), 0.6 μL of each primer (10 pmol/μL each), 4 μL of 2 mM dNTPs (Toyobo), 10 μL of 2 × PCR Buffer for KOD FX Neo (Toyobo), and 3.4 μL of distilled water. The PCR cycling parameters were as follows: initial denaturation for 2 min at 94°C, followed by 35 cycles of 10 sec at 98°C and 1 min at 68°C. An additional 1 min at 68°C was allowed for final strand elongation. PCR products were visualized by electrophoresis on an agarose gel. A 100-bp DNA ladder (Takara Bio, Shiga, Japan) was used as a molecular size marker.
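As a quick check of the reaction mixture described above, the component volumes can be summed and the final concentrations derived with a few lines of Python. The volumes and stock concentrations are taken from the text; the variable and function names are illustrative, and the conversion of 10 pmol/μL to 10 μM is a unit identity.

```python
# Volumes in microlitres, as listed in the reaction mixture.
components_ul = {
    "DNA template": 1.0,
    "KOD FX Neo (1 U/uL)": 0.4,
    "forward primer (10 pmol/uL)": 0.6,
    "reverse primer (10 pmol/uL)": 0.6,
    "2 mM dNTPs": 4.0,
    "2x PCR buffer": 10.0,
    "distilled water": 3.4,
}

total_ul = sum(components_ul.values())


def final_concentration(stock, volume_ul, total_volume_ul):
    """Dilution: final = stock * (volume added / total volume), in the stock's units."""
    return stock * volume_ul / total_volume_ul


# 10 pmol/uL equals 10 uM, so the result is in uM.
primer_um = final_concentration(10.0, 0.6, total_ul)
# dNTP stock is 2 mM, so the result is in mM.
dntp_mm = final_concentration(2.0, 4.0, total_ul)
# Enzyme units per reaction: 0.4 uL of a 1 U/uL stock.
enzyme_units = 1.0 * components_ul["KOD FX Neo (1 U/uL)"]
```

The volumes add up to the stated 20 μL, giving about 0.3 μM of each primer, about 0.4 mM dNTPs, and 0.4 U of polymerase per reaction.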

Sequencing.

The PCR product was sequenced only when the target species was successfully amplified. PCR products were purified using Sephacryl S-300 High Resolution (GE Healthcare, Chicago, IL, USA) and directly sequenced. Sequencing was carried out in both directions using the amplifying primers with the BigDye Terminator Cycle Sequencing Kit v.3.1 (Applied Biosystems, Foster City, CA) and on an ABI 3130xl Genetic Analyzer (Applied Biosystems). Then, BLASTN searches were performed between the sequences of the PCR products (query) and the contigs that contain species-specific primers (database) to confirm sequence similarity. The sequences of the PCR products have been deposited in the DDBJ/EMBL/GenBank International Nucleotide Sequence Databases under accession numbers LC867280 to LC867298.

Results

RNA-Seq data were obtained using RNA extracted from 100–200 adult females of each spider mite species. Pooling individuals can introduce intraspecific variation, which could potentially affect the accuracy of downstream analyses. However, as most RNA-Seq data are derived from transcribed regions, they are less affected by noise from intraspecific variation compared to whole-genome DNA-Seq data, which include non-coding or non-transcribed regions. This makes RNA-Seq data particularly advantageous for detecting species-specific markers. The de novo assembly of the RNA-Seq data used in this study yielded an N50 length of 1,369–2,256 bp and a maximum contig length of 11,286–24,030 bp. These values are comparable to those reported in other spider mite studies [29,57]. Based on this level of assembly quality, we determined that the data were appropriate for robust k-mer analysis. K-mer analysis (Fig 2) provided between 257 and 48,621 species-specific primer candidates for the 19 species (Table 1, S2 Table, and S2 File). Primer pairs were designed by manually examining alignment files to locate positions corresponding to species-specific sequences. Alignments were reviewed to confirm the specificity of candidate primers across the tested species. The primer pairs were initially designed randomly. However, since sequence alignments were carefully examined during the primer design process, additional primers were occasionally designed from the same contig when a particular alignment was deemed highly suitable for primer design. As a result, multiple primers were designed from the same transcript in certain cases, such as XM_015937235.2 and NW_015449938.1, as shown in Table 2 and S2 Table. Finally, 43 PCR primer pairs were designed to target species-specific sequences.
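For readers unfamiliar with the N50 statistic reported above, it is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal Python sketch of this standard definition follows; the function name is illustrative, and this is not the code used to produce the values reported here.

```python
def n50(lengths):
    """N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

For example, contigs of 10, 20, 30, and 40 bp total 100 bp; the two longest already cover 70 bp, so the N50 is 30 bp.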

The species specificity of the primers was confirmed by amplification in the target species and the absence of amplification in non-target species. PCR validation showed that 19 of the 43 primer pairs amplified only the target species (Fig 3, Table 2, S2 Table, S3 File, and S1 Raw Images). The remaining 26 primer pairs amplified non-specific products or did not amplify anything. For most species, a species-specific primer pair was successfully developed after multiple cycles of design and testing. However, despite multiple attempts, the specific primer pairs designed for P. citri and the green form of T. urticae failed to amplify the target species specifically. Sequence similarity between the sequences of PCR products and the contigs that contain species-specific primers was greater than 96% in 17 of the 19 cases (Table 2, S2 Table).

Fig 3. Agarose gel showing the results of PCR amplification using species-specific primers.

Lanes 1: P. citri, 2: P. mori, 3: P. ulmi, 4: P. osmanthi, 5: S. shii, 6: E. nomurai, 7: E. celtis, 8: O. castaneae, 9: O. ilicis, 10: O. coffeae, 11: O. gotohi, 12: O. amiensis, 13: T. kanzawai, 14: T. parakanzawai, 15: T. urticae (red-form), 16: T. urticae (green-form), 17: T. truncatus, 18: T. pueraricola, 19: T. piercei. (a) Bands of P. citri (lane 1) and P. osmanthi (lane 4) were amplified with primers Pci6_F1 and Pci6_R1. (b) The band of P. osmanthi (lane 4) was amplified with primers Pos5_F1 and Pos5_R1. (c) The band of E. celtis (lane 7) was amplified with primers Ece3_F1 and Ece3_R1. (d) The band of O. castaneae (lane 8) was amplified with primers Oca4_F1 and Oca4_R1. (e) The band of T. urticae (red form, lane 15) was amplified with primers TurR6_F1 and TurR6_R1. (f) Primers TurG5_F2 and TurG5_R2 did not amplify any products. Additional results are provided in the supplementary data (S3 File).

https://doi.org/10.1371/journal.pone.0321199.g003

Discussion

Our k-mer analysis generated between hundreds and tens of thousands of species-specific primer candidates for each of the 19 species tested (Table 1). As k-mer counting is used for alignment-free phylogenetic inference [58–60], k-mer frequency profiles tend to be similar among closely related species. As expected, for closely related species, the number of primer candidates was at the low end of the range. This was the case for 1) T. pueraricola and the green and red forms of T. urticae, 2) Tetranychus kanzawai Kishida and T. parakanzawai Ehara, and 3) Panonychus citri (McGregor) and P. osmanthi Ehara & Gotoh. Nineteen of the 43 primer pairs amplified only the target species, whereas the remaining 26 primer pairs either amplified non-specific products or did not amplify anything. Finally, species-specific primers were obtained for all but two of the 19 species tested: P. citri and the green form of T. urticae (Tables 2 and S2).

Primers unique to the RNA-Seq data may also match sequences in non-transcribed regions of the genome, and therefore, may result in non-targeted or non-specific amplification when genomic DNA is used as the template. For example, in this study, primers specific to the green form of T. urticae amplified products in Tetranychus species other than the green form of T. urticae, and primers specific for P. mori amplified a product in T. kanzawai (S2 Table). We conducted k-mer analyses with 30 bp k-mers (i.e., 30 bp primers), which are longer than the commonly used length (18–24 bp) and therefore provide greater specificity [61]. However, species-specific amplification was not successful for P. citri and the green form of T. urticae. This may be due to the close relationship between the green form of T. urticae, T. pueraricola, and the red form of T. urticae, as well as between P. citri and P. osmanthi. In contrast, species-specific primers were successfully designed and experimentally validated for T. kanzawai and its close relative T. parakanzawai, as well as for O. gotohi and O. amiensis. PCR results for closely related species not tested in this study could differ from the expected outcomes. This limitation is not unique to our method; it also affects other molecular diagnostic techniques, such as PCR-RFLP and real-time PCR, which face challenges in differentiating closely related species. Further studies involving a broader range of species are necessary to improve the reliability and practicality of these approaches for comprehensive species identification.

Of the 43 primer pairs designed in this study, 10 did not amplify any products (S2 Table). Several factors may have contributed to this outcome. One possible explanation is the presence of sequence mismatches at the primer binding sites, caused by genetic variation within or between populations, which may lead to inefficient or failed primer annealing. Additionally, secondary structures within the template DNA may hinder primer binding or polymerase extension. Another contributing factor may be suboptimal primer design parameters, such as melting temperature or GC content, despite careful optimization. To enhance the robustness of primer sets, their efficiency should be tested across a broader range of populations and closely related species.

In summary, we developed a bioinformatics method for extracting species-specific primers from transcriptome RNA-Seq data. Our results show that species-specific primers can be designed using RNA-Seq assemblies even for non-model organisms whose genomes have not been sequenced. These species-specific primers may be used for practical species discrimination using PCR. We identify two ways to improve our method. First, our method requires the manual design of primers to pair with species-specific primer candidates. A program that automatically extracts both primers would be more useful. Second, the method becomes cumbersome when distinguishing larger numbers of species, as each species requires individual PCR reactions for validation. Modifying the method for use with multiplex PCR would make it simpler to use.

Supporting information

S1 Table. Summary of sequence data, de novo assembly, and k-mer analysis.

https://doi.org/10.1371/journal.pone.0321199.s001

(XLSX)

S2 Table. Species-specific primer candidates and results of PCR amplification.

https://doi.org/10.1371/journal.pone.0321199.s002

(XLSX)

S1 File. Aligned sequences in FASTA format.

https://doi.org/10.1371/journal.pone.0321199.s003

(ZIP)

S2 File. Species-specific primer candidates.

https://doi.org/10.1371/journal.pone.0321199.s004

(ZIP)

S3 File. Electropherogram of agarose gel showing PCR amplifications from spider mites.

Lanes 1: P. citri, 2: P. mori, 3: P. ulmi, 4: P. osmanthi, 5: S. shii, 6: E. nomurai, 7: E. celtis, 8: O. castaneae, 9: O. ilicis, 10: O. coffeae, 11: O. gotohi, 12: O. amiensis, 13: T. kanzawai, 14: T. parakanzawai, 15: T. urticae (red form), 16: T. urticae (green form), 17: T. truncatus, 18: T. pueraricola, 19: T. piercei.

https://doi.org/10.1371/journal.pone.0321199.s005

(PDF)

Acknowledgments

We thank Prof. Dr. N. Ogata (Nihon BioData Corporation, Kawasaki, Kanagawa, Japan) for proposing the idea of designing primers using k-mer analysis, and Dr. S. Ohno (Okinawa Prefectural Agricultural Research Center, Ishigaki, Okinawa, Japan) for collecting the spider mites.

References

  1. Mullis KB, Faloona FA. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol. 1987;155:335–50. pmid:3431465
  2. Clark TL, Meinke LJ, Foster JE. PCR–RFLP of the mitochondrial cytochrome oxidase (subunit I) gene provides diagnostic markers for selected Diabrotica species (Coleoptera: Chrysomelidae). Bull Entomol Res. 2001;91(6):419–27.
  3. Asraoui JF, Sayar NP, Knio KM, Smith CA. Fly diversity revealed by PCR-RFLP of mitochondrial DNA. Biochem Mol Biol Educ. 2008;36(5):354–62. pmid:21591219
  4. Chua TH, Chong YV, Lim SH. Species determination of Malaysian Bactrocera pests using PCR-RFLP analyses (Diptera: Tephritidae). Pest Manag Sci. 2010;66(4):379–84. pmid:19946858
  5. Lu W-N, Wu Y-T, Kuo M-H. Development of species-specific primers for the identification of aphids in Taiwan. Appl Entomol Zool. 2008;43(1):91–6.
  6. Zhang T, Wang Y-J, Guo W, Luo D, Wu Y, Kučerová Z, et al. DNA barcoding, species-specific PCR and real-time PCR techniques for the identification of six Tribolium pests of stored products. Sci Rep. 2016;6:28494. pmid:27352804
  7. Zhao Z-H, Cui B-Y, Li Z-H, Jiang F, Yang Q-Q, Kučerová Z, et al. The establishment of species-specific primers for the molecular identification of ten stored-product psocids based on ITS2 rDNA. Sci Rep. 2016;6:21022. pmid:26880378
  8. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL. Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics. 2012;13:134. pmid:22708584
  9. Yao J, Lin H, Van Deynze A, Doddapaneni H, Francis M, Lemos EGM, et al. PrimerSNP: a web tool for whole-genome selection of allele-specific and common primers of phylogenetically-related bacterial genomic sequences. BMC Microbiol. 2008;8:185. pmid:18937861
  10. Koressaar T, Jõers K, Remm M. Automatic identification of species-specific repetitive DNA sequences and their utilization for detecting microbial organisms. Bioinformatics. 2009;25(11):1349–55. pmid:19357101
  11. Ho C-C, Wu AKL, Tse CWS, Yuen K-Y, Lau SKP, Woo PCY. Automated pangenomic analysis in target selection for PCR detection and identification of bacteria by use of ssGeneFinder Webserver and its application to Salmonella enterica serovar Typhi. J Clin Microbiol. 2012;50(6):1905–11. pmid:22442318
  12. Bekaert M, Teeling EC. UniPrime: a workflow-based platform for improved universal primer design. Nucleic Acids Res. 2008;36(10):e56. pmid:18424794
  13. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, et al. Primer3--new capabilities and interfaces. Nucleic Acids Res. 2012;40(15):e115. pmid:22730293
  14. Kalendar R, Khassenov B, Ramankulov Y, Samuilova O, Ivanov KI. FastPCR: an in silico tool for fast primer and probe design and advanced sequence analysis. Genomics. 2017;109(3–4):312–9. pmid:28502701
  15. Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics. 2010;95(6):315–27. pmid:20211242
  16. Paszkiewicz K, Studholme DJ. De novo assembly of short sequence reads. Brief Bioinform. 2010;11(5):457–72. pmid:20724458
  17. Compeau PEC, Pevzner PA, Tesler G. How to apply de Bruijn graphs to genome assembly. Nat Biotechnol. 2011;29(11):987–91. pmid:22068540
  18. Li X, Waterman MS. Estimating the repeat structure and length of DNA sequences using L-tuples. Genome Res. 2003;13(8):1916–22. pmid:12902383
  19. Williams D, Trimble WL, Shilts M, Meyer F, Ochman H. Rapid quantification of sequence repeats to resolve the size, structure and contents of bacterial genomes. BMC Genomics. 2013;14:537. pmid:23924250
  20. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33(14):2202–4. pmid:28369201
  21. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15(3):R46. pmid:24580807
  22. Lu J, Breitwieser FP, Thielen P, Salzberg SL. Bracken: estimating species abundance in metagenomics data. PeerJ Comput Sci. 2017;3:e104. pmid:40271438
  23. Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome. 2017;5(1):69. pmid:28683828
  24. Hysom DA, Naraghi-Arani P, Elsheikh M, Carrillo AC, Williams PL, Gardner SN. Skip the alignment: degenerate, multiplex primer and probe design using K-mer matching instead of alignments. PLoS One. 2012;7(4):e34560. pmid:22485178
  25. Thiyagarajan S, Karhanek M, Akhras M, Davis RW, Pourmand N. PathogenMIPer: a tool for the design of molecular inversion probes to detect multiple pathogens. BMC Bioinformatics. 2006;7:500. pmid:17105657
  26. Thomsen MCF, Hasman H, Westh H, Kaya H, Lund O. RUCS: rapid identification of PCR primers for unique core sequences. Bioinformatics. 2017;33(24):3917–21. pmid:28968748
  27. Grbić M, Van Leeuwen T, Clark RM, Rombauts S, Rouzé P, Grbić V, et al. The genome of Tetranychus urticae reveals herbivorous pest adaptations. Nature. 2011;479(7374):487–92. pmid:22113690
  28. 28. Huo S-M, Yan Z-C, Zhang F, Chen L, Sun J-T, Hoffmann AA, et al. Comparative genome and transcriptome analyses reveal innate differences in response to host plants by two color forms of the two-spotted spider mite Tetranychus urticae. BMC Genomics. 2021;22(1):569. pmid:34301178
  29. 29. Matsuda T, Kozaki T, Ishii K, Gotoh T. Phylogeny of the spider mite sub-family Tetranychinae (Acari: Tetranychidae) inferred from RNA-Seq data. PLoS One. 2018;13(9):e0203136. pmid:30192794
  30. 30. Knapp M, Palevsky E, Rapisarda C. Insect and mite pests. In: Gullino ML, Albajes R, Nicot PC, editors. Integrated pest and disease management in greenhouse crops. Cham: Springer Cham; 2020. pp. 101–46.
  31. 31. Zhang ZQ. Mites of greenhouses: Identification, biology and control. Wallingford: CABI Publishing; 2003.
  32. 32. Ehara S. Revision of the spider mite family Tetranychidae of Japan (Acari, Prostigmata). Spec Div. 1999;4(1):63–141.
  33. 33. Sabelis MW. Life-history evolution of spider mites. In: Schuster R, Murphy PW, editors. The Acari: Reproduction, development and life-history strategies. Dordrecht: Springer Dordrecht; 1991. pp. 23–49.
  34. 34. Razuvaeva AV, Ulyanova EG, Skolotneva ES, Andreeva IV. Species identification of spider mites (Tetranychidae: Tetranychinae): a review of methods. Vavilovskii Zhurnal Genet Selektsii. 2023;27(3):240–9. pmid:37293445
  35. 35. Navajas M, Gutierrez J, Lagnel J, Boursot P. Mitochondrial cytochrome oxidase I in tetranychid mites: a comparison between molecular phylogeny and changes of morphological and life history traits. Bull Entomol Res. 1996;86(4):407–17.
  36. 36. Ben-David T, Melamed S, Gerson U, Morin S. ITS2 sequences as barcodes for identifying and analyzing spider mites (Acari: Tetranychidae). Exp Appl Acarol. 2007;41(3):169–81. pmid:17347920
  37. 37. Ros VID, Breeuwer JAJ. Spider mite (Acari: Tetranychidae) mitochondrial COI phylogeny reviewed: host plant relationships, phylogeography, reproductive parasites and barcoding. Exp Appl Acarol. 2007;42(4):239–62. pmid:17712605
  38. 38. Hinomoto N, Tran DP, Pham AT, Ngoc Le TB, Tajima R, Ohashi K, et al. Identification of spider mites (Acari: Tetranychidae) by DNA sequences: A case study in northern Vietnam. Int J Acarol. 2007;33(1):53–60.
  39. 39. Matsuda T, Hinomoto N, Singh RN, Gotoh T. Molecular-based identification and phylogeny of Oligonychus species (Acari: Tetranychidae). J Econ Entomol. 2012;105(3):1043–50. pmid:22812146
  40. 40. Matsuda T, Fukumoto C, Hinomoto N, Gotoh T. DNA-based identification of spider mites: molecular evidence for cryptic species of the genus Tetranychus (Acari: Tetranychidae). J Econ Entomol. 2013;106(1):463–72. pmid:23448063
  41. 41. Osakabe MH, Kotsubo Y, Tajima R, Hinomoto N. Restriction fragment length polymorphism catalog for molecular identification of Japanese Tetranychus spider mites (Acari: Tetranychidae). J Econ Entomol. 2008;101(4):1167–75.  pmid:18767725
  42. 42. Arimoto M, Satoh M, Uesugi R, Osakabe M. PCR-RFLP analysis for identification of Tetranychus spider mite species (Acari: Tetranychidae). J Econ Entomol. 2013;106(2):661–8. pmid:23786052
  43. 43. Zélé F, Weill M, Magalhães S. Identification of spider-mite species and their endosymbionts using multiplex PCR. Exp Appl Acarol. 2018;74(2):123–38. pmid:29435771
  44. 44. Li D, Fan Q-H, Waite DW, Gunawardana D, George S, Kumarasinghe L. Development and validation of a real-time PCR assay for rapid detection of two-spotted spider mite, Tetranychus urticae (Acari: Tetranychidae). PLoS One. 2015;10(7):e0131887. pmid:26147599
  45. 45. Matsuda T, Morishita M, Hinomoto N, Gotoh T. Phylogenetic analysis of the spider mite sub-family Tetranychinae (Acari: Tetranychidae) based on the mitochondrial COI gene and the 18S and the 5’ end of the 28S rRNA genes indicates that several genera are polyphyletic. PLoS One. 2014;9(10):e108672. pmid:25289639
  46. 46. Gordon A. fastx_toolkit. GitHub. 2014. [cited 2015 Apr 2. ]. Available from: https://github.com/agordon/fastx_toolkit
  47. 47. Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, et al. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol. 2015;16(1):30. pmid:25723335
  48. 48. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics. 2012;28(23):3150–2. pmid:23060610
  49. 49. Haas BJ. TransDecoder. GitHub [Internet]. 2018 [cited 22 Apr 2022. ]. Available from: https://github.com/TransDecoder/TransDecoder
  50. 50. Suzuki H. FATE. GitHub [Internet]. 2020 [cited 11 Apr 2022. ]. Available from: https://github.com/Hikoyu/FATE
  51. 51. Subramanian AR, Kaufmann M, Morgenstern B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol Biol. 2008;3:6. pmid:18505568
  52. 52. Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. 2019;35(21):4453–5. pmid:31070718
  53. 53. Marçais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764–70. pmid:21217122
  54. 54. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9. pmid:22388286
  55. 55. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402. pmid:9254694
  56. 56. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80. pmid:23329690
  57. 57. Bajda S, Dermauw W, Greenhalgh R, Nauen R, Tirry L, Clark RM, et al. Transcriptome profiling of a spirodiclofen susceptible and resistant strain of the European red mite Panonychus ulmi using strand-specific RNA-seq. BMC Genomics. 2015;16:974. pmid:26581334
  58. 58. Bonham-Carter O, Steele J, Bastola D. Alignment-free genetic sequence comparisons: a review of recent approaches by word analysis. Brief Bioinform. 2014;15(6):890–905. pmid:23904502
  59. 59. Fan H, Ives AR, Surget-Groba Y, Cannon CH. An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data. BMC Genomics. 2015;16(1):522. pmid:26169061
  60. 60. Bernard G, Ragan MA, Chan CX. Recapitulating phylogenies using k-mers: from trees to networks. F1000Res. 2016;5:2789. pmid:28105314
  61. 61. Dieffenbach CW, Lowe TM, Dveksler GS. General concepts for PCR primer design. PCR Methods Appl. 1993;3(3):S30–7. pmid:8118394