Evaluation of the ribosomal DNA internal transcribed spacer (ITS), specifically ITS1 and ITS2, for the analysis of fungal diversity by deep sequencing

The nuclear ribosomal DNA internal transcribed spacer (ITS) has been widely used to assess the fungal composition in different environments by deep sequencing. To evaluate the ITS in the analysis of fungal diversity, comparisons of the clustering and taxonomy generated by sequencing with different portions of the whole fragment were conducted in this study. For a total of 83,120 full-length ITS sequences obtained from the UNITE database, it was found that, on average, ITS1 varied more than ITS2 within the kingdom Fungi; this variation included length and GC content variations and polymorphisms, with some polymorphisms specific to particular fungal groups. The taxonomic accuracy for ITS was higher than that for ITS1 or ITS2. The commonly used operational taxonomic unit (OTU) for evaluating fungal diversity and richness assigned several species to a single OTU even with clustering at 99.00% sequence similarity. The clustering and taxonomic capacities did not differ between ITS1 and ITS2. However, the OTU commonality between ITS1 and ITS2 was very low. To test this observation further, 219,741 pyrosequencing reads, including 39,840 full-length ITS sequences, were obtained from 10 soil samples and were clustered into OTUs. The pyrosequencing results agreed with the results of the in silico analysis. ITS1 might overestimate the fungal diversity and richness. Analyses using ITS, ITS1 and ITS2 yielded several different taxa, and the taxonomic preferences for ITS and ITS2 were similar. The results demonstrated that ITS2 alone might be a more suitable marker for revealing the operational taxonomic richness and taxonomy specifics of fungal communities when the full-length ITS is not available.

The nuclear ribosomal DNA internal transcribed spacer (ITS) has been widely used to assess the fungal composition in different environments by deep sequencing. To evaluate the ITS in the analysis of fungal diversity, comparisons of the clustering and taxonomy generated by sequencing with different portions of the whole fragment were conducted in this study. For a total of 83,120 full-length ITS sequences obtained from the UNITE database, it was found that, on average, ITS1 varied more than ITS2 within the kingdom Fungi; this variation included length and GC content variations and polymorphisms, with some polymorphisms specific to particular fungal groups. The taxonomic accuracy for ITS was higher than that for ITS1 or ITS2. The commonly used operational taxonomic unit (OTU) for evaluating fungal diversity and richness assigned several species to a single OTU even with clustering at 99.00% sequence similarity. The clustering and taxonomic capacities did not differ between ITS1 and ITS2. However, the OTU commonality between ITS1 and ITS2 was very low. To test this observation further, 219,741 pyrosequencing reads, including 39,840 fulllength ITS sequences, were obtained from 10 soil samples and were clustered into OTUs. The pyrosequencing results agreed with the results of the in silico analysis. ITS1 might overestimate the fungal diversity and richness. Analyses using ITS, ITS1 and ITS2 yielded several different taxa, and the taxonomic preferences for ITS and ITS2 were similar. The results demonstrated that ITS2 alone might be a more suitable marker for revealing the operational taxonomic richness and taxonomy specifics of fungal communities when the full-length ITS is not available. PLOS

Introduction
Deep sequencing technologies and DNA barcoding are being increasingly applied to catalog and classify biodiversity. With the advent of high-throughput sequencing techniques, also known as next-generation sequencing, it has become feasible to study fungal diversity for both recovery of huge numbers of sequences from different environmental samples and in-depth analyses of fungal diversity at the same time. The nuclear ribosomal DNA (nrDNA) internal transcribed spacer (ITS) has been widely used in both molecular systematics and ecological studies of fungi [1][2][3][4][5][6] and has been selected as the formal barcode marker for fungi [4,7]. Because of its multicopy nature [8], the ITS allows easy amplification from samples containing low DNA concentrations. Furthermore, thousands of ITS sequences of different species are readily available from various online databases including UNITE [9] and the International Nucleotide Sequence Database (GenBank, EMBL and DDBJ), providing a large reference collection for taxonomic classification [10]. However, because of the read length limitations of pyrosequencing or Illumina sequencing, only part of the ITS region is usually used, e.g., ITS1 or ITS2. These subregions have been successfully applied in the characterization of fungal communities in some complex ecosystems by the most widely used high-throughput sequencing (including pyrosequencing and Illumina platforms) and have revealed unexpectedly high fungal diversities [1,[11][12][13][14][15]. The Illumina platforms, HiSeq and MiSeq, have become more widely used for the analysis of fungal composition by increasing the length of reads and reducing costs. ITS1 and ITS2 are likely to be the prime targets for the evaluation of fungal diversity through deep sequencing [16].
There are some controversies in the selection of markers for sequencing. Comparisons between ITS1 and ITS2 for fungal profiles have been assessed in a number of studies. The result that ITS1 was more variable than ITS2 was almost consistent [17][18][19]. Within the ITS region, ITS1 evolved more rapidly and has a more variable length than ITS2 [20]. Some reports revealed that ITS1 and ITS2 yielded similar clustering and taxonomic results [17,18]. However, taxonomic resolution was not equal at different taxonomic levels in terms of taxon identification with ITS1 and ITS2 [17]. Bazzicalupo et al. [17] also found that ITS2 seemed more suitable for revealing the richness of operational taxonomic units (OTUs) in fungal communities. However, Mello et al. [21] and Wang et al. [22] reported that ITS1 was probably the best choice for the study of fungi or eukaryotic species. As pointed out by some researchers, e.g., Nilsson et al. [10], standardization of the selection of particular ITS subregions for sequencing requires further study. Some factors might result in the emergences of conflicts, including length, GC content, interspecific variations, and clustering and taxonomic preferences of the target genes. Some characteristics of ITS1 and ITS2 might be limitations of their use in the fungal community.
In previous studies [17-19, 21, 22], comparisons of ITS1 and ITS2 were conducted mainly based on ITS1 and ITS2 sequencing separately or extracting ITS1 and ITS2 from the ITS database. Not all the ITS1 and ITS2 were from the same ITS. In our study, ITS1 and ITS2, regardless of insilico or pyrosequencing datasets, were all extracted from the same ITS, which might be more suitable to evaluate different portions of the ITS for investigating fungal communities and to clarify the divergences between them. In addition, most of the studies were focused on the influence of alpha diversity and beta diversity. The composition of each cluster (contained sequences) was not considered. This study attempted to examine the similarity in clustering using different portions of ITS sequences from existing in silico databases and pyrosequencing data. We hypothesized that the capacity of placing sequences into OTUs was different between ITS1 and ITS2.

Materials and methods
In silico analysis Database building. The UNITE database containing ITS sequences [9] was downloaded from the web site http://unite.ut.ee/repository.php, which included high-quality sequences from GenBank, EMBL and DDBJ. Only sequences with the complete ITS region and no ambiguous bases were retained for the analysis. The downloaded data formed the Fungi_insili-coITS database and was split into different groups at the phylum (Ascomycota, Basidiomycota, Chytridiomycota, Glomeromycota, Zygomycota) and subphylum (Pezizomycotina, Taphrinomycotina, Saccharomycotina, Agaricomycotina, Pucciniomycotina, Ustilaginomycotina) levels, forming a set of insilicoITS databases, as listed in S1 Table. The complete ITS1 and ITS2 sequences were extracted from each sequence using the program ITSx to create separate insili-coITS1 and insilicoITS2 databases for each phylum and subphylum (S1 Table). The full length ITS, ITS1 and ITS2 with full latin binomials were used for taxonomic resolution. All the resulting databases are listed in S1 Table. Pyrosequencing analysis Soil sampling, DNA isolation, PCR amplification and pyrosequencing. Ten soil samples were collected from an experimental site (34˚28 0 05.60@ N, 99˚51 0 59.09@ E) dominated by alpine shrublands and meadows. Based on observations over the years by local inhabitants and survey teams regarding the presence or absence of stromata of Ophiocordyceps sinensis, the 10 samples were divided into 3 types: Os, O. sinensis present; NOs, O. sinensis absent; MP, mycelial pellicle with soil particles firmly wrapping the sclerotia of O. sinensis (covered by the larval skeleton). The soil cores were sampled from the top 20 cm using a stainless steel cylindrical drill with a diameter of 5 cm, and the samples were stored at -20˚C in a portable electrical freezer. Even when unplugged, the portable freezer could maintain -20˚C for up to 12 hours, such as on an airplane. After transport to the laboratory, the soil samples were passed through a 2-mm sieve to remove plant tissues, roots, rocks, etc., and were stored at -20˚C prior to further experiments.
The total genomic DNA was extracted using the Powersoil DNA Isolation Kit (MoBio, Carlsbad, CA, USA) following the manufacturer's instructions. The DNA concentration was quantified on a NanoDrop spectrophotometer (Thermo Scientific, Worcester, MA, USA). The fungal ITS region was amplified using the primers ITS5 (5'-GGAAGTAAAAGTCGTAACAAGG-3') and ITS4 (5'-ATCCTCCGCTTATTGATATGC-3') [23]. The 5' end of the primer ITS4 was tagged with a 6 bp barcode. The PCR mixtures were as follows: 4 μl of 5× FastPfu Buffer, 1 μl of each primer (5 μM), 2 μl of dNTP mixture (2.5 mM), 2 μl of template DNA and 10 μl of H 2 O. The thermocycling conditions consisted of an initial denaturation at 95˚C for 2 min, followed by 30 cycles at 95˚C for 30 sec, 55˚C for 30 sec, 72˚C for 30 sec and a final extraction at 72˚C for 5 min. Three separate reactions were conducted to account for potentially heterogeneous amplification from the environmental template for each sample. The PCR products were purified using the AXYGEN Gel Extraction Kit and were quantified using a NanoDrop (Wilmington, DE). The PCR concentrations ranged from 0.80 ng/μl to 7.6 ng/μl using a NanoDrop.An equimolar mix of all three amplicon libraries was used for pyrosequencing. Each of the 3 mixed PCR products was diluted to 0.80 ng/μl for pyrosequencing. Pyrosequencing was performed with the primer ITS4 from the 5' end of the entire ITS on a 454 Life Sciences GS FLX system (Roche Applied Biosystems, Nutley, NJ, USA) at Allwegene Company (Beijing).
The raw sequencing reads were initially trimmed using MOTHUR [24] and the sequences were removed according to the criterions: shorter than 60 bp, quality score � 30, contained ambiguous bases or did not exactly match to primer sequences and barcode tags. Full-length ITS, ITS1 and ITS2 sequences were extracted from the trimmed reads using the program ITSx [25] to form databases hereafter referred to as PyroITS, PyroITS1, and PyroITS2 (S1 and S3 Tables). All the sequences have been deposited in the NCBI Sequence Read Archive (SRA) under accession number SRP126914.
Length variability, clustering and taxonomy. The lengths of ITS, ITS1 and ITS2 (for both the insilico and Pyro databases) were calculated by Mothur. The sequences in each database were clustered into OTUs at the 91-99% similarity levels using UCLUST [26], and the clusters were saved as OTU files associated with the database. The Chao and Shannon fungal richness and diversity indices were calculated using Mothur. The taxonomy analysis was conducted using Blast against the UNITE database. The clustering tree was constructed from OTUs abundances in each sample using R. The significant differences between different groups were tested using ANOVA test.
The similarity of each OTU was assessed by checking the commonality of the sequences contained in the same OTU in the ITS, ITS1 and ITS2 databases of the same series. In order to find the same OTU in the three datasets, searching the representative sequence for one OTU in other two clustering results from other two databases was performed. The commonality analysis was conducted with the equation below: A: the similarity between the two databases N: the number of reads in database 1 n: the number of OTUs in database 1 m: the number of identical sequences in the same OTU in database 1 and database 2

Statistic tests
The significance of the differences between the identification success rates of ITS, ITS1 and ITS2 was tested using Fisher's exact test. The significances for the differences between sequence lengths and GC content of ITS1 and ITS2 were tested using Student's t-test. The commonality of OTUs at sequence similarities 91-99% was tested using comparing t-test. The different taxa presented in different groups (Os and Nos) were tested by Mann Whitney u test. Tests were carried out using R, and P � 0.05 was considered statistically significant.
Length variation of ITS1 and ITS2 in different groups. The length of the entire ITS ranged from 260 bp to 1,794 bp, with an average length of 517 bp (Fig 1, S1 Table). The ITS lengths in Basidiomycota, Chytridiomycota and Zygomycota were longer than those in Ascomycota and Glomeromycota (Fig 1, S1 Table). The ITS length was the shortest in subphyla Taphrinomycotina and Saccharomycotina (447 bp and 454 bp, respectively) (S1 Table).
ITS2 was longer than ITS1 in all the 5 phyla (p<0.001). The length of the extracted ITS1 portions ranged from 9 bp to 1181 bp, with an average length of 177 bp (S1 Table), and the length of the extracted ITS2 portions ranged from 14 bp to 730 bp, with an average length of 182 bp, among the fungi (S1 Table). At the subphylum level, ITS1 was longer than ITS2 in all subphyla of Ascomycota except for Taphrinomycotina (Fig 1B). Both ITS1 and ITS2 were shorter in Ascomycota than in the other fungal phyla (Basidiomycota, Chytridiomycota, Glomeromycota and Zygomycota, S1 Table). The size differential between ITS1 and ITS2 was greater in Glomeromycota than in Ascomycota, Basidiomycota, Zygomycota or Chytridiomycota ( Fig 1A).
The length of ITS1 had a broader range of variation (S1 and S2 Tables). The sequences with a length longer than 600 bp or shorter than 100 bp were fewer in all the ITS2 datasets than in the Fungi kingdom, Ascomycota, Basidiomycota, Zygomycota, Glomeromycota, Agaricomycotina, Pezizomycotina, Pucciniomycotina, Saccharomycotina and Ustilaginomycotina ITS1 datasets (S2 Table). In Chytridiomycota and Ustilaginomycotina, the percentage of ITS2 was higher than that of ITS1. The highest rates of ITS1 and ITS2 were 28.08% and 23.68%, respectively, in Saccharomycotina.
Fungal diversity and clustering commonality using different fragments. The fungal diversity and richness were evaluated by the Chao and Shannon indices. Clustering the different ITS portions from the Fungi_insilico databases Fungi_insilicoITS, Fungi_insilicoITS1 and Fungi_insilicoITS2 at 97% sequence similarity resulted in 16,554, 17,394, and 17,210 OTUs, respectively (S1 Table). Chao and Shannon indices were not significantly different between ITS, ITS1 and ITS2 (comparing t-test. P>0.05). However, the Chao and Shannon indices were higher in the Fungi_insilicoITS1 dataset (32,259 and 8.44, respectively) than in the Fungi_insi-licoITS (30,788 and 8.35) and Fungi_insilicoITS2 (31,479 and 8.41) datasets. The same patterns of these diversity indices were presented in phyla Ascomycota, Zygomycota and in subphyla Saccharomycotina, Pucciniomycotina and Ustilaginomycotina. In contrast, the diversity and richness indices were higher in the ITS2 dataset for phylum Basidiomycota, Glomeromycota and Chytridiomycota and for subphyla Taphrinomycotina and Agaricomycotina (S1 Table).
For the insilico databases, the results from the commonality analyses of the Fungi_insili-coITS1 and Fungi_insilicoITS databases were the same as those of the Fungi_insilicoITS2 and Fungi_insilicoITS databases (p>0.05, S2 Table). The similarity of Fungi_insilicoITS1 and Fun-gi_insilicoITS2 was 70-78% when clustered into OTUs at 97-98% sequence similarity (S3 Table). As seen in Fig 3, there were several species in one OTU. The average number of species in each OTU for ITS at 97% sequence similarity was 5.37, while the average number of species in each OTU for ITS1 and ITS2 were 4.89 and 5.05, respectively (Fig 3). Even at 99% sequence similarity, the three datasets (ITS, ITS1 and ITS2) had 3.71, 3.60 and 3.69 species, respectively, in one OTU (Fig 3). There were more different species in one OTU in the ITS dataset than in the ITS1 or ITS2 datasets.
Taxonomic resolution of ITS, ITS1 and ITS2. Comparisons between the different portions of the ITS in terms of the resolution at which the reads were placed into taxa were also conducted. When blasted against the UNITE database, 53.27% of the ITS sequences returned themselves (the same accession number), which was a much higher percentage than for ITS1 (35.55%) and ITS2 (35.73%) (Fig 4). Taxonomic annotation is one of the most crucial steps in the identification of the fungal community, and it is important to annotate each sequence correctly. The queries used for blasting hit sequences with different accession numbers might belong to the same species. The number of the full length ITS sequences with full latin binomials (at species level) was 41,049 (The detailed information listed in S5 Table). The ITS analysis placed 78.00% of the queries into the same taxonomic groups, compared with only 65.16% and 64.72% for the ITS1 and ITS2 analyses, respectively (Fig 4). However, the resolution between ITS1 and ITS2 was not different in the placement of the reads into taxa. In addition, the same results were obtained at the phylum and subphylum levels.

Pyrosequencing analysis
Pyrosequencing. A total of 219,741 reads were recovered from pyrosequencing, with various lengths ranging from 10-747 bp. Among the reads, 34,398 were full-length ITS sequences.
Length variation and GC content of ITS1 and ITS2. For the 454 pyrosequencing data, the length of the extracted ITS1 fragments ranged from 58 bp to 278 bp, with an average length of 157 bp, and the length of the extracted ITS2 fragments ranged from 104 bp to 292 bp, with an average length of 158 bp. The length of ITS1 varied more than that of ITS2 in these reads. The GC content of the Pyro_ITS2 sequences was significantly higher (55.37%) than that of the ITS1 and ITS (50.35%) sequences.
Fungal diversity and richness in different samples. A total of 1377, 2895 and 1121 OTUs were generated from the PyroITS, PyroITS1, PyroITS2 databases, respectively, at 97% sequence similarity by UCLUST. The richness calculated based on the PyroITS1 database was much higher than that based on the PyroITS and PyroITS2 databases. The Chao indices were 347, 725 and 260 for the PyroITS, PyroITS1, and PyroITS2 databases, respectively (Fig 5). In addition, the Shannon diversity index was different between the PyroITS1 and PyroITS2 databases. The results revealed that the fungal diversity and richness might be overestimated when using ITS1 as the sequencing target.
There were no differences between the Nos and Os type soils in terms of richness and diversity, and the richness and diversity of both of these were much higher than those of the MP type soils (Fig 6). The relationships between different samples in terms of diversity and richness were not influenced by the sequencing genes.
Cluster commonality using different fragments. For the pyrosequencing data, the assessment of the cluster similarity between the different databases (PyroITS, PyroITS1 and PyroITS2 databases, the sequences from the 10 samples pooled together) showed that the OTU commonality between PyroITS and PyroITS2 was higher (p<0.05), reaching 55.1% at 97% similarity, than that between PyroITS and PyroITS1 (26.8% at 97% similarity) (S6 Table) at 97% similarity. Furthermore, the commonality between the PyroITS1 and PyroITS2 databases at 97% similarity was only 28.2% (S7 Table).
A hierarchical clustering analysis was performed based on the OTU compositions of different samples. All three results revealed that the fungal beta-diversity in the different soils (Os, Nos and MP) was not different when analyzed with ANOVA. However, the details of the Evaluation of ITS1 and ITS2 for fungal diversity relationships between the samples were different. As shown in Figs 7 and 3 clusters were formed in PyroITS and PyroITS2. The differences mainly from one sample Os4. Os4 distributed in cluster II in PyroITS and OS4 in cluster I in PyroITS2. However, only 2 clusters formed in PyroITS1. Only 2 groups were clustered for ITS1 database: Os2, Os4, Os5 and Os6 clustered together; and Os1, Os3, Nos1, Nos3 and MP clustered together in the other branch (Fig 7). The clustering for ITS2 was much more similar to that for ITS than that for ITS1 (Fig 7).
The total fungal composition in the PyroITS, PyroITS1 and PyroITS databases. For ITS, ITS1 and ITS2, the percentage of the reads in the PyroITS database assigned to named taxa was higher (ranging from 99.28% at the phylum level to 59.21% at the genus level) than that in the PyroITS1 database (ranging from 92.50% at the phylum level to 59.54% at the genus level) or the PyroITS2 database (ranging from 92.90% at phylum level to 57.69% at genus level) ( Table 1). At the phylum level, the reads belonging to Ascomycota and Basidiomycota accounted for more than 90% of the total sequences (S1 Fig). Dothideomycetes, Sordariomycetes, Leotiomycetes and Eurotiomycetes were the dominant classes in all the samples; in total, these classes represented 81.73%, 76.80% and 77.01% of the reads in the PyroITS, PyroITS1 and PyroITS2 databases, respectively (S1 Fig). Evaluation of ITS1 and ITS2 for fungal diversity Several different taxa were obtained when different target genes were used (ITS, ITS1 and ITS2), and the taxonomic preferences in the PyroITS and PyroITS2 databases were similar. After blasting against the UNITE database, 2, 3, 6, 12 and 34 different taxa were represented among the 3 databases at the phylum, class, order, family and genus levels, respectively (S7 Table). At the phylum level, more Chytridiomycota were targeted by ITS than by ITS1 or ITS2 (S7 Table). The percentages of Tremellomycetes in the PyroITS and PyroITS2 databases were 3.61% and 3.49%, respectively; these percentages were much higher than those in the PyroITS1 database (1.42%) at the class level. At the genus level, the percentages of Peyronellaea and Microscypha in the PyroITS database were 5.72% and 1.59%, respectively; these sequences were absent in the PyroITS1 and PyroITS2 databases. Nectria, Dioszegia, Dactylonectria, Cladosporium and Holtermanniella were nearly parallel in the PyroITS and PyroITS2 databases. Evaluation of ITS1 and ITS2 for fungal diversity The PyroITS1 database was more biased toward Paraphoma and Xenodidymella than were the PyroITS and PyroITS2 databases (S8 Table).
The fungal composition in different types of soil samples. After statistical analysis (mann whitney u test), the following taxa were found to be associated with Nos samples: in the PyroITS database, Rozellomycota_cls_Incertae_sedis at the class level, GS07 at the order level, and Comoclathris and Tumularia at the genus level (Table 2); in the PyroITS2 database, Pleomassariaceae at the family level; and in the PyroITS1 database, Geomyces, Clavariopsis and Ophiosphaerella at the genus level (Table 2). Auriculariales was possibly correlated with MP samples at the order level in the PyroITS2 database ( Table 2). The taxa associated with Os samples were Comoclathris in the PyroITS and PyroITS2 databases and Geomyces in the PyroITS1 database (Table 2).

Discussion
In this study, comparisons of clustering and taxonomy between different portions of the ITS were conducted based on online database (in silico) analyses and pyrosequencing reads (pyro). The results revealed that the clustering and taxonomy for ITS2 were more similar to those for ITS than to those for ITS1. The shorter length, lower GC content variation and greater taxonomic information content of ITS2 might make it more suitable than ITS1 for deep sequencing studies on fungal communities.  Some reports have shown that ITS1 and ITS2 generated similar patterns of community structure when used as DNA metabarcodes [17][18][19]. However, the commonality between ITS1 and ITS2 was very low in this study, especially in the pyrosequencing analyses; the similarity between the OTUs generated from the clustering of ITS1 and ITS2 at 97% similarity was only 28.20%. A possible explanation for the low similarity between ITS1 and ITS2 and the discrepancy from other studies [17][18][19]21] is that the sequence compositions of each OTU were not considered, and ITS1 and ITS2 were sequenced separately. Another finding was that several species were present in one OTU [18]. In accordance with some fungal analysis methods, the representive sequences were used for taxonomic blasting. OTUs could be used to evaluate the fungal diversity in environmental samples. But, caution must be exercised when unveiling fungal community composition at the species level using ITS1 or ITS2.
In past years, there has not been consistent agreement about the selection of ITS1 or ITS2 in studies of fungal diversity. Nilsson et al. [27], Ihrmark et al. [28] and Alanagre et al. [29] stated that ITS2 was the better choice for 454 pyrosequencing or sequencing with Illumina platforms. However, Wang et al. [22] showed that ITS1 might be a better taxonomic DNA barcode than ITS2 in eukaryotes, based on in silico analyses. There are two possible explanations for this contradiction. First, the sequences used in Wang et al.'s study belonged to all eukaryotes, including fungi, plants and animals. In the present study, only fungal sequences were studied in detail, including Ascomycota, Basidiomycota, Chytridiomycota, Glomeromycota, Zygomycota and so on. Second, Wang et al. [22] mainly focused on species identification based on the annotations of the sequences stored in the NCBI database. In this study, not only were species identifications validated, but OTU clustering based on the ITS, ITS1 and ITS2 sequences was also considered; this method negated the effect of erroneous full Latin binomials.
In the present study, the intraspecific and interspecific variations of ITS1 were much higher than those of ITS and ITS2, in agreement with the results of previous studies [17,18,21,27]. The clustering results obtained with ITS2 at 97% similarity might be the same as those obtained with ITS1 at 98 or 99% similarity. As seen in the present study, the variation in length was greater for ITS1 than for ITS2 (Table 1, Fig 1) and more ITS1 sequences were shorter than 100 bp or longer than 600 bp; for example, most Glomeromycota ITS1 sequences were shorter than 100 bp. The length of ITS1 was more variable, likely due to the intron frequency [10,20,27,30]. The length variation and intron frequency might lead to unintended or inaccurate clustering or taxonomic placement [31]. In many studies, reads with a length of 100-150 bp are removed [18,19]. These short reads that are filtered out might contain important taxonomic information. In addition, Illumina PE300 platforms can only cover sequences with a maximum length of 600 bp, longer sequences might not be overlapped in downstream analyses. More ITS2 sequences could pass through the filtering processing step.
It is usually difficult to amplify PCR products from templates with a high GC content compared to non-GC-rich templates [32]. The lowest GC content in the genomes of some species has been reported to be close to 20% [33,34]. The GC content cutoffs were set at 20% and 80%. Although the GC content of ITS2 was slightly higher than that of ITS1, fewer sequences were filtered out with this criterion (<20% and >80%). ITS2 might have a positive effect on PCR and sequencing efficiencies.
In addition, some potential amplification biases might introduce by various commonly utilized ITS primers during amplification. An in silico study to evaluate PCR biases by different primers revealed that some of the ITS primers had a high proportion of mismatches relative to the target sequences (ITS1 or ITS2) and introduce taxonomic biases during PCR, e.g. the primers ITS1-F, ITS1 and ITS5 biased towards amplification of Basidiomyceta, whereas others, the primers ITS2, ITS3 and ITS4 biased towards Ascomyceta [35]. However, a new primer pair covered ITS2 region was designed in 2016, 5.8S-Fun and ITS4-Fun. Both of the primers had high coverage (nearly 100%) for Fungi but lower coverage for some other eukaryote [36]. The suitable primers made ITS2 to be a more accepted regions to study environmental samples.
This study highlights the issue that the clustering of ITS1 and ITS2 in different taxa is variable and might generate different results when the sequences of ITS subregions are used as DNA metabarcodes for deep sequencing studies on fungi. Careful attention must be devoted to the selection of sequencing markers and taxonomic processing. Classifications at the species levels are not recommended. ITS2 might be the most suitable marker for fungal diversity because of its shorter length, lower GC variation, greater abundance of references in public databases, broader selection of lineage-specific primers and longer portion of its length that can provide taxonomic information.