Sequencer-Based Capillary Gel Electrophoresis (SCGE) Targeting the rDNA Internal Transcribed Spacer (ITS) Regions for Accurate Identification of Clinically Important Yeast Species

Accurate species identification of Candida, Cryptococcus, Trichosporon and other yeast pathogens is important for clinical management. In the present study, we developed and evaluated a yeast species identification scheme by determining the rDNA internal transcribed spacer (ITS) region length types (LTs) using a sequencer-based capillary gel electrophoresis (SCGE) approach. A total of 156 yeast isolates encompassing 32 species were first used to establish a reference SCGE ITS LT database. Evaluation of the ITS LT database was then performed on (i) a separate set of clinical isolates (n = 97) by SCGE, and (ii) 41 isolates of 41 additional yeast species from GenBank by in silico analysis. Among the 156 isolates used to build the reference database, 41 ITS LTs were identified, which correctly identified 29 of the 32 (90.6%) species, the exceptions being Trichosporon asahii, Trichosporon japonicum and Trichosporon asteroides. In addition, eight of the 32 species revealed different electropherograms and were subtyped into 2–3 different ITS LTs each. Of the 97 test isolates used to evaluate the ITS LT scheme, 96 (99.0%) were correctly identified to species level, with the remaining isolate having a novel ITS LT. Of the additional 41 isolates for in silico analysis, none was misidentified by the ITS LT database except for Trichosporon mucoides, whose ITS LT profile was identical to that of Trichosporon dermatis. In conclusion, the present SCGE ITS LT assay is a fast, reproducible and accurate alternative for the identification of clinically important yeasts, with the exception of Trichosporon species.


Introduction
Fungi are a major cause of human disease, especially in patients with immune compromise and serious underlying disease [1]. Candida species are the leading cause of fungemia, which in the USA is ranked as the third most common cause of bloodstream infection in ICU patients, with a crude mortality of 47.1% [2]. However, serious infections due to non-Candida yeasts, including Cryptococcus, Trichosporon, Rhodotorula and other previously rare species, are also increasing [3][4][5][6][7]. Accurate and timely species identification is important for clinical management of patients because these yeast species have different antifungal susceptibilities and an increasing number, e.g. Candida glabrata, Candida krusei and Trichosporon species, are resistant or less susceptible to many antifungal agents [8,9].
The limitations of conventional phenotypic methods for yeast identification are well known. Currently, sequencing of the rDNA gene complex, targeted at the internal transcribed spacer (ITS) regions, is considered the "gold-standard" method for identifying yeast species [10][11][12][13]. However, this approach requires time for the sequencing process, is expensive, and depends on public sequence repositories that are not curated and contain errors [14]. Other molecular identification methods include reverse line blot (RLB) assays, rolling circle amplification (RCA) [9,15,16] and pyrosequencing [17,18]. Proteomic approaches such as matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS) also lend themselves to use as identification tools [19]. As a simple and less expensive alternative to ITS sequencing, another molecular identification method, sequencer-based capillary gel electrophoresis (SCGE), has been reported to have good accuracy, reproducibility and interlaboratory consistency, whilst retaining flexibility [20,21].
In the present study, we applied SCGE to develop an identification scheme for the major pathogenic yeast species. We constructed an in-house database of rDNA ITS length types (LTs) by SCGE, covering a large number of isolates (n = 156) and a broad range of yeast species (n = 32). We then evaluated the ability of this LT library to identify a separate and unrelated set of 97 clinical yeast isolates (nine species). To broaden the scope of the evaluation, and to examine potential "cross-identifications" or species misidentifications, we also performed in silico analysis of the ITS1 and ITS sequences of 41 additional yeast species against the in-house database. The results of all SCGE-based identifications were compared against the definitive identification provided by a combined phenotypic and molecular approach established in our laboratory [9,13,22].

Ethics
The study was approved by the Human Research Ethics Committee of Peking Union Medical College Hospital (No. S-263). Written informed consent was obtained from patients for the use of the samples in research.

Yeast isolates
(i) Database build set: A total of 156 isolates were used to construct the ITS LT database. These comprised five reference strains (Candida parapsilosis sensu stricto ATCC22019, C. krusei ATCC6258, Candida guilliermondii ATCC6260, Candida albicans ATCC90028 and Trichosporon asahii CBS2479) and 151 clinical isolates. The clinical isolates were part of the culture collection of the National China Hospital Invasive Fungal Surveillance Net (CHIF-NET) 2010 and 2011 [10,13,23] and encompassed 32 yeast species: Candida (21 species), Cryptococcus (four species), Trichosporon (four species) and other yeasts (three species) (see Table 1). All isolates were identified by MALDI-TOF MS (Vitek MS, bioMérieux, Marcy l'Etoile, France, database version IVD 2.0) according to the manufacturer's instructions. Any isolates with unsatisfactory identification confidence values (<99.9%), or with "no identification" results, were further identified by sequencing of the ITS region as described by Zhang et al. [13]. For non-Trichosporon isolates, this identification was taken as the definitive identification result. Definitive identification of Trichosporon species was provided by sequencing of the intergenic spacer 1 (IGS1) region [9].
(ii) Clinical evaluation set: A "test" set of 97 clinical isolates, cultured from patients with invasive fungal diseases (IFDs) as part of routine care at the Peking Union Medical College Hospital (PUMCH) in 2013 (Table 2), was studied. Their identification was challenged against the in-house SCGE ITS LT database. None of the isolates used in this evaluation were employed to establish the database.
(iii) GenBank in silico analysis "test" set: In addition, 41 ITS sequences of 41 additional yeast species that were not involved in development of the ITS LT database were studied (Table 3). Their sequences were downloaded from the GenBank database for in silico analysis against the SCGE LT database to determine potential misidentifications.

Establishment of the ITS SCGE identification LT database
PCR fragment analysis was performed as previously described [20]. In general, the ABI 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA) was used, employing a 96-capillary 50-cm POP-7 gel. The sample was added into the internal target (96-well plate), followed by vibration (500-1000 rpm for 1 min) and blending, denatured at 95°C for 5 min, rapidly cooled on ice, and then centrifuged at 3700 rpm for 1 min. The supernatant was injected at 1.6 kV over 15 s, with a total running time of 6,200 s at 15-kV run voltage. A 20- to 1,200-bp LIZ 1200 ladder (Chimerx, Madison, WI) was used as an internal marker for each sample.
The size of each peak was determined using GeneMarker software (Version 2.2.0, SoftGenetics, PA, USA). Double peaks were counted only if they were separated by more than 1 bp; otherwise only the highest peak was counted (Fig 1B). Between different strains, amplicons whose sizes differed by no more than ±0.5 bp were considered to have the same length. LTs were defined by the combination of SCGE ITS1 and ITS amplicon lengths. Thereafter, the LTs and their matching "species identification" were recorded into an electronic database. The nomenclature of the ITS LTs was as follows: three lowercase Roman letters representing the species, then a dash followed by an Arabic numeral representing potential length subtypes within each species. For example, "cor-2" represented species Candida orthopsilosis, ITS LT subtype 2 in the database. One to four isolates of each LT were further selected to determine the physical lengths of the ITS1 and full-length ITS regions by DNA sequencing, and to compare these with the corresponding SCGE-derived lengths. The full-length ITS regions were amplified by primer pair ITS1/ITS4 as previously described [10] and sequenced from both directions. The physical lengths of the ITS1 and ITS regions (including primers) were analyzed by CLC Sequencing Viewer 7.5 (QIAGEN, Dusseldorf, Germany). The full-length ITS sequences were then used for phylogenetic analysis by the maximum-likelihood algorithm, with 1000 bootstrap replications to ensure robustness, using MEGA software (version 6.0, MEGA Inc., Englewood, NJ).
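The peak-calling rules above can be illustrated with a minimal Python sketch. It is not the authors' software (sizing was done in GeneMarker); the function names, the use of peak height to pick the "highest" peak, and the example peak values are all assumptions made for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (size in bp, peak height); heights are hypothetical


def collapse_double_peaks(peaks: List[Peak], min_sep_bp: float = 1.0) -> List[Peak]:
    """Count two peaks separately only if they are more than min_sep_bp apart;
    otherwise keep only the higher of the two."""
    kept: List[Peak] = []
    for size, height in sorted(peaks):
        if kept and size - kept[-1][0] <= min_sep_bp:
            # Too close to the previously kept peak: retain the higher one.
            if height > kept[-1][1]:
                kept[-1] = (size, height)
        else:
            kept.append((size, height))
    return kept


def same_length(a_bp: float, b_bp: float, tol_bp: float = 0.5) -> bool:
    """Treat two amplicon sizes as the same length if they differ by at most tol_bp."""
    return abs(a_bp - b_bp) <= tol_bp


def lt_name(species_code: str, subtype: int) -> str:
    """Build an LT label: three lowercase letters, a dash, and a subtype number."""
    return f"{species_code}-{subtype}"


# Hypothetical electropherogram: a shoulder peak 0.6 bp from the main ITS1 peak.
example = [(221.3, 5200.0), (221.9, 800.0), (533.0, 4100.0)]
called = collapse_double_peaks(example)
```

In this sketch a run of closely spaced peaks collapses into a single call, which is one possible reading of the rule; a stricter pairwise interpretation would need a different loop.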

SCGE reproducibility
SCGE was performed in triplicate on three separate occasions by three different technicians using fresh yeast subcultures for the five reference strains, and four isolates of rare Candida or non-Candida yeast species (Candida rugosa, Trichosporon japonicum, Trichosporon asteroides and Trichosporon dermatis).

Clinical and in silico evaluation of the SCGE ITS LT database
Ninety-seven test isolates (seven Candida species and two Cryptococcus species) were used to evaluate the performance of the SCGE ITS LT database for yeast identification (Table 2). When a particular strain's ITS LT exactly (with <0.5 bp difference) matched a recorded reference ITS LT in the database, it was designated as the corresponding species (Fig 1).
Because the ITS LT database did not cover all known species of yeasts, the ITS1 and ITS sequences for 41 isolates of 41 additional yeast species, which were not included in the ITS LT database, were obtained from GenBank (Table 3). The ITS1 and ITS lengths of these isolates were calculated in silico and queried against the ITS LT database to examine potential misidentifications.
Fig 1A, structure of the yeast ITS region and primers used in this study (shown as arrows with dashed lines). The duplex PCR using primers ITS1-FAM, ITS2 and ITS4 will theoretically generate two amplicons (for the ITS1 and full-length ITS regions, respectively). As the forward primer ITS1 was 5'-end FAM labeled, the amplicons can be observed by SCGE examination (peaks "a" and "b" in Fig 1B). Fig 1B, an example of interpreting the SCGE results (strain 12HX414). Peaks "a" and "b" represent amplicons of the ITS1 and full-length ITS regions, respectively. As peaks "a" and "a'" were separated by less than 1 bp, only peak "a" was counted (see Methods section).
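The matching rule can be sketched as a simple lookup. This is an illustrative Python sketch, not the authors' database: the LT names and lengths below are hypothetical placeholders, and the function name `match_lt` is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical reference entries: LT name -> (ITS1 length, full ITS length) in bp.
REFERENCE_LTS: Dict[str, Tuple[float, float]] = {
    "aaa-1": (210.0, 500.0),
    "aaa-2": (212.1, 502.0),
    "bbb-1": (210.0, 533.0),
}


def match_lt(its1_bp: float, its_bp: float,
             db: Dict[str, Tuple[float, float]],
             tol_bp: float = 0.5) -> List[str]:
    """Return every reference LT whose ITS1 and ITS lengths both differ
    from the query by less than tol_bp. An empty list means no match,
    i.e. a candidate novel LT that needs confirmation by another method."""
    return sorted(
        name for name, (ref_its1, ref_its) in db.items()
        if abs(its1_bp - ref_its1) < tol_bp and abs(its_bp - ref_its) < tol_bp
    )


hit = match_lt(210.1, 500.2, REFERENCE_LTS)
no_hit = match_lt(300.0, 600.0, REFERENCE_LTS)
```

Returning a list rather than a single name keeps ambiguous cases visible: if two species share both lengths, the lookup reports both instead of silently picking one.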

SCGE analysis of ITS1 and ITS amplicons
Duplex amplification of the ITS1 and ITS regions yielded positive PCR products in all 156 isolates used for ITS LT database development. The sizes of the ITS1 and ITS regions called by SCGE are shown in Table 1 and Fig 1A. A total of 35 and 37 different amplicon lengths were obtained for the ITS1 and ITS regions, respectively. The average amplicon lengths ranged from 135.5 bp to 481.3 bp for the ITS1 region and from 356.3 bp to 881.3 bp for the ITS region, with the standard deviation for different lengths ranging from 0 to 0.3 bp (Table 1). Of note, different physical lengths of the ITS1 and ITS regions obtained by DNA sequencing (Fig 1A) were clearly separated by SCGE, and amplicons with the same SCGE lengths had unique physical lengths (Table 1).
The amplicon sizes have been rounded up to the nearest whole number.
Fig 2A, the whole database. Fig 2B, LTs of different species that had the same-length ITS1 or ITS region amplicons. In Fig 2A and 2B, gel-bands in blue or red represent amplicons of the ITS1 region, while gel-bands in purple or pink represent amplicons of the ITS region. Gel-bands in red or pink indicate that the lengths of the ITS1 or ITS amplicons were shared by different species.
(Table 1). Meanwhile, the full-length ITS region was identical between all Cryptococcus curvatus and all T. dermatis isolates (526.2 bp vs. 526.4±0.1 bp, Fig 2B Group IV), Cryptococcus laurentii and the ITS LT cal-1 C. albicans isolates (533.0 bp vs. 533.0±0.2 bp, Fig 2B Group V), and T. asahii, T. japonicum and T. asteroides isolates (538.2±0.1 bp to 538.4±0.1 bp, Fig 2B Group VI) (Table 1).

Buildup of the SCGE ITS LT database
Based on the combination of ITS1 and ITS region lengths called by SCGE, 41 ITS LTs were identified. These were able to identify 94.9% (148/156) of the isolates and 90.6% (29/32) of the species used for SCGE ITS LT database development, the only exceptions being isolates of T. asahii, T. japonicum and T. asteroides (Fig 2B Group VI, Table 1). Of note, the ITS LTs were able to distinguish genetically closely related species within species complexes, e.g. C. parapsilosis sensu stricto, C. metapsilosis, C. orthopsilosis and L. elongisporus within the C. parapsilosis species complex (Fig 3), Candida glabrata sensu stricto and Candida nivariensis within the C. glabrata species complex, Candida haemulonii and C. duobushaemulonii within the C. haemulonii species complex, and C. neoformans and C. gattii within the C. neoformans species complex.
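Why the combination of both amplicon lengths matters can be shown with a small Python sketch that lists the species pairs left indistinguishable when only ITS1, only ITS, or both lengths are compared. The three entries are hypothetical placeholders (not values from Table 1), and the function name and the ±0.5 bp tolerance handling are assumptions.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Entry = Tuple[str, float, float]  # (species, ITS1 length, full ITS length) in bp

# Hypothetical toy database: X and Y share ITS1, Y and Z share ITS.
TOY_DB: Dict[str, Entry] = {
    "spx-1": ("species X", 200.0, 500.0),
    "spy-1": ("species Y", 200.2, 540.0),
    "spz-1": ("species Z", 250.0, 540.2),
}


def confusable_pairs(db: Dict[str, Entry], use_its1: bool, use_its: bool,
                     tol_bp: float = 0.5) -> Set[Tuple[str, ...]]:
    """Return pairs of different species whose lengths agree within tol_bp
    on every target that is switched on."""
    pairs: Set[Tuple[str, ...]] = set()
    for a, b in combinations(db.values(), 2):
        if a[0] == b[0]:
            continue  # subtypes of one species are not a misidentification risk
        its1_same = abs(a[1] - b[1]) <= tol_bp
        its_same = abs(a[2] - b[2]) <= tol_bp
        if (its1_same or not use_its1) and (its_same or not use_its):
            pairs.add(tuple(sorted((a[0], b[0]))))
    return pairs


its1_only = confusable_pairs(TOY_DB, use_its1=True, use_its=False)
its_only = confusable_pairs(TOY_DB, use_its1=False, use_its=True)
combined = confusable_pairs(TOY_DB, use_its1=True, use_its=True)
```

In the toy data each single target leaves one confusable pair, while the combined length type leaves none. Species that share both lengths, as reported here for T. asahii, T. japonicum and T. asteroides, would still appear in the combined result.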

Reproducibility
Reproducibility of the SCGE technique was assessed by repeating the method on nine isolates (five reference strains and four clinical isolates) on three different occasions, and results are summarized in Table 1. For the same amplicon of each strain, the range of measured differences between different repeats was 0.0-0.1 bp.

Clinical evaluation of the SCGE ITS LT database
The identification results by SCGE vs. the reference method (MALDI-TOF MS supplemented by DNA sequencing, see Methods) for the test set of 97 isolates are shown in Table 2. Using the ITS LT database, SCGE correctly identified 99.0% (96/97) of isolates to species level, with 13 ITS LTs identified amongst the nine species studied. Only one isolate could not be identified: a putative "C. tropicalis" strain whose ITS LT did not match either of the two established C. tropicalis ITS LTs (i.e. ctr-1 and ctr-2, Table 1) in the database. The ITS region sequence of the isolate was 99.6% identical to that of C. tropicalis type strain ATCC 750T (GenBank accession no. KJ651200), and phylogenetic analysis based on full-length ITS sequences clustered the isolate with ctr-1 and ctr-2 (Fig 4). In addition, the isolate was also identified as "C. tropicalis" by a variety of identification methods including API 20C AUX (bioMérieux), Vitek2 Compact YST (bioMérieux) and Vitek MS; hence, this isolate was assigned as C. tropicalis. A new ITS LT, ctr-3, was assigned to this isolate, and ctr-3 was incorporated into the SCGE ITS LT database for future use.

In silico evaluation of ITS LT database
The ITS1 and ITS region sizes of the additional 41 isolates of 41 yeast species were called, as shown in Table 3. Using the ITS LT database developed in the present study, only Trichosporon mucoides was "misidentified": its ITS1 and ITS lengths were identical to those of Trichosporon dermatis ITS LT tde-1.
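One way such in silico lengths could be computed is to locate the primer binding sites in a downloaded sequence and measure the span between them, primers included. The Python sketch below is an assumption about how this might be done, not the authors' procedure: the primer sequences are the commonly used ITS1, ITS2 and ITS4 primers of White et al. (the study names these primers but does not list their sequences here), the template is synthetic, and exact matching is used for simplicity.

```python
from typing import Optional

# Assumed standard primer sequences (5'->3'); not stated in this text.
ITS1_F = "TCCGTAGGTGAACCTGCGG"
ITS2_R = "GCTGCGTTCTTCATCGATGC"
ITS4_R = "TCCTCCGCTTATTGATATGC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Length in bp (primers included) of the product expected from fwd/rev
    on the given strand, or None if either site is not found exactly."""
    template = template.upper()
    start = template.find(fwd)
    if start < 0:
        return None
    rev_site = revcomp(rev)  # the reverse primer binds the opposite strand
    end = template.find(rev_site, start + len(fwd))
    if end < 0:
        return None
    return end + len(rev_site) - start


# Synthetic template with filler between the three primer sites.
toy = ("AAAA" + ITS1_F + "A" * 50 + revcomp(ITS2_R)
       + "C" * 100 + revcomp(ITS4_R) + "GG")
its1_len = amplicon_length(toy, ITS1_F, ITS2_R)
its_len = amplicon_length(toy, ITS1_F, ITS4_R)
```

Many deposited sequences are trimmed and do not contain the primer sites, and real sites may carry mismatches, so a practical version would need alignment-based site detection or a correction for the missing flanks.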

Identification of yeast species by MALDI-TOF MS
Of the 156 isolates used for the SCGE-based ITS LT database, 120 isolates (76.9%) were identified to species level by Vitek MS (bioMérieux), seven (4.5%) were misidentified, and 29 (18.6%) yielded "no identification" results (Table 1). Of the 32 yeast species studied, 20 species (62.5%) were included in the Vitek MS mass spectra database (database version IVD 2.0), whilst 12 species (37.5%) were absent from the database. All isolates of species included in the database (120 isolates) were correctly identified. However, amongst isolates of species absent from the database (36 isolates), 19.4% (seven isolates) were misidentified, and 80.6% (29 isolates) yielded "no identification" results.

Discussion
The increasing spectrum and emergence of rare yeast pathogens continues to pose a challenge for microbiology laboratories to provide rapid and reliable identification [6,19,24,25]. As a potential alternative molecular method, we have developed an SCGE-based approach for yeast identification and demonstrated that it can provide accurate and reproducible identification of major Candida, Cryptococcus and other rare yeast species, with the exception of Trichosporon species.
By constructing an in-house database of SCGE patterns encompassing the electropherograms of 32 yeast species from six genera, we first established proof of principle that this technique was able to unambiguously identify and distinguish between the majority of the more commonly encountered Candida and Cryptococcus species. Of note, both ITS1 and full-length ITS SCGE amplicon results in combination were needed to make these distinctions, allowing for correct identification of all isolates of these genera. Specifically, the ITS1-directed assay could not distinguish between certain Candida species, including between C. tropicalis strains and one LT of C. albicans, both major Candida pathogens [10,13,23]. The inability of ITS1-directed profiles on their own to differentiate between C. neoformans and C. gattii is consistent with the results of several ITS sequencing-based studies [26]. This reinforces the need for any method for fungal identification to be evaluated using more than one target, even within the same genetic locus (the ITS region in this case). Interestingly, the SCGE ITS LT assay was able to differentiate all genetically closely related species included in the study, which cannot be differentiated by phenotypic methods [26][27][28][29]. This distinction is important for epidemiological studies and because there may be differences in antifungal susceptibilities between members within the complexes [28,30].
Another key finding was that SCGE lends itself to use as a potential typing tool. It has been demonstrated that within the same yeast species, there might be different levels of intra-species ITS sequence diversity [11,31]. Species with high genetic diversity are most frequently human commensals, and this finding could explain the existence of additional genetic adaptation within normal microbiota with older evolutionary origins [11]. As observed in this study, subtypes based on ITS LTs were identified for eight of the species used to construct the database, although, as a typing method, SCGE was less discriminatory than multi-locus sequence typing [32,33].
The results obtained with the "database build cohort" of isolates are supported by results obtained on the test isolates, albeit only representing nine species. Only one isolate could not be identified to species, and this isolate represented a novel subtype or ITS LT of C. tropicalis, again raising the possibility of using SCGE as a typing tool. The study of larger numbers of different yeast species in this context would be of clinical interest.
Compared with the "gold standard" ITS sequencing methods, the SCGE-based method was cheaper and had a shorter turnaround time (Table 4). Although both methods relied on expensive DNA analyzer equipment (e.g. ABI 3730xl, Applied Biosystems), clinical laboratories will benefit from widely available commercial DNA sequencing companies, which provide cost-effective (~US$1.5 per SCGE sample analyzed) and timely (24 h) services in urban China (Table 4). MALDI-TOF MS, if available, is another powerful tool for yeast identification, which enables rapid and accurate identification of common clinically important yeasts, but may be hindered by its equipment acquisition costs (Table 4) [13,19]. Moreover, insufficiencies in yeast spectrum databases may limit its capacity to identify closely related and rare yeast species (Table 1) [6,13,19,25]. As shown in the present study, 36 of the 156 isolates (23.1%), representing 12 of the 32 species (37.5%), could not be correctly assigned to species level by the Vitek MS system because the corresponding species were absent from the mass spectra database.
In the broader sense, the usefulness of CGE-based assays for species identification of Candida has been reported previously by Monstein et al. and Mallus et al. [34,35]. Both studies used the Seegene Seeplex PCR assay (Seegene Diagnostic, Seoul, South Korea) for nucleotide amplification, but CGE was carried out on two different platforms, QIAxcel (Qiagen, Hilden, Germany) [35] and MultiNA (Shimadzu Corp., Tokyo, Japan) [34], respectively. Our assay differs in that SCGE relies on fluorescently labeled primers for PCR amplification, whereas QIAxcel and MultiNA devices are adapted for analysis of conventional PCR products amplified by non-labeled primers, requiring an additional step to detect nucleic acid with e.g. SYBR Green fluorescent stain. We chose the SCGE approach because the method has proved to be more sensitive in detecting low-intensity amplicons, more precise in size-calling, and to have higher discriminatory power compared with the QIAxcel-based assay [36]. In addition, compared with previous CGE-focused Candida studies, which examined amplicons for fewer than 10 Candida species [34,35], our study developed a more comprehensive identification database, which comprised 21 Candida species and 11 species of non-Candida yeasts, e.g. Cryptococcus, Trichosporon and Rhodotorula. The isolates used for database development were from the CHIF-NET study, which is currently the largest nationwide multicenter surveillance program for IFDs in China, indicating its clinical relevance at least in the region. The major limitation of the current SCGE ITS LT assay was its inability to identify Trichosporon species including T. asahii, T. japonicum, T. asteroides, T. mucoides and T. dermatis. It has been reported that the ITS and D1/D2 region sequences of the above Trichosporon species are highly similar, with only a few single nucleotide polymorphisms and no difference in sequence length [9,37].
Therefore, the ITS region was not an appropriate target for identification of Trichosporon species. Instead the rDNA intergenic spacer (IGS) 1 region is most suitable for differentiating between phylogenetically close Trichosporon species because of its higher diversity [9,37,38]. Further work on IGS1 region for SCGE is warranted for study of this genus.
Another limitation of the study was that among the 97 test isolates, only seven Candida and two Cryptococcus species were analyzed. When 41 additional yeast species, including rare species, were studied by in silico analysis and queried against the ITS LT database, only T. mucoides was "misidentified", as its ITS LT profile was identical to that of T. dermatis (Table 3). This is encouraging, and we plan to add to our in-house database the ITS LTs representative of more species so that it may be more widely applicable.
In summary, we have here established a database for SCGE-based identification for yeasts other than Trichosporon species, and the premise for performing a larger scale study to evaluate the identification capabilities of SCGE. To this end, DNA sequencing remains the "gold standard" identification method. However, the ITS SCGE assay described herein has shown promise to fulfill the function as a potential "reference" method since it is simple to use and adaptable for rapid identification. Moreover, it has the potential to detect mixed infections and to subtype species.