Predominance of Single Prophage Carrying a CRISPR/cas System in “Candidatus Liberibacter asiaticus” Strains in Southern China

“Candidatus Liberibacter asiaticus” (CLas) is an uncultureable α-proteobacterium associated with citrus Huanglongbing (HLB, yellow shoot disease), a highly destructive disease affecting citrus production worldwide. HLB was observed in Guangdong Province of China over a hundred years ago and remains endemic there. Little is known about CLas biology due to its uncultureable nature. This study began with the genome sequence analysis of CLas Strain A4 from Guangdong in the prophage region. Within the two currently known prophage types, Type 1 (SC1-like) and Type 2 (SC2-like), A4 genome contained only a Type 2 prophage, CGdP2, namely. An analysis on CLas strains collected in Guangdong showed that Type 2 prophage dominated the bacterial population (82.6%, 71/86). An extended survey covering five provinces in southern China also revealed the predominance of single prophage (Type 1 or Type 2) in the CLas population (90.4%, 169/187). CLas strains with two and no prophage types accounted for 7.2% and 2.8%, respectively. In silico analyses on CGdP2 identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) system, consisting of four 22 bp repeats, three 23 bp spacers and 9 predicted cas. Similar CRISPR/cas systems were detected in all 10 published CLas prophages as well as 13 CLas field strains in southern China. Both Type 1 and Type 2 prophages shared almost identical sequences in spacer 1 and 3 but not spacer 2. Considering that the function of a CRISPR/cas system was to destroy invading DNA, it was hypothesized that a pre-established CLas prophage could use its CRISPR/cas system guided by spacer 1 and/or 3 to defeat the invasion of the other phage/prophage. This hypothesis explained the predominance of single prophage type in the CLas population in southern China. This is the first report of CRISPR/cas system in the “Ca. Liberibacter” genera.


A4 and other CLas strains
CLas strain A4 originated from a collection in an HLB outbreak in Sihui City of Guangdong Province, People's Republic of China in December of 2005 ( Fig 1A). The bacterium was first grafted on a healthy mandarin citrus (Citrus reticulata Blanco), cultivar "Shatangju", and transmitted to periwinkle (Catharanthus roseus (L.)G. Don.) via dodder (Cuscuta campestris Yunck). CLas was monitored by PCR with primer set OI1-OI2c [7] and quantified by the procedure of Li et al. [31] with primer set HLBasf/HLBasr (Fig 1A and 1B). Strain A4 was maintained, propagated through grafting, and used as DNA source for sequence evaluation. Other CLas strains used in this study were collected from HLB affected citrus trees in five provinces in southern China (Fig 2). DNA was extracted following the procedure described previously [17]. Infection of CLas was confirmed by the procedure described by Li et al. [31]. A DNA sample from a single tree, or a single Asian citrus psyllid (Diaphorina citri Kuwayama), the vector of CLas, was considered as a CLas strain. For citrus origin, total plant DNA was extracted by E. Z. N. A.HP Plant DNA Kit (OMEGA Bio-Tek Co., Guangdong, China) using 200 mg of leaf midribs from three citrus leaves collected from the same branch of HLB-infected tree. For the Asian citrus Psyllid (Diaphorina citri Kuwayama), DNA was extracted with TIANamp Genomic DNA Kit (Tiangen Biotech Co., Beijiang, China) from single insects following the manufacturer's protocol.

Re-evaluation of A4 genome sequence
A brief description of strain A4 genome sequencing using Illumina MiSeq platform with Strain Psy62 genome sequence (CP001677.5) [13] as a reference was published previously [22]. Because the Psy62 genome sequence did not include prophage FP2 (a SC2 homolog), the A4 genome sequence was reassembled by including SC2 sequence (NC_019550.1) as a reference following the same procedure [22], mainly involving identification of CLas reads based on reference sequences with standalone BLAST [32], read collection with Perl scripts, and a combination of de novo assembly with Velvet 1.2.10 [33] and referenced assembling with CLC Genomic Workbench 7.5.
For gap closure, primers were designed using Primers 3 software [34] based on contig sequences from assembly results. PCR was performed following standard procedures. Amplicons generated from these primers were cloned in pEASY-T1 plasmid (TransGen Biotech, Beijing, China) or directly sequenced by Sanger's method. Sequences were assembled with Seq-Man software under the DNASTAR Lasergene suit (http://www.dnastar.com). Genome annotation was conducted using the RAST server (http://rast.nmpdr.org) [35].
strains with MiSeq data such as A4, or published sequence data (Table 1), the mapping method was used. Prophage type was determined by mapping the MiSeq sequence reads, or the prophage sequences, to SC1 and SC2 using CLC genomic workbench version 7.5. For field collected samples, the PCR method was used. Specific PCR primers were designed by comparing the sequences between SC1 and SC2. Eight loci/regions unique to SC1 and SC2 after alignment between the two sequences were selected. Primer sets were designed using Primer 3 software [34]. Primer sequences and related information are listed in Table 2. Prophage type was determined by the success of PCR experiments yielding expected amplicons from at least 6 out of the 8 specific primer sets. CLas strains from five provinces (Yunnan, Guangxi, Hainan, Guangdong and Fujian) in southern China were used for distribution analysis of different prophage type (Fig 2). The percentage of CLas strains with different type of prophage from each province were calculated based on the PCR result.

CRISPR/cas analyses
A CRISPR/cas system was defined by the simultaneous presence of a CRISPR array and cas genes in the nearby vicinity [28]. Candidate CRISPR repeats array were detected by CRISPR Recognition Tool [37]. Alignment of CRISPR repeat sequences was performed on the Clustal Omega Server [36] and viewed by Jalview [38]. The secondary structure of CRISPR repeat transcript (represented by DNA sequences) was predicted by Quikfold on DINAMelt web server with default setting [39]. To check for possible sequence origins, spacers were used as queries for BLASTn against nucleotide sequence database including the virus database in GenBank (version 1.1).
Genes or ORFs adjacent to CRISPR repeat array were selected and used as queries to search for the presence of cas gene in Conserved Domain Database (CDD, version 3.13) that included the most updated collection of published cas genes [40]. Once a candidate CRISPR/cas system was identified, the sequence in the vicinity was downloaded and used as query to search for homologs in other published CLas genomes (Table 1) using BLASTn. Variations of the CRISPR locus among known CLas genomes were analyzed through multiple sequence alignment by Clustal Omega [36]. Phylogenetic analyses were performed on MEGA 6.0 [41].
To investigate variations of the CRISPR array, additional CLas strains were collected from southern China. Prophage types were determined by the PCR method ( Table 2). The CRISPR regions were PCR amplified with primer set CRIF/CRIR (CTCAGCTTTTGTCATGCCCA / AGGAAGACAATATCGCCCGT). Amplicons were sequenced by Sanger's method.

Results and Discussion
Re-evaluation of A4 genome sequence To bypass the in vitro culture barrier, the in planta culture system was used to supply Strain A4 DNA continuously. As shown in Fig 1, periwinkle was an effective host for CLas enrichment. A  21.0 in periwinkle) was achieved. Further CLas DNA enrichment procedures were described previously [22]. Based on the number of MiSeq reads, the CLas/periwinkle DNA ratio was about 0.02 or 1:50 (636,810 CLas-reads vs. 32,130,744 non-CLas reads), rather than the possible 1:1,000 [31]. Over 20,000 bp were re-sequenced from PCR amplicons with a total of 225 primer sets to improve quality of the previous version of A4 genome sequence. The new version of A4 genome (CP010804) consisted of 1,233,514 bp, with the average GC content of 36.4%, 1,187 ORFs, and 53 RNA genes.

Special features of A4 genome
Comparison of whole genome sequences between strain A4 and selected strains (Psy62, Ishi-1 and gxpsy) from different geographical origins showed limited variations in the chromosomal region, mostly single nucleotide polymorphisms (SNPs) and indels (insertions/deletions) including tandem repeat variations reported previously [14,17]. A feature of particular interest was the presence of a single prophage. Among the 636,810 CLas reads (mean length = 250 bp) from the MiSeq data, no reads were matched to Psy62 genome at several regions corresponding to prophage FP1 (homolog of SC1). A visualization of A4 MiSeq reads mapped to SC1 and SC2 were performed by CLC genomic workbench (S1 Fig). A4 reads covered 57% of SC1 and 100% of SC2, indicating the presence of a Type 2 prophage, designated as CGdP2, in the A4 genome.
As shown in Fig 3, specific primer sets ( Table 2) were effective in detecting and defining (6/ 8 or 75%) Type 1 and Type 2 prophages. Non-target amplification occurred, e.g. sample D lane 12 (primer set 12) and samples A, C, and D of lane 16 (primer set 16) (Fig 3). By design, both primer sets 12 and 16 were Type 2 prophage specific. However, overall prophage type interpretation was not affected. It should also be noted that although sample D is considered as harboring no Type 1 or Type 2 prophage, it is possible that partial Type 1 or Type 2 prophage DNA exist in the bacterial chromosome or a currently unknown prophage.
Among the 86 CLas strains from Guangdong (Fig 2), 71 (82.6%) harbored only Type 2 prophage, likely CGdP2. Adding the 7.0% of Type 1 prophage strains, a near 90% of CLas population in Guangdong harbored a single prophage. Similarly, single prophage dominated each of the four other provinces, although the ratio of the two prophage types varied. Noticeably, strains in Yunnan were dominated by Type 1 prophage, contrasting to those of Guangdong. This is in agreement with the previous observations that CLas population in the high altitude Yunnan Province was different from that in the low altitude provinces such as Guangdong [42,43]. In a summary, a total of 187 CLas strains were collected from five provinces in southern China and analyzed (Fig 2). Among them, 26.74% (50/187) harbored single Type 1 prophage, 63.64% (119/187) harbored single Type 2 prophage. Over 90% CLas strains had single prophage. Only 6.95% (13/187) harbored both Type 1 and Type 2 prophages. It should be noted that in the case of two prophage types detected, it was also possible that the CLas samples might be a mixture of two cell types, each having only a single prophage. Our laboratory recently published three more CLas draft genome sequences, HHCA [23], FL17 [24], and YCpsy [44]. Based on the MiSeq reads mapping to SC1 and SC2, all three CLas strains had single prophage (Table 1). Our observation of single prophage dominance in CLas is different from the earlier reports of two prophages in CLas strain Psy62 [19], UF506 [20], and gxpsy [21]. The discrepancy may be related to the multiple sources of CLas, that increased the chance of collecting two prophage types. A single prophage was reported in the first report of Psy62 from a single psyllid [13]. In the second report that proposed FP1 and FP2, both psyllid and citrus samples were involved [19]. Similarly, both plant and psyllid samples were involved in the study of SC1 and SC2 [20]. The exception is Strain gxpsy, which was reported from a single psyllid [21].
Another interesting observation was that 2.67% (5/187) CLas strains harbored none of the two prophages. This is the first observation of CLas strains without Type 1 or Type 2 prophages in China, similar to strain Ishi-1 in Japan [25]. The lack of prophage did not seem to correlate to the lack of HLB symptoms (Fig 3). This seems to deviate from the speculation that prophage might be related to bacterial virulence [20] and a peroxidase gene in SC2 could encode a secreted effector that suppressed plant defenses [27]. However, our current understanding of CLas pathogenicity / virulence is very limited.
A CRISPR/cas system Analyses of A4 genome sequence revealed seven possible CRISPR arrays (Table A in S1 File). However, CDD search identified CD16_05520 as a putative cas4 gene (Table 3), which was 1,682 bp or 4 ORFs downstream of CRISPR candidate 7 (Table B in S1 File and Table 3). This CRISPR/cas system was located within prophage CGdP2. The CRISPR array contained four highly similar 22 bp repeats with three heterologous spacers of 23 bp (Figs 4, and 5). Unlike the CRISPR spacers, each repeat had typical dyad structure and capable of forming a stable stemloop (Fig 5), a characteristic of CRISPR repeat [28]. Repeat sequences were much more homogeneous (82%, 18/22) than spacers (39%, 9/23). No similar CRISPR array was found in Gen-Bank sequence database except for the 10 published CLas prophages (Table 1), suggesting the CRISPR/cas system was shared by these prophages.
When comparing the 10 CLas prophages from different geographical regions (Fig 4), spacer 1 showed no difference. Spacer 3 is mostly homogeneous except for a SNP in strain Psy62 from Florida. Significant sequence variations were found in spacer 2. Additionally, 14 CLas strains were collected in southern China and their CRISPR regions were compared. Variations were again found in spacer 2 but not in spacer 1 and 3. Cluster analysis showed that variations in spacer 2 grouped along with prophage types, regardless to the geographical origins (Fig 6). BLAST search through virus database with each spacer as a query did not identify any 100% similarity match.
According to annotation, the CRISPR array was found within an ORF CD16_05495. This is not typical among the known bacterial CRISPR arrays which were believed to be intergenic [28]. However, CRISPR array was in the opposite direction of CD16_05495, i.e. the CRISPR sequence itself was not coding. In addition, it was pointed out that CRISPR arrays could be masked by ORFs incorrectly annotated simply based on lack of stop codon in long stretch of DNA sequences [45]. ORFs surrounding the CRISPR array were mostly gene possessing DNA/ RNA processing function motifs (Table 3; Fig 7). As discussed earlier, CD16_05520, was highly similar to member of Cas4 superfamily (pfam10926) [28,46,47].
The relationships of other ORFs to cas gene in the current version of CDD were less clear. This is not surprising since database of cas gene sequences is still in its infancy. Plus, CLas itself is a poorly studied bacterium. A set of cas genes designated as cas1 to cas4 have been regarded as the core genes for a CRISPR/cas system [28,48]. Although homologues of cas1, cas2, and cas3 could not be found based on sequence similarity, the CLas CRISPR/cas system contained a set of genes possessing functions to those of the cas genes, CD16_05535 as cas1 for its exonuclease domain, CD16_05540 as cas2 for its endoribonuclease domain, and CD16_05545 as cas3 for its helicase domain (Table B in S1 File). In another word, the CLas CRISPR/cas system possesses all key components to be fully functional.

CRISPR/cas and CLas prophage relationship
It should be noted that most CRISPR/cas systems discovered so far are chromosome-borne. It is, however, also documented that CRISPR/cas system were carried by phages [49][50][51][52][53]. In Vibrio cholera, it was reported that a phage-encoded CRISPR/cas system could be used to counteract a phage inhibitory chromosomal island of the bacterial host [53]. In a human gut virome study, Minot et al. [51] demonstrated a strong in silico evidence of a phage-encoded CRISPR array targeting another phage.
Our survey from southern China showed that two types (Type 1 and Type 2) of propahges, and therefore inferring two types of phages, coexist (Fig 2). However, for a CLas strain (a HLB citrus tree), single prophage is predominant (90.4%, Fig 2), which could be interpreted as the two prophages were in competition for a host. Considering that the function of a CRISPR/cas system was to destroy invading DNA based on spacer information, it can be hypothesized that one pre-established CLas prophage in a CLas cell could use its CRISPR/cas system to defeat the invasion of the other phage/prophage DNA. The sequence of spacer 1 or spacer 3 or both could be the target of recognition, although more research such as protospacer adjacent motif (PAM) is involved is needed. Along this line, the role of spacer 2 remains to be investigated.
Having proposed the hypothesis on competitions between the two CLas prophages/phages, we are aware that directly molecular evidence is needed for the ultimate proof of the CRISPR/ cas system. Yet, this effort could face an even more challenging research issue, the in vitro cultivation of CLas that has not been resolved, despite research efforts for decades. Here, we explored the use of in silico genome sequence analyses to identify a CRISPR/cas system in CLas, which could be related to the observed prophage competitions in southern China. This is Table 3. Basic information of a predicted CRISPR/cas system in prophage CGdP2 of "Candidatus Liberibacter asiaticus" strain A4 with comparison to prophage SC2. the first effort to investigate CRISPR/cas system in the genus of "Ca. Liberibacter". In light of the fast advancement of the current cas technology [54], knowledge of the CLas CRISPR/cas system could potentially be used for gene manipulation of this uncultureable bacterium using the in planta (such as periwinkle) cultivation system.    Schematic representation of CRISPR (clustered regularly interspaced short palindromic repeats)/cas system in "Candidatus Liberibacter asiaticus" Strain A4. The CRISPR repeats are depicted by four vertical blue lines at locus 05495. Open reading frames (ORFs) are represented by arrow boxes with locus numbers listed. ORFs with no predicted functions are indicated by white arrows (locus number omitted for simplicity). ORFs with conserve domains of DNA/RNA enzymes were predicted as "cas" genes and indicated by blue arrows. Arrow directions represent ORF directions. The cas4 assignment to ORF 05520 was determined by significant match to orthologues in Conserve Domain Database. Genes "cas1-3" were proposed mainly based on similar protein functions. doi:10.1371/journal.pone.0146422.g007

Conclusions
This study began with the genome sequence analysis on a CLas strain collected from Guangdong Province of China, where HLB has occurred for over a hundred years, and then extended the study to four nearby provinces. The CLas population in southern China was found to predominantly harbor a single prophage. The prophage carried an immunity structure called a CRISPR/cas system. The prevalence of single prophages suggested competition events between prophages for CLas hosts. One prophage might use its immunity structure to defeat the invasion of the other. This is the first finding of an immunity system in CLas. The information will facilitate current understanding on the molecular mechanisms of CLas population variation. Biological information about CLas, the HLB pathogen, is currently in urgent need for development of effective HLB control strategies.
Supporting Information S1 Fig. Mapping of MiSeq reads of "Candidatus Liberibacter asiaticus" Strain A4 to the sequence of prophage SC1 and SC2. Mapping track of A4 Miseq reads to SC1 and SC2 sequence were performed on CLC genomic workbench. Green lines represent forward reads and red lines represent reverse reads. A4 reads covers 57% of SC1 (40,048 bp) and 100% of SC2 (38,997).