CRISPR-cas Subtype I-Fb in Acinetobacter baumannii: Evolution and Utilization for Strain Subtyping

Clustered regularly interspaced short palindromic repeats (CRISPR) are polymorphic elements found in the genome of some or all strains of particular bacterial species, providing them with a system of acquired immunity against invading bacteriophages and plasmids. Two CRISPR-Cas systems have been identified in Acinetobacter baumannii, an opportunistic pathogen with a remarkable capacity for clonal dissemination. In this study, we investigated the mode of evolution and diversity of spacers of the CRISPR-cas subtype I-Fb locus in a global collection of 74 isolates of A. baumannii obtained from 14 countries and 4 continents. The locus has largely evolved from a common ancestor along two main lineages and several pathways of vertical descent. However, this vertical passage has been interrupted by occasional events of horizontal transfer of the whole locus between distinct isolates. The isolates were assigned to 40 CRISPR-based sequence types (CST). CST1 and CST23-24 comprised 18 and 9 isolates, representing two main sub-clones of international clones CC1 and CC25, respectively. Epidemiological data showed that some of the CST1 isolates were acquired in or imported from Iraq, where this sub-clone has probably been endemic for more than one decade and from where it has occasionally spread to the USA, Canada, and Europe. CST23-24 has shown a remarkable ability to cause national outbreaks of infections in Sweden, Argentina, the UAE, and the USA. The three isolates of CST19 were independently imported from Thailand to Sweden and Norway, raising a concern about the prevalence of CST19 in Thailand. Our study highlights the dynamic nature of the CRISPR-cas subtype I-Fb locus in A. baumannii, and demonstrates the possibility of using a CRISPR-based approach for subtyping a significant part of the global population of A. baumannii.


Introduction
CC2, according to the Institut Pasteur MLST scheme, is currently the largest and most widely distributed clone in the global population of A. baumannii [12,13]. Nonetheless, several other clones have co-dominated or recently emerged as important international actors. For instance, CC1 ranks as the second largest clone of A. baumannii, with a broad international distribution in more than 30 countries from all continents [14]. Isolates from this clone have commonly shown a multidrug-resistance phenotype and frequently carried AbaR3-like resistance islands [16]. In parallel, a growing occurrence of CC25 has recently been reported from different countries in Europe, South and North America, Africa, and Asia [13,14,17,18]. In addition to their extensive resistance to antibiotics, the CC25 isolates have shown the ability to resist desiccation, form biofilms on abiotic surfaces, and adhere to human alveolar epithelial cells [19].
Two CRISPR-Cas systems have recently been found in the genome of particular A. baumannii strains [20,21]. The CRISPR-cas locus in strain AYE belongs to subtype I-Fb, herein denoted as CRISPR-cas subtype I-Fb throughout the manuscript [4,22]. Genomic islands carrying this locus in strains 4190, AB0057 and AYE were found to be closely related, indicating potential interstrain horizontal transfer [20]. Comparative analysis of partial sequences of the CRISPR-cas subtype I-Fb locus was useful in detecting the occurrence of an intra-clonal diversity among clinical isolates of international clone CC1 [21]. The aim of this study was to investigate the evolutionary history of CRISPR-cas subtype I-Fb in A. baumannii and to determine the genetic relatedness among a collection of CRISPR-positive clinical isolates of A. baumannii, based on comparative sequence analysis of the arrays of spacers located in their CRISPR-cas subtype I-Fb locus.

A. baumannii isolates
The study included 74 isolates of A. baumannii carrying the CRISPR-Cas subtype I-Fb system (Table 1). The isolates were collected from the United States of America (USA; n = 29), Sweden (n = 12), Norway (n = 10), Iraq (n = 5), Netherlands (n = 3), Czech Republic (n = 3), Germany (n = 3), Canada (n = 2), Greece (n = 1), France (n = 1), Italy (n = 1), Argentina (n = 1), Colombia (n = 1), and United Arab Emirates (UAE; n = 1). The country of isolation was unknown for one isolate. Twenty-five isolates were part of three ongoing projects involving whole-genome sequencing of carbapenem-resistant A. baumannii isolates obtained in Norway between 2010 and 2013 (project I), Sweden between 2012 and 2013 (project II), or representatives of an international collection of A. baumannii isolates belonging to CC25 (project III). Forty-four isolates with sequenced genomes were selected from the records of the International Nucleotide Sequence Database Collaboration (http://www.insdc.org/). Occurrence of CRISPR-cas subtype I-Fb in these isolates was detected using the Nucleotide BLAST algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi) against both the "Nucleotide collection (nr/nt)" and "Whole-genome shotgun contigs (wgs)" databases. The remaining 5 isolates belonged to CC1 or CC25 and were part of previously published studies [23,24]. The online multilocus sequence typing (MLST) service hosted by the Center for Genomic Epidemiology (http://www.genomicepidemiology.org/) in Denmark was used to determine the ST of all the isolates for which a full genome sequence was available [25]. The assignment was performed according to the Institut Pasteur MLST scheme [13] (http://pubmlst.org/abaumannii/). In order to group the isolates into CCs, a minimum spanning tree was generated from all the allelic profiles in the database using PhyloWeb and the MSTree application integrated in the Institut Pasteur MLST web site (http://www.pasteur.fr/mlst).

Phylogenetic analysis
Phylogenetic trees were generated based on nucleotide sequence alignments of (i) a conserved segment of 101 bp located downstream of the array of spacers, (ii) 920 bp of the cas1 gene, (iii) 1251 bp of the csy1 (Cas system-associated) gene, (iv) 615 bp of the csy4 gene, and (v) 2976 bp of the concatenated MLST sequences. Only isolates with sequenced genomes were included in the phylogenetic analyses. The online available package of programs (MUSCLE, Gblocks, PhyML, and TreeDyn) was used for nucleotide alignment, tree construction, and tree rendering [26]. One hundred replicates were used for bootstrap analysis.

CRISPR-based subtyping
DNA sequences of the CRISPR arrays of spacers were either retrieved from the GenBank nucleotide database or amplified and sequenced using the BigDye 3.1 technology (Applied Biosystems). Amplification of the full length of the arrays was performed using a pair of external primers, Ab-CRIS-F and Ab-CRIS-R, targeting conserved flanking regions (S1 Table). PCR products were purified using ExoProStar (GE Healthcare Bio-Sciences). The purified PCR amplicons were subsequently sequenced using the two external and several internal primers designed in tandem as required. Arrays of spacers were identified using CRISPRFinder [27]. A dictionary of annotated spacers was created using CRISPRtionary (http://crispr.u-psud.fr/CRISPRcompar/Dict/Dict.php) and revised manually. CRISPRtionary was also used to create a binary file of presence (1) or absence (0) of spacers. Each spacer with a newly defined sequence was assigned a new consecutive number, and each array with a newly defined assortment of spacers represented a new CRISPR-based sequence type (CST), as previously described [28].
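The numbering rule described above can be illustrated with a minimal Python sketch. It is not the CRISPRtionary implementation: the function name `assign_csts`, the isolate names, and the toy arrays are hypothetical, and the sketch assumes that an "assortment" is the ordered list of spacers including duplicates, since CSTs that differ only by a spacer duplication are treated as distinct in this study.

```python
from typing import Dict, List, Tuple


def assign_csts(arrays: Dict[str, List[str]]) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """Give each distinct ordered spacer array a consecutive CST number
    and build one presence (1) / absence (0) row per isolate."""
    # Column order of the binary matrix: spacers in order of first appearance.
    columns: List[str] = []
    for spacers in arrays.values():
        for spacer in spacers:
            if spacer not in columns:
                columns.append(spacer)

    cst_of_array: Dict[Tuple[str, ...], int] = {}
    csts: Dict[str, int] = {}
    binary: Dict[str, List[int]] = {}
    for isolate, spacers in arrays.items():
        key = tuple(spacers)  # ordered, duplicates kept (assumption)
        if key not in cst_of_array:
            cst_of_array[key] = len(cst_of_array) + 1
        csts[isolate] = cst_of_array[key]
        present = set(spacers)
        binary[isolate] = [1 if s in present else 0 for s in columns]
    return csts, binary


# Hypothetical toy input, not data from the study.
toy_arrays = {
    "iso_a": ["Ab-1", "Ab-2", "Ab-3"],
    "iso_b": ["Ab-1", "Ab-2", "Ab-3"],
    "iso_c": ["Ab-1", "Ab-2", "Ab-2", "Ab-3"],  # duplication of one spacer
    "iso_d": ["Ab-107", "Ab-108"],
}
toy_csts, toy_binary = assign_csts(toy_arrays)
print(toy_csts)
```

Note that the binary presence/absence rows cannot distinguish iso_a from iso_c, which is why the sketch keys the CST on the ordered array rather than on the binary row.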

Results and Discussion
Description and distribution of the CRISPR-cas subtype I-Fb locus
The CRISPR-cas subtype I-Fb locus, located at position 1,057,691 to 1,069,768 of the genome of A. baumannii strain AYE (GenBank accession number: CU459141), consisted of six genes consecutively encoding the Cas1 endonuclease, the Cas3/Cas2 helicase/RNase, and four Csy proteins (Fig. 1A and B). csy4 (cas6f) was followed by a short non-coding TA-rich leader sequence. Then, the locus enclosed an array of spacers, where each spacer was flanked by two direct repeats (DRs). The spacers had variable sequences and a common length of 32 bp, whereas the DRs had the 5′-GTTCATGGCGGCATACGCCATTTAGAAA-3′ consensus sequence and were 28 bp long.
Similarly to the CRISPR-Cas subtype I-F systems in other genera, the consensus sequence belonged to the DR cluster 4, showing a nucleotide similarity of 65-75% with those of Escherichia coli, Pseudomonas aeruginosa, Yersinia pestis, Shewanella spp., and Pectobacterium atrosepticum [3,5,29,30]. Overall, 876 distinct A. baumannii spacers (Ab-1 to Ab-876) were identified (S2 Table). The size of the arrays was remarkably different among the isolates, ranging from 148 bp (2 spacers) up to 7354 bp (122 spacers). A complete CRISPR-cas subtype I-Fb locus was present in all isolates belonging to CC1 and CC25. The locus was also present in isolates from ST113, ST12, ST38, ST126, ST427, ST505, ST508 and ST519. An internal deletion was detected in the locus of only one isolate, Naval-82, which belonged to ST428, resulting in the absence of cas3/cas2, csy1, csy2 and part of csy3 (GenBank accession number: AMSW01000159). On the other hand, the locus was replaced by a short sequence of 128 bp in the genome of A. baumannii strain TYTH-1 (GenBank accession number: CP003856), which was assigned to ST2 (Fig. 1C). The locus was also not present in the genome of other isolates from CC2 or isolates AB4857 and OIFC099 from the epidemic clones CC3 and CC32, respectively (Fig. 1C and data not shown). Notably, the locus was present in isolates from other Acinetobacter species such as Acinetobacter haemolyticus TG19602 and Acinetobacter gyllenbergii NIPH 230 (GenBank accession numbers: AMJB01000210 and AYEQ01000180, respectively). In addition, a common occurrence of CRISPR-cas subtype I-Fb in Acinetobacter parvus has recently been reported [22].
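The relationship between the number of spacers and the size of an array can be checked with a short Python sketch. It assumes an idealised array in which every spacer is exactly 32 bp and every repeat matches the 28-bp consensus; the function names and the toy spacers are hypothetical, and splitting on the exact consensus would miss the degenerated repeats found at the trailer end of real arrays.

```python
DR = "GTTCATGGCGGCATACGCCATTTAGAAA"  # 28-bp consensus direct repeat given in the text
SPACER_LEN = 32                      # common spacer length given in the text


def expected_array_length(n_spacers: int) -> int:
    """Length of an idealised array: n spacers flanked by n + 1 repeats."""
    return n_spacers * SPACER_LEN + (n_spacers + 1) * len(DR)


def extract_spacers(array_seq: str) -> list:
    """Split an array on exact copies of the consensus repeat (idealised case only)."""
    return [part for part in array_seq.split(DR) if part]


# Hypothetical toy array with two artificial spacers.
toy_spacers = ["A" * SPACER_LEN, "C" * SPACER_LEN]
toy_array = DR + DR.join(toy_spacers) + DR

print(len(toy_array))                # 148, the smallest array size reported
print(expected_array_length(59))     # 59 spacers and 60 repeats, as in strain AYE
print(expected_array_length(122))    # idealised value for the largest array
```

The idealised value for 122 spacers is 7348 bp, 6 bp shorter than the 7354 bp reported. One possible explanation, offered here only as an assumption, is that a few spacers or repeats in that array deviate from the common lengths.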
A Blastn search failed to detect proto-spacers with homologous sequences to 42/106 of the spacers present in A. baumannii CC1 isolates (S2 Table). This could be due to the limited number of A. baumannii phages that have been sequenced and deposited in the GenBank databases [22]. Nonetheless, phage- and plasmid-related DNA elements were the source of 50/106 and 12/106 of the spacers, respectively. The remaining 2/106 spacers originated from A. baumannii DNA that was most likely not related to phages or plasmids. These results were comparable to those of previous studies on the CRISPR arrays in Streptococcus thermophilus and Y. pestis [9,31]. For example, only 500 out of 952 unique spacers in S. thermophilus showed similarity to viral (n = 384), plasmid (n = 80), or chromosomal (n = 33) sequences [9]. A prophage of 42778 bp, located on the genome of A. baumannii NIPH 527 (APQW01000004: 179581-222358), represented the main foreign DNA encountered by our isolates, being the source of 16/106 of the spacers (S2 Table and S1 Fig.). The proto-spacers in this prophage were carried on a variety of genes, such as the replicative DNA helicase, glycosyl hydrolase, tail tape measure, integrase, terminase, and GDSL-family lipase genes, or located in intergenic regions. The proto-spacers were found either on the sense or antisense strand (S1 Fig.), as previously described [3].
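As a quick arithmetic check, the counts reported above for the CC1 spacers can be converted to percentages with a few lines of Python. The dictionary keys are labels chosen here for illustration; the counts are those given in the text.

```python
# Origins of the 106 spacers found in CC1 isolates, as reported in the text.
counts = {"no_hit": 42, "phage": 50, "plasmid": 12, "chromosomal": 2}

total = sum(counts.values())
shares = {origin: round(100 * n / total, 1) for origin, n in counts.items()}

print(total)   # the four categories should add up to 106
print(shares)  # percentage of spacers per category

# Share of spacers matching the NIPH 527 prophage (16 of 106).
prophage_share = round(100 * 16 / total, 1)
print(prophage_share)
```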
Alignment of sequences surrounding the proto-spacers identified the dinucleotide "CC", or "GG" on the complementary strand, to be the protospacer adjacent motif (PAM) for the CRISPR-Cas subtype I-Fb machinery of A. baumannii (S2 Table and S2 Fig.). The PAM motif was located on the 5′ end of the proto-spacers, as previously reported for other CRISPR-Cas type I systems [2]. Our results were also consistent with earlier studies showing that GG has been the signature PAM for the DR cluster 4 and CRISPR-Cas subtype I-F systems [3]. A set of 4 spacers (Ab-62 to Ab-65) was most likely acquired by one of our isolates, assigned to CST8, after a single contact with a plasmid from A. baumannii IS-58 assigned to CST1 (S2 Table). Similarly, Ab-77 to Ab-80, Ab-92 to Ab-94 (S1 Fig.), and Ab-97 to Ab-99 could also have been acquired after single interactions with particular plasmid or phage DNA molecules.
Fig. 1. A) [...] 1,069,768). The locus consisted of two CRISPR-associated genes (cas1 and cas3/cas2), four Cas system-associated genes (csy1, csy2, csy3, and csy4), and an array of spacers. The map was created using Artemis (http://www.sanger.ac.uk/resources/software/artemis/). B) Nucleotide sequence of the array of spacers in A. baumannii strain AYE. The array included 59 spacers surrounded by 60 direct repeats (marked in green). Some repeats, mainly at the trailer end of the array, included degenerated nucleotides (marked in yellow).
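The PAM inference described in the text (aligning the sequences that flank each proto-spacer and reading the dinucleotide at its 5′ end) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names and toy sequences are hypothetical, matches are required to be exact, and the spacer is assumed to be written in the same orientation as the proto-spacer strand that carries the 5′ motif.

```python
from collections import Counter
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def five_prime_flank(target: str, spacer: str, flank_len: int = 2) -> Optional[Tuple[str, str]]:
    """Find an exact proto-spacer on either strand of `target` and return
    (strand, nucleotides immediately 5' of it on that strand)."""
    for strand, seq in (("+", target), ("-", revcomp(target))):
        pos = seq.find(spacer)
        if pos >= flank_len:  # found, with enough upstream sequence to read the flank
            return strand, seq[pos - flank_len:pos]
    return None


# Hypothetical 32-nt spacer and toy targets, not sequences from the study.
toy_spacer = "ATGCGTACGTTAGCATCGATCGGATACCGTAA"
toy_plus = "TTACC" + toy_spacer + "GATT"   # proto-spacer on the given strand
toy_minus = revcomp(toy_plus)              # same proto-spacer on the opposite strand

flanks = Counter(five_prime_flank(t, toy_spacer)[1] for t in (toy_plus, toy_minus))
print(flanks)
```

In the second toy target the motif reads "GG" on the strand as given, immediately after the reverse-complemented proto-spacer, which corresponds to the "GG on the complementary strand" wording of the text.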

Evolution of the CRISPR-cas subtype I-Fb locus
Arrays of spacers have mainly evolved by adding new spacers in response to contacts with foreign genetic elements [9]. Since the addition takes place in one direction only, spacers present at one end (the trailer end) of the arrays were integrated earlier than those present at the other end (the leader end), and the order of spacers generally provides a chronological record of former exposures to invading phages and plasmids [31]. Studies have reported that trailer end spacers are generally conserved among different isolates and can be used to anchor clusters and detect common ancestors of the arrays and probably of the isolates themselves [9]. In contrast, the leader end spacers are usually polymorphic, reflecting the existence of distinctive phage/plasmid pools at a particular era in different geographic locations [8]. Ab-1 and Ab-107 were the first spacers acquired by the vast majority of our isolates (S2 Table). Spacer Ab-1 was present in the locus of all the isolates from CC1, ST38, ST428, ST505, and ST519, and also isolate 4190 from CC25. In contrast, spacer Ab-107 was shared by all the CC25 isolates, except isolate 4190, and the ST113, ST126, ST12, and ST427 isolates. The conservation of Ab-1 and Ab-107 at the trailer end of the arrays putatively assembled our arrays into two major groups, each representing a different first step of descent from a common ancestor empty of spacers. In agreement with previous studies, the DRs at the trailer end of our isolates were degenerated (Fig. 1B). However, the number and sequence of the degenerated nucleotides followed two patterns corresponding to the Ab-1 and Ab-107 grouping of the isolates, which confirmed the occurrence of two main lineages in the early evolutionary history of the locus (S3 Fig.). Furthermore, comparative sequence analysis of a conserved segment of 101 bp located adjacent to the trailer end of the arrays showed a precisely matching assembly of the isolates (S4 Fig.).
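The use of trailer-end spacers as anchors can be sketched in Python. The example assumes that each array is written from the trailer end to the leader end, so that the oldest spacer comes first; the function names, isolate names, and spacer lists are hypothetical and only mimic the Ab-1 and Ab-107 grouping.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_trailer_spacer(arrays: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group isolates by their oldest (trailer-end) spacer."""
    groups = defaultdict(list)
    for isolate, spacers in arrays.items():
        if spacers:  # skip empty arrays
            groups[spacers[0]].append(isolate)
    return dict(groups)


def shared_trailer_run(a: List[str], b: List[str]) -> int:
    """Number of consecutive spacers shared by two arrays, counted from the trailer end."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical arrays, oldest spacer first.
toy_arrays = {
    "iso_a": ["Ab-1", "Ab-2", "Ab-3", "Ab-4"],
    "iso_b": ["Ab-1", "Ab-2", "Ab-9"],
    "iso_c": ["Ab-107", "Ab-108"],
}
lineages = group_by_trailer_spacer(toy_arrays)
print(lineages)
print(shared_trailer_run(toy_arrays["iso_a"], toy_arrays["iso_b"]))
```

A longer shared run at the trailer end suggests a more recent common ancestor of two arrays, whereas divergence towards the leader end reflects later, independent acquisitions. Internal deletions can remove the trailer-end spacers, so a grouping of this kind would need to be checked against other evidence.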
The CRISPR arrays have frequently been reshaped by internal deletions and duplications, of individual spacers or sets of consecutive spacers, leading to further diversification of the arrays [32]. Interestingly, Ab-5 was located first at the trailer end of the array of spacers in isolate ab299505 (ST508). This was probably due to an internal deletion of 240 bp, erasing Ab-1 to Ab-4, caused by a recombination event between the two DRs surrounding the deleted region (Fig. 2). The last DR in this array had a unique sequence that was most likely derived from the two recombined DRs, consistent with the occurrence of such a recombination event. Accordingly, the locus in ab299505 also belonged to the Ab-1 lineage (S4 Fig.).
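The size of the deletion described above follows from the unit structure of the array: each lost spacer takes one 28-bp repeat with it. A minimal Python sketch, assuming uniform 32-bp spacers and 28-bp repeats and using a hypothetical function name and toy spacer list:

```python
from typing import List, Tuple

SPACER_LEN = 32  # bp, from the text
DR_LEN = 28      # bp, from the text


def delete_spacer_run(spacers: List[str], first: int, last: int) -> Tuple[List[str], int]:
    """Remove spacers[first..last] (inclusive), as after recombination between the
    two repeats flanking that run, and return the new array and the bp lost."""
    kept = spacers[:first] + spacers[last + 1:]
    # Each deleted spacer removes one spacer-plus-repeat unit.
    deleted_bp = (last - first + 1) * (SPACER_LEN + DR_LEN)
    return kept, deleted_bp


# Hypothetical ancestral array, oldest (trailer-end) spacer first.
ancestral = ["Ab-1", "Ab-2", "Ab-3", "Ab-4", "Ab-5", "Ab-6"]
derived, lost = delete_spacer_run(ancestral, 0, 3)
print(derived)  # Ab-5 is now the first spacer at the trailer end
print(lost)     # 4 x (32 + 28) bp
```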
Phylogenetic trees of cas1, csy1, and csy4 identified seven distinct pathways of evolution for the CRISPR-cas subtype I-Fb locus in A. baumannii (Fig. 3A, B, and C). Pathways 1, 4, 5, 6, and 7 branched from the Ab-1 lineage whereas pathways 2 and 3 descended from the Ab-107 lineage. Pathway 1 ended with a cluster including all the alleles of the locus present in isolates from CC1. Similarly, pathway 2 created a cluster of all the alleles of isolates from CC25, except for strain 4190. Pathway 3 was shared by the alleles of isolates AB_1650-8 (ST113), NIPH615 (ST12), ab233846 (ST126), and TG23791 (ST427), whereas the two alleles of isolates 4190 (CC25) and ab1106579 (ST505) evolved following pathway 4. The alleles of isolates NIPH201 (ST38), ab299505 (ST508), and ab532279 (ST519) showed individual pathways of evolution. However, the latter three alleles had mosaic sequences, suggesting that segments had been introduced by recombination events with the locus of other strains. Due to the internal deletion, the pathway of evolution could not be determined for the locus in Naval-82 (ST428). Comparing the topology of the isolates in these phylogenetic trees with the one based on the concatenated MLST sequences showed a congruent positioning of the isolates of pathways 1 and 2, suggesting a vertical spread of the locus in these two clusters (Fig. 3D). On the other hand, the incongruent positioning of the isolates of pathways 3 and 4 suggested the occurrence of recent horizontal acquisitions of the whole locus between isolates of each cluster [10].
Fig. 1 (continued). C) Comparative analysis of the genetic surroundings. The comparison was performed between the locus-positive strain AYE, which belonged to sequence type 1 (ST1), and the locus-negative strains TYTH-1, AB4857, and OIFC099, which belonged to ST2, ST3, and ST32, respectively. Homologous sequences shared by all the isolates were indicated by gray zones. doi:10.1371/journal.pone.0118205.g001

CRISPR-based subtyping
Different assortments of the spacers divided the isolates into 40 CSTs (Fig. 4 and S3 Table). Isolates from CC1 (n = 36) belonged to 13 CSTs, with some CSTs being different from each other only by a duplication or deletion of 1 spacer. CST1 included 18 isolates recovered between 2004 and 2013 from USA (n = 11), Iraq (n = 3), Canada (n = 2), Germany (n = 1), and Sweden (n = 1). Tracking the epidemiological data showed that eight of the isolates were obtained during the military operations in Iraq and Afghanistan [33,34]. CST1 could be an Iraq-endemic sub-clone of CC1 that was able to spread to USA, Canada, and Europe. A previous study comparing the DNA profiles of A. baumannii isolates from USA and the United Kingdom that were associated with casualties returning from the Iraq conflict has also demonstrated the import of at least one strain responsible for outbreaks of infections in the two countries [35]. Adaptation of CST1 to the pool of phages and plasmids present in a particular geographical site resulted in the acquisition of specific spacers which might be used as a genomic signature of this sub-clone and a biological marker of this particular geographic ecosystem [10,36]. On the other hand, CST2 included 3 isolates obtained from Czech Republic in 1994 and USA in 2009 and 2010. However, the Czech isolate was not reported to be epidemiologically linked with the two American isolates [37].
CST8 included two isolates obtained in 2011 and 2013 by two different diagnostic laboratories in Norway. Since no other epidemiological linkage was detected, inter-hospital transfer of patients or medical staff could be responsible for the spread of CST8. However, the long gap between the two dates of isolation argued against the occurrence of an outbreak. CST12 was different from CST13 only by having a duplication of spacer Ab-49. Five isolates belonged to CST12-13. Three of these isolates, obtained from Germany, Norway, and USA between 2003 and 2011, shared a history of import from Iraq (Table 1). The other two isolates were obtained from different laboratories in Norway in 2009 but were both imported from India [23]. Interestingly, none of these isolates belonged to ST1, the founder of CC1. Instead, the isolates mainly belonged to ST94, indicating a considerable sub-clonal demarcation of ST94 that persisted even though the isolates originated from two geographically disconnected countries. Three isolates, obtained from patients in one military hospital in France in 2009, were found to carry arrays of spacers with a high similarity to each other [21]. The three isolates were linked to CST12-13 according to our subtyping scheme. However, the comparison was incomplete since only the leader end of the CRISPR arrays was amplified and partially sequenced in the French isolates. Of note, the three French isolates were recovered from skin samples while our isolates came from various sources [21].
The CC25 isolates (n = 30) were divided into 19 CSTs (Table 1 and Fig. 4). Three isolates, obtained from Sweden in 2012 and Norway in 2012 and 2013, belonged to CST19. Interestingly, these isolates were recovered from patients previously hospitalized in Thailand. The molecular similarity between the isolates suggested the existence of a Thai strain and supported the history of import of these isolates, pointing once more to the necessity of having a screening program for patients after hospitalization abroad [23]. CST23 was identical to CST24 apart from having a duplication of three spacers. CST23-24 included 6 isolates obtained from Sweden and 3 isolates obtained from Argentina, UAE, and USA. The first Swedish isolate was collected in Blekinge in April 2012 while the other 5 isolates came from 5 different patients hospitalized at the same medical center in Östergötland in August 2013. The Swedish isolates, particularly the latter five, could represent a single strain responsible for a small-sized outbreak taking place in Östergötland. The Argentinean strain represents 7 isolates sharing the same pulsed-field gel electrophoresis (PFGE) pattern [38]. These isolates were recovered between 2009 and 2012, during an endemic setting in three hospitals located in two different cities in Argentina. Similarly, the UAE and USA strains are representatives of two groups of isolates sharing the same PFGE patterns [34,39]. The ability of CST23-24 to be endemic and to cause outbreaks of infections highlights the need to precisely distinguish such highly successful sub-clones and to apply stricter infection control procedures against them. Isolates from singleton STs (n = 8) belonged to 8 distinct CSTs. These isolates carried >80% of the unique isolate-specific spacers.
The broad polymorphism detected among the spacers reflects the complexity and extensive diversity of phage and plasmid populations, facilitating the occurrence of several events of independent interactions and leading to separated pathways of evolution of the CRISPR arrays.

Conclusion
Vertical transmission of the CRISPR-cas subtype I-Fb locus in our global collection of A. baumannii clinical isolates took place following two main lineages and several pathways of descent from a common ancestor. Occasional events of horizontal transfer have increased the diversification and facilitated further dissemination of the locus. Using the CRISPR-based subtyping approach, we were able to detect a sub-clone of A. baumannii CC1, probably originating in Iraq and spreading internationally to the USA, Canada, and Europe. The study also detected a sub-clone of A. baumannii CC25 with a remarkable ability to cause outbreaks of infections. The unambiguous data generated by this approach can readily be exchanged in silico, used by other groups, and expanded by forthcoming projects. Overall, CRISPR-based subtyping supplements MLST and can be used to track the source and dissemination routes of particular strains.
Supporting Information
S1 Fig. Genetic structure of a prophage carrying 16 proto-spacers and representing an invader most frequently interacting with the CRISPR-Cas subtype I-Fb machinery of Acinetobacter baumannii. The prophage, 42778-bp long, was located on the genome of A. baumannii NIPH 527 (APQW01000004: 179581-222358). The prophage was shown as a white box on the graph and described as a "mobile_element" in the feature list. Genes and open reading frames were shown as blue arrows, with the arrowheads indicating the direction of transcription. Proto-spacers were presented as labeled green arrows, with the arrowheads indicating the direction of their integration as spacers in the CRISPR arrays. The prophage was surrounded by two identical 20-bp repeat regions, for which the sequences were indicated on the graph. The map was created using Artemis (http://www.sanger.ac.uk/resources/software/artemis/).