CRISPR Diversity in E. coli Isolates from Australian Animals, Humans and Environmental Waters

Seventy four SNP genotypes and 54 E. coli genomes from kangaroo, Tasmanian devil, reptile, cattle, dog, horse, duck, bird, fish, rodent, human and environmental water sources were screened for the presence of the CRISPR 2.1 loci flanked by cas2 and iap genes. CRISPR 2.1 regions were found in 49% of the strains analysed. The majority of human E. coli isolates lacked the CRISPR 2.1 locus. We described 76 CRISPR 2.1 positive isolates originating from Australian animals and humans, which contained a total of 764 spacer sequences. CRISPR arrays demonstrated a long history of phage attacks especially in isolates from birds (up to 40 spacers). The most prevalent spacer (1.6%) was an ancient spacer found mainly in human, horse, duck, rodent, reptile and environmental water sources. The sequence of this spacer matched the intestinal P7 phage and the pO111 plasmid of E. coli.


Introduction
Current water quality research is predominantly focused on identifying sources of faecal contamination in environmental water. However, despite many advances, utilizing many different technologies, tracing faecal water contamination remains problematic. In this work we investigated diversity of CRISPR in isolates from animals, humans and environmental waters and the potential of using outcome of that studies for microbial source tracking. The CRISPR system is a recently discovered immune-like defence system of bacteria and archaea against phages and plasmids [1,2]. Mechanisms of this defence system have been well studied, however CRISPR adaptation remains poorly understood [3,4]. The system is based on the retention of specific sequences (proto-spacers) from mobile genetic elements during the first infection/integration. These are harboured (as spacers) within so-called CRISPR loci [5]. These spacers are transcribed with handles comprised of the repeats into short interfering CRISPR RNA molecules (crRNA), and are subsequently used to interfere (commonly referred to as 'silencing') with the known, or recognised, foreign DNA or RNA to cleave foreign nucleic acid in order to protect the cell [6]. Intriguingly, whilst about 96% of archaea contain CRISPR genes, only about 45% a diverse number of E. coli isolates from Australian animals (indigenous and introduced), water samples (lake and rivers), commensal (faeces), and clinical (urine and blood) human samples. The analysis was also applied to in silico CRISPR sequences from Australian carnivorous marsupials, fish, Tasmanian devil, environmental water and human faeces (clinical and commensal) in an effort to assess, and potentially extend, the methods utility for such analyses.

Materials and Methods
E. coli isolate collection E. coli isolates (N = 185), from a range of sources (animals, water and human) [24] and previously characterized using SNP analysis [24], were selected for CRISPR screening (see Table 1). Cultures were incubated overnight at 37°C in 5 mL nutrient broth (Oxoid, UK) followed by DNA extraction.

DNA extraction
Overnight broth culture of 500 μL was centrifuged at 10 000 × g for 1 min. Cell pellets were resuspended in 180 μL DNase/RNase-free water and used for DNA extraction on the Corbett X-tractorGene automated DNA extraction system (Corbett Robotics, Australia). Phenol extraction for sequencing was performed manually according to the "OpenWetWare protocol" [26]. Briefly, cells were re-suspended in a TE buffer pH 8.0, incubated with RNase A (25mg mL −1 ) for 30 min at 65°C and then with proteinase K (25 mg mL −1 ) for 15 minutes at 37°C. After double extraction by phenol and chloroform, DNA was precipitated by ice-cold ethanol, and with 3M sodium acetate overnight with its final pellet washed with 70% ethanol and stored in DNase/RNase free water at −20°C until further use. The quantity and purity of DNA extracts were determined using a DU 730 spectrophotometer (Beckman Coulter, USA).
Screening of E. coli strain collection for CRISPR 2.1 To screen isolates for the CRISPR 2.1 locus, primers 5 0 -TGGTGAAGGAGTTGGCGAAGG-3 0 and 5 0 -AAAATGTCCCTCCGCGCTTACG-3 0 , annealing iap and cas2 respectively [13] were used with TaqPol (Roche, Australia) in a modified touch-down PCR reaction with the following conditions: denaturation at 95°C for 5 min, then 15 cycles, 95°C for 15 sec, 68°C for 1 min with decreasing temperature for each cycle, 72°C for 2 min; finally 20 cycles 95°C for 15 sec, 60°C for 1 min, 72°C for 2 min; final extension 72°C for 10 min. The amplicons were visualised using a 1% agarose gel, run at 100V for 30 min and stained using SYBR-Safe (Invitrogen, USA), and captured using the Gel-doc system (Bio-Rad, USA). CRISPR 2.1 presence was determined by the presence of appropriately sized bands (500-2000 bp).

Cloning procedure and sequencing
Forty five isolates positive for CRISPR 2.1 loci were amplified using High Fidelity DNA polymerase (Bio-Rad, USA). The bands were excised from the gel, purified with QIAquick Kit (QIAGEN, USA) and ligated into a pGEM-3Z plasmid vector as described previously [27]. The plasmid DNA samples were submitted for sequencing to the Australian Genomic Research Facility, Brisbane. The sequences were annotated using ContigExpress, VectorNTI v.10 software (Invitrogen, USA) and were deposited in GenBank with accession numbers KF707494-KF707538.

CRISPR analysis
CRISPR sets from sequences of CRISPR 2.1 locus were identified using the CRISPRFinder web database [7]. In addition, CRISPR loci from 54 E. coli genomes sourced from the Broad Institute (http://www.broadinstitute.org/annotation/genome/escherichia_antibiotic_resistance/ GenomesIndex.htmL) were analysed in silico and 30 positive isolates carrying CRISPR loci were combined with our library.
Direct screening of water samples and E. coli DNA for the known mobile genetic elements acquired from sequenced CRISPR arrays Primers for the P7 phage were designed using Vector NTI (Invitrogen, USA) to target the corresponding spacer revealed from the E. coli CRISPR sequence analysis. A two litres water sample from the Brisbane River was filtered through a 0.33 μm filter. Genomic DNA (20 ng) from E. coli isolates were added as template for touch-down PCR reactions targeting 444 bp long amplicon of P7 bacteriophage by primer pairs (F: 5 0 -TCAAAATCCCCTGTTATCGT-3 0 and R: 5 0 -TATTGTCTGAATGGTGGGGC-3 0 ) with the following conditions: denaturation at 95°C for 5 min, then 15 cycles, 95°C for 15 sec, 65°C for 1 min with decreasing temperature for each cycle, 72°C for 2 min; finally 20 cycles 95°C for 15 sec, 60°C for 1 min, 72°C for 2 min; final extension 72°C for 10 min. Positive PCR products were excised from the gel and sequenced as previously described.

Results
A set of 185 Australian E. coli isolates from different sources (animals, water and human) [24] was examined for the presence of CRISPR2.1 loci. Half of these were positive for the CRISPR 2.1 region (Table 1). Using the previously determined SNP profiles [24], 50 isolates were selected for sequencing from human and animal specific sources, and also mixed sources, which were positive for CRISPR 2.1. Isolates with identical SNP profiles and originated from the same plate and have similar/identical bands on the CRISPR 2.1 gel were excluded from current studies to minimize cloning and sequencing costs. The sequences of these selected isolates and other Australian isolates downloaded from the Broad Institute website were analysed using CRISPRFinder, to detect direct repeats and spacers in these genomes. In order to sequence the CRISPR 2.1 regions, PCR products were cloned into pGEM-3Z vectors. Subsequently, the plasmid libraries were sequenced.
The spacers from all isolates tested and the aligned in silico strains, based on our library spacers' sequences, are shown in Fig 1, together with their relevant SNP profiles. Each unique spacer array was assigned an allele number, and the primary source and SNP profiles were also included.
All spacer sequences were interrogated using the BLASTN algorithm in GenBank. This allowed for the extraction of specific proto-spacers which have been annotated thus far as mobile elements (plasmids, phages), in order to identify the influence of invading elements on E. coli diversity. The results of this BLASTN analysis, showing plasmids and phages, can be found in Table 2. This includes phage P7, E. coli plasmid pO111, pO157, amongst others.

CRISPR diversity in human sourced E. coli isolates
Analysis of our mixed-source E. coli library revealed that CRISPR2.1 was present in some isolates from all the animals tested, however, the isolates from human sources tended to lack CRISPR 2.1 in general. About 75% of human isolates lack CRISPR 2.1 (53/71), in contrast to 30% (12/17) of in silico analysis of published Australian E. coli genomes. However, at least half of the in silico human-originating E. coli that lacked CRISPR 2.1 loci or where spacers were missing, were from the B2 phylogroup, which correlates with previously published data [21,28] and low CRISPR presence in uropathogenic E. coli [29]. Indeed, the absence of CRISPR genes could be viewed as an indication of the presence of clinical and potentially pathogenic E. coli in water, as we found that the majority of our isolates from urine, blood and fecal clinical specimens also lacked CRISPR 2.1. Even if some strains could harbour CRISPR 2.1, the absence of the cas2 gene may further support dysfunction of the CRISPR defence mechanism in clinical E. coli [30].

CRISPR typing further resolves E. coli isolates with the same SNP and MLST profiles
We did not observe any consistent relationship between SNP profiles of our isolates or the Sequence Type (ST) of in silico strains and CRISPR alleles. Previously, evolutionary studies on CRISPR diversity of Sulfolobus islandicus demonstrated independent spacers' acquisition in regards to genotype mutations in house-keeping genes [31]. This is due to the rapid CRISPR array evolution allowing different genotypes to acquire the same resistance to a common pool of viruses and plasmids [31], in contrast, house-keeping genes are more conserved.
Certain isolates with the same SNP profile were found to have different CRISPR alleles. For instance, SNP profile 23 comprises isolate profiles of kangaroo (k8); cattle (c67) and a water sample (5A2). These three isolates had completely different CRISPR alleles (CA) confirming that the three isolates were indeed from different sources, despite their common SNP profile. Since one of the aims of this project was to further discriminate between E. coli isolates with common SNP profiles, and in this instance, we have proven this is true, however it is obvious that further work is required to discern whether more mixed-source SNP profiles may be discriminated using this approach. Another type of relationship was observed in ST10, a genotype common to three human E. coli isolates (H386, H454, H617), and one from water (E1002). These isolates had similar but not identical spacer sequences. However, the pattern was not followed with human isolate H383 (also ST10), which has a different CRISPR allele compared to the other ST10 isolates. Therefore, the use of CRISPR diversity may allow distinction between isolates from different host origins, which were previously combined in one ST and consequently in one SNP profile [24].
Identical E. coli CRISPR arrays indicate that isolates could be from the same geographical location In contrast, other isolates had different SNP profiles but the same CRISPR allele pattern. For instance, E. coli isolate c69 originating from cattle-(SNP profile 2) had the same CRISPR allele (CA 40-42) as cattle isolate c72 (SNP profile 14) and kangaroo-originating k297 (SNP profile 4). The only identical spacers to be found in E. coli were from these three animal faecal samples originating from one farm site. As the likelihood of identical CRISPR 2.1 alleles in unrelated organisms is very low, we postulated that it was likely that these three isolates were identical because they were from the same geographical location. Since CRISPR spacer arrays, which are undergoing rapid horizontal gene transfer events, can change rapidly within a few generations of E. coli growth in a host, it is very probable that only those hosts sharing a food source, and in close physical proximity to each other, will have identical CRISPR spacer arrays. However, recent studies reported the conservative character of CRISPR alleles [28,32]. According to these studies, E. coli strains which diverged about 250 000 years ago have identical CRISPR arrays compared to modern strains, indicating a surprisingly low level of diversity. Nevertheless, our CRISPR library shows a high diversity of spacers from a wide range of hosts and uniquely similar arrays only within those E. coli isolates which were sampled from one geographical location and at one time.
Identical alleles were also observed from in silico animal and human E. coli sequences: isolate R424 (ST34) from a common suburban garden skink and human E. coli H383 (ST10) (Fig  1). Another example of matching CRISPR alleles was observed in the group CA6-9, which combined human, reptile and medium-sized carnivorous marsupial E. coli isolates, as well as isolates from water. While the host source of these isolates is known, the geographical area in which the hosts were contained was not supplied, so it was not possible to verify the explanation tendered above.

Proto-spacers identified in Australian E. coli isolates
This study established that the enterobacterial P7 phage proto-spacer in E. coli was found in Australian animal and human isolates and also in isolates from environmental water sources. Mostly this proto-spacer was found in association with plasmid pO111_2, normally found in E. coli O111 EHEC (Table 2).
Previously, proto-spacers of P1 phage and F plasmid have been found in E. coli genotypes ECOR44 and 47 and ECOR 42 and 49 respectively [13]. This led to further analysis of our data in order to find the link between DNA phages/plasmids and proto-spacers in Australian E. coli isolates.
Of particular interest was the large number of isolates that had the proto-spacers P7 and pO111_2 [33]. This showed that these plasmids carry virulence genes which change non-virulent O157 E. coli into pathogenic O157 ETEC strains. Taking into account the large proportion of human isolates with spacers against P7/pO111, it can be seen that similar past attack events seemed to have occurred in many of the isolates analyzed here. The observed insertion of the proto-spacer pO111 into the genome of past isolates may indicate the frequent interaction with genetic mobile elements.
Such strains either become resistant to the conjugative plasmid pO111 or become repressed by CRISPR immunity, if this plasmid is acquired. Resistance to plasmids was recently reported in Staphylococcus aureus livestock ST398-MRSA-V strains, which explains why such strains have less antimicrobial drug-resistant genes and phage-encoded virulence factors compared to other MRSA strains [34].

Proto-spacers and CRISPR immunity
Extensive studies into the molecular mechanism of CRISPR immunity have shown that insertion of proto-spacers into the CRISPR array occurs in the next position after the first direct repeat, which locates them in close proximity to the leader sequence. Thus, spacers are stored strictly in chronological order with the oldest at the end (spacer 1, see Fig 1). The most intriguing matches were shown for spacer 270 of the duck isolate du151. In silico predictions revealed a potential linkage between this spacer and plasmids carrying virulence genes ( Table 2). Interestingly, spacer 270 was not only predicted to target E. coli plasmids, but also plasmids of other enterobacteria such as Shigella and Salmonella. For instance, spacer 270 targets plasmid p666, which confers ETEC pathogenicity to E. coli H10407 [35]. Another plasmid pO83 (150 kB in length) [36], carries adhesive factors, which allow E. coli to become adherent and invasive E. coli (AIEC), which is associated with Crohn's Disease. This plasmid has >85% identity with two other plasmids, pAPEC-01-ColBM and avian pathogenic E. coli (APEC), and also with the plasmid pCVM29188_146 of Salmonella enterica serovar Kentucky [37]. These plasmids share the common function of colicin M and D production. The same proto-spacer sequences were present in the large plasmid pEC_24 with 73.8Kb which was previously identified from a number of clinical isolates [38]. Interestingly, as well as carrying multiple-antibiotic resistance genes, this plasmid also harbours genes responsible for colicin production that Smet and co-authors had found never to have been reported before for such IncFII class [38]. E. coli uses these colicin toxins to compete with other strains of the same species, as acquisition of these plasmids can give an advantage to the host by killing the strains which do not produce colicins [38]. This may lead to the explanation of why the spacer was not commonly identified in the E. coli population. Particular duck E. coli isolate 240, which has a distinctive allele and could have an ecological niche. This isolate probably does not require colicin production due to lack of interaction with E. coli strains in the gut. Further detailed studies are needed to prove this assumption.
As noted earlier, the CRISPR system targets the most vital genes coding for key proteins in the replication process of the mobile elements' replication, or the conjugation process [13,39]. Using in silico predictions, we found that du151 spacer 270 targets: OriT nicking and unwinding protein, a type IV secretion-like conjugative transfer system pilin acetylase TraX of the Shigella species; a relaxase protein TraI of the same family in Salmonella typhi plasmid, and conjugal transfer nickase/helicase TraI in the E. coli F plasmid. Indeed, CRISPR interference with plasmids was initially discovered on plasmid nickase genes of Staphylococcus epidermidis [40] which is vital for self-replication. Thus, current results are further evidence of the universal nature of the CRISPR resistance mechanism in E. coli.
Interestingly, high numbers of old acquired spacers matched sequences that were flanked by aminopeptidases and some non-annotated conservative protein genes, from CRISPR sites of E. coli serotypes O111:H-, O113:H2 and O26:H11. Isolates from different sources had such spacers, which were, however, different from known mobile elements. This finding poses a question about the existence of such mobile elements that might not yet have been isolated. Alternatively, it may be that such elements have disappeared or have become degenerated, and can now be found only as a part of the bacterial genome. Such spacers were identified in isolates from horses and water samples only, so they are poorly studied. These isolates, however, are identical to CRISPR arrays of pathogenic strains of E. coli in GenBank, suggesting that they are likely to have shared the same pool of mobile elements in the past. Future studies on host-virus interactions could reveal such conjugative plasmids and phages that remain unknown at present.

Conclusion
It was found in this study that CRISPR array analysis alone was not effective as a source tracking tool for E. coli, which led us to intensify the search for unknown phage sequences. The high diversity in E. coli CRISPR can be an advantage when the level of specificity is required to be high (for instance, in the case of proof of site identity). Such a tool may involve the combination of SNP genotyping and CRISPR allele identification based on a high resolution melt approach. The fact that isolates from two sources shared an allelic profile out of 50 isolates analyzed, led us to apply the preliminary results using a combination of methods for microbial source tracking. CRISPR spacers, harbouring sequences of phage and plasmids, may prove useful in investigating host-specificity of these invading elements. Analysis of E. coli CRISPR patterns showed a lack of host-specificity for all isolates sequenced in this study.