Short-Term Genome Stability of Serial Clostridium difficile Ribotype 027 Isolates in an Experimental Gut Model and Recurrent Human Disease

Background Clostridium difficile whole genome sequencing has the potential to identify related isolates, even among otherwise indistinguishable strains, but interpretation depends on understanding genomic variation within isolates and individuals. Methods Serial isolates from two scenarios were whole genome sequenced. Firstly, 62 isolates from 29 timepoints from three in vitro gut models, inoculated with a NAP1/027 strain. Secondly, 122 isolates from 44 patients (2–8 samples/patient) with mostly recurrent/on-going symptomatic NAP-1/027 C. difficile infection. Reference-based mapping was used to identify single nucleotide variants (SNVs). Results Across three gut model inductions, two with antibiotic treatment, total 137 days, only two new SNVs became established. Pre-existing minority SNVs became dominant in two models. Several SNVs were detected, only present in the minority of colonies at one/two timepoints. The median (inter-quartile range) [range] time between patients’ first and last samples was 60 (29.5–118.5) [0–561] days. Within-patient C. difficile evolution was 0.45 SNVs/called genome/year (95%CI 0.00–1.28) and within-host diversity was 0.28 SNVs/called genome (0.05–0.53). 26/28 gut model and patient SNVs were non-synonymous, affecting a range of gene targets. Conclusions The consistency of whole genome sequencing data from gut model C. difficile isolates, and the high stability of genomic sequences in isolates from patients, supports the use of whole genome sequencing in detailed transmission investigations.


Introduction
Multiple typing and fingerprinting schemes have been described for the common healthcare-associated pathogen Clostridium difficile. These have variable discriminatory power and practicality: particularly regarding utility by non-specialist laboratories, availability of reference databases and cost. [1] We have recently successfully applied sequence-based typing (multi-locus sequence typing, MLST) [2][3][4] to the investigation of C. difficile transmission. [5] This approach showed that the majority of 1282 C. difficile infection (CDI) cases over 2.5 years could not be explained by recent hospital ward contact with other symptomatic cases. However, the dominance of epidemic C. difficile types in many scenarios, for example ribotype 027 (NAP-1), [6,7] emphasises the need for typing methods with high discriminatory power that can be applied in multiple settings.
The increasing availability and affordability of next-generation whole genome sequencing (WGS) technology offers much greater genetic resolution that has hitherto been available. [8][9][10] However, increased discriminatory power may be undermined by genetic instability. There is therefore a key need to understand the stability of the C. difficile genome over time (ie its microevolution) in order to maximise the potential value of WGS, since plausibility of transmission between closely related isolates can only be interpreted in the context of within-patient diversity at single points in time, and over time. Ideally, such studies should use data obtained from both controlled experimental and in vivo settings to test the reproducibility of observations. Crucially, such combined approaches offer the potential to examine the effects of interventions/exposures. We have therefore investigated C. difficile genome stability from successive samples recovered from a gut model that simulates CDI and its treatment, [11,12] and from serial isolates recovered from patients with on-going or recurrent CDI.

Materials and Methods
We collected C. difficile isolates from each of the following two distinct sources. Firstly, serial samples were taken for whole genome sequencing from 3 human gut CDI model experiments conducted to investigate specific research questions. [12][13][14] One model simulated CDI resulting from co-amoxiclav exposure (without CDI treatment), and the other two, CDI following clindamycin exposure, then subsequently treated either with vancomycin for 4 days or cadazolid (a novel oxazolidinone/ quinolone hybrid [15]) for 7 days. The gut model used is described in full elsewhere, [12][13][14] however, briefly, it consists of three vessels in a weir cascade system, held under anaerobic conditions, and designed to reflect the increasing alkalinity (pH 5.5-6.8) and reduced substrate availability of the proximal to distal colon. In each experiment the system was inoculated with a pooled faecal slurry of C. difficile culture-negative faeces that were obtained from five healthy (C. difficile negative) elderly ($60y) volunteers. The model was top-fed with a growth medium at a controlled rate. The following time periods key time periods were observed ( Figure 1): Period A, steady state, when the model was left to equilibrate for at least two weeks prior to interventions; Period B, C. difficile spore inoculation, when the model was instilled with ,10 7 cfu C. difficile ribotype 027 spores and monitored for seven days; Period C, C. difficile spore inoculation and CDI initiation, when the model was instilled with a further ,10 7 cfu C. difficile ribotype 027 spores and CDI-provoking antibiotics (co-amoxiclav 8 mg/L, TID, 7d; or clindamycin, 33.9 mg/L, QID, 7d); Period D, CDI, i.e. C. difficile cytotoxin induction, following provocative antibiotic treatment; Period E, In two of the three gut models, CDI treatment (vancomycin 125 mg/L, QID 4d, or cadazolid 250 mg/L BID 7d [14,15]) following high level cytotoxin production ($4 relative units) for $2 days; and finally period F, a rest period when no further interventions took place. C. difficile isolates were obtained from the gut model at the timepoints indicated in Figures 2, 3, and 4, chosen to represent the different stages of C. difficile evolution within the gut and the variation in antibiotic selection pressure. Following culture on Columbia blood agar (ANO 2 , 24 h, 37uC). DNA was extracted from sub-culture of a representative sweep taken across the primary culture plate, except for 3 occasions where 12 individual colonies were sub-cultured and DNA extracted from the resulting growth to assess for within-'host' diversity (co-amoxiclav model, final timepoint and clindamycin/ vancomycin model immediately prior to CDI treatment and at the final timepoint).
Secondly, we analysed whole genome sequences from C. difficile cultured from 44 patients sampled more than once during ongoing or recurrent CDI due to C. difficile ribotype 027 (multi-locus sequence type 1, ST1). These CDI cases were identified from sequential unformed faecal samples that had been submitted to the Oxford University Hospitals NHS Trust Microbiology laboratory between September 2006 and March 2011 for routine investigation of microbial causes of diarrhoea. The cases build upon an existing collection of whole genome sequences from serially sampled patients which included 28 ribotype 027 infections which were previously analysed together with 63 other samples. [16] C. difficile toxin enzyme immunoassay (Meridian Biosciences, Cincinnati, OH) positive samples were cultured following an alcoholshock on modified Brazier's cycloserine-cefoxitin-egg yolk agar. [2] Culture positive isolates underwent MLST. [2] DNA was extracted from sub-culture of a single colony from each positive clinical sample.
DNA samples were prepared for sequencing using standard Illumina (San Diego, California, USA) and adapted protocols. Pools of 96 samples were sequenced at the Wellcome Trust Centre for Human Genetics, Oxford, UK, using sequencing-by-synthesis technology, [17] on the Illumina Genome Analyzer II (GAII), GAIIx, and HiSeq platforms, generating 50-108 base pair reads. Sequence reads were mapped using Stampy v1.0.11 (without Burrows-Wheeler Aligner pre-mapping, using an expected substitution rate of 0.01) [18] to a ribotype 027 C. difficile reference genome, CD196 (Genbank: FN538970) [19], to produce BAM files used in subsequent base-calling. Single nucleotide variants (SNVs) were identified across all mapped non-repetitive sites using SAMtools (version 0.1.12-10) [20] mpileup with the extended base-alignment quality flag, after parameter tuning based on bacterial sequences (options '-E -M0 -Q30 -q30 -m2 -D -S'). To identify repetitive regions, BLAST [21] searches of the CD196 reference genome were made using fragments of the same genome. As all strains analysed in this study were the same lineage as the reference strain (ribotype 027), we did not mask mobile elements. We only used SNVs that were supported by $5 reads, including one in each direction. A consensus of $75% was also required to support a SNV, and calls had to be homozygous under a diploid model.
As a quality control, five isolates had DNA extracted twice, and were sequenced twice, producing indistinguishable final sequences. The within-host diversity and rate of C. difficile evolution was estimated by maximum likelihood from first and last isolates from serially sampled patients under a coalescent model [22] assuming a Poisson distribution for the accumulation of mutations (see Text S1). Confidence intervals were obtained by parametric bootstrap. dN/dS ratios were calculated using the Nei and Gojobori method. [23] Genomes were sampled with replacement 1000 times to generate bootstrap confidence intervals.
Ethical approval for use of patient samples without individual consent was provided by the Oxford Research Ethics Committee (10/H0505/83) and the National Information Governance Board (8-05(e)/2010).

Genomic Stability of C. difficile Gut Model Isolates
A total of 62 C. difficile isolates were obtained and successfully sequenced from 29 samples collected from 3 gut models: 18 isolates from the co-amoxiclav induction model (7 timepoints over 35 days, 12 isolates at last timepoint, figure 2), 33 isolates from the clindamycin induction, vancomycin treatment model (11 timepoints over 70 days, 12 isolates at two timepoints, just prior to, and following vancomycin treatment, figure 3), and 11 isolates from the clindamycin/cadazolid model (11 timepoints over 32 days, figure 4). A mean 90.3% of the CD196 reference genome was called, with a mean read depth of 62.3.
A total of 10 variable sites were identified across all 3 gut models (figures 2, 3, and 4). Both the co-amoxiclav and clindamycinvancomycin models were inoculated with the same C. difficile ribotype 027 isolate. Sequences obtained from whole culture plate sweeps at the first 2 timepoints in these models show evidence of heterozygousity at 3 of the variant sites in both experiments (sites 1592378, 2916358, 3391317) suggesting that a mixture of C. difficile with and without these SNVs was present in the initial inoculum. In the co-amoxiclav model one of these SNVs (2916358) persisted throughout and was present in 7 of the 12 colonies sequenced at the final timepoint. In contrast the proportion of reads containing the other 2 SNVs diminished during the population expansion associated with CDI cytotoxin induction (phase D), and were present in 0/12 and 1/12 colonies at the end of the model. In the clindamycin/vancomycin model, 2 of the pre-existing SNVs became established (sites 1592378, 2916358), as well as a new SNV which arose during induction and persisted during treatment (2206108). In both models, a number of minority variants emerged, evidence of which could be seen in multiple picks taken following CDI cytotoxin induction (coamoxiclav model, sites 9921 and 10179; clindamycin/vancomycin model, site 977041) and CDI treatment (clindamycin/vancomycin model, 2380323, 3493190) (figures 2b, 3b). In the final model, clindamycin/cadazolid, a single SNV was observed at the final 2 timepoints (2914411) following treatment.
Overall, over a total 137 days of follow up, including three CDI inductions and two CDI treatments, only two new SNVs emerged that became established in the majority of sequences. Pre-existing SNVs present in the minority of the within-model inoculum population became dominant twice in one model, and once in another. Five additional SNVs were detected that were only present in the minority of colonies at one or two timepoints.

Genomic Stability of C. difficile Patient Isolates
A total of 122 isolates were obtained from 122 faecal samples from 44 patients with mostly recurrent, or on-going, CDI due to C. difficile ribotype 027/ST1 (23 patients with 2 samples, 12 patients with 3 samples, 8 patients with 4 samples and 1 patient with 8 samples). The median (IQR, range) time between first and last samples was 60 (29.5-118.5, 0-561) days. A single patient with 20 SNVs between first and last samples taken 60 days apart was excluded because the subsequent sample was considered more likely to represent a re-infection, as there was no evidence of a single recombination, no SNVs in DNA mismatch repair genes, and a sequence matching the later sample had previously been found in several other patients admitted to an adjacent ward at the same time. In the remaining 43 patients, successive C. difficile isolates from 28 patients had no SNVs. A further 3 patients had isolates with transient SNVs that were not detected at the final timepoint: 2 had a single SNV, and the other had 13 SNVs distributed throughout the genome after 93 days; in the latter case, however, the isolate recovered from a sample collected on the following day was identical to the initial isolate, suggesting a mixed infection in the transient 13 SNV isolate. The remaining patients had isolates with 1 SNV (n = 7), 2 SNVs (n = 4) or 3 SNVs (n = 1) (Figure 5a) at the final timepoint. All SNVs were distinct from those observed in the gut model. The overall rate of evolution estimated from first and last samples from these 43 serially sampled patients was 0.42 SNVs/year (95% CI 0.00-1.30) in the sites routinely called by sequencing (mean 90.0% of the CD196 reference genome) (Figure 5b). The within-patient diversity was 0.28 SNVs/called genome (95% CI 0.04-0.54). Excluding a single out-lying point at 561 days with 0 SNVs in a sensitivity analysis, the rate of evolution was 0.75 (95%CI 0.00-1.87)/called genome/ year and the within-patient diversity 0.16 SNVs/called genome (95%CI 0.00-0.46).

Functional Impact of Variants
The functional impact of SNVs arising in the gut model isolates and serial sample isolates from infected patients is shown in table 1. Nine of the 10 SNVs across the three gut models occurred within an annotated coding sequence, and all resulted in a nonsynonymous change, including two causing a premature stop codon. Similarly 19 of the 21 SNVs from patient samples occurred in coding sequence, with only 2 resulting in synonymous changes, and the remainder non-synonymous, including 4 premature stop codons. No patients developed the same SNV independently. There was only one occurrence of the same gene being affected by two mutations, both occurring in a RNA polymerase sigma-B factor gene in the co-amoxiclav induction gut model. Transcription and transport/metabolism pathways were affected by nonsynonymous SNVs in both clinical and gut model isolates.

Discussion
We have demonstrated using isolates obtained from two complimentary sources, recurrent CDI in patients and a predictive experimental gut model of CDI, that there is a high degree of C. difficile genome stability as measured by WGS. The two investigated scenarios represented opportunities for C. difficile genome mutation occurring over periods of one to several months. The studied settings represent relatively controlled and uncon- trolled stresses that could be expected to increase the chance of SNV occurrence. Both yielded isolates from CDI symptomatic/ toxin production periods and after treatment.
Both the gut model and clinical patient isolates showed evidence of within host variation at the initial sample, either directly in the case of the gut model, or from the evolutionary model in the clinical samples. This contributed significantly to SNV differences observed between baseline and subsequent samples. The 2 new SNVs observed over 137 days of gut model follow up are consistent with the estimates for C. difficile evolution of 0.42 (95% CI 0.00-1.30) SNVs/sampled genome/year obtained from serially sampled patients. The relatively broad confidence interval, encompassing zero, reflects the limited number of recurrent ribotype 027 infections in our dataset, and is a limitation of our study. When a single outlier with zero SNVs after 561 days was removed, the estimated rate of evolution increased modestly to 0.75 (95%CI 0.00-1.87) SNVs/called genome/year. This highlights the potential difficulties in applying a single clock rate to C. difficile, as the prolonged time with zero SNVs could represent time spent dormant in a spore state with a slower rate of evolution, which would lower the overall estimated rate of evolution. The estimates obtained are similar to previous estimates where data from ribotype 027 and non-ribotype 027 samples were combined. [16].
The presence of minority variants detected by sequencing subcultures of multiple colonies, whilst not unexpected, is a reminder that diversity may be detected between transmitted cases as a result of sampling a minority variant in either the donor or the recipient. Similarly, if only a single colony is sub-cultured, potential mixed infection may also be missed. However sequencing Figure 2. Co-amoxiclav induction gut model samples and variable site base counts. Panel A shows the total viable counts, spore counts and cytotoxin titres. Sampling points are shown as circles. Sampling points where 12 colonies were sub-cultured and sequenced separately are shown with an *. Panel B shows the base counts obtained at the variable sites identified between sequences. Each sample is represented in a different column. Samples taken at different timepoints are numbered sequentially. Where 12 colonies were sub-cultured at a given timepoint these are indicated by different letters following the sample number and yellow background shading. Replicate samples sequenced to assess the accuracy of the sequencing are shown with a', e.g. 3' and blue background shading. The proportion of high quality (base quality PHRED scaled score $30, mapping quality $30) base counts supporting the SNV identified is shown for each variable site as a separate row. Median (range) calls per site 49 . No site with .2 different base calls. doi:10.1371/journal.pone.0063540.g002 Figure 3. Clindamycin-induction, vancomycin-treatment gut model samples and variable site base counts. Panel A shows the total viable counts, spore counts and cytotoxin titres. Sampling points are shown as circles. Sampling points where 12 colonies were sub-cultured and sequenced separately are shown with an *. Panel B shows the base counts obtained at the variable sites identified between sequences. Each sample is represented in a different column. Samples taken at different timepoints are numbered sequentially. Where 12 colonies were sub-cultured at a given timepoint these are indicated by different letters following the sample number and yellow background shading (pre-treatment) and purple background shading (post-treatment). Replicate samples sequenced to assess the accuracy of the sequencing are shown with a', e.g. 6c' and blue background shading. The proportion of high quality (base quality $30, mapping quality $30) base counts supporting the SNV identified is shown for each variable site as a separate row. Median (range) calls per site 71  sweeps of primary culture plates using existing data pipelines may also miss minority variants if these are attributed to sequencing error, or result in a heterozygous base calls, which may be converted to a null call by sequencing quality filtering algorithms. Therefore, sequencing the full diversity present on primary culture plates together with bioinformatic approaches specifically designed to detect mixed infection may provide an optimal approach [24].
While relatively few mutations were observed, a striking proportion were non-synonymous. Twenty-six of 28 SNVs in coding regions were non-synonymous, with 6 resulting in premature truncation of a gene. This corresponds to a dN/dS ratio (ratio of non-synonymous vs. synonymous substitution rate, adjusted for the redundancy in the codons affected) of 5.3 (95% CI 1.6 to '), which contrasts with the dN/dS ratio reported in another whole genome study of recently diverged C. difficile lineages of very close to 1. [25] However the genomes in this previous study were separated by up to several years, rather than several weeks as was typical in our study. Although dN/dS ratios above 1 are suggestive of SNVs arising in response to selective pressure, such as might be experienced during antibiotic exposure, short-term ratios are typically higher owing to a lag in the removal of slightly deleterious mutations. [26] A number of transient SNVs were also observed, consistent with genetic drift. Whilst the classification of genes with SNVs is intriguing, with SNVs arising after antibiotic exposure in genes involved in transcription and transport/metabolism pathways, much larger formal association studies are required to understand their functional significance.
Another limitation of our study is that we do not have details of the specific interventions, particularly treatments, for individual CDI cases. However, during the study hospital policy was to treat initial CDI and first recurrence with oral vancomycin for 14 days; treatment was initiated when samples were sent for C. difficile testing and compliance was high (data not shown). Where patient's first and subsequent samples were more than 14 days apart (63 of the 78 subsequent samples obtained), we have assumed that these represent recurrences of CDI (in these cases, relapses with the same strains), or possibly failure to respond to initial treatment in some patients. Given the prolonged observation periods, it is highly likely that some of the strains we have examined were subjected to multiple exposures to antibiotics used to treat CDI and possibly other infections, and this could be explored in more detail in future studies. In common with the patients studied, CDI treatment with vancomycin was also simulated in one of the gut models. Whilst it would also be of interest to repeat the study in a population receiving metronidazole treatment (not used in Oxfordshire hospitals), and to simulate metronidazole treatment, the consistency of the results obtained to date suggest that this would be unlikely to materially alter our findings. The antibiotic used to treat simulated CDI in the final gut model is a hybrid molecule, which is currently in phase II clinical trials for the treatment of CDI. [14,15].
Recombination and hypermutation are two ways in which the modest rate of genome mutation we observed here could in theory accelerate. Hypermutation is not well studied in C. difficile. In this study, we found no variant sites arising within annotated DNA mismatch repair enzymes, responsible for hypermutation de-scribed in Staphylococcus aureus [27] and other pathogens. [28] There is clear evidence for recombination in the C. difficile genome over longer timescales, [3,16,25] although interestingly it is not seen as frequently in ribotype 027 isolates compared to some other lineages. [16] With relatively few variants and short time scales we did not find evidence of clustering of nearby variable sites suggestive of recombination in our study. However, the absence of C. difficile mixed infection in the gut model precludes within-species recombination.
Our data showing relatively stable genomes, and low but nonzero microevolution, in the face of interventions and indeed repeated episodes of infection and treatment provide reassurance that WGS can be used to (re-) examine the epidemiology of C. difficile and aetiology of CDI. For example, using genotyping (MLST) we recently unexpectedly found that about three-quarters of CDI cases could not be linked to other cases by ward-based contact, [5] raising the issue of other sources of C. difficile transmission not well addressed by current infection control policy and guidance. Whole genome sequence data offer the potential to study such C. difficile transmission in more detail. [16] To interpret WGS data, however, it is necessary to first establish the 'natural' rate of genome mutation. Without such microevolutionary information, use of such a highly discriminatory fingerprinting method risks dismissing CDI cases and sources that could in fact be related. [9] Speculation about the aetiology of community CDI cases and indeed the possible links between human and animal C. difficile strains emphasise the need for optimised fingerprinting studies. [29,30] A better understanding of other possible routes of transmission and reservoirs is needed to optimise control efforts.
In summary, the consistency of sequences from our gut model isolates demonstrates the reliability of our sequencing and analysis techniques. Crucially, the high stability of mapped genomic sequences in C. difficile isolates from patients supports their use in detailed transmission studies. Our findings also highlight the importance of detailed microevolutionary studies specific to key pathogens to maximise the future value of pathogen whole genome sequencing.

Data Sharing
The sequences reported in this paper have been deposited in the European Nucleotide Archive Sequence Read Archive under study accession number ERP002404: Short-term stability of C. difficile 027 isolates in a gut model and recurrent disease.

Supporting Information
Text S1 Description of coalescent-based model used to estimate within-host diversity and rate of C. difficile evolution. (DOCX)