The authors have declared that no competing interests exist.
Conceived and designed the experiments: NR BB LG TFK. Performed the experiments: NR BB. Analyzed the data: NR BB JDJ LG TFK. Contributed reagents/materials/analysis tools: NR LG DF MRS JDJ. Wrote the paper: NR JDJ TFK.
Populations of human cytomegalovirus (HCMV), a large DNA virus, are highly polymorphic in patient samples, which may allow for rapid evolution within human hosts. To understand HCMV evolution, longitudinally sampled genomic populations from the urine and plasma of 5 infants with symptomatic congenital HCMV infection were analyzed. Temporal and compartmental variability of viral populations were quantified using high throughput sequencing and population genetics approaches. HCMV populations were generally stable over time, with ∼88% of SNPs displaying similar frequencies. However, samples collected from plasma and urine of the same patient at the same time were highly differentiated with approximately 1700 consensus sequence SNPs (1.2% of the genome) identified between compartments. This inter-compartment differentiation was comparable to the differentiation observed in unrelated hosts. Models of demography (i.e., changes in population size and structure) and positive selection were evaluated to explain the observed patterns of variation. Evidence for strong bottlenecks (>90% reduction in viral population size) was consistent among all patients. From the timing of the bottlenecks, we conclude that fetal infection occurred between 13–18 weeks gestational age in patients analyzed, while colonization of the urine compartment followed roughly 2 months later. The timing of these bottlenecks is consistent with the clinical histories of congenital HCMV infections. We next inferred that positive selection plays a small but measurable role in viral evolution within a single compartment. However, positive selection appears to be a strong and pervasive driver of evolution associated with compartmentalization, affecting ≥34 of the 167 open reading frames (∼20%) of the genome. This work offers the most detailed map of HCMV
The large, dsDNA virus Human cytomegalovirus (HCMV) is the most genetically complex viral pathogen of humans. HCMV populations are highly variable, which may allow the virus to evolve in human hosts on short timescales. We tested this hypothesis by longitudinally sampling HCMV populations from the urine and/or plasma of congenitally infected infants. We found that HCMV is generally stable within a compartment, but rapidly evolves when crossing host compartments. In fact, HCMV sampled from two compartments of the same host is as different as HCMV collected from unrelated hosts. We used mathematical modeling and population genetic analysis to show that both a bottleneck (i.e., a reduction in population size) associated with compartment colonization as well as positive selection are necessary to explain the observed differences between compartments. We also conclude from these data that fetal infection in these patients occurred between 13–18 weeks gestational age, consistent with the timing of symptomatic congenital HCMV infections. This study is the most detailed investigation of DNA virus evolution in human hosts to date, provides a framework for the study of other viral infections using similar techniques, and will aid in the development of new antiviral therapies and vaccines.
Human cytomegalovirus (HCMV) is a β-herpesvirus with seroprevalence of 30–90% in the United States
HCMV has been shown to be highly polymorphic both among and within human hosts
To better understand this process, we used high throughput sequencing to sample HCMV genomic populations from the urine and plasma of 5 infants with symptomatic congenital HCMV infection at multiple time points during the first year of life. This approach allowed us to monitor genome-wide evolution of the virus and to determine the mechanisms that contribute to pathogen evolution. We find that HCMV populations can evolve slowly within the same tissue compartment, or rapidly when moving between compartments. To characterize the mode and tempo of this differentiation, we constructed detailed maps of the
Our previous work showed significant variability within HCMV populations sampled from the urine of congenitally infected neonates, variation that could enable rapid evolution of the viral population
Patient | Specimen Source | Age at Specimen Collection (months or as noted) | Viral Load (copies/mL) |
B101 | Urine | 7 | 1.3E+07 |
B101 | Urine | 10 | 1.8E+07 |
B103 | Urine | 1 week | 6.7E+06 |
B103 | Urine | 6 | 2.6E+07 |
B103 | Plasma | 1 week | 6.9E+04 |
B103 | Plasma | 6 | 4.9E+04 |
M103 | Plasma | 1.5 | 1.8E+04 |
M103 | Plasma | 5 | 3.9E+04 |
MS1 | Urine | 1 | 5.0E+07 |
MS1 | Urine | 2 | 6.4E+06 |
MS1 | Urine | 11 | 4.0E+05 |
MS2 | Urine | 1 | 3.7E+07 |
MS2 | Urine | 2 | 1.0E+07 |
MS2 | Urine | 11 | 4.5E+05 |
All specimens were collected from patients with confirmed congenital HCMV infections.
Patients MS1 and MS2 are monozygotic, monochorionic twins.
Initially, the populations were studied separately by quantifying the intrahost variation of each population. Consistent with previous work, thousands of single nucleotide polymorphisms (SNPs) were observed in each population (
Box and whisker plots of the nucleotide diversity of HCMV populations sampled from the urine and plasma compartments. The boxes mark the interquartile values and the whiskers show the minimum and maximum values. The distributions are significantly different (
Patient | Sample Source | Time of Collection | SNPs | Π | Nonsynonymous SNPs | ΠNon | ΠSyn |
B101 | Urine | 7 months | 6,765 | 0.17% | 4,280 | 0.11% | 0.04% |
B101 | Urine | 10 months | 5,019 | 0.13% | 3,153 | 0.07% | 0.02% |
B103 | Urine | 1 week | 6,350 | 0.10% | 3,736 | 0.05% | 0.02% |
B103 | Urine | 6 months | 6,723 | 0.10% | 3,776 | 0.05% | 0.02% |
B103 | Plasma | 1 week | 4,890 | 0.21% | 2,606 | 0.09% | 0.09% |
B103 | Plasma | 6 months | 5,735 | 0.17% | 3,181 | 0.09% | 0.04% |
M103 | Plasma | 1.5 months | 3,503 | 0.14% | 2,272 | 0.06% | 0.04% |
M103 | Plasma | 5 months | 3,820 | 0.12% | 2,308 | 0.07% | 0.03% |
MS1 | Urine | 1 month | 5,049 | 0.09% | 2,580 | 0.04% | 0.02% |
MS1 | Urine | 2 months | 4,693 | 0.09% | 2,475 | 0.04% | 0.02% |
MS1 | Urine | 11 months | 5,515 | 0.13% | 2,978 | 0.04% | 0.03% |
MS2 | Urine | 1 month | 4,662 | 0.09% | 2,590 | 0.04% | 0.02% |
MS2 | Urine | 2 months | 3,950 | 0.08% | 2,008 | 0.03% | 0.02% |
MS2 | Urine | 11 months | 8,789 | 0.13% | 4,870 | 0.06% | 0.03% |
Π is nucleotide diversity as calculated using the formula of Nei and Li
ΠNon is amino acid diversity and ΠSyn is the diversity of synonymous SNPs. Both were calculated in the same way as Π but only using nonsynonymous or synonymous SNPs, respectively.
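As a rough illustration of how the Π column can be obtained, nucleotide diversity (the average number of pairwise differences per site) can be approximated from per-site SNP frequencies. The sketch below assumes biallelic SNPs and treats the read depth at each site as the sample size; the function name and its inputs are hypothetical and this is not the authors' pipeline.

```python
def nucleotide_diversity(alt_freqs, depths, genome_length):
    """Approximate per-site nucleotide diversity from biallelic SNP frequencies.

    alt_freqs     : frequency of the non-reference allele at each SNP (0..1)
    depths        : read depth at each SNP, used here as the sample size (assumption)
    genome_length : number of sites over which diversity is averaged
    """
    total = 0.0
    for p, n in zip(alt_freqs, depths):
        if n < 2:
            continue  # a pairwise difference needs at least two sampled reads
        # Probability that two reads drawn without replacement differ at this site
        total += 2.0 * p * (1.0 - p) * n / (n - 1)
    return total / genome_length


if __name__ == "__main__":
    # Three hypothetical SNPs in a 1,000 bp region
    print(nucleotide_diversity([0.5, 0.1, 0.02], [200, 150, 300], 1000))
```

Restricting the input to nonsynonymous or synonymous SNPs would give the analogues of ΠNon and ΠSyn under the same assumptions.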
To better understand the patterns of sequence variability, the HCMV genomic populations were studied as pairs across time or compartments. Eight pairings were created for host-matched specimens, and the frequencies of all SNPs were tracked across the pairings (
Host-matched specimens were paired either across time or compartments. The frequency of SNPs was tracked across the pairings. The trajectories show that HCMV populations can be stable, with the majority of SNPs remaining at nearly the same frequencies over time. A minority of SNPs rapidly change frequencies during the course of sampling, which is most apparent in panels C and G. Panels A–D: trajectories of all SNPs in the populations. Panels E–H: trajectories of only the consensus sequence SNPs identified between the pairings. Panels A and E: B103 longitudinal urine populations. Panels B and F: B103 longitudinal plasma populations. Panels C and G: B103 one week plasma and urine populations. Panels D and H: MS1 longitudinal urine populations. See supplemental
A consensus sequence was then called for each sampled population. The consensus sequences of two populations of interest were aligned and SNPs identified between the sequences (
High throughput sequence data was generated on HCMV populations collected from 5 infants from either the urine or plasma compartment or both compartments (patient B103). Panel A: Consensus sequences were generated from the population data of each sample and maximum likelihood phylogenetic trees were constructed from whole genome alignments. Sequences longitudinally sampled from the same compartment of the same host clustered together. In contrast, B103 sequences sampled from the urine and plasma at the same time were highly differentiated and clustered more closely with sequences from other hosts. Intriguingly, plasma sequences from two hosts (M103 and B103) appeared to form a single clade, a result consistent with convergent evolution acting on plasma populations. Node labels represent bootstrap values from 100 replicates and the tree was rooted with the HCMV reference sequence (Strain Merlin, Ref Seq ID: NC_006273). Panel B: Population differentiation between the populations was measured by estimating the summary statistic FST for the specimen pairings. Higher values of FST indicate higher levels of population differentiation. HCMV populations are relatively stable across time when measuring populations collected from the same compartment of a single host or between monozygotic, monochorionic twins (MS1 and MS2). B103 populations sampled from different compartments at a single timepoint showed elevated FST values, indicative of higher levels of population differentiation. The level of differentiation observed between compartments in a single patient is comparable to that observed between populations collected from unrelated patients. Error bars represent 95% confidence intervals. ANOVA analysis:
Comparison | Patient | Sample 1 Source | Sample 1 Time | Sample 2 Source | Sample 2 Time | FST | Consensus Sequence SNPs | Π | Consensus Sequence Nonsynonymous SNPs | ΠAA |
| | Urine | 7 month | Urine | 10 month | 0.05 | 52 | 0.03% | 10 | 0.02% |
| | Urine | 1 week | Urine | 6 months | 0.06 | 49 | 0.02% | 20 | 0.04% |
| | Plasma | 1 week | Plasma | 6 months | 0.23 | 213 | 0.14% | 71 | 0.17% |
| | Plasma | 1.5 month | Plasma | 5 month | 0.16 | 59 | 0.06% | 33 | 0.13% |
| | Urine | 1 month | Urine | 2 months | 0.09 | 164 | 0.08% | 26 | 0.05% |
| | Urine | 2 months | Urine | 11 months | 0.22 | 234 | 0.15% | 53 | 0.13% |
| | Urine | 1 month | Urine | 2 months | 0.08 | 160 | 0.07% | 52 | 0.09% |
| | Urine | 2 months | Urine | 11 months | 0.17 | 135 | 0.07% | 24 | 0.04% |
| | MS1 Urine | 1 month | MS2 Urine | 1 month | 0.08 | 147 | 0.07% | 46 | 0.08% |
| | MS1 Urine | 2 months | MS2 Urine | 2 months | 0.05 | 145 | 0.07% | 26 | 0.05% |
| | MS1 Urine | 11 months | MS2 Urine | 11 months | 0.14 | 155 | 0.10% | 37 | 0.09% |
| | Plasma | 1 week | Urine | 1 week | 0.42 | 1,602 | 0.89% | 459 | 0.96% |
| | Plasma | 6 months | Urine | 6 months | 0.45 | 1,769 | 1.15% | 498 | 1.21% |
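To make the FST column concrete, the following sketch computes a genome-wide FST from paired SNP frequencies in two populations. The estimator actually used in this study is not stated in this section, so Hudson's ratio-of-averages estimator is swapped in purely for illustration; the function name and inputs are hypothetical.

```python
def fst_hudson(freqs_a, freqs_b):
    """Hudson-style FST from allele frequencies at the same SNPs in two populations.

    Illustrative only: this estimator is an assumption, not necessarily the one
    used to produce the values in the table. Sample-size corrections are omitted.
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean within-population heterozygosity (average of 2p(1-p) over both populations)
        h_within = p_a * (1.0 - p_a) + p_b * (1.0 - p_b)
        # Between-population heterozygosity: one allele drawn from each population
        h_between = p_a * (1.0 - p_b) + p_b * (1.0 - p_a)
        numerator += h_between - h_within
        denominator += h_between
    # Ratio of sums across sites, rather than the mean of per-site ratios
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    stable = fst_hudson([0.10, 0.50, 0.90], [0.12, 0.48, 0.88])
    shifted = fst_hudson([0.10, 0.50, 0.90], [0.90, 0.05, 0.10])
    print(stable, shifted)  # the second pairing is far more differentiated
```

Values near 0 correspond to populations with nearly identical SNP frequencies, as in the same-compartment longitudinal pairings, and values approaching 1 correspond to nearly fixed differences.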
In the above analysis, comparisons were made between consensus sequences, which incorporate only high frequency SNPs of the populations. To determine if similar levels of differentiation were present for comparisons of all SNPs, FST
These data led us to explore the mechanisms of both the population stability and rapid differentiation of the HCMV populations observed in this study. From population genetics, it is known that both demography and selection can lead to large changes in SNP frequencies in relatively short timespans
The high throughput sequence data was analyzed to infer the demographic history of the viral populations using the statistical framework described previously
The demographic histories of the viral populations were inferred from the high throughput sequence data. In the models, time increases from left to right and the width of the various shapes is proportional to the size of the viral populations. All population sizes and timespans are drawn to scale. A tabular representation of parameter values of the model can be found in Table S2. See the text for a complete discussion of the models. Panel A: Model of B103 sampled population histories. (The populations within the urine compartment of B103 are drawn to 1∶8 scale [as compared to the B103 plasma compartment] for the sake of clarity.) Panel B: Model of B101 sampled population histories. Panel C: Model of M103 sampled population histories. Panel D: Model of MS1 and MS2 sampled population histories. Panel E: Expansion of early timepoints of MS1 and MS2 model. Arrows drawn between populations (Panels A and D) represent migration rates and are scaled relative to each other and not population sizes to improve visibility of the arrows.
The demographic history was first inferred for the B103-sampled viral populations given the rapid inter-compartment differentiation observed. In the best-fitting B103 demographic model (
The B101 and M103 best-fit models appear to agree qualitatively with that of B103 with respect to the demographic histories of
Next, a best-fit demographic history was inferred from the MS1 and MS2 sequence data (
From the demographic model, some or all of the differentiation across compartments may be explained by a bottleneck effect associated with plasma populations colonizing the urine compartment, such as shown in
The results of this analysis show that the effect of positive selection in shaping HCMV evolution is variable and dependent on context. Very few putative targets of positive selection were identified in the compartment-matched, longitudinal specimens from all infants (
The population branch statistic (PBS) was used to identify putative targets of positive selection within the HCMV viral populations collected from congenitally infected infants. Higher PBS values identify loci that have a higher likelihood of being targets of positive selection. The red line indicates the 5% significance threshold, above which values are considered significant. This threshold was determined through simulations using the demographic parameters inferred from the data, as depicted in
Comparison | Patient | Source - Sample 1 | Time - Sample 1 | Source - Sample 2 | Time - Sample 2 | Positively Selected SNPs | Selected SNPs in Coding Region | ORFs | Nonsynonymous Selected SNPs |
Longitudinal | B101 | Urine | 7 month | Urine | 10 month | 11 | 8 | 4 | 3 |
Longitudinal | B103 | Urine | 1 week | Urine | 6 month | 2 | 2 | 1 | 2 |
Longitudinal | B103 | Plasma | 1 week | Plasma | 6 month | 23 | 20 | 13 | 8 |
Longitudinal | M103 | Plasma | 1.5 month | Plasma | 5 month | 3 | 3 | 3 | 0 |
Interhost | MS1/MS2 | MS1 Urine | 1 month | MS2 Urine | 1 month | 23 | 21 | 6 | 2 |
Interhost | MS1/MS2 | MS1 Urine | 2 months | MS2 Urine | 2 months | 13 | 11 | 4 | 2 |
Interhost | MS1/MS2 | MS1 Urine | 11 months | MS2 Urine | 11 months | 9 | 8 | 3 | 2 |
Interhost | MS1/MS2 | MS2 Urine | 1 month | MS1 Urine | 1 month | 9 | 7 | 3 | 1 |
Interhost | MS1/MS2 | MS2 Urine | 2 months | MS1 Urine | 2 months | 11 | 11 | 3 | 7 |
Interhost | MS1/MS2 | MS2 Urine | 11 months | MS1 Urine | 11 months | 6 | 6 | 2 | 0 |
Compartmental | B103 | Plasma | 1 week | Urine | 1 week | 114 | 83 | 34 | 26 |
The number of positively selected SNPs located in a coding region.
Open Reading Frame.
The number of ORFs that contain a positively selected SNP.
The number of positively selected SNPs that are nonsynonymous.
In contrast, the assay yielded significantly different results when studying positive selection associated with movement of virus across compartments. The B103 populations sampled from the 1 week urine and plasma specimens were used to estimate the effect of positive selection associated with colonization of a new host compartment. 114 SNPs detected in the B103 urine compartment were identified as putative targets of positive selection, including 31 in non-coding regions and 83 in coding regions (
Many of the SNPs detected in the screen of selection associated with urine compartment colonization showed positional clustering (
Lastly, it was observed that positively selected SNPs (
Panel A: Consensus sequence SNPs between populations were classified as either pre-existing (detected in the ancestral and derived populations) or new (only detected in the derived population). Plotted are the occurrence of these two classes of SNPs in the various patient specimens. Panel B: A tabular representation of the data presented in Panel A.
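The classification described in this caption can be sketched as a lookup of each consensus sequence SNP in the ancestral population. The function names, the dictionary input and the `detection_floor` parameter are hypothetical; the detection criterion applied in the study is not given in this caption.

```python
def classify_consensus_snps(consensus_snp_positions, ancestral_snp_freqs, detection_floor=0.0):
    """Label each consensus SNP as 'pre-existing' or 'new'.

    consensus_snp_positions : genome positions of consensus SNPs in the derived population
    ancestral_snp_freqs     : dict mapping position -> variant frequency in the ancestral population
    detection_floor         : frequency above which a variant counts as detected (assumption)
    """
    classes = {}
    for pos in consensus_snp_positions:
        freq = ancestral_snp_freqs.get(pos, 0.0)  # absent from the dict means not detected
        classes[pos] = "pre-existing" if freq > detection_floor else "new"
    return classes


def count_classes(classes):
    """Tally the two classes, as in a tabular summary."""
    counts = {"pre-existing": 0, "new": 0}
    for label in classes.values():
        counts[label] += 1
    return counts


if __name__ == "__main__":
    labels = classify_consensus_snps([1200, 5400, 98000], {1200: 0.04, 98000: 0.31})
    print(count_classes(labels))
```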
HCMV has been shown to exhibit significant inter-host sequence divergence as well as high levels of intra-host variability. Here we provide a detailed study of the evolution of HCMV in human hosts using specimens collected across tissue compartments and time. Furthermore, we provide evidence of rapid evolution of HCMV populations associated with movement of the virus between host compartments. The rapid divergence can best be explained by a demographic history that includes population bottlenecks and expansions, as well as positive selection across a subset of loci.
This study shows that the evolution of HCMV within human hosts can proceed in two ways. HCMV populations are relatively stable within tissue compartments. For example, the urine populations of B101 and B103 exhibited only 52 and 49 consensus sequence SNPs, respectively, and low FST values. This result is in agreement with previous studies that have monitored HCMV genetic sequences within patients
In contrast, HCMV populations can also rapidly evolve during colonization of different compartments within a host. Significant HCMV inter-host differentiation is well documented, both at the genetic and genomic levels
One reason that the plasma and urine compartments were chosen for study is that they represent the circulating and shed HCMV populations, respectively. Plasma virus is likely limited to the individual (with the important exceptions of transfusion/transplantation-associated and congenital infections) whereas virus shed into urine is excreted and may infect other hosts. The results in this work suggest that there are large differences between viral populations isolated from these two sources. The potential implications of these findings are numerous. For example, do studies of secreted virus, such as from the urine and/or saliva, provide useful information about circulating virus or infection-associated diseases? In addition, how do SNPs associated with shedding viral populations (e.g., urine populations) affect tropism for mucosal surfaces, the presumed route of most inter-host infections? The evidence for positive selection when crossing into the urine compartment suggests that the viral population has adapted for increased fitness, but most likely increased fitness in renal epithelial cells. Thus, it is not known how compartmentalization affects inter-host infectivity. Alternatively, because at least a portion of the differences between populations could be explained by the neutral demographic history, some of the SNPs would have arisen stochastically and may have little or no effect on viral fitness. Therefore, it will be interesting to determine the phenotypic differences between patient-derived HCMV populations from different compartments.
Demographic histories of HCMV populations were generated from population data in this study. These demographic models may provide a partial explanation for the population differentiation observed across compartments. The B103 model suggests that the urine and plasma populations split 3 months prior to the initial collection. It was also observed that these populations are highly differentiated, as measured by FST (
The demographic model generated from B103 data also suggests a novel mechanism for reinfection. In the model, there was asymmetric migration of alleles between host compartments with a higher flow from the urine to the plasma. In this study, the flow of alleles was relatively low and the plasma population size was fairly stable. One could speculate that after a collapse of the plasma population, due to the effects of the adaptive immune response or antiviral therapy, migration between compartments would provide a source of highly differentiated virus. Thus, a clinical manifestation of this phenomenon could be intrahost (i.e., self) reinfection by virus sourced from various compartments. If this is true, effective HCMV vaccine design will need to account for not only plasma-associated epitopes but also epitopes sourced from other compartments. Future studies are needed to test the validity and prevalence of this mechanism of reinfection, and its potential effect on anti-HCMV clinical strategies.
An important disagreement was noted between the B101 and B103 models and those of MS1 and MS2: the best-fit model of B101 and B103 urine populations contained 2 bottlenecks separated by ∼2.5 months, while the MS1 and MS2 model contained 3 bottlenecks, separated by 0.2 months and 2.2 months. One possible explanation for the disagreement is that in some hosts, HCMV undergoes 2 bottlenecks en route to the urine compartment, while in other hosts there are 3 bottlenecks. However, there is no clear biological mechanism to explain this difference. Alternatively, the MS1 and MS2 models appear to infer events further back in time than those of the other models. Thus, the first bottleneck in this model could reflect an earlier event that the other datasets had insufficient power to detect. A third explanation is that the MS1 and MS2 models have higher resolution than those inferred from the other datasets. To generate these models, one compares the spectrum of neutral allele frequencies of the experimental dataset to the spectra that would be expected from populations that have experienced various demographic events (i.e., bottlenecks and/or expansions). The best-fit model is one for which the experimental spectrum most closely matches the expected spectrum. Unfortunately, models of various population histories could generate similar allele frequency spectra, limiting the ability to fully resolve these models. However, because MS1 and MS2 viral populations were derived from the same ancestral population, two allele frequency spectra were estimated from populations that evolved in parallel from the same ancestor, providing more information about the evolutionary history of the populations and allowing similar models to be resolved. Thus, it is proposed that all HCMV urine populations have experienced at least 3 bottlenecks prior to colonization of the kidney. However, two bottlenecks may not be resolved by data from a single host due to the very close timing of the bottlenecks (∼1 week). The net effect of both bottlenecks is reduction of population size by >90% (
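The model-fitting logic described above (compare the observed spectrum of neutral allele frequencies with the spectrum expected under each candidate history, then keep the closest match) can be sketched as follows. This is a toy illustration under stated assumptions, not the program used in this study: the binning scheme, the multinomial likelihood, and the candidate spectra are all hypothetical.

```python
import math


def frequency_spectrum(freqs, n_bins=10):
    """Bin segregating-site frequencies (0 < f < 1) into an allele frequency spectrum."""
    counts = [0] * n_bins
    for f in freqs:
        if 0.0 < f < 1.0:
            counts[min(int(f * n_bins), n_bins - 1)] += 1
    return counts


def multinomial_loglik(observed_counts, expected_props):
    """Log-likelihood (up to a constant) of observed counts given expected bin proportions."""
    loglik = 0.0
    for count, prop in zip(observed_counts, expected_props):
        if count > 0:
            if prop <= 0.0:
                return float("-inf")  # model forbids a bin that was actually observed
            loglik += count * math.log(prop)
    return loglik


def best_fit_model(observed_counts, candidate_models):
    """Return the name of the candidate whose expected spectrum best matches the data."""
    return max(candidate_models,
               key=lambda name: multinomial_loglik(observed_counts, candidate_models[name]))


if __name__ == "__main__":
    observed = [8, 1, 1]  # hypothetical three-bin spectrum with an excess of rare variants
    models = {
        "rare_variant_excess": [0.8, 0.1, 0.1],   # hypothetical expectation under one history
        "flat": [1 / 3, 1 / 3, 1 / 3],            # hypothetical expectation under another
    }
    print(best_fit_model(observed, models))
```

Two histories that predict nearly identical expected proportions would receive nearly identical likelihoods here, which mirrors the resolution limit noted in the text; a joint spectrum from two related populations adds bins that can separate such models.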
If three bottlenecks do indeed occur during the course of HCMV congenital infections, the biological cause of these reductions in population size is not known. We hypothesize that the first bottleneck results from the movement of the virus from the maternal compartment to the placenta. In all models, the timing of this event was ∼13–18 weeks gestational age and appears to agree with the known epidemiology and pathology of symptomatic HCMV congenital infections
From the data in this study, it is tempting to speculate on the relative contribution of demography and selection to the evolution of HCMV populations. For example, 1602 SNPs were observed between the B103 urine and plasma compartment consensus sequences at 1 week of age (
The use of specimens from subjects B101, B103, and M103 was approved by the University of Massachusetts Medical School and Baystate Medical Center Institutional Review Boards. Subjects MS1 and MS2 clinical specimens were obtained from neonates with congenital HCMV infection during the course of routine clinical care at the University of Minnesota Medical School. Protocols for collection of HCMV isolates from congenitally infected infants were approved by the University of Minnesota Institutional Review Board. Informed consent was obtained from subjects' parents for study of HCMV.
Patients identified at the University of Minnesota Medical Center or University of Massachusetts Memorial Health Center were evaluated for HCMV infection on the basis of signs and symptoms suggesting congenital infection. Patients MS1 and MS2 were monochorionic, monozygotic twins with clinical evidence of congenital CMV consisting of thrombocytopenia, transaminitis, and, for MS1, a small-for-gestational-age phenotype. Congenital infection was confirmed for all patients by HCMV-positive urine cultures before 3 weeks of age and/or PCR detection of HCMV DNA. No patients were treated with antiviral drugs.
Serial specimens were collected at times described in
A set of primer pairs was constructed that spanned the entire HCMV genome. Details of the primer sequences and primer design strategy have been described
The DNA in pooled amplicons was sheared by sonication on a Sonic Dismembrator 550 (Fisher) until the median size was ∼350 basepairs (bp). The DNA library was prepared as described
The raw sequence images were processed through Illumina Pipeline 1.8 to generate sequence data. The consensus sequence of each sample was called as described
Whole genome alignments were generated using the Vista whole genome aligner hosted on the Vista server
A dataset of only synonymous mutations for each viral population was created. These datasets were used as neutral datasets for subsequent analysis of demographic history and were analyzed with the program
To test for evidence of positive selection, we employed the population branch statistic (PBS) as described
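For orientation, the commonly used definition of PBS converts pairwise FST values among three populations into branch lengths, T = −log(1 − FST), and assigns to the focal population the length of its own branch. The sketch below follows that definition; which three populations were compared for each test in this study is not stated in this passage, so the argument names are hypothetical.

```python
import math


def branch_length(fst, cap=0.999):
    """Transform FST into a divergence-time-like branch length, T = -log(1 - FST)."""
    fst = min(max(fst, 0.0), cap)  # clamp so the logarithm stays finite (assumption)
    return -math.log(1.0 - fst)


def pbs(fst_focal_ref1, fst_focal_ref2, fst_ref1_ref2):
    """Population branch statistic for the focal population at one locus.

    Large values mean the focal population has diverged from both reference
    populations more than those two have diverged from each other.
    """
    t_f1 = branch_length(fst_focal_ref1)
    t_f2 = branch_length(fst_focal_ref2)
    t_12 = branch_length(fst_ref1_ref2)
    return (t_f1 + t_f2 - t_12) / 2.0


if __name__ == "__main__":
    # Hypothetical locus: the focal population is strongly differentiated from both references
    print(pbs(0.6, 0.55, 0.05))
```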
To determine the significance of the PBS statistic, 10,000 simulations were run for a 5,000 bp region under the inferred demographic histories of the populations (
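Given PBS values from simulations under the neutral demographic model, a 5% significance threshold can be read off as the 95th percentile of the simulated distribution. The sketch below shows that step only; it does not perform the coalescent simulations themselves, and the function names and the add-one p-value correction are assumptions made for illustration.

```python
import math


def empirical_threshold(null_values, alpha=0.05):
    """Smallest simulated value with at most a fraction alpha of simulations above it."""
    ordered = sorted(null_values)
    # Rounding guards against floating-point noise before taking the ceiling
    k = math.ceil(round((1.0 - alpha) * len(ordered), 9)) - 1
    return ordered[max(k, 0)]


def empirical_p_value(observed, null_values):
    """One-sided p-value with an add-one correction so it is never exactly zero."""
    exceed = sum(1 for value in null_values if value >= observed)
    return (exceed + 1) / (len(null_values) + 1)


if __name__ == "__main__":
    # Stand-in for simulated neutral PBS values (hypothetical numbers)
    null_pbs = [i / 1000.0 for i in range(1, 10001)]
    cutoff = empirical_threshold(null_pbs)
    print(cutoff, empirical_p_value(9.9, null_pbs))
```

An observed PBS value above `cutoff` would then be flagged as a putative target of positive selection under the simulated demographic history.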
Raw sequencing reads from Illumina sequencing will be deposited in the Sequence Read Archive (
SNP calling false positive rate (FPR) is dependent on filtering threshold. A filtering algorithm that has been described previously
(TIF)
HCMV populations show patterns of both stability and differentiation. Full panel of results from those depicted in
(TIF)
Evidence of positive selection in HCMV samples. Full results of PBS tests from
(TIF)
Targets of positive selection are clustered on the HCMV genome. SNPs from the B103 urine sample were identified as putative targets of positive selection. Plotted as a histogram are distances between putatively selected SNPs. The majority of putatively selected SNPs are located within 200 basepairs of the nearest selected SNP.
(TIF)
Comparison of demographic models to population data. Joint allele frequency spectra of the population data (top left of each panel) are compared to the expected joint allele spectra of the demographic models depicted in
(TIF)
Results of simulations under demographic models depicted in
(TIF)
Summary of sequencing data.
(PDF)
Parameter values of demographic models of HCMV populations.
(PDF)
Estimate of HCMV effective population size from time sampled populations.
(PDF)
Targets of positive selection in 6 month B103 plasma populations.
(PDF)
Targets of positive selection in 6 month B103 urine populations.
(PDF)
Targets of positive selection in 10 month B101 urine populations.
(PDF)
Targets of positive selection in 5 month M103 plasma populations.
(PDF)
Targets of positive selection in MS1 1 month urine populations.
(PDF)
Targets of positive selection in MS2 1 month urine populations.
(PDF)
Targets of positive selection in MS1 2 month urine populations.
(PDF)
Targets of positive selection in MS2 2 month urine populations.
(PDF)
Targets of positive selection in MS1 11 month urine populations.
(PDF)
Targets of positive selection in MS2 11 month urine populations.
(PDF)
Targets of positive selection in 1 week B103 urine populations.
(PDF)
The authors would like to thank Ryan Gutenkunst for valuable advice when using