Fig 1.
Estimated in vivo mutation frequencies in HCV1a, as determined by analysis of 195 viral populations.
(A) Box plots and average estimated frequencies (± 95% confidence intervals) for all, synonymous (Syn), nonsynonymous (Nonsyn), and nonsense mutations, stratified by transitions (Ts) vs. transversions (Tvs). Transition mutation frequencies are much higher than transversion mutation frequencies for every class of mutation. *** denotes statistical significance, with adjusted P-values <0.001 after Holm correction. (B) Genome-wide transition mutation frequencies, ordered by mutation frequency and colored by mutation type, show that synonymous mutations are more common than nonsynonymous mutations in the HCV genome. (C) Transition mutation frequencies along the HCV genome, colored by mutation type, show that average mutation frequency is roughly consistent across the genome, with synonymous mutations more common than nonsynonymous mutations. Each dot represents the average mutation frequency at one nucleotide position across the 195 viral populations. The line represents the average over a 100-base sliding window. The regions with the highest (HVR1) and the lowest (Core) average mutation frequencies are highlighted in yellow.
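The sliding-window line in panel (C) can be illustrated with a minimal sketch. It assumes a plain unweighted mean over each full window of consecutive positions; the caption does not say whether the window is centered or how the genome ends are handled, and the function name `sliding_window_mean` is hypothetical.

```python
from typing import List


def sliding_window_mean(values: List[float], window: int = 100) -> List[float]:
    """Return the mean of every full window of `window` consecutive values.

    Assumption: unweighted mean, full windows only (no padding at the ends).
    `values` holds one average mutation frequency per nucleotide position.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    # Keep a running sum so each step costs O(1) instead of O(window).
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


if __name__ == "__main__":
    # Made-up per-position frequencies, for illustration only.
    toy = [0.004, 0.006, 0.002, 0.008, 0.005]
    print(sliding_window_mean(toy, window=2))
```

With a window of 100 bases, a genome of N positions yields N - 99 window averages under this convention.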
Table 1.
Genome-wide average estimated frequencies of different types of mutation in the HCV1a genome.
Fig 2.
Estimated transition mutation frequencies of HCV by gene.
Observed frequencies aggregated by gene and mutation type; dots indicate the averages and error bars represent the estimated 95% confidence intervals from 195 samples. *** denotes a statistically significant difference (adjusted P<0.0001) between synonymous and nonsynonymous mutations by the Mann-Whitney test.
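The caption for Fig 2 does not name the method used to adjust the P-values. Assuming it is the Holm procedure named in the Fig 1 caption, the adjustment can be sketched as follows; the function name `holm_adjust` and the example P-values are hypothetical.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Holm step-down adjustment of a family of raw P-values.

    The k-th smallest raw P-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be at least as large as the previous
    adjusted value so that the adjusted values stay monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Made-up raw P-values, one per hypothetical per-gene comparison.
    print(holm_adjust([0.01, 0.04, 0.03]))
```

A comparison would then be marked *** when its adjusted value falls below the threshold stated in the caption.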
Fig 3.
Various factors that affect the frequency of mutations in HCV, based on analysis of 195 viral populations, each derived from a different patient.
(A) Predicted effects on mutation frequencies in the HCV genome of the ancestral nucleotide (T, C, or G, vs. A); CpG-creating status (vs. non-CpG-creating status); nonsynonymous status (Nonsyn; vs. synonymous); amino acid-changing status (AAChange; vs. non-amino acid-changing); and presence in the Core, HVR1, E2, NS1, NS2, NS4A, and NS5B genes (vs. the NS3/NS4B/NS5A regions). Beta regression models were used to determine the effects of the different factors on mutation frequencies across the genome, and this figure reflects the results of the best-fit model based on AIC. (B) Estimated average transition mutation frequencies from the beta regression model (black dots with standard errors) and the observed frequencies from 195 patients infected with HCV (in color). (C) The eight most important features identified by the predictive random forest regression model of mutation frequencies (for all features tested, see S1 Table).
Fig 4.
Estimated genome-wide selection coefficients (fitness costs) in the HCV genome.
(A) Selection coefficients (1/replication cycle) along the HCV genome, colored by mutation type (syn = synonymous; nonsynon = nonsynonymous); each dot represents the average at one position across 195 patient samples, and lines represent the average over a 50-base sliding window. (B) Selection coefficients (1/replication cycle) stratified by nucleotide and syn/nonsyn status, colored by mutation type. (C) Estimated mutation frequencies stratified by starting nucleotide and syn/nonsyn status, colored by mutation type. Comparison of (B) and (C) shows higher estimated selection coefficients at A and T sites than at C and G sites, even though mutation frequencies were higher at A and T sites compared to C and G sites.
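The comparison of (B) and (C) may be easier to follow with a small worked sketch. It assumes that selection coefficients are estimated under mutation-selection balance, where the expected frequency of a deleterious mutation is f = μ/s, so that s = μ/f; the caption itself does not state the estimator. The function name and all numbers below are hypothetical and chosen only for illustration.

```python
def selection_coefficient(mutation_rate: float, frequency: float) -> float:
    """Estimate s = mu / f under an assumed mutation-selection balance.

    mutation_rate: per-site mutation rate per replication cycle (mu).
    frequency: average observed frequency of the mutation (f).
    """
    if mutation_rate < 0:
        raise ValueError("mutation_rate must be non-negative")
    if frequency <= 0:
        raise ValueError("frequency must be positive to estimate s")
    return mutation_rate / frequency


if __name__ == "__main__":
    # Made-up values: the A site has BOTH a higher mutation frequency and a
    # higher mutation rate than the C site.
    s_a_site = selection_coefficient(mutation_rate=2e-5, frequency=0.005)
    s_c_site = selection_coefficient(mutation_rate=5e-6, frequency=0.004)
    print(s_a_site, s_c_site)
```

Under this assumption, a site class can show both a higher mutation frequency and a higher estimated selection coefficient if its mutation rate is sufficiently higher, which is one way to read the pattern described for A and T sites.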
Table 2.
Average estimated selection coefficients per genic region of the HCV1a genome.
Fig 5.
Factors that affect the fitness cost (selection coefficient) in HCV, based on analysis of 195 viral populations, each derived from a different patient.
(A) Predicted effects from beta regression models on selection coefficients (SC) in the HCV genome, shown together with predicted effects on mutation frequencies (see Fig 3A for a description of the factors shown). (B) The eight most important features identified by the predictive random forest regression model (for all features tested, see S1 Table). (C) Overlapping features ranked in the top 20 in the random forest regression models for mutation frequencies (MF) and selection coefficients (SC).
Fig 6.
Occurrence and fitness costs of resistance-associated variants in the HCV genome.
Estimated fitness costs (top) and natural prevalence (bottom) of resistance-associated variants (RAVs), where each dot represents a mutation frequency observed in a patient. Variants named in black are created by transition mutations and those named in brown are created by transversion mutations. RAVs that can be created by different mutations are specified by name, in the format of variant name, nucleotide position, and type of mutation (Tv1 stands for transversion mutations that result in C or A, and Tv2 stands for transversion mutations that result in G or T).