Genetic Variability of Hepatitis C Virus before and after Combined Therapy of Interferon plus Ribavirin

We present an analysis of the selective forces acting on two hepatitis C virus genome regions previously postulated to be involved in the viral response to combined antiviral therapy. One includes the three hypervariable regions in the envelope E2 glycoprotein, and the other encompasses the PKR binding domain and the V3 domain in the NS5A region. We used a cohort of 22 non-responder patients to combined therapy (interferon alpha-2a plus ribavirin) for which samples were obtained before initiation of therapy and after 6 or/and 12 months of treatment. A range of 25–100 clones per patient, genome region and time sample were sequenced. These were used to detect general patterns of adaptation, to identify particular adaptation mechanisms and to analyze the patterns of evolutionary change in both genome regions. These analyses failed to detect a common adaptive mechanism for the lack of response to antiviral treatment in these patients. On the contrary, a wide range of situations were observed, from patients showing no positively selected sites to others with many, and with completely different topologies in the reconstructed phylogenetic trees. Altogether, these results suggest that viral strategies to evade selection pressure from the immune system and antiviral therapies do not result from a single mechanism and they are likely based on a range of different alternatives, in which several different changes, or their combination, along the HCV genome confer viruses the ability to overcome strong selective pressures.


Introduction
The hepatitis C virus (HCV) infects approximately 170 million people worldwide and it establishes persistent infection in more than two-thirds of those who contract it [1,2]. The current standard treatment for patients infected with HCV consists of a combined therapy of interferon plus ribavirin, which is successful in approximately 40% of the patients [3,4]. Recently, randomised controlled clinical trials have shown that pegylated interferon and ribavirin yield a higher rate of sustained virological response [5,6]. Moreover, response to anti-HCV therapy varies depending on the viral genotype. Patients infected with HCV genotypes 1 or 4 show significantly lower sustained response rates than those infected with genotypes 2 or 3 [6][7][8][9].
Studies trying to find differential patterns in the HCV genome in response to pressure from the immune system or resistance to antiviral treatment have mainly focused on those regions seemingly involved in evasion mechanisms and, in consequence, supposed to be subjected to strong selective pressures. The highest levels of sequence variation, i.e. the highest genetic plasticity, are concentrated in the four hypervariable regions of HCV, three of which are located in the envelope E2 glycoprotein. Hypervariable region 1 (HVR1) consists of a 27 amino acid sequence located at the N-terminus of the protein [10,11] and seems to be involved in target-cell recognition and virus attachment [12]. Hypervariable region 2 (HVR2) consists of 9 amino acids located downstream of HVR1 [11] and it has been hypothesized to be involved in cell surface receptor binding [13]. Recently, a third hypervariable region, denoted HVR3 [14,15], has been described in the E2 protein, and it is located between the two others. This region could play a role in the process of binding to host cell receptors and virus entry into host cells [14]. Finally, a fourth highly variable domain (V3), spanning 24 amino acids, is located at the C-terminus of the NS5A protein and appears to be involved in responsiveness to interferon [16,17]. Close to this domain there is another important region in the NS5A protein, the protein kinase-R binding domain (PKR-BD), which consists of 66 amino acids and includes a 40 amino acid domain, the putative interferon sensitivity determining region (ISDR). These domains interact with PKR, which is involved in the cellular antiviral response induced by interferon, thus blocking its antiviral activity [18,19].
HCV infections show two main features: high persistence and low susceptibility to antiviral treatments. The high levels of genetic variability of HCV are seemingly involved in viral escape from the host immune response, usually leading to chronic disease [20,21]. The selective pressures exerted by the immune system [22,23] correlate with the degree of genetic variability in the target regions [24]. This is the case for HVR1, which has been studied extensively [21,[25][26][27][28]. Moreover, the genetic variability of other regions such as the ISDR [29,30] or the V3 domains [16,17] has also been studied. However, no conclusive results have been attained in any case, probably due to the establishment of complex interactions between the genetic diversity of the virus and the host immune response [31][32][33][34].
In this study we have employed several analytical procedures to find differential selection patterns in a cohort of non-responder patients during their treatment with interferon alpha-2a plus ribavirin. For this, we employed 22 patients infected with HCV genotype 1 (7 with subtype 1a and 15 with subtype 1b), for which a sample immediately previous to initiation of antiviral treatment was available (T0), and additional samples after 6 or/and 12 months of treatment (T1 and T2, respectively), when their lack of response was established, were also available. Two viral genome regions were analyzed. About 100 clone sequences per patient were obtained from the E1-E2 region (4690 sequences in total), which included all hypervariable regions; additionally, between 25 and 96 sequences were obtained for the NS5A (2486 sequences in total), including the ISDR and V3 domains.

Changes in nucleotide diversity during treatment
Nucleotide diversity was very high in the E1-E2 region (details of individual samples and different genetic variability estimates can be found in Table S1), although we found a few highly homogeneous samples (Table 1). A small positive correlation between nucleotide diversity estimates at the two time points (r = 0.218, P,0.01) was observed and the average values were almost identical (p = 0.025560.0031 for T0 samples and p = 0.023860.0033 for T1/T2 samples, Table 1). Differences in nucleotide diversity between T1/T2 and T0 samples were computed to analyze the change in diversity during treatment ( Figure 1). Significant differences were detected in most cases (20/ 24) using t-tests. A significant increase in nucleotide diversity was detected in 13 cases, a significant decrease in 9 cases and no significant differences were found in 4 cases. Consequently, a slight tendency for nucleotide diversity to increase during treatment was detected for the E1-E2 region.
In most patients we detected a significant divergence between sequences sampled before and after treatment. The corresponding nucleotide divergence statistics (Da), using Jukes-Cantor correction, are shown in Table 1. The range of Da values spanned from 0.0737 to 0.0002, with an average value of 0.0211 (0.0171 and 0.0231 for subtypes 1a and 1b, respectively). This is an indication that there are very different patterns of differentiation between sequences sampled at different times from the same patient, with a slightly larger effect for patients infected with HCV subtype 1b than subtype 1a (see below and supplementary Figures S1 and S2).
As expected, genetic variability was lower for the NS5A region than for the E1-E2 region (Table 1). A marginally significant correlation was found for nucleotide diversity between T0 and T1/T2 samples from each patient (r = 0.396, P = 0.056). Again, the average nucleotide diversities for the two sets of samples were almost identical (p = 0.012860.0089 for T0 samples and p = 0.012760.0098 for T1/T2 samples, Table 1). Variation in nucleotide diversity during treatment was evaluated by computing the difference between T1/T2 and T0 samples ( Figure 1) for each patient and using t-tests. Significant increases in nucleotide diversity were detected in 8 cases, significant decreases in other 8 cases, and no significant differences in the remaining 8 cases. Therefore, the NS5A region did not present any trend in nucleotide diversity variation during treatment.
Similarly to the E1-E2 region, we detected a significant divergence between sequences sampled before and after treatment in most patients (Table 1, Figure 1). The range of Da values spanned from 0.0356 to 0.0001, with an average value of 0.0081 (0.0039 and 0.0103 for subtypes 1a and 1b, respectively). As for the E1-E2 region, these estimates indicate that there are very different patterns of differentiation between sequences sampled at different times from the same patient, with a wider range of variation for patients infected with HCV subtype 1b than for those infected with subtype 1a (see below and supplementary Figures S1 and S2).
For T0 samples, a significant correlation was detected between the nucleotide diversity levels of E1-E2 and NS5A regions (r = 0.604, P,0.002). However, no significant correlation was observed for T1/T2 samples (r = 0.092, P = 0.668). These results indicate that the levels of genetic variability between both regions were closely related before initiation of treatment, but the correlation had disappeared after it was discontinued. We observed a significant correlation (r = 0.521, P,0.01) between the differences in nucleotide diversity at T0 and T1/T2 in the two genome regions analyzed in each patient. Only two patients (A20 and A35) presented significant changes of opposite sign in the two estimates, with diversity increasing during treatment in one region and decreasing in the other ( Figure 1). Hence, there is evidence for nucleotide diversity being affected similarly by antiviral treatment in both genome regions, with only a few cases in which the two regions changed differently, at least in levels of nucleotide diversity, to treatment.
Patterns and rates of evolution in the E1-E2 and NS5A regions before and after antiviral treatment We obtained maximum likelihood phylogenetic trees for all the sequence clones of each genome region from each patient using a common outgroup sequence, HCV-H77 (accession number NC_004102) for sequences of subtype 1a and HCV-J (accession D10749) for those of subtype 1b. The most remarkable feature from the analysis of these 44 trees was the absence of a common pattern for the relationship between clones derived before and after antiviral treatment (all trees for E1-E2 and NS5A regions are shown in the supplementary material, Figures S1 and S2). In the E1-E2 region, for instance, T0 and T2 samples from patient G07 were grouped in separate clusters, with all T2 samples clustering in a monophyletic group derived from one unidentified variant already present at time T0 (Figure 2a). On the contrary, for this same region clones from the two time samples from patient A34 were completely mixed, with no differentiation between T0 and T1 sequences (Figure 2b).
Intermediate patterns were present in other patients for this region ( Figure S1). The same observation applied to the NS5A region where, for instance, patient C12 showed a perfect separation in different clusters of sequences obtained at each time sample whereas patient G06 presented a very heterogeneous structure (Figure 2c, d). Again, a range of intermediate and similarly extreme patterns were found in other patients ( Figure S2).
Inspection of the phylogenetic trees from both genome regions confirmed the previous results of nucleotide diversity analyses. Despite being characterized by a common phenotype, lack of response to interferon plus ribavirin treatment, hepatitis C virus from these patients presented a wide range of evolutionary patterns and levels of genetic diversity, before and after antiviral treatment. For instance, some patients showed low levels of genetic variability at the T0 sample and higher levels at T1/T2 samples, and vice versa ( Figures S1 and S2). Globally, these results suggest that HCV populations use different strategies to adapt to and overcome the antiviral effects of interferon and ribavirin used in the treatment of these different patients.
To further characterize in a more quantitative manner the different adaptive strategies to antiviral treatment used by hepatitis C virus, we analyzed the amount of viral evolution within each infected patient in the two genome regions considered. For this, we analyzed the ML trees described above using common, epidemiologically unrelated outgroup sequences, H77 for subtype 1a sequences and HCV-J for subtype 1b.
As a proxy for the amount of evolution before and after antiviral treatment we computed the average length from the common Figure 1. Graphic representation of the differences in nucleotide diversity between final versus initial time samples for the E1-E2 and NS5A regions (denoted as black and white dots, respectively). Three time samples were obtained for patients C22 (for E1-E2 and NS5A regions) and G26 (for E1-E2 region) and, consequently, differences between these three time samples are represented. Non-significant differences are indicated with an asterisk. doi:10.1371/journal.pone.0003058.g001 ancestor to each tip for the different clones of each patient at both genome regions. In line with previous analyses, an enormous heterogeneity was observed for the cohort of patients in both regions ( Figure 3). For the E1-E2 region, 17 out of 22 patients presented an increase of the genetic distance to the common ancestor during treatment, whereas the same happened for 14 out of 22 patients for the NS5A region. Despite this trend of increasing genetic distance from the origin after treatment, only 10 of the 22 patients showed a simultaneous increase in both regions, whereas the evolutionary distance decreased in both regions for only one patient. Contrary to the previous observation of lack of correlation between divergence before and after infection of each patient, we Figure 2. Examples of phylogenetic trees from the E1-E2 and NS5A regions. Trees from patients G07 (a) and A34 (b) are shown for the E1-E2 region, whereas trees from patients C12 and G06 are presented for the NS5A region. Red, green, and blue dots denote sequences from T0, T1, and T2, respectively. doi:10.1371/journal.pone.0003058.g002 observed significant correlations between genetic distances from the common origin before and after antiviral treatment within each patient (r = 0.487, P = 0.011 for the E1-E2 region; r = 0.754, P,0.001 for the NS5A region). However, the relationship between genetic distance from origin before treatment and its relative change after it was the opposite one. Correlations between genetic distances at T0 and (T1/T22T0) were negative and marginally significant for both genome regions (r = 20.407, P = 0.030, for E1-E2, and r = 20.303, P = 0.087 for NS5A). Finally, relative changes in both regions within patients were not correlated to each other (r = 0.080, P = 0.639) nor to the number of sites detected to evolve under positive selection (r = 0.328, P.0.10, for E1E2; r = 0.117, P.0.10, for NS5A).
These results provide further evidence for evolution taking place not only at different paces in the two genome regions consideredmost values for the E1-E2 region were located in the range 0.04-

Evolutionary changes during antiviral treatment in the E1-E2 region
The nature of the differences at the nucleotide level between viral populations sampled at different times during antiviral treatment within each patient was further investigated at the amino acid level. Firstly, we analyzed the differences in amino acid composition in each position between different samples for each patient. A summary of the significant differences found is shown in Supplementary Table S3. Almost half (71/154) of the positions were detected to vary significantly in composition in at least one patient. The tallying of positions with significant changes per patient also revealed a very uneven distribution, with a maximum of 29 positions (in patient G16) to a minimum of none found in two subtype 1a (A20 and A34) and two subtype 1b-infected patients (C37 and G17). There were no significant differences in the number of such positions detected between the two viral subtypes (average numbers were 12.5 for subtype 1a, 13.5 for 1b and 13.2 for the whole data set).
As with genetic diversity at the nucleotide level, the distribution of significantly changing positions was not homogenous along the analyzed genome fragment (Figure 4a). For the E1-E2 region six sub-regions were defined, the three hypervariable regions described in this portion of the HCV genome (HVR1, HVR2 and HVR3) and the three intervening regions, the first one corresponding to the C-terminus of the E1 envelope protein coding gene and the two others already described in the E2 protein coding gene. In the 56 aa fragment corresponding to the E1 glycoprotein, significant changes were identified in only 11 positions, or 19.6% of the region ( Table 2). Most of these positions were found to vary in a single patient (Figure 4a). In contrast, for the region corresponding to the E2 glycoprotein a high proportion of the positions were identified as variable in this analysis (59 of 101, 58.4%). These positions were mainly localized in the HVR1 region (92.6% of its positions differed significantly between the two time samples in at least one patient), the HVR2 (55.6%) and the HVR3 (64.7%). Furthermore, most positions identified as variable Most, but not all, positions identified as evolving under positive selection in at least one patient corresponded to sites whose amino acid composition was found to be significantly different between the two different sample points in each patient (Table 3 and Table  S3). The general distribution of these two kinds of sites was very similar (Figures 4a and 4b), with a lower number of sites identified to evolve under positive selection than changing in amino acid composition during the antiviral treatment.
For each of the six regions considered, levels of synonymous (K s ) and non-synonymous (K a ) substitutions were estimated for each patient (data shown in Table S4). A wide range of K s values, from 0 to 0.32 substitutions/site, was observed. Moreover, there were no clear differences among the different regions considered. On the contrary, K a values also presented a wide distribution, ranging between 0 and 0.27 subst./site, but in this case clear differences were observed among the six regions analyzed. Three different groups could be distinguished: firstly, the HVR1 region (with values ranging between 0 and 0.26 subst./site) showed the highest values for K a ; secondly, regions HVR2 and HVR3, with K a values ranging from 0 to 0.1 subst./site, were characterized by similar and intermediate values of K a ; and finally, a third group, comprising the three intervening regions and with K a values ranging from 0 to 0.05 subst./site, showed the lowest values of K a , very close to 0 in most cases.
The analysis of the changes in synonymous and non-synonymous substitutions between different samples from the same patient in the six sub-regions considered in the E1-E2 region allowed a better appreciation of the evolutionary processes involved in the virus response during treatment (Table S5). Globally, a general increase in the amount of synonymous substitutions was observed for the six subregions considered, ranging from 8% to 117%. This might be attributed to the mutagenic effect of ribavirin [59]. A more detailed examination of these results and those of changes in the Ka/Ks ratios revealed substantial variability in this response. Some patients showed increased levels of Ks after antiviral treatment in all, or most, subregions considered, but in others the pattern was the opposite one, with fewer synonymous mutations after treatment than before it (Table S5). Furthermore, in all but one case (patient G26) in which a significant number of positively selected sites were detected ( Table 3) the pattern of change in Ks and Ka/Ks corresponded to a reduction in both parameters. The corresponding phylogenetic trees revealed highly monomorphic viral populations after treatment while those before this was initiated were very variable (see trees corresponding to patients A20, C08 and C17 in Supplementary Figure S1). Interestingly, a general trend towards negative correlations was found between DK s and the number of positively selected sites for each sub-region, but there were too few data points to test its significance reliably. This trend became marginally significant for some sub-regions when considering the total number of codons positively selected at the E1-E2 region (data not shown).

Evolutionary changes during antiviral treatment in the NS5A region
We applied the same analyses previously described for the E1-E2 region to the NS5A region, although considering its specific   (Table S6), with a maximum of 22 positions (in patient C29) to a minimum of none, found in two subtype 1a patients (A34 and C17) and four of subtype 1b (A35, G06, G07 and G22). There were no significant differences in the number of such positions detected between the two viral subtypes (average numbers were 1.25 for subtype 1a, 4.44 for subtype 1b and 3.38 for the whole set). The distribution of significantly changing positions was not homogeneous along the analyzed NS5A region, although to a lower extent than for the E1-E2 region (Figure 4c, Table 2). For the NS5A region we considered five different sub-regions: NS5A_1, ISDR, Rest of PKR-BD, NS5A_2, and the V3 domain. In the 70 amino acid fragment corresponding to the NS5A_1 region significant changes were identified in only 5 positions (7.1%), and all but one were detected in a single patient. The best evolutionary model for sequences derived from each patient included a class allowing for positively selected codons (v.1) in 18 of the 22 analyzed patients ( Table 3). The four exceptions corresponded to subtype 1a-infected patients A20 and C17 and subtype 1b-infected patients A35 and G26. Among the 18 patients in which the virus was estimated to evolve under a model incorporating positive selection, in 15 of them we identified at least one amino acid with a posterior probability .0.95 of having evolved under positive selection, and the number of such positions ranged between 1 (patient A09) and 9 (patient G16).
In contrast to the E1-E2 region, the distribution of sites evolving under positive selection was relatively homogeneous (Tables 2 and 3 and Figure 4d). There were only two remarkable regions: the one denoted as low variability region, which showed a very low number of positively selected sites (4 of 70, 5.7%), and the V3 domain, which showed the highest proportion of positively selected sites (10 of 24, 41.7%). The frequency of sites evolving under positive selection was very similar between the PKR-BD and the remaining positions of the NS5A region. In any case, it is also important to note that most positively selected sites were detected in only one patient (24 of 39, 61.5%), 13 sites were detected in two patients (33.3%), and only two sites were detected in three patients (5.1%). In analogy with the E1-E2 region, most positions identified as evolving under positive selection in at least one patient corresponded to sites whose amino acid composition was found to be significantly different during antiviral treatment (Figure 4c, d).
We observed a wide range of K s values, ranging between 0 and 0.2 subst./site, but without clear differences among the different regions considered. The distribution of K a values was not as wide as for the E1-E2 region, ranging between 0 and 0.05 subst./site, but again in this case significant differences were observed among the five regions analyzed. Three different groups could be distinguished (Supplementary Table 7): firstly, the V3 domain (with K a = 0 to 0.05 subst./site) showed the highest values for K a ; secondly, the regions termed Rest of PKR-BD and NS5A_2 (with K a = 0 to 0.02 subst./site) were characterized by similar and intermediate values of K a ; and finally, a third group, comprising the NS5A_1 and the ISDR, with K a = 0 to 0.007 subst./site, showed the lowest values of K a , except for four cases at the ISDR showing intermediate or even high K a values.
A detailed analysis of the pattern of evolutionary changes in each of the five sub-regions considered within NS5A (Supplementary Table S8) revealed a similar pattern to that observed in the E1-E2 region, but with a general decrease in values for all parameters. Overall, there was a small increase in the levels of synonymous substitutions and a slight decrease in the change of Ka/Ks before and after treatment. However, there was no clear relationship between the direction of change in the levels of synonymous substitutions and the detection of positively selected codons ( Table 3, Table S8), not even for the three patients with a significantly larger number of such positions detected (C05, G16 and G19, with 8, 9, and 8 positions, respectively). Once again, the corresponding phylogenetic trees showed markedly different patterns with one patient, C05, presenting similarly variable, non monophyletic groupings for sequences obtained before and after treatment, another patient, G17, with a relatively more monomorphic viral population after treatment, and yet another, G06, with an almost monomorphic population at T0 replaced by a highly variable one after antiviral therapy ( Figure S2).

Discussion
Lack of response to antiviral treatment is presumably associated with the ability of viral populations to escape from the deleterious effects of the effective agent(s). For such apparently simple organisms as viruses, the escape response depends on the existence of resistance mutations at different genome locations. Tremendous efforts have been devoted to identify which particular mutations are responsible for HCV resistance to interferon and ribavirin, which has resulted in the identification of several genome regions presumably associated to viral escape but no specific variant(s) have been found to consistently produce this phenotype. Nevertheless, several reports, including our own, have described that the levels of genetic diversity in different HCV genome regions correlate with treatment failure, with higher variability levels before treatment in isolates from non-responders than from responders [35][36][37][38].
In this work, we have observed a wide range of genetic variation in non-responder patients in viral populations sampled before initiation of antiviral therapy in the two genome regions analyzed, E1-E2 and NS5A. Hence, genetic diversity in these regions does not seem to be a good predictor of sustained viral response to antiviral therapy, since all these patients were non-responders. Furthermore, we have shown that there is not a single, common pattern in the variation of the nucleotide diversity during the antiviral treatment, especially in the NS5A region, although the E1-E2 region presents a slight trend to increase genetic diversity (Figure 1 and Tables 1 and 2). Therefore, our results show that neither the genetic diversity level nor its rate or pattern of change during treatment can be taken as predictors for the response to antiviral treatment because they are different for different patients despite these showing the same outcome.
The absence of a common response to antiviral treatment in these viral populations extends not only to genetic variability but also to more general patterns of evolution. This is reflected in a wide diversity of patterns in the phylogenetic trees derived for the two genome regions from the viral sequences obtained before and after treatment. Within-patient phylogenetic trees of the infecting viruses presented from very homogeneous, highly differentiated populations at the two sampling points, to cases in which both constituted a single, highly variable population with no signs of differentiation between the two samples. All intermediate possibilities were also found, including cases with an almost monomorphic initial population which was transformed into a highly variable one, to the reverse case.
This lack of a common pattern is also revealed by the detection of positive selection in these two regions. Although most patients presented positively selected sites at one or the other genome region, no such sites were detected in patient A35. On average, the number of such sites was higher in the E1-E2 than in the NS5A region, but there was no correlation within patients (r = 20.0567, P.0.10, Table 3). Patients with a large number of positively selected sites in the E1-E2 region showed none (A20, G26) or few (C08) such sites in the NS5A but patient C05, third in the ranking of sites in E1-E2, was second in NS5A. The reverse is also true for patients with most selected sites in the NS5A region.
These differences are likely reflecting the different role of the two fragments analyzed. If we consider the distribution of changes along the E1-E2 region, it is remarkable the high degree of conservation of the first 56 amino acids, which correspond to the C-terminus of the envelope 1 protein (Table 3 and Figure 4). This fragment presents a hydrophobic region apparently involved in multiple functions, such as the maintenance of E1 and E2 proteins in the endoplasmic reticulum or the formation of heterodimers between E1 and E2 proteins [39][40][41], suggesting its involvement in the viral replication cycle and accounting for the high degree of conservation detected therein. In contrast, the sequenced portion of this region that encodes the E2 protein presents a higher level of variability, mainly concentrated in the three hypervariable regions. The highest number of changes has been found in the HVR1, where most HCV antigenic sites have been reported [42,43]. Hypervariable regions HVR2 and HVR3, in which several antigenic sites have been reported [13,44] also show high levels of variability, although to a lower extent than HVR1. The recently described HVR3 [14,15] shows a slightly lower variability than HVR1 and HVR2, which correlates with a lower exposition of its antigenic sites as inferred from structural models [13]. This pattern is also reflected in the analysis of genetic divergences, which showed the highest K a for HVR1, intermediate levels for HVR2 and HVR3, and very low levels for the remaining sub-regions included in this E1-E2 fragment (Table S4).
Levels of genetic variability in the NS5A region were lower than in the E1-E2 fragment analyzed. The first 70 amino acids of the NS5A region showed a high degree of conservation, whereas the highest variability was found in the V3 domain, which has been postulated to be involved in responsiveness to interferon [16,17,45] and where a certain degree of variability has been described [46]. The remaining sub-regions in this fragment showed an intermediate degree of variability. The analyses of genetic divergences further corroborated these observations, showing the highest K a for the V3 domain, the lowest values for the first 70 amino acids of the fragment and the ISDR (with some exceptions), and intermediate K a values for the rest of the fragment (Table S7). On the whole, these and other similar results [47] indicate that the NS5A protein is subject to strong evolutionary restrictions, probably because of its role in replication mechanisms [48,49]. Moreover, the low levels of variability present in the PKR-BD, and more specifically in the ISDR, are probably due to the existence of a specific sequence involved in response to interferon [18,29,50,51].
A correlation in the number of amino acid changes between both regions was observed for composition analysis but not for positive selection analysis. This could reflect the presence of different selective pressures acting on each region, and consequently of hitchhiking phenomena. The absence of recombination in HCV along with the enhanced selective pressures during antiviral treatments would facilitate the presence of hitchhiking selection [52,53], in which the regions under the strongest selective pressures would drive the course of evolution in the rest of the genome. In this situation, the level of linkage between regions would depend on the time elapsed between the hitchhiking events and the subsequent readjustment phenomena in the affected regions. Although the high mutation rates in the HCV genome will certainly complicate these analyses, the role of hitchhiking selection in the evolution of HCV deserves a closer scrutiny.
Genetic variability, amino acid composition and positive selection analyses reflect the enormous heterogeneity of adaptive solutions shown by viral populations infecting each patient. These results are further corroborated by the phylogenetic analyses, where the diversity of tree structures in the pool of patients for both analyzed regions is remarkable, thus precluding to discern general patterns of viral adaptation. Additionally, the analysis of divergence is consistent with the previous results, providing evidence for the particular adaptation routes exhibited by each patient.
In agreement with our results, it has been previously shown that the adaptive solutions adopted by RNA virus populations are convergent to a certain extent [54]. However, although positions detected to evolve under positive selection are mainly concentrated in the hypervariable regions, there are too many of these so to establish clear patterns of adaptation to the strong selective pressures exerted by the immune system and antiviral drugs. At this point, it is important to remark the difficulty in distinguishing between changes due to selective pressures imposed by the immune system from those specific to antiviral therapy.
The addition of ribavirin is likely to mask adaptive events even further. The precise mechanism of action of ribavirin is not completely understood [55] and different mechanisms have been recently shown. It has been suggested that the anti-HCV effect of ribavirin is partly mediated via the up-regulation of PKR activity [56]. Alternatively, it has been proposed that ribavirin acts as an RNA mutagen [57], in which case a possible mechanism for resistance could depend on increasing replication fidelity by means of the accumulation of mutations in the polymerase [58]. In fact, the mutagenic effect of ribavirin has been confirmed very recently [59], although this is still a controversial issue [60]. We have detected a global increase in the levels of synonymous substitutions after failed treatment, which could be due to the mutagenic effect of ribavirin. However, as indicated above, there are cases in which the change is in the opposite direction. But we have also found that the detection of large numbers of positively selected sites in the E1-E2 region is usually associated to a reduction in the level of synonymous substitutions and to a less polymorphic viral population after treatment. The most plausible interpretation for this is that the stronger the selective pressures on viral population (imposed by antiviral treatment and host immune response), the higher the initial reduction in genetic variability. Alternatively, for those populations with more positively selected sites, an increased fidelity of the corresponding HCV polymerase could also account for the observed reduction in the levels of synonymous substitutions. In this respect, mutations in the NS5B protein, which is the RNAdependent RNA polymerase in HCV, could be under strong selective pressure and, consequently, variation in other genome regions, such as the hypervariable regions, could eventually become a surrogate marker of these selection events. From this perspective, future studies should also focus on the genetic analysis of the NS5B protein and its potential correlation to sensitivity to ribavirin [61].

Patients and samples
Serum samples were obtained from 22 patients infected with HCV genotype 1, seven of which were infected with subtype 1a and 15 with subtype 1b. These patients were included in a prospective study in which serum samples (T0 samples) were taken immediately before they were subjected to a combined treatment of pegylated interferon-2a plus ribavirin. After 6 or 12 months, treatment was discontinued since viral load did not decrease more than 2 logs and a second serum sample was obtained for analysis (T1 and T2 samples for 6 and 12 months, respectively). In a few cases, samples were available for both 6-and 12-months time points. Samples were obtained in different hospitals from the Comunidad Valenciana, Spain (Table 4). All patients provided written consent to be included in the study which was approved by the corresponding ethics committees of the institutions involved (Hospital General de Valencia, Hospital Clínico Universitario de Valencia and Hospital General de Alicante).
Two HCV genome regions were studied: one corresponded to a 472 nt fragment encompassing genes encoding proteins E1 and E2 (from nucleotide 1322 to 1793 in the HCV-J reference genome sequence, accession number AF009606 [62], including the three hypervariable regions HVR1, HVR2 and HVR3), and referred to as E1-E2 region, and the other corresponding to a 743 nt fragment from gene NS5A (nucleotides 6742 to 7484), including the interferon sensitivity determining region (ISDR) and the V3 domain and referred to as NS5A region.

Experimental procedures
RNA extraction, reverse transcription, amplification, cloning and sequencing, are described in detail in [63]. Briefly, after viral RNA extraction (High Pure Viral RNA Kit; Roche), reverse transcription reactions were performed with random hexadeoxynucleotides in order to prevent any bias during reactions due to unspecific oligonucleotides. Primers used for subsequent PCR are detailed in [64]. Amplified DNA products for each region were purified with High Pure PCR product Purification Kit (Roche) and directly cloned into EcoRVdigested pBluescript II SK (+) phagemid (Stratagene). Plasmid DNA was purified with High Pure Plasmid Isolation Kit (Roche). Cloned products for E1-E2 and NS5A regions were sequenced using vector-based primers KS and SK (Stratagene). For the E1-E2 region, we obtained about 100 clones from each patient, yielding a total of 4690 sequences, 2232 from T0 samples, 1447 from T1 samples and 1011 from T2 samples. For the NS5A region, we obtained between 25 and 96 clones per sample and 2486 sequences in total were determined (see Table  S1 in Supplementary data). HCV sequences obtained in this study have been deposited in GenBank with accession numbers given in Tables S1 and S2.

Genetic variability analysis
Sequence alignments were obtained using CLUSTALX v1.81 [65]. DnaSP 3.51 [66] was used to estimate, for both E1-E2 and NS5A regions, the following measures of genetic variability in the viral samples of each patient: number of polymorphic sites (S), total number of mutations (g), number of haplotypes (nHap) and nucleotide diversity (p). Synonymous (K s ) and nonsynonymous (K a ) substitutions Synonymous (K s ) and nonsynonymous (K a ) substitution per synonymous and nonsynonymous site, respectively, were estimated for each patient from data derived from the corresponding T0 sample using the Nei-Gojobori method implemented in the program MEGA [67]. Standard errors of K s and K a were obtained by bootstrap resampling with 500 pseudoreplicates. According to structural and functional properties, the 472-nt fragment of the E1-E2 region was divided into six different sub-regions for K s and K a estimation: the E1 sub-region, corresponding to E1 protein

Changes in amino acid composition during treatment
For both regions, amino acid composition was determined for each sample and the different sets of sequences corresponding to each patient (T0 sample versus T1 or T2 sample) were compared with program VESPA [68]. Tests for differences in the composition at each amino acid position between the two timepoints were carried out by means of a G-test. Significance levels for multiple comparisons were corrected by Bonferroni's method.
Positively selected amino acid positions during the treatment For each patient, a maximum likelihood approach [69] implemented in the PAML package 3.15 [70] was used to investigate the presence of positively selected codons in the E1-E2 and NS5A regions. Two criteria were employed to assign the best evolutionary model to each patient (independently for each region): a likelihood ratio test (LRT), which compares the fit of two nested models to the data [69]; and the Akaike information criterion (AIC), which allows to perform comparisons between non nested models [71]. For all patients and genome regions, six models were compared with the PAML package: M0, M1, M2, M3, M7 and M8. For models M2, M3 and M8, the existence of positively selected codons is allowed as they incorporate a class of codons for which v = K a /K s (ratio of non-synonymous and synonymous substitution rates) can be .1. Therefore, whenever one of these models explained the observed data significantly better than the other corresponding alternative in which such a class is not allowed, then the existence of positively selected codons was inferred. Next, a Bayes empirical Bayes (BEB) procedure [72] was applied to detect codons with a posterior probability of belonging to the v.1 class larger than 0.95.

Phylogenetic trees and rates of molecular evolution
Maximum likelihood trees were constructed with PHYML [73] using a common evolutionary model (GTR+I+G) and common outgroup sequences, H77 (accession number NC_004102) for subtype 1a sequences and HCV-J (accession number D90208) for subtype 1b. These two outgroup isolates represent epidemiologically unrelated strains to those included in our study. HCV-H77 was isolated from an American patient in 1979 [74] and HCV-J was derived from a Japanese patient in the late 1980's [75]. Rates of evolution for the different time samples of each patient were estimated by removing the outgroup from each phylogenetic tree and then computing the average length of the arms for all the sequences from each time sample. Figure S1 Phylogenetic trees for the E1-E2 region from all 22 analyzed patients. Different symbols are used to denote sequences sampled at T0 (red dots), T1 (green dots) and T2 (blue dots).