Hepatitis C virus genetic diversity by geographic region within genotype 1-6 subtypes among patients treated with glecaprevir and pibrentasvir

Hepatitis C virus (HCV) is genetically diverse and includes 7 genotypes and 67 confirmed subtypes, and the global distribution of each HCV genotype (GT) varies by geographic region. In this report, we utilized a large dataset of NS3/4A and NS5A sequences isolated from 2348 HCV GT1-6-infected patients treated with the regimen containing glecaprevir/pibrentasvir (GLE/PIB) to assess genetic diversity within HCV subtypes by geographic region using phylogenetic analyses, and evaluated the prevalence of baseline amino acid polymorphisms in NS3 and NS5A by region/country and phylogenetic cluster. Among 2348 NS3/4A and NS5A sequences, phylogenetic analysis identified 6 genotypes and 44 subtypes, including 3 GT1, 8 GT2, 3 GT3, 13 GT4, 1 GT5, and 16 GT6 subtypes. Phylogenetic analysis of HCV subtype 1a confirmed the presence of two clades, which differed by geographic region distribution and NS3 Q80K prevalence. We detected phylogenetic clustering by country in HCV subtypes 1a, 1b, 2a, 2b, and 5a, suggesting that genetically distinct virus lineages are circulating in different countries. In addition, two clades were detected in HCV GT4a and GT6e, and NS5A amino acid polymorphisms were differentially distributed between the 2 clades in each subtype. The prevalence of NS3 and NS5A baseline polymorphisms varied substantially by genotype and subtype; therefore, we also determined the activity of GLE or PIB against replicons containing NS3/4A or NS5A from HCV GT1-6 clinical samples representing 6 genotypes and 21 subtypes overall. GLE and PIB retained activity against the majority of HCV replicons containing NS3/4A or NS5A from HCV GT1-6 clinical samples, with a median EC50 of 0.29 nM for GLE and 1.1 pM for PIB in a transient replicon assay. This report expands the available data on HCV epidemiology, subtype diversity by geographic region, and NS3 and NS5A baseline polymorphism prevalence.


Introduction
Hepatitis C virus (HCV) infection affects 71 million people worldwide and is associated with liver disease and hepatocellular carcinoma [1][2][3][4]. HCV is genetically diverse and is classified into 7 genotypes, 67 confirmed subtypes, and 20 provisionally assigned subtypes [5]. The global distribution and prevalence of each HCV genotype (GT) varies by geographic region. HCV GT1 is the most prevalent worldwide and has a widespread geographic distribution, representing 46% of all HCV infections [1]. HCV GT3 is the second most prevalent genotype and accounts for 30% of global infections, and is more common in South Asia, Australasia, and some countries in Europe [1,6]. HCV genotypes 2 and 4 are the next most common, each representing 9-13% of HCV infections with more limited geographic distribution. GT2 prevalence is higher in Asia and West Africa, while a high incidence of GT4 infection occurs in Central and Eastern sub-Saharan Africa, North Africa, and the Middle East [1,6]. HCV genotypes 5, 6, and 7 are the most restricted in geographical distribution, with GT5 common in South Africa and GT6 prevalent in East and Southeast Asia [1], while GT7 infection has been reported in a small number of individuals from the Democratic Republic of Congo [7].
HCV genotypes and subtypes differ at the nucleotide level by approximately 30% and 15%, respectively, and subtype diversity varies by genotype [5]. HCV genotypes 2, 4, and 6 are the most diverse and include 11, 17, and 24 subtypes, respectively [5]. Subtype prevalence and diversity has been reported to vary by geographic region and country [6,8]. In GT1a, two viral clades differ by geographical distribution and occurrence of nonstructural protein 3 (NS3) protease polymorphisms [9,10]. Sequence clustering by geographic region has also been reported for other HCV subtypes based on phylogenetic analysis [8], indicating that genetically distinct virus lineages are circulating in different countries. Due to the high genetic diversity across HCV genomes, the efficacy of direct-acting antiviral (DAA) regimens for the treatment of HCV infection can be impacted by HCV genotype, subtype, and the presence of baseline amino acid polymorphisms [11]. Depending on the local prevalence of resistance-associated amino acid polymorphisms, the genetic diversity of HCV subtypes in different parts of the world may impact the treatment options available to HCV-infected patients.
Interferon-free HCV treatment options with DAAs have expanded from the earlier regimens of sofosbuvir (SOF) plus ribavirin (RBV) [12,13], SOF plus daclatasvir [14], SOF plus simeprevir [14,15], SOF/ledipasvir [16][17][18], ombitasvir/paritaprevir/ritonavir (OBV/PTV/r) with or without dasabuvir [19,20], and elbasvir/grazoprevir [21,22]. Newer HCV treatment options include pan-genotypic regimens with shorter treatment duration and indications for difficult-to-treat patient populations [23], including glecaprevir/pibrentasvir [24][25][26][27][28][29][30], SOF/velpatasvir (VEL) [31,32], and SOF/VEL/voxilaprevir [33]. Glecaprevir (GLE) is an NS3/4A protease inhibitor (identified by AbbVie and Enanta) [34], and pibrentasvir (PIB) is an NS5A inhibitor [35]. The regimen of GLE/PIB is highly efficacious for the treatment of HCV GT1-6 infection [24][25][26][27][28][29][30]. Among 9 phase 2/3/3b studies that evaluated the safety and efficacy of GLE/PIB (300/120 mg dose) for the treatment of HCV GT1-6 infection, 2440 patients were enrolled in 27 countries that spanned five geographic regions worldwide, representing a large database of genetically and geographically diverse HCV GT1-6 patient samples. In this report, we utilized a large dataset of NS3/4A and NS5A sequences isolated from 2348 HCV GT1-6-infected patients treated with GLE/PIB to assess genetic diversity within HCV subtypes by geographic region, examine the prevalence of baseline amino acid polymorphisms by region/country and phylogenetic cluster, and determine the activity of GLE or PIB against subgenomic HCV replicons containing NS3/4A or NS5A from HCV GT1-6 clinical samples.
AbbVie is committed to responsible data sharing regarding the clinical trials we sponsor. This includes access to anonymized, individual and trial-level data (analysis data sets), as well as other information (e.g., protocols and Clinical Study Reports). This includes requests for clinical trial data for unlicensed products and indications. The clinical trial data relevant to this study can be requested by any qualified researchers who engage in rigorous, independent scientific research, and will be provided following review and approval of a research proposal and Statistical Analysis Plan (SAP) and execution of a Data Sharing Agreement (DSA). Data requests can be submitted at any time and the data will be accessible for 12 months, with possible extensions considered. For more information on the process, or to submit a request, visit the following link: https://www.abbvie.com/our-science/clinical-trials/clinical-trials-data-and-information-sharing/data-and-information-sharing-with-qualified-researchers.html.

Clinical studies
SURVEYOR-1 (ClinicalTrials.gov identifier NCT02243280) and SURVEYOR-2 (ClinicalTrials.gov identifier NCT02243293) were open-label, dose-ranging, phase 2/3 studies that evaluated the safety, efficacy, and pharmacokinetics of GLE plus PIB with and without ribavirin (RBV) for 8, 12, or 16 weeks duration in 868 patients with chronic HCV GT1-6 infection, with or without compensated cirrhosis [27][28][29]. The HCV patient sequences analyzed from the SURVEYOR-1 and SURVEYOR-2 studies were isolated from the baseline samples of 590 patients who received the study dose of GLE 300 mg plus PIB 120 mg. The ENDURANCE-1, -2, -3, and -4 studies were phase 3 studies that evaluated the safety and efficacy of GLE/PIB (300/120 mg) for 8 [26]. Clinical studies SURVEYOR-1, -2, ENDURANCE-1, -2, -3, -4, -5/6, and EXPEDITION-1, -4 enrolled HCV-infected patients who were treatment-naïve or treatment-experienced with pegylated-interferon ± RBV ± SOF. The current report is a post-hoc analysis of data from previously published multinational clinical trials SURVEYOR-1, -2, ENDURANCE-1, -2, -3, -4, -5/6, and EXPEDITION-1, -4. All clinical studies were conducted in accordance with the World Medical Association Declaration of Helsinki and the guidelines of the International Conference on Harmonization. The study protocols were approved by the relevant institutional review boards and regulatory agencies at each individual site, and all patients provided written informed consent. The names of the ethics committees and institutional review boards that approved these studies can be found on the trial registration at www.clinicaltrials.gov.

Subtype determination
Viral RNA isolation, reverse transcriptase (RT)-PCR, and nested PCR were conducted using genotype- and subtype-specific primers on >2300 available baseline plasma samples, as previously described in detail for full-length NS3/4A and NS5A genes [34,35,37-41]. For each sample, HCV genotype was identified by the Versant HCV Genotype Inno-LiPA Assay v2.0 at the time of enrollment in the study, and HCV subtype was subsequently determined by neighbor-joining phylogenetic analysis of full-length NS3/4A and NS5A nucleotide sequences, as described in detail previously [40,41]. The subtype for each sample was assigned based on agreement between NS3/4A and NS5A results when both gene sequences were available, or was based on data from a single gene target when data were not available from both genes.

Next-generation sequencing
NGS analysis of NS3/4A and NS5A amplicons from 2348 baseline samples was conducted by DDL Diagnostic Laboratory (Rijswijk, Netherlands) for studies SURVEYOR-1, -2, ENDURANCE-1, -3, -4, -5/6, and EXPEDITION-1, and by Monogram Biosciences (South San Francisco, CA, USA) for studies ENDURANCE-2 and EXPEDITION-4, using methods that were described in detail previously [40]. PCR amplicons were purified and quantified by DDL Diagnostic Laboratory using methods previously described [40]. PCR amplicons were purified, quantified, and normalized by Monogram Biosciences using the AxyPrep Mag PCR Normalizer kit (Corning Life Sciences, Corning, NY). PCR amplicons were fragmented and tagged using the Nextera XT sample preparation and index kits (Illumina, San Diego, CA). Paired-end sequencing was conducted using the Illumina MiSeq platform, and FASTQ files were mapped against a subtype-specific reference sequence. Sequences were trimmed to remove nucleotides with a Q-score <25, and sequence reads less than 50 bases in length were discarded. Amino acid substitutions relative to a subtype-specific HCV reference sequence were reported by the sequencing vendor using a frequency threshold of ≥1%.
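The read-level quality filter described above can be illustrated with a minimal sketch. The vendors' actual pipelines are not described in the source, so several details here are assumptions: trimming is applied only at the 3' end, qualities use Phred+33 encoding, and the function name `trim_and_filter` is hypothetical.

```python
def trim_and_filter(seq, qual, min_q=25, min_len=50, phred_offset=33):
    """Trim low-quality bases from the 3' end, then apply the length filter.

    Returns the trimmed (seq, qual) pair, or None if the read is discarded.
    Assumption: 3'-end trimming only; the source does not specify the strategy.
    """
    scores = [ord(ch) - phred_offset for ch in qual]
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the cutoff.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    # Reads shorter than 50 bases after trimming are discarded.
    if end < min_len:
        return None
    return seq[:end], qual[:end]


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
kept = trim_and_filter("A" * 60, "I" * 55 + "#" * 5)
dropped = trim_and_filter("A" * 52, "I" * 47 + "#" * 5)
print(len(kept[0]), dropped)  # prints: 55 None
```

The first read keeps 55 bases after trimming and passes the length filter; the second falls to 47 bases and is discarded.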

Phylogenetic analysis
NGS consensus nucleotide sequences for NS3/4A and NS5A were generated with an ambiguity setting of 0.25 and aligned using MAFFT [42]. NS3/4A and NS5A sequences were grouped by genotype and subtype, and included in phylogenetic analyses to assess genetic relationships within each subtype by geographic region and country. Maximum likelihood phylogenetic trees were constructed using PHYML [43,44] in Geneious software (Biomatters Ltd., Auckland, New Zealand) [45] with the HKY85 nucleotide substitution model [46], 100 bootstrapping replicates, and additional parameters as described in detail previously [40]. Phylogenetic analysis was conducted for both NS3/4A and NS5A for each subtype, and one target was selected as the representative tree displayed in figures. Sequence clusters that matched between the phylogenetic trees of NS3/4A and NS5A with bootstrap support ≥50 that contained ≥5 sequences were identified with a number, starting with C1 for the cluster with the greatest number of sequences.
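The cluster-labeling rule in the last sentence can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: clusters are represented as sets of sample IDs with a bootstrap value per tree, "matched" is taken to mean identical membership in both trees, and the function name and sample IDs are hypothetical.

```python
def label_clusters(ns3_clusters, ns5a_clusters, min_bootstrap=50, min_size=5):
    """Label clusters shared by both trees as C1, C2, ... by decreasing size.

    Each argument maps a frozenset of sample IDs to its bootstrap support.
    Assumption: a cluster "matches" only if its membership is identical.
    """
    shared = []
    for members, ns3_support in ns3_clusters.items():
        ns5a_support = ns5a_clusters.get(members)
        if ns5a_support is None:
            continue  # cluster not reproduced in the NS5A tree
        supported = min(ns3_support, ns5a_support) >= min_bootstrap
        if supported and len(members) >= min_size:
            shared.append(members)
    # Largest cluster first; ties broken by member IDs for determinism.
    shared.sort(key=lambda m: (-len(m), sorted(m)))
    return {f"C{i}": sorted(m) for i, m in enumerate(shared, start=1)}


# Hypothetical clusters keyed by their member sample IDs.
big = frozenset(f"PL{i}" for i in range(7))
mid = frozenset(f"TW{i}" for i in range(5))
small = frozenset(f"AU{i}" for i in range(4))  # fails the size cutoff
weak = frozenset(f"US{i}" for i in range(6))   # fails the bootstrap cutoff
ns3 = {big: 92, mid: 71, small: 99, weak: 45}
ns5a = {big: 88, mid: 64, small: 97, weak: 80}
labels = label_clusters(ns3, ns5a)
print({name: len(ids) for name, ids in labels.items()})  # prints: {'C1': 7, 'C2': 5}
```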

NS3 and NS5A baseline polymorphism analysis
NS3 amino acid positions 36, 43, 54, 55, 56, 80, 155, 156, 166 (GT3 only), and 168 were considered signature positions for the NS3/4A protease inhibitor class in GT1-6. Baseline amino acid polymorphisms were identified by comparing translated baseline sequences to the respective subtype-specific reference sequence shown in S2 Table. The number of patients with "Any" baseline polymorphism in NS3 at signature amino acid positions was also calculated. The prevalence of baseline polymorphisms in NS3 at a detection threshold ≥15% was analyzed by subtype, geographic region, and phylogenetic cluster. NS5A amino acid positions 24, 28, 30, 31, 32, 58, 92 and 93 were considered signature positions for the NS5A inhibitor class in GT1-6, and polymorphisms were identified by comparing translated baseline sequences to the respective subtype-specific reference sequence shown in S3 Table. The number of patients with "Any" baseline polymorphism in NS5A at signature amino acid positions was also calculated. The prevalence of baseline polymorphisms in NS5A at a detection threshold ≥15% was analyzed by subtype, geographic region, and phylogenetic cluster.
The activity of GLE or PIB against chimeric transient replicons containing NS3, NS3/4A, or NS5A from GT1-6 clinical samples was assessed using the transient replicon assay as described in detail previously [34,35,37,38,40,47]. The 50% effective concentration (EC50) of GLE or PIB was calculated in Prism5 software (GraphPad Software, Inc., La Jolla, CA) using nonlinear regression curve fitting to the 4-parameter logistic equation. The average EC50 value for each clinical sample was calculated from at least 2 independent experiments each conducted in duplicate.
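As a minimal sketch of the polymorphism call described above, the function below compares per-position amino acid frequencies against a reference and reports variants at or above the 15% detection threshold. The data layout, the function name, and the example frequencies are assumptions for illustration; the reference residues shown (Y56, Q80, D168) stand in for a subtype-specific reference such as those in S2 Table.

```python
NS3_SIGNATURE = (36, 43, 54, 55, 56, 80, 155, 156, 168)  # 166 applies to GT3 only


def call_polymorphisms(reference, variant_freqs, positions=NS3_SIGNATURE, threshold=0.15):
    """Return polymorphisms such as 'Y56F' found at signature positions.

    reference: position -> reference amino acid (hypothetical layout).
    variant_freqs: position -> {amino acid: frequency in the NGS reads}.
    """
    calls = []
    for pos in positions:
        ref_aa = reference.get(pos)
        if ref_aa is None:
            continue  # position not covered by this illustrative reference
        for aa, freq in sorted(variant_freqs.get(pos, {}).items()):
            if aa != ref_aa and freq >= threshold:
                calls.append(f"{ref_aa}{pos}{aa}")
    return calls


# Hypothetical sample: F at position 56 in 97% of reads, K at position 80 in 4%.
reference = {56: "Y", 80: "Q", 168: "D"}
sample = {56: {"F": 0.97, "Y": 0.03}, 80: {"Q": 0.96, "K": 0.04}, 168: {"D": 1.0}}
calls = call_polymorphisms(reference, sample)
print(calls, bool(calls))  # prints: ['Y56F'] True
```

The boolean corresponds to the "Any" category: a sample counts once if at least one signature position carries a polymorphism at the chosen threshold. Lowering `threshold` to 0.01 would also report the minor Q80K variant in this example.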
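For reference, one common parameterization of the 4-parameter logistic equation is shown below. The source names only the equation and the software, so the symbols are assumptions chosen for illustration: $R(c)$ is the replicon signal at drug concentration $c$, $R_{\min}$ and $R_{\max}$ are the lower and upper plateaus, and $h$ is the Hill slope.

```latex
R(c) = R_{\min} + \frac{R_{\max} - R_{\min}}{1 + \left( \dfrac{c}{\mathrm{EC}_{50}} \right)^{h}}
```

Under this form, setting $c = \mathrm{EC}_{50}$ gives $R = (R_{\min} + R_{\max})/2$, the half-maximal response, which is what the fitted EC50 represents.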

Statistical analysis
Fisher's exact test with a two-sided significance level of 0.05 was used to compare sequence distribution by geographic region and phylogenetic clade or cluster, as well as the prevalence of baseline polymorphisms in NS3 and NS5A by geographic region and phylogenetic clade or cluster, without multiplicity adjustment.
(Fig 1). No GT7-infected patients were enrolled among the 9 phase 2/3/3b studies. Subtype diversity was highest in GT2, GT4, and GT6, and included 37 of the 52 confirmed subtypes among these 3 genotypes [5].

HCV GT1-6 subtypes by country of enrollment
The distribution of countries was the most diverse for subtypes 1a and 1b, representing enrollment of patients from 22 and 24 countries, respectively. In subtype 1a (n = 395), 28.1% of patients were from the United States, 11.4% were from Canada, and 8.4% were from Australia. In subtype 1b (n = 466), the patient distribution was generally evenly divided among 24 countries, including 9.2%, 8.8%, and 7.5% of patients from the United States, Romania, and Poland, respectively. Six samples subtyped as either GT1a (n = 4) or GT1b (n = 2) by phylogenetic analysis of NS3/4A and NS5A sequences were included in subsequent sequence analyses, and were later determined to be GT2/1 chimeras based on full-genome sequencing; one subtype 1b sample from Korea and four subtype 1a samples from the United States were 2b/1 chimeras, and one subtype 1b sample from the United States was a 2k/1b chimera.
Among GT2-infected patients, subtypes 2a, 2b and 2c were the most prevalent (92.8%), and the country distribution varied depending on the subtype (Fig 1). In GT2a (n = 152), the majority of patients were from Asian countries of South Korea (38.8%) and Taiwan (14.5%), while in GT2b (n = 272) 79.4% of patients were from the United States. GT2c-infected patients (n = 80) were mainly from Italy (67.5%) and Belgium (13.8%), while the majority of patients infected with other GT2 subtypes (n = 39) were from France. The majority of the GT3-infected patients enrolled had subtype 3a infection (98.7%). Among GT3a-infected patients (n = 627), the majority were from North America or Oceania, including 38.3% from the United States, 13.7% from Australia, and 11.6% from New Zealand, while patients infected with subtype 3b (n = 6) or 3i (n = 2) were from Australia, Canada, or the United Kingdom. The majority of the GT4-infected patients had subtype 4a or 4d infection (78.9%, Fig 1). Among 82 GT4a-infected patients, 48.8% were from the United States, while GT4d-infected patients (n = 53) were predominantly from Europe, and patients infected with other GT4 subtypes (n = 36) were generally from Belgium, France, or Canada. In GT5a (n = 53), 30.2% of patients were from South Africa, 32.1% were from Belgium, and 26.4% were from France. GT6-infected patients were predominantly from Canada, the United States, France, and Australia (Fig 1).
In NS5A, the prevalence of polymorphisms at amino acid positions 24, 30, 31, 92, and 93 was similar across geographic region and phylogenetic clade (GT1a, Table 3). Polymorphisms at NS5A amino acid position P32 were not detected in GT1-6 sequences. In the Oceania region, NS5A polymorphisms at position M28 were less prevalent (3%, 1/39), while polymorphisms at position H58 were more common (13%, 5/39) compared to other geographic regions. NS5A polymorphisms at positions important for the inhibitor class had a similar prevalence in clades 1 and 2 of subtype 1a.

Phylogenetic clustering by geographic region in HCV subtypes 1b, 2a, 2b, and 5a
Phylogenetic analysis of 466 GT1b NS3/4A and NS5A sequences identified 9 sequence clusters with strong branch support, and clustering by country was most notable for sequences from Poland and Taiwan (Fig 3A). Cluster C1 (n = 13) contained sequences from Korea and Taiwan, and included 37% (11/30) of the total GT1b sequences from Taiwan. Among sequences from Poland, 45.7% (16/35) clustered in subgroups C5 and C6, which were comprised almost entirely (90-100%) of sequences from Poland (Table 1). Other smaller sequence clusters in subtype 1b included sequences exclusively from Asia (C8) or Australia (C9).
The prevalence of baseline polymorphisms in NS3 and NS5A was assessed by geographic region and phylogenetic cluster, and is shown in Tables 2 and 3, respectively. In GT1b, the prevalence of NS3 polymorphisms at amino acid positions 36, 54, 55, 155, and 168 was similar across geographic region and phylogenetic cluster (GT1b, Table 2). The most common polymorphism was Y56F, detected at 32% (147/461) prevalence overall. In general, Y56F was evenly distributed by geographic region, except that it was not detected in sequences from Oceania (0/10). The highest prevalence of Y56F was detected in sequences from Poland and Spain, with a prevalence of 51% (18/35) and 53% (18/34), respectively. Although the prevalence of Y56F was similar across geographic regions, the frequency of detection ranged from 0-100% in phylogenetic clusters C1-C9. Presence of Y56F was highest in clusters C7 (100%), C4 (70%) and C5 (60%), and detection was not limited to a specific country in those sequence clusters. NS3 Q80H/K/L/R polymorphisms were detected at an overall prevalence of 5% (23/461) in GT1b, but occurred at the highest prevalence in sequences from Asia (17.5%, 11/63). Polymorphisms at position Q80 were detected in 15-29% of sequences in clusters C1, C4, and C7.

Two clades identified by phylogenetic analysis in HCV subtypes 4a and 6e
Phylogenetic analysis of 82 HCV GT4a sequences identified 2 clades in subtype 4a (Fig 4A and 4B). The region and country distribution of clade 1 (n = 35) was 40% North America, 51% Europe, and 9% New Zealand. The distribution of clade 2 (n = 47) was 70% North America and 30% Europe (Table 1). The prevalence of sequences from North America was significantly higher in clade 2 (P-value = 0.008, two-tailed Fisher's exact test), whereas the distribution of sequences from Europe was not significantly different between the 2 clades. The prevalence of baseline polymorphisms in NS3 and NS5A was assessed by geographic region and phylogenetic clade, and is shown in Tables 2 and 3, respectively. In NS3, T54S was detected at a prevalence of 5% (4/81) overall and was evenly distributed between the 2 phylogenetic clades (GT4a, Table 2). In NS5A, polymorphisms were detected at amino acid positions L28, L30, and P58 among 81 GT4a sequences (GT4a, Table 3). NS5A polymorphisms L28M and P58L/S/T were evenly distributed among the 2 clades. The L30R polymorphism was differentially distributed between clade 1 and clade 2, occurring at a frequency of 17% (6/35) in clade 1 versus 4% (2/46) in clade 2, although the difference did not reach statistical significance (P-value = 0.07, two-tailed Fisher's exact test).
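The L30R comparison above (6/35 in clade 1 versus 2/46 in clade 2) can be reproduced with a short standard-library sketch of the two-tailed Fisher's exact test. The source does not state which software performed the test; this version assumes the common definition of two-sidedness, which sums the probabilities of all tables no more likely than the observed one. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def weight(k):
        # Unnormalized hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = weight(a)
    # Integer arithmetic keeps the "no more likely than observed" test exact.
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return extreme / comb(total, col1)


# Rows: clade 1, clade 2. Columns: L30R present, L30R absent.
p_l30r = fisher_exact_two_sided(6, 29, 2, 44)
print(round(p_l30r, 2))  # prints: 0.07
```

The result agrees with the reported P-value of 0.07 for this comparison.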

Discussion
In this report, a large dataset of HCV GT1-6 NS3/4A and NS5A sequences was utilized to assess genetic diversity within HCV subtypes by geographic region. Among NS3/4A and NS5A sequences isolated from 2348 patient samples, phylogenetic analysis identified 6 genotypes and 44 subtypes, including 3 GT1, 8 GT2, 3 GT3, 13 GT4, 1 GT5, and 16 GT6 subtypes (Fig 1). In addition, we analyzed 20 GT2 samples where the subtype could not be determined by phylogenetic analysis due to lack of homology with the 11 confirmed GT2 subtypes, potentially representing novel GT2 subtypes. Subtype diversity was highest in GT2, GT4, and GT6, which are reported to be the most diverse genotypes and encompass 52 confirmed subtypes [5]. We detected phylogenetic clustering by country in HCV subtypes 1a, 1b, 2a, 2b, and 5a, suggesting that genetically distinct virus lineages are circulating in different countries. Since the prevalence of NS3 and NS5A baseline polymorphisms varied substantially by genotype and subtype among patients treated with a regimen of GLE/PIB, we also determined the activity of GLE or PIB against replicons containing NS3/4A or NS5A from HCV GT1-6 clinical samples representing 6 genotypes and 21 subtypes overall. In the transient HCV replicon assay, GLE and PIB retained activity against the majority of HCV replicons containing NS3/4A or NS5A from HCV GT1-6 clinical samples, confirming previous reports describing the pangenotypic activity of GLE and PIB [34,35]. A separate publication [39] recently presented the pooled resistance analysis of HCV GT1-6 infected patients treated with GLE/PIB in 8 registrational clinical studies, and revealed a lack of impact of genotype, subtype, or baseline polymorphism prevalence on treatment outcome with the recommended treatment duration [39,48,49]. HCV GT1 infection is the most prevalent and geographically disseminated genotype globally [6]. 
HCV subtype 1a is more common in North America, Andean Latin America, and Australia [6,50], and consists of two distinct phylogenetic clades [9,10]. Consistent with previous studies, our phylogenetic analysis of 395 HCV GT1a sequences from 22 countries confirmed the presence of 2 clades in subtype 1a, which differed by geographic region and NS3 Q80K prevalence. Similar to published reports [9,51], we found that the distribution of sequences from North America and the prevalence of the GT1a NS3 Q80K polymorphism were significantly higher in clade 1, while sequences from Europe were more common in clade 2 (P-value <0.0001). In our analysis, Q80K prevalence was significantly higher in sequences from North America (56%) compared to Europe (21%) or Oceania (2.5%; P-value <0.0001). The origin for this difference has been traced to a single virus lineage with Q80K in NS3 that occurred in the United States around 1940 [52].
Sub-clusters detected within subtype 1a clade 1 in our analysis were generally grouped by the presence or absence of Q80K in NS3. In clade 1, 94.7% of Australian sequences contained Q80 in NS3 and clustered in a strongly supported sub-cluster with other NS3 Q80 sequences, likely representing previously described sub-clade 1C [51]. Phylogenetic separation has been reported for subtype 1a sequences from North America and Australia [53], and Bayesian estimates place the origin of the epidemics for both continents around the early 20th century, coinciding with World War I [54]. Geographical separation of the continents likely resulted in genetically distinct sequences due to a founder effect in the two regions, which may also explain the relative absence of the NS3 Q80K polymorphism in the Australian sequences. While NS3 sequences have been analyzed extensively in subtype 1a due to the impact of Q80K on SVR rates with regimens containing some HCV protease inhibitors [55], corresponding data for NS5A have not been widely reported. Our analysis of NS5A sequences in subtype 1a revealed identical phylogenetic separation by clade, and NS5A polymorphisms at positions important for the inhibitor class had a similar prevalence in clades 1 and 2.
Phylogenetic clustering by geographic region has been reported for some HCV subtypes [8]. We detected phylogenetic clustering by country in HCV subtypes 1b, 2a, 2b, and 5a, likely due to geographical separation resulting in genetically distinct virus lineages. Among 466 GT1b sequences from 24 countries included in our analysis, clustering by country was most notable for sequences from Poland and Taiwan (Fig 3A), where GT1b comprises 77% [56,57] and 46% [58] of HCV infections, respectively. Incidence of HCV infection in Poland and Taiwan is higher in the intravenous drug use (IDU) population [59,60], and phylogenetic clustering has been reported for networks of people who inject drugs [61]. However, only 1 out of 65 total patients from Poland or Taiwan reported a history of IDU in our GT1b-infected patient population, so clustering of sequences from Poland and Taiwan is likely due to geographical separation rather than independent networks of IDU individuals.
Our analysis of HCV subtypes 2a and 2b detected independent clustering for the majority of sequences from Europe, New Zealand, Asia, and North America (Fig 3B and 3C). In GT2b-infected patients, the distribution of NS5A sequences with L/M31 polymorphisms varied by geographic region and phylogenetic cluster. Due to epidemiological differences among GT2 subtypes and a more limited global distribution for GT2 in general [1], clustering by country in GT2a and GT2b is likely due to the geographical separation of Europe, New Zealand, Asia, and North America. Similarly, in our analysis of 53 GT5a sequences, three phylogenetic clusters were detected representing the countries of Belgium, France, and South Africa (Fig 3D). GT5a is the most prevalent genotype in South Africa [6,57], but high prevalence of GT5a has also been reported in limited geographic areas of France [62,63] and Belgium [64]. Phylogenetic clustering has been reported for GT5a sequences originating from Belgium [64], and Bayesian phylogeny estimated the time to most recent common ancestor for Belgian and South African isolates to the late 1800s, demonstrating independent populations of GT5a circulating for over 100 years in both Belgium and South Africa [65].
Our analyses of GT2c, GT3a, GT4d, and GT6a sequences did not detect phylogenetic clustering by geographic region or country (S1 and S2 Figs). Phylogenetic clustering by country has been reported for GT3a sequences from Pakistan [66], India [8], and Russia [8], where GT3a occurs at a prevalence of 79% [67], 64% [57], and 36% [57], respectively, among HCV-infected individuals. While GT3a is the most prevalent genotype in Pakistan and India, our clinical studies did not enroll patients from these countries, likely explaining why we failed to see significant clustering in GT3a.
Two clades were detected in GT4a by phylogenetic analysis in our study, and we found that the prevalence of sequences from North America was higher in clade 2 based on country of enrollment. HCV GT4 infection occurs at a low frequency in HCV-infected patients from North America, and is most prevalent in Central and Eastern sub-Saharan Africa, North Africa, and the Middle East [1,6,68]. Egypt has the highest prevalence of GT4 infection worldwide [68,69], predominantly subtype 4a, which spread rapidly due to anti-schistosomiasis campaigns beginning in the 1940s [69][70][71]. In a previous study that examined NS5A genetic diversity in HCV GT4-infected patients treated with OBV/PTV/r, subtype 4a sequences from Europe and the United States clustered independently from 4a sequences from Egypt in a phylogenetic analysis, and the L30R/S polymorphism in NS5A was significantly associated with the Egyptian cluster [40]. Since we did not collect country of origin information from GT4a-infected patients in our clinical studies, we included 7 GT4a NS3/4A and NS5A sequences from GenBank that were identified as originating from Egypt in our phylogenetic analyses (Fig 4A) and found that the Egyptian GT4a sequences all sorted to clade 1. NS5A baseline polymorphism analysis also revealed that prevalence of L30R was numerically higher in clade 1 (17%) versus clade 2 (4%). Based on these combined observations, we propose that subtype 4a clade 1 may be associated with sequences from Egypt and characterized by the presence of the NS5A L30R polymorphism; this hypothesis should be investigated in future studies.
In GT6e, two clades were also identified by phylogenetic analysis of 25 NS3/4A and NS5A sequences (Fig 4C and 4D), and NS5A amino acid polymorphisms were differentially distributed between the 2 clades. Based on country of enrollment, the geographic region distribution was not statistically different between the 2 clades. However, most of the GT6e-infected patients were enrolled from North America and Europe where GT6 infection is relatively rare, and the patients' country of origin is not known. HCV GT6 infection is most prevalent in East and Southeast Asia [1,72], and outside of those regions GT6 infection is generally found in emigrant populations from endemic countries [72,73]. Subtype 6e occurs at a high frequency in Vietnam where GT6 comprises around 54% of HCV infections [74,75]. Limited data have been published for subtype 6e describing genetic diversity and prevalence of baseline polymorphisms in NS3 and NS5A [8]. Among 25 GT6e-infected patients included in our analysis, V36L in NS3 was detected in 8% (2/24) of sequences, and was exclusively detected in clade 2. NS5A baseline polymorphisms K24R, V28M, L31I, P58S, and T93S were detected in 52% (13/25) of GT6e sequences, and were each differentially distributed between clade 1 and clade 2 (Table 3). NS5A polymorphism K24R was exclusively detected in clade 1, while L31I, P58S, and T93S were only detected in clade 2, and V28M prevalence was significantly higher in clade 1 (89%) versus clade 2 (12%). Our analysis provides additional data illustrating the genetic diversity of subtype 6e.
In conclusion, this study examined HCV genetic diversity among 6 genotypes and 44 subtypes identified from 2348 HCV-infected patients treated with a regimen of GLE/PIB who were enrolled in 27 countries, thus expanding the available data on HCV epidemiology, subtype diversity, and NS3/NS5A baseline polymorphism prevalence at amino acid positions important for the inhibitor class. The efficacy of DAA regimens for the treatment of HCV infection can vary by HCV genotype, subtype, and the presence of baseline polymorphisms, although newly approved pangenotypic DAA regimens are less impacted by these variables [11,23,39]. While phylogenetic clustering by country is still detected for some HCV subtypes, representing genetically distinct virus lineages circulating in specific countries or regions, the global distribution of subtypes and intra-subtype virus lineages appears to be shifting with increased immigration patterns. The availability of DAA regimens for HCV treatment varies by country worldwide [74], and continued effort is required to ensure pangenotypic DAA regimens are available for all HCV-infected patients in order to achieve HCV elimination worldwide.
Supporting information S1