Next-generation sequencing analysis of a cluster of hepatitis C virus infections in a haematology and oncology center

Molecular characterization of early hepatitis C virus (HCV) infection remains rare. Ten out of 78 patients of a hematology/oncology center were found to be HCV RNA positive two to four months after hospitalization. Only two of the ten patients were anti-HCV positive. HCV hypervariable region 1 (HVR1) was amplified in seven patients (including one anti-HCV positive) and analyzed by next generation sequencing (NGS). Genetic variants were reconstructed by Shorah and an empirically established 0.5% variant frequency cut-off was implemented. These sequences were compared by phylogenetic and diversity analyses. Ten unrelated blood donors with newly acquired HCV infection detected at the time of donation (HCV RNA positive and anti-HCV negative) served as controls. One to seven HVR1 variants were found in each patient. Sequences intermixed phylogenetically with no evidence of clustering in individual patients. These sequences were more similar to each other (similarity 95.4% to 100.0%) than to those of controls (similarity 64.8% to 82.6%). An identical predominant variant was present in four patients, whereas other closely related variants dominated in the remaining three patients. In five patients the HCV population was limited to a single variant or one predominant variant and minor variants of less than 10% frequency. In conclusion, NGS analysis of a cluster of HCV infections acquired in the hospital setting revealed the presence of low diversity, very closely related variants in all patients, suggesting an early-stage infection with the same virus. NGS combined with phylogenetic analysis and classical epidemiological analysis could help in tracking of HCV outbreaks.


Introduction
The high level of intrahost and interhost hepatitis C virus (HCV) diversity results from the high error rate of the RNA-dependent RNA polymerase (RdRp) and fast replication of the virus. Consequently, the HCV population represents a swarm of closely related variants called quasispecies. HCV variability enables the evasion of host adaptive immune responses and establishment of chronic infection, as well as drug resistance [1,2]. Viral molecular diversity is often significantly reduced upon virus transmission to a new host (bottleneck effect) [3][4][5][6]. The bottleneck phenomenon may be affected by the size of the inoculum, HCV genotype, viral load and complexity of the virus population in the donor (number and frequency of variants), as well as recipient host factors, such as IL28B genotype [7,8].
Studies on the early evolution of HCV following infection are rare due to the limited availability of clinical samples from the early stages of infection [9,10]. In previous studies HCV intrahost diversity was analyzed using well-established techniques such as the DNA heteroduplex gel shift method [11] or bulk clonal sequencing [12]. However, their sensitivity with respect to minor variant detection is typically low [4,13]. Novel methods suitable for in-depth analysis of the quasispecies phenomenon, such as single-genome analysis and next-generation sequencing (NGS), have since been introduced, allowing for the evaluation of a wide spectrum of genetic variants, including those of minor frequency [5,14,15]. Despite some technical limitations, reliable detection of variants constituting as little as 0.5% of the population became feasible [16].
In the present study we took advantage of a unique opportunity to investigate in depth the genetic diversity during early stages of infection, by analyzing a cluster of HCV infections among patients of a regional hematology and oncology center in Southern Poland. We investigated the diversity of hypervariable region 1 (HVR1), a highly exposed fragment of the envelope 2 glycoprotein that plays a major role in HCV cell entry (receptor binding, membrane fusion) and is a major target for the specific antiviral response (antibody shielding, epitopes for antibody responses) [17,18]. Its variability facilitates immune evasion and reflects the immune pressure of the host [19].
Our study demonstrates the presence of very low HVR1 diversity in the early stage of viral infection. Since these variants were closely related, the patients were most likely infected from a common source.

Patients
In November and December 2015 five clinically overt cases of acute hepatitis C infection were diagnosed among patients of a regional hematology and oncology center in Southern Poland. As all these patients had repeated hospital stays between August and October 2015, all patients hospitalized in the same ward during this period were contacted and asked to provide a blood sample for HCV infection screening and analysis. Out of 129 inpatients, 34 had died by the time of the study, 17 refused participation or could not be reached (including one patient from the initial cluster), and 78 provided both a sample and consent. Among these tested individuals, HCV RNA was found in ten patients. Extensive epidemiological investigation did not identify the source of infection. Basic clinical and virological data on the study subjects are presented in Table 1.
Plasma samples from ten HCV RNA-positive, anti-HCV negative blood donors were used as controls for phylogenetic and sequence similarity comparisons. These controls were infected with the same HCV 1b subtype, and their infection was identified at the time of attempted donation.
The study was approved by the Bioethical Committee of the Medical University of Warsaw (Approval Number WUM AKBE/144/16) and Institute of Hematology and Transfusiology (Approval Number 55/2013) and all subjects and controls provided written informed consent.

Pyrosequencing
Approximately 3×10^7 DNA amplicons were subjected to emulsion PCR using the GS Junior Titanium emPCR Lib-A Kit (454 Life Sciences, Branford, CT, USA). Pyrosequencing was carried out according to the manufacturer's protocol for sequencing amplicons using GS Junior (454 Life Sciences).

Data analysis
Sequencing errors (mismatches, insertions and deletions) were corrected and haplotypes reconstructed using the program diri_sampler from the Shorah software suite (https://www1.ethz.ch/bsse/cbg/software/shorah) [21]. Haplotypes of posterior probability > 95% and represented by at least 10 reads were extracted with LStructure (https://github.com/ozagordi/LocalVariants/blob/master/src/LStructure.py). Based on pyrosequencing and reconstruction of a cloned HVR1 sequence [16] we were previously able to reliably detect variants constituting as little as 0.5% of the population and this cut-off was implemented in the current analysis. Subsequently, reconstructed haplotypes of frequency >0.5% were aligned to the consensus sequence (the most frequent sequence in all patients) and translated into amino acid sequences by MEGA (Molecular Evolutionary Genetics Analysis), version 6.0 (http://www.megasoftware.net/) [22]. Phylogenetic trees were constructed according to the Maximum Likelihood method based on the Tamura-Nei model [23] using MEGA 6.0. We used the same approach in our previous studies [20,24] and the superiority of the Tamura-Nei model for the analysis of HVR1 was reported by others [25]. The robustness of tree topology was estimated by the bootstrapping method (resampling 1000 data sets) using MEGA 6.0. Genetic diversity parameters were assessed by DnaSP version 5 (http://www.ub.edu/dnasp/) and MEGA 6.0. Sequence similarity was compared using Clustal 2.1 Percent Identity Matrix (http://www.clustal.org/omega/) [26]. Furthermore, Highlighter from HIV.lanl.gov was used to visualize individual sequence polymorphisms [13].
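The haplotype selection rules described above can be illustrated with a minimal Python sketch. It is not the Shorah or LStructure code: the `Haplotype` record, the function name `filter_haplotypes` and the toy read counts are hypothetical. The sketch also assumes that a variant's frequency is its read count divided by the total reads of all reconstructed haplotypes in the sample, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Haplotype:
    """Hypothetical record for one reconstructed haplotype."""
    sequence: str
    posterior: float  # posterior probability, on a 0-1 scale
    reads: float      # number of reads supporting the haplotype


def filter_haplotypes(haplotypes, min_posterior=0.95, min_reads=10, min_freq=0.005):
    """Keep haplotypes with posterior > 95%, >= 10 reads and frequency > 0.5%.

    Assumption: frequency = reads / total reads of all reconstructed
    haplotypes in the sample (the denominator is not stated in the text).
    Returns (haplotype, frequency) pairs sorted by decreasing frequency.
    """
    total_reads = sum(h.reads for h in haplotypes)
    if total_reads <= 0:
        return []
    kept = []
    for h in haplotypes:
        freq = h.reads / total_reads
        if h.posterior > min_posterior and h.reads >= min_reads and freq > min_freq:
            kept.append((h, freq))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy sample (invented numbers, not study data).
sample = [
    Haplotype("ACGTAC", 0.99, 3000),  # passes all three criteria
    Haplotype("ACGTAT", 0.99, 500),   # passes all three criteria
    Haplotype("ACGTTT", 0.90, 200),   # rejected: posterior too low
    Haplotype("ACGAAC", 0.99, 4),     # rejected: fewer than 10 reads
    Haplotype("ACCTAC", 0.99, 12),    # rejected: frequency below 0.5%
]

for hap, freq in filter_haplotypes(sample):
    print(hap.sequence, f"{100 * freq:.1f}%")
```

Applying the three criteria in a single pass keeps the denominator fixed, so a variant's frequency does not change depending on which other haplotypes were discarded first.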

Results
In the present study the HVR1 amplification and molecular analysis were successful in seven out of ten HCV-infected patients. In three patients HVR1 could not be amplified, most likely due to low viral load (patients 1 and 9) or a mismatch between the primers and the particular viral strain (patient 10). An average of 4506 HVR1 sequence reads was obtained per sample (median 4149; Table 2). Reads were reconstructed by Shorah and, after implementation of the experimentally established 0.5% cut-off, one to seven HCV variants were retained per sample (mean 3.4, median 3.0). Mean nucleotide diversity was 0.032 (median 0.015) and the number of nucleotide substitutions was 11.4 per patient (median 5.0). After translation to amino acid sequence, the number of variants ranged from one to three per patient (mean 2.4, median 3.0). The detailed data for each patient are presented in Table 2. When all patients' sequences were phylogenetically compared with sequences from ten unrelated controls, it was found that all the patients' sequences clustered together except for two variants of lower frequency (26.6% and 3.0%) in patient 8, which clustered with variants from one control (C_118). Moreover, sequences derived from the cluster were interspersed with one another, with no evidence of clustering in individual patients (Fig 1).
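For readers unfamiliar with the nucleotide diversity statistic reported above, the following Python sketch computes it as the average number of pairwise differences per site among aligned sequences. This is a simplified, unweighted illustration under our own assumptions (equal-length gap-free sequences, each variant counted once). It does not reproduce the exact DnaSP or MEGA calculations, and the toy sequences are invented.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (unweighted sketch).

    Assumption: every variant is counted once, regardless of its frequency.
    """
    if len(sequences) < 2:
        return 0.0  # a single variant has no pairwise differences
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total_diff = sum(hamming(a, b) for a, b in pairs)
    return total_diff / (len(pairs) * length)


# Toy alignment of three 8-nt variants (not study data).
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
print(round(nucleotide_diversity(toy), 4))
```

In this toy alignment the three pairs differ at 1, 2 and 1 positions, so the statistic is 4 differences over 3 pairs and 8 sites.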
Intrapatient phylogenetic trees could be constructed only for patients in whom at least three HVR1 variants were present (patients 3, 4, 5, 6 and 8; Fig 2). As seen, in patients 3, 4 and 6 the trees displayed star-like phylogeny while in patient 5 the tree was more complex, with a higher number of clades. Nevertheless, the two dominant variants, of 54.1% and 23.5% frequency, were closely related to each other. In the remaining patients the HVR1 population comprised only one variant (patient 7) or two variants (patient 2). In the latter the major variant prevailed at 97.6% of the population. Sequence similarity analysis revealed that HVR1 sequences from the analyzed patients were more similar to each other (95.4% to 100.0%) than to the sequences derived from controls (64.8% to 82.6%). The only exceptions were two low frequency variants (26.6% and 3.0%) seen in patient 8. Similarity of the latter two variants to variants from the other six patients ranged from 79.0% to 82.8% while similarity to variants found in controls ranged from 67.6% to 98.3%. Comparison of sequences from all seven patients is shown in Fig 3. Nucleotide sequence analysis revealed that the predominant variant was identical in patients 5, 6, 7 and 8 and a very similar variant was predominant in patients 2 and 4 (Fig 3). These two predominant variants differed only by two nucleotide substitutions (98.86% similarity). In patient 3 the predominant variant was slightly different from the predominant variants in patients 2, 4, 5, 6, 7 and 8 (98.30% similarity). When the frequency structure of variants was analyzed it was found that in patients 2, 3, 4, 6 and 7 (71.4% of patients) the HVR1 populations were "narrow" (i.e. limited to a single variant or to one predominant variant and minor variants of less than 10% frequency). When compared to the consensus sequence, nucleotide substitutions were largely non-silent (Fig 4).
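The relation between substitution counts and the similarity percentages quoted above can be checked with a short Python sketch. The function `percent_identity` is a hypothetical helper that counts matching positions in two gap-free aligned sequences; it is not the Clustal implementation. The aligned length of 176 nt is our inference from the reported figures (two substitutions at 98.86%), not a value stated in the text, and the sequences are synthetic.

```python
def percent_identity(a, b):
    """Percentage of identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Synthetic 176-nt reference (assumed length, see lead-in).
reference = "ACGT" * 44

# Introduce exactly two substitutions.
mutated = list(reference)
mutated[0] = "T"    # A -> T
mutated[101] = "A"  # C -> A
variant = "".join(mutated)

print(round(percent_identity(reference, variant), 2))  # 98.86

# Under the same assumed length, three substitutions would give:
print(round(100.0 * (176 - 3) / 176, 2))  # 98.3
```

Under this assumed length, the 98.30% similarity reported for patient 3 would correspond to three substitutions, although the text does not give that count.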
When analyzing amino acid substitutions compared to the consensus sequence, it was found that changes affected codons 384, 386, 392, 398, 404, 407 within the HVR1 and 413 outside the HVR1, whereas in variants 3 and 2 from patient 8 there were multiple changes (Fig 5). However, all the identified changes were outside of the potential N-glycosylation site at position 417.

Discussion
In the present study we characterized HCV HVR1 variants in a group of patients from a regional hematology and oncology center. These patients were found to be HCV RNA positive after hospitalization but the source of infection remained unknown despite an extensive epidemiological investigation. As only one of the seven patients had seroconverted at that time, the samples likely represented the very early stage of infection. We identified the presence of the same or nearly identical viral strain in all patients by phylogenetic linkage and by high sequence similarity of patients' HVR1 variants (95.4% to 100.0%).
In five out of seven patients the phylogenetic analysis was either consistent with star-like phylogeny or there were no more than two variants present (one dominant and the other minor), which also suggests infection with a single founder. In addition, diversity per site within each patient's viral population was low and roughly the same (except for patient 8), as would be expected if the divergence time since transmission was short and similar for each patient.
The complexity of variant populations, reflected by the number of nucleotide variants, was very low (mean 3.4 variants), which is in contrast to the high diversity of HVR1 displayed during chronic infection. For example, in our previous study, the average number of HVR1 variants during chronic HCV 1b infection was 30-40 [16, 24]. This low complexity probably reflects the bottleneck effect at the time of transmission and suggests that the infection was initiated by a single variant, the so-called "founder", which must have been very similar or even identical to the consensus sequence inferred from all patients' sequences [3,4,27]. Furthermore, the structure of the HVR1 population was "narrow" in the majority of cases (as one predominant variant was accompanied by minor variants at <10% frequency), which is also compatible with a recent single variant infection. Previous studies showed that in chronic hepatitis C the viral population tends to become more "flat" in terms of frequency structure, with higher predominance of moderate and low frequency variants [24]. Alternatively, the structure of the populations could have been affected by the presence of immunosuppression due to immunosuppressive drugs and the underlying disease. However, during such an early phase of infection the immune system response, which could narrow the population diversity, is likely to be limited [4].
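The "narrow" frequency structure used in this study (a single variant, or one predominant variant accompanied only by minor variants of less than 10% frequency) can be expressed as a simple rule. The Python sketch below is illustrative only: the function name `is_narrow` is hypothetical, and the example frequencies are limited to those quoted in the text (the 2.4% minor variant of patient 2 is inferred as the remainder of 97.6%, and only the two dominant variants of patient 5 are listed).

```python
def is_narrow(frequencies, minor_cutoff=0.10):
    """Return True if every variant other than the most frequent one
    has a frequency below `minor_cutoff` (10% by default)."""
    if not frequencies:
        raise ValueError("at least one variant frequency is required")
    ranked = sorted(frequencies, reverse=True)
    return all(f < minor_cutoff for f in ranked[1:])


# Frequencies as fractions of the population.
examples = {
    "patient 7 (single variant)": [1.0],
    "patient 2 (97.6% major, remainder inferred)": [0.976, 0.024],
    "patient 5 (two dominant variants only)": [0.541, 0.235],
}

for label, freqs in examples.items():
    print(label, "narrow" if is_narrow(freqs) else "not narrow")
```

Because the rule depends only on the second most frequent variant, the partial list for patient 5 is sufficient to classify that population as not "narrow".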
So far very few published studies analyzed diversity in the early stages of HCV infection [4][5][6][28][29][30][31][32]. In the typical clinical setting, the complexity and diversity of HCV quasispecies is reduced at the time of transmission [6,33]. However, a bottleneck may not be present in case of massive infections [30,34].
In our study, the exact route of infection was not identified, but it could have involved errors during line flushing and/or the use of multidose vials. In this case the inoculum (i.e. infectious dose) would have been very small, which could explain the bottleneck and the very "narrow" character of the viral lineages in the infected patients.
Whether one of the patients was the source of infection is unclear. Patient 8 differed from the other patients due to her higher intrahost HVR1 heterogeneity and the high similarity of two of her strains to those found in an unrelated control. As these samples were sequenced in separate runs, using different multiplex sequence identifiers (MIDs), the findings were unlikely to be artifactual (sequencing or demultiplexing error or contamination). These data imply that patient 8 could have acquired the infection earlier and was subsequently superinfected with the predominant strain. Alternatively, the patient could have been the source of infection herself, transmitting only the predominant strain to other patients. Indeed, in the study of Campo et al., where multiple HCV infection outbreaks were studied by NGS, intrahost HVR1 populations derived from the infection source displayed the highest genetic heterogeneity [35].
Another possibility is that the source of infection was patient 5. This patient's HVR1 variants displayed more complex phylogeny (higher number of clades) and frequency distribution which are typical for the later phase of infection (higher predominance of moderate and low frequency variants closely related to each other). Patient 5 was also the first to display elevation of ALT activity levels.

Conclusions
Our NGS analysis of a cluster of HCV infections in the hospital setting revealed the presence of low diversity, very closely related variants in all patients, suggesting an early-stage infection with the same viral variant. NGS combined with phylogenetic analysis and classical epidemiological analysis could help in tracking of HCV outbreaks.