Charged amino acid variability related to N-glycosylation and epitopes in A/H3N2 influenza: Hemagglutinin and neuraminidase

Background The A/H3N2 influenza viruses circulating in humans have been shown to undergo antigenic drift, a process in which amino acid mutations result from nucleotide substitutions. There are few reports on charged amino acid mutations. The purpose of this paper is to explore the relations between charged amino acids, N-glycosylation and epitopes in hemagglutinin (HA) and neuraminidase (NA). Methods A total of 700 HA genes (and 691 NA genes) of A/H3N2 viruses isolated since the emergence of the subtype in 1968 were analyzed chronologically for mutational variation in amino acid features, N-glycosylation sites (NGSs) and epitopes. Results Both the number of HA N-glycosylation sites and the electric charge of HA increased gradually up to 2016. The charges of HA and HA1 increased 1.54-fold (from +7.0 to +17.8) and 1.08-fold (from +8.0 to +16.6), respectively, and the number of NGSs in HA nearly doubled (from 7 to 12). Great diversification occurred in the 1990s, involving mutations in Epitopes A, B and D, and the charged amino acids in Epitopes A, B, C and D of HA1 mutated at a high frequency in globally circulating strains during the last decade. The charged amino acid mutation T135K in Epitope A has shown high mutability in strains from recent years, resulting in a decrease of NGT133-135. K158N and K160T were not only charged mutations in Epitope B but also caused a gain of NYT158-160. Epitope B and its adjacent N-glycosylation site NYT158-160 mutated more frequently than the rest and might be under greater immune pressure. Conclusions The charged amino acid mutations in A/H3N2 influenza play a significant role in virus evolution and might pose an important public health issue. Characterizing the variability related to both the epitopes (A and B) and N-glycosylation is beneficial for understanding evolutionary mechanisms, disease pathogenesis and vaccine research.


Introduction
Influenza is an acute respiratory infectious disease caused by influenza virus, which affects millions of people annually and results in moderate mortality. Among the different types and subtypes of influenza virus, the A/H3N2 subtype has dominated many human influenza outbreaks worldwide since its emergence in 1968 [1]. Based on influenza surveillance, the H3N2 virus evolved genetically and became the dominant strain in the 2014/15 season in Japan (99%) and in Europe (83%) [2,3], but accounted for only 23.9% (625/2616) in the 2015/16 season in the United States [4]. A/H3N2 epidemics occurred in both the 2010 (81.9%, 127/155) and 2012 (Feb. to Jul.) influenza seasons in Southern China [5,6].
Influenza viruses are subtyped by their surface glycoproteins, hemagglutinin (HA) and neuraminidase (NA). Viral HA mediates attachment of the influenza virus to sialic acid moieties on the host cell and functions as a major antigen that induces the production of host-specific antibody. The HA monomer can be divided into HA1 and HA2, of which the former is an important functional region. Viral NA cleaves the terminal sialic acid residues present on cell surfaces and progeny virions, facilitating release of the virus from infected cells and thus playing an important role in the release and spread of progeny virions [7]. Because its polymerase lacks proofreading activity, influenza virus genes mutate very frequently without genetic correction, resulting in 1-2% annual divergence of influenza strains [1,8]. Amino acid (AA) substitutions at HA residues 158 (N/K/D/S/R) and 160 (T/K/I) occurred in the 2014/15 season in Europe, and substitutions at three HA sites (A214S, V239I and N328S) and two NA sites (L81P and D93G) occurred during the 2012 season in Southern China [2,6].
Electric charge is an important biochemical feature of the HA protein, relating to its antigenicity and receptor binding affinity. The N145K mutation in the HA protein changed both antigenicity (epitope A) and receptor binding avidity, an effect attributed to the alteration in amino acid charge [9]. The charge of the influenza M1 molecule drove conformational changes, leading to alterations in its electrostatic interactions [10]. Moreover, during viral adsorption, a higher positive charge could promote the affinity of the receptor binding domain (RBD) in HA for the host cell sialic acid receptors, which are highly negatively charged [11]. Evolution of the HA and NA genes has a critical influence on influenza virus transmission, including antigenic drift and accumulation of N-glycosylation sites (NGSs) [12]. N-glycosylation of HA and NA acts to mask antigenic epitopes, constrain binding to host antibodies, protect the enzymatic sites of NA, and balance the activities of HA and NA [13]. Moreover, the NGSs in variants might play an even more important role in influenza virus evolution. The HA epitopes (B-cell epitopes) of A/H3N2, including epitopes A, B, C, D, E, L and R, have been studied to detect mutations during each influenza season [7,14]. Amino acid mutations change the epitope charge and interactions; for instance, a combination of epitope sequences, pH optimization, and the additive L-arginine allowed the presentation of hydrophobic epitopes and their subsequent assembly into virus-like particles [15]. Herein, we sequenced the HA and NA genes of H3N2 viruses over several years and analyzed their genetic and amino acid mutations and evolution, including AA features, NGSs and epitopes. Our aim is to assess the correlation between AA features (including charges) and NGSs/epitopes.

Virus and gene
Human H3N2 strains isolated in Guangdong, Southern China, from 2012 to 2016 were selected based on space-time sampling [1]. Viral RNA was extracted from these strains using a QIAamp Viral RNA mini kit (Qiagen, Venlo, The Netherlands). Reverse transcription-polymerase chain reaction (RT-PCR) assays were conducted using Qiagen Sensiscript Reverse Transcriptase and Takara PyroBest Taq. Amplifications were performed on a 96-Well Thermal Cycler PCR system (Applied Biosystems, USA) under the following conditions: 30 min at 55°C; 2 min at 94°C; 40 cycles of 15 s at 94°C, 30 s at 55°C, and 60 s at 68°C; and a final extension of 5 min at 72°C. After purification using a Qiagen Gel Extraction Kit, amplicons were sequenced using an ABI PRISM 3100 Genetic Analyzer.

Phylogenetic and evolution analysis
Viral nucleotide sequences were analyzed using the Maximum Composite Likelihood model, and phylogenetic trees were generated with the Neighbor-Joining (NJ) method in the MEGA 7.0.21 package [19]. The reliability of the phylogenetic trees was estimated using 1000 bootstrap replicates. Evolutionary selection pressure was assessed using the Datamonkey web-server (www.datamonkey.org). Four different methods are available: Single-likelihood ancestor counting (SLAC), Fixed effects likelihood (FEL), Internal Fixed effects likelihood (IFEL) and Random effects likelihood (REL). The SLAC, FEL, and REL are used to detect sites under selection at external branches of the phylogenetic tree, while the IFEL investigates sites along the internal branches. The SLAC is intensive for large alignments but appears to underestimate the substitution rate; the REL is better suited to intermediate-sized datasets than the SLAC, but it may be unsuitable for small alignments. The FEL method gives a lower rate of false positives than the REL for data sets of few sequences [20]. The SLAC, FEL and IFEL were employed with a significance level of 0.1.

Amino acid (AA) charge and molecular model
The amino acids were classified by charge into three groups: acidic amino acids [Aspartic acid (D) and Glutamic acid (E)], basic amino acids [Lysine (K), Arginine (R) and Histidine (H)] and neutral AAs. At a specific physiological pH, each acidic AA is ionized with a single negative net charge and each basic AA with a single positive net charge [11]. For each HA or NA monomer and subunit, the net charge (NC) was calculated by subtracting the number of acidic AAs from the number of basic AAs.
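The net charge defined above can be sketched in a few lines of Python. This is a minimal illustration under the stated counting convention (D and E count as -1; K, R and H count as +1), not the software used in the study; the function name `net_charge` and the toy peptide are hypothetical.

```python
# Charge convention taken from the text: D and E are acidic (-1 each),
# K, R and H are basic (+1 each), all other residues are neutral.
ACIDIC = frozenset("DE")
BASIC = frozenset("KRH")


def net_charge(sequence: str) -> int:
    """Return (number of basic residues) - (number of acidic residues)."""
    seq = sequence.upper()
    n_basic = sum(1 for aa in seq if aa in BASIC)
    n_acidic = sum(1 for aa in seq if aa in ACIDIC)
    return n_basic - n_acidic


# Hypothetical 10-residue peptide, not a real HA or NA fragment.
toy_peptide = "MKTDERHLNK"
print(net_charge(toy_peptide))  # 4 basic (K, R, H, K) - 2 acidic (D, E) = 2
```

The same function could be applied to a full HA, HA1 or NA sequence, or to the residues of a single epitope.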
The H3 HA protein includes five epitopes (A, B, C, D and E) and two additional conserved neutralizing epitopes (L and R) [7,14]. The amino acid substitutions may be divided into charge+, charge- and charge± substitutions, according to whether they increase, decrease or maintain the positive charge, respectively. When calculating epitope charge, because the number of 1968-1987 genes downloaded from GenBank was below the statistical requirements, these genes were calculated in six groups.
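The charge+, charge- and charge± classes can be illustrated with a short Python sketch. It assumes mutations are written as original residue, position, new residue (for example K160T) and uses the same charge convention as the net charge calculation; the function name `classify_substitution` is hypothetical and the sketch is not the authors' implementation.

```python
# Per-residue charges under the convention used in the text; residues that are
# not listed are treated as neutral (0).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1, "H": 1}


def classify_substitution(mutation: str) -> str:
    """Classify a mutation such as 'K160T' by its effect on net charge."""
    original, replacement = mutation[0].upper(), mutation[-1].upper()
    delta = CHARGE.get(replacement, 0) - CHARGE.get(original, 0)
    if delta > 0:
        return "charge+"
    if delta < 0:
        return "charge-"
    return "charge±"


# Mutations named in this paper, used here only as inputs to the classifier.
for m in ("T135K", "K158N", "K160T", "Q197K", "D053N", "A214S"):
    print(m, classify_substitution(m))
```

Under this convention, losing an acidic residue (D053N) raises the net charge just as gaining a basic residue (T135K) does, so both fall in the charge+ class.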

Calculation and statistical analysis
Correlation and regression analyses were performed using SPSS 20.0 (SPSS Inc., Chicago, IL), and P<0.05 was considered statistically significant. The electric charge was calculated as the net charge $\Sigma_{nc} = \sum n \cdot i$, where $n$ is the respective number of acidic AAs (D and E) and basic AAs (K, R and H) in each monomer or epitope, and $i$ is the charge of each of the five charged AAs.
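For readers without access to SPSS, the correlation coefficient used in this kind of analysis can be sketched in plain Python. The function below computes the Pearson coefficient only; it is an assumption that this is the coefficient reported, it does not compute the significance probability, and the input values are abstract toy numbers rather than data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Abstract toy values (NOT study data): a perfectly linear relation gives r = 1.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

In practice the two sequences would be, for example, the yearly NGS counts and the yearly net charge values.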

Genetic evolution and selection
Phylogenetic trees were generated for both the HA and NA genes of the influenza H3N2 viruses isolated between 1968 and 2016. A remarkable diversification of HA genes occurred in 1993, in which many nucleotide and corresponding AA mutations were identified, including the charged mutations S133D and E135T in Epitope A, E156K, R189S, K193S and R197Q in Epitope B, and G172D and N246K in Epitope D (Fig 1). Similar to the HA genes, the NA genes evolved chronologically, and a major diversification of NA genes occurred around 1990. In the last decade, the charged amino acid mutations related to epitopes included Epitope A (T135K), Epitope B (K158N, K160T and N189K), Epitope C (N278K)

Biological charge of HA and NA
The HA and NA charges, which depend on the charged AAs of each protein, were analyzed for strains isolated from 1968 to 2016 (Fig 3); in addition, the HA could be divided into HA1 and HA2. The NA charges increased from +6.0 (1968) to +12.0 (1986) and then remained around +10 until 1998. They then decreased rapidly from +9.4 (1998) to +5.9 (2000), after which they rose slightly.

Variation of NGS
A total of 15 potential NGSs were identified in HA genes spanning 48 years. The number of HA NGSs increased gradually, from 7 to more than 12 (14 domains). The spatial positions of the NGSs on the HA proteins of variant and vaccine strains are shown in Fig 4. The period 1995-1998 was notable, as the number of NGSs rose from 8.7 to 11.0. The NGSs gained in most strains since 1996 included NES122-124 and NGT133-135, which have persisted to date. In the HA gene of the vaccine strain A/Hong Kong/4801/2014, a total of 11 NGSs were identified, including NST008-010 (Table 2). Notably, NGT133-135 became a conserved NGS after its first appearance in 1996 (95.6%, 452/473), but in 22.0% (99/452) of these strains its potential was uncertain, with a lower N-Glyc score (<++), especially during 2003-2007. Remarkably, owing to the K160T mutation, NYT158-160 emerged in 2014 and quickly became prevalent worldwide (20.8% in 2014, 65.0% in 2015 and 54.2% in 2016). NST008-010, NGT022-024, NVT165-167, NCT063-065 and NGS285-287 in HA genes were relatively conserved and retained stable potential during the whole period (Table 3), whereas NGT133-135 was almost conserved and stable over the previous two decades but was lost in the most recent two years, including in the HA of A/Guangdong/264/2016. NET081-083 and NCS276-278 occurred during 1968-1975 and 1992-1997, respectively. In addition, NGT483-485 was the only NGS identified in HA2, and its expression and composition were nearly constant across strains. The number of NGSs was significantly correlated with the charge values of HA and HA1 ($r_{\text{NGS:HA charge}} = 0.834$, p<0.001; $r_{\text{NGS:HA1 charge}} = 0.677$, p<0.001), which indicated that the amino acid mutations involving charge changes were highly correlated with the NGS domains.
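The gain and loss of NGSs through single substitutions can be illustrated with a simple motif scan. The sketch below assumes the standard N-X-S/T sequon (X being any residue except proline) and only locates candidate motifs; it does not reproduce the scored prediction (the N-Glyc score) used in this study. The function name `find_sequons`, the `offset` parameter and the flanking residues of the toy fragments are hypothetical.

```python
import re

# N-X-S/T sequon with X != P. The lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str, offset: int = 1):
    """Return (start, end, motif) for each candidate sequon.

    `offset` is the residue number of the first character of `sequence`.
    """
    return [
        (m.start() + offset, m.start() + offset + 2, m.group(1))
        for m in SEQUON.finditer(sequence.upper())
    ]


# Toy fragments starting at residue 157; only positions 158-160 follow the text.
print(find_sequons("ANYKS", offset=157))  # K at 160: no sequon
print(find_sequons("ANYTS", offset=157))  # K160T: NYT at 158-160 appears

# Toy fragments starting at residue 133.
print(find_sequons("NGT", offset=133))    # NGT at 133-135 present
print(find_sequons("NGK", offset=133))    # T135K: sequon lost
```

This mirrors the two events described in the text: K160T creates NYT158-160, and T135K removes NGT133-135.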
The NGSs of NA genes changed less than those of HA, with 8-11 NGSs (Fig 4). NIT093-095 was acquired through the point mutation K93N during 1997-2000 and has been lost through the point mutations N93D/G since 2008. Eight NGSs were present in the NA of A/Hong Kong/4801/2014, including NIT061-063 (Table 2). NDT146-148 was a highly conserved potential NGS up to 2008, and the D147N mutation resulted in a reduction in

The electric charges of epitopes A-E have changed greatly from 1968 to date (Fig 5). The net charges of the epitopes showed the following features. 1) The charge values of all five epitopes changed greatly during 1993-2000, especially those of epitopes A, C and E. 2) For epitope A, the charge reached its minimum of -1.35 in 1993 and quickly increased to its maximum (+2.00) in 1999, then declined in a wavelike manner; conversely, the charge of epitope E dropped in a Z-shaped curve during 1984-2004 (1984-1994: ~+3.00; 1995: +2.08; 1996-2004: ~+2.00), whereas the charges of epitopes C and D both hit bottom in 2008 (+1.00/+0.89), after which the former rose sharply and the latter rose slightly. In terms of mean and standard deviation (SD), the charges of epitope A ($\bar{X}_A = 0.38 \pm 1.02$) and epitope E ($\bar{X}_E = 2.55 \pm 1.12$) varied greatly, while that of epitope C ($\bar{X}_C = 2.08 \pm 0.53$) was relatively conserved. However, the charges of epitopes B and C have shown great variability in the last 3-6 years: the average charge of the former followed a sharp V-shaped curve during 2013-2016, and that of the latter increased rapidly from +1.00 (2009) to +2.88 (2016). Among the epitope charges, significant correlations were found between A and B ($r_{A:B} = 0.603$, P<0.001), A and E ($r_{A:E} = -0.630$, P<0.001), and D and E ($r_{D:E} = 0.521$, P = 0.001). These results indicated a positive correlation between the charges of epitopes A and B, a moderate negative correlation between those of epitopes A and E, and a slight positive correlation between those of epitopes D and E.

Table 4 notes: b. An N-glycosylation site predicted with a "++" or "+++" score was identified as a strong potential site with N-glycosylated asparagine. c. If there were multiple common mutations at a residue, a "/" was used to separate the mutations and their respective proportions. https://doi.org/10.1371/journal.pone.0178231.t004

Discussion
Influenza A/H3N2 has been circulating globally for nearly five decades and has resulted in many epidemics and deaths. According to a study in Germany, influenza-associated deaths per 100,000 persons in West Germany increased to 25.3 in 1989-1990 and 22.4 in 1995-1996, which was three- to five-fold more than usual during influenza seasons [24]. These findings corresponded to the phylogenetic diversification of the HA and NA genes that we observed during 1990-1993, in which these genes mutated greatly, including the charged amino acid mutations in Epitopes A, B and D. Throughout the evolutionary pathway, the positive selection sites 121, 137, 138, 145, 159 and 262 in HA identified in this study have had a remarkable impact on HA evolution [6]. Besides human H3N2 influenza, 18 human infections linked to swine H3N2 (H3N2v) occurred in Michigan and Ohio in July-August 2016 [25]. In Eastern China, 8.7% of samples (87/1000) were positive for influenza, and seven viruses were identified as H3 subtype AIV based on the HI results [26].
Missense mutations in nucleotides give rise to amino acid (AA) substitutions. Charged AA mutations (involving either acidic or basic residues) could alter the physical characteristics of a specific domain or of the whole protein; furthermore, some mutations resulted in a gain or loss of NGSs. Compared to strain A/Hong Kong/1/1968, the HA charges in this study increased and peaked at 1.54-fold and the HA1 charges peaked at 1.08-fold, while the number of NGSs in HA nearly doubled. The present results were similar to those of a previous study, whose data extended only up to 2010 [12].
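The fold values can be checked from the charge values given in the abstract (+7.0 to +17.8 for HA, +8.0 to +16.6 for HA1) and the NGS counts (7 to 12). The sketch below assumes that "fold" denotes the relative increase over the 1968 baseline, (end - start) / start; this reading is an interpretation, adopted here because it reproduces the reported 1.54 and 1.08 (1.075 before rounding). The function name `fold_increase` is hypothetical.

```python
def fold_increase(start: float, end: float) -> float:
    """Relative increase over the baseline: (end - start) / start."""
    return (end - start) / start


ha = fold_increase(7.0, 17.8)    # HA net charge, 1968 baseline vs peak
ha1 = fold_increase(8.0, 16.6)   # HA1 net charge, 1968 baseline vs peak
ngs_ratio = 12 / 7               # plain ratio of NGS counts ("nearly doubled")

print(f"HA: {ha:.3f}, HA1: {ha1:.3f}, NGS ratio: {ngs_ratio:.3f}")
```

Read as plain ratios instead, the same data would give about 2.54 for HA and 2.08 for HA1, which is why the baseline convention matters when comparing with other studies.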
Electrostatic interactions are important for binding of the viral RBD to the host cell sialic acid receptors. However, viral N-glycans could shield the RBD, and the highly negatively charged sialic acids and sulfuric acids on the viral N-glycans might cause electrostatic repulsion from the host cell, which has an adverse influence on the HA binding function [11]. Since the positive charge of HA1 appears to exert a beneficial effect on viral adsorption, it is possible that the increase in the positive charge of HA1 served to neutralize the deleterious effect of hyper-glycosylation during the evolution of human A/H3N2 virus. HA1 was positively charged in this study, and the HA/HA1 charge values were highly correlated with the number of NGSs ($r_{\text{NGS:HA charge}} = 0.834$; $r_{\text{NGS:HA1 charge}} = 0.677$), which suggested that the HA/HA1 amino acid mutations that changed the charge were mainly related to N-glycan modification.
Some novel NGSs (such as Asn-91) might be useful candidates for functional analyses to identify innovative genetic modifications for beneficial phenotypes acquired in human lineages [27]. For the H3 HA genes here, the gain of the NWT126-128 glycosylation site is interesting in the context of biological evolution, but its single-plus (+) potential reduced its significance. The NGSs of NA were more conserved than those of HA, which suggested that NA undergoes less immune pressure than HA. The current opinion is that an increase of HA NGSs in A/H3N2 might lead to a reduction in viral virulence and a decline in the severity of illness [28].
A B-cell epitope is defined as a region of an antigen that is recognized by a particular B-cell receptor or that can subsequently elicit antibody in a humoral response. Epitopes A, B, C, E and L in the HA genes of A/H3N2 (Table 4) mutated to a great degree in this study, and most antigenicity-determining AA mutations adjacent to the RBD, including sites HA_145, 155, 156, 158, 159, 189 and 193, were located within epitope B, except 145 (Epitope A) [29]. The amino acids in epitope B were the most unstable during the last decade in this study, and the following conclusions were drawn: 1) the five prevalent mutations were N158K, Y159F/S, K160T, K189N and Q197K; 2) four of the five mutations involved a change in charge; 3) mutations at positions 158, 159 and 160 resulted in the gain of the NGS NYT158-160. It is noteworthy that K158N and K160T were not only charged AA mutations in epitope B but also caused a gain of NYT158-160. Previous studies have shown that epitope B plays a vital role in both antigenic phenotype and receptor specificity [30]. The T135K mutation occurred in epitope A and resulted in the loss of NGT133-135, which might be a significant mutation as well. The 1990s were also a notable period for the net charges of all epitopes, which coincided with the phylogenetic diversification of the HA genes. Epitopes A and B, including residues 135, 142, 144, 159 and 160, also showed high mutability in strains isolated from Guangdong, Southern China, during 2012-2016.
Antigenic mutations are based on amino acid substitutions, for which the net charge value and the order of the AAs in each epitope are crucial. Like the relation between key and lock, the charged AAs located on an epitope must correspond electrically and spatially to those located on the antibody's complementarity determining region, even if the overall net charge is appropriate. Some oppositely charged mutations emerging simultaneously in one epitope, such as K160T and Q197K (epitope B) and D053N and K278N (epitope C), might cause electrical repulsion of antibody binding, even though their combined effect on the net charge was neutral. Besides the mutations in epitope B, R142G (A), D053N (C), K278N (C), Y094H (E) and N171K (L) were prevalent or emerging charged mutations in the most recent five years as well. The accumulation of antigenic epitope mutations could, to a great extent, spark a local or provincial epidemic and/or outbreak [6,30]. Overall, amino acid mutations could initiate changes in NGSs and antigenic epitopes, which influence viral pathogenicity towards the host.
The A/H3N2 virus has circulated and dominated worldwide for more than four decades, and it still evolves rapidly at the genetic level. Owing to its polymerase and segment reassortment, HA segments originating from equine H3 influenza might become associated with cross-species transmission and even contribute to the appearance of new strains [31,32]. The HA genes have so far mutated more rapidly than the NA genes, especially in missense mutations, suggesting that the HA gene was under immune pressure to mutate in order to survive. The charged amino acid mutations during the last decade included Epitope A (T135K), Epitope B (K158N and K160T), Epitope C (Q311H) and Epitope D (K173Q and N225D). Surveillance should continue, because A/H3N2 variations in AA features, NGSs and antigenic epitopes could, to some degree, be sufficient to evade immune protection in humans. Molecular monitoring of the NGSs and antigenic epitopes of the influenza A/H3N2 virus is beneficial for understanding the evolutionary mechanisms that govern influenza viruses. Moreover, monitoring of viral charge might inform vaccine development, including the selection of vaccine delivery vectors and immunologic adjuvants.