Fig 1.
Schematic of host-virus interactions at the hemagglutinin (HA) viral surface protein.
(A) The structure of the H1N1 HA. The immunodominant HA head (HA1) contains the antigenic sites [8] to which antibodies bind to neutralize the virus, whereas the HA stalk (HA2) is more conserved. (B) Immune escape due to increased avidity mediated by electrostatic charge, according to transmission experiments by Hensley and coauthors [1]. To infect host cells, virions must first enter them. To accomplish this, the HA protein binds to a cellular receptor, sialic acid, on the cell surface. An increase in charge is posited to increase HA avidity and therefore facilitate HA binding, leading to successful immune escape. (C) Patterns of charge over time in natural human sequences. Arinaminpathy and Grenfell [2] showed that the net charge of H3 increases over time, whereas that of H1 remains constant. (Inset) Schematic of the net charge trends over time for the HAs of H1N1 and H3N2 viruses in natural human sequences.
Fig 2.
Identification of negative, zero, and positive functional branches.
(A) Scatter plot of site-wise functional selection and expected charge, based on deep mutational scanning (DMS). For a given site, ‘functional selection’ measures the functional constraint at that site (a transformation of Shannon’s evenness of the DMS preferences) and ‘expected charge’ is the mean charge under the DMS preferences. Colours identify the negative (yellow), zero (dark blue), and positive (light blue) branches. (B) The location of the residues belonging to each of the three branches identified in A, on the HA monomer, visualized with PyMOL [24] and PDB 1RVX [23]. (C)-(E) The net charge of the positive (C), negative (D), and zero (E) branches over time in 1741 H1N1 sequences from human hosts during 1918 to 2008.
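The two site-wise quantities plotted in Fig 2A can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the caption: residue charges are taken as +1 for K and R, -1 for D and E, and 0 otherwise, and the transformation of Shannon's evenness is assumed to be one minus evenness, so that larger values mean stronger constraint. The function names are hypothetical, and the definitions actually used are those given in Materials and methods.

```python
import math

# Assumed residue charges at neutral pH: K, R = +1; D, E = -1; all others 0.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def expected_charge(prefs):
    """Mean charge of a site, weighting each residue by its preference.

    `prefs` maps one-letter amino-acid codes to preferences summing to 1.
    """
    return sum(p * CHARGE.get(aa, 0.0) for aa, p in prefs.items())


def functional_selection(prefs, n_states=20):
    """Assumed transformation: 1 - Shannon evenness.

    Evenness is the Shannon entropy divided by its maximum, ln(n_states),
    so the result is 0 for uniform preferences and 1 when a single
    residue is tolerated.
    """
    entropy = -sum(p * math.log(p) for p in prefs.values() if p > 0)
    return 1.0 - entropy / math.log(n_states)


# Hypothetical site that tolerates only lysine or arginine.
site = {"K": 0.5, "R": 0.5}
print(expected_charge(site), functional_selection(site))
```

Under these assumptions, a site with uniform preferences over all 20 residues lies at the origin of the plot, and a site that tolerates only positively charged residues lies toward the positive branch.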
Fig 3.
Evolutionary rates, relative solvent accessibility (RSA), and distances to receptor binding site (RBS) on functional branches.
(A) Normalized conservation scores (labelled rate of evolution) per site overlaid on the functional selection-expected charge plot. These conservation scores were obtained from Rate4Site [27] with H1N1 sequences from the fludb database of the IRD [28]. A higher score indicates lower conservation and faster evolution. (B) Empirical cumulative distributions of conservation scores for each of the three branches. (C) Distances to the RBS overlaid on the functional selection-expected charge plot. These distances were calculated using PyMOL [23] (see Materials and methods). (D) Empirical cumulative distributions of the distance to the RBS for each of the three branches. (E) Relative solvent accessibility overlaid on the functional selection-expected charge plot. These scores were computed using dmstools2 [29] (see Materials and methods). (F) Empirical cumulative distributions of relative solvent accessibility for each of the three branches.
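As a reading aid for panels B, D, and F, the following minimal Python sketch shows how an empirical cumulative distribution is evaluated for one branch. The function name and the example scores are hypothetical and are not taken from the data.

```python
from bisect import bisect_right


def ecdf_at(values, x):
    """Fraction of observations in `values` that are less than or equal to x."""
    xs = sorted(values)
    return bisect_right(xs, x) / len(xs)


# Hypothetical per-site scores for a single branch.
branch_scores = [0.2, 0.5, 0.5, 1.3]
print(ecdf_at(branch_scores, 0.5))
```

A curve that rises earlier for one branch than for another indicates that the sites of that branch tend to have smaller values of the plotted quantity.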
Fig 4.
Characteristics of conservation and distance to RBS in each HA subunit.
(A) Conservation scores (labelled rate of evolution) for the HA1 subunit, overlaid on the functional selection-expected charge plot. These scores were obtained from the Rate4Site algorithm [27] using H1N1 sequences obtained from the IRD fludb database [28] (see Materials and methods). As in Fig 3A, higher scores indicate a higher rate of evolution. (B) Distances to the RBS for the HA1 subunit, overlaid on the functional selection-expected charge plot. These distances were calculated in PyMOL [23] using the RBS annotations of Gamblin and colleagues [24] (see Materials and methods). (C) and (D) are as in (A) and (B), respectively, but for the HA2 subunit. (E) Empirical cumulative distribution of evolutionary rates, separated by both branch and subunit. (F) Empirical cumulative distribution of distance to the RBS, separated by both branch and subunit.
Fig 5.
Fitted observed charge in 1918, 1948, 1978, and 2008 computed from fitted per-site multinomial logistic regression models.
These models were fitted using nnet [30] in R [31] (see Materials and methods). All 1744 human HA H1N1 sequences between 1918 and 2008 from fludb [28] were used to fit these statistical models. The arrow highlights sites in the negative branch that are transitioning to positive values.