Evolutionary Dynamics Analysis of Human Metapneumovirus Subtype A2: Genetic Evidence for Its Dominant Epidemic

Human metapneumovirus (hMPV) is a respiratory viral pathogen in children worldwide. hMPV is divided into four subtypes: hMPV_A1, hMPV_A2, hMPV_B1, and hMPV_B2. hMPV_A2 can be further divided into hMPV_A2a and A2b based on phylogenetic analysis. The typical prevalence pattern of hMPV involves a shift of the predominant subtype within one or two years. However, hMPV_A2, in particular hMPV_A2b, has circulated worldwide with a several years long term high epidemic. To study this distinct epidemic behavior of hMPV_A2, we analyzed 294 sequences of partial G genes of the virus from different countries. Molecular evolutionary data indicates that hMPV_A2 evolved toward heterogeneity faster than the other subtypes. Specifically, a Bayesian skyline plot analysis revealed that hMPV_A2 has undergone a generally upward fluctuation since 1997, whereas the other subtypes experienced only one upward fluctuation. Although hMPV_A2 showed a lower value of mean dN/dS than the other subtypes, it had the largest number of positive selection sites. Meanwhile, various styles of mutation were observed in the mutation hotspots of hMPV_A2b. Bayesian phylogeography analysis also revealed two fusions of diffusion routes of hMPV_A2b in India (June 2006) and Beijing, China (June 2008). Sequences of hMPV_A2b retrieved from GenBank boosted simultaneously with the two fusions respectively, indicating that fusion of genetic transmission routes from different regions improved survival of hMPV_A2. Epidemic and evolutionary dynamics of hMPV_A2b were similar to those of hMPV_A2. Overall, our findings provide important molecular insights into hMPV epidemics and viral variation, and explain the occurrence of an atypical epidemic of hMPV_A2, particularly hMPV_A2b.


Introduction
Human metapneumovirus (hMPV), first identified in 2001 in the Netherlands, is an enveloped, non-segmented, RNA virus that belongs to the Paramyxoviridae family and pneumovirus subfamily [1]. It is an important respiratory viral pathogen in children, which circulates in all age groups worldwide, causing illnesses that range from mild colds to severe lower respiratory tract diseases [1][2][3].
The genome of hMPV is approximately 13 kb in length and is comprised of eight genes, encoding nucleocapsid (N), phosphoprotein (P), matrix protein (M), fusion glycoprotein (F), transcriptional enhancer protein (M2-1), small hydrophobic protein (SH), attachment glycoprotein (G), and a large polymerase protein (L) [4]. As a type II membrane protein, the hMPV G protein plays an important role in viral attachment and is a potential target for neutralizing and protective immune responses [5,6]. Accordingly, the G protein shows extensive amino acid variations, mostly in the ectodomain, that help the virus to escape immunity pressure from hosts [7,8]. Thus, the G protein has been widely used to infer evolutionary relationships among isolates from different regions [8][9][10]. Based on genetic variability of the G gene, hMPV has been divided into two genotypes (A and B) and four subtypes (A1, A2, B1, and B2) [8,11]. There are two subgroups of hMPV_A2, hMPV_A2a and hMPV_A2b, the latter of which was identified in Germany in 2006 [12].
The prevalence pattern of hMPV is similar to that of respiratory syncytial virus; two subgroups of the virus can co-circulate with varying degrees of prevalence [13,14]. Shifts in the predominant hMPV subtype within one or two years are typical in the prevalence pattern of hMPV, and can be attributed to the response to the existing immunity against circulating strains [11][12][15][16][17][18][19]. However, such a shift has not been observed in the prevalence pattern of the hMPV_A2 subtype. Reports from Korea, Mexico, Italy, and Germany indicate that hMPV_A2 can maintain predominance for three or more years, rather than undergoing the shift observed in other hMPV subtypes [15,[20][21][22][23]. The mechanism behind this distinctive epidemic feature of hMPV_A2 is unclear.
Here, we compiled the epidemic patterns of the hMPV subtypes based on the global incidences of hMPV subtypes, using published data as well as our own data collected between 1999 and 2009. We found that hMPV_A2 was predominant from 2002 to 2009. To interpret the mechanisms behind this phenomenon, we reconstructed and compared the evolutionary dynamics of all hMPV subtypes, particularly the phylodynamics of hMPV_A2a and A2b, based on sequences of the G gene. Our findings provide insights into the prevalence pattern of the hMPV subtypes and the evolutionary mechanism behind the atypical prevalence pattern of hMPV_A2.

Sample collection
A total of 60 hMPV-positive respiratory specimens were taken from pediatric and adult patients with acute respiratory tract infections (ARTIs) at the Beijing Children Hospital and the Peking Union Medical College Hospital in Beijing, China between 2006 and 2009. Sampling was performed once for each patient. The inclusion criteria of ARTI patients have been described previously [24,25]. Briefly, the enrolled patients had acute respiratory symptoms, acute fever (body temperature §38.0uC), and normal or low leukocyte count. For each adult patient, a nasal and a throat swab were collected simultaneously. The two swabs were combined in one tube containing 3 ml viral transport medium (VTM). For each pediatric patient, about 3 ml of nasopharyngeal aspirate (NPA) was collected under aseptic conditions. All samples were stored at 280uC prior to analysis. Clinical information of each enrolled patient was recorded in a standard form and reviewed retrospectively upon identification of hMPV-positive patients. Written informed consent was obtained from all participants or guardians on behalf of the minors/children participants. The study was approved by the Medical Ethic Review Board of the Institute of Pathogen Biology, Chinese Academy of Medical Sciences. All samples were coded prior to analysis to ensure anonymity.
Nucleotide amplification and sequencing of hMPV partial G genes RNA was extracted from the samples using QIAamp viral RNA mini Kit (Qiagen, Duesseldorf, Germany), and reverse transcribed using SuperScript II reverse transcription system (Invitrogen, Carlsbad, CA) following the manufacturer's instructions. The cDNA of hMPV was amplified by nested PCR. The 20 ml PCR reaction mixture contained 2 ml of reaction mix, 0.4 ml of dNTP (10 mM), 0.2 ml of each primer (50 mM) and 0.2 ml of Taq  Polymerase. Primers for hMPV genotype A were: outer sense 42-CAAAGCAAGAGTGAAAAATCGTGTG9, inner sense 44-AAGCAAGAGTGAAAAATCGTGTGGC, outer antisense  681-TGTTTTTACAATGTTGGTGTGCTAT, and inner antisense 527-GTGGTGGCTCTGGGGACCGCCTTCG. Primers  for hMPV genotype B were: outer/inner sense 104-TTATTG-GACTAACAGCGTTAAGCAT,  outer  antisense 805-ATGGTGTCTATGTTTTTCTGGGTCT, inner antisense 736-TTTTTGTTAACTACTTGGATGGGAT. Two microliters of cDNA was used as template in the first reaction, and 2 ml of the first round product was used as template in the second reaction. The two-round nested PCR was performed using these cycling conditions: pre-denaturing at 94uC for 180 s; 30 cycles of amplification at 94uC for 30 s; 50uC for 30 s; and 72uC for 60 s, with a terminal elongation step at 72uC for 10 min. The sequence data obtained for the hMPV G gene was deposited to GenBank under accession numbers JN166985 to JN167044 (Table S1).

hMPV sequence collection and phylogenetic analysis
A total of 234 partial G gene cDNA sequences (www.ncbi.nlm. nih.gov), derived from samples with known background information including collection dates (between 1981 and 2009) and collection places, were obtained from GenBank. Those sequences plus the 60 sequences obtained in this study were subjected to phylogenetic analysis. These 294 sequences represent hMPV strains from 12 countries and cover most encoding sequences of the G protein's ectodomain. To avoid possible sampling bias, all the sequences used in this study were obtained from simultaneous screening of all four hMPV subtypes. The background information of the G gene sequences enrolled in this study is provided in Table  S1.
Estimates of genetic distance were used to evaluate genetic divergence between hMPV subtypes [26]. The overall mean distances of the hMPV subtypes were estimated using MEGA (version 4.0) [27]. A kimura-2 parameter model, gammadistributed among-site rate variation with four rate categories (c4), and 1000 bootstraps were applied for distance estimation.
Phylogenetic reconstructions of the hMPV subtypes were performed by maximum-likelihood (ML) and neighbor-joining (NJ) methods using PAUP*(Phylogenetic Analysis Using Parsimony* and other methods, version 4b10 beta) [28], and by the Bayesian Markov Chain Monte Carlo (MCMC) method using BEAST (Bayesian evolutionary analysis by sampling trees version 1.5.4) [29]. The best fit nucleotide substitution model (s) for ML and the Bayesian MCMC analysis were selected using Modeltest (version 3.7) [30].

Bayesian MCMC evolutionary analysis
Evolutionary parameters, including molecular clock phylogenies, demographic histories, evolution rates, and divergence times, were jointly estimated from heterochronous G genes of hMPV subtypes and hMPV_A2 sublineages using the Bayesian MCMC method implemented in BEAST (version 1.5.4). Bayesian MCMC analysis was performed using the uncorrelated lognormaldistributed model (UCLD). Analysis was performed using the best fit nucleotide substitution model general time reversible (GTR) with a gamma-distributed among-site rate variation with four rate categories (c4) as estimated by Modeltest. To determine the degree to which dating estimates were affected by the demographic tree model, chosen Bayesian MCMC analyses were repeated using the constant size and exponential growth tree models. Each Bayesian MCMC analysis was run for 200 million states and sampled every 5,000 states. Posterior probabilities were calculated using Tracer (version 1.5) with a burn-in of 2 million states. Bayesian skyline plots for hMPV_A2a and A2b were estimated to depict the relative viral genetic diversity (g) over time.

Natural selection and adaptation analysis
Selection pressures on the hMPV G genes were estimated from the ratios of nonsynonymous to synonymous substitution (dN/dS) using the codon-based phylogenetic method in CODEML (distributed in PAML, version 4) [31]. The M2a codon substitution model of CODEML can estimate three classes of sites: purifying (0#v,1), neutral (v = 1), and positive selection (v.1) sites. Posterior probabilities of the inferred positive selection sites were calculated using the Bayes empirical Bayes (BEB) approach which accounts for sampling errors. Only the sites with a posterior probability of v.95% were considered as positive selection sites. Because most viral amino acids are encoded by more than one codon, a larger value of dN/dS does not always mean that the chance of an amino acid switch to any given position is higher. To infer the positive selection sites at the amino acid level, the deduced amino acid sequence entropy was determined using BioEdit (version 7.0.9) [32].
To chronologically trace the evolution of nonsynonymous changes throughout the evolutionary history of hMPV_A2, the nonsynonymous changes from the heterochronous virus sequences were reconstructed using a joint maximum likelihood method using HyPhy software version 0.99 [33]. The occurrence time of each nonsynonymous substitution in hMPV_A2 was estimated using the maximum clade credibility (MCC) tree generated from the Bayesian MCMC molecular clock analysis [29].

Phylogeography analysis
A Bayesian stochastic search variable selection (BSSVS) approach, included in Beast 1.5.4, that allows the exchange rates  in the CTMC to be zero with some prior probability, was used to find a minimal set of rates demonstrating the diffusions in the phylogeny [34]. To identify the rates contributing to the migration path, a Bayes factor test was performed by comparing the posterior to prior odds that individual rates are zero. BF.3 was considered significant. A maximum clade credibility tree was applied to summarize the obtained trees using TreeAnnotator which is included in BEAST 1.5.4. To visualize the diffusion rates over time, the locationannotated MCC tree was converted to a keyhole markup language (KML), which is suitable for viewing with GoogleEarth (http:// earth.google.com).

Global epidemics of hMPV subtypes
Epidemiological reports on hMPV subtypes during 1999 and 2009 were retrieved from PubMed (www.ncbi.nlm.nih.gov/ pubmed). Only the reports in which all hMPV subtypes were screened simultaneously were enrolled to avoid sampling bias (see Table S2 for details). Ten-year epidemic curves of hMPV subtypes and hMPV_A2 sublineages are summarized in Figure 1. The epidemic peak of hMPV_A1 occurred in 2002, whereas hMPV_B1 and _B2 peaked similarly in 2004. As previously reported, the prevalence of all hMPV subtypes except hMPV_A2 fluctuated every two or three years; the prevalence of hMPV_A2 was atypical as it was long-lasting between 2002 and 2009 with a drop in 2007. This atypical prevalence of hMPV_A2 can be attributed mainly to hMPV_A2b, although hMPV_A2a spiked in 2006 ( Figure 1).

Phylogenetic relationships of hMPV subtypes
To unveil the factors that were responsible for the atypical prevalence pattern of hMPV_A2, we first analyzed the phylogenetic relationship of each selected hMPV G sequence to confirm its subtypes. A total of 294 partial G genes corresponding to nucleotide 6431-6760 qualified (GQ153651); all 294 were derived from samples collected between 1981 and 2009 and were subjected to Maximum-likelihood (ML) phylogenetic reconstructions. The data shows well-supported groups of all subtypes and hMPV_A2 sublineages (Figure 2). Phylogeographic analysis revealed a cosmopolitan distribution for the hMPV subtypes and hMPV_A2 sublineages, and strains from a certain geographical location gathered within the same cluster. Co-circulation of multiple subtypes and/or sublineages was observed in some countries, including China, the Netherlands, South Africa, Canada, USA and Australia. To verify these findings using the ML method, the data were analyzed independently using the Neighbor Joining (NJ) and Parsimony methods. Similar tree topologies to that by ML method were obtained (data not shown). The phylogenetic relationship of the hMPV G gene sequences confirms the subtype division of each selected sequence and provides a basis for further analysis of evolutionary dynamics.

Evolutionary characteristics of hMPV subtypes
To determine why the atypical pattern of hMPV_A2 prevalence occurred, we characterized the evolutionary behavior of the hMPV subtypes. For this purpose, a Bayesian relaxed molecular clock method was used to estimate the evolutionary characteristics of hMPV subtypes. The method assumes no a priori correlation between the evolution rate of a lineage and that of its ancestor. The evolutionary rates of the G gene were estimated to be 4.42, 7.81, 8.54, and 4.98610 23 substitutions/site/year for hMPV_A1, A2, B1 and B2, respectively ( Table 1). The evolutionary rates of hMPV_A2 and _B1 were almost two-fold those of hMPV_A1 and _B2. Estimation of the genetic distance of each hMPV subtype revealed that hMPV_A2 had the highest overall mean distance of 0.112 (Table 1). These evolutionary characteristics of the hMPV subtypes suggest that hMPV_A2 evolved toward heterogeneity faster than the other subtypes.

Dynamics of population growth
To infer the demographic history of hMPV, Bayesian skyline plots were used to reconstruct the population histories of hMPV subtypes and hMPV_A2 sublineages from our analysis of the genetic diversity in the G genes over time. hMPV_A1 showed a sharp, but transient, increase in relative genetic diversity (g) in    ing that the relative genetic diversity of G genes generally reflects the population dynamics of the hMPV subtypes. The analysis of relative genetic diversity level of hMPV subtypes showed that hMPV_A2 was the highest after 2002 with a value higher than 10, while those of other subtypes were lower than 10. Meanwhile, hMPV_A2 exhibited a rising trend while other subtypes remained steady after the only increase in relative genetic diversity. These findings are correlated with the observation that hMPV_A2 has been the predominant subtype since 2002.
The first increase of the hMPV_A2 prevalence occurred through an increase in the relative genetic diversity of hMPV_A2a around 1997 ( Figure 3B); the last two increases of the hMPV_A2 prevalence can be attributed to an increase in the relative genetic diversity of hMPV_A2b. Although the relative genetic diversity of hMPV_A2b has continued to rise since 2001, with a value higher than 10, the relative genetic diversity of hMPV_A2a has remained steady since 2001 with a value lower than 10. These findings further support the fact that hMPV_A2b has been the predominant sublineage of hMPV_A2.

Natural selection and adaptation of G protein
To investigate the extent of selection pressure on the G gene of the hMPV subtypes and hMPV_A2 sublineages, the mean dN/dS values among branches of the hMPV subtypes were estimated using the M2a codon substitution model in the PAML version 4 [31]. Maximum-likelihood analysis showed that the evolution of hMPV was driven mainly by the signal of purifying selection with estimated values of mean dN/dS ranging from 0.49 to 0.94 for all hMPV subtypes except hMPV_A1. For hMPV A1, the mean dN/ dS was 1.25, indicating a signal of positive selection (Table 2). Although hMPV_A2 has atypically prevalent for an extended time period, the mean dN/dS value of hMPV_A2 was the second lowest among hMPV subtypes. The stronger negative selection signal of hMPV_A2 is not surprising, as selection pressure on individual sites may be the key determinant in the survival of hMPV_A2.
Because individual site mutations may play a role in the atypical prevalence of hMPV_A2, the positive selection sites of the hMPV subtypes were estimated using a similar codon substitution model ( Table 2). We identified 14, 16, 9, and 11 positive selection sites for hMPV_A1, A2, B1, and B2, respectively. Similar values of dN/dS for positive selection sites within subtypes were observed. Moreover, sites of hMPV_A1 had the highest dN/dS values and sites of hMPV_A2 had the lowest values among hMPV subtypes. It seems that the largest number of positive selection sites in hMPV_A2 contribute to the atypical prevalence. An additional entropy analysis on amino acids was also taken into account to gain more supports [32,35]. Entropy analysis [32] identified 1, 3, 0, and 1 positive selection amino acids for hMPV_A1, A2, B1, and B2, respectively, when the threshold of entropy was set to 1 [35]. Our findings show that because the G sequence of hMPV_A2 had more positive selection sites than those of the other subtypes, antigenic drift resulted in the increased hMPV_2 epidemic [36,37]. Moreover, the positive selection sites of hMPV_A2a and A2b were inferred independently at the codon and amino acid levels. Although hMPV_A2a had a higher mean dN/dS value at the codon level than hMPV_A2b, hMPV_A2b had more sites with higher entropy at the amino acid level than hMPV_A2a (Table 3). hMPV_A2b had 5 positive selection sites at the amino acid level, whereas hMPV_A2a had 1, suggesting increased severity of the hMPV_A2b epidemic [37].
To trace the positive selection sites in the G gene of hMPV subtypes chronologically, ancestral sequence reconstructions The mutation hotspots and the diverse mutation types identified for hMPV_A2b suggest that it has been under high selection pressure, consistent with the atypically long-lasting epidemic of hMPV_A2. Mutation hotspots in the temporal evolution routes of other hMPV subtypes were similarly assessed (see Figure S1). Only two multi-time point changes in hMPV_A1 and a successive change in hMPV_B1 were observed. For hMPV_B2, two successive changes and two multitime point changes were found.

Phylogeography analysis
To determine the mechanism behind the rise of hMPV_A2b in recent years, the spatial diffusion of the time-scaled genealogy of hMPV_A2b was traced using a recently developed BSSVS approach ( Figure 5). By June 2002, hMPV_A2b had emerged in the Netherlands and spread to Canada, Japan, and Beijing, China. By June 2006, the spread routes from the Netherlands and Japan had joined in India, and another route from Canada to USA had emerged. By June 2008, hMPV_A2b had spread from USA to Beijing, China and joined the route from the Netherlands. Coincidently, hMPV_A2b sequences retrieved from GenBank boosted from India in 2006 and Beijing, China in 2008 ( Figure 6, Table S3). The consistency between the phylogeographic data and the atypical epidemic of hMPV_A2 show that viral transmission plays an important role in the growth of the virus-infected population, which further explains the atypical epidemic of hMPV_A2b in recent years.

Discussion
Our analysis of the global prevalence of hMPV subtypes shows a long-lasting epidemic of hMPV_A2 that is atypical in comparison with those of the other hMPV subtypes. To study the mechanism behind this hMPV_A2 prevalence, we assessed phylodynamics, natural selection and adaptation, and phylogeography of the hMPV subtypes and the hMPV_A2 sublineages. Our evolutionary dynamics data are consistent with the atypical epidemic pattern of hMPV_A2 in recent years, providing a molecular evolutionary explanation for the atypical hMPV_A2 epidemic.
The substitution rates of hMPV G genes in this study were estimated to a range of 4.42-8.54610 23 substitutions/site/year, which is similar to those reported by Padhi et al [10], but higher than those of de Graaf et al [38]. We found that hMPV_A2 had the second highest substitution rate after hMPV_B1. However, genetic distance estimation revealed that the overall mean distance of hMPV_A2 is at least two-fold higher than that of all other subtypes. Because hMPV_A2 has the largest overall mean distance and the second-highest substitution rate in G genes, it may have evolved fastest toward heterogeneity, explaining its time serial trend of the effective population size as well as its atypical epidemic.
The Bayesian skyline plots show a general upward trend in the relative genetic diversity of hMPV_A2, which is higher than that of the other hMPV subtypes. This is consistent with its atypical prevalence pattern, as relative genetic diversity level is one determinant of viral survival [39,40]. Other hMPV subtypes show a steady level of relative genetic diversity after 2005, consistent with their lower prevalence during the same time period. Moreover, most fluctuations in relative genetic diversity presented here largely agree with the global prevalence of all hMPV subtypes, except hMPV_B2, demonstrating that relative genetic diversity can reveal viral features of epidemic dynamics [39]. Insufficient screening of hMPV from clinical specimens might account for the inconsistency between epidemic dynamics and the increased prevalence of hMPV_B2 between 1995 and 1997, as hMPV was not recognized as a human pathogen until 2001. Overall, our data suggest that the pattern of relative genetic diversity matches the corresponding epidemics of hMPV, indicating that the relative dynamics of genetic diversity can be used to explain the patterns of past epidemics of hMPV.
One indication of purifying selection could be seen in mean dN/dS values under 1, which were observed for each hMPV subtype, except hMPV_A1. hMPV_A1 had a mean dN/dS value of 1.25. This strong signal of positive selection in hMPV _A1 seems to reflect an intense fluctuation in its history of genetic diversity. Although hMPV_A2 had the second lowest mean dN/ dS value, it had the largest number of positive selection sites. These positive selection sites contained various mutation hotspots in the temporal evolution routes of hMPV_A2b. Fewer positive selection sites were observed in hMPV_A2a than in hMPV_A2b, and fewer mutation hotspots were found in the temporal evolution routes of hMPV_A1, B1, and B2 than in hMPV_A2. These findings show that hMPV_A2 has undergone higher selection pressure in the temporal evolution routes than the other subtypes, consistent with the atypical epidemic of hMPV_A2. Because the epidemic and evolutionary dynamics of hMPV_A2b are similar to that of hMPV_A2, hMPV_A2b may account for the atypical epidemic of hMPV_A2.
We also used phylogeography analysis to investigate the epidemiological history and the diffusion routes of the epidemic hMPV virus. Our Bayesian phylogeography analysis revealed two fusions in the diffusion routes of hMPV_A2b. Because sequences from identical isolates were excluded, our findings suggest that the viral transmission from different regions enhanced the viral capacity against host immunity and thereby intensified the hMPV_A2b epidemic. A possible reason is that host immunity cannot effectively fight against the novel strain of antigenic divergent hMPV_A2b, causing a higher viral prevalence when the novel hMPV_A2b strain was introduced to a specific population. These findings confirm that epidemic history, along with spatialtemporal dynamics of gene flow and phylogeography analysis need to be considered in the planning of prevention strategies, as has previously been suggested [41,42] In summary, the evolutionary dynamics of hMPV subtypes determined in this study provide insight into the epidemic and variation of hMPV in recent years. These data can contribute to the diagnosis, surveillance, and control of hMPV as well as for future vaccine development.