Genome Sequencing and Phylogenetic Analysis of 39 Human Parainfluenza Virus Type 1 Strains Isolated from 1997–2010

Thirty-nine human parainfluenza type 1 (HPIV-1) genomes were sequenced from samples collected in Milwaukee, Wisconsin from 1997–2010. Following sequencing, phylogenetic analyses of these sequences plus any publicly available HPIV-1 sequences (from GenBank) were performed. Phylogenetic analysis of the whole genomes, as well as individual genes, revealed that the current HPIV-1 viruses group into three different clades. Previous evolutionary studies of HPIV-1 in Milwaukee revealed that there were two genotypes of HPIV-1 co-circulating in 1991 (previously described as HPIV-1 genotypes C and D). The current study reveals that there are still two different HPIV-1 viruses co-circulating in Milwaukee; however, both groups of HPIV-1 viruses are derived from genotype C indicating that genotype D may no longer be in circulation in Milwaukee. Analyses of genetic diversity indicate that while most of the genome is under purifying selection some regions of the genome are more tolerant of mutation. In the 40 HPIV-1 genomes sequenced in this study, the nucleotide sequence of the L gene is the most conserved while the sequence of the P gene is the most variable. Over the entire protein coding region of the genome, 81 variable amino acid residues were observed and as with nucleotide diversity, the P protein seemed to be the most tolerant of mutation (and contains the greatest proportion of non-synonymous to synonymous substitutions) while the M protein appears to be the least tolerant of amino acid substitution.


Introduction
Human parainfluenza virus type 1 (HPIV-1) causes upper and lower respiratory infections (URI, LRI) in infants, young children, and the elderly. The virus also causes more severe disease in immunocompromised patients and those with chronic medical conditions [1][2][3]. Children under 5 years old and particularly those children younger than 1 year are the most commonly hospitalized by HPIV-1 infection [4]. In the United States, the rate of HPIV-1-associated hospitalization among children younger than 5 years is estimated at 0.32-1.6 per 1000 children [1]. Estimates of annual hospitalizations in the U.S. due to HPIV-1 vary from 5,800 to 28,900 [1,5].
HPIV-1 is one of four distinct HPIV serotypes all of which are enveloped, negative-sense, ssRNA viruses belonging to the Paramyxoviridae family. The function and structure of the viral genes has been explored using mutant HPIV-1 viruses generated with a reverse genetics system [6,7] or cell culture [8][9][10]. Evolutionary studies of HPIV-1 were performed in the 1990s using the hemagglutinin-neuraminidase (HN) and fusion (F) genes [11][12][13][14]. However, these studies are now dated and no evolutionary studies based on the complete genome of HPIV-1 have ever been conducted. Not only is there very little evolutionary data on HPIV-1whole genomes, but the publicly available sequence data for HPIV-1 is also extremely limited. To date there is only one complete genome sequence and 329 partial sequences. Many of these sequences are from mutant viruses created for the development of vaccines and therapeutic agents and therefore not useful for evolutionary analyses. Of the wild-type HPIV-1 sequences, most are partial sequences of the HN gene. Limited genomic data and poorly understood/studied HPIV-1 evolution has also been an impediment to the development of molecular diagnostics for HPIV-1. To address these gaps in knowledge and data, we developed a whole genome sequencing protocol for HPIV-1 from clinical samples and low passage virus isolates and report the genome sequences for 40 HPIV-1 strains. Thirty-nine of these strains were collected in Milwaukee, Wisconsin during 1997Wisconsin during -1999Wisconsin during , 2005Wisconsin during -2007Wisconsin during and 2009Wisconsin during -2010 and one was obtained from the American Type Culture Collection. Using these new data, we characterized the evolutionary pattern of HPIV-1 over the past 14 years in Milwaukee, WI and combined our data with previously published HN gene sequences from viruses collected from the USA and Japan over a 44-year period (1966-2010) to further examine the evolutionary pattern of HPIV-1.

Ethics Statement
All clinical samples were collected under protocols allowing sequencing approved by the Medical College of Wisconsin and Children's Hospital of Wisconsin institutional review boards. Some samples were approved to be collected retrospectively, deidentified, and did not require consent. The remaining samples were collected with written informed consent (from parent/ guardian).

Clinical Specimen and Isolates
Nasal swab specimens were collected from Children's Hospital of Wisconsin and its affiliated clinics. After collection, nasal swabs from each patient were immediately put into 3 mL of M4 viral transport medium (Remel, Lenexa, KS) and kept at 2-8uC. On the day of collection, 500 mL aliquots of each specimen were prepared and one aliquot from each specimen was tested in a onestep real-time RT-PCR assay for HPIV-1, -2, -3, -4 and human metapneumovirus (HMPV) that was developed in the clinical laboratory of the Midwest Respiratory Virus Program (MRVP). The remaining aliquots of each clinical specimen were frozen at 280uC for later use. Twenty-six HPIV-1 positive clinical specimens were collected during the 2009-2010 HPIV-1 season and used for this sequencing project. An additional 10 specimens collected during 1997-1999, one specimen collected in 2005, and two specimens collected in 2007 (using the same standard collection method), were retrieved from the MRVP virus bank. These 13 specimens tested positive for HPIV-1, at the time of collection, using the Hexaplex, a multiplex RT-PCR enzyme hybridization assay [15]. The C-35 strain, isolated in Washington D.C. in 1957, was obtained from American Type Culture Collection (ATCC, Rockville, MD). This strain was passaged 20 times at ATCC and one time in our lab. The passage history before the strain was sent to ATCC is not clear.
Because clinical specimens are a limited resource, all specimens, except for the 2005 and 2007 samples were inoculated onto Rhesus monkey kidney cells (LLC-MK2, ATCC, Rockville, MD) to propagate the viruses for future experiments. In brief, LLC-MK2 cells were cultured in 6-well plates at 37uC and 5% CO 2 with Eagle's minimal essential medium (EMEM, Lonza, Walkersville, MD) containing 10% fetal bovine serum (HyClone, Logan, UT) and 1% Penicillin/Streptomycin (pen/strep) (Gibco, Grand Island, NY) until 90% confluence was achieved. Cells were washed twice with EMEM and then 100 mL of clinical specimen was combined with 100 mL of EMEM and added to each well. The plates were incubated at 37uC for 1 hour and gently rocked every 15 minutes to avoid drying of cells. Following incubation, 3 mLs of virus growth media [EMEM containing: 0.09% bovine serum albumin (BSA) (MP Biomedicals, Solon, OH), 2.4 mg/mL trypsin (Worthington, Lakewood, NJ), 1% pen/strep (Gibco) and 1% fungizone (Gibco)], was added into each well and the cells were returned to the incubator. Fifty mL of culture supernatant was used for a hemagglutination assay with guinea pig red blood cells (Colorado Serum Company, Denver, CO) on days 2, 4, 6, 8 and 10 post-inoculation. HPIV-1 viruses were harvested when the hemagglutination assay was positive. The supernatant from hemagglutination negative cultures was harvested at day 10 and blind passaged to new 6-well plates LLC-MK2 cells.
Of the 39 HPIV-1 viruses collected in Milwaukee, 10 were sequenced directly from clinical nasal swab specimens, 16 from low-passage (1-2 passages) clinical isolates and 14 were sequenced initially using clinical specimens, but gaps were filled in using lowpassage isolates obtained from that specimen. Table 1 shows the pertinent information for each HPIV-1 strain including: sample type, the amount of sequence data available, and GenBank Accession number.

Primers for Amplification and Sequencing
The HPIV-1 genome was amplified by RT-PCR as ten overlapping fragments. Amplification primers were designed using an on-line primer design tool [IDT PrimerQuest (www.IDTDNA. com, Integrated DNA Technologies, Inc., Coralville, IA)] based on sequences retrieved from GenBank (primarily based on Accession # AF457102). Amplification primers were designed to, have melting temperatures of 55-60uC, have GC contents of around 50%, and amplify about 2 Kb of the HPIV-1 genome. The amplification primers were designed to cover only nucleotides (nt) 1-15575 of the 15600 nt genome as the low GC content at the extreme 59 terminus makes primer design and amplification difficult.
The first and second fragments amplify similar regions of the genome and use the same reverse primer. The first fragment uses a forward primer that binds to the first nt of the genome; however this primer has a low GC content (37%) and only worked for 50% of the strains. The forward primer of the second fragment binds to the genome starting at nt 77. Strains that failed to produce an RT-PCR product for the first fragment were sequenced beginning slightly downstream of the 39 terminus at the start of the second fragment.
The RT-PCR amplified fragments were sequenced with 44 primers. The amplification primers were also used as sequencing primers. All primers were synthesized by Integrated DNA Technologies (IDT) and are listed in Table 2. Those primers shown in bold print were used for both amplification and sequencing while those shown in plain print were used only for sequencing.

Nucleotide Extraction, Amplification, Purification and Electrophoresis
Viral RNA was extracted from 400 mL of clinical specimen or tissue culture supernatant using the NucliSENS easyMAG (bioMérieux, Durham, NC) according to the manufacturer's protocol and eluted in 50 mL of elution buffer.
cDNA was synthesized in a 20 mL reaction mixture containing 2.5 mM random hexamers (Applied Biosystems, Carlsbad, CA), 4 mM dNTPs (Applied Biosystems), 4 mM MgCl 2 (Applied Biosystems,), 1 U/mL RNase inhibitor (Applied Biosystems), 2.5 U/mL MuLv reverse transcriptase (Applied Biosystems) and 3 mL RNA. The reaction mixture was incubated for 5 minutes at 25uC and then for 14 minutes at 42uC followed by 1 minute at 95uC. Ten mL of cDNA was amplified by adding 40 mL of PCR mix containing a pair of PCR primers, 2.25 mM MgCl 2 , 2.5 U of Faststart hot start polymerase (Applied Biosystems) and the appropriate buffers to support PCR. The final concentration of each primer was 200 nM. Amplification was performed with a touch-down PCR program. After an initial denaturing at 94uC for 5 minutes, the PCR was done for 5 touch-down cycles with denaturing at 95uC for 45 seconds, annealing at 66uC for 5 minutes, and extension at 72uC for 90 seconds. The annealing temperature was decreased by 2uC after each cycle. This was followed by another 2 touch-down cycles with denaturing at 95uC for 45 seconds, annealing at 56uC for 2 minutes, and extension at   72uC for 90 seconds. Again, the annealing temperature was decreased by 2uC after each cycle. Finally, 30 standard PCR cycles were performed with denaturing at 95uC for 45 seconds, annealing at 52uC for 2 minutes and extension at 72uC for 90 seconds. The PCR was finished with a final extension at 72uC for 10 minutes. The PCR products were purified with a Millipore MultiScreen PCR m96 filter plate (Millipore, Billerica, MA) on the Biomek FX workstation (Beckman Coulter, Brea, CA) according to the Millipore PCR purification protocol. The purified PCR products were electrophoresed on an Agilent 2100 Bioanalyzer with a DNA 7500 kit and protocol (Agilent technologies, Inc., Santa Clara, CA) to determine the specificity and concentration of the amplified product.

BigDye Sequencing Reaction and Analysis
The purified PCR products were diluted to ,1 ng/mL based on the concentrations determined by the Agilent Bioanalyzer and used as the templates for sequencing reactions. The PCR products were sequenced with a BigDye terminator V.3.1 cycle sequencing kit (Applied Biosystems). Three mL of diluted PCR product was sequenced in a 10 mL reaction containing 0.5 mL BigDye terminator V.3.1 ready reaction mix, 1.35 mL of 56 sequencing buffer, 2 mL of 1.6 mM sequencing primer and 3.15 mL H 2 O. The sequencing reactions were cleaned up with a Millipore Montage SEQ 384 sequencing reaction cleanup kit (Millipore) on the Biomek FX workstation following Millipore's protocol (Beckman Coulter). Finally the sequencing reactions were analyzed with the ABI 3730xl DNA analyzer (Applied Biosystems). Subsequent sequence data was assembled and proofread using the DNASTAR Lasergene 8 software (DNASTAR, Madison, WI).

Phylogenetic Analysis
The HPIV-1 genomes obtained from this study were analyzed using both the whole genome and also the individual genes. Among the 40 genome sequences obtained in the current study,  HN gene). These sequences were excluded from phylogenetic analyses in which they did not have the complete genetic sequence (except in the case of the whole genome analysis). In addition, publicly available sequences from GenBank were also added to each analysis. Only those GenBank sequences obtained from wild-type HPIV-1 viruses that were greater than 100 bp in length were used for phylogenetic analyses.
Sequence alignments for the ORFs of individual genes were constructed using the Se-Al program (Rambaut et al., 1996). Because the collection year is known for most of the HPIV-1 sequences (for both the current strains and those from GenBank) and regression analysis using the program Path-O-Gen (Version 1.3) showed that there is clock-like evolution (data not shown), we used the Bayesian Markov Chain Monte Carlo (MCMC) method with a strict molecular clock in the BEAST program (Version 1.6.1) [16] to infer the time-scaled evolutionary relationships for HPIV-1. One of the 222 HN gene sequences in GenBank was missing collection date information so only 221 additional HN gene sequences were used for this analysis. The analysis incorporated the GTR model of nt substitution, with a different substitution rate estimated for each codon position. The chain length was set to 5 million. Trees were sampled every 1000 generations. The Maximum Clade Credibility (MCC) tree generated by BEAST was analyzed using TreeAnnotator (v 1.6.1, available at http://beast.bio.ed.ac.uk) with 10% burn-in. Statistical uncertainly is reflected in values of the 95% Highest Probability Density (HPD). The trees were viewed in FigTree (1.3.1, available at http://beast.bio.ed.ac.uk). Each clade is defined by long branches and nodes supported by high Bayesian posterior probability (BPP) values (.0.9). Similar results were obtained using a relaxed clock model (SRD06 model, Bayesian Skyline prior, run 100 million generations) (Data not shown).
In addition to the analyses done with the BEAST program, the phylogenies for each gene were also inferred using the maximum likelihood method in PAUP [17] with high bootstrap (.70%). For all of the genes other than the HN gene, only those sequences covering the entire coding region of the gene were used. For the HN gene, the phylogeny was inferred using nt positions 7245-8479 (nts 343-1577 of the 1728 nt HN gene) because this was the sequence range in which those HN gene sequences from our study and GenBank (used in the MCMC analysis) were primarily complete. The best-fit model of nucleotide substitution was tested and selected by MODELTEST [18] as the general reversible model (GTR+I+G) with the frequency of each substitution, proportion of invariant sites (I) and the gamma distribution of among-site rate variation with four rate categories. The bootstrap resampling process (1,000 replications) was performed using the neighbor-joining (NJ) method to assess the robustness of each node on the tree.

Amino Acid Substitution among Clades
The parsimony-based MacClade program [19] was used to determine amino acid changes in the proteins translated from the N, P, M, F, HN, and L genes.

Analysis of Nucleotide Diversity
Analyses of nucleotide diversity were performed using DNAsp version 5.10.01 [20]. The initial analysis was done using all of the genetic data obtained from this study plus the one HPIV-1 complete genome sequence that is currently available in GenBank (AF457102). This analysis showed that the HPIV-1 genome has a nucleotide diversity (p = the average number of nucleotide differences per site between two sequences) of 0.019 across the entire genome. However   Figure 1 containing the 36 HPIV-1 sequences from the present study. Colored rectangles labeled genotypes C and D represent the HN gene sequences from HPIV-1 viruses collected in Milwaukee, WI in 1991. The number following the underscore in each name represents the collection date in years since collection date of the oldest HPIV-1 strain. Branch color corresponds to clade name/HN sequence seen in Figure 3. doi:10.1371/journal.pone.0046048.g002 Table 3. Clade-Scale Amino Acid Substitutions from the N, P, M, F and L Proteins of 40 Recently Sequenced HPIV-1 Complete Genomes (the HN protein is described in greater detail in Table 4 and is not included here). fewer, but more complete, sequences actually allowed us to perform a more accurate analysis of the genetic diversity of the HPIV-1 genome and performed all nucleotide diversity analyses with only the 35 complete genome sequences. Beside analysis of the complete genomes, nucleotide diversity was also determined for each non-coding region, gene, and intergenic region. Genetic diversity was also calculated using a sliding scale analysis across the whole genome. The sliding scale analysis calculates the nucleotide diversity across a 100 bp window rather than across an entire section of the genome (e.g. an entire gene). The window was moved in 25 nt increments across the entire HPIV-1 genome and the nucleotide diversity was recalculated for each window. The population's mutation rate was estimated as Theta/site (from S) (the Watterson Estimator). Finally, the ratio of non-synonymous (Ka) to synonymous (Ks) mutations was also calculated in DNAsp using only the coding portions of the genome.

HPIV-1 Genome Sequencing Protocol
We have developed an RT-PCR protocol capable of amplifying nucleotides (nts) 1-15575 of the 15600 nt genome of HPIV-1 in ten overlapping fragments. Following amplification it is possible to obtain an accurate sequence of the entire HPIV-1 genome minus approximately 30-100 nts at either end. Forty HPIV-1 strains including clinical specimens and low passage virus isolates were sequenced using the sequencing protocol developed in this study and complete genomes were obtained for 35 of 40 HPIV-1 strains ( Table 1). All 40 HPIV-1 sequences have been submitted to GenBank (accession numbers: JQ901971-JQ902010). The genome of HPIV-1 consists of the N-P(C/Y)-M-F-HN-L genes separated by intergenic sequences and flanked by a 59 and 39 noncoding region. The HPIV-1 intergenic sequence is CTT between the N-P, P-M, F-HN, and HN-L genes and is CGT between M-F genes. These three intergenic nts were highly conserved with no changes observed in sequences from either GenBank or this study. The terminators at the end of the HPIV-1 mRNAs are also highly conserved and similar among different genes. The sequence motif of the terminators is WADTAAGAAAAA. The third nt varies among A/T/G while the first position is consistently an A except in the L gene terminator where it is changed to a T. Also, a six nt addition was found from nt 4896-4901 of HPIV-1/WI/629-D01575/2009. , which is within the range typically observed for negative-sense RNA viruses [28].
A second MCMC tree was made using 36 of the 40 newly identified HN gene sequences (four were excluded because they were incomplete) plus 221 HN gene sequences deposited in GenBank from previous studies [7,[12][13][14]29] (Figure 2). The phylogeny was structured both by time and by location of isolation. Clades 1 and 3 identified during the whole genome phylogeny were also evident on the HN phylogeny. However, the viruses belonging to clade 2 (as defined by whole genome phylogeny in Figure 1) were interspersed with viruses from Japan on the HN gene tree. The TMRCA for the HN gene is estimated to be 58.3-66.2 (95% HPD), which is similar to the estimate for the complete genomes from Milwaukee 1997-2010. The rate of evolution for the HN gene is 1.37610 23 nt substitutions per site per year (95% HPD: 1.16610 23 nt/site-year to 1.59610 23 nt/ site-year, similar to the rest of the genome. In an effort to confirm the results of the BEAST analysis we also used the maximum likelihood method available in PAUP (version 4.0) [17]. The phylogenies created using the 40 newly identified HPIV-1 whole genomes and the N, P, M, F, HN and L gene sequences ( Figures S1, S2, S3, S4, S5, S6, S7, S8) all look very similar to the MCMC tree created with complete HPIV-1 genomes in BEAST (Figure 1). The viruses are grouped in the same clade regardless of which gene is being analyzed. Figures S1,  S2, S3, S4, S5, S6, S7, S8).  Table 3 (complete genome sequences from this study) and shown graphically in Figure 3 (HN gene sequences from this study and those found in GenBank).

Analysis of Sequence Variability
The nucleotide diversity (p) is 0.018 across the entire genome (using the 35 complete genomes obtained in this study and the one complete genome previously available in GenBank). Nucleotide diversity was also calculated for important genetic elements of HPIV-1 including: 59 and 39 non-coding regions (NCRs), intergenic sequences, gene terminators, and open reading frames (ORFs). All of the non-coding regions were well conserved with the exception of the 59 NCR of the F gene (p = 0.050) and 39 NCR of the HN gene (p = 0.050), which were the most diverse regions of the genome. A 6 nt addition was found in the 59 NCR of the F gene of HPIV-1/WI/629-D01575/2009. Among the six ORFs of the HPIV-1 genome, the P gene is the most variable (p = 0.023) and the L gene is the most conserved (p = 0.014) ( Table 5).
The complete results of the sliding scale analysis of nucleotide diversity can be seen in Figure 4. The 100 bp regions containing the 59 NCR plus the first 5 nt of the F gene (nt 4983-5082) and the 39 NCR of the HN gene (nt 8609-8709) showed the greatest nucleotide diversity (p = 0.066, p = 0.058) across the entire genome. The most diverse region of each open reading frame is Table 4. Amino Acid substitutions of the Seven Regions of HN Gene (calculated using all available sequences from GenBank).

Region Name
Location of region

Analyses of Selective Pressure
The Ka/Ks ratio for the coding region of each gene ranges from 0.034-0.233 (values less than one are indicative of purifying/ negative selection). The P gene has the greatest Ka/Ks ratio of all HPIV-1 genes (0.233) and the M gene has the smallest Ka/Ks ration (0.034).

Recombination Analysis
Finally, we used the Recombination Detection Program (RDP) to identify possible recombination events among the 40 wholegenome HPIV-1 sequences. This analysis indicated that none of the strains that we sequenced exhibited any signs of recombination, which is consistent with the phylogenetic analyses performed in this study ( Figures S1, S2, S3, S4, S5, S6, S7, S8).

Discussion
By developing a novel PCR-based complete genome sequencing protocol for HPIV-1, we were able to provide the first evolutionary analysis of HPIV-1 at the complete whole genome level. Thirtynine HPIV-1 genome sequences were obtained directly from clinical nasal swab specimens and/or viral isolates from 1997-1999, 2005, 2007 and 2009-2010 collected in Milwaukee, WI (and an additional sequence was obtained from an HPIV-1 isolate from ATCC). These data were combined with HPIV-1 gene sequences publicly available in GenBank.
Previous evolutionary studies of HPIV-1, based on the HN gene sequence, indicate that evolution is guided by a combination of cocirculating strains and the development of geographically restricted lineages [12][13][14]. The sequences used for the present analysis include those used in previous analyses and a significant number of additional sequences, which increases the overall temporal and geographic distribution of HPIV-1 sequence data. Previous studies of HPIV-1 in Milwaukee, WI revealed that two distinct HPIV-1 genotypes were circulating in Milwaukee during 1991 (genotypes C and D), confirmed by this study (Figure 2 and Figure S7) [12]. The current study indicates that two distinct HPIV-1 viruses continue to circulate in Milwaukee, WI; however, both are derived from genotype C. Even though no HPIV-1 genotype D viruses were detected, it remains possible that HPIV-1 genotype D may still be present in different parts of the country or even in WI, but if it has stopped circulating it would be interesting to look at nt polymorphisms or amino acid substitutions specific to this genotype to see if they may have contributed to its extinction (e.g. HN amino acid substitutions T7I and M445I).
We inferred the phylogenies of the HPIV-1 complete genome and the HPIV-1 HN gene using Bayesian Markov Chain Monte Carlo analysis (MCMC) with current sequences and those available in GenBank. The analyses showed that the topology of the whole genome tree was similar to that seen for the HN gene and supported the same three clade topology for the 1997-2010 Milwaukee viruses. While the TMRCA estimated for the HN gene and the whole genome were similar [58.3-66.2 years (95% HPD) and 66.7-72.0 years (95% HPD), respectively], the substitution rates obtained with these two analyses differed slightly (1.37610 23 nt substitutions per site per year and 7.61610 24 nt substitutions per site per year, respectively). With everything being equal this result indicates that the HN gene evolves more rapidly than the rest of the genome. However, the HN gene analysis contained nearly six times the number of sequences from a greater temporal and geographic distribution, and the difference in substitution rates is more indicative of sequence availability rather than a true difference in evolutionary rate. With that being the case, these results suggest that the HN gene evolves in a manner similar to that of the whole genome and that the HN gene may be an appropriate model for HPIV-1 evolution when whole genome sequences are not available. In order to get a better idea whether this finding is appropriate we would need to look at additional HPIV-1 whole genomes from different geographic locations.
In addition to a broad overview of the evolutionary pattern seen in HPIV-1 we also performed more in depth analyses on the  HPIV-1 genome sequences by looking at nucleotide diversity and selection pressure. As was mentioned previously, an analysis of nucleotide diversity revealed that the 59 NCR of the F gene and the 39 NCR of the HN gene were the most diverse non-coding regions of the genome. Previous reports indicate that the 59 NCR of the F gene contains cis-acting elements that affect transcriptional termination [31] and may regulate the level of read through transcription across the M-F junction and subsequent F protein production [32]. We also looked at the diversity in the coding regions of the HPIV-1 genome using both analyses of nt substitution [nucleotide diversity (p)] and amino acid substitution [(non-synonymous/synonymous substitution rate (Ka/Ks)]. The non-synonymous/synonymous (Ka/Ks) mutation ratios of less than 1 that is seen in all HPIV-1 genes indicates that, as a whole, HPIV-1 proteins are under purifying/negative selection. However, some proteins seem much more tolerant of amino acid substitutions than others. For example, it is interesting to note that the Ka/Ks ratios for the HN and P genes are approximately 3-8 times greater than the ratios for all other HPIV-1 genes (Table 6), which may indicate that these genes are more tolerant of nonsynonymous mutations. At first blush it seems counterintuitive that a protein, such as the HN, with so many important functions would be among the most tolerant of amino acid substitution. One previous study reports that residues 184, 262, 264, 277, 279, 420, 461, and 541 play a role in hemaglutinase and neuraminidase function, while another indicates that residue 55 plays a key role in envelope fusion, and yet another describes how the conserved cysteine and proline residues around residue 461 are important to the viral structure [9,10,33]. However, as a surface protein, and one of the primary immunogens of HPIV-1, an increased tolerance of non-synonymous mutations outside of these key functional domains would likely aid the virus' ability to escape the host immune response. In support of this hypothesis, the current study shows that the residues previously identified as having a functional role showed little to no variation over the last 30 years while the remainder of the protein is more tolerant of amino acid substitution.
The P gene region encodes the P, C, C', Y1 and Y2 proteins. These viral accessory proteins aid the L protein in transcription/ translation of viral nucleic acid and also aid in suppression of the host innate immune response by acting as antagonists to the IFN and apoptosis pathways [34]. The rationale for an increased Ka/ Ks ratio in the P protein might be linked to the function of these accessory proteins that would benefit from increased diversity.
While the F gene showed less overall diversity than either the HN or P gene (in terms of nucleotide diversity and Ka/Ks ratio), portions of the gene are among the most conserved and the most diverse in the whole genome. Nts 5383-6107 are among the least tolerant of mutation, which is somewhat expected as this region contains the previously identified F protein cleavage site (the QTRFFG motif seen at F protein residues 110-115 was completely conserved among all 40 HPIV-1 sequences from this study) [11]. Meanwhile, nts 6233-6382 are among the most tolerant of mutation. As with the HN protein, the F protein is an important virus surface protein and increased variation in certain regions of the F protein may help the virus avoid the immune system.
Despite the N gene/protein being among the most conserved in HPIV-1, we did find a hypervariable region at the C-terminus of the N protein. Studies of Sendai virus found that this region is required for proper template function during RNA synthesis [35]. The N protein of Sendai virus and HPIV-1 share common structures and functional sites and as such the variable regions found in the N protein of HPIV-1 might have a similar function in RNA synthesis. The substitutions found at residues 442 have been documented previously in a mutant virus [36][37][38]. In this study, we show that these substitutions are common in all three clades of 1997-2010 Milwaukee HPIV-1 viruses and that they have remained consistent for over a decade.
Finally, we performed an analysis to see if any of these viruses have undergone recombination. Natural recombination events have been detected in several other members of the Paramyxoviridae family including Newcastle disease virus (NDV) [39][40][41][42] and human respiratory syncytial virus [43]. Attenuated vaccines are capable of influencing the evolution process of NDV through exchanging their genetic material with circulating viruses [44]. Recently, Yang et al [27] identified a naturally occurring HPIV-3 recombinant virus. In our recombination analysis of the 40 genomes reported in this study, we found no indication that any recombination events occurred. This result seems logical considering the fact that the topology of phylogenetic trees for each HPIV-1 gene group together in a nearly identical manner.
This work represents a significant step forward in filling the void of much needed HPIV-1 genetic data. Our hope is that the genome sequencing protocol we provide here will facilitate an even greater increase in HPIV-1 genetic data from diverse locales and time periods. This increase in genetic data may in turn lead to breakthroughs in the development of novel HPIV-1 therapeutics, vaccines and diagnostic assays. In the meantime, the genetic data provided by this study should lead to an increased understanding of HPIV-1 evolution, gene structure and function, viral pathogenesis, and conserved domains that can function as targets for molecular diagnostic assays.