Sequencing and analysis of globally obtained human parainfluenza viruses 1 and 3 genomes

Human Parainfluenza viruses (HPIV) type 1 and 3 are important causes of respiratory tract infections in young children globally. HPIV infections do not confer complete protective immunity so reinfections occur throughout life. Since no effective vaccine is available for the two virus subtypes, comprehensive understanding of HPIV-1 and HPIV-3 genetic and epidemic features is important for diagnosis, prevention, and treatment of HPIV-1 and HPIV-3 infections. Relatively few whole genome sequences are available for both HPIV-1 and HPIV-3 viruses, so our study sought to provide whole genome sequences from multiple countries to further the understanding of the global diversity of HPIV at a whole-genome level. We collected HPIV-1 and HPIV-3 samples and isolates from Argentina, Australia, France, Mexico, South Africa, Switzerland, and USA from the years 2003–2011 and sequenced the genomes of 40 HPIV-1 and 75 HPIV-3 viruses with Sanger and next-generation sequencing with the Ion Torrent, Illumina, and 454 platforms. Phylogenetic analysis showed that the HPIV-1 genome is evolving at an estimated rate of 4.97 × 10−4 mutations/site/year (95% highest posterior density 4.55 × 10−4 to 5.38 × 10−4) and the HPIV-3 genome is evolving at a similar rate (3.59 × 10−4 mutations/site/year, 95% highest posterior density 3.26 × 10−4 to 3.94 × 10−4). There were multiple genetically distinct lineages of both HPIV-1 and 3 circulating on a global scale. Further surveillance and whole-genome sequencing are greatly needed to better understand the spatial dynamics of these important respiratory viruses in humans.


Introduction
Human Parainfluenza viruses (HPIV) belong to the Paramyxovirus family and are important causes of respiratory tract infections in young children, elderly individuals, and the immunocompromised globally. Serologic evidence shows that most children over five years of age have been infected by HPIV [1]. There are four types of HPIV with HPIV-1 and 3 belonging to the Human respirovirus 1 and 3 species of the Respirovirus genus and HPIV-2 and 4 belonging to the Human rubulavirus 2 and 4 species of the Rubulavirus genus. HPIV-1 and 3 cause more illness than the other two types and are therefore more studied. These two types have been reported to have distinct clinical and epidemiologic features. HPIV-1 is the most common pathogen associated with croup while HPIV-3 ranks only behind respiratory syncytial viruses (RSV) as a cause of bronchiolitis and viral pneumonia among infants and young children [2][3][4]. In children less than 5 years of age it has been estimated that HPIV-1 causes 0.3-1.6 hospitalizations per 1000 children and HPIV-3 causes 0.5-2.6 hospitalizations per 1000 children in the United States [3][4][5]. Both HPIV-1 and 3 also cause serious lower respiratory tract disease and death in the immunocompromised and elderly. HPIV-1 infections have been epidemic in the fall of odd-numbered years since 1973 and HPIV-3 infections are usually widespread in the United States during late spring and summer [3,6]. HPIV infections do not confer complete protective immunity and individuals get recurrent infections throughout their life. No effective vaccine is yet available for these viruses and further studies are needed to better understand the disease burden of these viruses [2]. A better understanding of HPIV-1 and 3 genome diversity, evolution, and epidemiological features is key for the development of improved diagnostics tools, prevention strategies, and treatments for HPIV-1 and 3 infections. However, major progress in this field has been hampered in part by the limited amount of HPIV-1 and 3 whole genome sequence data available to the scientific community in public repositories. To contribute to filling this gap in knowledge, our study sought to carry out a large-scale whole genome sequencing and analysis of HPIV-1 and 3 strains collected from multiple countries and years. HPIV-1 and 3 samples were collected from Argentina, Australia, France, Mexico, South Africa, Switzerland, and the USA during the period 2003-2011 and genome sequenced using a combination of Sanger and Next-Generation Sequencing (NGS) technologies (Ion Torrent, Illumina, and 454 platforms). Here, we present the results of a comparative analysis of 40 HPIV-1 and 75 HPIV-3 novel genome sequences, in combination with additional HPIV sequences available in Genbank. Our findings provide further insights into HPIV1 and HPIV3 worldwide genome diversity and evolution.

Ethics statement
Samples from the Virology Laboratory of CEMIC, University of Geneva Hospitals, National Reference Center for Measles and Paramyxoviridae, and University of Adelaide-Institute of Medical & Veterinary Science Adelaide were collected for routine viral diagnostic testing and

Sample collection and HPIV identification from various locations
Samples were collected retrospectively as part of a larger paramyxovirus sequencing project which also included RSV and human metapneumovirus. The RSV sequencing part of the project was published previously [7]. There was some overlap in locations that provided RSV and HPIV samples. Even though collection details for some of these locations has already been published previously, they are repeated here for clarity. The Autonomous University of San Luis Potosí (San Luis Potosí, Mexico). Respiratory samples were collected from pediatric subjects as part of research projects carried out to analyze the epidemiology of viral respiratory infections and as part of a hospital-based infection control program [8][9][10][11]. The research projects were approved by the corresponding Research and Ethics Committees. Samples were obtained by nasal wash or pharyngeal swab, and they were processed as described in Noyola et al. (2004) [8]. Viral testing was carried out directly on respiratory samples. HPIV identification was performed with a direct fluorescent antibody assay using the Respiratory DFA viral screening and identification kit (Light Diagnostics, Chemicon International, Temecula, CA, subsequently Light Diagnostics, Millipore Corporation, Temecula, CA) which detects seven respiratory viruses including adenovirus, influenza A, influenza B, HPIV-1, HPIV-2, HPIV-3, and RSV. After viral detection, respiratory samples were stored at -70˚C until they were shipped on dry ice for sequence analysis.
Virology laboratory of CEMIC (Buenos Aires, Argentina). Respiratory samples from hospitalized pediatric patients less than 1 year of age were submitted for routine viral diagnostics. Nasopharyngeal aspirates or swabs were obtained in a transport media (Hanks plus 2% FCS, penicillin, streptomycin, and amphotericin B). Samples were processed for antigen detection on pelleted cells by indirect immunofluorescence with monoclonal antibodies against RSV, adenovirus, influenza A and B and parainfluenza (EMD Millipore, Billerica, MA, USA) [12,13]. A fluorescein labeled anti-mouse IgG was used (Sigma-Aldrich Corp., St. Louis, MO, USA). Readings were performed with a C. Zeiss microscope provided with epifluorescent equipment and a mercury lamp. An aliquot of the sample was preserved in liquid nitrogen.
University of Adelaide-Institute of Medical & Veterinary Science (Adelaide, Australia). Nasal pharyngeal aspirates from pediatric patients less than 15 years of age were submitted for routine viral diagnostics and tested within eight hours of collection for seven viruses (adenovirus, influenza A and B, HPIV 1, 2, and 3 and RSV) and Mycoplasma pneumoniae [14]. The viruses and M. pneumoniae were identified directly from the samples by specific antibodies using in-house developed enzyme immunoassays. Samples were also inoculated into 96-well microwell cell cultures, spun at 1000 × g for 1hr at 35˚C and incubated for 5-6 days at 37˚C prior to testing the inoculated cell culture fluids for viruses by enzyme immunoassay [15]. Following testing the remaining volumes were then stored at 4˚C for less than three days and then stored at -70˚C long term. The samples used in this report were randomly selected without any specific patient identifications.

Department of Medical Virology, University of Pretoria (Pretoria, South Africa) and National Institute for Communicable Diseases (NICD) (Johannesburg, South Africa).
Respiratory samples from 2007-2010 were from pediatric and adult patients and submitted either as part of outpatients enrolled in the NICD Viral Watch programme [16] or for routine diagnosis from patients with lower respiratory tract infection, hospitalized in the Kalafong and Steve Biko Academic hospitals. Testing in 2007 was performed at the Tshwane National Health Laboratory Service laboratory, Department Medical Virology, University of Pretoria by direct immunofluorescent assay followed by confirmation by RT-PCR, as described previously [17]. At NICD, samples from 2008-2009 were tested in shell vials and direct immunofluorescence [18], from 2010 onwards samples were tested with RT-PCR [19]. Some viruses were amplified in tissue culture for this study. Samples were stored at -80˚C.
National Reference Center for Measles and Paramyxoviridae, Caen University Hospital (Caen, France). Respiratory samples from pediatric and adult patients were submitted for routine diagnostics for 15 respiratory viruses (adenovirus, influenza A and B, HPIV 1, 2, 3, and 4, RSV A and B, human metapneumovirus, human coronaviruses 229E, OC43, NL63 and HKU1, and bocaviruses) and four intracellular bacteria including Mycoplasma pneumoniae. These viral and bacterial targets were detected directly from the samples using the RespiFinder assay (PathoFinder B.V., Maastricht, Netherlands). Samples were divided in aliquots and stored frozen at -80˚C.
Laboratory of Virology, University of Geneva Hospitals (Geneva, Switzerland). Nasopharyngeal swabs and aspirates were collected from pediatric and adult patients for routine viral diagnostics for a panel of respiratory viruses (adenovirus, influenza A and B, metapneumovirus, HPIV 1, 2, and 3, rhinovirus, enterovirus and RSV). HPIV were identified by a published in house real-time RT-PCR assay [20]. Samples were stored at -80˚C.
The Midwest Respiratory Virus Program, Children's Hospital of Wisconsin, and Dynacare Laboratories (Milwaukee, Wisconsin, USA). Nasopharyngeal swab samples were collected from pediatric and adult patients with informed consent under an IRB protocol or through routine clinical diagnosis and de-identified. HPIVs were identified by an in-house real-time RT-PCR. Sample remnants were kept at -80˚C.
Samples from all locations were frozen at -70˚C or less and held frozen until shipped on dry ice to the Midwest Respiratory Virus Program. Once received they were stored at -80˚C until further processing.

Sequencing HPIV genomes
For samples sequenced using the Sanger technology, genome specific PCR primers tiling across the entire genome were designed from the respective HPIV-1 and HPIV-3 consensus sequences using an automated PCR primer design pipeline developed at the J. Craig Venter Institute (JCVI) with an average amplicon length of 650 bp and an overlap of 150 bp [21]. Each primer had an M13 sequence tag at the 5' end used for Sanger sequencing. They were then pruned to a minimal set of 95 primers with at least three-fold amplicon coverage at every base position of the consensus. The HPIV-1 primer sequences (S1 Text) were generated based on a consensus sequence using the Genbank accessions NC_003461, JQ901990, JQ901989, JQ901975 and JQ901972; while the HPIV-3 primers (S2 Text) were generated using a consensus sequence generated from the accessions: AB012132, Z11575, EU424062, FJ455842 and EU326526. Viral RNA was extracted from 100-400 μl of the samples using the NucliSENS easyMag (bioMerieux, Inc., Durham, NC, USA) with elution in 25 μl at the Midwest Respiratory Virus Program. After extraction the RNA was shipped on dry ice to the JCVI. There, RNA from each sample was subjected to RT-PCR using a Qiagen One-step RT-PCR kit (Qiagen, Hilden, Germany) to produce 650 bp amplicons from each of the 95 primer pairs for both HPIV-1 and HPIV-3 samples. PCR products were treated with a mixture of exonuclease and shrimp alkaline phosphatase to remove primers and dNTPs, and then Sanger sequencing was performed with M13 primers.
For samples sequenced using Ion Torrent, 100ng of pooled DNA amplicons were sequenced as described previously [22].
For samples sequenced using SISPA and 454/Illumina, 100ng of pooled DNA amplicons were randomly amplified and prepared for next-generation sequencing using a sequence independent single primer amplification (SISPA) method as initially described by Djikeng et al. [23]. Briefly, 100ng of amplified viral DNA were denatured in the presence of 6% w/v DMSO and a chimeric oligonucleotide containing a known barcode 22nt sequence followed by a 3' random hexamer. A Klenow reaction was prepared with the denatured DNA template by adding NEB buffer II, 3'-5' exo-Klenow (New England Biolabs, Ipswich, MA, USA), and dNTPs (Thermo Fisher Scientific, Waltham, MA, USA). The Klenow reaction was incubated at 37˚C for 60 min, followed by incubation at 75˚C for 10 min. The resulting cDNA was randomly amplified by PCR using Promega GoTaq Hot Start Polymerase (Promega Corporation, Madison, WI, USA) at 35 cycles (denaturation: 30 sec, 94˚C; annealing: 30sec, 55˚C; extension: 48 sec, 68˚C). PCR reactions contained primers corresponding to the known 22nt barcode sequence from the oligonucleotide utilized in the previous Klenow step. The resulting cDNA was then treated with Exonuclease I at 37˚C for 60 min, followed by incubation at 72˚C for 15 min.
SISPA products were normalized and pooled into a single reaction that was purified using the QIAquick PCR purification kit (Qiagen, Hilden, Germany). Sample was further gel purified to select for SISPA products 300-500bp in size for Illumina sequencing and 500-800bp in size for 454 sequencing. Two aliquots were then submitted for sequencing with 454 and Illumina sequencing technologies. 454 sequencing was performed on the GS FLX+. A rapid library was created, and samples were sequenced on one half plate. Illumina sequencing was performed on the Illumina MiSeq v2 with 250bp paired-end reads.

Assembly and annotation
For samples sequenced with Sanger, sequencing reads were trimmed to eliminate amplicon primer and low-quality sequences. They were then assembled with CLC using the clc_ref_as-semble_long program [24]. Draft assemblies were evaluated, and targeted PCR-based sequencing reactions were conducted to close gaps and improve sequence coverage.
For samples sequenced with a combination of Sanger, 454, Illumina, and Ion Torrent, the samples were assembled, validated, and annotated as described previously with minimal modification [22]. After the sequences had been assembled using CLC Bio's clc_novo_assemble program [25], the resulting contigs were searched against custom full-length HPIV-1 or HPIV-3 nucleotide databases to find the closest reference sequence which was then used for mapping the sequence.
All sequences generated as part of this study were submitted to Genbank as part of the Bioproject id PRJNA73053 for HPIV-1 with accessions KF530195 -KF530224 and KF687307 -KF687316 and PRJNA73055 for HPIV-3 with accessions KF530225 -KF530257 and KF687317 -KF687358.

Recombinant analysis
All HPIV-1 and 3 genome sequences with collection years available in Genbank as of 3/7/2019 were downloaded and combined with the genomes produced in this study. The genomes were aligned using MAFFT [26]. Any sequence that did not contain all of the complete coding sequences (CDS) for all of the HPIV genes was deleted from the alignment. Recombination was predicted for each alignment using the RDP, GENECONV, Chimaera, MaxChi, BootScan, SiScan, and 3Seq algorithms in RDP4.8 [27]. Recombination events were only considered to be potentially real if they were predicted by at least three of the algorithms. Any potential recombinants were deleted from the alignments used for subsequent analyses. One other sequence (MH678676) was removed due to a significant number of unresolved positions and another (KY234287) for containing a large insertion that could not be verified.

Entropy plots
To assess the degree of variability of the HPIV-1 and 3 proteomes, the complete genome alignments were trimmed to contain only the protein coding sequences for the N, P, M, F, HN, and L genes and then translated into predicted protein sequences. The entropy values for each amino acid position were calculated using the Entropy (H(x)) plot function in BioEdit 7.0. The entropy data was imported into Microsoft Excel where the number of positions with variants in more than one sequence were counted, the percent of variable positions were determined for each protein, and the data was plotted.

Positive selection
The same set of sequences used for the recombinant analysis was split into the individual CDS for the N, P, M, F, HN, and L genes. All additional complete CDS with collection years were retrieved from Genbank and added to these sets of sequences. Sequences were aligned again with MAFFT. Pervasive diversifying selection was determined for each alignment using the SLAC, FEL, and FUBAR algorithms available on the Datamonkey webserver [28][29][30][31]. Additionally, episodic diversifying selection was checked with the MEME algorithm [30]. For the HN CDS of HPIV3 the number of sequences had to be reduced to less than 500 for SLAC, FEL, and FUBAR and less than 400 for MEME to run successfully. Identical nucleotide sequences and some with nucleotide substitutions that caused no amino acid changes were removed to reach these numbers. Sites were only considered positive if they met the cutoffs for two or more of the algorithms, which were a p-value of less than 0.05 for SLAC, FEL, and MEME and Bayes factor/posterior probability of greater than 0.95 for FUBAR.

Phylogenetic analyses
The whole genome sequence and complete CDS alignments were analyzed with the Bayesian method of phylogenetic inference in BEAST v1.8.2 [32].BEAUti v1.8.0 was used to generated the xml files for use in BEAST with the collection dates used to assign tip dates with variable precision. The HKY model of nucleotide substitution was used with the gamma + invariants site model with four gamma categories, strict clock model, and the piecewise-constant Bayesian Skyline tree prior with 10 groups. For the complete CDS analyses sites were partitioned into 3 codon positions. The full genome and CDS sequences were run on 100 million iterations for HPIV-1. For HPIV-3 more iterations were needed to reach appropriate effective sample size values so two to five replicate runs of 100 million were performed. Results were analyzed with Tracer v1.6 to find the evolutionary rates [33]. Summary trees were produced using TreeAnnotator v1.8.0 to infer maximum likelihood trees for the genome and full CDS. The resulting trees were visualized and annotated with FigTree v1.4.0.

Results and discussion
As part of this study, 40 HPIV-1 and 75 HPIV-3 genomes were sequenced in total. Of these, 21 HPIV-1 and 51 HPIV-3 genomes were high quality sequences resulting in complete whole genome assemblies. The 43 remaining assemblies contained at least one sequence gap, either due to low quality sequence at specific regions or insufficient sequencing data, resulting in more than one contig per genome assembly. Ten HPIV-1 and 42 HPIV-3 genomes were sequenced using Sanger, while the remaining samples were sequenced using NGS platforms. One HPIV-3 sample was sequenced completely with both Sanger and NGS methods. Information on the sequenced viruses can be found in S1 Table and S2 Table. Recombination No recombination events were predicted in the HPIV-1 genomes. In the HPIV-3 genomes there were five genomes that were predicted to be recombinant. Two of these genomes were produced in this study (KF530247 and KF530229). Upon further inspection of the reads for these samples it became clear that they were a mix of reads from more than one virus with the percentage of reads representing each virus varying greatly across the genome. A deep sequencing analysis on KF530247 and KF530229 showed the presence of a mix of major and minor variants at multiple positions across the genome with high confidence. We could not tease apart the major and minor variants from the samples and therefore, for samples KF530247 and KF530229 the consensus sequence represented just the major variant. The amount of minor variant at positions across the genome were as much as 48%. Sample KF530247 had sites with minor frequency variants above 10% at 26 positions and sample KF530229 had minor variants above 10% at 11 positions across the genome. There were many other sites in both genomes where the frequency of minor variants was less than 10%. The other three potential HPIV-3 recombinant genomes (FJ455842, MH678681, and EU326526) were published by other research groups and lacked the sequencing reads required to confirm they were true recombinants. The publishers of FJ455842 argued that this virus was an example of natural recombination in HPIV-3 [34]. However, the only evidence shown is from recombination prediction algorithms, and the putative breakpoint falls in the overlap region of two amplicons used for sequencing. In our experience recombination events like this are typically the result of two different viruses being amplified and sequenced. Appropriate evidence for the recombination event would be individual reads spanning the breakpoint showing sequence that matches one parent on each side.

Entropy plots of the HPIV-1 and 3 protein sequences
Interestingly, our analysis revealed that all of the proteins of HPIV-3 harbored more variable sites than the equivalent proteins in the HPIV-1 virus (Fig 1). The increased percentage of variable sites may at least partially be due to there being four times more sequences for HPIV-3 allowing for a larger amount of the population's variation to be observed. From these plots it was clear that the P protein was the most variable in both HPIV-1 and 3. The P protein is important for RNA transcription and genome replication. A region (78-320) of the P protein that was previously identified as not being necessary for RNA synthesis in Sendai virus (a close relative of HPIV-1) contained 66% of the variable positions in the HPIV-1 P protein, even though the 243 residues represent only 43% of the protein sequence [35]. This protein is not well conserved between HPIV-1 and 3 so it is not clear exactly how the functional regions line up position-wise, but a similar region in HPIV-3 contained 55% of the variable positions. Perhaps since this region is not critical for the primary functions of the protein it is more accommodating of mutations. Often the HN gene is selected as the target gene to analyze the evolution of HPIVs because of its relatively high variability [36,37]. Our results suggest that the P gene could also be useful as a molecular marker for evolutionary analyses and epidemiology research. The entropy analysis also highlights more conserved regions of the genome that could be useful for the development of robust diagnostics and potential vaccines.

Positively selected sites
We used the SLAC, FEL, MEME, and FUBAR algorithms on the Datamonkey webserver to identify potential positively selected sites for each of the coding regions in HPIV-1 and 3 ( Table 1). The positively selected site at codon 5 of the HPIV-1 F CDS is a part of the hydrophobic signal peptide, which is important for insertion of the premature fusion peptide into the rough endoplasmic reticulum [38]. Later in the fusion maturation process the signal peptide is cleaved from the protein. Since the signal peptide is not present in the mature fusion protein or virus and therefore should not be exposed to the host immune response it is unclear what could be the potential source of the positive/diversifying selective pressure on this site. In HPIV-3, site 108 of the F protein was found to be under positive selection. This site is part of the cleavage sequence which has historically been described as amino acids 106-109 with the motif R-T-K-R but in modern sequences is more commonly R-T-E-R (present in >95% of sequences). There are seven sequences that have a G at this position and one that has an R. It is interesting that there are positively charged (K and R), negatively charged (E), and neutral (G) amino acids at this position. R and E have previously been observed in this position and were found to have no effect on replication in tissue culture or the respiratory tract of rhesus monkeys [39]. Between HPIV-1 and 3 there were six sites predicted to be under positive selection in the P protein. All of these sites fall in the region we described above that is highly variably but has been shown to not be essential for the function of the protein in Sendai virus. The two sites under positive selection in the HPIV-3 HN protein appear to frequently switch between two functionally similar amino acids with site 58 being S or N and site 168 being K or R. Both sites require only a single nucleotide change to change between the two amino acids. Site 440 of the HPIV-3 N protein is a Q in most sequences, but in multiple independent instances has switched to an R. The MEME algorithm predicted that there were 21 codons across the genome affected by episodic selection for HPIV-1 and 14 codons for HPIV-3 (S3 Table).

Evolutionary rates
The evolutionary rates estimated for the whole genomes were 4.97 × 10 −4 substitutions/site/ year (95% highest posterior density of 4.55 × 10 −4 to 5.38 × 10 −4 ) for HPIV-1 and 3.59 × 10 −4 (95% highest posterior density of 3.26 × 10 −4 to 3.94 × 10 −4 ) for HPIV-3 (Fig 2). Evolutionary rates were not very different from CDS to CDS for either virus. The L gene had the lowest rate for each virus. The evolutionary rates were similar between the two viruses with HPIV-3 having a slightly lower rate overall despite having a slightly higher rate for four of the six CDS. Since these viruses have significantly different tissue tropism and epidemiology this could lead to different immunologic pressures and some differences in evolutionary rates between the two viruses. Previous studies have estimated evolutionary rates from whole genomes to be 7.61 × 10 −4 for HPIV-1 and 4.2 × 10 −4 for HPIV-3 and for complete HN gene sequences to be 1.12 × 10 −4 to 1.4 × 10 −3 for HPIV-1 and 8.39 × 10 −4 for HPIV-3 [40][41][42]. The published rates are generally higher than those seen in this study, which may be due to differences in the selection of viruses used for the estimate. Regardless, the published rates are similar to those seen in this study and are typical of those seen for RNA viruses [43,44]. Another recent study found a very similar rate (3.12 × 10 −4 ) for HPIV-3 when analyzing the concatenated coding regions from all genomes available in Genbank at the time [45]. Our data suggests that at least from an evolutionary rate standpoint standard vaccine approaches to HPIV-1 and 3 should be successful because the genetic and predicted antigenic change in each virus is slow enough to allow subsequent change to any vaccine with reasonable genetic monitoring of the virus population.

HPIV-1 phylogenetic analysis
The Bayesian Markov Chain Monte Carlo (MCMC) analysis of 67 HPIV-1 genome sequences identified two major HPIV-1 clades (Clades 2 and 3) circulating at the time samples were collected for this study and one clade (Clade 1) that had been circulating in the late 1990s (Fig 3). Clade 1 was very distinct with sequences from the winter of 1997 from just Wisconsin, USA, and appeared to have died off before samples were collected for our study. Clade 2 included viruses collected from 2003-2009 from Australia, France, Mexico and USA; whereas, clade 3 included viruses collected from 2009-2015, mostly from USA, with a few sequences represented from France and South Africa. Clades 2 and 3 were found to co-circulate in the USA. However, the degree of clade co-circulation in other countries was difficult to assess due to low representation of samples from other countries. However, both clades were found in France and South Africa within one or two years of each other suggesting that co-circulation may be occurring in these regions. All viruses sequenced in this study from Mexico and Australia belonged to clade 2. Since these few sequences represented all the sequences available for these two countries, it is unclear if other clades co-circulate with clade 2 in these regions. Previous studies have found these same clades co-circulating in Croatia, Japan, USA, and Vietnam [36,40,41,46,47]. Combined, the results of these studies suggest that there may have only been two main clades of HPIV-1 circulating globally during the period included in this analysis.

HPIV-3 phylogenetic analysis
Most HPIV-3 phylogenetic papers follow a classification system established in a study from China in 2012 [48] and further refined in 2015 [49]. In this system HPIV-3 is divided into three clusters (A, B, and C), cluster C is divided into several sub-clusters (1, 2, 3, etc.), and then some of the sub-clusters are further divided into a 3 rd level of classification called lineage (a, b, c, etc.). Since then some groups have split up the sub-clusters into slightly different lineages that do not completely agree between papers or added a fourth level of classification below lineage. We decided to use the more agreed upon clades mostly following what was established in Almajhdi 2015 [49]. Our MCMC analysis of 268 HPIV-3 genomes included two of the three clusters (B and C) (Fig 4).  [46,48,49]. Noteworthy, there were no cluster C representative sequences from Australia. Perhaps that indicates some geographical restriction of HPIV-3 transmission in and out of Australia. It would be interesting to see a larger sampling of HPIV-3 in this country over multiple years to further clarify this situation. In several countries multiple lineages of HPIV-3 were observed within the same country in a single season or year showing that multiple lineages can co-circulate. Co-circulation of multiple HPIV-3 lineages was also observed previously in a study in Croatia [46]. The co-circulation of multiple clades is not uncommon for viruses and has been observed in other paramyxoviruses like RSV subtypes A and B and human metapneumovirus with subtypes A1, A2, B1, and B2 [50,51].

Conclusions
When we initiated this study, there were relatively few whole genome sequences available for HPIVs in GenBank. We have now added 40 HPIV-1 and 75 HPIV-3 genomes to GenBank. In the past couple of years a few other groups have also added additional HPIV genomes sequences to GenBank. Our analysis showed that multiple genetically distinct clades of both HPIV-1 and 3 circulate on a global scale with HPIV-3 having a larger number of distinct subclades that co-circulate. It may be important to consider the multiple clades of these viruses when designing vaccines in order to confer broad cross-protection. We found that some geographically distinct clades may have existed, and some appear to have died out. We further showed that HPIV-1 and 3 are evolving at similar rates and identified genomic and proteomic sites that are under positive selection. Moreover, we did not find any clear evidence that these viruses evolve by genetic recombination and demonstrate that it is possible to misinterpret dual infections as recombination events. The whole genome HPIV sequences generated in this project will contribute to the development of robust diagnostic tools, vaccines, therapies, and a better understanding of the global and regional epidemiology and the evolution of these viruses. However, further surveillance and whole-genome sequencing is greatly needed to further understand the spatial dynamics of these important human respiratory viruses. While this and other studies have significantly increased the number of genome sequences available for HPIV-1 and 3 in GenBank, the number of genomes available for these important viruses is still very limited. Influenza A, for example, has greater than 100-fold more sequences available and it is still difficult to predict the correct vaccine strain or more prevalent subtype for any given season. Therefore, considerably more sequencing effort is needed in paramyxoviruses to further improve our knowledge of their molecular epidemiology and genetics. More studies incorporating larger numbers of sequences from other geographic regions and years are critical.