Molecular Epidemiology of Coxsackievirus A16: Intratype and Prevalent Intertype Recombination Identified

Coxsackievirus A16 (CVA16) is responsible for nearly 50% of all the conﬁrmed hand, foot, and mouth disease (HFMD) cases in mainland China, sometimes it could also cause severe complications, and even death. To clarify the genetic characteristics and the epidemic patterns of CVA16 in mainland China, comprehensive bioinfomatics analyses were performed by using 35 CVA16 whole genome sequences from 1998 to 2011, 593 complete CVA16 VP1 sequences from 1981 to 2011, and prototype strains of human enterovirus species A (EV-A). Analysis on complete VP1 sequences revealed that subgenotypes B1a and B1b were prevalent strains and have been co-circulating in many Asian countries since 2000, especially in mainland China for at least 13 years. While the prevalence of subgenotype B1c (totally 20 strains) was much limited, only found in Malaysia from 2005 to 2007 and in France in 2010. Genotype B2 only caused epidemic in Japan and Malaysia from 1981 to 2000. Both subgenotypes B1a and B1b were potential recombinant viruses containing sequences from other EV-A donors in the 5’-untranslated region and P2, P3 non-structural protein encoding regions.


Introduction
In recent years, the hand, foot and mouth disease (HFMD) has been a very common infection for children in Asian countries caused by various enteroviruses, with coxsackievirus A16 (CVA16) and enterovirus 71 (EV-A71) as the two major causative agents [1]. Since EV-A71 was more frequently associated with severe complications including aseptic meningitis, acute flaccid paralysis, brainstem encephalitis, pulmonary edema, and even death, so there were a lot of studies focusing on EV-A71 [2][3][4][5], but relatively less on CVA16. Studies on molecular epidemiology of CVA16 were reported elsewhere [6][7][8][9][10], the genetic diversity and evolutionary characteristics of this virus has yet to be explored. Previous studies have shown that CVA16 was responsible for the confirmed HFMD cases in China in recent years [11][12][13]; besides HFMD and herpangina, infection by CVA16 could also cause serious myocarditis and pericarditis, acute heart failure, and even death [14,15].
The genome of CVA16 is a single stranded, positive sense RNA, containing approximately 7,410 nucleotides with a single open-reading-frame (ORF). The viral genome contains 5'-and 3'-untranslated regions (UTRs) which are essential for viral RNA replication and expression. The 5'-UTR is about 740 nucleotides in length and comprises the internal ribosomal entry site, which is related to the replication and internal initiation of translation of the genomic RNA [16]. The ORF has 6,579 nucleotides, encoding a polyprotein of 2,193 amino acids which is composed of 3 protein precursors: P1, P2, and P3. The P1 polyprotein precursor processed into 4 structural proteins, VP1 to VP4, while P2 and P3 are precursors of the seven nonstructural proteins; 2A, 2B, 2C, 3A, 3B, 3C, and 3D, respectively [17,18].
Since CVA16 was first identified in 1951 in South Africa, it evolved slowly [19]. Complete genomic analysis of Enterovirus A (EV-A) indicated that recombination in the nonstructural region played an important role in the evolution of some EV-A strains [20]. Recombination between CVA16 and other serotypes of EV-A in the nonstructural region had also been documented previously by using partial sequences available earlier [11], however, the recombination analysis was also limited by the lack of complete genome sequences. To clarify the global molecular epidemiology and genetic characteristic of CVA16, we perform an extensive genetic analysis using all of the available genomic sequences of CVA16 dated before December 2012 in the public database, plus 8 complete genome sequences determined in our laboratory.

Genotyping based on complete VP1 sequences of CVA16
A total of 629 sequences with complete VP1 region of CVA 16 were used for analysis, included 621 retrieved from public database dated before December 2012, plus 8 complete genome sequences determined by our laboratory (Table 1,  Table S2). Phylogenetic analyses were performed with MEGA software. CVA16 strains could be divided into genotypes A, B1 and B2 as indicated in our previous study [19]. Further, genotype B1 could be divided into subgenotypes B1a, B1b, and B1c. Among 629 VP1 sequences, 607 clustered together with the reference sequence of genotype B1, and 21 showed the nearest relationship with genotype B2, while the prototype CVA16 was the unique member of genotype A ( Figure 1). The genotype B2 was first reported from Japan in 1981 (AB465366). Among the 21 B2 strains, 17 were isolated from Japan during 1981-1998, and 4 from Malaysia during 1998-2000. After 2000, there was no genotype B2 reported. The earliest B1 strains were isolated in 1995 from Japan (AB634295-AB634311). Relatively, genotype B1 was more prevalent in the epidemics and had been reported in many countries, including mainland China, Taiwan, Japan, Malaysia, Thailand, Vietnam, Australia, France, Korea, Spain, Sweden and Saudi Arabia since then.
Among 607 genotype B1 strains, there were 387, 200, and 20 strains could be divided into subgenotypes B1a, B1b, and B1c, respectively, by phylogenetic analysis (Figure 1). In more detail, B1a was the most prevalent one, after its first appearance in 1995 in Japan, it has been circulating in 11 of the 12 countries (regions) involved in this study, except Saudi Arabia. B1b was reported in mainland China, Taiwan, Japan, Malaysia, Australia, and Saudi Arabia during 1998-2011. Circulation of B1c was only detected in Malaysia from 2005-2007, and in France in 2010 ( Figure 2).

Analysis of complete genomic sequences of CVA16
For further genomic analysis, a total of 35 complete genome sequences of CVA16 were used, including one prototype, 8 from our laboratory, and 27 retrieved from public database. Among the 35 genome sequences, there were 26 sequences from mainland China, 3 from Malaysia, 2 each from Hong Kong and Thailand, and 1 each from South Korea and Taiwan (Table  1). Phylogenetically, these 35 isolates were grouped into subgenotype B1a (18) and B1b (17) based on complete VP1 region, and all of the B1a sequences were from China ( Figure  1, Figure S1a).
Compared to the CVA16 prototype (U05876), there were no deletions or insertions observed in the P1, P2, and P3 encoding regions. Nucleotide substitutions among different CVA16 strains are scattered throughout the genome. Pairwise nucleotide sequence identities are 89.4%-98.9% among the 35 isolates, 91.4%-98.5% and 93.2-98.9% within groups B1a and B1b, respectively. Pairwise nucleotide sequence identities based on each gene were also calculated (Table S4). In the structural capsid protein encoding regions, VP1-VP4, nucleotide sequence identities within subgenotype B1a and B1b are more than 91.3% and 92.2%, respectively. Comparatively, sequences in the non-structural protein encoding regions showed more diversities, especially in 3B (Table S4), the least identity in which is 81.8%. Estimations of mean genetic distance also indicated high homology in the whole genome, which were 0.049 within B1a, 0.043 within B1b ( Table 2). The mean nucleotide distance between B1a and B1b is 0.086, which is significantly larger than mean genetic distance within groups. The mean amino acid distance is much smaller, which was 0.009 within both B1a and B1b, and 0.013 between B1a and B1b ( Table 2). Both the values in sequence identity and genetic distance suggested a near phylogenetic relationship of sequences within both B1a and B1b.
Phylogenetic dendrograms based on the whole genomes, protein encoding regions, and each specific gene were performed with MEGA 5.0 software as shown in Figure 3 and Figure S1-2. The grouping of B1a and B1b in phylogenetic trees based on P1, P2, and P3 regions were different ( Figure  3), which suggested potential intratypic recombinations between B1a and B1b had occurred. This has also been confirmed by phylogenetic analyses on each specific gene ( Figure S1-2). Several sequences of subgenotype B1a clustered with B1b on VP2-VP4 ( Figure S1), and the grouping on 2A-2C, and 3A-3D were more diverse ( Figure S2).
Moreover, a potential intertypic recombination between the 35 genomic sequences of CVA16 and other serotypes of EV-A was also observed. Based on nucleotide sequences of P1 and its encoding proteins, all of the 35 sequences displayed nearest phylogenic relationship with CVA16 prototype ( Figure  3a, Figure S1). However, based on other non-capsid protein encoding regions and 5' UTR, all sequences exhibited further distance in evolutionary origin with CVA16 than other EV-A serotypes. In dendrograms based on P2 and P3, all 35 sequences clustered with a clade including the prototypes of EV-A71, CVA2, CVA3, CVA6, CVA10 and CVA12 but not CVA16 (Figures 3b, 3c). The different topology of the dendrograms between P1 and P2-P3 regions suggested a potential intertypic recombination of these circulating CVA16 strains with other EV-A serotypes in non-structural protein encoding regions. This deduction could also be supported by the nucleotide sequence identities with every prototype of EV-A. In both P1 and mature proteins VP1-VP4 encoding regions, sequence identities with CVA16 prototype were 76.5%-78.0%, 75.1%-77.1%, 75.1%-78.8%, 76.7%-79.2% and 79.2%-83.5%, respectively (Table 3), these values are consistent with the 75% rule of enterovirus typing [21]. But in P2 and P3 regions, sequences in our study showed significantly higher homology to EV-A71, CVA2, CVA3, CVA6, CVA10 and CVA12, ranging from 81.1%-84.3% and 81.6%-84.1%, than to CVA16 and other prototypes, which were 79.4%-80.3% and 76.9%-78.1%, respectively (Table 3).
Phylogenetic trees from each of mature protein encoding regions of P2 (2A-2C) and P3 (3A-3D), indicated more details of this inconsistency. Based on each specific region of P2 and P3, all the sequences in our study clustered along with the same clade in P2 and P3, except in 2A. In the later case, sequences in our study still showed nearest phylogenetic relationship with the prototype of CVA16, but indicating a low bootstrap value ( Figure S2a). These data suggested a heterogenous origin of 2B, 2C, 3A, 3B, 3C and 3D in all 35 sequences ( Figure 3, Figure S2). Calculated values of nucleotide sequence identities in the 2B, 2C, 3A, 3B, 3C and 3D regions also showed consistency with phylogenetic dendrograms. While in 2A region, the sequence identities  (Table 3), and possibly the site which intertypic recombination occurred.
To further depict the possible recombination of these strains, similarity plot and bootscanning analyses were performed by using EV-A prototype strains as reference sequences. Four strains, Tainan/5079/98/Taiwan/1998, shzh00-1/GD/CHN/ 2000, BJ/11/11/BJ/CHN/2011, and BJ/11/12/BJ/CHN/2011, were chosen as the representative strains. These represented the oldest and the latest CVA16 strains of subgenotypes B1a and B1b, respectively ( Figure 4). As indicated in the plots, these 4 sequences showed definitely the highest similarity and the nearest phylogenetic relationship with CVA16 prototype in P1 region. However, in P2 and P3 regions, all 4 strains contained some unidentified regions that showed a little higher similarity with prototypes of EV-A71, CVA2, CVA3, CVA6, CVA10 and CVA12, but apparently not CVA16 (Figures 4a, 4c,   4e, and 4g). The bootscanning showed a nearer phylogenetic relationship with EV-A71 (Figures 4b, 4d, 4f, and 4h). An obvious single crossing site in the 2A region suggested a potential recombination occurred which is consistent with deduction above.

Discussion
CVA16 was involved in a significantly high transmission among the children with HFMD and herpangina. The symptoms caused by EV-A71 and CVA16 are difficult to differentiate clinically. The co-circulation or alternating circulation of CVA16 and EV-A71 had been proven to have contributed to the outbreaks of HFMD [8,22,23]. CVA16 was responsible for about 50% of all the confirmed enterovirus infections among HFMD cases in mainland China [8,[24][25][26]. The infection by CVA16 was usually mild and benign when compared to EV- A71 infection, however, severe complications such as myocarditis and pericarditis, even death could occur [14,15].
There was lack of systematic epidemiologic analysis on CVA16 in previous studies. We studied the molecular epidemiology of CVA16 by analyzing all the complete VP1 sequences from GenBank database, and clarified the global distribution of the CVA16 genotypes. All CVA16 strains belonged to the genotype B, except the CVA16 prototype, which was the unique strain in genotype A. From 1981 to 2000, majority of the CVA16 strains classified as genotype B2 were only detected in Japan and Malaysia. The subgenotypes B1a and B1b strains were first isolated in 1995 and 1998, respectively, in Japan. These two subgentypes then became the predominant ones circulating globally since 2000. In mainland China, all the CVA16 strains isolated belonged to subgenotypes B1a and B1b. Both subgenotypes could be detected in most of the provinces. However, only single subgenotype B1b was detected in Shanghai, Shaanxi, Jilin, Fujian, Tianjin and Inner Mongolia. There has no subgenotype B1c been detected in China, so the surveillance on this new subgenotype must be continually reinforced.
In this study, high nucleotide sequence identities were observed among all 35 CVA16 complete genome sequences and in their respective encoding regions. Phylogenetic analyses also indicated that these 35 CVA16 strains  consistently clustered together in all of the analyzed regions. These result suggested that all 35 CVA16 sequences had high homogeneity with a little internal differences. This revealed that CVA16 strains might have evolved independently, steadily and originated from a same ancestor. It was determined that neutralization epitopes resided mainly on capsid proteins, especially protein VP1 [27]. In our results, the amino acid sequence identities in the P1 and VP1 regions of these 35 CVA16 strains were 99.5% and 99.6%, respectively. Further amino acid sequence comparisons revealed that the field epidemic CVA16 strains also had high sequence identities in protein P1 and detailed VP1-VP4 encoding regions, both within or between the subgenotypes B1a and B1b strains ( This study demonstrated that the current CVA16 strains were potential multiple-recombinant viruses containing other EV-A donor sequences. Recombination was frequently detected in the P2 and P3 regions while it was rare in P1 region, probably because of the structural proteins was important for the virus replication and assembling of proper viral capsid [29][30][31]. Genetic recombination sometimes could result in the virulence change of enteroviruses, which could trigger serious public health problems [32]. Because the multiple intertypic recombinations occurred in the P2 and P3 regions long time before and only few complete sequences of CVA16 are available in GenBank database, so the phylogenetic analyses are limited and hard to determine the donor strains.
These isolated CVA16 strains from HFMD outbreaks in mainland China recently were the sustained circulation of the same B1a and B1b strains from a common ancestor without clearly new genetic recombination and with a slow evolutionary rate. To elucidate further detailed genetic characteristics and evolutionary recombination events in CVA16, continually strengthen on surveillance is necessary.

Ethics Statement
This study did not involve human experimentation or human participants; the only human materials used were stool, throat swab, and vesicles samples collected from HFMD patients at the instigation of the Ministry of Health P. R. of China (PRC) for public health purposes. HFMD is a notifiable disease in China, and the pathogenic surveillance of HFMD without private information referred is required by the Law of the PRC on the Prevention and Treatment of Infectious Diseases. According to this law, these data used in this study were obtained from samples as part of this monitoring program. Thus, the requirement for written informed consent was waived by the Ethics Review Committee of the National Institute for Viral Disease Control and Prevention. This study was approved by the Ethics Review Committee of the National Institute for Viral Disease Control and Prevention. The CVA16 isolates involved in this study were isolated from the clinical samples of HFMD patients in 2010. These patients are from different geographical locations in the Tibet Autonomous Region, Zhejiang, Jiangsu and Anhui provinces of China.

Viruses
In this study, eight CVA16 strains were isolated from eight HFMD patients in mainland China in 2010 respectively. Patients were children with the age range from 2 to 14 years old. Viruses were isolated from throat swabs, feces, and vesicle fluid specimens on human rhabdomyosarcoma (RD) cell line.

Determination of the genomes
The complete genome of the 8 CVA16 isolates mentioned above were determined in our laboratory. Viral RNA was extracted using a QIAamp Viral RNA Mini Kit (Qiagen, Valencia, CA, USA). First, the viral RNA was converted to cDNA by a random priming strategy. The cDNA was amplified using the primers designed by multiple alignments of CVA16 genomes available in GenBank database (Table S1). PCR products were purified for sequencing using the QIAquick Gel extraction kit (Qiagen). Then the amplicons were bidirectionally sequenced using fluorescent dideoxy terminators and an ABI PRISM 3100 genetic analyzer (Applied Biosystems Foster City, CA, USA).

Bioinformatics and Phylogenetic analysis
Trimming of the sequencing chromatogram and sequences assembly were preformed with Sequencher program (version 4.0.5 Gene Codes Corporation, USA). Phylogenetic trees were reconstructed with MEGA program (version 5.0; Sudhir Kumar, Arizona State University, Tempe, Arizona, USA) [33], using the   neighbor-joining method and Kimura-2-parameter substitution model. (Note: Maximum likelihood method has also been used to construct the phylogenetic trees to increases the reliability, the results were more or less the same as neighbor-joining method. Data not shown.) Bootstrap testing with 1000 replicates were used to estimate the strength of the phylogenetic trees. Bootstrap values greater than 80% were considered to be a strong support for the tree topology. Completely sequenced 28 CVA16 isolates available in the GenBank, 8 isolates from our laboratory and other EV-A prototype strains were aligned and used for construction of phylogenetic trees and recombiantion analyses. The genetic distance and identity matrices were determined by group mean and pairwise estimation of the sequences divergence with MEGA program [34].

Recombination analysis
Alignment of the complete genome sequences were performed with the ClustalW package in MEGA program (version 5.0; Sudhir Kumar, Arizona State University, Tempe, Arizona, USA) [33]. Potential recombination between CVA16 and other EV-A strains was scanned by Similarity plot and bootscan method (version 3.5.1; Stuart Ray, Johns Hopkins University, Baltimore, Maryland, USA) [35]. The neighborjoining method and Kimura 2-parameter substitution model were used. A sliding window size of 500bp nucleotides moving in 20bp nucleotides at a time was used in this analysis.

Nucleotide sequence accession numbers
Complete genome sequences of the 8 CVA16 strains have been submitted to the GenBank database with the accession numbers KC755228 to KC755235. Table S1. List of nucleotide sequences of primer for amplification of the whole genome sequences of CVA16. (DOCX)