Genetic Characterization and Classification of Human and Animal Sapoviruses

Sapoviruses (SaVs) are enteric caliciviruses that have been detected in multiple mammalian species, including humans, pigs, mink, dogs, sea lions, chimpanzees, and rats, and they show a high level of genetic diversity. A SaV genome commonly encodes seven nonstructural proteins (NSs), including the RNA polymerase NS7, and two structural proteins (VP1 and VP2). We classified human and animal SaVs into 15 genogroups (G) based on available VP1 sequences, including those of three genomes newly characterized in this study. We sequenced the full-length genomes of one new genogroup V (GV), one GVII, and one GVIII porcine SaV using long-range RT-PCR with newly designed forward primers located in conserved motifs of the putative NS3, together with 5′ RACE. We also determined the 5′ and 3′ ends of sea lion GV SaV and canine GXIII SaV. Although the complete genomic sequences of GIX-GXII and GXV SaVs are unavailable, common features of SaV genomes include: 1) "GTG" at the 5′ end of the genome and a short (9-14 nt) 5′-untranslated region; and 2) conservation of the first five amino acids (M[A/V]S[K/R]P) of the putative NS1 and of the five amino acids (FEMEG) surrounding the putative cleavage site between NS7 and VP1 among chimpanzee, porcine (GV and GVIII, two of the five porcine genogroups compared), sea lion, canine, and human SaVs. In contrast, these two amino acid motifs were clearly different in three porcine genogroups (GIII, GVI, and GVII) and in bat SaVs. Our results suggest that several animal SaVs are genetically similar to human SaVs. However, whether SaVs can be transmitted between humans and animals remains uncertain.

During our investigation, new SaVs were detected in chimpanzees [8] and rats [9] by next-generation sequencing. The nearly full-length genomic sequences of the chimpanzee SaVs, excluding the 5′ and 3′ ends, have been determined, and these viruses are classified as GI based on their complete VP1 sequences. The VP1 sequences of rat SaVs have been determined, but they have not yet been classified [9]. In addition, another complete genome sequence of a GIII porcine SaV (strain CH430) has been determined [17].
The complete genome sequences available for animal SaVs remain limited compared with those for human SaVs. The aims of this study were therefore to determine additional complete genomic sequences of animal SaVs, to identify genetic characteristics common to SaVs, and to examine the genetic relatedness of human and animal SaVs. We also propose 15 genogroups for human and animal SaVs based on complete VP1 sequences.

Fecal specimens
Swine, mink, dog, and sea lion fecal samples from previous studies [4-6, 10, 18-20] were stored at -80°C and used in this study for further sequence analyses. For the swine SaVs used in this study (GVII WGP247 [KC309421]; GVIII WG194D [KC309416], WG197C [KC309417], and WG214D [KC309419]; GIX WG214C [KC309418]), an approximately 3-kb 3′-terminal fragment, covering the partial NS7 to the 3′ end of the genome, had been sequenced previously. For the mink SaV, only a partial NS7 sequence had been determined, using a pool of five fecal samples [4]. The nearly complete genomic sequences of the sea lion SaV (GV/CSL9775 [GenBank accession no. JN420370]) and the dog SaV (GXIII/AN210D [JN387134]) from fecal samples had been assembled de novo from next-generation sequencing data [5,6]. We included them in this study because comparisons with other full-length SaV genomes using the Basic Local Alignment Search Tool (BLAST; http://www.ncbi.nlm.nih.gov/BLAST) showed that the 5′ and 3′ ends of the sea lion SaV and the 5′ end of the dog SaV were missing.
RNA extraction, cDNA synthesis, and PCR
Viral RNA was extracted from 200 μL of fecal suspension using the RNeasy Mini kit (Qiagen) with slight modifications. Briefly, 200 μL of fecal suspension was mixed with 350 μL of RLT buffer and incubated for 5 min; 296 μL of ethanol was then added and mixed well. The mixture was applied to an RNeasy mini column and washed according to the manufacturer's instructions, and the RNA was eluted in 40 μL of UltraPure DNase/RNase-Free Distilled Water (Invitrogen) and used immediately or stored at -80°C.
The SaV genomic sequences spanning the putative NS3 to NS7 region, or the putative NS3 to the end of the genome, were amplified by RT-PCR with one of the newly designed forward primers (1550F, 1571F, or 1578F) and either the reverse primer TX30SXN or a strain-specific reverse primer targeting the NS7-VP1 junction region, using the high-fidelity PrimeSTAR GXL DNA polymerase (TaKaRa Mirus Bio). Each 50 μL PCR reaction contained 2 μL of cDNA or first-round PCR product, 10 μL of 5× PrimeSTAR GXL buffer, 4 μL of 2.5 mM dNTPs, 2 μL of forward primer (10 pmol/μL), 2 μL of reverse primer (10 pmol/μL), and 1 μL of PrimeSTAR GXL DNA polymerase (1.25 U/μL). PCR was performed at 98°C for 10 s, followed by 45 cycles of 98°C for 10 s, 55°C for 15 s, and 72°C for 2 min, and a final extension at 72°C for 10 min.
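As a worked check on the reaction set-up above, the final concentrations follow from C1V1 = C2V2. The sketch below assumes that "2.5 mM dNTPs" means 2.5 mM of each dNTP and that the part of the 50 μL not accounted for by the listed components is water; the variable names are illustrative only.

```python
# Final concentrations in the 50 uL long-range PCR described above.
# Assumptions (not stated in the text): "2.5 mM dNTPs" is 2.5 mM of EACH dNTP,
# and the volume not accounted for by the listed components is water.

TOTAL_UL = 50.0
TEMPLATE_UL = 2.0  # cDNA or first-round PCR product

# name: (volume added in uL, stock concentration, unit of the stock)
COMPONENTS = {
    "5x GXL buffer": (10.0, 5.0, "x"),
    "dNTPs (each)": (4.0, 2.5, "mM"),
    "forward primer": (2.0, 10.0, "uM"),  # 10 pmol/uL is 10 uM
    "reverse primer": (2.0, 10.0, "uM"),
    "GXL polymerase": (1.0, 1.25, "U/uL"),
}


def final_concentration(volume_ul, stock, total_ul=TOTAL_UL):
    """Solve C1*V1 = C2*V2 for the final concentration C2."""
    return stock * volume_ul / total_ul


water_ul = TOTAL_UL - TEMPLATE_UL - sum(v for v, _, _ in COMPONENTS.values())

for name, (vol, stock, unit) in COMPONENTS.items():
    if unit == "U/uL":
        # Enzymes are usually reported as total units per reaction.
        print(f"{name}: {stock * vol:g} U per reaction")
    else:
        print(f"{name}: {final_concentration(vol, stock):g} {unit}")
print(f"water: {water_ul:g} uL")
```

With these inputs the script reports 1× buffer, 0.2 mM of each dNTP, 0.4 μM of each primer, 1.25 U of polymerase, and 29 μL of water.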
For the new porcine GV SaV, the sequence covering the partial NS3 to the 3'-end of the genome was amplified by semi-nested RT-PCR using gene-specific forward primers and TX30SXN primer, as described above.

5′ RACE
The 5′-terminal nucleotide sequences of the SaV genomes were determined using the 5′ Rapid Amplification of cDNA Ends (RACE) System, Version 2.0 (Invitrogen), with slight modifications to the original protocol. Briefly, cDNA was synthesized as follows: 8.5 μL of viral RNA was mixed with 0.5 μL of strain-specific primer (10 pmol/μL) and 1 μL of 10 mM dNTPs. The mixture was incubated at 80°C for 3 min, cooled on ice, and then mixed with 2 μL of 10× first-strand buffer, 2 μL of 100 mM DTT, 4 μL of 25 mM MgCl2, 1 μL of RNaseOUT ribonuclease inhibitor (40 U/μL), and 1 μL of SuperScript III reverse transcriptase (200 U/μL). This mixture was incubated at 25°C for 5 min, then at 50°C for approximately 3 h, and finally at 85°C for 5 min. Afterwards, 1 μL of RNase H or RNase T1 mixture (Invitrogen) was added, and the mixture was incubated at 37°C for 30 min. The cDNA was purified using the SNAP column of the 5′ RACE System (Invitrogen) or the column of the QIAGEN PCR purification kit (Qiagen) according to the manufacturers' instructions and eluted in 50 μL of UltraPure DNase/RNase-Free Distilled Water. A homopolymeric (dC or dA) tail was then added to the purified cDNA: 10 μL of purified cDNA, 2.5 μL of 2.5 mM dATP (Promega) or dCTP (Invitrogen), 5 μL of 5× tailing buffer, and 6.5 μL of water were mixed, incubated at 94°C for 3 min, and cooled on ice. Then 1 μL of terminal deoxynucleotidyl transferase (TdT; 20 U/μL; Invitrogen) was added, and the mixture was incubated at 37°C for 10 min and then at 65°C for 10 min to inactivate the enzyme.
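The two reaction set-ups above can be checked the same way. This sketch only sums the stated volumes and applies C1V1 = C2V2; it assumes the stock concentrations given in the text and does not model the RNase treatment or the column purification between the two reactions.

```python
# Volume and concentration bookkeeping for the 5' RACE steps described above.
# Stocks are those stated in the text; the RNase treatment and column
# purification between the two reactions are not modelled.


def final_conc(stock, vol_ul, total_ul):
    """Concentration of one component after dilution into the total volume."""
    return stock * vol_ul / total_ul


# First-strand cDNA synthesis: RNA/primer/dNTP premix plus the enzyme mix.
RT_VOLUMES = {
    "viral RNA": 8.5,
    "primer, 10 uM": 0.5,
    "dNTPs, 10 mM": 1.0,
    "10x first-strand buffer": 2.0,
    "DTT, 100 mM": 2.0,
    "MgCl2, 25 mM": 4.0,
    "RNaseOUT, 40 U/uL": 1.0,
    "SuperScript III, 200 U/uL": 1.0,
}
rt_total = sum(RT_VOLUMES.values())
rt_final = {
    "buffer (x)": final_conc(10, 2.0, rt_total),
    "DTT (mM)": final_conc(100, 2.0, rt_total),
    "MgCl2 (mM)": final_conc(25, 4.0, rt_total),
    "dNTPs (mM)": final_conc(10, 1.0, rt_total),
    "primer (uM)": final_conc(10, 0.5, rt_total),
}

# Homopolymeric tailing with terminal deoxynucleotidyl transferase (TdT).
TAIL_VOLUMES = {
    "purified cDNA": 10.0,
    "dATP or dCTP, 2.5 mM": 2.5,
    "5x tailing buffer": 5.0,
    "water": 6.5,
    "TdT, 20 U/uL": 1.0,
}
tail_total = sum(TAIL_VOLUMES.values())
tail_final = {
    "buffer (x)": final_conc(5, 5.0, tail_total),
    "dATP or dCTP (mM)": final_conc(2.5, 2.5, tail_total),
}

print(f"RT: {rt_total:g} uL", {k: round(v, 3) for k, v in rt_final.items()})
print(f"Tailing: {tail_total:g} uL", {k: round(v, 3) for k, v in tail_final.items()})
```

The reverse transcription works out to a 20 μL reaction (1× buffer, 10 mM DTT, 5 mM MgCl2, 0.5 mM dNTPs, 0.25 μM primer) and the tailing step to a 25 μL reaction (1× buffer, 0.25 mM dATP or dCTP).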
Nested PCRs were performed with gene-specific reverse primers and anchor primers. For poly(C)-tailed cDNA, the abridged anchor primer AAP (5′-GGCCACGCGTCGACTAGTACGGGIIGGGIIGGGIIG-3′) was used in the primary PCR and the abridged universal primer AUAP (5′-GGCCACGCGTCGACTAGTAC-3′) (Invitrogen) in the secondary PCR; for poly(A)-tailed cDNA, QT (5′-CCAGTGAGCAGAGTGACGAGGACTCGAGCTCAAGCT₁₇-3′) and QO (5′-CCAGTGAGCAGAGTGACG-3′) were used. Both rounds used the high-fidelity PrimeSTAR GXL DNA polymerase or PrimeSTAR HS DNA polymerase (TaKaRa Mirus Bio). Each 50 μL reaction contained 5 μL of homopolymer-tailed (dC or dA) cDNA or 0.2 μL of the primary PCR reaction mixture, 10 μL of 5× PrimeSTAR buffer, 4 μL of 2.5 mM dNTPs, 2 μL of forward primer (10 pmol/μL), 2 μL of reverse primer (10 pmol/μL), and 1 μL of PrimeSTAR HS DNA polymerase (2.5 U/μL) or PrimeSTAR GXL DNA polymerase (1.25 U/μL). PCR was performed with initial denaturation at 95°C for 5 min, followed by 45 cycles of 94°C for 15 s, 60°C for 15 s, and 72°C for 2 min, and a final extension at 72°C for 10 min.
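The nested design relies on each secondary primer (AUAP, QO) matching the 5′ portion of its primary primer (AAP, QT), so that the secondary PCR anneals to adapter sequence introduced during the primary PCR. The sketch below checks this prefix relationship for the sequences given above. It treats primers as plain strings ("I" is deoxyinosine) and writes out only the 5′ adapter part of QT, omitting its oligo(dT) stretch; the helper names are hypothetical.

```python
# Nested 5' RACE anchor primers, as given in the text ("I" = deoxyinosine).
AAP = "GGCCACGCGTCGACTAGTACGGGIIGGGIIGGGIIG"  # primary PCR, poly(C)-tailed cDNA
AUAP = "GGCCACGCGTCGACTAGTAC"                  # secondary PCR, poly(C)-tailed cDNA
QO = "CCAGTGAGCAGAGTGACG"                       # secondary PCR, poly(A)-tailed cDNA
# Only the 5' adapter part of QT is written out; its oligo(dT) stretch is omitted.
QT_ADAPTER = "CCAGTGAGCAGAGTGACGAGGACTCGAGCTCAAGC"


def is_nested(outer, inner):
    """True if the inner (secondary) primer reproduces the 5' end of the outer one.

    The primary PCR copies the outer primer into every product, so an inner
    primer identical to its 5' end can anneal in the secondary PCR.
    """
    return outer.startswith(inner)


def three_prime_extension(outer, inner):
    """Part of the outer primer lying 3' of the inner-primer site, or None."""
    return outer[len(inner):] if is_nested(outer, inner) else None


print(is_nested(AAP, AUAP), three_prime_extension(AAP, AUAP))
print(is_nested(QT_ADAPTER, QO), len(AUAP), len(QO))
```

For AAP, the part extending beyond AUAP is the G/I stretch (GGGIIGGGIIGGGIIG) that anneals to the dC tail, which is why only the primary PCR needs it.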

Amplification of the 3′ ends of the SaV genomes
The 3′ end of each SaV genome was amplified by RT-PCR with a gene-specific forward primer and the reverse primer TX30SXN, using a high-fidelity PCR enzyme. Each 50 μL reaction contained 2 μL of cDNA synthesized with the TX30SXN primer, 10 μL of 5× PrimeSTAR buffer, 4 μL of 2.5 mM dNTPs, 2 μL of forward primer (10 pmol/μL), 2 μL of reverse primer (10 pmol/μL), and 1 μL of PrimeSTAR HS DNA polymerase (2.5 U/μL) or PrimeSTAR GXL DNA polymerase (1.25 U/μL). PCR was performed with initial denaturation at 95°C for 5 min, followed by 35 cycles of 98°C for 10 s, 55°C for 15 s, and 72°C for 1 min, and a final extension at 72°C for 10 min.
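The cycling programs in this section differ mainly in cycle number and extension time. A minimal sketch for totalling the programmed hold times follows; it ignores ramp times (so actual run times will be longer), and the `Step` class and `total_hold_time` function are hypothetical helpers, not part of any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def total_hold_time(initial, cycle, n_cycles, final):
    """Sum of programmed hold times in seconds; ramping is ignored."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


# 3'-end amplification program from the text.
THREE_PRIME = dict(
    initial=[Step(95, 300)],
    cycle=[Step(98, 10), Step(55, 15), Step(72, 60)],
    n_cycles=35,
    final=[Step(72, 600)],
)

# Long-range program (NS3 to NS7, or NS3 to the 3' end) given earlier.
LONG_RANGE = dict(
    initial=[Step(98, 10)],
    cycle=[Step(98, 10), Step(55, 15), Step(72, 120)],
    n_cycles=45,
    final=[Step(72, 600)],
)

for name, program in [("3' end", THREE_PRIME), ("long-range", LONG_RANGE)]:
    secs = total_hold_time(**program)
    print(f"{name}: {secs} s ({secs / 60:.1f} min) of hold time")
```

This gives 3875 s (about 64.6 min) for the 3′-end program and 7135 s (about 118.9 min) for the long-range program, most of the difference coming from the longer extension and extra cycles.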

Cloning, sequencing, phylogenetic analyses, and genogrouping
The PCR products were separated by agarose gel electrophoresis, purified using a QIAquick Gel Extraction Kit (Qiagen), and either sequenced directly or cloned into the pCR4Blunt-TOPO vector (Invitrogen) before sequencing by primer walking with a set of gene-specific primers. For cloned products, at least three positive clones per sample were sequenced. Sequencing was performed with BigDye Terminator cycle sequencing chemistry on an automated ABI Prism 3100xl sequencer (Applied Biosystems). Sequences were edited and assembled using Sequencher v4.10.1 (GeneCodes) and analyzed with Genetyx-Mac v16.0.4 (Genetyx Corporation). The Basic Local Alignment Search Tool (BLAST; http://blast.ncbi.nlm.nih.gov) was used to find homologous sequences. Amino acid sequences were aligned using ClustalW version 2.1 (http://clustalw.ddbj.nig.ac.jp/top-j.html). Maximum-likelihood phylogenetic trees with 1,000 bootstrap replicates were constructed, and pairwise amino acid sequence distances were calculated, using MEGA6 software [22]. Amino acid identity was calculated as 1 - distance.
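Identity was derived from MEGA's pairwise distances as 1 - distance. The section does not state which distance model was used, so the sketch below assumes the simple p-distance (proportion of differing sites) with gap-containing sites excluded pair by pair; the aligned fragments are hypothetical.

```python
def p_distance(a, b, gap="-"):
    """Proportion of differing residues between two aligned sequences.

    Sites with a gap in either sequence are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no site without gaps to compare")
    return differing / compared


def identity(a, b):
    """Amino acid identity as 1 - distance, as in the text."""
    return 1.0 - p_distance(a, b)


# Hypothetical aligned fragments, for illustration only.
print(round(identity("MKTAYIAKQR", "MKSAY-AKQK"), 3))
```

Here one gapped site is skipped and 2 of the remaining 9 sites differ, so identity is 7/9, printed as 0.778.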

Results and Discussion
Forward primers targeting the putative NS3 region were designed to amplify SaVs from different animal species
Based on full-length genomic sequence alignments of the seven animal SaVs, we found that the putative NS3 region was suitable for PCR primer design. We designed three forward primers, 1550F, 1571F, and 1578F (see Materials and Methods). Primers 1550F and 1571F targeted the PL[N/D]CD amino acid motif, which was conserved among GIII, GVI, GVII, and GXIV SaVs, and primer 1578F targeted the WDEFD amino acid motif of GXIV SaVs; the corresponding motif in GVI and GVII SaVs (WDEYD) differed slightly. These motifs were located downstream of the typical GXPGXGKT motif of the putative NS3 (Fig 1).
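Degenerate amino acid motifs like these can be located in a translated polyprotein with regular expressions. The sketch below encodes "X" as any residue and "[N/D]" or "[F/Y]" as a character class; the fragment is hypothetical, and actual primer design would then proceed from the underlying nucleotide sequence.

```python
import re

# Motifs named in the text, written as regular expressions.
MOTIFS = {
    "NS3 GXPGXGKT": "G.PG.GKT",
    "PL[N/D]CD (1550F, 1571F)": "PL[ND]CD",
    "WDE[F/Y]D (1578F)": "WDE[FY]D",
}


def find_motifs(protein, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for each motif found."""
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead also reports overlapping matches.
        starts = [m.start() for m in re.finditer(f"(?={pattern})", protein)]
        if starts:
            hits[name] = starts
    return hits


# Hypothetical polyprotein fragment, for illustration only.
print(find_motifs("AAGSPGIGKTLLAEEPLNCDRRWDEFDKK"))
```

Ordering the hits by position also shows whether the primer-target motifs lie downstream of GXPGXGKT, as described above.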
Successful amplification of GV, GVII, GVIII, and GXII SaVs, but not GIX SaVs, using the newly designed forward primers
We amplified sequence fragments of two porcine GVII (RV0042 and WGP247), three porcine GVIII (WG214D, WG194D, and WG197C), and one mink GXII (WD1237) SaVs using the newly designed forward primers 1550F, 1571F, or 1578F and either strain-specific reverse primers or the modified oligo(dT) primer TX30SXN. The strain-specific reverse primer for WD1237 was designed based on the partial sequence obtained by next-generation sequencing (NGS), as described previously.
A long PCR product (approximately 6 kb) was amplified from SaV strain RV0042 using the primer set 1571F and TX30SXN. Fragments of approximately 3.5 kb were amplified from the other samples using the three forward primers and the corresponding strain-specific reverse primers targeting the NS7-VP1 junction region.
We could not amplify the fragment from the GIX SaV (WG214C) using these forward primers. In most successful cases, only one of the three forward primers (1550F, 1571F, or 1578F) amplified a given SaV strain; the exceptions were the GVIII samples WG194D and WG197C. Two different SaVs, GVIII (WG194D) and GV (WG194D-1), were amplified from the same fecal sample, WG194D, using the 1550F and 1578F primers, respectively. Identical sequences were amplified from strain WG197C using the 1550F and 1571F forward primers.
The 5′ ends of the newly determined GV, GVII, GVIII, and GXIII SaVs started with "GTG", as in other SaV strains (Table 1). Their 5′-untranslated regions were 9 to 14 nt long, similar in length to those of other SaVs (Table 1).
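Both 5′-end features can be checked programmatically. This sketch assumes that the NS1 start codon is the first ATG in the sequence, which is only a simplification; the example 5′ end is hypothetical, with its first five codons chosen to encode MASKP, matching the M[A/V]S[K/R]P pattern.

```python
def five_prime_features(genome):
    """Check the 5'-end features described in the text.

    Accepts RNA or DNA letters. Assumption: the NS1 start codon is the first
    ATG, so the 5'-UTR length equals its 0-based index.
    """
    seq = genome.upper().replace("U", "T")
    start = seq.find("ATG")
    return {
        "starts_with_GTG": seq.startswith("GTG"),
        "utr_length": start if start >= 0 else None,
        "utr_in_9_to_14": start >= 0 and 9 <= start <= 14,
    }


# Hypothetical 5' end: a 10-nt UTR, then ATG GCT TCC AAG CCT (M A S K P).
print(five_prime_features("GTGACCGCACATGGCTTCCAAGCCT"))
```

For this example the function reports a GTG start and a 10-nt 5′-UTR, within the 9-14 nt range.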

Genetic comparisons among animal and human SaVs
The two porcine SaVs (WG194D-1 and RV0042) and the mink SaV (WD1237) clustered with GV, GVII, and GXII SaV strains, respectively, based on phylogenetic analysis of the complete VP1 amino acid sequences (Fig 3).
The SaVs detected in rats formed two distinct clusters (the NYC-A19 and E48 cluster, and the NYC-A1 and B2 cluster) based on VP1 amino acid sequence similarity (Fig 3), as reported recently [9]. We propose NYC-A19 and E48 as a new genogroup, GXV, because they shared only 29.4-41.8% aa identity with other SaV genogroups. We assign the NYC-A1 and B2 strains to GII, because they shared 59.6-61.1% aa identity with human GII SaV strains. Compared with other genogroups, GV and GVII SaV strains showed lower intra-genogroup aa identity (≥57.5% and ≥57.3%, respectively). We therefore adjusted the previously proposed genogrouping cut-off of 60% VP1 aa identity [10] to a slightly lower value (57% identity, or 43% distance), based on phylogenetic and amino acid sequence identity analyses of the newly available SaV sequence data (Figs 3 and 4).

Fig 2. Phylogenetic tree of the full-length genomic sequences of 34 sapovirus strains, constructed using MEGA 6. The five strains whose complete genomes were determined in this study are boxed with dotted lines; the two SaV strains newly identified in this study are also indicated with arrows. The number on each branch indicates the bootstrap value. The scale represents amino acid substitutions per site. Each strain is labeled as GenBank accession number-strain name (species).

Fig 3. Phylogenetic tree of the complete VP1 amino acid sequences of 74 sapovirus strains, constructed using MEGA 6. These strains represent all 14 reported genogroups (GI-GXIV) and the newly reported rat SaV strains. The 10 animal SaV strains with additional sequences determined in this study are boxed with dotted lines; the three SaV strains newly identified in this study are also indicated with arrows. The number on each branch indicates the bootstrap value. The scale represents amino acid substitutions per site. Each strain is labeled as GenBank accession number-strain name (species).
doi:10.1371/journal.pone.0156373.g003

From these observations, we can infer that some animal SaVs (porcine GIII, GVI, and GVII, and bat GXIV) are more distantly related to human SaVs than others (porcine GV and GVIII, sea lion GV, and dog GXIII).
We could compare these amino acid sequence characteristics only partially for GIX and GXII SaVs: for these genogroups, only the putative cleavage sites between NS7 and VP1 (YVMEG and FEMEG, respectively) are available. A similar analysis was not possible for porcine GX and GXI SaVs or for rat GII and GXV SaVs, because the corresponding amino acid sequences are not available (Table 1).

Fig 4. The arrow indicates the genogrouping cut-off (57% aa identity) used in this study.
doi:10.1371/journal.pone.0156373.g004

Fig 5. Phylogenetic tree of partial NS7 amino acid sequences of 44 sapovirus strains, constructed using MEGA 6. These 44 strains represent 12 genogroups (GI-GV, GVI, GVII, GVIII, GIX, GXII, GXIII, and GXIV). The amino acid sequences from the "WKGL" sequence to the end of the putative NS7 ("XXME") were used. Only 44 of the 74 SaV strains in the VP1 tree (Fig 3) have sequences for this region. The 10 strains analyzed in this study are boxed with dotted lines. Among these, the eight SaV strains whose corresponding sequences were determined in this

For the GV, GVII, GVIII, GIX, GXII, and GXIII SaVs sequenced in this study (boxed with dotted lines in Figs 3 and 5), genogroup assignments based on NS7 and on VP1 agreed, except for the sea lion GV/CSL9775 and porcine GVII/WGP247 strains. The sea lion GV/CSL9775 strain clustered with other human and porcine GV SaV strains in the VP1 region but was separated from the other GV strains in the NS7 region, as recently discussed [1]. Similarly, the GVII/WGP247 strain clustered with other GVII strains in the VP1 region but was closer to porcine GIX/WG214C in the NS7 region. The NS7 sequences of porcine GX and GXI SaVs and of rat GII and GXV SaVs are not yet available (Table 1).
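The 57% VP1 amino acid identity cut-off can be expressed as a simple decision rule. The sketch below is a simplification of the approach described in this section, which combined phylogenetic trees with identity analysis: it assigns a query strain to the genogroup of its most similar reference strain when that identity reaches the cut-off, and otherwise flags it as a candidate new genogroup (as with the rat strains sharing at most 41.8% identity). The function name and identity values are hypothetical.

```python
CUTOFF = 0.57  # VP1 aa identity cut-off (43% distance) adopted in the text


def assign_genogroup(identities, cutoff=CUTOFF):
    """Assign a query strain using its best-matching reference strain.

    identities maps (genogroup, reference strain) -> VP1 aa identity (0-1)
    between the query and that reference. Returns (genogroup, best identity);
    genogroup is None when no reference reaches the cut-off, marking the query
    as a candidate new genogroup.
    """
    if not identities:
        raise ValueError("no reference identities given")
    (group, _strain), best = max(identities.items(), key=lambda item: item[1])
    return (group if best >= cutoff else None), best


# Hypothetical identity values, for illustration only.
query_a = {("GI", "ref-1"): 0.52, ("GII", "ref-2"): 0.61}
query_b = {("GI", "ref-1"): 0.38, ("GII", "ref-2"): 0.41}
print(assign_genogroup(query_a))
print(assign_genogroup(query_b))
```

The first query is assigned to GII; the second reaches at most 41% identity and is therefore flagged as a candidate new genogroup. A nearest-reference rule can disagree with tree topology near the cut-off, which is why the identity threshold was set with the phylogenies in view.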

Conclusions
Using a long RT-PCR strategy with newly designed forward primers targeting a conserved region of the putative NS3, together with 5′ RACE, we identified and/or determined additional complete genomic sequences of GV, GVII, and GVIII SaVs. We also characterized the genomic ends of sea lion GV and canine GXIII SaVs. Further determination of animal SaV genome sequences, including those of SaVs whose complete genomes have not yet been determined (porcine GIX, GX, and GXI; mink GXII; and rat GII and GXV), may enable the design of more universal primers for detecting SaVs in both animals and humans. In future studies, it will also be interesting to evaluate, using experimental animals, the potential for interspecies transmission of closely related animal and human SaVs, such as GI, GII, and GV SaVs.