Genomic Sequencing and Analysis of Sucra jujuba Nucleopolyhedrovirus

The complete nucleotide sequence of Sucra jujuba nucleopolyhedrovirus (SujuNPV) was determined by 454 pyrosequencing. The SujuNPV genome was 135,952 bp in length with an A+T content of 61.34%. It contained 131 putative open reading frames (ORFs) covering 87.9% of the genome. Among these ORFs, 37 were conserved in all baculovirus genomes that have been completely sequenced, 24 were conserved in lepidopteran baculoviruses, 65 were found in other baculoviruses, and 5 were unique to the SujuNPV genome. Seven homologous regions (hrs) were identified in the SujuNPV genome. SujuNPV contained several genes present in multiple copies: two copies each of helicase, the DNA binding protein gene (dbp), p26 and cg30, three copies of the inhibitor of apoptosis gene (iap), and four copies of the baculovirus repeated ORF (bro). Phylogenetic analysis suggested that SujuNPV belongs to a subclade of group II alphabaculoviruses, which differs from other baculoviruses in that all nine members of this subclade contain a second copy of dbp.


Introduction
Baculoviruses are rod-shaped, insect-specific viruses with circular, double-stranded DNA genomes of 80-180 kb [1]. Baculoviruses have been widely used as bio-pesticides to control insect pests in agriculture and forestry [2], as vectors for protein expression, and as potential vectors for gene therapy [3,4]. The family Baculoviridae was formerly divided into two genera, Nucleopolyhedroviruses (NPVs) and Granuloviruses (GVs), according to the morphology of their occlusion bodies (OBs) [5]. More recently, a new classification has subdivided the Baculoviridae into four genera, based on phylogeny and host specificity: Alphabaculovirus (lepidopteran-specific NPVs), Betabaculovirus (lepidopteran-specific GVs), Gammabaculovirus (hymenopteran-specific NPVs) and Deltabaculovirus (dipteran-specific NPVs) [6]. Alphabaculoviruses can be further divided into group I and group II based on phylogenetic analyses. The NPVs are also characterized as single-nucleocapsid NPVs (SNPVs) and multiple-nucleocapsid NPVs (MNPVs) according to the number of nucleocapsids per virion. To date, 62 baculovirus reference genomes are available in the National Center for Biotechnology Information (NCBI) database; 42 of them are alphabaculoviruses, 15 betabaculoviruses, three gammabaculoviruses, one deltabaculovirus and one unclassified baculovirus.
In the present study, the genome of SujuNPV was completely sequenced and annotated, and compared with those of other representative baculoviruses. The results indicate that SujuNPV is a novel species belonging to a unique subclade of group II alphabaculoviruses, whose members contain a second copy of the DNA binding protein gene (dbp).

DNA extraction of the viral genome
SujuNPV was purified from dead Sucra jujuba preserved in the "Chinese General Virus Collection Center" (CGVCC) under collection number IVCAS 1.0048; the virus was originally isolated in Shandong Province, China, in 1983 [12]. The occlusion-derived virions (ODVs) were purified as previously reported [13]. To extract DNA, the ODVs were incubated with four volumes of 1 M DAS (5 M NaCl, 5 M Na2CO3 and 0.5 M EDTA (pH 8), mixed in the ratio of 3:3:0.6) at 37°C for 30 min. An equal volume of 1 M Tris (pH 7.4) was then added, followed by centrifugation at 10,000 rpm for 5 min to obtain the viral DNA.

Sequencing and sequence analysis of the SujuNPV genome
The SujuNPV genome sequence was determined by 454 pyrosequencing. A total of 92,684 reads were obtained and assembled into 10 contigs using GS De Novo Assembler software, covering 97.8% of the whole genome with a sequencing depth of 225x. The remaining gaps were filled using PCR and Sanger sequencing.
Briefly, the genome was randomly fragmented into pieces of about 600-900 bp by nebulization, and adapters were added to construct a genomic library. The library was then amplified by emulsion PCR (emPCR) before sequencing. Reads were assembled with the GS De Novo Assembler supplied with the 454 software. Gaps and ambiguous sequences were further verified using sequence-specific primers. The hypothetical ORFs of the SujuNPV genome were predicted by fgenesV0 (http://www.softberry.com/berry.phtml) [14], using the criteria of a size of at least 50 aa and minimal overlap with other ORFs. Predicted aa sequences were compared with homologues from typical baculoviruses of the four genera, including AcMNPV (NC_001623), HearNPV-G4 (NC_002654), CpGV (NC_002816), NeleNPV (NC_005906) and CuniNPV (NC_003084), and similarities were calculated with DNAStar software using default parameters. Gene parity plots were generated to analyze the gene order of SujuNPV relative to three closely related baculoviruses (ApciNPV, EcobNPV and OrleNPV) and the five representative viruses mentioned above.
Consensus promoter motifs were searched for in the 150 bp region upstream of the start codon of each ORF, based on the known characteristics of baculovirus promoters: an early motif consisting of a TATA box with a CAKT motif 20-40 bp downstream, and a late motif consisting of a DTAAG box.
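The size criterion used for ORF prediction (at least 50 aa) can be illustrated with a minimal six-frame scan. This is only a sketch and is not the fgenesV0 algorithm: it assumes ORFs start at ATG and end at the first in-frame stop codon, treats the genome as linear rather than circular, and does not resolve overlaps. The function names `revcomp` and `find_orfs` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Scan all six frames for ATG...stop ORFs of at least min_aa codons.

    Returns sorted (start, end, strand) tuples using 0-based, half-open
    coordinates on the forward strand; the stop codon is included in the
    interval but not counted in the length.
    """
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_aa:
                        lo, hi = start, i + 3
                        if strand == "-":
                            # map back to forward-strand coordinates
                            lo, hi = n - hi, n - lo
                        orfs.append((lo, hi, strand))
                    start = None
    return sorted(orfs)
```

A real annotation would additionally handle ORFs spanning the origin of the circular genome and would apply the minimal-overlap rule when choosing between competing candidates.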
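The motif search described above can be sketched with regular expressions. The sketch rests on several assumptions not stated in the text: the TATA box is taken as the literal string TATA, the 20-40 bp distance is measured from the end of TATA to the start of CAKT, and only the given strand of the upstream region is searched. K and D are the IUPAC codes for G/T and A/G/T. The function name `classify_promoter` is hypothetical.

```python
import re

# Early motif: TATA followed, 20-40 nt later, by CAKT (K = G or T).
EARLY_MOTIF = re.compile(r"TATA(?=[ACGT]{20,40}CA[GT]T)")
# Late motif: DTAAG (D = A, G or T).
LATE_MOTIF = re.compile(r"[AGT]TAAG")


def classify_promoter(upstream, window=150):
    """Classify the region upstream of a start codon.

    `upstream` is the sequence immediately 5' of the start codon, written
    5'->3' on the coding strand; only its last `window` bases are used.
    Returns "early", "late", "both" or "none".
    """
    region = upstream.upper()[-window:]
    has_early = EARLY_MOTIF.search(region) is not None
    has_late = LATE_MOTIF.search(region) is not None
    if has_early and has_late:
        return "both"
    if has_early:
        return "early"
    if has_late:
        return "late"
    return "none"
```

The four mutually exclusive return values correspond to the four categories into which ORFs are sorted when promoter motifs are tallied.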

Phylogenetic analysis
Phylogenetic analysis of baculoviruses was performed using the concatenated aa sequences of 37 core genes [15] from 62 baculovirus reference genomes (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10442, data updated until Jan. 5th, 2014). The sequences were aligned by ClustalW with the default parameters of MEGA5. The maximum likelihood (ML) phylogenetic tree was reconstructed as previously reported [16], with 1000 bootstrap replicates. The phylogenetic trees of dbp, helicase and p26 were constructed using the same parameters.
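The concatenation step can be sketched as follows. This is an illustrative helper, not the procedure used in MEGA5: it assumes each gene has already been aligned separately and is supplied as a dictionary mapping taxon name to aligned sequence, and it pads any taxon missing from a gene with gaps (which should not occur for true core genes). The function name `concatenate_alignments` and the toy sequences in the test are hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments end to end into one supermatrix.

    gene_alignments: list of dicts, each mapping taxon -> aligned sequence.
    Returns a dict mapping taxon -> concatenated aligned sequence.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            # Pad with gaps if this taxon lacks the gene.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

Keeping the genes in the same order for every taxon is what makes each column of the concatenated alignment comparable across genomes.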

Prediction of secondary structure
The secondary structures of DNA sequences were predicted by the Mfold Web Server using default parameters [17].

Table 1. SujuNPV Genome Annotation.

Characteristics of the SujuNPV genome sequence
The full SujuNPV genome [GenBank: KJ676450] was 135,952 bp in length with an A+T content of 61.34%. Following convention, the adenine of the start methionine codon of the polyhedrin gene (ph) was chosen as the zero point of the SujuNPV genome and ph was designated as the first ORF. Overall, 131 putative ORFs were detected in the SujuNPV genome using the criteria of a length of at least 50 amino acids (aa) and minimal overlap with adjacent ORFs. Together the ORFs covered 89.2% of the genome, with 60 ORFs in the forward orientation and 71 ORFs in the reverse orientation. In addition, seven homologous regions (hrs) were identified in SujuNPV (Fig. 1).
BLAST comparisons of the 131 deduced protein sequences of SujuNPV with homologous sequences from other baculoviruses revealed that SujuNPV has 37 core genes (shown in red in Fig. 1) and 24 other genes conserved in lepidopteran baculoviruses (shown in blue in Fig. 1). It also contains 65 additional genes commonly found in various baculoviruses (shown in grey in Fig. 1) and five unique genes (shown as open arrows in Fig. 1). Consensus promoter motifs were searched for in the 150 bp region upstream of the start codon of each ORF. Among the 131 ORFs identified in the SujuNPV genome, 24 ORFs possessed only the early promoter motif (a TATA box with a CAKT motif 20-40 bp downstream), 61 ORFs had only the late promoter motif DTAAG, and 10 ORFs contained both the early and late promoter motifs (Table 1). No obvious baculoviral promoter motifs were detected for the remaining 36 ORFs.
Gene-parity plots of SujuNPV against three viruses in the same subclade and the five representative baculoviruses are shown in Fig. 3. The gene order of SujuNPV was highly collinear with that of ApciNPV, EcobNPV and OrleNPV, with some inversions and drifts. The plots of SujuNPV against representative lepidopteran baculoviruses (AcMNPV, HearNPV and CpGV) showed that SujuNPV is largely collinear with AcMNPV and HearNPV and less collinear with CpGV, but all three comparisons share a collinear region from Suju60 to Suju86, which contains 20 core genes and five additional genes conserved in lepidopteran baculoviruses. This region has been suggested to have existed in the ancestor of lepidopteran baculoviruses [27]. No obvious collinear region could be found between SujuNPV and NeleNPV or CuniNPV (Fig. 3).
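The two summary statistics quoted here, A+T content and the fraction of the genome covered by ORFs, can be computed as in the sketch below. It assumes that overlapping ORFs are counted only once when computing coverage (the text does not say how overlaps were treated) and that ORF coordinates are 0-based, half-open intervals on a linear representation of the genome. The function names are hypothetical.

```python
def at_content(seq):
    """Percentage of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def coverage_percent(intervals, genome_length):
    """Percentage of the genome covered by the union of the intervals.

    intervals: iterable of (start, end) pairs, 0-based and half-open.
    Overlapping or nested intervals are counted once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / genome_length
```

For example, intervals (0, 10), (5, 15) and (20, 30) on a 40 bp sequence cover 25 bp, i.e. 62.5%.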
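The coordinates underlying a gene-parity plot can be sketched as below, assuming each genome is represented by a list of gene names in genome order and that homologues carry the same name in both lists. The function name `gene_parity_points` and the gene orders in the usage example are hypothetical; duplicated genes would need distinct names (e.g. dbp-1, dbp-2), since a repeated name in the second genome resolves to its last position.

```python
def gene_parity_points(order_a, order_b):
    """Return (position in A, position in B) for every gene shared by A and B.

    order_a, order_b: lists of gene names in genome order.
    Positions are 1-based ORF indices.
    """
    position_in_b = {gene: j for j, gene in enumerate(order_b, start=1)}
    return [
        (i, position_in_b[gene])
        for i, gene in enumerate(order_a, start=1)
        if gene in position_in_b
    ]


# Toy example: an inversion of two neighbouring genes shows up as
# points that leave the diagonal.
points = gene_parity_points(
    ["polh", "lef-8", "helicase", "p74"],
    ["polh", "helicase", "lef-8", "egt"],
)
# points == [(1, 1), (2, 3), (3, 2)]
```

Plotting these points as a scatter gives a diagonal for collinear regions, an anti-diagonal run for inversions, and scattered points where gene order is not conserved.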

Homologous regions
Homologous regions (hrs) are common elements in many baculoviruses, with characteristically high A+T contents, tandem repeats and imperfect palindromes. Hrs vary in location within genomes, number of copies and nucleotide sequences between different baculoviruses. These regions are suggested to act as replication origins and transcription enhancers [28,29].
The SujuNPV genome contains seven homologous regions, covering 3.7% of the genome, as displayed in Fig. 4A. The hrs range in length from 590 bp to 971 bp, and each hr consists of four to eight palindromic repeats of 99 bp (Fig. 4A and 4B). Fig. 4B shows the arrangement of palindromic repeats in each homologous region. These palindromic repeats share at least 97.6% identity. The predicted secondary structure of hr1-3 revealed that it contains a core palindrome region, colored orange in Fig. 4C, which is highly conserved in all counterparts, with about 99.5% identity on average. The other two loops were less conserved, in both size and sequence.

DNA replication genes
Five core genes, five additional lepidopteran baculovirus conserved genes and eight other common genes involved in DNA replication were found in the SujuNPV genome (Table 2) [30-32]. Among these genes, helicase unwinds DNA [32]; dna polymerase is involved in DNA synthesis; late expression factor gene 3 (lef-3) and the DNA binding protein gene (dbp) are involved in single-strand binding [33,34]; and dna-ligase functions in ligation and alkaline exonuclease (alk-exo) in rectification [35,36]. These genes, together with some other stimulatory factors, are required for replication.

Table 2. Classification of gene function. Functional classification of the genes in the SujuNPV genome; columns indicate classification by function and rows represent conservation. Genes in the SujuNPV genome were arranged according to their function and conservation, in alphabetical order.

Some common genes involved in baculovirus replication were not present in SujuNPV. For example, lef-7, which has been shown to be a replication enhancer in baculoviruses [37], was absent from SujuNPV. SujuNPV also lacked certain genes associated with nucleotide biosynthesis, such as the ribonucleotide reductase subunits (rr1, rr2) and dUTPase, which are involved in dTTP biosynthesis [38].
Among the DNA replication genes, helicase and dbp are each present in two copies in the SujuNPV genome. A full-length helicase (Suju79, 1242 aa) is a core gene found in all sequenced baculoviruses, whilst a second, truncated copy of helicase (helicase-2) (Suju57, 451 aa) is present in only six alphabaculoviruses (HearMNPV, LdMNPV, LyxyMNPV, MacoNPV-B, OrleNPV and SpliNPV) and 13 GVs (all sequenced GVs except for ClanGV and CaLGV) [39]. The phylogenetic tree of helicase homologues showed that they can be clearly divided into two groups (Fig. 5A). It is very likely that they were acquired from different sources during evolution. Research on AcMNPV helicase has shown that it belongs to the Superfamily 1 helicases, which contain seven conserved motifs [40,41]. Motifs I and II are two NTP-binding motifs, which act together with another four motifs to fulfill the function of helicase [42,43]. Alignment of the conserved motifs with those of AcMNPV helicase and E. coli UvrD (a representative Superfamily 1 helicase) reveals that they share the same motifs (Fig. 5B) and that helicase-2 is seemingly more conserved. It appears that the two copies have a common ancestor, but understanding how they evolved and came to balance their specialization and cooperation within one genome requires further research.
SujuNPV is the ninth baculovirus identified as having two copies of dbp; the other eight are ApciNPV, ClbiNPV, EcobNPV, EupsNPV, HespNPV, LdMNPV, LyxyMNPV and OrleNPV. Interestingly, all these viruses belong to the same subclade (Fig. 2). Dbp is a conserved gene in lepidopteran baculoviruses. Phylogenetic analysis indicated that the dbp duplicates of these nine baculoviruses may have evolved separately from the conserved dbp in alphabaculoviruses (Fig. 6). We propose to name the alphabaculovirus-conserved dbp gene dbp-1, and the second copy dbp-2.
Dbp-2 appears to be closer to the dbp of betabaculoviruses. In SujuNPV, dbp-1 (Suju30) and dbp-2 (Suju13) encode proteins of 309 aa and 310 aa, respectively, with 25% aa identity. Although the significance of SujuNPV and other baculoviruses carrying two copies of dbp is unclear, it clearly marks out the subclade of these nine group II alphabaculoviruses.
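A percent-identity figure such as the one quoted for dbp-1 and dbp-2 depends on how the two sequences are aligned and on which columns are counted. The sketch below shows one common convention and is not necessarily the definition used by DNAStar: it assumes the two sequences are already aligned with "-" as the gap character and divides the number of identical columns by the number of columns in which neither sequence has a gap. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # columns containing a gap are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Using the full alignment length or the length of the shorter sequence as the denominator instead would give lower values for the same alignment, so identity figures are only comparable when computed with the same convention.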

Transcriptional genes
In the baculovirus life cycle, genes are transcribed in cascades by different polymerases. Early genes are transcribed by host RNA polymerase II, while genes expressed during the late period of the life cycle are transcribed by the virus-encoded RNA polymerase, comprising the products of four core genes: LEF-4, LEF-8, LEF-9 and P47 [44]. Two other core genes are involved in late-phase transcription: lef-5 and very late factor 1 (vlf-1), which act as an initiation factor [45] and as a regulatory factor participating in the hyper-expression of very late genes [46], respectively. These core genes, in addition to genes such as 39k, lef-6, lef-10 and lef-12, are required for late transcription [47]. All of these genes are present in SujuNPV except for lef-10 (Table 2); among all the other alphabaculoviruses, this gene is absent only from ClbiNPV and OrleNPV.

Per os infectivity factors
So far, seven genes have been identified as per os infectivity factors (PIFs), including p74, pif1, pif2, pif3, pif4 (odv-e28), pif5 (odv-e56) and pif6, which are essential for the oral infection of insect larvae [51-54]. PIF-1, PIF-2 and PIF-3, in association with P74, form a conserved complex on the surface of ODVs and were proposed to perform an essential function in the early stages of virus infection [51]. PIF-4 is an envelope-associated protein found in both ODV and BV [52], whereas PIF-5 and PIF-6 have recently been demonstrated to be PIF members [53,54]. All seven genes are conserved in the SujuNPV genome and share 44%-61.2% identity with their homologues in the group II representative baculovirus HearNPV.

Auxiliary genes
Auxiliary genes are those that are not essential for replication, transcription or structure, but that give the virus greater adaptive ability [55], for example by altering the host's cellular metabolism to favor infection or by increasing the yield of progeny virus. Examples are fibroblast growth factor (fgf) and gp37, which are proposed to help spread virions from the primary infection site [56,57]; egt, which promotes viral progeny production by delaying larval molting [58]; and cathepsin and chitinase, which aid the horizontal spread of viruses [59]. Superoxide dismutase (sod) has been suggested to mitigate the effects of free radicals in infected hemocytes [60], and ubiquitin is proposed to protect viral proteins from degradation by the host [61]. None of these auxiliary genes is a core gene, and only fgf is a lepidopteran-conserved gene. SujuNPV was found to contain all the genes above (Table 2).
Baculovirus repeated ORFs (bros) are repetitive genes that are widespread in baculoviruses and in some other insect DNA viruses [64]. Research on BmNPV showed that bros possess DNA-binding activity that could influence host DNA replication and transcription [65]. Four bro genes were identified in SujuNPV and named bro-1 to bro-4, based on their order of appearance in the genome (Fig. 1). SujuNPV bro-3 shared 36%, 38.6%, 35.2% and 13.3% aa sequence identity with its homologues in AcMNPV, OpMNPV, LdMNPV and HearNPV-G4, respectively. The other three bros shared only a C-terminal region with Ld-bro-m, Ld-bro-p and Ld-bro-n.

Unknown genes
SujuNPV contained an additional eight lepidoptera-conserved genes and 37 common genes with unknown functions (Table 2). P26 is an alphabaculovirus-specific gene. Among the 42 alphabaculoviruses previously sequenced, 19 contained a second copy of p26 and 16 of these belonged to group II. SujuNPV also contains two copies of p26, Suju10 (p26-1, 285 aa) and Suju56 (p26-2, 239 aa), which share 13.8% similarity. We name the copy conserved in alphabaculoviruses p26-1, and the second copy p26-2. Phylogenetic analysis of p26 showed that the second copies of p26 could be classified into a unique subclade (colored pink in Fig. 8), with the exception of those of three group I baculoviruses (CfMNPV, ChocNPV and ChroNPV). Interestingly, all the group II baculoviruses except LeseNPV contain a conserved gene cluster consisting of p10, p26, ac29, lef-6 and dbp (dbp-2 in the nine dbp-duplicated baculoviruses), in that order. Although the significance of this gene cluster is unknown, it provides additional information for evolutionary analysis.

Unique genes
Five genes are unique to the SujuNPV genome: Suju5 (353 aa), Suju14 (83 aa), Suju25 (192 aa), Suju53 (220 aa) and Suju106 (174 aa), which were not included in Table 2. Suju5 has a similar location and length to ORF5 of Buzura suppressaria SNPV (BusuNPV), with 14.8% aa identity, indicating that they may have a similar function [27]. Suju25 has an early promoter, and a BLAST search showed it to have a slight similarity to the ATP-binding protein of Lysinibacillus sphaericus C3-41, with an E-value of 0.89. No homologues were found in GenBank for the other three ORFs; whether these are functional ORFs of SujuNPV requires further experimentation.

Conclusion
Our analyses revealed that SujuNPV is a novel baculovirus within a unique subclade of group II alphabaculoviruses, the members of which all contain a second copy of dbp. The SujuNPV genome contains seven hrs and five unique ORFs, as well as several genes with two or more copies. The presence of duplicated genes in this virus raises the question of how they were acquired (by duplication of viral genes or by independent horizontal transfer) and how they are maintained, which requires further research. These findings will facilitate future applications of SujuNPV in pest control and provide new data for elucidating the evolutionary pathways of baculoviruses.