Genome Sequence and Analysis of Buzura suppressaria Nucleopolyhedrovirus: A Group II Alphabaculovirus

The genome of Buzura suppressaria nucleopolyhedrovirus (BusuNPV) was sequenced by 454 pyrosequencing technology. The size of the genome is 120,420 bp with 36.8% G+C content. It contains 127 hypothetical open reading frames (ORFs) covering 90.7% of the genome and includes the 37 conserved baculovirus core genes, 84 genes found in other baculoviruses, and 6 unique ORFs. No typical baculoviral homologous repeats (hrs) were present but the genome contained a region of repeated sequences. Gene Parity Plots revealed a 28.8 kb region conserved among the alpha- and beta-baculoviruses. Overall comparisons of BusuNPV to other baculoviruses point to a distinct species in group II Alphabaculovirus.


Introduction
The Baculoviridae is an insect-specific family of viruses with double-stranded circular DNA genomes of 80-180 kb. Among the baculoviruses sequenced so far, Xestia c-nigrum granulovirus (XecnGV) has the largest genome (178,733 bp) and Neodiprion lecontei nucleopolyhedrovirus (NeleNPV) the smallest (81,755 bp) [1,2]. With the exception of members of Gammabaculovirus, two distinct progeny phenotypes are produced: the budded virus (BV), which disseminates systemically, and the occlusion-derived virus (ODV), which is required for oral infectivity [3]. The occlusion bodies (OBs) afford the embedded virions a degree of protection against inactivating environmental conditions such as UV light and rainwater. The number of predicted ORFs in a single baculovirus ranges from 89 (NeleNPV) to 183 (Pseudaletia unipuncta GV, PsunGV) [2]. Among all predicted baculovirus ORFs, 37 have been identified as core genes that are present in all sequenced baculoviruses and are essential for the viral life cycle [4,5].
The family Baculoviridae is classified into four genera: Alphabaculovirus (NPVs isolated from Lepidoptera); Betabaculovirus (GVs isolated from Lepidoptera); Gammabaculovirus (NPVs isolated from Hymenoptera) and Deltabaculovirus (NPVs isolated from Diptera) [6,7]. The alphabaculoviruses are further clustered into groups I and II based on phylogenetic analyses and the presence or absence of the gp64 gene. Only group I viruses contain the gp64 gene, whereas group II viruses have a gene encoding the fusion protein (F) [8][9][10][11].
Buzura suppressaria is an insect pest of tea, tung oil, citrus and metasequoia plants. Buzura suppressaria NPV (BusuNPV) was first isolated from dead larvae of B. suppressaria and subsequently used as an insecticide against this pest [12,13]. The virus is a single nucleocapsid NPV with a genome size of approximately 120 kb. So far, only a few BusuNPV genes have been identified, including those encoding polyhedrin [12,14], ecdysteroid UDP-glucosyltransferase (egt) [15], the polyhedron envelope protein (pep), the conotoxin-like protein (ctl), the inhibitor of apoptosis (iap), superoxide dismutase (sod) [16], and P10 [17]. A physical map of the viral DNA has been determined [12], and about 43.5 kb of dispersed regions of the genome have been sequenced, showing a distinct gene arrangement in BusuNPV [13]. In this manuscript we report the complete genome sequence of BusuNPV. Sequence analysis showed that BusuNPV is a group II alphabaculovirus with a genome distinct from those of other baculoviruses sequenced so far.

Sequencing and Genome Characteristics
The genome of BusuNPV was sequenced using the Roche 454 GS FLX system with a shotgun strategy. A total of 97,246 reads were obtained with an average length of 340 bp. The BusuNPV genome was assembled with the Roche GS De Novo Assembler software, assisted by the published restriction maps [13]; the genome was covered 217 times.
The size of the BusuNPV genome is 120,420 bp with a G+C content of 36.8% (Table S1), and it contains 127 hypothetical ORFs of more than 150 bp. The polyhedrin gene was defined as the first ORF and the A of its initiation codon as the first nucleotide (nt) of the genome. So far, 78 baculoviral genomes have been completely sequenced, including BusuNPV (Table S1). BusuNPV contains the 37 core genes conserved in all baculoviruses (shown in red in Fig. 1) and 25 other genes that are present in all sequenced lepidopteran baculoviruses (shown in blue in Fig. 1). The genome also contains 59 additional genes commonly found in a variety of baculoviruses (shown in grey in Fig. 1) and 6 unique genes (shown as open arrows in Fig. 1). A HindIII restriction map is presented in Fig. 1 and corroborates the previous physical map [13]. A region that appears to be conserved in alpha- and betabaculoviruses (see below) is also indicated in this figure.
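As a rough cross-check of the sequencing depth, the nominal coverage can be estimated as (number of reads x mean read length) / genome length. The sketch below uses only the figures reported in this paper (97,246 reads, 340 bp mean length, 120,420 bp genome); the function name is illustrative. The nominal value is an upper bound because it assumes every base of every read contributes to the assembly, so it is expected to exceed the 217-fold coverage reported for the assembled genome. How the reported coverage was calculated is not stated, and that interpretation is an assumption.

```python
# Figures taken from the text; the helper name is hypothetical.
N_READS = 97_246        # total 454 reads
MEAN_READ_LEN = 340     # average read length in bp
GENOME_LEN = 120_420    # assembled genome size in bp


def nominal_coverage(n_reads: int, mean_len: float, genome_len: int) -> float:
    """Upper-bound depth: total sequenced bases divided by genome length."""
    return n_reads * mean_len / genome_len


cov = nominal_coverage(N_READS, MEAN_READ_LEN, GENOME_LEN)
print(f"nominal coverage: {cov:.1f}x")
```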
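The ORF categories above can be tallied to confirm that they account for all 127 predicted ORFs, and G+C content can be computed directly from a sequence. The following is a minimal sketch: the category labels and the `gc_content` helper are illustrative names, and the short test sequence is made up rather than taken from the BusuNPV genome.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


# ORF counts per category as reported in the text.
orf_classes = {
    "core": 37,
    "conserved in lepidopteran baculoviruses": 25,
    "found in other baculoviruses": 59,
    "unique": 6,
}
total_orfs = sum(orf_classes.values())

print(total_orfs)                 # total across the four categories
print(gc_content("ATGCGCAT"))     # toy sequence, not viral data
```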

Comparison to other Baculoviruses
The nucleotide identities between the ORFs of BusuNPV and other representative baculoviruses are shown in Table S2. The overall genomic nucleic acid identity to EcobNPV, EupsNPV, OrleNPV, HespNPV, ClbiNPV and ApciNPV was about 27.2%, 27.0%, 26.7%, 22.0%, 24.2% and 27.4%, respectively. These low identities imply that BusuNPV is evolutionarily quite divergent from the other fully sequenced baculoviruses.
Gene-parity plots of BusuNPV against the other 6 viruses in the same subclade demonstrated colinearity with some inversions over the whole genome (Fig. 3a). Some colinearity was also found with representatives of group I alphabaculoviruses and betabaculoviruses, but almost no colinearity with those from gamma- and deltabaculoviruses (Fig. 3b). Interestingly, a 28.8 kb region from Busu55 to Busu79 is almost completely collinear and conserved in alpha- and betabaculoviruses (Fig. 1). It is likely that this region existed in the common ancestor of alpha- and betabaculoviruses.
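For readers who want to reproduce identity values of this kind, the sketch below computes percent identity from two sequences that have already been aligned. It is not the MegAlign implementation used in this study: the choice to skip columns where both sequences carry a gap and to count all remaining columns in the denominator is an assumption, and other tools may normalize differently. The example sequences are made up.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent of identical positions between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; every other
    column counts toward the denominator (an assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: 4 identical columns out of 6 compared.
print(round(percent_identity("ATG-CA", "ATGGCT"), 2))
```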
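A gene-parity plot pairs the ORF number of each gene in one genome with the ORF number of its homologue in another; collinear regions appear as diagonal runs and inversions as anti-diagonal runs. The sketch below builds the plot points and extracts the longest collinear run. The homologue mapping is invented toy data, not the BusuNPV comparison, and the `max_gap` tolerance (how far the partner ORF number may jump between neighbouring points) is a hypothetical parameter.

```python
def parity_points(homologs: dict) -> list:
    """Return (query ORF number, subject ORF number) pairs sorted by query."""
    return sorted(homologs.items())


def longest_collinear_run(points: list, max_gap: int = 2) -> list:
    """Longest run of neighbouring points whose subject ORF numbers move
    in one direction by at most max_gap per step (forward or inverted)."""
    if not points:
        return []
    best = run = [points[0]]
    direction = 0  # +1 forward, -1 inverted, 0 undetermined
    for prev, cur in zip(points, points[1:]):
        step = cur[1] - prev[1]
        sign = (step > 0) - (step < 0)
        if sign == 0 or abs(step) > max_gap:
            run, direction = [cur], 0            # adjacency broken
        elif direction in (0, sign):
            run, direction = run + [cur], sign   # extend current run
        else:
            run, direction = [prev, cur], sign   # orientation flipped
        if len(run) > len(best):
            best = run
    return best


# Toy mapping: ORFs 1-3 collinear, ORFs 4-7 inverted, ORF 8 relocated.
toy = {1: 1, 2: 2, 3: 3, 4: 9, 5: 8, 6: 7, 7: 6, 8: 10}
block = longest_collinear_run(parity_points(toy))
print(block)
```

The list returned by `parity_points` can be passed to any scatter-plot tool to draw the plot itself.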

Repeat Structures
Homologous repeated sequences (hrs) are considered characteristic of many baculovirus genomes. The hrs are repeat regions with palindromic structure interspersed in the genome. Within a genome, hrs consist of similar repeat sequences of varying length, but hr sequences vary widely among different baculoviruses [20]. Hrs have been suggested to be origins of DNA replication in baculoviruses [21,22]; however, a contrasting study showed that deletion of individual hrs had no effect on the replication of AcMNPV [23]. Other studies attributed an enhancer function to hrs: they appear to bind IE1 in AcMNPV and enhance IE1-mediated transactivation [24][25][26]. Hrs are absent from the BusuNPV genome.
Non-hr origins, which contain palindromic and repetitive sequences in a complex organization, have also been suggested to initiate replication [21,27]. A repeat sequence was detected from nt 101325 to 101469 in the BusuNPV genome, comprising two complete repeats and one truncated repeat. The repeat unit is 59 nt long (Fig. 4a), high in A+T content (71.7%), and probably forms a hairpin structure (Fig. 4b). Whether this is a functional non-hr origin for BusuNPV needs further analysis.
Other non-core structural proteins include the F protein (Busu120), which is essential for virus entry and budding, and VP80 (Busu80), which is involved in nucleocapsid packaging and trafficking [38]. Busu98 is a homologue of Calyx/PEP, the major protein of the polyhedron envelope, which enhances the stability of OBs [39,40]. Busu7 encodes P10 [17], which is involved in OB envelopment and nuclear lysis at the late stages of infection [41].
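The reported coordinates are consistent with the stated repeat organization, as a short calculation shows. The sketch assumes that the coordinates are inclusive and that the repeat units are arranged in tandem without spacers; neither is stated explicitly in the text. The `at_content` helper is illustrative and is exercised on a made-up sequence only.

```python
REGION_START, REGION_END = 101_325, 101_469   # nt coordinates from the text
UNIT_LEN = 59                                 # repeat unit length in nt

# Inclusive coordinates assumed.
region_len = REGION_END - REGION_START + 1
complete_units, truncated_len = divmod(region_len, UNIT_LEN)


def at_content(seq: str) -> float:
    """Percent A+T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    total = at + seq.count("G") + seq.count("C")
    return 100.0 * at / total if total else 0.0


print(region_len, complete_units, truncated_len)
print(at_content("AATTAATTGC"))   # toy sequence, not the viral repeat
```

Under these assumptions the region spans 145 nt, that is, two complete 59-nt units plus a truncated unit of 27 nt.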
Busu34 is a homologue of the gene encoding viral enhancing factor (VEF) that dissolves the peritrophic membrane (PM) of the midgut [44]. A study in LdMNPV found it helps ODV envelopes [45].

Auxiliary Genes
Ubiquitin is encoded by most baculoviruses, including BusuNPV. Like most alphabaculoviruses and some betabaculoviruses, BusuNPV also encodes cathepsin (Busu24) and chitinase (Busu51), both of which are involved in liquefaction of the insect and release of OBs [46,47]. A fibroblast growth factor (FGF, Busu114) aids virus dissemination through the tracheal system [48,49]. The egt gene, which prevents larval molting and pupation [50,51], was found in BusuNPV (Busu124) [15]; baculoviruses with a defective egt gene kill infected larvae faster than wild-type strains [52,53]. BusuNPV also contains a sod gene (Busu92) and three iap genes (iap-1, Busu48; iap-2, Busu54; and iap-3, Busu93). Three baculovirus repeated ORF (bro) genes have also been found. The absence or duplication of these genes is common in baculoviruses, even between closely related strains [54]. A study on BmNPV showed that a bro-d mutant or a bro-a and bro-c double mutant could not be isolated, suggesting that bro genes have essential functions in BmNPV [55]. Another study indicated that bro genes encode a protein with DNA-binding activity, especially for single-stranded DNA [56]. BusuNPV encodes poly (ADP-ribose) glycohydrolase (parg, Busu88), which is conserved in group II alphabaculoviruses and functions in poly (ADP-ribose) catabolism [29]. A study in HearNPV G4 showed that it affects the oral infectivity of OBs [57].
Busu100 encodes a 532 aa protein with low homology to the tryptophan repeat gene family of entomopoxviruses (minimum E value = 0.012). Busu109 encodes a 155 aa protein sharing very low homology with 5-methyltetrahydropteroyltriglutamate-homocysteine methyltransferase of some bacteria (minimum E value = 2.1).
In summary, the genome sequence revealed that BusuNPV is a distinct species in group II Alphabaculovirus. Phylogenetically, it is most closely related to EcobNPV, EupsNPV, OrleNPV and ApciNPV. It does not contain typical baculovirus hrs, but contains a new repeat structure, the function of which needs to be further characterized. A 28.8 kb conserved region was identified among alpha- and betabaculoviruses.

Viral DNA Extraction
BusuNPV was propagated in B. suppressaria larvae and OBs were purified by differential centrifugation [12]. DNA was extracted as described previously [16].

Sequencing and Bioinformatic Analysis
The genome was sequenced with the Roche 454 GS FLX system using a shotgun strategy. The reads were assembled with the Roche GS De Novo Assembler software. Contig assembly was assisted by previously generated restriction maps [13]. A few regions that were not assembled into the contigs were amplified by PCR, cloned and sequenced. The genome sequence was deposited in GenBank (accession number: KF611977).
Hypothetical ORFs were predicted with the Softberry FGENESV program (http://www.softberry.com/berry.phtml) [58]; predicted ORFs were required to contain a standard ATG start codon and a stop codon and to potentially encode at least 50 amino acids. Gene-parity plot analysis [13] was performed in Microsoft Office Excel by drawing scatter diagrams with the BusuNPV ORF numbers on the X-axis and the ORF numbers of the other baculovirus on the Y-axis. Gene annotation and comparisons were done with the NCBI protein-protein BLAST algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi). Repeat structures were detected by BLAST alignment of two sequences (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The identity among homologous genes was calculated with MegAlign software using ClustalW with default parameters. Regulatory regions and promoter motifs were identified as described previously [29].
Restriction sites were predicted with MapDraw software. The genome map framework was drawn with GenomeVx [59].
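The ORF criteria stated above (an ATG start codon, a stop codon, and a coding capacity of at least 50 amino acids) can be illustrated with a naive six-frame scan. This is not the FGENESV algorithm, which is a separate gene-prediction program; the sketch only applies the three stated filters, uses the standard stop codons, and reports coordinates on the scanned strand. The function names and the test sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 50) -> list:
    """Scan all six frames for ATG...stop ORFs encoding >= min_aa residues.

    Returns (strand, start, end, n_aa) tuples; start/end are 0-based,
    end-exclusive positions on the scanned strand, and n_aa counts the
    initiator Met but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        orfs.append((strand, start, i + 3, n_aa))
                    start = None                   # look for the next ORF
    return orfs


# Made-up sequence: ATG + 60 alanine codons + stop = 61 residues.
toy = "ATG" + "GCT" * 60 + "TAA"
print(find_orfs(toy))
```

A real genome is circular, so a production scan would also need to handle ORFs spanning the chosen origin; the sketch treats the sequence as linear.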

Phylogenetic Analysis
The phylogenetic analysis was based on the amino acid sequences of the 37 core genes from BusuNPV and the other 61 baculoviruses listed in the NCBI genome database (Table S1). For each virus, the sequences were concatenated in a fixed order, and multiple alignment was performed with the ClustalW method in MEGA5 using default settings. A phylogenetic tree was constructed with MEGA5 using the Maximum Likelihood method based on the JTT matrix-based model [60,61]. The phylogeny was tested by the bootstrap method with 1000 replicates [62].
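The concatenation step can be sketched as follows: for every virus, the core protein sequences are joined in one fixed gene order so that each virus contributes a single sequence to the alignment. The data layout (a dictionary of dictionaries), the function name, and the short peptide strings are illustrative assumptions; lef-8, lef-9 and pif-1 stand in for the full set of 37 core genes. Alignment and tree building are left to MEGA5 as described above.

```python
def concatenate_core_genes(proteins_by_virus: dict, gene_order: list) -> dict:
    """Join each virus's core proteins in the same fixed gene order.

    Raises KeyError if a virus lacks any gene in gene_order, since a
    missing gene would shift all downstream positions.
    """
    concatenated = {}
    for virus, proteins in proteins_by_virus.items():
        missing = [g for g in gene_order if g not in proteins]
        if missing:
            raise KeyError(f"{virus} lacks core genes: {missing}")
        concatenated[virus] = "".join(proteins[g] for g in gene_order)
    return concatenated


# Toy input: made-up peptide fragments, three genes instead of 37.
gene_order = ["lef-8", "lef-9", "pif-1"]
toy_proteins = {
    "BusuNPV": {"lef-8": "MKT", "lef-9": "MAV", "pif-1": "MLL"},
    "AcMNPV": {"pif-1": "MIL", "lef-8": "MRT", "lef-9": "MAI"},
}
joined = concatenate_core_genes(toy_proteins, gene_order)
print(joined)
```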

Prediction of Secondary Structure
The secondary structure was predicted with the Predict a Secondary Structure online server (http://rna.urmc.rochester.edu/RNAstructureWeb/Servers/Predict1/Predict1.html) using the default settings for the DNA nucleic acid type [63].