Genome Sequencing and Analysis of Catopsilia pomona nucleopolyhedrovirus: A Distinct Species in Group I Alphabaculovirus

The genome sequence of Catopsilia pomona nucleopolyhedrovirus (CapoNPV) was determined using the Roche 454 sequencing system. The genome consisted of 128,058 bp and had an overall G+C content of 40%. There were 130 hypothetical open reading frames (ORFs) potentially encoding proteins of more than 50 amino acids and covering 92% of the genome. Among all the hypothetical ORFs, 37 baculovirus core genes, 23 lepidopteran baculovirus conserved genes and 10 genes conserved in Group I alphabaculoviruses were identified. In addition, the genome included 8 regions of typical baculoviral homologous repeat sequences (hrs). Phylogenetic analysis showed that CapoNPV was in a distinct branch of clade "a" in Group I alphabaculoviruses. Gene parity plot analysis and overall similarity of ORFs indicated that CapoNPV is more closely related to the Group I alphabaculoviruses than to other baculoviruses. Interestingly, CapoNPV lacks the genes encoding the fibroblast growth factor (fgf) and ac30, which are conserved in most lepidopteran and Group I baculoviruses, respectively. Sequence analysis of the F-like protein of CapoNPV showed that some amino acids were inserted into the fusion peptide region and the pre-transmembrane region of the protein. All these unique features imply that CapoNPV represents a member of a new baculovirus species.


Introduction
Members of the family Baculoviridae are rod-shaped, insect-specific viruses with large, circular, double-stranded DNA genomes of 80-180 kb [1,2]. Lepidopteran baculoviruses synthesize two progeny phenotypes, the budded virus (BV) and the occlusion-derived virus (ODV). Virus particles of the latter phenotype are embedded into occlusion bodies (OBs) [3], which offer some protection against inactivating environmental conditions such as UV light, heat and desiccation.
Baculoviridae contains four genera: Alphabaculovirus [nucleopolyhedroviruses (NPVs) of lepidopteran insects], Betabaculovirus [granuloviruses (GVs) of Lepidoptera], Gammabaculovirus (NPVs of Hymenoptera) and Deltabaculovirus (NPVs of Diptera) [4,5]. The alphabaculoviruses can be further divided into Group I and Group II, based on phylogenetic analysis and their membrane fusion proteins. Group I viruses use GP64 as the fusion protein, whereas Group II viruses use the F protein instead [6][7][8]. Phylogenetic analysis suggested that Group I viruses fall into two clades, "a" and "b" [9]. Despite the diversity in gene content of baculovirus genomes, 37 genes have been identified as core genes that are present in all sequenced baculoviral genomes and play very important roles in the viral replication cycle [10]. In addition, 23 genes are conserved in all sequenced lepidopteran baculoviruses (NPVs and GVs) and 11 are specific to Group I [10][11][12][13].
Catopsilia pomona (Lepidoptera: Pieridae) is distributed in Asia and Australia. In Mainland China, it is present mainly in the provinces of Hainan, Guangdong, Guangxi, Yunnan, and Fujian. It is harmful to Kassod tree, Wing-podded Senna, golden shower, pockwood and other tropical plants [14]. Larvae feed on young leaves, and during outbreaks trees are completely stripped of foliage. In Hainan Province, the insect has 13-14 generations a year, causing damage all year round [15]. CapoNPV was isolated from dead Catopsilia pomona larvae in Hainan in 1990 [15].

Sequencing and genome characteristics
The complete nucleotide sequence of CapoNPV genomic DNA was determined using the 454 pyrosequencing method. The sequences were assembled using the Roche GS De Novo Assembler version 2.7. The genome was covered 350 times by 123,698 reads. It is 128,058 bp in length, has a G+C content of 40% and contains 130 predicted ORFs (S2 Table). The adenine residue of the translation initiation codon of polyhedrin, in the forward orientation, was designated as the zero point on the circular genome map. Sixty-nine ORFs were in a clockwise direction and 61 in a counterclockwise direction with respect to the transcriptional orientation of polyhedrin. The 37 core genes (red), 23 lepidopteran baculovirus conserved genes (blue) and 10 Group I specific genes (green) are illustrated on the genome map (Fig 1). Another 56 baculoviral genes and 4 hypothetical CapoNPV unique genes are shown as grey and open arrows, respectively (Fig 1).
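The figures in this paragraph can be cross-checked with simple arithmetic. The sketch below assumes that "covered 350 times" refers to average sequencing depth (total sequenced bases divided by genome length); the function name is illustrative and not part of any assembler.

```python
def implied_mean_read_length(genome_bp: int, n_reads: int, depth: float) -> float:
    """Mean read length implied by: depth = n_reads * mean_length / genome_bp."""
    return depth * genome_bp / n_reads

# Values as reported in the text; the "average depth" interpretation is an assumption.
mean_len = implied_mean_read_length(genome_bp=128_058, n_reads=123_698, depth=350)

# The ORF orientation counts and the gene categories on the map should both total 130.
orientation_total = 69 + 61
category_total = 37 + 23 + 10 + 56 + 4

print(f"implied mean read length: {mean_len:.0f} bp")
print(orientation_total, category_total)
```

Under this assumption the implied mean read length is about 362 bp, and both ORF tallies sum to the 130 predicted ORFs.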

Phylogenetic analysis of CapoNPV
A phylogenetic tree built with the concatenated 37 core genes from 79 sequenced baculoviruses (S1 Table) classified CapoNPV into clade "a" of Group I (Fig 2). It is located on a distinct branch among clade "a" alphabaculoviruses, which is consistent with a previous phylogenetic analysis based on polyhedrin/granulin, lef-8 and lef-9 [9]. CapoNPV appeared to have diverged shortly after the separation of clades "a" and "b" and may be closer to an ancestral virus than most species in the two clades. This situation is similar to that of the newly sequenced Cyclophragma undans nucleopolyhedrovirus (CyunNPV) (data not shown).
Gene order of CapoNPV was compared to the above baculovirus genomes using gene parity plots [16]. Although CapoNPV is a distinct species in Group I, its gene order is substantially collinear with that of representative Group I alphabaculoviruses and partially collinear with that of Group II alphabaculoviruses. However, its gene arrangement was significantly different from that of gamma- and deltabaculoviruses (Fig 3). A collinearly conserved region of lepidopteran baculoviruses was also found in CapoNPV between capo43 and capo75 (Fig 1). It contains 20 core genes and five additional lepidopteran baculovirus conserved genes, and also includes two Group I specific genes, ac73 (capo69) and ac72 (capo70), and six other genes: ac91 (capo58), cg30 (capo57), ac87 (capo58), ac79 (capo63), ac74 (capo68) and iap-2 (capo71) (Fig 1).
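The idea behind a gene parity plot can be illustrated with a minimal sketch: each gene shared by two genomes is plotted at its ordinal position in one genome against its position in the other, so collinear regions fall along a diagonal. The gene orders below are toy data chosen for illustration, not the CapoNPV annotation, and `parity_points` is a hypothetical helper name.

```python
def parity_points(order_a, order_b):
    """Return (position in A, position in B) for every gene present in both genomes."""
    pos_b = {gene: j for j, gene in enumerate(order_b, start=1)}
    return [(i, pos_b[gene])
            for i, gene in enumerate(order_a, start=1)
            if gene in pos_b]

# Toy gene orders (hypothetical); "xyz" stands for a gene with no homologue in genome A.
genome_a = ["polh", "orf1629", "pk1", "lef-1", "egt"]
genome_b = ["polh", "orf1629", "lef-1", "pk1", "egt", "xyz"]

points = parity_points(genome_a, genome_b)
print(points)
```

Points on the diagonal indicate conserved gene order, while the swapped pair in the toy data shows up as two off-diagonal points.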

Regions of homologous repeated sequences
Homologous repeated sequences (hrs) of baculoviruses consist of a number of repeated sequences with an imperfect palindrome, interspersed at different locations in a genome. Hrs are highly variable, and although they are closely similar within the same genome, they may show very limited homology among different viruses. Sixty-four of the 79 completely sequenced baculoviral genomes contain 2-17 hrs (S1 Table). Previous studies suggested that hrs may act as origins of DNA replication [17,18]. However, deletion of individual hrs from the AcMNPV genome does not appear to affect genome replication [19]. The hrs also acted as enhancers of gene expression and appeared to up-regulate the expression of the AcMNPV immediate early gene-1 (ie-1) [20][21][22]. The locations and the sequences of the 8 CapoNPV hrs are summarized in Figs 1 and 4, respectively.
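One simple way to quantify how "imperfect" a palindrome is, sketched here under our own assumptions, is to count the positions at which a sequence differs from its reverse complement. The example sequences are short illustrative strings, not CapoNPV hr sequences, and this is not the procedure used to locate the hrs in this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def palindrome_mismatches(seq):
    """Number of positions where seq differs from its reverse complement.

    0 means a perfect palindrome; small even values indicate an imperfect one.
    """
    seq = seq.upper()
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))

perfect = palindrome_mismatches("GAATTC")    # identical to its reverse complement
imperfect = palindrome_mismatches("GAATAC")  # one substitution breaks one base pairing
print(perfect, imperfect)
```

A single substitution is counted twice because it disrupts the pairing at both of the mirrored positions.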

Gene content of CapoNPV
CapoNPV contains 12 replication associated genes, 12 transcription associated genes, 8 genes essential for oral infection, 34 structure related genes and 15 auxiliary genes (Table 1). The rest are 45 genes of unknown function, including 4 hypothetical genes unique to CapoNPV.
CapoNPV lacks the fibroblast growth factor gene (fgf). FGF plays an important role in developmental processes affecting cell growth, differentiation, and motility and is one of the conserved proteins in vertebrates and invertebrates [23]. Lepidopteran baculoviruses also encode fgf, and it was previously found to be conserved in all lepidopteran baculoviruses [9] except Maruca vitrata nucleopolyhedrovirus (MaviNPV) [12]. Although deletion of fgf from AcMNPV had no effect on replication in tissue culture cells, bioassays showed that the time of death in larvae was delayed [24]. It has been suggested that FGF may play a role in dissemination of the virus within the host insect [25]. Recent evidence suggests that FGF initiates a cascade of events that may accelerate the establishment of systemic infections [26]. In our study, fgf was not found in the CapoNPV genome.
CapoNPV lacks ac30, a gene specific to Group I. In a previous report, 11 genes (gp64, tyrosine phosphatase gene (ptp), ie2, odv-e26, ac5, ac30, ac73, ac72, ac114, ac124, ac132) were identified as specific to Group I viruses and absent from all other baculoviruses [13]. These genes might have had an evolutionary role in the emergence of Group I viruses [13,27]. Notably absent from CapoNPV is a homologue of ac30. This gene seems to be nonessential because its deletion did not affect the production of BmNPV [28]. Interestingly, CyunNPV, a member of Group I, also lacks ac30 (data not shown).
CapoNPV lacks lef-7, a gene involved in DNA replication. lef-7 had stimulatory effects on transient DNA replication [29]. It is present in all previously identified Group I viruses, several Group II viruses and many betabaculoviruses. Deletion of lef-7 from AcMNPV had no impact on virus infection in Tn368 cells, but in SF21 and SE1c cells viral DNA replication was reduced to only 10% of that of the wild-type virus [30], suggesting that the function of LEF7 is host dependent. lef-7 was also found to be involved in the regulation of the DNA damage response (DDR). Deletion of lef-7 from the AcMNPV genome caused activation of the DDR, and the yield of infectious progeny virus decreased by about 99% [31]. CapoNPV is the first reported Group I virus that does not contain a lef-7 gene.
CapoNPV lacks ODV-E66, a structural protein of ODV involved in oral infection. ODV-E66 was identified as a component of ODV envelopes [32]. AcMNPV ODV-E66 was shown to have chondroitinase activity [33] and its crystal structure was determined [34]. It was suggested that ODV-E66 may function in midgut infection by degrading the peritrophic membrane, which contains a low level of chondroitin sulfate [33]. Indeed, deletion of odv-e66 in AcMNPV increased the oral infection dose about 1000-fold but did not change the infectivity of BV, suggesting that ODV-E66 is an important oral infectivity factor [35]. odv-e66 is present in most alphabaculoviruses and betabaculoviruses; however, it was not found in the CapoNPV genome.

F-like protein
A characteristic feature of Group I viruses is the presence of GP64 and the loss of fusion function of F. Except for gammabaculoviruses and Group I viruses, the F protein functions as the envelope fusion protein of BV. In AcMNPV, the F-like protein is also associated with BV membranes and its deletion from the genome results in infectious virus with titers similar to the parental virus in cell cultures, but the time to kill larvae is somewhat extended [36].
Previous studies showed the importance of the furin cleavage site in the fusion process. Furin protease digests F into two components, a small N-terminus membrane-anchored F2 and a large domain F1 at the C-terminus. Both are needed for viral-host membrane fusion [7,37]. The F-like protein in Group I viruses lacks the furin cleavage site and, therefore, lost its fusion function. Instead, GP64 functions as an efficient envelope fusion protein [38][39][40].
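Whether a candidate furin cleavage site is present in a protein sequence can be checked with a simple motif scan. The sketch below assumes the commonly cited minimal furin consensus R-X-K/R-R, which is not specified in this paper; the peptide strings are invented for illustration and are not baculovirus F sequences.

```python
import re

# Assumed minimal furin consensus: Arg, any residue, Lys or Arg, Arg.
FURIN_MOTIF = re.compile(r"R.[KR]R")

def furin_sites(protein):
    """Return 1-based start positions of non-overlapping matches to the assumed motif."""
    return [m.start() + 1 for m in FURIN_MOTIF.finditer(protein.upper())]

with_site = furin_sites("MKTLRSKRGAVF")     # hypothetical peptide containing RSKR
without_site = furin_sites("MKTLQSKAGAVF")  # same peptide with the motif disrupted
print(with_site, without_site)
```

A match only flags a candidate site; it does not show that cleavage actually occurs.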
In our study of the F-like protein of CapoNPV, an insertion was found in the region equivalent to the fusion peptide (Fig 5). We also found that another stretch of amino acids is inserted ahead of the pre-transmembrane domain (pre-TM) of CapoNPV (Fig 5). The pre-TM domain, which is rich in aromatic amino acids, can play an important role in membrane fusion [41][42][43][44]. Similar insertions into the fusion peptide region and the pre-TM were also found in CyunNPV (data not shown).
According to the phylogeny (Fig 2), CapoNPV diverged relatively earlier than other Group I alphabaculoviruses. Thysanoplusia orichalcea nucleopolyhedrovirus (ThorNPV), another relatively early member of Group I (Fig 2), also has an insertion in the fusion peptide region (Fig 5). The change in viral fusion ability mediated by the presence of GP64 and the inactivation of F is considered a critical event in the origin of Group I [13]. Our results provide new evidence for understanding the process of F inactivation and, therefore, the early evolutionary events of Group I alphabaculoviruses.

Viral DNA Extraction
CapoNPV-infected Catopsilia pomona larvae have been preserved in the "Chinese General Virus Collection Center" (CGVCC) with collection number IVCAS 1.0228. OBs were purified from homogenized larvae by differential centrifugation [46] and DNA was extracted as described previously [47].

Sequencing and Bioinformatics Analyses
The genome of CapoNPV was sequenced with the Roche 454 GS FLX+ system using a shotgun strategy. The determined nucleotide sequences were assembled with GS De Novo Assembler software version 2.7. The complete genome sequence and annotation information were submitted to GenBank (accession number: KU565883). Putative ORFs were analyzed using the FGENESV0 program (http://www.softberry.com/berry.phtml) [48] and the NCBI ORF Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). ORFs potentially encoding more than 50 amino acids and with minimal overlaps were designated as putative genes. Gene parity plot analysis was performed as previously described [17,49]. The Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html) was used to locate hrs. Gene annotation and comparisons were done with the aid of the NCBI BLAST algorithm (http://blast.ncbi.nlm.nih.gov/Blast.cgi).
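The length criterion for putative ORFs can be illustrated with a deliberately simplified scan. This sketch is not how FGENESV0 or the NCBI ORF Finder work: it assumes ATG start codons and the standard stop codons, treats the sequence as linear (ignoring ORFs that span the origin of a circular genome), and does not resolve overlaps. Function names and the demo sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def orfs_on_strand(seq, min_aa):
    """Return (start, end) 0-based half-open spans of ATG...stop ORFs in three frames."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG after the previous stop
            elif start is not None and codon in STOPS:
                # Codons before the stop = encoded amino acids; keep "more than min_aa".
                if (i - start) // 3 > min_aa:
                    found.append((start, i + 3))
                start = None
    return found

def find_orfs(seq, min_aa=50):
    """Scan both strands; coordinates are relative to the strand that was scanned."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    hits = [("+", s, e) for s, e in orfs_on_strand(seq, min_aa)]
    hits += [("-", s, e) for s, e in orfs_on_strand(rev, min_aa)]
    return hits

# Tiny demo sequence encoding 4 amino acids; the threshold is lowered so it is reported.
demo = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"
print(find_orfs(demo, min_aa=3))
```

With the default `min_aa=50`, only ORFs encoding more than 50 amino acids would be reported, matching the cutoff stated above.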

Phylogenetic Analysis
A phylogenetic tree was generated based on the amino acid sequences encoded by the 37 core genes from CapoNPV and those of the other 79 reference baculovirus genome sequences in NCBI (S1 Table). All the sequences were joined together in the same order and the alignments were generated using the MUSCLE method of MEGA6 with default settings. A phylogenetic tree was constructed by MEGA6 using the Maximum Likelihood method based on the JTT matrix-based model [50]. The phylogeny was tested by the bootstrap method with 1000 replicates [51].

Fig 5. The amino acid alignment of F and F-like proteins. The alignment was performed using the ClustalW method. A schematic figure of SeMNPV F protein was adapted from a previous publication [45] and is shown at the bottom, and two enlarged regions with sequence alignments are also shown. Viral names and categories are on the left. The predicted regions of furin cleavage site, fusion peptide, pre-TM and transmembrane domains are indicated below the alignment. The red square shows the aromatic amino acids (F, Y, W and H) in the pre-TM region. The arrows point to the insertion regions in CapoNPV. Black background indicates greater than 80% identity among compared regions; dark gray and light gray indicate greater than 50% and 30% identity, respectively. doi:10.1371/journal.pone.0155134.g005

Supporting Information S1