Analysis of the Genome of a Korean Isolate of the Pieris rapae Granulovirus Enabled by Its Separation from Total Host Genomic DNA by Pulse-Field Electrophoresis

Background: Most traditional genome sequencing projects involving viruses include the culture and purification of the virus particles. However, purification of virions may yield insufficient material for traditional sequencing. The electrophoretic method described here provides a strategy whereby the genomic DNA of the Korean isolate of Pieris rapae granulovirus (PiraGV-K) could be recovered in sufficient amounts for sequencing by purifying it directly from total host DNA by pulse-field gel electrophoresis (PFGE).
Methodology/Principal Findings: The total genomic DNA of infected P. rapae was embedded in agarose plugs and treated with restriction nuclease and methylase, and PFGE was then used to separate PiraGV-K DNA from the DNA of P. rapae, followed by mapping of fosmid clones of the purified viral DNA. The double-stranded circular genome of PiraGV-K was found to encode 120 open reading frames (ORFs), which covered 92% of the sequence. BLAST and ORF arrangement showed the presence of 78 homologs to other genes in the database. The mean overall amino acid identity of PiraGV-K ORFs was highest with the Chinese isolate of PiraGV (∼99%), followed by Choristoneura occidentalis granulovirus ORFs at 58%. PiraGV-K ORFs were grouped, according to function, into 10 genes involved in transcription, 11 involved in replication, 25 structural protein genes, and 15 auxiliary genes. Genes for chitinase (ORF 10) and cathepsin (ORF 11), involved in the liquefaction of the host, were found in the genome.
Conclusions/Significance: The recovery of the PiraGV-K DNA genome by pulse-field electrophoretic separation from host genomic DNA had several advantages compared with its isolation from particles harvested as virions or inclusions from the P. rapae host. We have sequenced and analyzed the 108,658 bp PiraGV-K genome purified by the electrophoretic method. The method appears to be generally applicable to the analysis of genomes of large viruses.


Introduction
Baculoviruses represent a diverse group of viruses with covalently closed, double-stranded, circular, supercoiled genomes, with sizes varying from 80 to 180 kb, encoding between 90 and 180 genes. The DNA genome is packaged in rod-shaped nucleocapsids that are 230-385 nm in length and 40-60 nm in diameter. The virions occur in two types: occlusion-derived virions (ODV) and budded virus particles (BV). Baculoviridae are divided into four genera, Alphabaculovirus [lepidopteran-specific nuclear polyhedrosis viruses (NPVs)], Betabaculovirus [lepidopteran-specific granulosis viruses (GVs)], Gammabaculovirus (hymenopteran-specific NPVs) and Deltabaculovirus (dipteran-specific NPVs) [1,2]. Viruses infecting members of the order Hymenoptera have the smallest genomes, at ∼80 kb, which has been explained as a result of their restricted life cycle, confined to replication in insect gut cells [3]. Group I alphabaculoviruses cluster at ∼130 kb, whereas Group II shows a high degree of diversity, varying from ∼130 to 170 kb. The larger genomes of the Group II alphabaculoviruses can be attributed to a combination of repeated genes that are not found in the smaller genomes. This is in contrast to the betabaculovirus genomes, which vary from 101 kb in the case of Plutella xylostella granulovirus (PlxyGV) [4] to 178 kb in Xestia c-nigrum granulovirus (XecnGV) [5]. Despite the large difference in gene content in betabaculovirus genomes, as reflected in this range of sizes, their genomes are surprisingly collinear, compared with alphabaculoviruses, which show a high degree of variation [6,7]. The first dipteran-specific deltabaculovirus, the Culex nigripalpus nucleopolyhedrovirus (CuniNPV), was isolated and sequenced from the mosquito Culex nigripalpus [8]. A phylogenetic analysis showed its distinctive form, making it a member of a new genus within the family Baculoviridae [9].
Compared to alphabaculovirus family members, betabaculoviruses have been investigated to a lesser degree, because of the limitations of permissive cell lines [10]. Currently, 60 complete genomes are known in the Baculoviridae family: 45 genomes from NPVs (41 alphabaculoviruses, 3 gammabaculoviruses, and 1 deltabaculovirus), 14 genomes from GVs, and 1 unclassified Hemileuca sp. NPV (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=10442).
The small cabbage white butterfly, Pieris rapae (P. rapae) is a serious pest of cultivated cabbages and other mustard family crops worldwide. A serious infestation can lead to the death of the plant due to reduced photosynthesis. P. rapae granulovirus (PiraGV) infects P. rapae in nature and functions as an important biological agent in controlling the population of P. rapae in the ecosystem. Although PiraGV is now a registered biocontrol agent for the control of P. rapae, research on the genetic and molecular information of the virus is still limited, apart from a recent study on occlusion-derived virus (ODV)-associated proteins of the betabaculovirus [11]. Sequencing of the complete genome of the Chinese isolate of P. rapae granulovirus (PiraGV-C) showed a size of 108,592 bp and predicted 120 open reading frames (GenBank, GQ884143) [12]. Although sequencing efforts have been significant, more detailed information about a wide range of isolates inhabiting different geographical regions would provide a more comprehensive overview of baculoviruses and further establish their candidature as pest control agents.
This study is unique, as we have taken advantage of the large-sized genome and high titer of infection of P. rapae granulovirus (Korean isolate) to purify the viral genome away from host DNA by pulse-field gel electrophoresis. The viral DNA is recovered in amounts sufficient for classical genome sequencing. The procedure requires less starting material than would be necessary if starting with the purification of virus particles from inclusion bodies. The genome sequence in this work was produced through a subcloning approach, without recourse to automated high-throughput next-generation sequencing (NGS) technology.

Materials and Methods
Separation of Nuclei from P. rapae
Larvae of P. rapae were obtained from a mass rearing facility at Hampyeong Insect Institute (Hampyeong, Korea) and were reared in the laboratory on kale leaf at 25±3 °C with 60±5% relative humidity, under a 12/12 hr natural light/dark cycle for a short duration. The final instar larvae were dissected to remove the gut and were subsequently ground and centrifuged (5,000 rpm, 10 min, 4 °C) to separate the nuclei and remove the cell debris from the solution.

Chemicals
All chemicals used were of analytical grade and were obtained from Sigma Chemical Co. (St. Louis, MO, USA) unless indicated otherwise.

Preparation of High Molecular Weight (HMW) DNA Plugs Embedded in Agarose
HMW DNA is vulnerable to mechanical shearing forces and suffers frequent double-stranded breaks, which makes sheared preparations unsuitable for large-insert cloning. To prevent HMW DNA from being damaged in the nucleus lysis process, the separated nuclei were embedded in agarose gel. The nuclei were warmed for 5 min at 45 °C and were mixed with 1% InCert agarose. The mixture was subsequently poured into a plug mold (Bio-Rad, Hercules, CA), kept on ice and allowed to solidify for 1-2 hr. The agarose plugs were then put into 50 ml of proteinase K lysis buffer (0.5 M EDTA, 1% N-lauroylsarcosine, 1 mg of proteinase K/ml) and incubated for 24 hr at 50 °C. After removal of the proteinase K lysis buffer from the agarose plugs, the lysis process was repeated for a further 24 hr. After 2-3 washes in deionized water, the plugs were placed in 50 ml of TE 50 buffer (10 mM Tris-HCl, 50 mM EDTA, pH 8.0) and washed for 12 hr. Additional washing was performed for another 12 hr after replacing the TE 50 buffer. Subsequently, the plugs were incubated for 2 hr in 0.1 mM phenylmethylsulfonylfluoride (PMSF) buffer at 4 °C to inactivate proteinase K, followed by another wash in TE 50 buffer for 24 hr, and were stored in 0.5 M EDTA at 4 °C.

Pre-electrophoresis of Agarose Plugs
Next, the agarose plugs were placed in 0.5× TBE buffer (45 mM Tris-base, 1 mM EDTA, 45 mM boric acid) and dialyzed for 3 hr. Subsequently, they were inserted into the preparative slot of a 1% pulse-field certified agarose gel, and PFGE was conducted using 0.5× TBE buffer and the CHEF DR-II apparatus (Bio-Rad, Hercules, CA) with a pulse time of 5 s for 10 hr at 12 °C and a voltage of 4 V/cm. After the electrophoresis, the plugs were removed from the slot, stored in 50 ml of 0.5 M EDTA buffer, and dialyzed overnight at 4 °C.

Partial Digestion of Plugs
HMW DNA embedded plugs (n = 10) were placed in 500 µl of an enzyme mixture consisting of 1 µl EcoRI at a concentration of 2 U/µl, 1 µl EcoRI methylase at a concentration of 40 U/µl (New England Biolabs, Ipswich, MA), 25 µl of 100× bovine serum albumin (10 mg/ml), 5 µl of polyamine (100×), and 50 µl of methylase buffer (10×) in 394 µl of DW, and equilibrated for 2 hr at 4 °C, followed by a 4 hr incubation at 37 °C. After digestion, the plugs were treated with 150 µl of 0.5 M EDTA, 37.5 µl of 20% N-lauroylsarcosine and 15 µl of proteinase K (20 mg/ml), and incubated for 1 hr at 37 °C to inactivate the endonuclease. Subsequently, PFGE was conducted with a CHEF DR-II apparatus (Bio-Rad) with a pulse time between 0.1 and 40 s for 16 hr at a voltage of 6 V/cm to check the partially digested plugs.

Separation of PiraGV-K DNA from P. rapae Genomic DNA
PiraGV-K DNA was separated by PFGE with an initial pulse time of 0.1 s, a final pulse time of 40 s, a temperature of 12 °C and a voltage of 6 V/cm for 14 hr. A lambda ladder PFG marker (New England Biolabs, Ipswich, MA) was used as a size marker to enable the band of PiraGV-K at ∼125 kb to be eluted selectively.
After the PFGE run, the edge of the gel, including the size marker, was cut off and placed in ethidium bromide staining buffer to mark the location of the 125 kb band of PiraGV-K. Subsequently, the eluted portion was placed into a dialysis bag to recover the PiraGV-K DNA using PFGE with a pulse time between 0.1 and 40 s and a voltage of 6 V/cm for 14 hr.

Construction and Characterization of PiraGV-K Fosmid Library
Randomly sheared PiraGV-K DNA was cloned into the Eco72I blunt-end site of the CopyControl pCC1FOS fosmid vector (Epicentre Biotechnologies, Madison, WI). The fosmids were packaged using ultra-high efficiency MaxPlax lambda packaging extracts and plated on TransforMax EPI300 E. coli (Epicentre Biotechnologies, Madison, WI). The quality of the constructed fosmid library was assessed using standard techniques. Of a total of 6,000 clones, 96 were picked randomly and the fosmids were end-sequenced from both directions using primers to the vector (forward sequencing primer 5′-GGATGTGCTGCAAGGCGATTAAGTTGG-3′ and reverse sequencing primer 5′-CTCGTATGTTGTGGAATTGTGAGC-3′). Stand-alone BLAST was performed for the nucleotide sequences against a locally curated viral sequence database (http://edunabi.com/~prgv/).

Whole Genome Shotgun Sequencing
Based on the mapping data in the locally curated viral sequence database (http://edunabi.com/~prgv/), a minimum tiling path was prepared and four fosmid library clones were selected to construct a shotgun library. The selected fosmid clones were named NB-FOS-1-1-F40_A05A02 (27 kb), NB-FOS-1-1-F40_A23B06 (33 kb), NB-FOS-1-1-F40_C07D02 (32 kb) and NB-FOS-1-1-F40_E13E04 (37 kb). Equivalent volumes of fosmid DNA clones were digested with NotI to obtain 3-7 kb DNA pieces that were then ligated into a purified pUC118 BamHI/BAP ready vector (Takara Bio Inc., Shiga, Japan) [13]. Ligated products were transformed into E. coli DH5α cells by electroporation and spread on LB (ampicillin, 100 µg/ml) plates. The quality of the library was checked for E. coli genomic DNA contamination and empty vector contamination by cross-match. Plasmid clones amounting to approximately eight times the size of each selected fosmid clone were randomly picked for plasmid preparation and sequenced with M13 forward and reverse universal primers on an Applied Biosystems 3730 XL DNA analyzer (Applied Biosystems, Carlsbad, CA) using the cycle sequencing method with fluorescent dye terminators and AmpliTaq DNA polymerase (ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction, Perkin Elmer, Waltham, MA). Applied Biosystems sequencing software was used for lane tracking and trace extraction, and the data were transferred to UNIX workstations for further processing.
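As a rough illustration of how a minimum tiling path can be chosen from end-sequence mapping data, the following Python sketch applies a greedy interval-cover strategy. The clone names and coordinates are hypothetical placeholders, the circular genome is treated as linear for simplicity, and this is not necessarily the procedure that was used in this study.

```python
def minimum_tiling_path(clones, start, end):
    """Greedy interval cover: choose few clones whose spans cover [start, end].

    clones: list of (name, clone_start, clone_end), inclusive coordinates.
    Returns the chosen clone names in order; raises ValueError on a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = start - 1  # last position covered so far
    i = 0
    while covered_to < end:
        best = None
        # Among clones starting at or before the first uncovered base,
        # keep the one that reaches farthest.
        while i < len(ordered) and ordered[i][1] <= covered_to + 1:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"no clone covers position {covered_to + 1}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical mapped clones (names and coordinates are illustrative only).
example_clones = [
    ("cloneA", 1, 27000),
    ("cloneB", 20000, 53000),
    ("cloneC", 25000, 40000),
    ("cloneD", 50000, 82000),
    ("cloneE", 75000, 108658),
]
example_path = minimum_tiling_path(example_clones, 1, 108658)
```

In this toy input, cloneC is skipped because cloneB starts within the covered region and extends farther, which is the kind of redundancy a tiling path is meant to remove.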

Genomic DNA Assembly
Contigs were prepared using the software Pregap4, including PHRED [14,15], PHRAP (www.phrap.org), and vector masking on the average read length, clustering and assembling a repeated sequence. The primer walking procedure was used to close remaining gaps. The map of the first clone selected from PiraGV-K was constructed, and a clone capable of covering the region from 60 kb to 85 kb was also screened.

Sequence Analysis
Putative coding regions of the PiraGV-K genome were predicted using the GeneMark [16], Glimmer [17] and AMIgene [18] open reading frame (ORF) finding software. ORFs of more than 150 bp were designated as putative genes; the overlap between any two ORFs was set to a maximum of 25 amino acids (aa); otherwise, the longer one was selected. Gene annotations and comparison of the sequences with those in public databases were carried out using BLAST at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/BLAST/). Multiple sequence analysis was performed using Clustal X and GeneDoc (2.7.0). The PiraGV-K genomic DNA sequence was deposited in GenBank under the accession number JX968491.
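The length and overlap criteria above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: ORFs are given as hypothetical inclusive (start, end) coordinates, strand is ignored, and conflicts are resolved greedily from longest to shortest, which is one plausible reading of "the longer one was selected" rather than the exact procedure used here.

```python
MIN_ORF_BP = 150          # minimum ORF length stated in the text
MAX_OVERLAP_BP = 25 * 3   # 25 amino acids expressed in nucleotides


def orf_length(orf):
    """Length in bp of an inclusive (start, end) ORF."""
    return orf[1] - orf[0] + 1


def overlap_bp(a, b):
    """Number of positions shared by two inclusive (start, end) ORFs."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def filter_orfs(orfs):
    """Keep ORFs longer than MIN_ORF_BP; on excessive overlap keep the longer."""
    candidates = [o for o in orfs if orf_length(o) > MIN_ORF_BP]
    # Visit the longest ORFs first so the longer one wins any conflict.
    candidates.sort(key=orf_length, reverse=True)
    kept = []
    for orf in candidates:
        if all(overlap_bp(orf, k) <= MAX_OVERLAP_BP for k in kept):
            kept.append(orf)
    return sorted(kept)


# Hypothetical coordinates: the last ORF is too short, and (700, 900)
# overlaps the longer (720, 1100) by more than 25 aa.
example_orfs = [(1, 747), (700, 900), (720, 1100), (1200, 1300)]
example_kept = filter_orfs(example_orfs)
```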

Data Access
The whole-genome data of PiraGV-K and relevant sequence information are maintained in a database at http://edunabi.com/~prgv/ for ready reference. The PiraGV-K whole genome sequence is registered under GenBank accession number JX968491.

Results and Discussion
The Electrophoretic Separation Method for PiraGV-K Whole-genome Sequencing
Today, most genome sequencing projects rely on the whole-genome shotgun (WGS) method, which uses the Sanger technique to sequence genomic libraries over conventionally mapped clones using bacterial artificial chromosome (BAC), cosmid or fosmid libraries [24-26]. Although the WGS strategy has provided rapid access to new gene models from diverse organisms with continued improvements in the assemblers, read lengths and mate pair technologies, the resulting assemblies still remain highly fragmented with an incomplete genomic representation [27,28]. This has helped to focus attention on BAC-based physical map construction and its integration with high-density genetic maps that have benefited from next-generation sequencing (NGS) platforms and high-throughput array platforms [29,30]. In this context, fosmids, with a narrower insert range (average of 40 kb), stable maintenance, and easy production, have found wide applications in studies related to structural variation and the organization of genomes [30-32].
The selection of target substances from the environment is the most critical component for the implementation of suitable approaches for whole-genome sequencing. In the case of infectious viruses, the study of the genome is more cumbersome because these agents are difficult to culture and purify. Conventional methods for the purification of genomic DNA fragments have the drawback of requiring a large number of populations collected from multiple locations to acquire sufficient high-quality DNA samples for sequence analysis.
The genome sequencing method (Fig. 1), detailed here for the first time, was used to construct fosmid library clones of the double-stranded PiraGV-K genome, generating a library size of 100-150 kb corresponding to the genome size of the virus. This approach was successful in the analysis of the PiraGV-K genome, without the need for purifying PiraGV-K from P. rapae, thus simplifying sampling and reducing labor time. This approach provides a significant advantage over traditional protocols for the sequencing of dsDNA genomes and could potentially be used for circular DNA genomes of viruses, although its wider application needs to be further validated. Recently, a report highlighted the importance of sequencing small genomes without the need for standard library preparation using the Pacific Biosciences RS sequencer (the "PacBio") with as little as 1 ng of DNA [33]. That our method can be performed without the specialized expertise required for virus culturing and purification from their hosts, coupled with its requirement for little time and reliable precision, makes it particularly useful for laboratories lacking sophisticated viral culturing facilities. A limitation of sequencing genomes purified by the electrophoretic method may lie in its application to RNA viruses, because RNA is less stable than DNA in nature and such viruses may require the maintenance of cultured viral isolates, unlike our approach. A new system for rapid determination of viral RNA sequence (RDV) uses small amounts of RNA to synthesize first- and second-strand cDNAs for library construction and direct sequencing using optimized primers [34]. Although reverse transcription followed by polymerase chain reaction is commonly used for deciphering RNA viral genomes, low-copy number viral samples remain a challenge; sequence-independent methods provide attractive solutions [35,36].
In the method described here, HMW DNA embedded agarose plugs of P. rapae were digested with EcoRI, before confirmation of the potential PiraGV-K DNA at 125 kb by PFGE analysis (Fig. 2). The potential PiraGV-K DNA was found readily when EcoRI (8 U) and methylase (20 U) were used after a 2 hr pre-electrophoresis step. The partial digestion step is considered critical both for the construction of the host BAC library and for converting the viral genome into a family of circularly permuted linear molecules of genome length. The linear form of the viral genome, thus obtained from the digestion step, facilitates efficient separation of the genomic DNA in PFGE. Subsequently, PCR was conducted with different primers, designed to provide variable sizes from the nucleotide sequence of PiraGV-K, to check the validity of the potential PiraGV-K DNA. The PCR product size in all cases was found to be the same as expected for the PiraGV-K DNA sequence (Fig. 3). Subsequently, for effective separation of PiraGV-K DNA, pre-electrophoresis and partial digestion of agarose plugs were repeated with PFGE. Following the PFGE run, the DNA band of 125 kb corresponding to PiraGV-K DNA was eluted, eventually separating PiraGV-K DNA from P. rapae embedded agarose molds (Fig. 4A). The eluted DNA (20 ng) was subsequently electrophoresed in parallel with a 1 kb ladder to validate the separation process (Fig. 4B). The eluted and end-repaired PiraGV-K DNA was ligated into the pCC1FOS vector and the purified products were checked for quality by titering. In total, approximately 6,000 clones resulted, out of which 96 were selected and end-sequenced. To effectively map the fosmid-end sequences, we performed a stand-alone BLAST against a locally constructed viral sequence database. Based on the mapping data from the databases, a minimum tiling path (MTP) was prepared, leading to the selection of four fosmid library clones for the construction of a PiraGV-K shotgun library.
The sizes of the four selected fosmid clones, (NB-FOS-1-1-F40_C07D02, NB-FOS-1-1-F40_E13E04, NB-FOS-1-1-F40_A05A02, and NB-FOS-1-1-

Characteristics of the PiraGV-K Genome Sequence
To date, whole-genome sequencing has been conducted successfully for 60 baculoviruses: 45 were NPVs (41 alphabaculoviruses, 3 gammabaculoviruses and 1 deltabaculovirus). Only 14 complete genomes of betabaculoviruses have been sequenced, including PiraGV-C [12]. The growing number of fully sequenced baculovirus genomes now allows some understanding of the evolutionary history of baculoviruses by comprehensive analyses of nucleotide/protein sequences, gene order, and content [37,38]. We have sequenced and analyzed the 108,658 bp PiraGV-K genome purified by the electrophoretic method. The approach allows for the determination of the viral sequence with multiple-fold redundancy per base position. A sequence with 8× coverage of the PiraGV-K genome was compiled from the sequence data generated here. The size of the final draft sequence was 108,658 nt (Fig. 7). The length of the sequence obtained was consistent with the predicted size of PiraGV-C (108,592 nt), differing by only 66 nt. It can thus be categorized as one of the smaller betabaculovirus sequences, with AdorGV (99,657 nt) being the smallest. XecnGV has a whole genome size of 178,733 nt [5], which is the largest genome among the completely sequenced betabaculoviruses and is closely related to sequences studied from noctuid moths, including Autographa gamma GV, Hoplodrina ambigua GV, Euxoa ochrogaster GV, and Scotogramma trifolii GV [39]. This is closely followed by HearGV, with a genome size of 169,794 bp [40]. PiraGV-K coding sequences represent 92% of the genome, leaving very little noncoding DNA.
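The size comparison and the meaning of 8-fold redundancy can be checked with a short Python sketch. The genome sizes and coverage are taken from the text; the mean read length of 700 nt is an assumed, hypothetical value, and the Poisson (Lander-Waterman style) model of uncovered bases is a textbook approximation that was not part of the analysis reported here.

```python
import math

PIRAGV_K_NT = 108_658  # genome size reported in the text
PIRAGV_C_NT = 108_592  # PiraGV-C genome size cited in the text
COVERAGE = 8           # fold redundancy reported in the text
ASSUMED_READ_NT = 700  # hypothetical mean read length (assumption)

# Difference in genome length between the two isolates.
size_difference = PIRAGV_K_NT - PIRAGV_C_NT


def expected_uncovered(genome_nt, coverage):
    """Expected number of bases hit by no read under a Poisson model."""
    return genome_nt * math.exp(-coverage)


def reads_needed(genome_nt, coverage, read_nt):
    """Reads of a given mean length needed to reach the target coverage."""
    return math.ceil(genome_nt * coverage / read_nt)


uncovered = expected_uncovered(PIRAGV_K_NT, COVERAGE)
n_reads = reads_needed(PIRAGV_K_NT, COVERAGE, ASSUMED_READ_NT)
```

Under this idealized model only a few tens of bases would be expected to remain unsampled at 8-fold coverage; real shotgun libraries deviate from random sampling, so this is an order-of-magnitude guide only.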
The PiraGV-K genome has an AT content of 66%, identical to PiraGV-C (66%), and close to that of CrleGV, which has the highest known AT content of 67.6%. This result is consistent with previous findings that the sequenced betabaculovirus genomes are AT-rich, with the lowest AT content of 54.8% observed in the case of CypoGV, and an overall average of 62.6%. The difference in AT content is due to the base composition at the third nucleotide position of the codon in the coding regions. It has been established previously that proteins encoded by more extreme AT- and GC-rich genomes generally have lower compositional complexity than those of more typical organisms [41]. A consequence of this is that the overall amino acid composition of the peptides in such organisms is skewed. Peptides of AT-rich organisms have higher proportions of Phe, Leu, Ile, Met, Asn, Lys and Tyr, which are relatively rare in organisms with GC-rich genomes. A similar correlation has been noted with smaller data sets in earlier research [19,42,43]. The end result of this is that organisms with an extreme genome composition encode peptides of a lower complexity, as measured by the global complexity value [44]. It is known that the median global complexity value, G1, for AT-rich genes from a variety of cellular organisms is in the range of 0.72 to 0.78 [41]. Whereas most PiraGV-K ORFs had an AT composition (average 65%) close to the average AT composition of the PiraGV-K genome (66%), granulin had an AT composition that was significantly lower at 56% (results not shown). It is to be noted that in the case of extremely anchored proteins, such as granulin, it might be impossible for the virus to maintain its preferred nucleotide composition and codon usage and still encode a particular peptide. This observation has been confirmed in other annotated, AT-rich, viral genomes [19,45]. Also, it is understood that, although various ORF prediction methods have been used (Fig. 8), no one method can define all possible ORFs in compositionally extreme (AT- or GC-rich) genomes, as is clearly illustrated in the PiraGV-K genome. PiraGV-K granulin had a subjective appearance of an "alien" gene, because the codon usage did not conform to the overall codon usage [46]. However, we believe that granulin represents a specific class of highly expressed, complex peptide that the virus encodes by sacrificing the constraints it maintains on other genes.
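For readers who wish to reproduce this kind of composition measure, the following Python sketch computes the AT fraction of a sequence and the AT fraction at third codon positions of an in-frame coding sequence. The function names and the toy sequence are illustrative assumptions and do not come from the PiraGV-K annotation pipeline.

```python
def at_content(seq):
    """Fraction of A/T among the unambiguous bases (A, C, G, T) of a DNA string."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc
    return at / total if total else 0.0


def at3_content(cds):
    """AT fraction at third codon positions of an in-frame coding sequence."""
    # Slicing from index 2 with step 3 picks the third base of every codon.
    return at_content(cds[2::3])


# Toy coding sequence (hypothetical): ATG AAA TTT GGC TAA
toy_cds = "ATGAAATTTGGCTAA"
toy_at = at_content(toy_cds)
toy_at3 = at3_content(toy_cds)
```

Comparing the two values gene by gene is one simple way to see whether a gene such as granulin departs from the genome-wide composition.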
The primary criteria used to identify potential ORFs on the PiraGV-K genome were a minimum of 50 aa in length, having minimal overlap with larger ORFs, and sharing significant sequence identity with previously characterized ORFs of betabaculoviruses (Table 1). Also, by convention, the first nucleotide of the methionine start codon of granulin was defined as the first nucleotide of the genome, and the sequence was numbered in the direction of transcription of the gene. As in the case of other baculovirus genomes, minimal overlaps were observed in the PiraGV-K genome sequence, with 65 ORFs in the granulin-sense orientation and 54 in the opposite orientation, clustering together according to expression or function. Homologous repeat regions (hrs), functioning as enhancers of transcription and origins of replication, were also found interspersed in the genome. These repeated sequences have been reported to be more variable in betabaculoviruses than in alphabaculoviruses, where they consist of repeated palindromes. The CypoGV genome includes 13 hrs, as do the XecnGV and HearGV genomes. The AdorGV genome includes nine repeated regions that are unlike typical hrs [19]. Six repeat regions, including one unique hrs, have also been identified in the EppoMNPV genome [47]. In the completely sequenced genome of SpltNPV, 17 hrs were identified [48]. In AcMNPV, hrs consist of repeated units of about 70 bp with an imperfect 30 bp palindrome near their center, binding to the transcriptional activator ie1 (Ac147) [49]. Also, cAMP and 12-O-tetradecanoylphorbol 13-acetate (TPA) response element (CRE and TRE)-like sequences, located between hrs palindromes, have been found to be evolutionarily conserved in alphabaculoviruses, but were not found in betabaculoviruses.
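The numbering convention described above, in which position 1 is the first nucleotide of the granulin start codon and numbering follows the direction of transcription, amounts to rotating (and, if needed, reverse-complementing) the circular sequence. A minimal Python sketch follows; the function name, arguments and toy sequences are hypothetical and serve only to illustrate the convention.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def renumber_circular(genome, start_pos, forward=True):
    """Rotate a circular genome so that position 1 is a chosen start codon.

    start_pos: 1-based position, on the strand as given, of the base that
    pairs with (or is) the A of the ATG start codon.
    forward: False if the gene is encoded on the reverse strand, in which
    case the genome is reverse-complemented before rotation.
    """
    n = len(genome)
    i = start_pos - 1  # 0-based index on the strand as given
    if not forward:
        genome = genome.translate(COMPLEMENT)[::-1]
        i = n - 1 - i  # the same base pair, indexed on the other strand
    return genome[i:] + genome[:i]


# Toy examples (hypothetical sequences, not PiraGV-K data).
rotated_fwd = renumber_circular("CCGGATGAAATTT", 5)
rotated_rev = renumber_circular("GGCATTTAA", 5, forward=False)
```

Both toy results begin with ATG, so downstream ORF coordinates can be reported relative to the granulin start codon.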
Genomic sequence identity of PiraGV-K was studied against other known betabaculovirus genomes, with a maximum identity of 99% with PiraGV-C (Table 2). The 1% difference was thought to be related to the presence of extra nucleotides in the intronic sequences of the PiraGV-K genome that did not correspond to any known ORF. The identity with other genomes was in the order of 42-58%, with greater identity with ChocGV (58.5%), CrleGV (55.78%) and CypoGV (55.6%) genome sequences. Of a total of 120 ORFs, only ORFs 9, 32, and 117 were considered unique to the PiraGV genome sequences of the Korean and Chinese strains. This represents 1.7% of the whole genome sequence. Also, the 78 ORFs found in all betabaculovirus sequences studied have been called "core GV genes". Based on gene function, PiraGV-K ORFs have been grouped into four functional categories (Table 3): transcription (10 genes), replication (11 genes), structural (25 genes), and auxiliary (15 genes), with 59 unrepresented in the annotation. The most conserved among the core set of genes was granulin, with 100% identity with PiraGV-C. We compared the identified PiraGV-C ODV-associated proteins [11] with the structural proteins found in PiraGV-K and found that the ORFs complemented and matched each other. PiraGV-K ORF 98 encoded an inhibitor of apoptosis (iap-5) that seems to be betabaculovirus specific [21]. Also, PiraGV-K ORF 37 (homologous to Cypo46, Xecn40, and Plxy35) is likely a member of the stromelysin family within the matrix metalloproteinase (MMP) superfamily. It has been observed that this peptide is retained within infected cells until death, and subsequently is released into the body of the insect, causing proteolysis of tissues [4,50]. The most conserved baculovirus gene is polyhedrin/granulin, the major component of occlusion bodies. Another conserved PiraGV-K structural gene was odv-e25 (PiraGV-K, ORF 76), showing 80% amino acid identity to betabaculovirus homologs.
In contrast, p24 capsid (PiraGV-K, ORF 58), which encodes a protein associated with both ODV and BV [51], was found to be poorly conserved (60% average amino acid identity to other betabaculoviruses). The p80/p87-capsid gene was absent from the PiraGV-K genome, as with other betabaculovirus genomes. The putative p10 (PiraGV-K, ORF 17) gene showed similarities to three XecnGV ORFs (Xecn ORF 5, Xecn ORF 19, and Xecn ORF 83). Homologs of these three ORFs were found in PlxyGV (Plxy ORF 2, Plxy ORF 21, and Plxy ORF 50) and they were thus suggested to be p10 homologs [4]. p10 is implicated in occlusion body morphogenesis and disintegration of the nuclear body matrix, resulting in dissemination of OBs [52]. In NPV-infected cells, p10 forms fibrillar structures in the nucleus and cytoplasm. PiraGV-K ORF 17 showed a notably low identity of 14% with AcMNPV p10, and was smaller than its counterpart (104 vs 336 amino acids). Relative to other betabaculoviruses, a high sequence identity of 48% was noted with ClanGV p10, which has 101 amino acid residues.
The PiraGV-K genome did not encode the glycoprotein gp64 that constitutes a major envelope fusion protein in AcMNPV, BmNPV, OpMNPV, and EppoMNPV [53,54]. This protein thus appears to be unique to group I NPVs [55,56]. Also, 19 lef genes have been found in AcMNPV genomes, and have been implicated in DNA replication and transcription [57]. Early baculovirus genes are transcribed by the host cell RNA polymerase II, but these are often transactivated by genes such as ie-0, ie-1, ie-2, and pe38 [58]. Of these early baculovirus genes, the PiraGV-K genome contained only ie-1 and it was found to be poorly conserved in comparison with other betabaculovirus genomes, except PiraGV-C. These genes have previously been reported to be poorly conserved among baculoviruses. The CypoGV and PhopGV genomes have been reported to have a pe38, consistent with PiraGV-K genome [21].
Six genes have been described as essential for baculovirus DNA replication: lef-1, lef-2, lef-3, dnapol, helicase and ie-1 [59]. Homologs for all these necessary genes were found in the whole-genome of PiraGV-K with moderately conserved sequences. A PiraGV-K genome-wide scan suggested the absence of a lef-7 homolog. Earlier reports suggested that lef-7 was a group I NPV-specific gene, and stimulated transient DNA replication in AcMNPV and BmNPV [60,61]. The PiraGV-K ORFs also encode a DNA ligase (PiraGV-K ORF 102) and a helicase-2 (PiraGV-K ORF 108), in common with LdMNPV and other betabaculovirus genomes. The LdMNPV DNA ligase displays catalytic properties of a type-III DNA ligase [62]. Because the homologs of helicase-2 and DNA ligase are involved in DNA repair and recombination [63], the PiraGV-K genes likely have similar functions. The PiraGV-K genome lacks large (rr1) and small (rr2) subunits of ribonucleotide reductase and deoxyuridyltriphosphate (dUTPase) genes, that may account for the loss of enzymatic functions during facilitation of virus replication in non-dividing cells, where dNTP pathways are inactive. The lack of these genes has also been noted in alphabaculoviruses, such as AcMNPV, BmNPV, HaSNPV, HzSNPV, and EppoMNPV and other betabaculoviruses, such as PlxyGV and XecnGV [4,5,22,51,63,64]. Late transcription genes, including lef 4-6, 8-11, 39K, p47, and vlf-1 [65] have been found among the PiraGV-K ORFs, except a lef-10 homolog. The most conserved PiraGV-K lef homolog was lef-8, while lef-6 was the most poorly conserved. It has been understood that the GV lef-6 genes are smaller than the NPV lef-6 genes (86-102 amino acids vs 138-187 amino acids) and were reported in the XecnGV genome [5].
Chitinase [66] and cathepsin were present as auxiliary genes in the PiraGV-K genome. These genes have been identified in almost all the baculoviruses completely sequenced to date, except PlxyGV [4] and AdorGV [19]. The protein products encoded by these genes provide selective advantages in the breakdown of insect tissues at the end of infection and the release of OBs to the environment, which then spread horizontally [67]. The lack of the same in the cases of the PlxyGV and AdorGV genomes may account for the infected larvae not lysing at the end of infection; this may lead to the spread of viral infection by discharging large amounts of virus from their posterior ends. PiraGV-K ORF 50 corresponded to superoxide dismutase (sod), a well-conserved gene in baculoviruses. Among the betabaculoviruses, it was not reported in the SpliGV genome, although it is known in other betabaculoviruses. Although, SOD functions as an endogenous antioxidant, its proper function in baculoviruses remains unknown. Gene   deletion studies conducted in AcMNPV did not show any deleterious effect [68], although it may be predicted that SOD may protect OBs from superoxide radicals generated by exposure to sunlight in the environment. PiraGV-K ORF 45 corresponded to a ubiquitin protein, which have been identified in all baculoviruses sequenced to date, although it was found fused to gp37 as a single ORF in SpltMNPV [48]. Apart from polyhedrin and granulin [69], it is also one of the most highly conserved genes in the baculovirus genome, with 73% average amino acid identity to betabaculovirus homologs. Interestingly, the homolog of viral ubiquitin has not been reported in AcMNPV-ODV or HearNPV-ODV, but is known in AcMNPV-BV [70]. Per os infectivity factors (pif), another highly conserved gene, involved in oral infectivity of baculovirus ODV, has been characterized from almost all baculovirus genomes sequenced so far. 
We identified ORF 61, corresponding to pif-1, and ORF 16, corresponding to ODV-E56, also known as pif-5 [71], in the PiraGV-K genome. Although pif-1 and p74 (ORF 51 in the PiraGV-K genome) have been proposed to form structural components of the ODV envelope and may regulate the infectivity of OBs, pif-5 is not essential for binding and fusion of ODV or for virus replication [72,73]. Additionally, the PiraGV-K genome was found to contain three putative fibroblast growth factor (fgf) genes, represented by ORFs 62, 105, and 118. These fgfs contained the fgf superfamily domains, as determined by a conserved domain search with the BLAST program. No enhancin homolog was found in the PiraGV-K genome, consistent with its absence from the AdorGV, CypoGV, and PlxyGV genomes. In contrast to these betabaculovirus genomes, four enhancin homologs were reported in XecnGV, two in LdMNPV, and one in MacoNPV. Enhancin disrupts the insect peritrophic membrane and thereby facilitates the initiation of infection [74]. PiraGV-K ORF 13 corresponded to the gp37 homolog (spindlin, which acts as an enhancing factor). This ORF was shown to be absent from the AdorGV, AgseGV, ChocGV, CrleGV, PhopGV, PlxyGV, and SpliGV genomes, although it was reported in the CypoGV, HearGV, PsunGV, XecnGV, and PiraGV-C genomes.
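The enhancin comparison above amounts to a small presence/absence table. The following is a minimal Python sketch of how such a table could be tabulated; it is an illustration only, not part of the original analysis. The copy numbers are those stated in the text, and the names `ENHANCIN_COPIES`, `genomes_lacking`, and `genomes_with` are hypothetical.

```python
# Enhancin homolog counts as stated in the text (illustrative table only;
# the dictionary and function names are hypothetical).
ENHANCIN_COPIES = {
    "PiraGV-K": 0,
    "AdorGV": 0,
    "CypoGV": 0,
    "PlxyGV": 0,
    "XecnGV": 4,
    "LdMNPV": 2,
    "MacoNPV": 1,
}


def genomes_lacking(copies):
    """Return the sorted names of genomes with no homolog of the gene."""
    return sorted(name for name, n in copies.items() if n == 0)


def genomes_with(copies):
    """Return (name, count) pairs for genomes with a homolog, most copies first."""
    present = [(name, n) for name, n in copies.items() if n > 0]
    return sorted(present, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("No enhancin:", ", ".join(genomes_lacking(ENHANCIN_COPIES)))
    for name, n in genomes_with(ENHANCIN_COPIES):
        print(f"{name}: {n} enhancin homolog(s)")
```

The same structure could hold any of the other genes compared in this section (for example gp37 or ctl), provided the counts are filled in from the cited genome reports.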
Furthermore, PiraGV-K was found to lack a conotoxin-like (ctl) homolog, as has also been reported for the BmNPV, SeMNPV, HaSNPV, AdorGV, CypoGV, and PlxyGV genomes, although a ctl homolog has been identified in the genome of XecnGV. This ORF contains a six-cysteine motif similar to that in chitin-binding proteins [75]. A gene encoding protein kinase 1 (pk-1; PiraGV-K ORF 3) was also identified in the PiraGV-K genome; it may be involved in regulating the phosphorylation status of viral and host proteins during infection. Two members of the iap gene family, iap (PiraGV-K ORF 79) and iap-5 (PiraGV-K ORF 98), were also identified in the PiraGV-K genome. Although p35, which has antiapoptotic activity, has been identified previously in the AcMNPV, BmNPV, and SpltMNPV genomes, it is absent from betabaculovirus genomes. The iap homologs generally contain two baculovirus IAP repeats (BIR) [76], which are associated with binding to apoptosis-inducing proteins [77], and a C-terminal zinc finger-like (RING) Cys/His motif [78]. The iap-5 gene appears to be GV-specific, and all betabaculoviruses sequenced to date carry it. PiraGV-K ORF 94 is a homolog of PlxyGV ORF 94, named desmoplakin because it shows similarity to an internal region of human desmoplakin, an essential constituent of intercellular junctions [4]. Baculovirus repeated ORFs (bro) were not found in the PiraGV-K genome, although truncated versions have been observed in CpGV [21]. These genes are conspicuous in many baculoviruses (between 1 and 16 copies); their function is unclear, although they may bind DNA.
Two uncharacterized ORFs, PiraGV-K ORF 9 and PiraGV-K ORF 117, were also identified in the genome sequences of both PiraGV-K and PiraGV-C.

Conclusions
There has been a significant increase in the number of whole-genome sequencing projects using the shotgun method, but traditional mapped-clone methods using BAC, cosmid, and fosmid libraries remain an important intermediate layer for hybrid sequencing strategies. With a view towards advancing whole-genome sequencing strategies for infectious viruses, we adopted a method for constructing a fosmid library from virus mixed with the infected host and then screening only the viral genomic library. The method overcomes the often-difficult need to culture and purify viruses by traditional methods of genome analysis, and starting material is easier to obtain than when starting with the purification of virus particles from inclusion bodies. The viral DNA is recovered in amounts sufficient for classical genome sequencing, without recourse to automated high-throughput NGS technology. Thus, the analysis of the genome of PiraGV-K by the novel method of electrophoretic separation provides a significant advance towards the analysis of other infectious viruses.

The ClanGV (GenBank ID: HQ116624) and EpapGV (GenBank ID: JN408834) sequences were not included in the genome characterization of PiraGV-K because they were published after the present work was completed. doi:10.1371/journal.pone.0084183.t002

Table 3. PiraGV-K genes grouped according to function.