Genetic Characterization of a Novel Iflavirus Associated with Vomiting Disease in the Chinese Oak Silkmoth Antheraea pernyi

Larvae of the Chinese oak silkmoth (Antheraea pernyi) are often affected by AVD (A. pernyi vomiting disease), whose causative agent has long been suspected to be a virus. In an unrelated project we discovered a novel positive sense single-stranded RNA virus that could reproduce AVD symptoms upon injection into healthy A. pernyi larvae. The genome of this virus is 10,163 nucleotides long, has a natural poly-A tail, and contains a single, large open reading frame flanked at the 5′ and 3′ ends by untranslated regions containing putative structural elements for replication and translation of the virus genome. The open reading frame is predicted to encode a 3036 amino acid polyprotein with four viral structural proteins (VP1-VP4) located in the N-terminal end and the non-structural proteins, including a helicase, RNA-dependent RNA polymerase and 3C-protease, located in the C-terminal end of the polyprotein. Putative 3C-protease and autolytic cleavage sites were identified for processing the polyprotein into functional units. The genome organization, amino acid sequence and phylogenetic analyses suggest that the virus is a novel species of the genus Iflavirus, with the proposed name of Antheraea pernyi Iflavirus (ApIV).


Introduction
Antheraea pernyi Vomit Disease (AVD) is a common disease of the Chinese oak silkmoth, Antheraea pernyi [1]. AVD is widely distributed in the cold mountainous regions in the north of China, such as the Liao Ning, Ji Lin and Hei Longjiang provinces, but is also found in the much warmer He Nan province in the south. It occurs mainly in the 5th instar, especially close to the cocooning period. Infected larvae become sluggish and their pygidia turn black. The most typical symptom of early AVD is a white liquid vomited up from the midgut. Infected larvae lose the ability to hold on to twigs and drop to the ground or remain dangling from the branches by their posterior end (Fig. 1). As the disease progresses, the larvae stop eating, resulting in shortened bodies that turn dark after death as decay sets in. Since this decay does not turn cankerous or fishy, as happens when the larvae are infected by bacteria, the causative agent of the disease was suspected to be a virus as early as 1986 [1].
In an unrelated transcriptome analysis of A. pernyi larvae, a considerable number of cDNA sequences were obtained, assembling into a single contig of about 3 kb, with high similarity to the RNA-dependent RNA polymerase (RdRp) of the Iflaviruses. Using an RT-PCR assay based on this contig, a small set of AVD-affected individuals were found to contain Iflavirus RNA, which triggered the present investigation.
The Iflaviridae is a family of Group IV, positive-sense single-stranded RNA insect-infecting viruses within the order Picornavirales, containing a single genus, Iflavirus [2]. Iflavirus particles contain a single-stranded RNA genome of positive polarity that encodes a single, large polyprotein, which is post-translationally processed into viral proteins essential for its replication, packaging and transmission [3]. The 5′UTR includes an internal ribosome entry site (IRES) structure needed for cap-independent translation [4][5][6]. Downstream of the 5′UTR is a single large open reading frame (ORF) that encodes both structural (5′ terminus) and non-structural (3′ terminus) proteins. The ORF is followed by a 3′UTR, which is followed by a poly(A) tail [3].
Here, we report the nucleotide sequence, genomic organization and phylogenetic placement of this novel Iflavirus, its link to AVD, and its geographic and life-stage distribution in A. pernyi.

Sample origins
All the A. pernyi samples were collected with authorization from the Chinese Academy of Agricultural Sciences. The study was approved by the Ethics Committee of Dalian University of Technology.
The samples included in the study were collected in October 2012 from three provinces in the People's Republic of China (Liao Ning, Ji Lin and He Nan; Figure 2). He Nan is located in the middle of China. It has a distinct seasonal climate characterized by hot, humid summers and generally cool to cold, dry winters. Temperatures average around the freezing mark in January and 27 to 28 °C in July. A great majority of the annual rainfall occurs during the summer. There are 240 frost-free days annually. Liao Ning and Ji Lin are in the east of China. The annual average temperature in Liao Ning is about 9 °C. January is the coldest month with the lowest temperature being −11 °C, while the highest temperature in July is 24 °C. Ji Lin has a northerly continental monsoon climate, with long, cold winters and short, warm summers. The January mean temperature is −17.3 °C, and the July mean temperature is 22.8 °C.
A total of 144 A. pernyi samples (80 eggs from 12 moths, 12 larvae, 40 chrysalises and 12 adult moths) were collected. The samples from the Liao Ning province were kept in a rearing chamber at 25±3 °C with 70±5% relative humidity, with fresh Chinese oak leaves for feeding the larvae (adult moths have a lifespan of a few days and do not eat). The samples from the Ji Lin and He Nan provinces were frozen after collection and stored at −80 °C until processing.

Discovery, cloning and sequencing
The cDNA from an unrelated transcriptome project was examined by Illumina sequencing, generating 10,588 contigs, of which one 3 kb contig aligned to iflaviruses. This contig allowed us to design the initial primers for the cloning and sequencing of the RdRp region of the new iflavirus. The Iflavirus genome is naturally poly-adenylated, such that the Illumina sequences comprising the initial 3 kb contig were all located towards the 3′ end of the virus genome. The remainder of the genome was determined through primer-walking, using a series of forward and reverse primers (Figure S1). The 5′ and 3′ ends were determined by rapid amplification of cDNA ends (RACE) using a FirstChoice RLM-RACE Kit (Invitrogen) and two sets of specific primers. All amplified fragments were purified, cloned into the pMD19-T vector (Takara) and sequenced using Sanger sequencing technology.

Sequence and phylogenetic analyses
The nucleotide and deduced amino acid sequences of the virus genome were scanned for functional domains and putative proteolytic processing sites. The secondary structure of the 5′UTR was predicted using the MFOLD program [7] and rendered visually using the RnaViz 2.0 [8] and ViennaRNA [9] software packages.
The amino acid sequence was aligned to homologous sequences from other Iflaviruses, as well as representative Dicistroviruses and Picornaviruses, using the DNAMAN program. Sections of this alignment surrounding the conserved domains of the helicase (Hel), 3C-protease (3C-Pro) and the RNA-dependent RNA polymerase (RdRp) were isolated for use in phylogenetic analysis, using Maximum Likelihood criteria as implemented by MEGA5 [10]. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Statistical support for the partitions was determined through bootstrap analyses involving 500 replicates. The full names and GenBank accession numbers of the viruses used in the phylogeny are found in Table S1. The 420 bp nucleotide sequences of the Helicase region and 2129 bp sequences of the capsid region from 9 isolates from the three provinces were aligned and scanned for variants, to obtain a measure of the natural geographic variability of the virus and the reliability of the diagnostic RT-PCR assays.

ApIV propagation, purification and verification
Whole bodies of naturally AVD-affected larvae were dissected and homogenized by grinding in PBS (2.5 ml per 1 g of tissue). The homogenized extract was injected into healthy pupae (50 µl/pupa), which were incubated as described above. Three days post-injection, the hemolymph of infected pupae was collected and filtered through a fine-mesh nylon cloth to remove debris. The filtered extract was layered on top of a 25%:56% discontinuous sucrose gradient made in 1× PBS and centrifuged at 45,000 × g for 2 hours at 4 °C. The virus-containing fraction was collected with a needle at the interface between the 25% and 56% sucrose layers. The presence of ApIV RNA in the purified fraction was confirmed by RT-PCR. The presence of Iflavirus-like particles in the purified fraction was determined by Transmission Electron Microscopy (TEM), using a JEM-1200EX transmission electron microscope.
Purified ApIV was also injected into 5th instar larvae. ApIV was diluted in sterile PBS and the dose was 5 µg/larva (estimated at about 4 × 10^11 copies per larva). A mock-injected control group was injected with sterile PBS.

RNA extraction and cDNA synthesis
Geographic field isolates of AVD-symptomatic and asymptomatic larvae; eggs, pupae (chrysalis), adults and adult integument; as well as dissected chrysalis tissues, were homogenized by grinding the tissues in liquid nitrogen. Total RNA was extracted from 100 mg of each sample using RNAzol (Takara) according to the manufacturer's instructions. Each RNA sample was eluted in 30 µL of RNase-free water. The nucleic acid concentration and purity were determined by spectrophotometry. Total RNA (1 µg) was reverse transcribed to cDNA using oligo-dT primers and the PrimeScript RT-PCR Kit (Takara).

Virus detection by RT-PCR
Two virus-specific RT-PCR assays were designed, based on primers located in the Helicase and RdRp domains (Figure S1). Amplifications were carried out in 20 µL total reaction volume using Ex-Taq

Genome analysis
The new virus genome is 10,163 nucleotides long, has a natural poly(A) tail at its 3′ end and contains a single large (9,108 nucleotide) open reading frame (ORF), flanked on the 5′ and 3′ sides by untranslated regions (UTR) of 883 nucleotides (5′UTR) and 172 nucleotides (3′UTR) in length, accounting for 10.3% of the genome. The genome is A/U rich (63.77%), which is consistent with that of other iflaviruses (e.g., KV 61.57%, DWV 61.26%, and VDV-1 61.41%) [11][12][13]. The presence of a poly(A) tail was confirmed by 3′ RACE-PCR amplification.
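The segment lengths quoted above can be checked against each other with a few lines of arithmetic. The following Python sketch is only an illustrative consistency check; the constant and function names are our own and are not part of the original analysis.

```python
# Segment lengths (nucleotides) as reported in the text.
UTR5_LEN = 883      # 5' untranslated region
ORF_LEN = 9108      # single open reading frame
UTR3_LEN = 172      # 3' untranslated region
GENOME_LEN = 10163  # genome length excluding the poly(A) tail


def genome_summary():
    """Return simple derived quantities from the reported segment lengths."""
    total = UTR5_LEN + ORF_LEN + UTR3_LEN          # should equal GENOME_LEN
    utr_fraction = (UTR5_LEN + UTR3_LEN) / GENOME_LEN
    codons, remainder = divmod(ORF_LEN, 3)         # ORF length in codons
    return {
        "total": total,
        "utr_fraction": utr_fraction,
        "codons": codons,
        "remainder": remainder,
    }


if __name__ == "__main__":
    summary = genome_summary()
    print(f"Sum of segments: {summary['total']} nt")
    print(f"UTR share of genome: {summary['utr_fraction']:.2%}")
    print(f"ORF codons: {summary['codons']} (remainder {summary['remainder']})")
```

The three segments sum to the reported genome length, and the 9,108 nt ORF divides evenly into 3,036 codons, matching the predicted polyprotein length.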
The ORF is predicted to encode a 3036 amino acid polyprotein. No other ORFs that could encode proteins larger than 70 amino acids were found on either strand, confirming that the virus has a positive-strand RNA genome. The polyprotein contains a number of conserved domains: two Picornavirus-like capsid protein domains were identified between amino acids 333-562 and 657-851 of the ApIV polyprotein, corresponding to VP3 and VP1, respectively, as well as a cricket paralysis virus (CrPV) capsid protein-like domain [14] between amino acids 1057 and 1291, corresponding to VP2 (Figure 3A). The boundaries of these domains correspond closely to the putative 3C-protease and autocatalytic cleavage sites (Figure 3A) that process the structural precursor protein into functional units during virus assembly.
An RNA helicase was identified between residues 1602 and 1753 (Figure 3A), including the highly conserved Hel-A motif (GxxExGKS, residues 1616-1623), which has been suggested to be responsible for nucleotide binding [15], but with E1619 substituting for the more common glycine (G) residue. The other two helicase motifs, Hel-B (Qx5DD) and Hel-C (KGx4Sx5STN), are the same as in the other iflaviruses.
The 5′UTR of ApIV is 883 nt long and contains a number of stable secondary structures, clustered around 300 nt prior to the start of the ORF (Fig. 3B), including five hairpin structures (I-III; V-VI) and one Y-shaped structure (IV). They are similar to the IRES-related structures described for DWV/VDV-1 [5] in number, overall shape, distance relative to each other and distance from the ORF, and may well serve a similar function. Although the 5′UTR of ApIV has only 32% nucleotide identity with the 5′UTRs of VDV-1 and DWV, its closest relatives, there is considerable conservation of their predicted structural elements, supporting a functional role for these structures.
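Motif shorthand such as GxxExGKS, Qx5DD and KGx4Sx5STN translates directly into regular expressions, where "x" is any residue and "x5" is five of them. The sketch below is a minimal illustration using Python's standard `re` module; the function name and the toy sequence are hypothetical and are not taken from the ApIV polyprotein. The Hel-A pattern accepts either G or E at the fourth position, to cover both the common form and the ApIV variant.

```python
import re

# Helicase motifs written as regular expressions ("." = any residue).
HELICASE_MOTIFS = {
    "Hel-A": r"G..[GE].GKS",       # GxxGxGKS (common) or GxxExGKS (ApIV)
    "Hel-B": r"Q.{5}DD",           # Qx5DD
    "Hel-C": r"KG.{4}S.{5}STN",    # KGx4Sx5STN
}


def find_motifs(protein_seq):
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    hits = {}
    for name, pattern in HELICASE_MOTIFS.items():
        hits[name] = [
            (m.start() + 1, m.group())
            for m in re.finditer(pattern, protein_seq)
        ]
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, constructed only to contain one copy of
    # each motif; it is NOT a real viral sequence.
    toy = "MAGSLEAGKSTTTQAAAAADDLLKGAAAASAAAAASTNP"
    for motif, found in find_motifs(toy).items():
        print(motif, found)
```

Applied to a real polyprotein, the reported start positions would be directly comparable to the residue numbering used in the text.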

Phylogenetic analysis
The highly conserved motifs of the Helicase, 3C-protease and RdRp amino acid sequences from 19 viruses of the picorna-like superfamily were used for a phylogenetic analysis to evaluate the relationship of ApIV to other viruses. The RdRp has been used as a reliable protein to construct phylogenetic trees for classification of RNA viruses, as it tends to be highly conserved among RNA viruses [19,20]. The Helicase and 3C-protease domains were included to widen the coverage of the phylogeny to other regions of the genome. Only the core motifs of the various domains were included, to ascertain positional homology throughout the alignment. The RdRp tree segregated the viruses into two groups according to their taxonomic classification, Iflaviridae and Dicistroviridae (Figure 4). ApIV is most closely related to the DWV/VDV-1 species complex, but distinct enough to be considered a new species within the Iflaviridae, which we propose to name Antheraea pernyi iflavirus (ApIV).

Propagation, purification and infection of ApIV
The internal tissues of AVD-symptomatic larvae were homogenized and injected into healthy pupae, to propagate any viruses present. The hemolymph of these infected pupae was extracted three days post-infection and subjected to discontinuous sucrose gradient centrifugation. This yielded a fraction containing Iflavirus-like particles (Fig. 5) and large amounts of ApIV RNA, as determined by RT-qPCR. Healthy A. pernyi larvae injected with this fraction reproduced typical AVD symptoms within 3 days after infection (sluggish movement, reduced ability to clasp branches, and darkened pygidia and head), after which their bodies turned black and they died (Fig. 6). Mock-injected larvae had normal phenotypes.
Figure 4. The phylogeny is based on a concatenated 328 amino acid sequence combining the conserved domains of the helicase, protease and RdRp regions and was inferred by Maximum Likelihood using the JTT matrix-based model [39] as implemented by MEGA5 [40]. The tree with the highest log likelihood is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. The full names and GenBank accession numbers of the viruses used are shown in Table S1. doi:10.1371/journal.pone.0092107.g004
Across the life stages examined (Table 1), ApIV is most prevalent in larvae (3/12), followed by pupae (2/12) and adults (1/12), and is least prevalent in eggs (0/80). This could suggest that the ApIV infection is acquired during the larval stage.

Prevalence and genetic variability of ApIV from different geographical regions
To determine the prevalence of ApIV throughout the natural range of A. pernyi, 40 healthy chrysalises collected from the Liao Ning and Ji Lin provinces in the north of China and He Nan province further south were assessed for the presence of ApIV. ApIV was detected in 22% of the chrysalises with similar prevalence in the three provinces studied.
A 420 bp section of the ApIV Helicase region and a 2129 bp section of the ApIV structural protein region were amplified and sequenced from 9 ApIV-positive chrysalises from the 3 different provinces studied: 4 from Ji Lin, 2 from Liao Ning and 3 from He Nan. Comparison of the sequences revealed low genetic variability of ApIV from the different geographic areas. There are only three variable nucleotide sites in the 420 bp helicase region, none of which affect the amino acid sequence (Figure S3), and only 4 variable nucleotide sites in the 2129 bp structural protein region, 2 of which lead to amino acid changes: a Leucine-Serine change at amino acid 367 and a Valine-Isoleucine change at amino acid 376 (Figure S4). In all, the ApIV isolates investigated in this study showed more than 99% nucleotide identity for these genomic regions, across all three provinces.
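Counting variable sites in a set of aligned isolate sequences amounts to scanning each alignment column for more than one observed base. The following Python sketch illustrates the idea under simple assumptions (equal-length, gap-free, pre-aligned sequences); the function name and the short example sequences are hypothetical and do not reproduce the ApIV data or the software actually used.

```python
def variable_sites(aligned_seqs):
    """Return 1-based alignment columns where the isolates differ.

    Assumes all sequences are already aligned and of equal length.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")

    sites = []
    for col in range(length):
        # Collect the distinct bases observed in this column.
        bases = {seq[col].upper() for seq in aligned_seqs}
        if len(bases) > 1:
            sites.append(col + 1)
    return sites


if __name__ == "__main__":
    # Hypothetical toy alignment of three isolates (not real ApIV data).
    isolates = ["ACGTAC", "ACGTAC", "ACCTAT"]
    print(variable_sites(isolates))
```

For the toy alignment, columns 3 and 6 are reported as variable; the same scan over real 420 bp or 2129 bp alignments would yield the site counts discussed above.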

Discussion
In this study, we report the discovery of a putative iflavirus from larvae affected by A. pernyi vomit disease (AVD), which we propose to call Antheraea pernyi iflavirus (ApIV). The virus is monocistronic, with a single-stranded RNA genome of at least 10,163 nucleotides, excluding the poly(A) tail, and contains a single, large open reading frame encoding a 3,036 amino acid polyprotein with domains for both structural and non-structural replication proteins. The similarity of the genome organization and amino acid sequence to those of viruses of the genus Iflavirus suggests that ApIV is a novel member of this genus.
The 5′UTR of ApIV (883 nt) is distinctly longer than those of other lepidopteran-infecting iflaviruses, such as Ectropis obliqua picorna-like virus (473 nt; [21]) and Perina nuda virus (390 nt; [22]), but shorter than that of the DWV species complex (1117-1156 nt; [23][24][25]), to which ApIV is most closely related. These differences could be due to incomplete sequencing of the 5′ ends of some of the viruses, or could be of functional significance.
Like the genomes of the Picornaviridae, the iflavirus genome lacks a 5′ cap structure for initiating protein synthesis and instead uses an internal ribosome entry site (IRES) for translation initiation [26][27][28][29][30][31]. Three iflaviruses have been reported to have IRES activity in insect cells: EoPV [32], VDV-1 [5] and PnV [22]. It is therefore likely that the ApIV 5′UTR also contains IRES elements, but definitive proof awaits functional analysis of the 5′UTR region.
The high degree of similarity between the ApIV strains isolated from three geographical regions at least 600 km apart indicates that the natural variability of ApIV is relatively low, certainly compared to the honeybee iflaviruses DWV/VDV-1 [23,24,33,34], SBPV [28] and SBV/CSBV [35,36]. Millán-Leiva et al. [37] also found a high similarity between strains of SeIV (0.4% single nucleotide polymorphism; 39/10,347 bp), but these were isolated from the same laboratory, which raises the question of how large the variation is in nature. Our data show an SNP level of 0.7% (3/420 bp) and 0.2% (4/2129 bp) in geographically distant isolates, suggesting that the low variability could be an inherent property of the virus itself.
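The SNP percentages quoted above follow directly from the ratio of variable sites to region length. The short Python sketch below reproduces that arithmetic from the counts given in the text; the function and dictionary names are our own illustration.

```python
def snp_percent(variable_sites, region_length):
    """Percentage of nucleotide positions that vary within a region."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return 100.0 * variable_sites / region_length


# (variable sites, region length in bp) as reported in the text.
REGIONS = {
    "ApIV helicase": (3, 420),
    "ApIV structural": (4, 2129),
    "SeIV genome": (39, 10347),
}

if __name__ == "__main__":
    for name, (sites, length) in REGIONS.items():
        print(f"{name}: {snp_percent(sites, length):.1f}% ({sites}/{length} bp)")
```

Rounded to one decimal place, these ratios give 0.7%, 0.2% and 0.4%, respectively, in agreement with the values cited.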
We were able to detect ApIV in AVD-affected larvae, and have shown that injection of purified ApIV leads to AVD symptoms, but the tissue distribution and the transmission routes of both the disease and the virus remain to be determined.

Author Contributions