Genome Sequences of Three Apple chlorotic leaf spot virus Isolates from Hawthorns in China

The genome sequences of Apple chlorotic leaf spot virus (ACLSV) isolates from three accessions of hawthorns (Crataegus pinnatifida) grown at Shenyang Agricultural University were determined using Illumina RNA-seq. To confirm the de novo assembly, the genomes of two isolates (SY01 and SY02) were also sequenced using the Sanger method. The SY01 and SY02 sequences obtained with the Sanger method showed 99.5% and 99.7% nucleotide identity with the transcriptome data, respectively. The genome sequences of the hawthorn isolates SY01, SY02 and SY03 (GenBank accession nos. KM207212, KU870524 and KU870525, respectively) consisted of 7,543, 7,561 and 7,545 nucleotides, respectively, excluding poly-adenylated tails. Sequence analysis revealed that these hawthorn isolates shared an overall nucleotide identity of 82.8–92.1% and showed the highest identity (90.3%) with isolate YH (GenBank accession no. KC935955) from pear and the lowest identity (67.7%) with isolate TaTao5 (GenBank accession no. EU223295) from peach. Hawthorn isolate sequences were similar to those of ‘B6 type’ ACLSV. The relationship between ACLSV isolates largely depends upon the host species. This represents the first comparative study of the genome sequences of ACLSV isolates from hawthorns.


Introduction
Apple chlorotic leaf spot virus (ACLSV), a representative species of genus Trichovirus in family Betaflexiviridae [1], is distributed worldwide and can infect most fruit tree species of family Rosaceae, including apple, pear, peach, plum, almond, apricot, cherry and hawthorn [2]. ACLSV is a latent virus that usually does not cause obvious symptoms in cultivars of apples and pears. The severity of the symptoms caused by ACLSV shows a strong association with plant species and virus strains [2][3]. The virus can cause severe symptoms in many pome and stone fruit trees, including plant dysplasia and less robust plant growth. Disease in apples and pears grafted onto susceptible rootstocks is mainly attributed to co-infection of ACLSV with Apple stem grooving virus and/or Apple stem pitting virus [4]. ACLSV is mainly spread through grafting, pruning, propagation materials and nematodes, and has not yet been found to be transmitted through seeds or natural media. Because of the inadequate development of virus-free plantlets in recent years, virus transmission caused by grafting now represents a major threat to the fruit industry.
ACLSV particles are 640-760 nm in length and contain a positive-sense, single-stranded RNA genome. The ACLSV genome is composed of 7474-7561 nucleotides, excluding the poly-adenylated tail, with untranslated regions of 150 and 215 nucleotides at its 5′- and 3′-termini, respectively [5]. The complete nucleotide sequence contains three overlapping open reading frames (ORFs) that encode a 216-kDa replication-associated protein (Rep), a 50-kDa movement protein (MP), and a 22-kDa coat protein (CP) [6][7]. The CP is the only structural protein, and it has a relatively conserved gene sequence.
Although GenBank includes many partial or complete genome sequences of ACLSV, no sequence of an ACLSV isolate from a hawthorn was available prior to our present study. Some studies have indicated that ACLSV has many variants with different serological reactivity and strains that reflect different host species and geographical distributions [8][9].
To better understand the molecular characteristics of ACLSV isolates from hawthorns, the genome sequences of these ACLSV isolates were determined, and the nucleotide and amino acid identities and phylogenies were analyzed.

Plant materials
Young plant leaves and fruits of the Crataegus pinnatifida accessions used in this study were collected from Shenyang Agricultural University.

RNA extraction
Total RNA was extracted from 100 mg of hawthorn leaves using a modified CTAB method [10].

High-throughput sequencing
All cDNA library preparation and sequencing reactions were carried out by the Biomarker Technology Company. Paired-end library preparation and sequencing were performed following standard Illumina methods using a DNA sample kit. The cDNA libraries were sequenced on the following Illumina sequencing platforms: HiSeq™ 2000 for SY01 and HiSeq™ 2500 for SY02 and SY03.

Primer design and reverse transcription-polymerase chain reactions
Primers for the amplification of genome fragments from SY01 and SY02 (S1 Table and S2 Table) were designed based on transcriptome data and synthesized by GENEWIZ, Inc. (Beijing, China). Reverse transcription reactions were performed at 37°C for 30 min with the PrimeScript RT reagent kit (TaKaRa, Dalian, China) according to the manufacturer's instructions. PCR reactions were carried out in 20 μL total volumes containing 1 μL cDNA, 1.6 μL of each dNTP (2.5 mM), 2.0 μL 10× PCR buffer, 1.0 μL MgCl2 (25 mM), 0.5 μL of each primer (10 μM), 0.2 μL Taq DNA polymerase (Promega, Shanghai, China) and ddH2O to make up the final volume.
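The volume of ddH2O needed to bring the reaction to 20 μL follows from the listed components. The short Python sketch below works through that arithmetic. It assumes that the dNTPs are added once as a 1.6 μL mix and that two primers (forward and reverse) are used per reaction; neither assumption is stated explicitly in the text, and the function and variable names are illustrative only.

```python
def water_volume(total_ul, components_ul):
    """Return the volume of water (in microlitres) needed to reach total_ul."""
    used = sum(components_ul.values())
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return round(total_ul - used, 2)


# Component volumes in microlitres, taken from the reaction mix described above.
# Assumption: dNTPs are added once as a mix; two primers are used per reaction.
pcr_mix = {
    "cDNA": 1.0,
    "dNTP mix (2.5 mM)": 1.6,
    "10x PCR buffer": 2.0,
    "MgCl2 (25 mM)": 1.0,
    "forward primer (10 uM)": 0.5,
    "reverse primer (10 uM)": 0.5,
    "Taq DNA polymerase": 0.2,
}

ddh2o = water_volume(20.0, pcr_mix)
print(f"ddH2O to add: {ddh2o} uL")
```

Under these assumptions the components sum to 6.8 μL, leaving 13.2 μL of ddH2O per reaction.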
The complete genome sequences of ACLSV from hawthorns were assembled with overlapping fragments of more than 100 bp, as shown in the diagram with SY01 as a representative example (Fig 1). Nucleotide and amino acid identities were compared using the DNAMAN software package (Version 5. ) and Bal1 (accession no. X99752). A multiple sequence alignment was performed using Clustal X (http://www.clustal.org) [11]. Phylogenetic trees were generated using the multiple sequence alignment results and constructed by the neighbor-joining method with 1000 bootstrap replicates with MEGA software (version 6.0). Data about the ACLSV isolates used for sequence analysis and alignment are listed in Table 1.
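To make the identity values reported in this study concrete, the following Python sketch computes the percent identity between two sequences that have already been aligned. It is not the DNAMAN algorithm, whose gap handling may differ. Here a column is skipped only when both sequences have a gap, and a gap opposite a residue counts as a mismatch. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are ignored; a gap
    opposite a residue is counted as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments, not real ACLSV sequence data.
print(round(percent_identity("ACGTTGCA", "ACGATGCA"), 1))
```

The same function applies to aligned amino acid sequences, since it only compares characters column by column.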

Assembly of the hawthorn ACLSV genome sequence
The SY01, SY02 and SY03 sequences of ACLSV from three accessions of hawthorns, with 7,543, 7,561 and 7,545 nucleotides, respectively, were first determined using Illumina RNA-seq. Then, SY01 and SY02 were assembled by RT-PCR to confirm the transcriptome data. The sizes of nine specific fragments of SY01 obtained by RT-PCR were as follows:

Genomic characterization and sequence analysis
The three complete nucleotide sequences of ACLSV from hawthorns, SY01, SY02 and SY03, consisted of 7,543, 7,561 and 7,545 nucleotides, respectively, excluding the poly-adenylated tail. These sequences were deposited in GenBank under accession numbers KM207212, KU870524 and KU870525, respectively. The genome of the SY02 isolate was longer than those of the other two isolates because of differences in the length of the 5′-untranslated region (5′-UTR), as shown in Table 2. Sequence analysis showed that these three hawthorn isolates shared an overall nucleotide identity of 82.8-92.1%. The complete nucleotide sequence of each hawthorn isolate contained three overlapping ORFs that were 5,634, 1,383 and 582 nucleotides in length (ORF 1, 2, and 3, respectively). Each ORF had the same number of nucleotides and encoded the same number of amino acids in all three isolates. The ORFs of the MO-5 isolate from apple consisted of the same number of nucleotides as those of the hawthorn isolates. SY01 and SY02 were highly similar, with 92.1% nucleotide identity and 95.3%, 96.1% and 98.2% amino acid sequence identity for the three ORFs. SY03 shared only 82.8-83.1% nucleotide identity and 90.7-97.4% amino acid similarity with SY01 and SY02. The nucleotide and amino acid identities of the whole genome and of individual genomic regions were compared between the hawthorn isolates and fifteen previously reported ACLSV isolates. Table 2 shows a sequence comparison between SY01 and other isolates. The three hawthorn isolates showed the highest nucleotide identity with YH (90.3%) and the lowest with TaTao5 (67.7%). The amino acid identities between SY01 and the three isolates from pear (JB, KMS and YH) were all greater than 89%, and these isolates always clustered together in our phylogenetic analysis.
Rep, MP and CP of ACLSV isolates from hawthorns shared 74.2-94.1%, 58.3-95.0% and 74.1-99.0% overall amino acid identities, respectively, compared with those of other isolates shown in Table 2.
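The ORF lengths reported above can be converted into protein lengths with simple arithmetic. The Python sketch below assumes that each reported ORF length includes the terminal stop codon. This holds for the CP, where the paper states that 582 nucleotides encode 193 amino acids, but it is an assumption for Rep and MP, whose amino acid counts are not given in the text. The function name is illustrative.

```python
def protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_nt nucleotides."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    # The stop codon does not encode an amino acid.
    return codons - 1 if includes_stop else codons


# ORF lengths (nucleotides) reported for the hawthorn isolates.
orf_lengths = {"ORF1 (Rep)": 5634, "ORF2 (MP)": 1383, "ORF3 (CP)": 582}

for name, length in orf_lengths.items():
    print(name, length, "nt ->", protein_length(length), "aa")
```

Under this assumption the three ORFs would encode 1,877, 460 and 193 amino acids, and the last value matches the CP length given in the paper.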

Variability of the CP gene among ACLSV isolates
The CP was the most conserved protein, with more than 90% amino acid identity between SY01 and the other isolates, except for two distinct isolates (Bal1 and TaTao5). The CP genes of all eighteen isolates were 582 nucleotides long and encoded 193 amino acids. A multiple alignment of the eighteen ACLSV CP amino acid sequences also illustrated the conservation of the CP, especially from amino acid sites 100-193 (Fig 3). The 'B6 type' (S40-L59-Y75-T130-L184) and 'P-205 type' (A40-V59-F75-S130-M184) of ACLSV are distinguished by five characteristic sites of the CP, which have been frequently described in previous studies [12,19]. The three hawthorn isolates were each similar to the 'B6 type'. All ACLSV isolates discussed in the present study belonged to the 'B6 type', except for the A4 isolate. However, compared with the B6 isolate, SY01 and SY02 had amino acid M at position 59, which was consistent with the JB isolate from pear; SY03 and the other two pear isolates had amino acid V at that position, and SY01 had a different amino acid, S, at position 130. In an alignment of the reported CP amino acid sequences of ACLSV isolates, the TaTao5 isolate had the fewest conserved amino acids. The amino acid motif of the TaTao5 isolate (S40-V59-Y75-K130-I184) shared only two residues with B6, at sites 40 and 75.
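The comparison of the five characteristic CP sites can be sketched in Python as follows. The signature residues for the 'B6 type', the 'P-205 type' and TaTao5 are taken from the text. Assigning an isolate to the type with which it shares more sites is an assumption of this sketch, not a rule stated in the paper, and the function names are hypothetical.

```python
SITES = (40, 59, 75, 130, 184)

# Signature residues at the five characteristic CP sites, as given in the text.
TYPE_SIGNATURES = {
    "B6": dict(zip(SITES, "SLYTL")),
    "P-205": dict(zip(SITES, "AVFSM")),
}


def shared_sites(motif, reference):
    """Return the positions at which motif and reference carry the same residue."""
    return [pos for pos in SITES if motif[pos] == reference[pos]]


def closest_type(motif):
    """Pick the type sharing the most signature sites (illustrative rule only)."""
    return max(TYPE_SIGNATURES,
               key=lambda t: len(shared_sites(motif, TYPE_SIGNATURES[t])))


# TaTao5 motif reported in the text: S40-V59-Y75-K130-I184.
tatao5 = dict(zip(SITES, "SVYKI"))

print(shared_sites(tatao5, TYPE_SIGNATURES["B6"]))
print(shared_sites(tatao5, TYPE_SIGNATURES["P-205"]))
print(closest_type(tatao5))
```

For TaTao5 the shared sites with B6 are positions 40 and 75, in agreement with the text, while only position 59 is shared with P-205.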

Phylogenetic analysis
According to our phylogenetic analysis based on nucleotide and amino acid sequences of eighteen ACLSV isolates, similarities among the ACLSV isolates showed a strong association with their respective host species. The phylogenetic tree generated from the whole genome (Fig 4A) revealed that these isolates could be divided into four main clades. The peach isolate TaTao5 and the only cherry isolate, Bal1, each formed a separate clade. Apple, pear and hawthorn are pome fruit trees, and the isolates from pear and hawthorn, along with apple isolate MO-5, formed another clade. Finally, isolates from stone fruit trees, including Z1 and Z3 from peach and PBM1 and P863 from plum, were grouped into the last clade along with five other apple isolates. This grouping also applied to the phylogenetic trees for Rep (Fig 4B) and MP (Fig 4C). Several sub-clades were consistently present in the trees. SY03 belonged to the same sub-clade as the three pear isolates, while the isolates from peach (except TaTao5) and plum formed another sub-clade.

Discussion
Previously, viral RNA was extracted from purified virus [6] and first- and second-strand cDNA were obtained by generating cDNA libraries [20]. More recently, next-generation sequencing (NGS) has been developed and applied to allow for rapid diagnosis and detection. Both emerging and known plant viruses can be readily discovered by high-throughput sequencing [21]. Liu et al. [22] determined the whole genome sequence of a Chinese isolate of Pepper vein yellows virus using deep sequencing of small RNAs. Bejerman et al. [23] discovered a new enamovirus and obtained genome sequences from alfalfa plants that showed dwarfism symptoms by de novo sequencing, which was confirmed by the Sanger method. In the present study, the genome sequences of three ACLSV isolates from hawthorns were determined by Illumina RNA-seq, and two of them were validated by RT-PCR and Sanger sequencing, which showed a high degree of similarity with the transcriptome data. Khalifa et al. [24] also reported that de novo assembled genomes from Illumina data were 99.3-100% similar to Sanger sequencing results. Together, these findings support the accuracy and reliability of sequence data from NGS. Through a comparison of the hawthorn isolates and fifteen reported isolates, we found that ORF3 was the most conserved, while ORF1, ORF2 and the 3′-UTR and 5′-UTR were relatively diverse, especially ORF1, which had a highly variable region [18]. ORF1 of ACLSV encodes Rep, which includes methyltransferase, protease, helicase, and RNA-dependent RNA polymerase domains. The poorly conserved region was mapped to the protease. Zhu et al. [14] have suggested that the sequence of the hypervariable region might be related to the phylogenetic evolution of the virus. The CP, encoded at the 3′-proximal end of the genome, was relatively conserved. Most of the variability was present in the N-terminal domain of the CP, which overlapped with the C-terminus of MP, whereas the C-terminus of CP was significantly less variable [8], as shown in Fig 3.
Conservation of CP corresponds with conservation of the entire genome sequence. Isolate TaTao5 had the fewest conserved amino acids, and had a distant relationship with the other isolates.
A comparison of the amino acid sequences of the CP revealed that only the A4 isolate showed the same amino acid combination (A40-V59-F75-S130-M184) as P-205 (Fig 3). Based on the phylogenetic trees, A4 was always grouped into the same subclade as P-205. This present study also confirmed that the classifications of the 'B6 type' and 'P-205 type' were reasonable. Yaegashi et al. [12] proposed that the specific combination of amino acid sites 40 and 75 (S40-Y75 or A40-F75) had a strong influence on viral accumulation and replication. From the diagram shown in Fig 3, it was evident that these two sites were highly conserved. Chen et al. [25] proposed the four phylogenetic types based on the three signature sites of the CP of ACLSV. From Chen's classification standard, we can conclude that isolates from hawthorns and pears and MO-5 belong to group II (S40-Y75-S79), P-205 and A4 belong to group III (A40-F75-E79), and TaTao5 belongs to group IV (S40-Y75-T79), while the others belong to group I (S40-Y75-E79). There were many types of ACLSV CP, and virus variation is widespread in nature; however, the mutation mechanism has remained unclear to date.
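The four groups described above amount to a lookup on the residues at CP positions 40, 75 and 79. The Python sketch below illustrates this, assuming 1-based positions in an ungapped CP amino acid sequence. The function names are hypothetical, and the demo sequences are placeholders of 'X' residues with only the three signature sites filled in, not real ACLSV data.

```python
# Residues at CP sites (40, 75, 79) for each group, as listed in the text.
CP_GROUPS = {
    ("S", "Y", "E"): "I",
    ("S", "Y", "S"): "II",
    ("A", "F", "E"): "III",
    ("S", "Y", "T"): "IV",
}


def classify_cp(cp_protein):
    """Return the group for a CP sequence, or None if the motif is not listed."""
    if len(cp_protein) < 79:
        raise ValueError("sequence too short to contain site 79")
    # Convert 1-based site numbers to 0-based string indices.
    motif = tuple(cp_protein[pos - 1].upper() for pos in (40, 75, 79))
    return CP_GROUPS.get(motif)


def demo_sequence(residues, length=193):
    """Build a placeholder sequence with given residues at 1-based positions."""
    seq = ["X"] * length
    for pos, aa in residues.items():
        seq[pos - 1] = aa
    return "".join(seq)


print(classify_cp(demo_sequence({40: "S", 75: "Y", 79: "S"})))
print(classify_cp(demo_sequence({40: "A", 75: "F", 79: "E"})))
```

Returning None for an unlisted motif keeps unknown combinations visible rather than forcing them into one of the four groups.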
The eighteen ACLSV isolates analyzed here originated from Asia, Europe and America, and their hosts were Rosaceae fruit trees, including apple, pear, peach, plum, cherry and hawthorn. The eighteen isolates could be divided into four groups according to the phylogenetic tree generated from whole genome sequences; this grouping was consistent with that of a previous study [3]. Among these groups, the apple isolates could be divided into two groups: MO-5 formed one group, while the other isolates formed another group. Niu et al. [7] proposed that two types of ACLSV isolates exist in peaches, the Z1 type and the TaTao5 type. Hawthorn isolates also formed two branches. We can conclude that isolates from the same fruit tree species generally fall into the same or adjacent clades. Currently, there is not sufficient evidence to show whether there is a correlation between isolates and the country of origin of the host. Further study of the molecular characteristics of ACLSV and of the relationship between isolates, host species and origins will be required to obtain new insights into virus population structure and evolution.
In summary, this study presents a comparative analysis of the whole genome sequences of ACLSV isolates from hawthorns and assesses sequence similarities and phylogenies among eighteen ACLSV isolates, including fifteen that had been previously reported. Our findings demonstrate that isolates from hawthorns and pears show a very close relationship, and that the sequence identities of ACLSV isolates depend largely on the host species. This study also supports the previously reported classification into 'B6 type' and 'P-205 type' [12] and describes the variation in the CP of ACLSV. These findings may provide a basis for strain partitioning of ACLSV, which could lay a foundation for viral prevention and control.
Supporting Information S1