Molecular Genetic Analysis and Evolution of Segment 7 in Rice Black-Streaked Dwarf Virus in China

Rice black-streaked dwarf virus (RBSDV) causes maize rough dwarf disease or rice black-streaked dwarf disease and can lead to severe yield losses in maize and rice. To analyse RBSDV evolution, codon usage bias and genetic structure were investigated in 111 maize and rice RBSDV isolates from eight geographic locations in 2013 and 2014. The linear dsRNA S7 is A+U rich, with overall codon usage biased toward codons ending with A (A3s, S7-1: 32.64%, S7-2: 29.95%) or U (U3s, S7-1: 44.18%, S7-2: 46.06%). Effective number of codons (Nc) values of 45.63 in S7-1 (the first open reading frame of S7) and 39.96 in S7-2 (the second open reading frame of S7) indicate low degrees of RBSDV-S7 codon usage bias, likely driven by mutational bias regardless of year, host, or geographical origin. Twelve optimal codons were detected in S7. The nucleotide diversity (π) of S7 sequences in 2013 isolates (0.0307) was significantly higher than in 2014 isolates (0.0244, P = 0.0226). The nucleotide diversity (π) of S7 sequences in isolates from Jinan (0.0391) was higher than that from the other seven locations (P < 0.01). Only one S7 recombinant was detected in Baoding. RBSDV isolates could be phylogenetically classified into two groups according to S7 sequences, and further classified into two subgroups. S7-1 and S7-2 were under negative and purifying selection, with respective Ka/Ks ratios of 0.0179 and 0.0537. These RBSDV populations were expanding (P < 0.01) as indicated by negative values for Tajima's D, Fu and Li's D, and Fu and Li's F. Genetic differentiation was detected in six RBSDV subpopulations (P < 0.05). Absolute Fst (0.0790) and Nm (65.12) between 2013 and 2014, absolute Fst (0.1720) and Nm (38.49) between maize and rice, and absolute Fst values of 0.0085-0.3069 and Nm values of 0.56-29.61 among these eight geographic locations revealed frequent gene flow between subpopulations. Gene flow between 2013 and 2014 was the most frequent.


Introduction
Rice black-streaked dwarf virus (RBSDV), a member of the genus Fijivirus in the family Reoviridae, causes maize rough dwarf disease (MRDD) and rice black-streaked dwarf disease (RBSDD), which lead to severe yield losses in maize and rice in East Asia [1,2]. Variability, codon usage and nucleotide composition bias, recombination, selection pressure, and population genetic structure can each affect the evolution of a virus [3][4][5][6][7]. Therefore, we investigated the population codon usage bias and genetic structure of RBSDV in 111 MRDD and RBSDD isolates (S1 Table) sampled from eight geographic locations in 2013 and 2014. These locations were mainly in the Yellow and Huai River summer maize-growing regions of China, where the MRDD prevailed, including Henan, Shandong, Jiangsu, Hebei provinces and Beijing.
RBSDV has icosahedral, double-layered particles with a diameter of 75-80 nm that contain ten linear dsRNAs (S1-S10) that range in size from 1.8 to 4.5 kb [2,[8][9][10][11]. The dsRNA S7 is comprised of two ORFs designated S7-1 and S7-2 that encode the proteins P7-1 and P7-2, respectively. P7-1 is a nonstructural protein comprised of 363 amino acids (with a molecular mass of 41.0 kDa) that causes male sterility due to nondehiscent anthers in Arabidopsis [12]. P7-2 is a nonstructural protein comprised of 309 amino acids with a molecular mass of 36 kDa that interacts with SKP1, a core subunit of SCF ubiquitin ligase [13]. Although P7-1 and P7-2 exhibit many characteristics consistent with a role in virus replication, the genetic structure and codon usage bias of their encoding dsRNAs have not yet been elucidated. Further, the interactions of host plants with RBSDV should be examined to gain insights into the evolution of the S7 dsRNA.
Studying the nucleotide composition of these viral molecules, and the extent and causes of biases in their codon usage is essential to understanding the evolution of RBSDV, particularly to detect any interplay between the virus and the cells or immune responses of its hosts [4]. Studies have revealed complicated patterns of nucleotide composition and codon usage bias (CUB) in some viruses, but the forces shaping their evolution have not been illuminated [4]. Codon usage bias refers to the phenomenon wherein synonymous codons do not appear with equal frequencies in protein sequences. Synonymous codon usage has been studied in a wide variety of organisms, including prokaryotes, eukaryotes, and viruses [14]. CUB occurs in higher organisms, microorganisms, and in some human and animal viruses [15][16][17][18]. Among plant viruses, there have been studies on sobemovirus [19], citrus tristeza virus [20], and soybean dwarf virus [21]. However, there has been little research into CUB in RBSDV or other reoviruses to date [22].
Analyses of the genetic structure and codon usage bias of the RBSDV S7 dsRNA have not previously been performed. In the present study, the genetic structure and codon usage bias of 111 RBSDV S7 sequences from maize and rice hosts from eight geographic locations in 2013 and 2014 were analysed. Our findings provide further insights into the evolution of RBSDV based on molecular genetic analysis of the S7 dsRNA.

Sampling of virus isolates
In Beijing (I), maize and rice plants with symptoms of rough dwarf disease were collected from the experimental field of the Chinese Academy of Agricultural Sciences. In Tangshan (II), plants were collected together with Wen-Yue Tong of Tangshan Agricultural Research Institutes. In Baoding (III), plants were collected together with Dr. Jie Shi and Dr. Bo Li of Hebei Academy of Agriculture and Forestry Sciences. In Jinan (IV), plants were collected together with Dr. Zhao-Dong Meng and Dr. Qi Sun from Shandong Academy of Agricultural Sciences. In Jining (V), plants were collected together with Zhao-Wen Sun of Jining Agricultural Research Institutes. In Zhengzhou (VI), plants were collected together with Dr. Shuang-Gui Tie and Dr. Xiao-Hua Han of Henan Academy of Agricultural Sciences. In Yancheng (VII) and Nanjing (VIII), plants were collected together with Dr. Yan-Ping Chen of Jiangsu Academy of Agricultural Sciences. In this study, our maize and rice plants were not cultivated on private land. Our study required no specific permissions for these locations or activities, because our plant materials were all collected together with the scientific researchers of local institutions in the experimental fields of each academy of agricultural sciences. Our study did not involve endangered or protected species.
A total of 111 maize or rice plants with symptoms of maize rough dwarf disease or rice black-streaked dwarf disease were collected from eight areas in which these diseases prevailed in 2013 and 2014 (S1 Table). Nine plants were collected from Beijing, 21 from Hebei, 33 from Shandong, 25 from Henan, and 23 from Jiangsu. Rice plants were also harvested near the maize sampling sites in Baoding (III), Jining (V), Zhengzhou (VI), and Nanjing (VIII). The virus-infected plant leaves were frozen in liquid nitrogen and stored at -80°C. A total of 76 maize isolates from eight geographic locations (I through VIII) and 35 rice isolates from four geographic locations (III, V, VI, and VIII) (S1 Table) were processed and used for analyses of RBSDV S7 sequences.

RNA extractions, RT-PCR, and sequencing
RBSDV dsRNA was extracted from individual maize and rice isolates following previously described methods [9,33,34]. The quality and integrity of the dsRNA were assessed on 1.2% native agarose gels and the dsRNA concentrations were estimated using a NanoDrop 2000 spectrophotometer (Thermo Scientific, USA). First-strand cDNA was synthesized using a Fast Quant RT Kit (TIANGEN, China), and PCR products were amplified with two pairs of S7-specific primers (S2 Table) using KOD-Plus-Neo enzyme (TOYOBO, Japan). These products were then sequenced at the AuGCT DNA-SYN Biotechnology Company (Beijing, China) using the dideoxy chain-termination method. For partial S7 sequences, three independent PCR reactions were sequenced to confirm sequencing quality. The sequence data were assembled and analyzed using DNAMAN and Jemboss1.5 software (EMBOSS, Cambridge, UK) [35].

Analysis of codon usage bias in S7-1 and S7-2 sequences
Codon usage in P7-1 and P7-2 was assessed using the program Codon W 1.4.4 (http://sourceforge.net/projects/codonw/). The effective number of codons (Nc value) measures synonymous codon usage bias independently of amino acid composition and codon number [36,37]. Nc values for different genes or isolates range from 20 (when only one codon is used per amino acid) to 61 (when all synonymous codons are used equally). Highly expressed genes tend to have high codon bias with low Nc values [38]. GC3s denotes the frequency of G+C, and A3s, U3s, G3s, and C3s indicate the frequencies of A, U, G, and C, respectively, at synonymous third-base positions.
The codon adaptation index (CAI) was used to measure the extent of codon bias in expressed genes [39,40], S7-1 and S7-2 in the present study. The value of CAI ranges from zero to one, where a value of one indicates high codon usage bias and potential expression level [40]. The codon bias index (CBI) was used to estimate the proportion of preferred codons [41]. When the CBI value is one, only preferred codons are used for all triplets in the mRNA, which would indicate a nonrandom process. In contrast, negative values for CBI indicate that nonpreferred codons are used more often than expected.
To determine the preferred codons for the S7-1 and S7-2 sequences, the value for relative synonymous codon usage (RSCU) was calculated using 111 sequences from 111 isolates. RSCU is the ratio of the observed to the expected codon frequency, assuming that all synonyms for that amino acid have an equal chance of being used. There is positive codon usage bias when the value of RSCU is greater than one, and relatively negative codon usage bias when RSCU is less than one. When RSCU equals one, a codon has been chosen randomly [42].
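The RSCU definition above can be illustrated with a short sketch. This is not the Codon W implementation used in the study; the function name `rscu` and the toy input are hypothetical, and the standard genetic code is assumed. Amino acids encoded by a single codon (Met, Trp) and stop codons are skipped.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumption: no alternative code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequences):
    """Return {codon: RSCU} pooled over in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0 or len(codons) == 1:
            continue  # amino acid absent, or no synonymous alternative
        expected = total / len(codons)  # equal use of all synonyms
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy example: alanine encoded three times by GCU and once by GCA.
toy = rscu(["GCUGCUGCUGCA"])
print(toy["GCT"], toy["GCA"], toy["GCC"])
```

In the toy input, GCU is used three times as often as expected under equal usage (RSCU of 3), GCA exactly as expected (RSCU of 1), and GCC not at all (RSCU of 0).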
Five percent of the total genes with the highest and lowest CAI values were defined as the high-and low-expression datasets respectively, and were selected to determine optimal codons. Codon usage was compared using a Chi-squared contingency test of groups, defining codons whose frequency of usage was significantly higher (P < 0.01) in the high-expression dataset than in the low-expression dataset as the optimal codons [43].
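A minimal sketch of the optimal-codon test follows, assuming a 2 x 2 contingency table per codon (the codon versus its synonymous alternatives, in the high- versus low-expression dataset) and a Pearson Chi-squared test with one degree of freedom and no continuity correction. The exact table layout used by Codon W may differ; the function names and counts here are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson Chi-squared statistic and P-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 d.f., the upper tail of the Chi-squared distribution is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def is_optimal(high_codon, high_other, low_codon, low_other, alpha=0.01):
    """True if the codon is used significantly more often in the high dataset."""
    _, p_value = chi2_2x2(high_codon, high_other, low_codon, low_other)
    high_freq = high_codon / (high_codon + high_other)
    low_freq = low_codon / (low_codon + low_other)
    return p_value < alpha and high_freq > low_freq


# Hypothetical counts: 30 of 40 uses in the high set, 10 of 40 in the low set.
print(chi2_2x2(30, 10, 10, 30))
print(is_optimal(30, 10, 10, 30))
```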

Sequence variants and nucleotide diversity in S7 sequences
Nucleotide and amino acid sequence alignments among these 111 viral isolates from 2013 and 2014 were performed using the MegAlign program in DNAStar5.01 software (Madison, USA) [2,44] with default settings. The nucleotide sequences for S7 across these 111 viral isolates were aligned using MEGA 6.06 [45]. Sliding-window analyses of nucleotide diversity (π) in S7 sequences were performed using a 200-bp window in 100-bp steps with TASSEL 3.0 software [46]. Nucleotide diversities for S7 sequences were calculated for these isolates either grouped by geographic location, host, and year, or for all isolates combined.
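The sliding-window calculation can be sketched as follows, taking π as the mean number of pairwise differences per site. The sketch assumes gap-free aligned sequences of equal length and is not the TASSEL implementation, which may treat gaps and ambiguous bases differently; the function names are hypothetical.

```python
from itertools import combinations


def pairwise_pi(seqs):
    """Mean pairwise nucleotide differences per site for aligned sequences."""
    n = len(seqs)
    length = len(seqs[0]) if seqs else 0
    if n < 2 or length == 0:
        return 0.0
    total = 0
    for a, b in combinations(seqs, 2):
        total += sum(1 for x, y in zip(a, b) if x != y)
    n_pairs = n * (n - 1) / 2
    return total / n_pairs / length


def sliding_pi(seqs, window=200, step=100):
    """Return [(1-based window start, pi)] along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + 1, pairwise_pi(chunk)))
    return results


# Toy alignment with a 2-bp window and 1-bp step, for illustration only.
toy = ["AAAA", "AAAT", "AATT"]
for start, pi in sliding_pi(toy, window=2, step=1):
    print(start, round(pi, 4))
```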

Detection of genetic recombination within and phylogenetic analyses of S7 sequences
Nucleotide and amino acid sequences were aligned using CLUSTAL W in MEGA 6.06 with default settings [45]. Possible recombination sites within S7 sequences were examined using the software RDP 4.22 with the RDP, GENECONV, BOOTSCAN, Maximum Chi Square (MAXCHI), CHIMAERA, Sister Scanning (SISCAN), and 3Seq algorithms in the default configurations, except that the 'linear sequence' and 'disentangling overlapping signals' options were selected [47]. Recombination events were validated only if they were detected by more than two methods. The default parameter for the number of simulated datasets was 100 and the P-value cutoff was 0.05. Phylogenetic trees were constructed using the neighbor-joining (NJ) method in MEGA 6.06 software [45] for the S7 sequences from these 111 isolates. The number of bootstrap replicates was set to 1000. Only bootstrap values greater than 50% are shown.

Detection of selection pressure on S7 nucleotide sequences
The Ka/Ks ratio was used to estimate the level of selection pressure on S7, where Ka is the average number of nonsynonymous substitutions per nonsynonymous site and Ks is the average number of synonymous substitutions per synonymous site. The average values of Ka and Ks were calculated using MEGA 6.06 software [45] according to the methods described in previous studies [48,49]. When the Ka/Ks ratio is greater than one, the gene is considered to be under positive or diversifying selection. If the Ka/Ks ratio is one, selection is neutral. However, if the Ka/Ks ratio is less than one, the gene is under negative or purifying selection.
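The interpretation rule above can be written as a small helper. This is only a sketch of the decision rule; it does not estimate Ka or Ks, which in this study were calculated in MEGA. The function name, the tolerance, and the example values are hypothetical.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Classify selection from Ka and Ks using the Ka/Ks thresholds."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the Ka/Ks ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        label = "neutral"
    elif ratio > 1.0:
        label = "positive (diversifying)"
    else:
        label = "negative (purifying)"
    return ratio, label


# Illustrative values only (not taken from the study's tables).
print(classify_selection(0.002, 0.1))
```

In practice, a ratio close to one would be judged with a statistical test rather than a fixed tolerance.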
Tajima's D, Fu and Li's D, and Fu and Li's F statistical tests, and haplotype diversity were estimated using the software DnaSP 5.10 [50]. The Tajima's D [51] and Fu and Li's D and F tests [52] assume that all mutations are selectively neutral. The frequencies and numbers of haplotypes indicate the haplotype diversity in the population.
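For readers unfamiliar with the statistic, Tajima's D compares the mean number of pairwise differences (k) with the number of segregating sites (S) scaled by a harmonic-sum constant. The sketch below follows Tajima's published formula; it is not the DnaSP code, the function names are hypothetical, and it assumes gap-free aligned sequences.

```python
import math
from itertools import combinations


def segregating_sites_and_k(seqs):
    """Return (S, k): segregating sites and mean pairwise differences."""
    n = len(seqs)
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    total = sum(
        sum(1 for x, y in zip(a, b) if x != y)
        for a, b in combinations(seqs, 2)
    )
    k = total / (n * (n - 1) / 2)
    return s, k


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s, and mean differences k."""
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (k - s / a1) / math.sqrt(variance)


# Hypothetical sample: 10 sequences, 16 segregating sites, k = 3.888889.
print(round(tajimas_d(10, 16, 3.888889), 3))
```

A negative D indicates an excess of low-frequency variants relative to neutral expectation, which is the pattern interpreted in this study as population expansion.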

Estimation of genetic differentiation and gene flow
To detect genetic differentiation between different subpopulations, three permutation-based statistical tests, Ks*, Z (the rank statistic), and Snn (the nearest-neighbor statistic), were performed. Because these three tests can powerfully detect genetic differentiation, they are particularly effective for datasets in which mutation rates are high and sample size is small [53,54]. The level of gene flow between subpopulations was measured by estimating Fst (the component of genetic variation between populations or the normalized variation in allele frequencies among populations) and Nm (the product of the effective size of each population [N] and the rate of migration among populations [m]) [55]. Fst ranges from zero to one for undifferentiated to fully differentiated populations, respectively. An absolute value of Fst of greater than 0.33 normally suggests that infrequent gene flow has taken place. Genetic drift that can result in substantial local differentiation can be indicated if the value of Nm is less than one, but not if the value of Nm is greater than one [56]. The statistical tests for genetic differentiation and estimation of Fst were performed using DnaSP 5.10 [50].
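The relationship between Fst and Nm can be illustrated under one common assumption, Wright's island-model approximation Fst ≈ 1/(4Nm + 1). Whether DnaSP applied exactly this form to these data is an assumption here; the function names are hypothetical, and the thresholds are those stated above.

```python
def nm_from_fst(fst):
    """Estimate Nm from Fst, assuming Fst = 1 / (4 * Nm + 1)."""
    if fst == 0:
        raise ValueError("Nm is undefined (infinite) when Fst is zero")
    return (1 / fst - 1) / 4


def interpret_gene_flow(fst, nm):
    """Apply the thresholds from the text: |Fst| > 0.33 and |Nm| < 1."""
    return {
        "infrequent_gene_flow": abs(fst) > 0.33,
        "drift_may_cause_local_differentiation": abs(nm) < 1,
    }


# The largest absolute Fst among locations reported in the abstract is 0.3069.
nm = nm_from_fst(0.3069)
print(round(nm, 2), interpret_gene_flow(0.3069, nm))
```

Under this approximation an Fst of 0.3069 corresponds to an Nm of about 0.56, which matches the lowest Nm reported among the eight geographic locations.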
Nc plots (plots of Nc versus GC3s) were used to understand the relationship between nucleotide composition and codon bias in S7-1 and S7-2 (Fig 1A). If GC3s is the only determinant of Nc, the Nc values should fall on the continuous standard curve that gives the expected Nc for each GC3s value. The Nc values for S7-1 ranged from 42 to 47 and those for S7-2 ranged from 38 to 41, indicating that there are very significant differences in codon bias between S7-1 and S7-2 (P < 0.01). The relationships between nucleotide composition and codon bias for both S7-1 and S7-2 are independent of years (Fig 1B), hosts (Fig 1C), and geographical locations (Fig 1D). A small number of points lie on the standard curve towards GC-poor regions in the Nc plot for S7-1, but no points lie on the standard curve in the Nc plot for S7-2. However, most of the points with low Nc values lie below the standard curve (Fig 1A), which suggests that S7-1 and S7-2 have additional codon usage bias independent of GC3s. In fact, points for S7-2 mostly lie farther from the standard curve than those for S7-1, which indicates that mutational bias had a weaker effect on codon usage variation in S7-2 than in S7-1.
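The standard curve in an Nc plot is usually drawn from Wright's expectation, Nc = 2 + s + 29/(s² + (1 − s)²), where s is GC3s. The sketch below assumes this form of the curve; the function names are hypothetical, and the GC3s value in the example is illustrative rather than taken from the study.

```python
def expected_nc(gc3s):
    """Expected Nc when GC3s is the only determinant of codon usage."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def distance_below_curve(nc, gc3s):
    """Positive values mean the observed Nc lies below the standard curve."""
    return expected_nc(gc3s) - nc


# Average Nc for S7-1 (45.63) with a hypothetical GC3s of 0.25.
print(expected_nc(0.25), distance_below_curve(45.63, 0.25))
```

A point well below the curve indicates codon usage bias beyond what third-position base composition alone would produce.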

Correspondence analysis of relative synonymous codon usage and optimal codons
Further evidence that mutational bias and other factors are responsible for codon usage variation in S7-1 and S7-2 came from correspondence analysis (CA) of the RSCU values for the two ORFs. The first two major axes explain 37.76% and 14.60% of the total variation in S7-1 and 38.96% and 9.64% in S7-2, and the next two axes account for 12.78% and 10.54% of the total variation in S7-1 and for 9.18% and 8.08% of the total variation in S7-2, respectively. The first and second axes for S7-1 and S7-2 were clustered in the plot (Fig 2); however, the majority of data for S7-1 and S7-2 do not cluster completely. S7-1 was scattered around the first axis, whereas S7-2 was concentrated mostly in the first quadrant of the two axes. However, the difference between S7-1 and S7-2 in this analysis was not significant (P > 0.05).
To detect correlations along the first two major axes for both CAI and Nc, correlation coefficients were calculated among values of these parameters. The separation of codons on the first axis appeared to be largely due to differences in the frequencies of codons that end with G/C or A/U. For S7-1, positions on axis one were strongly correlated with the C3s value (r = 0.9560, P < 0.0001) and Nc (r = 0.9234, P < 0.0001), and significantly negatively correlated with the U3s (r = -0.9516, P < 0.0001) and G3s (r = -0.8720, P < 0.0001) values (Table 1). For S7-2, positions on axis one were strongly correlated with the GC3s value (r = 0.9241, P < 0.0001) and CAI (r = 0.7650, P < 0.0001), and significantly negatively correlated with the A3s (r = -0.9214, P < 0.0001) and GC (r = -0.8919, P < 0.0001) values (Table 1). For S7-2, values of CAI were significantly positively or negatively correlated with Nc and certain composition indices (GC3s, GC, C3s, A3s, G3s) (|r| > 0.7, P < 0.0001) (Table 1). In contrast, the value of CAI in S7-1 was uncorrelated with Nc or other parameters.
To determine the optimal codons used in S7-1 and S7-2, the average RSCU values in high- and low-expression datasets were determined (S3 Table). Twelve codons were identified as the optimal codons in S7-1 and S7-2, according to the Chi-square test. Most optimal codons ended with G (41.67%) or U (33.33%), indicating that codon usage in RBSDV-S7 was biased towards synonymous codons ending with G or U.

Nucleotide diversity across S7 in 111 viral isolates
In the present study, 76 maize isolates with typical rough dwarf disease symptoms and 35 rice isolates with typical black-streaked dwarf disease symptoms were collected from eight locations in 2013 and 2014 (S1 Table). A total of 486 nucleotide mutation sites, including 194 singleton variable sites and 292 parsimony-informative sites, were detected among the S7 sequences across these 111 viral isolates, with an average of one mutation site per five base pairs. Fourteen amino acid changes were detected in P7-1, with an average of one mutation site per 26 amino acids, and 69 amino acid changes were detected in P7-2, with an average of one mutation site per four or five amino acids.

Recombination and phylogenetic analysis
One recombination event within S7 was detected in maize isolate 13IIIM-2 from Baoding using three different methods (MaxChi, Chimaera, SiScan). The breakpoint positions were located at nucleotide (nt) 1242 in ORF S7-2 and at nt 2192 in the 3' UTR of 13IIIM-2, with isolates 13VIIM-4 and 13VR-2 as the major and minor parental sequences. A phylogenetic tree was constructed from the remaining 110 isolate sequences to determine the evolutionary relationships among these RBSDV S7 isolates (Fig 4). The recombinant was not included, because the phylogenetic algorithm we used cannot accommodate recombinants. Based on S7 sequences, these 110 isolates could be classified into two main groups, designated A and B, that were independent of year, host, and geographical origin (Fig 4). Both groups A and B could be further clustered into two subgroups (AI and AII; BI and BII). Subgroup AI included seven isolates from 2013 and nine isolates from 2014; subgroup AII included four isolates from 2013; subgroup BI included four isolates from 2013 and six isolates from 2014; subgroup BII included 31 isolates from 2013 and 49 isolates from 2014.

Selection pressure and neutrality tests
To analyze possible selection pressure on RBSDV S7, the ratios of nonsynonymous to synonymous substitution rates (Ka/Ks) were calculated for maize and rice hosts from eight geographic locations in 2013 and 2014 (Table 2). The Ka/Ks ratios for S7-1 and S7-2 suggest that both S7-1 and S7-2 were under negative (purifying) selection (Table 2). There was no significant difference in Ka/Ks ratios for S7-1 or S7-2 between 2013 and 2014, with S7-1 values of 0.0181 and 0.0177, and S7-2 values of 0.0510 and 0.0569, respectively. There were also no significant differences in Ka/Ks ratios for S7-1 and S7-2 between hosts or between geographic locations. However, Ka/Ks ratios for S7-1, which ranged from 0.0147 to 0.0370, were significantly lower than those for S7-2, which ranged from 0.0328 to 0.0702 (P < 0.01). This result suggests that S7-1 and S7-2 each experienced different levels of selection, and that selection pressure on S7-1 was greater than that on S7-2.
Values for Tajima's D, Fu and Li's D, and Fu and Li's F, as well as haplotype diversity, were evaluated using DnaSP version 5.10 (Table 3). The values for Tajima's D, Fu and Li's D, and Fu and Li's F were all negative for year, host, and geographic location except in locations I and IV. The P-values for Tajima's D and for Fu and Li's D and F were less than 0.01 in the entire population of 111 isolates and less than 0.05 in the maize, location VI, and location VIII subpopulations. This result suggests that the RBSDV populations were expanding (P < 0.01). The maize, location VI, and location VIII subpopulations were in a state of significant expansion (P < 0.05). The other subpopulations were also expanding, but not significantly. The values of haplotype diversity for S7 ranged from 0.8330 to 1.000 in different subpopulations. Such high values for haplotype diversity also indicate that the RBSDV populations were expanding.

Genetic differentiation and gene flow between subpopulations
In the present study, genetic differentiation and gene flow between RBSDV subpopulations, including years, hosts, and geographic locations, were analyzed. The P-values for Ks*, Z, and Snn calculated from RBSDV S7 subpopulations derived from 2013 or 2014 and the subpopulations derived from maize or rice were greater than 0.05. These results suggest that genetic differentiation was not significant between subpopulations defined by year or host (Table 4). However, genetic differentiation of six particular groups of subpopulations reached significant or very significant levels (Table 4). These six groups were derived from the combinations of locations I and III, I and VII, I and VIII, III and V, III and VII, and IV and VII. The absolute values of Fst for subpopulations based on years, hosts, and geographic locations were less than 0.33, indicating gene flow between subpopulations of RBSDV (Table 4). Gene flow was the most frequent across years, because the absolute Fst values for subpopulations comprised of 2013 and 2014 were the smallest. The absolute values of Nm for subpopulations comprised of 2013 and 2014, maize and rice hosts, and 24 groups based on geographic locations (except for combined locations I + III, I + VII, IV + VII, and V + VII) were greater than one (Table 4). This result suggests that gene flow occurred between years and among most geographic locations.
The absolute values of Nm were greater than four in some subpopulations, such as 2013 and 2014, hosts maize and rice, and combined geographic locations I and V, II

Fig 4. Neighbor-joining phylogenetic tree based on the nonrecombinant nucleotide sequence of S7 from different RBSDV isolates. The number of bootstrap replicates was set to 1000. Only bootstrap values > 50% are shown. Red lines represent the isolates that clustered into subgroup AI; pink lines represent the isolates that clustered into subgroup AII; black lines represent the isolates that clustered into subgroup BI; blue lines represent the isolates that clustered into subgroup BII.

Table 2. Nonsynonymous-to-synonymous substitution ratio for S7-1 and S7-2 sequences from RBSDV.

Discussion
MRDD is a serious viral plant disease in the Yellow and Huai River summer maize-growing region of China, in which winter wheat is also grown [57][58][59]. Maize and rice are infested naturally by the small brown planthopper (SBPH) viral vector that overwinters on winter wheat [60,61]. The SBPH also migrates between regions in China and infests maize or rice [1,62], so variation in the virus could occur during migration and reproduction of this vector. The genetic diversity of the virus might be supported by its frequent transmission by the SBPH vector in maize and rice hosts in these eight geographic locations in 2013 and 2014. In the present study, the levels of nucleotide diversity observed in these isolates were similar in maize and rice, independent of geographic locations or years. However, the differences in observed nucleotide diversity between years and among some geographic locations reached significant levels (P < 0.05). It is therefore possible that nucleotide diversity differs more between the two years and among the eight geographic locations than between the two hosts.
High levels of adaptation of codon usage have been reported for several viruses including those in the family Flaviviridae, which infect humans, and in other viruses that infect bacteria and humans [63,64]. A detailed comparative analysis was performed to evaluate the level of codon usage bias occurring in RBSDV S7 sequences. In general, RBSDV S7 exhibits a low degree of codon usage bias (average Nc, S7-1: 45.63, S7-2: 39.96), thus mutational bias is likely to be the major force driving codon usage bias in RBSDV S7. The nucleotide composition of these genes provided evidence of mutation as the major factor influencing the codon usage bias between S7-1 and S7-2 but not towards convergence. This result is consistent with previous reports showing that mutational bias is the major force that affects codon usage in other viruses [20,65,66]. Previous studies have shown that protein secondary structure and genomic architecture also influence codon usage bias in plant viruses [67]. Combining information from the conserved sequence of RBSDV S7 and its codon usage pattern, an RNA interference (RNAi) vector could be constructed and used to transform maize for disease resistance.
Previous studies have shown that the RBSDV population in China can be organized into three groups based on S8 sequences [1], or into two groups based on S9 [32] and S10 sequences [1,2], regardless of host or geographic origin. In the present study, 111 Chinese S7 isolates also clustered into two groups without regard to host or geographic location. This result also conforms with the results of a previous study on RBSDV S9 [32]. However, in the present study, years influenced the grouping to some degree. Within group A, AI was widely distributed, while AII was comprised of the isolates from only 2013. Some isolates from 2014 clustered into subgroup BII. These results provided direct evidence of the irrelevance of hosts or different geographic locations but the relevance of years in regard to genetic variation among RBSDV isolates. Correspondence analysis of relative synonymous codon usage revealed a relationship between the phylogeny and the first and second axes of S7-1 and S7-2. These results suggest that the phylogenetic clusters are correlated with the values for CAI, Nc, GC3s, GC, A3s, C3s, and G3s.
Population genetic structure is a significant aspect influencing the evolution of plant viruses and several studies of the genetic structure of plant virus populations have been reported. However, genetic structure had rarely been studied in S7 sequences from RBSDV or other segments of similar viruses harboring two ORFs. Frequent gene flow events were detected between the subpopulations comprised of two years, two hosts, and most of the geographic locations analysed, especially among year and host subpopulations in China. These results suggest that gene flow between years was more frequent than that between hosts, and that gene flow between geographic locations was the lowest. Because only S7 sequences were investigated in the present study, more evidence from other segments of RBSDV should be gathered to verify this hypothesis.
In conclusion, the genetic structure and codon usage bias of RBSDV S7 sequences were determined for 111 Chinese isolates from maize and rice hosts obtained from eight geographic locations in 2013 and 2014. Genetic variation and genetic structure were analysed for the RBSDV S7 dsRNA sequence that is comprised of two ORFs. Further, the present study represents the first time that codon usage bias in RBSDV has been analysed. These results should help elucidate the evolution of this virus and promote further exploration of the relationship between this virus and its hosts.

Supporting Information
S1 Fig. Basic characteristics of codon usage in S7-1 and S7-2. (a) Values for Nc, CAI, CBI, GC3s, and GC in S7-1 and S7-2 in data for two years, two hosts, and eight geographic locations are shown. (b) Values for G3s, A3s, C3s, and U3s in S7-1 and S7-2 in data for two years, two hosts, and eight geographic locations are shown. (TIF)
S1