Banana bunchy top virus genetic diversity in Pakistan and association of diversity with recombination in its genomes

Banana Bunchy top virus (BBTV) is a multipartite circular single strand DNA virus that belongs to genus Babuvirus and family Nanoviridae. It causes significant crop losses worldwide and also in Pakistan. BBTV is present in Pakistan since 1988 however, till now only few (about twenty only) sequence of genomic components have been reported from the country. To have insights into current genetic diversity in Pakistan fifty-seven genomic components including five complete genomes (comprises of DNA-R, -U3, -S, -M, -C and -N components) were sequenced in this study. The genetic diversity analysis of populations from Pakistan showed that DNA-R is highly conserved followed by DNA-N, whereas DNA-U3 is highly diverse with the most diverse Common Region Stem-loop (CR-SL) in BBTV genome, a functional region, which previously been reported to have undergone recombination in Pakistani population. A Maximum Likelihood (ML) phylogenetic analysis of entire genomes of isolates by using sequence of all the components concatenated together with the reported genomes around the world revealed deeper insights about the origin of the disease in Pakistan. A comparison of the genetic diversity of Pakistani and entire BBTV populations around the world indicates that there exists a correlation between genetic diversity and recombination. Population genetics analysis indicated that the degree of selection pressure differs depending on the area and genomic component. A detailed analysis of recombination across various components and functional regions suggested that recombination is closely associated with the functional parts of BBTV genome showing high genetic diversity. Both genetic diversity and recombination analyses suggest that the CR-SL is a recombination hotspot in all BBTV genomes and among the six components DNA-U3 is the only recombined component that has extensively undergone inter and intragenomic recombination. Diversity analysis of recombinant regions results on average one and half fold increase and, in some cases up to four-fold increase due to recombination. These results suggest that recombination is significantly contributing to the genetic diversity of BBTV populations around the world.


Introduction
Banana bunchy top disease (BBTD) is the most common and devastating viral disease of Banana, predominantly found in Pacific and Asian regions. It has been considered one of the most important plant viral diseases around the world [1]. In Pakistan, BBTD was first observed in 1988 in Sindh province. Later, based on symptomology it was identified in 1991 [2]. BBTD is caused by Banana Bunchy top Virus (BBTV), which belongs to genus Babuvirus in the family Nanoviridae [3]. BBTV is transmitted persistently by the aphid Pentalonia nigronervosa Coq, its sole known vector, and infects only the members of Musaceae [4].
Banana bunchy top virus (BBTV) is a multipartite circular single stranded (css) DNA virus comprising of six css DNA components each of about 1Kb in size [5,6]. BBTV genome consists of six integral ssDNA components, including DNA-R, C, M, S and -N which encode master replication initiation protein (M-Rep), cell cycle link protein (Clink) movement protein (MP), capsid protein (CP) and nuclear-shuttle protein (NSP), respectively while DNA-U3 encodes a protein for which no function has been assigned so far [5,[7][8][9][10]. Two conserved notable functional regions exist in all six DNA components of BBTV, which include the common region stem-loop (CR-SL), which is about 70 nucleotides long and contains a 31 nucleotides origin of replication (ori) for proposed rolling-circle replication [11,12]. Second region which is more than 70% identical in six components, and about 90 nucleotide long, is the common region major (CR-M) [5,13] to which small ssDNA primers bound and initiate complementary-strand DNA synthesis after entering a host cell [14] and is under the course of concerted type evolution [15,16].
Occurrence of genetic variations due to recombination is a well-established phenomenon in BBTV [17]. This phenomenon is of particular significance in the evolution of viruses as it provides the possibility of natural recombination events, which leads to extensive viral diversity [17]. In geminivirus evolution, the importance of recombination is well recognized [18][19][20] as it is the most probable mechanism responsible for the genetic diversification of agriculturally important begomovirus species [21,22]. Whilst generating the descendants with increased fitness, recombination is also a source for increased genetic diversity in begomoviruses [17]. In viruses, recombination breakpoints identification is a useful way to detect circulating recombinant forms and to infer the underlying recombination mechanisms such as intra and intergenomic recombination [23,24]. Although sequence analysis of individual BBTV components [9,10,15,16,[25][26][27][28][29][30] and complete BBTV genomes [31][32][33][34][35][36] revealed evidence of intra and intergenomic recombination, but no study related to the understanding the contribution of these recombination mechanisms in genetic diversity has been reported so far for BBTV. Previous molecular analysis of BBTV from Pakistan has been very limited with only one partial genome [37] and a few DNA components [16,25,37] have been sequenced from different districts of Sindh province. Thus, earlier phylogenetic analyses within the country performed on BBTV have used sequences of individual BBTV components rather than full genomes.
In the current study, we reported the sequence of complete genomes of five BBTV isolates originated from different districts of Sindh, Pakistan. The diversity analysis and putative recombination events were studied for Pakistani isolates and then for sequences available in the public database in GenBank. Also, the contribution of genetic diversity by recombination, recombination hotspots and population genetics are studied for BBTV genomes to better understand the evolution of this virus in various geographic regions around the globe.

DNA extraction, PCR amplification, cloning and sequencing
BBTV infected plant material showing typical BBTD symptom was collected from district Tandojam, Sindh province in 2006 for P.TJ1 isolate (for which DNA-R was previously reported by Hyder and colleagues (2007)), and in 2007, for P.BS1 & 2, P.GH1 & 2, P.HD1 & 2, P. JS1 (the DNA-U3 for P.GH1, P.JS1 and P.HD1 were previously reported by Hyder and colleagues (2011)), P.KP1 & 2, P.MT1 & 2, P.NS1, P.TA1 & 2 from Tandojam, Bhitshah, Ghotki, Hyderabad, Jamshoro, Khairpur, Matiari, Nawabshah and Tandoadam districts of the Sindh respectively. The P.TJ3, P.NARC and P.Sakrand & P.TJ4 were isolated during 2011, 2017 and 2018 respectively from Tandojam district. The total genomic DNA extraction was performed using the CTAB method as described by Hyder et al., (2007). PCR amplification and DNA sequencing were performed with two pairs of adjacent outwardly extending abutting primer each specific for a different location in each genomic component. Using this technique multiple reads in sequencing every component was generated and the sites where one set of primer binds, were sequenced using 2nd pair of primer and vice versa. for each genomic component of BBTV (Table 1).  (F80lacZΔM15), hsdR17, recA1, endA1, gyrA96, thi-1, relA1) cells by electroporation. While for those isolates sampled during 2007, the PCR products were ligated directly into the pGEM1-T Easy Vector (Promega, Madison, WI) and cloned in E. coli DH5α cell. The clones were selected using 50 μg/ml ampicillin and screened for white colonies generated by insertional inactivation of functional beta galactosidase gene whose expression was induced by IPTG (Isopropyl β-d-1-thiogalactopyranoside) converting chromogenic X-GAL (5-bromo-4-chloro-3-indolyl-β-Dgalactopyranoside) into blue color in non-recombinant bacterial cells. The plasmid DNA was extracted using a minipreparation protocol according to Sambrook and Russell [38], and confirmed by digestion with appropriate restriction endonucleases. For every component of an isolate, both the strands of two individual clones originated from two different primer sets of the respective component were sequenced. The genomic components of P.TJ1 isolate was sequenced using the CEQ Dideoxy Dye Terminator cycle sequencing Kit (Beckman Coulter, USA) and the CEQ 8000 Genetic Analysis System (Beckman Coulter, USA) by universal M13 and insert-specific primers to obtain full length sequence of each strand. The isolates sampled during 2007 were processed in a similar way and sequenced at sequencing facility of Iowa State University, U.S.A. Both the strands of the PCR products of two different primer sets specific for the same genomic component, from each isolate, sampled in 2011, 2017 and 2018 were directly sequenced using commercial DNA Sequencing Facility of Macrogen (Macrogen, Inc. South Korea). The sequence data of both the strand of a clone/PCR product was assembled first using DNA Dragon Contig Sequence Assembly Software (version 1.5.0) SequentiX-Digital DNA Processing (Germany) (https://www.sequentix.de/software_dnadragon.php). Then the consensus of sequencing, originating from two different primer sets, was developed for each component of an isolate. The consensus sequence of each sequenced component from an isolate was submitted in GenBank. The accession numbers were (MK140625, MK140626,  MK140627, FJ859727, FJ859728, FJ859722, FJ859733, FJ859734, FJ859732, FJ859723,  FJ859724, FJ859729, FJ859730, FJ859731, FJ859725, FJ859726, JX170762 for DNA-R),  (MK140628, MK140629, MK140630, JX1700764 for DNA-U3), (MK140619, MK140620,  MK140621, EF593169, FJ859740, FJ859741, FJ859735, FJ859746, FJ859747, FJ859745,  FJ859736, FJ859737, FJ859742, FJ859743, FJ859744,

Sequence retrieval and genetic diversity
The full-length sequences of all components of BBTV genome were obtained from NCBI Gen-Bank using Taxonomy Browser till April 05, 2020 and given abbreviated names for convenience (S1 Table). The coding and regulatory regions, including CR-SL and CR-M were identified as described by Burns and colleagues (1995), and the sequences were started from the start of major ORF identified by Herrera-Valencia and colleagues (2007) [39] in each component for alignment through MAFFT version 6.864 [40]. The genetic diversity values of each BBTV component and subgenomic regions were determined on the aligned sequences first for Pakistani isolates and then worldwide. The nucleotide diversity range analysis was performed using MatGAT [41], while Pair-wise average nucleotide diversity per 100 sites π (Pi) and Watterson estimator θw (Theta-w) for population mutation rates per 100 sites were determined using DnaSP version 6.12.03 [42].
To study the relationship BBTV isolates of Pakistan with BBTV isolates from other parts of the world, a Maximum Likelihood (ML) phylogenetic genomic tree based on the concatenated nucleotide sequences of entire genomic components (i.e. DNA-R, U3, S, M, C and N concatenated together in single sequence for an isolate) of those isolates for which full genomic sequences were available in the GenBank, was constructed with 1000 bootstrap replicates. In addition, individual component-based ML phylogenetic analyses of full-length sequences of each component separately was also performed in the Molecular Evolutionary Genetics Analysis program (MEGA), version X [43] with 1000 bootstrap replicates. The phylogenetic trees were visualized using FigTree version 1.4.3 [44] (http://tree.bio.ed.ac.uk/software/).

Selection pressure analysis
The selection pressure was estimated by d N /d S ratio, where d N represents the average number of nonsynonymous substitutions per nonsynonymous site and d S is the average number of synonymous substitutions per synonymous site. In MEGA X [43]. d N and d S values were estimated separately by using Nei-Gojobori method (Jukes-Cantor). The gene is under neutral selection when dN/dS ratio = 1, positive (or diversifying) selection when the dN/dS ratio is > 1 and negative (or purifying) selection when dN/dS ratio < 1.

Neutrality test
DnaSP version 6.12.03 [42] was used for testing Tajima D [45], Fu and Li's D � and F � [46] number of haplotypes (H), haplotype diversity (Hd) and nucleotide diversity (π). Tajima's D test in a genomic region measures the departure from neutrality for all mutations. Tajima's D test is based on the differences between the number of segregating sites and the average number of nucleotide differences. Fu and Li's D � test is based on the differences between the total number of mutations and number of singletons (mutations appearing only once among the sequence). Fu and Li's F � test is based on the differences between the number of singletons and the average number of nucleotide differences between every pair of sequences. Haplotype diversity refers to the frequency and number of haplotypes in the population while nucleotide diversity estimates the average pairwise differences among sequences.
The contribution of genetic diversity by recombinants in their respective populations was determined by selecting the recombined genomic regions for each recombination event in alignments of the respective component as identified by RDP and then selecting recombination population in the ML trees from the respective geographic location, aligning only the recombinant regions using MAFFT software and then calculating the genetic diversity range, π (Pi) and Watterson estimator θw (Theta-w) values as described earlier for the respective recombinant population with and without recombinant sequences.

Genetic diversity of BBTV population
Total fifty-seven genomic components including five complete genomes of BBTV were sequenced from Pakistan during 2006 to 2018 (accession numbers given in Table 2). The genetic diversity analysis of these components along with other reported isolates of BBTV from Pakistan [16,25,37], showed that DNA-R is highly conserved followed by DNA-N, whereas DNA-U3 is highly diverse followed by DNA-C. Similarly, for coding regions, DNA-R is highly conserved followed by DNA-M, while, DNA-C followed by DNA-S are highly diverse (Table 3). However, when different functional regions of BBTV genome such as CR-M and CR-SL are compared with the full-length sequences and coding regions, it is revealed that between BBTV components CR-M is highly conserved while CR-SL is highly diverse in DNA-U3. Contrary to sequences of full-length components and coding regions, CR-M in DNA-R is highly diverse followed by DNA-C. In a similar contradiction, DNA-U3 bears the most conserved CR-M region. The sequence diversity in the CR-SL region of Pakistani BBTV population reveals very interesting findings, the DNA-R, -S, -M and -C have identical CR-SL and there exists no divergence, however, DNA-U3 and N showed diverse CR-SL region. While in DNA-N the genetic diversity was lower than DNA-U3 in this region ( Table 3). The BBTV population in Pakistan is considered to have a monophyletic origin and supposed to have been originated from a single introduction of BBTV in the country [25,37] that took place before 1988 when BBTV was first observed in the Sindh [2]. The current diversity analysis (Table 3) suggests that BBTV components in Pakistan, are not evolving at a similar rate as some components are quite conserved i.e. DNA-R and DNA-N, while some have more diversification (i.e. DNA-U3). Interestingly, within a component, the divergence is not uniform, the most conserved component i.e. DNA-R has the most diverse CR-M, while DNA-U3 has CR-SL that is the most diverse functional region in the entire genome of BBTV.
Once having these insights about the genetic diversity of BBTV population in Pakistan, the genetic diversity of entire BBTV populations around the world was also calculated in similar way. The full-length sequences (1425 full-length components) from various BBTV isolates were obtained from GenBank using Taxonomy browser (S1 Table) and their diversity was calculated for two major groups of BBTV population i.e., the South Pacific and Asian groups [26] now referred to as Pacific Indian Ocean (PIO) and the Southeast Asian (SEA) group respectively [57] along with the entire world population. The analysis (Table 4) revealed that PIO group is more conserved compared to the SEA group, a finding that is consistent with the previous studies [7,9,26,33,58]. Among components DNA-R is the most conserved, while DNA-U3 is the most diverse component (Table 4) and the coding region analysis also follows the same trend. However, contradictory to the diversity analysis of BBTV-Pakistan, DNA-U3 has the most diverse CR-M in the entire BBTV population, while DNA-C has the most conserved CR-M region. The DNA-N has the most conserved CR-SL region in the entire population, while, like Pakistani population, DNA-U3 also has the most diverse CR-SL region. These analyses clearly suggest that different components and their functional part in BBTV genome are harboring different extent of genetic diversity and this diversity is related to different broader geographic regions such as PIO and SEA or to a country such as Pakistan.

Phylogenetic analysis
The phylogenetic relationship of BBTV isolates from different banana growing regions was analyzed to determine the relationships of isolates from Pakistan with those from worldwide. For this purpose, a concatenated genome-based ML phylogenetic tree was generated. The tree includes a total of 132 BBTV genomes out of which 107 were from PIO and 25 were from SEA group. The phylogenetic analysis showed close homology of Pakistan isolates with each other making a single clade with close relationships with isolates from Congo, Sri Lanka, Rwanda, Malawi, Burundi and India with a bootstrap support of 78% (Fig 1)     Note: Percent identity range analysis indicates the minimum to maximum percent nucleotide identity values obtained after pair-wise comparison of isolates in a given population. π (Pi) denotes pair-wise average nucleotide diversity per 100 sites along with standard deviation, while θw of BBTV isolates into two major clusters (PIO and SEA group) [57]. Interestingly, though Pakistani BBTV population phylogenetically belonged to PIO based on all the genetic components however, their genetic diversity pattern (Table 3) is quite distinct compared to the PIO  group (Table 4). Notably, the most conserved component in BBTV Pakistan is DNA-R, while in PIO it is DNA-C, while most conserved functional region in BBTV Pakistan is CR-SL (identical in DNA-R, -S, -M and -C) while in PIO it is CR-M of DNA-C.

Population genetics and selection pressure
To understand the evolutionary selection pressure on various BBTV genomic components, the nonsynonymous to synonymous substitutions rate per site were calculated (Table 5)   shows different populations in one event detected in RDP. Asterisk sign ( � ) shows only one isolate was involved in recombination at the given node. Arrows on both sides indicates that all the isolates were involved in respective recombination event. Blue arrow on both sides indicates that all the isolates of PIO group were involved in respective recombination event. The recombinant events described in S2 and S3 Tables are marked. Due to recombination different isolates showed intra and intergenomic events at different nodes.
The DNA-U3 encodes U3 protein for which any function is not yet been determined, however, it is an integral [5] and essential component for infection as demonstrated by infectivity assay for nanoviruses [61,62]. Despite being essential component, the higher dN/dS values noted for DNA-U3 in some countries where rest of the components are under purifying selection, indicates that some significant factors are at play differently for DNA-U3. Interestingly, three out of four DNA-U3 components analyzed in this study were previously reported to have undergone recombination in Burundi [31], similarly, all of the DNA-U3 components of India and most of the components from Samoa, USA and Australia have also been reported for recombination [31] thus pointing toward the possible role of recombination in observed positive selection seen in DNA-U3 in some countries.
Similarly, the Indian subcontinent is noted as a major hub [31] of long-distance banana and BBTV movements. Stainton and colleagues (2015) indicated it as both the major donor location (for BBTV dispersal events to other parts of the world) and the major recipient location (of virus introductions in it). In general, BBTV isolates from the PIO group are found throughout the natural geographical range of Musa balbisiana, whereas isolates from the SEA group are found throughout the ranges of M. balbisiana and M. acuminata [63]. However, the banana germplasm of some Indian regions primarily comprised of hybrids between M. balbisiana from the Indian subcontinent and M. acuminata from Southeast Asia [63,64]. This might explain the neutral pressure seen in DNA-U3 in India isolates. Different selection pressure was also observed among different banana genome types [30]. Later, Chiaki and colleagues (2015) also noted that selection pressure was higher in viruses infecting banana varieties with the AAB or ABB genotypes than those infecting with AA or AAA genotypes. The data of selection pressure in present study, suggests that BBTV genome is under negative selection pressure, however, the coding regions of DNA-U3 in some geographic regions around the word are under neutral and positive selection which might be due to mixing of isolates from PIO and SEA groups, and/or due to virus propagation in hosts of different genetic backgrounds and/or due to recombination.
Haplotype diversity values were high for BBTV components ranging from 0.4-1.0 (Table 6). Tajima's D values were calculated to determine the impact of demographic expansion and contraction in different BBTV DNA component populations. Negative values with statistical significance (P<0.02 or P<0.01) were obtained for DNA-R and DNA-C (Tonga population) and in the Australian population of DNA-S. These results were further confirmed Table 6. Neutrality test, haplotype and nucleotide diversity of each BBTV population.  (Table 6). Which suggests that these populations may be under expansion phase. On the contrary, no statistically significant positive or negative values were found in the remaining populations of BBTV components suggesting that these populations may be undergoing a neutral or contraction period.

Recombination analysis
A detailed analysis on full-length sequences of BBTV genomic components using various methods implemented in the Recombination Detection Program (RDP), reveals many recombination events of intergenomic (homologous recombination between the same components) (S2 Table) and intragenomic (heterologous recombination between the two different components) (S3 Table) in BBTV genome. There are about fifty-four (66%) intergenomic recombination events out of total eighty-two events, while there are twenty-eight (34%) intragenomic recombinant events. Component wise, DNA-U3 is involved in the majority (thirty-five events, about 43%) of detected recombination events, followed by DNA-M (thirteen events, about 16%) while DNA-R and -N are involved in only nine (11%) events each. The data of recombination occurrence for each component revealed that DNA-S which encodes coat protein of BBTV has the least recombination occurrence of 1.8% (5 recombination events from 274 components) followed by DNA-R with 2.6% (9 recombination events detected in 345 components) ( Table 7). It is worthy to note that DNA-U3 showed the highest 16.8% (35 recombination events occurred in 208 component) occurrence of recombination in BBTV genome which is about 9 times higher than the least recombined component of DNA-S. Interestingly, the DNA-U3 which was found to have the greatest genetic diversity (Table 4) shows the highest involvement in the recombination events occurring in BBTV genome, while DNA-R that is the most conserved component, is also among the least recombined components in BBTV genome.
To understand the contribution of recombination in BBTV genetic diversity, the recombined regions identified by RDP for the respective events were selected and the diversity

Component Geographical group Fu & Li's D � Fu & Li's F � Tajima's D Nucleotide diversity (π) Number of Haplotype (H) Haplotype diversity (Hd)
Taiwan (n = 9) -1. analysis of these geographic populations with and without recombinants was performed. The analysis (Table 8) revealed that recombination is responsible to increase the genetic diversity of many of the geographic populations which harbor recombinant isolates. In some instances (recombination event (1), Table 8 Table 8) the diversity of recombinant population is less than their non-recombinant population, however, the average of the entire dataset (for which recombinant and non-recombinant population of certain geographic region was available) showed a net contribution of about 1.4 times suggesting a significant contribution by recombination in BBTV genetic diversity. The frequency of recombined regions in BBTV genome was analyzed by plotting them on various components (Fig 3) using TJ1 as a reference for nucleotides coordinates. The data show that DNA-U3 is highly involved both in inter and intragenomic recombination and DNA-R and DNA-S are least recombined components. In the BBTV genome, CR-SL, a functional regulatory region is identified as the recombinant hotspot for both intragenomic and intergenomic recombination.

Discussion
Understanding the genetic structure of virus populations and their evolutionary mechanisms is an important aspect of managing viral diseases [65]. In Pakistan, information on the molecular analysis of BBTV was very limited with only a partial genome [37] and few full-length DNA components sequenced and deposited within the public database in GenBank [16,25,37]. However, inferences based on full-length BBTV genomes suggest that genetic exchange by recombination and reassortment might have played an important role in BBTV evolution    around the world [31]. Therefore, fifty-seven genomic components including five complete genomes were sequenced in this study from Pakistan and their genetic diversity and recombination analyses was performed.
In the present study to understand existing variability in BBTV population, a detailed analysis of genetic diversity of its all component was performed first for the Pakistani population and later for the entire population of BBTV in the world. The genetic diversity data of Pakistani population (Table 3) Table 3). An earlier study of recombination in CR-SL of DNA-U3 from Pakistan has also verified the diverse nature of this subgenomic regulatory region [16]. The analysis also indicated the heterogeneity of genetic diversity values associated with each component and their different parts, the overall diversity remained relatively small compared to the previously reported values for the Pacific Indian Ocean group [4,26,58]. Which has been split from the total population of BBTV probably a much larger time than the first introduction of BBTV in 1988 in Pakistan.
The genetic diversity of entire BBTV population around the world (Table 4) was also calculated and compared with the diversity analysis of Pakistani population. In contrary to the existing BBTV diversity, diversity analysis of Pakistani BBTV population showed that the CR-M of DNA-U3, is the most diverse CR-M in the entire BBTV population. The analysis also confirmed the heterogeneity of genetic diversity values associated with each component and their different parts, through the diversity values (ℼ) for full-length sequences ranged between 4.42 to 8.54 pair-wise average nucleotide diversity per 100 sites, with Watterson estimator (θw) for population mutations rate of 6.61 to 11.07 per 100 sites, greater of an order of 10 to 20 times than the Pakistani population, suggesting that other factors might be at play for this observed diversity. Multiple sequence alignment of full-length DNA-R component worldwide shares 92-100% and 89.8-100% sequence identity ( Table 4) with members of PIO and SEA group respectively. Which shows on average about 3% and 7% increase in PIO and SEA group as compared to earlier studies [15,26,33,58] (Table 3). While multiple sequence analysis of full-length components in PIO showed DNA-C as most conserved component (92.4-100%) and CR-M of DNA-C as conserved functional region (92.1-100%) ( Table 4). These differences of genetic diversity in various components of Pakistani BBTV population compared to PIO group might be due to various factors such as differential evolutionary pressure on important conserved proteins encoded by respective components or due to reassorted genomic components introduced in the country belonging to different regions. However, the phylogenetic analysis of individual genomic components (Fig 2, S1-S5 Figs and full entire genomic ML phylogenetic analysis (Fig 1) indicate that all Pakistani isolates clustered together as a clade of BBTV indicating that they have a common origin. This observation argues strongly against any possible reassortment. Therefore, the observed difference in the genetic diversity values of Pakistan isolates compared to PIO group, is not due to different origin rather it might be due to different evolutionary pressure on each component and/or possible recombination. The phylogenetic analysis of full-length DNA-R component (S1 Fig) showed that Pakistan isolates have closed homology with Egyptian and Indian isolates. Which is similar to previous studies [25,37,66] based on DNA-R full-length where Pakistan isolates have close homology with Egypt, India and Australia isolates. Similar analysis for other fulllength BBTV components (DNA-U3, -S, -M and -N) indicated close homology of Pakistan isolates with India (Fig 2, S3-S5 Figs). While in DNA-C (S4 Fig) the order of homology of Pakistan isolates with other PIO isolates was Egypt, Sri Lanka and India respectively. Notably the phylogenetic analysis of concatenated sequences of BBTV (Fig 1) showed clustering of Pakistan isolates within PIO which shows its homology with members of this region. However, within the PIO, Pakistan isolates have close homology with isolates from India, Congo, Sri Lanka, Rwanda, Malawi, and Burundi. Based on entire genome wise and individual component wise (DNA-U3, -S, -M and -N) ML analyses it is evident that Pakistani isolates have very close relationships with the Indian isolates (except for DNA-R and DNA-C component where they are closer to Egyptian isolates than Indian). The previous phylogenetic analyses from Pakistan based on Neighbor-Joining phylogenetic analysis of DNA-R only however, indicated that Pakistani isolates were closer to Egyptian isolates [25,37]. Based on more rigorous Maximum Likelihood phylogenetic analyses of entire as well as other genomic components (i.e. DNA-U3, -S, -M and -N) it is evident that Pakistani isolates are closely related to Indian isolates than rest of any country in the PIO group. The tendency of BBTV to induce genetic diversity is perhaps relevant for their ecological fitness. Selection pressure is an important estimator of evolutionary constraints imposed on coding regions [67]. The dN/dS ratio for coding regions (Table 5) of BBTV components showed that the population of DNA-R, DNA-S, DNA-M, DNA-C, and DNA-N components have strong purifying/negative selection while DNA-U3 have either positive or negative selection, which corroborates with the results of earlier studies for component R and S [30,64]. In a comparative susceptibility study between two banana cultivars (Dwarf Brazilian AAB and Williams AAA) to BBTV, Dwarf Brazilian showed lower percentage of BBTV infection (39%) as compared to Williams (79%) in field experiments [68]. Based on their observation, Hooks and colleagues (2009) hypothesize that one or more morphological differences between the two cultivars might impact P. nigronervosa ability to inoculate BBTV. Banana pseudostem is waxy and banana aphid prefers to feed in pesudostem so the differences in wax content or composition between the two cultivars may lead to observed differences in virus transmission in their studied varieties. Furuya and colleagues (2012) [69] in a susceptibility study between Dwarf Cavendish (AAA) and Itobasho (BB) also observed reduced virus transmission in itobasho variety. Our analysis exhibited an excess of synonymous over nonsynonymous substitutions, indicating strong purifying (negative) selection as an additional mechanism constraining genetic variation [70]. BBTV is disseminated by aphid in a persistent manner, hence it is admissible that the respective constraint is inflicted on the CP gene to circumvent the accumulation of deleterious mutations which might be able to impede the virus-vector complex interactions [64,71]. Majority of substitutions in DNA S were synonymous. As the sequence of DNA-S is conserved among all areas, the role of CP is crucial for the endurance of virus in the host plant [30]. Another possible sign of nucleotide encoding protein sequence having an impact on recombination patterns in BBTV is that the DNA-U3 component, which has no confirmed protein coding function, has a higher concentration of detectable recombination breakpoints (S2 and S3 Tables, Fig 3) than those of the known protein coding genes of other components. Remarkably, BBTV DNA-U3 seems to be most frequently exchanged by reassortment [31]. The presence of higher frequencies in the respective components depicts that it is substantially evolving neutrally without any risk that the recombinants might express defective chimeric proteins [72,80]. Therefore, there is little conservation of coevolved epistatic interactions within the component. In comparison to the full-length sequences, low genetic diversity was observed in the coding regions (Table 6) delineating that recombination is not only confined to coding regions but it also proliferates to CR-M and CR-SL (Fig 3) in BBTV.
To evaluate natural selection at the population level significant negative values of neutrality test in different BBTV components (Table 6) suggest population expansion that is consistent with the previous studies for BBTV components R and S [30]. This suggests that whenever a population is increasing the number of segregating sites will increase more rapidly than that of nucleotide diversity leading to a negative test value. Positive values in few populations of BBTV components indicate that during balancing selection, alleles are kept at intermediate frequencies because there will be more pairwise differences than segregating sites [73]. This is due to considerable sequence heterogeneity that imparts a reservoir of virus variants in the population. It enables a significant adaptation to the changing environmental conditions. Therefore, the gene flow provided by the recombination exploit the mechanism to ameliorate their evolutionary tendencies and local adaptation.
Recombination is a major contributor to genetic diversity [18,74] and genetic variations [75] in ssDNA viruses. Recombination analysis of BBTV components based on nucleotide sequence showed that DNA-U3 exhibits complex intra-and inter recombination patterns ( Fig  3) with a recombination occurrence (Table 7) of about 16.8% which is about 9 times more than the least recombined DNA-S component (1.8%). While the recombination occurrence for other BBTV components ranged from 2.6%-7.92%. These observations suggest that recombination in components such as DNA-U3 may be selectively more favorable than recombination in components such as DNA-R, -S, -M, -C and -N [31]. DNA-R and DNA-S showed more frequent intergenomic recombination than intragenomic with a percentage of 1.73 and 1.45 respectively. Few incidences of recombination in DNA-R and DNA-S suggest that because of conserved nature and core functions [16,64,76] they are more prone to intergenomic rather than intragenomic recombination.
The contribution of genetic diversity by recombination (Table 8) revealed by analyzing the recombined and without recombination regions for each recombination event detected in RDP. The analysis showed that overall, on average there was about a 56.25% (27 populations out of 48 populations) increase in genetic diversity due to recombination. While within the BBTV components DNA-N (population of Tonga) and DNA-U3 (population of Australia, Tonga, China, Philippines and Congo) showed on average 100% and 90% increased diversity of recombinant populations. which indicates that there is a possibility that with component The genomic components of BBTV are represented with inner circle having CR-SL, CR-M and ORF in black, pink and blue colours respectively. The nucleotide coordinates are corresponding to P.TJ1 isolates for reference. The red circles represent the intergenomic recombination, while the blue circle represent the intragenomic recombination. The intensity of colours was used to depict the frequency of recombination at a particular region using the key above. The recombinant events described in S2 and S3 Tables are marked. The figures were not drawn on scale.
The frequency of recombined regions in BBTV genome undergoing recombination (Fig 3) involved as many as seven times in intergenomic, while as many as four times in intragenomic recombination in various isolates of BBTV. DNA-U3 showed higher frequencies of recombined regions than other components of BBTV and overall recombined regions reside mostly in the CR-SL region. This region is a recombination hot spot in members of the geminiviridae [12,20,72] due to the production of a nick in the stem-loop by Rep which serves as the initiation site for rolling circle replication [79] through host DNA polymerase. Since BBTV also replicates by a rolling-circle mechanism [14], the CR-SL of DNA-U3 may similarly be a region subject to high levels of recombination.
Conclusively, five full genomes have been sequenced from BBTV infected banana plants from Pakistan. Deep insight into the genetic diversity analysis of Pakistani and entire BBTV populations around the world showed some interesting contradictions in the diversity of functional regions which might be attributable to recombination. This study highlights the benefits of characterizing complete BBTV genomes rather than focusing on individual components from Pakistan. This analysis also complements the facts that different BBTV genomic components are diversifying at different rates and within one component, different parts have different levels of divergence. These differences suggest that geographic factors are also playing role in shaping the diversity and evolution of BBTV. Recombination analysis showed DNA-U3 a recombination hot component while CR-SL a recombination hotspot in BBTV genome. This in certain cases leads towards about four-fold increase in genetic diversity in the recombined population.
Supporting information S1