Origins and Domestication of Cultivated Banana Inferred from Chloroplast and Nuclear Genes

Background
Cultivated bananas are large, vegetatively propagated members of the genus Musa. More than 1,000 cultivars are grown worldwide, and they are major economic and food resources in numerous developing countries. It has been suggested that cultivated bananas originated in the islands of Southeast Asia (ISEA) and were developed through complex geodomestication pathways. However, the maternal and paternal donors of most cultivars are unknown, and the pattern of nucleotide diversity in domesticated banana has not been fully resolved.
Methodology/Principal Findings
We studied the genetics of 16 cultivated and 18 wild Musa accessions using two single-copy nuclear genes (granule-bound starch synthase I, GBSS I, also known as Waxy, and alcohol dehydrogenase 1, Adh1) and two chloroplast regions (maturase K, matK, and the trnL-F gene cluster). Phylogenetic analyses showed that all A-genome haplotypes of cultivated bananas grouped with those of ISEA subspecies of M. acuminata (A-genome). Similarly, the B- and S-genome haplotypes of cultivated bananas clustered with the wild species M. balbisiana (B-genome) and M. schizocarpa (S-genome), respectively. Notably, distinct haplotypes of each A-genome cultivar nested with different ISEA subspecies of M. acuminata. Analyses of nucleotide polymorphism in the Waxy and Adh1 genes revealed that, in comparison with the wild relatives, cultivated banana exhibited slightly lower nucleotide diversity both across all sites and specifically at silent sites. However, dramatically reduced nucleotide diversity was found at nonsynonymous sites in cultivated bananas.
Conclusions/Significance
Our study not only confirmed that cultivated banana arose from multiple intra- and inter-specific hybridization events, but also showed that cultivated banana may not have suffered a severe genetic bottleneck during the domestication process. Importantly, our findings suggest that multiple maternal origins and a reduction in nucleotide diversity at nonsynonymous sites are general attributes of cultivated bananas.


Introduction
Cultivated bananas are the fourth most important crop in developing countries, and it has been proposed that they derive from the domestication of genus Musa [1,2]. Previous archaeological and linguistic studies have indicated that cultivated banana was initially domesticated by farmers in Southeast Asia about 7,000 years ago, and subsequently introduced into other regions of the world by transmigrants or travelers [1,2,3]. Nowadays, more than one thousand landraces of domesticated banana are cultivated in the tropical and subtropical regions of the world [1]. To gain a better understanding of the origin and domestication of cultivated banana, a series of studies using morphological and molecular methods has focused on the systematics and classification of members of the genus Musa. For example, Cheesman [4] divided this genus into sections Eumusa (x = 11), Rhodochlamys (x = 11), Callimusa (x = 10) and Australimusa (x = 10) based on morphological traits and basic chromosome number. This classification system was modified by Simmonds [5,6] and Argent [7], who included a new section Ingentimusa (x = 7) due to the different basic chromosome number. There have been extensive discussions about the identities of the progenitors of domesticated banana; M. acuminata and M. balbisiana have been proposed as the wild parents of modern bananas [6]. This hypothesis was subsequently confirmed by genetic studies of the genus Musa which indicated that at least four wild species, M. acuminata (donor of the A genome), M. balbisiana (donor of the B genome), M. schizocarpa (donor of the S genome) and M. textilis (donor of the T genome), have contributed to the gene pools of domesticated bananas [1,8,9]. At present, these four wild relatives are still widespread in the tropical and subtropical regions of Asia.
The species M. acuminata (section Eumusa) is widely distributed in the tropical and subtropical regions of Asia, and at least nine subspecies have been identified: banksii, burmannica, burmannicoides, errans, malaccensis, microcarpa, siamea, truncata and zebrina [10,11,12]. Although no subspecies categories have been designated in M. balbisiana (section Musa), it also exhibits wide variation in morphological characters and is distributed across the tropical and subtropical regions of Asia. In contrast, the other two wild progenitors, M. schizocarpa (section Musa) and M. textilis (section Callimusa), are endemic to Papua New Guinea and the Philippines, respectively, and show no obvious morphological diversification [13]. Cultivated bananas differ from their wild relatives in being seedless and parthenocarpic; that is, the fruit develops without pollination, fertilization or seed development [1]. Although cultivated bananas reproduce through vegetative propagation, they exhibit a high level of morphological diversification in fruit size, shape and color. To provide a framework for banana classification, Simmonds and Shepherd [3] divided cultivated bananas into genotypes AA, AB, AAA, AAB and ABB on the basis of qualitative morphological descriptors and genome composition. This system provides a clear and coherent classification for cultivated banana and has, therefore, been widely accepted.
Although domesticated bananas are of socioeconomic importance, genetic studies on them have been limited by polyploidy and parthenocarpy and by the difficulties inherent in sample collection. To elucidate the systematic relationships and genetic diversity of Musa germplasm, several studies have evaluated the genomic constitution of cultivated banana and its wild relatives using restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers [2,14,15,16,17,18]. These studies have revealed that cultivated bananas originated from the genus Musa through complex geodomestication pathways. To date, although the major stages in banana domestication have been clarified, the maternal and paternal donors of most cultivars are unknown, and the pattern of nucleotide diversity in domesticated banana has not yet been fully resolved.
In this study, to gain a better understanding of the origin and domestication of cultivated banana, we performed genetic analyses of 16 banana cultivars and 18 wild Musa accessions using two single-copy nuclear genes: granule-bound starch synthase I (GBSS I or Waxy) and alcohol dehydrogenase 1 (Adh1). The product of Waxy is a key enzyme in amylose synthesis. It has been shown that mutations in Waxy (null alleles) can result in a reduction in amylose content [19]. Adh1 is a member of the alcohol dehydrogenase gene family, whose products catalyze the NAD+-dependent oxidation of alcohols. The Waxy and Adh1 genes are among the plant nuclear regions that have been most intensively investigated in studies of molecular phylogeny and population genetics [20,21,22,23,24,25]. For example, Guzman et al. [25] employed the Waxy gene to evaluate the phylogenetic relationships and origins of different types of wheat and revealed that Iberian spelt has a different origin from other spelts. In addition, Yoshida et al. [22] examined the nucleotide diversity of the Adh1 gene in Oryza rufipogon and found that this gene showed relatively low genetic diversity in comparison with other loci investigated in this species. Taken together, these previous studies have demonstrated that Waxy and Adh1 are well-suited nuclear loci for investigating the molecular phylogeny and nucleotide diversity of domesticated plants. To further infer the paternal and maternal donors of the cultivars investigated in the present study, we also surveyed the sequences of two chloroplast fragments, maturase K (matK) and trnL-F. Our aims are to (i) reveal the origins of, and the domestication process that gave rise to, cultivated bananas and (ii) assess the genetic diversity of domesticated bananas and their wild relatives.

Plant Material
The Musa accessions used in this study were obtained from the Bioversity International Transit Centre (ITC) and our own collections. As shown in Table 1, 16 cultivated accessions were sampled from 12 countries; they represent the six major genotypes (AA, AB, AS, AAA, AAB and ABB). For each genotype, multiple accessions were collected to represent different subgroups and to cover their geographical ranges. Similarly, the wild Musa accessions were selected to cover their natural geographic distributions and represented most of their varieties or subspecies.

DNA/RNA Extraction and cDNA Synthesis
Genomic DNA was extracted from frozen leaf tissue taken from single plants using the DNeasy Plant DNA Extraction kit (Tiangen, Beijing). To determine the structures of the Musa Waxy and Adh1 genes, total RNA was isolated from the unripe pulp of cultivar Haigong (AA genotype) using the RNAiso Plus and RNAiso-mate for Plant Tissue extraction kit (TaKaRa, Dalian). First-strand cDNA synthesis was performed using the PrimeScript™ 1st Strand cDNA Synthesis Kit (TaKaRa) following the manufacturer's directions.

Identification of Musa orthologs and sequence determination
The Musa orthologs of the Waxy and Adh1 genes were identified by performing a BLAST homology search [26] against the Musa EST database at NCBI using the Oryza sativa coding sequences as queries (GenBank accession numbers: Waxy: NM_001065985, Adh1: EF122490). The parameters for BLASTN (http://blast.ncbi.nlm.nih.gov/Blast.cgi) were set as follows: the database was expressed sequence tags (est) and the organism name was Musa. Sequences retrieved in this way were used to design primers with the software Primer Premier 5.0 (Premier Biosoft International, Palo Alto, CA). The positions of exons were determined by reference to the annotated O. sativa sequences. The homologs of matK and trnL-F in Musa accessions were amplified using published universal primer sequences [27,28].
All polymerase chain reaction (PCR) amplifications were performed using a PTC-200 thermal cycler (MJ Research) in 30 µL volumes, each containing the following components: 10-50 ng template DNA, 0.3 mM of each dNTP, 0.6 µM of each primer, 1× LA PCR buffer (Mg2+ plus) and 1.5 units of LA Taq polymerase (TaKaRa). Cycling parameters were: 94°C for 1 min, 30 cycles of (98°C, 10 s; 68°C, 2 to 6 min) and a final extension at 72°C for 10 min. All the amplified products were cloned into the pMD-18 vector (TaKaRa). To obtain all haplotypes for each Musa accession, between four and ten clones were sequenced per accession, and all sequences were determined using an ABI 3730 DNA analyzer (Applied Biosystems). Singleton variants were checked by sequencing multiple clones or by direct sequencing of PCR products.

Data Analysis
Initial sequence editing and assembly were performed using ContigExpress (Informax Inc. 2000, North Bethesda, MD). DNA sequences were aligned in ClustalX 1.83 [29] and, where necessary, the alignments were edited manually in BioEdit 7.0.1 [30].
To infer the wild parents of cultivated bananas, the neighbor-joining (NJ) method for phylogenetic inference was carried out in MEGA version 5 [31], using Kimura's 2-parameter distances [32]. Gaps were treated as missing data and bootstrap support values for the NJ trees were obtained from 1,000 replicates.
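As an illustration of the distance measure named above, the following Python sketch computes a Kimura 2-parameter distance between two aligned sequences, skipping any column with a gap or ambiguous base (one way of treating gaps as missing data). It is not the MEGA implementation; the function name `k2p_distance` and the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns with a gap or ambiguous base in either sequence are skipped,
    i.e. treated as missing data for this pair (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: ignore this column
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical example: one transition among ten comparable sites
d = k2p_distance("A-AAAAAAAAA", "GAAAAAAAAAA")
```

A matrix of such pairwise distances is the input to neighbor-joining; bootstrap support is then obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.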
It has been proposed that AA diploid cultivated bananas were initially domesticated from wild relatives and that triploid cultivars were then generated from diploid cultivars [2]. In addition, our phylogenetic analyses revealed that only the subspecies of M. acuminata that are distributed in the islands of Southeast Asia (ISEA) have donated genomes to domesticated bananas. We therefore estimated the overall nucleotide diversities of the Waxy and Adh1 genes in the A and B genomes of cultivated banana as well as in their respective wild relatives. We also evaluated the nucleotide diversity for the AA, AAA, AAB, M. acuminata (island) and M. acuminata (mainland) groups separately. The nucleotide diversity of the AS and ABB genotypes was not calculated because only one haplotype was obtained from these groups. Genetic analyses of sequence polymorphism were performed using DnaSP version 5 [33]; the values determined included the number of segregating sites (S), the number of haplotypes (H), Tajima's D [34] and Fu and Li's D* and F* [35]. In addition, we surveyed nucleotide diversity π [36] and θ [37] for total, silent and nonsynonymous sites independently; insertions/deletions (indels) were not included in this analysis. To further evaluate how domestication has modulated the genomic constitution of cultivated bananas, we estimated the allele frequencies of the Waxy and Adh1 genes in both M. acuminata (island) and the A-genome of cultivated bananas using DnaSP version 5.
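To make the summary statistics above concrete, the sketch below computes the number of segregating sites (S), per-site nucleotide diversity (π), Watterson's estimator (θ) and Tajima's D from a small alignment, using the standard textbook formulas. It is not DnaSP itself: it excludes every column containing a gap or ambiguous base, does not separate silent from nonsynonymous sites, and the function name and example sequences are hypothetical.

```python
import math
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, pi, theta_w, tajima_d) for a list of aligned DNA sequences.

    pi and theta_w are per site. Columns with any gap or ambiguous base are
    dropped (complete deletion, an assumption of this sketch). tajima_d is
    None when it is undefined (no segregating sites or zero variance).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    cols = [c for c in zip(*(s.upper() for s in seqs))
            if all(b in "ACGT" for b in c)]
    if not cols:
        raise ValueError("no usable sites")
    n_sites = len(cols)
    seg = sum(1 for c in cols if len(set(c)) > 1)  # segregating sites S
    n_pairs = n * (n - 1) / 2
    # k: mean number of pairwise differences over all sequence pairs
    k = sum(sum(1 for x, y in combinations(c, 2) if x != y)
            for c in cols) / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    pi = k / n_sites
    theta_w = seg / a1 / n_sites
    # Tajima's (1989) variance constants
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    var = e1 * seg + e2 * seg * (seg - 1)
    tajima_d = (k - seg / a1) / math.sqrt(var) if seg > 0 and var > 0 else None
    return seg, pi, theta_w, tajima_d


# Hypothetical four-sequence alignment with one site split 2:2
S, pi, theta_w, D = diversity_stats(["ACGT", "ACGT", "ACGA", "ACGA"])
```

In this toy alignment the single variant sits at intermediate frequency, so π exceeds θ and D is positive; an excess of rare variants would push D below zero.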

Gene structures and primer sequences for Waxy and Adh1
To determine the exact structures of the Waxy and Adh1 genes in Musa, we sequenced cDNAs from the two single-copy nuclear genes and predicted exon and intron boundaries by comparing them with the genomic sequences. Details of the Waxy and Adh1 gene structures and the primer sequences used are given in Table S1 and Figure S1.

Genealogical patterns
To address the systematic relationships among these Musa accessions, phylogenetic analyses were performed for the Waxy, Adh1 and combined chloroplast DNA (cpDNA) datasets separately. Although some clades exhibited low bootstrap support (<50%), the phylogenetic trees of both the Waxy and Adh1 genes consisted of three major clades (Figures 1 and 2). We also identified the putative paternal and maternal donors of each cultivated accession based on phylogenetic analyses of the nuclear and cpDNA datasets (Figures 1, 2 and 3). For instance, the S-genome haplotype of Wompa (AS) clustered together with M. schizocarpa in the cpDNA phylogenetic tree. In contrast, the A-genome haplotype of Wompa (AS) was grouped with M. acuminata ssp. banksii/microcarpa. This suggests that M. schizocarpa donated the maternal genome of Wompa (AS). Our analyses revealed that each genotypic group of cultivated banana originated from multiple intra- and inter-specific hybridization events. For example, five diploid AA cultivars from different points across the distribution range were investigated in this study. The phylogenetic analyses of nuclear and cpDNA markers revealed that Haigong and Gongjiao have similar maternal origins, and that Pisang Mas and Tomolo may originate from the same maternal donor. Similar results were also observed in the triploid AAB and ABB groups, in which different cultivars within the same group showed diverse paternal and maternal origins. Overall, our results not only confirmed the complex geodomestication process that has given rise to cultivated banana, but also shed further light on the multiple origins of each genotypic group.
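The donor inference used for Wompa (AS) can be sketched in a few lines of Python. The sketch assumes that the chloroplast is maternally inherited, so the wild clade that a cultivar joins in the cpDNA tree is taken as the maternal lineage and the remaining nuclear clades are treated as paternal candidates. The function `infer_donors` and its return format are hypothetical simplifications, not the procedure applied to the trees in this study.

```python
def infer_donors(cp_clade, nuclear_clades):
    """Assign putative maternal and paternal donors for one cultivar.

    cp_clade: wild taxon the cultivar groups with in the cpDNA tree
              (assumed here to mark the maternal lineage).
    nuclear_clades: wild taxa that the cultivar's nuclear haplotypes
                    group with in the Waxy/Adh1 trees.
    """
    others = sorted({c for c in nuclear_clades if c != cp_clade})
    # If every nuclear haplotype falls in the maternal clade, both parents
    # are inferred to come from that same clade.
    return {"maternal": cp_clade, "paternal_candidates": others or [cp_clade]}


# Wompa (AS), using the clade assignments reported in the text
wompa = infer_donors(
    "M. schizocarpa",
    ["M. schizocarpa", "M. acuminata ssp. banksii/microcarpa"],
)
```

For Wompa this returns M. schizocarpa as the maternal donor and M. acuminata ssp. banksii/microcarpa as the paternal candidate, matching the interpretation given above.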

Nucleotide polymorphism and neutrality tests
DNA polymorphism analyses of the Waxy and Adh1 genes showed that genomic variations were abundant in these Musa accessions, with the total number of segregating sites being 167 in cultivated bananas (A genome) and 266 in M. acuminata (Table 2). In addition, 15 insertions/deletions (INDELs) were observed, all of them in introns of the two genes (data not shown). Two large INDELs were identified in Waxy, in introns 8 and 9.
Estimates of nucleotide diversity (π and θ) for all cultivated and wild Musa accessions were obtained for silent, nonsynonymous and total sites independently. Summaries of nucleotide diversity data for the two nuclear genes are given in Table 2. Reduced levels of polymorphism emerged as a general property of cultivated bananas relative to their wild progenitors. For example, the nucleotide diversity values (π and θ) of the Waxy and Adh1 genes demonstrated that M. acuminata has slightly higher nucleotide diversity than the A-genome of cultivated bananas at total and silent sites (Table 2). As the phylogenetic trees had revealed that only the ISEA subspecies of M. acuminata have donated genomes to cultivated banana, we divided the M. acuminata accessions into an island and a mainland group. Results from the analysis of DNA polymorphisms showed that diploid AA cultivars harbored only slightly lower nucleotide diversity than M. acuminata (island) (Table 2). These findings suggested that the cultivated banana may not have undergone a severe genetic bottleneck during the initial domestication process. Similarly, both the triploid AAA and AAB groups possessed high levels of nucleotide diversity, indicating that the historical population size of the triploid bananas may also have been large. The B-genome of cultivated banana showed higher nucleotide diversity than that of M. balbisiana (Table 2).
Interestingly, we found that nucleotide diversity at nonsynonymous sites in both the Adh1 and Waxy genes was reduced in the A-genome as well as within each cultivar genotype. No polymorphic sites in the Adh1 gene were observed within the AAA and AAB groups. Theoretically, reduced genetic diversity at nonsynonymous sites usually implies that artificial selection may have acted on the coding regions. Nonetheless, although the genetic diversity of M. acuminata was 4- to 6-fold higher than that of A-genome cultivars, the A-genome of cultivated bananas carried about half as many nonsynonymous mutations (4 and 23 mutations for Adh1 and Waxy, respectively) as M. acuminata (8 and 44 mutations for Adh1 and Waxy, respectively). Additionally, the patterns of nucleotide variation in Waxy and Adh1 were examined for deviation from neutral equilibrium evolution using Tajima's D and Fu and Li's D* and F* tests. As expected, no significant departures from the neutral model were observed in any test. This observation was further examined by an analysis of allele frequencies, in which cultivated bananas showed an excess of intermediate-frequency variants in both Waxy and Adh1 (Figure 4).
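One simple way to summarize allele frequencies of the kind compared here is a folded site frequency spectrum, which counts, for each biallelic site, how many sequences carry the rarer allele. The Python sketch below is illustrative only and is not the DnaSP procedure; the function name and example alignment are hypothetical, and sites with gaps, ambiguous bases or more than two alleles are skipped.

```python
from collections import Counter


def folded_sfs(seqs):
    """Folded site frequency spectrum for aligned DNA sequences.

    Returns a dict mapping minor-allele count (1 .. n // 2) to the number
    of biallelic sites showing that count.
    """
    n = len(seqs)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    spectrum = Counter()
    for col in zip(*(s.upper() for s in seqs)):
        if any(b not in "ACGT" for b in col):
            continue  # skip gaps and ambiguity codes
        counts = Counter(col)
        if len(counts) != 2:
            continue  # skip monomorphic and multi-allelic sites
        spectrum[min(counts.values())] += 1
    return {i: spectrum.get(i, 0) for i in range(1, n // 2 + 1)}


# Hypothetical alignment: two singleton sites and one site split 2:2
sfs = folded_sfs(["AAAA", "AAAT", "AATT", "ATTT"])
```

An excess of intermediate-frequency variants appears as relatively more sites in the higher minor-allele-count classes than in the singleton class.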

Discussion
Domestication is a complex evolutionary process in which human use of plants and animals results in phenotypic changes that distinguish domesticated species from their wild progenitors [38,39]. In recent decades, many researchers have studied how, and to what extent, different plants and animals were domesticated from wild relatives [40,41,42,43,44,45,46]. Here, we address the origin and domestication of cultivated bananas using results obtained with two single-copy nuclear genes (Waxy and Adh1) and two chloroplast regions (matK and trnL-F).

Origin and domestication of cultivated bananas
It has been suggested that AA diploid cultivars were directly domesticated from M. acuminata [1,2,47,48]. To further address the origin of, and the domestication process that gave rise to, cultivated bananas, five diploid AA accessions and seven M. acuminata subspecies representing their geographical ranges were investigated. Results from phylogenetic analyses revealed that diploid AA cultivars were initially domesticated in ISEA through multiple intra-specific hybridizations between different subspecies of M. acuminata. This finding confirmed the previous hypothesis that cultivated bananas originated from three contact zones in ISEA through complex geodomestication pathways [2]. Perrier et al. [2] proposed that the subspecies banksii and errans contributed to most of the diploid AA cultivars. However, our results demonstrated that the subspecies malaccensis might have contributed genetically to most of the five AA cultivars. In addition, we have shown that the five AA cultivars investigated here have diverse maternal donors, suggesting that multiple maternal origins may be a general characteristic of the AA cultivars. The origin of AAA triploid bananas has long been debated, and the most accepted view holds that they are derived from hybridizations between AA diploid cultivars [2,49,50,51]. In the present study, four AAA triploid accessions collected from Southeast Asia, Africa and America were evaluated, and the phylogenetic analyses generated genealogical patterns similar to those of the AA cultivars. These results further confirmed a previous hypothesis that triploidization may have occurred separately in various geographic regions through multiple hybridizations between different diploid AA cultivars, followed by spread to other regions of the world [1,2]. It is noteworthy that previous studies demonstrated that the cultivars Gros Michel (AAA) and Grande Naine (AAA), which comprise over 50% of banana production, have a common diploid ancestor [52,53]. Our study has further revealed that subsp. malaccensis-derived diploid AA cultivars may have donated maternal genomes to these cultivars.
It has been proposed that M. balbisiana and M. schizocarpa have also contributed to the gene pools of modern bananas [1,2]. In this study, seven cultivated accessions belonging to the AB, AS, AAB and ABB genotype groups were investigated. According to the phylogenetic trees (Figures 1, 2 and 3), at least one haplotype of each of these accessions clustered together with M. balbisiana (shown in yellow) or M. schizocarpa (red), with the exception of Kunnan (AB genotype), in which all of the haplotypes grouped
Overall, our study has not only confirmed previous findings that cultivated banana was initially domesticated in ISEA through complex geodomestication pathways, but also demonstrated that multiple maternal origins may be a general phenomenon in all diploid and triploid cultivars.

Genetic diversity of Musa germplasm
Theoretically, cultivated plants are usually expected to undergo severe genetic bottleneck events during the initial domestication and subsequent improvement processes, and such events may result in a sharp reduction in genetic diversity [54,55]. This hypothesis has been well documented in genetic studies of rice (Oryza sativa), wheat (Triticum aestivum) and other crops [56,57,58]. To evaluate the germplasm of domesticated bananas, several studies have investigated the genetic diversity of cultivated and wild Musa accessions using isozyme [59,60], AFLP [14,15], microsatellite [18] and PCR-RFLP [61] markers. These studies have demonstrated that domesticated bananas harbor high levels of genetic diversity.
In this study, we evaluated the nucleotide diversity of 34 wild and cultivated Musa accessions by assessing the DNA polymorphisms of the Waxy and Adh1 genes. As expected, although only five diploid AA accessions were investigated, only slightly lower nucleotide diversity was observed in the Waxy and Adh1 genes in these accessions relative to their wild parent M. acuminata (island). High nucleotide diversity in cultivated banana implies that it may have had a historically large population size and did not undergo a severe genetic bottleneck during the domestication process. Nonetheless, high degrees of phenotypic divergence between different cultivars were observed, suggesting that artificial selection may have acted on morphological traits in such a way as to decrease the genetic diversity of domesticated bananas. Two factors may explain this phenomenon. First, multiple intra-specific hybridizations between subspecies of M. acuminata led repeatedly to the occurrence of mutant lines with seedless and parthenocarpic fruit, and this may have increased the number of initial founders of domesticated bananas. This hypothesis has been put forward previously, and our present study shows that cultivated bananas have arisen through complex geodomestication pathways [1,2]. This allows us to speculate that multiple hybridization origins may be at least partly responsible for the high nucleotide diversity of diploid AA cultivars. Multiple morphological variants were also found among the subspecies of M. acuminata [4,62], and these may have contributed to the phenotypic variation found in domesticated banana. Second, most cultivars were initially collected from the wild by farmers and then brought into cultivation via vegetative propagation. This will have facilitated the preservation of selected somatic clonal variants with useful traits, such as inflorescence, fruit and height characteristics [63,64,65]. Taken together, accumulation of mutations resulting from hybridization as well as new somatic mutations may be responsible for the high nucleotide diversity of AA cultivars. Triploid cultivars (e.g., AAA, AAB and ABB) originated from diploid AA cultivars, and high levels of nucleotide diversity were also found in the A and B genomes (Table 2). As shown in the phylogenetic analyses (Figures 1, 2 and 3), triploid cultivars (including AAA and AAB) have also undergone multiple hybridization events that may have resulted in the observed high levels of nucleotide diversity.
It should be noted that nucleotide diversity at nonsynonymous sites in the Waxy and Adh1 genes was reduced in cultivated banana in comparison with its wild relatives, implying that functional constraints may affect the coding regions. It has been shown that the Waxy gene plays a crucial role in amylose synthesis [19]. Several Waxy mutants (null alleles) have been identified in maize, rice and barley, and these null alleles reduce amylose content. Thus, Waxy (amylose-free or glutinous) mutants became targets of artificial selection during domestication and subsequent crop improvement processes [66,67]. In domesticated banana, however, although the amylose content in total reserve starch varies among cultivars (from 15% to 25%), the percentages of amylose in most cultivars are usually below 19% [68]. In addition, our study showed that no Waxy mutants were found in these Musa accessions and that there were no significant departures from the neutral model in any statistical test. These observations allow us to infer that artificial selection may not have acted on the Waxy gene of cultivated bananas. This phenomenon was also observed in the Adh1 gene, in which nucleotide diversity at nonsynonymous sites decreased in cultivars relative to wild accessions. The Adh1 gene, a member of the alcohol dehydrogenase gene family, has been shown to be a neutral locus in grass crops [57,69]. Only a small number of nonsynonymous mutations were identified in the A-genomes of cultivated bananas and of the ISEA subspecies of M. acuminata. Therefore, an overall decrease in the number of mutations might have affected the nucleotide diversity at nonsynonymous sites. Overall, our findings imply that the Waxy and Adh1 genes may not have been under artificial selection during the domestication process. However, this finding requires further investigation to address the question of whether a reduction in nucleotide diversity at nonsynonymous sites is a general attribute of cultivated bananas.

Conclusion
Previous studies in the fields of archaeology, genetics and linguistics have shed light upon the major stages in the domestication process of cultivated bananas [1,2,70]. In this study, we have not only confirmed the multiple intra- and inter-specific hybridization origins of cultivated banana, but also revealed that both diploid and triploid cultivars harbor high levels of nucleotide diversity. However, only a small number of Musa accessions were investigated in this study, and the sample may therefore be insufficiently representative of the genus. It will be necessary to employ tens of nuclear genes and hundreds of Musa accessions in future studies to further elucidate the domestication process undergone by cultivated banana.