The genetic diversity and population structure of cowpea were analyzed in a diverse set of 768 cultivated cowpea genotypes from the USDA GRIN cowpea collection, originally collected from 56 countries. Genotyping by sequencing was used to discover single nucleotide polymorphisms (SNPs) in cowpea, and the identified SNP alleles were used to estimate the level of genetic diversity, population structure, and phylogenetic relationships. The aim of this study was to detect the gene pool structure of cowpea and to determine the relationships among different regions and countries. Based on model-based ancestry analysis, a phylogenetic tree, and principal component analysis, three well-differentiated genetic populations were postulated among the 768 worldwide cowpea genotypes. Using the phylogenetic relationships among individuals, regions, and countries, we attempted to trace accessions from regions outside the putative centers of origin back to the two candidate areas of origin (West and East Africa) to infer the migration and domestication history of cowpea during its dispersal and development. To our knowledge, this is the first report analyzing the genetic variation and relationships among globally cultivated cowpea genotypes. The results will help curators, researchers, and breeders understand, utilize, conserve, and manage the collection to contribute more efficiently to international cowpea research.
Citation: Xiong H, Shi A, Mou B, Qin J, Motes D, Lu W, et al. (2016) Genetic Diversity and Population Structure of Cowpea (Vigna unguiculata L. Walp). PLoS ONE 11(8): e0160941. doi:10.1371/journal.pone.0160941
Editor: Swarup Kumar Parida, National Institute for Plant Genome Research, INDIA
Received: May 18, 2016; Accepted: July 27, 2016; Published: August 10, 2016
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: USDA National Institute of Food and Agriculture Hatch project accession number 1002423.
Competing interests: The authors have declared that no competing interests exist.
Cowpea (Vigna unguiculata L. Walp.), an annual crop, is one of the most important and widely cultivated legumes in the world, particularly in Africa, Latin America, parts of Asia, and the United States. According to data from the Food and Agriculture Organization (FAO) (http://www.fao.org), approximately 5.8 million tons of dry cowpea grain are produced annually on at least 11 million hectares worldwide. Cowpea grain is an important source of carbohydrates (63%) and protein (25%), is low in fat (1.5%), and is rich in vitamins, minerals (Ca, P, Fe), folate, thiamin, and riboflavin. Cowpea is chiefly grown as a grain crop, but it is also used as animal fodder and as a vegetable [1,2].
To breed new varieties with target traits, researchers need a good understanding of the raw breeding materials. Accurate selection of materials carrying the desired genes can help achieve breeding objectives and shorten the breeding period. Outstanding cultivars are more likely to be selected from a wide range of germplasm; conversely, it is difficult to obtain high seed quality or new disease-resistant varieties from low-diversity germplasm. Therefore, genetic diversity research is a crucial premise for breeding. Moreover, long-term natural and artificial selection lead to genetic differences among crop populations and individuals, which are an important basis for developing new crop varieties and improving crop production. With the wide application of molecular methods, breeding programs have markedly accelerated new cultivar release. However, as high-efficiency breeding produces new varieties, some traditional local varieties have gradually been eliminated, narrowing the genetic background of crop varieties. Over the last few decades, Indonesia and China have reportedly lost 1,500 local rice varieties and 651 soybean varieties, respectively. The shrinking genetic resources of crop varieties pose a great threat to agricultural production. Cowpea has a very long domestication history, with records of its cultivation in many Asian countries dating back 2,000 years, and it faces a similar threat [3, 5, 6]. Protection, research, development, and utilization of plant genetic diversity are therefore extremely important as the basis of cowpea breeding and genetic research. Genetic diversity research reveals the genetic variation and genetic relationships among cowpea genotypes, thus providing information for the preservation and utilization of germplasm resources and the improvement of cultivars.
Research on the genetic diversity and relationships of cowpea is a challenging topic for geneticists and breeders. In the last century, Vigna was initially classified on the basis of morphological traits. However, morphological markers are easily influenced by the environment. DNA markers such as restriction fragment length polymorphisms (RFLPs), random-amplified polymorphic DNAs (RAPDs) [11–13], amplified fragment length polymorphisms (AFLPs), inter simple sequence repeats (ISSRs), and simple sequence repeats (SSRs) [15, 16] have been widely used to analyze population structure and genetic diversity in plants. Single nucleotide polymorphisms (SNPs) have emerged as a more powerful tool for genetic diversity studies than other markers such as AFLPs and SSRs, because SNPs are abundant in the genomes of plants and other organisms. To our knowledge, no cowpea diversity study reported the use of SNP markers before 2012. In recent years, next-generation sequencing (NGS) approaches such as genotyping by sequencing (GBS) [19, 20] have been widely used as an inexpensive and fast method of SNP discovery for trait mapping [21, 22].
Previous research on cowpea genetic diversity revealed a clear but limited global picture, because most investigators focused only on accessions from one or a few continents and only a few conducted genetic diversity studies using the entire global germplasm. Huynh et al. conducted genetic diversity and population structure analysis using SNP markers on 442 cowpea landraces collected throughout Africa and in other cowpea-growing regions of Asia, Europe, North America, and South America. Their results revealed two major gene pools in cultivated cowpea in Africa, and they described the relationships of cowpea landraces among regions within and outside Africa. However, only 15 samples from the American continent were included in their study, and including more cowpea genotypes would be beneficial. It is difficult to find any current report on the worldwide genetic diversity of cultivated cowpea. Therefore, the diversity of cultivated cowpea from worldwide germplasm needs to be studied.
Genetic diversity research is often closely linked to domestication and phylogeny studies. There are several opinions on the origin and history of ancient domesticated cowpea, and previous studies focused mainly on local cowpea resources, especially in Africa and Asia. According to earlier studies, West Africa and the Indian Subcontinent were considered the origins of cowpea domestication [25, 26]. As evidence accumulated, the Asian-origin theory could not explain the traits and distribution of the wild cowpea Vigna dekindtiana. Meanwhile, intermediate wild-domesticated forms of cowpea found in West and Central Africa were considered evidence for a West African center of origin [8, 28, 29]. However, subsequent reports have not reached a consistent conclusion about where in Africa cowpea was first domesticated. Ba et al. (2004) analyzed 26 domesticated and 30 wild cowpea lines from West, East, and South Africa using RAPD markers and found that wild accessions from East Africa were more polymorphic, suggesting that East Africa may be the origin of cultivated cowpea. Using molecular markers, Coulibaly et al. proposed that early domestication of cowpea occurred in eastern Africa. Although the first domestication region within Africa remains uncertain, both the West and East African domestication theories are now widely accepted. Nonetheless, compared with many other important crops, little is understood about how cowpea dispersed and developed among the first domestication region, sub-domestication regions, and cultivated regions.
A diverse set of 768 cultivated cowpea genotypes from 56 countries was included in this study. Besides accessions from Africa, accessions from Asia, the Americas, and Oceania were included to analyze genetic diversity within, and relationships among, all geographic regions. GBS was used to discover SNPs in the cowpea set, and the resulting SNPs were used to estimate the level of genetic diversity, population structure, and phylogenetic relationships. The study aimed to detect the gene pool structure of cowpea and to determine the relationships among different regions. Based on the phylogenetic relationships among individuals, regions, and countries, we attempted to trace accessions from regions outside the putative centers of origin back to the two candidate areas of origin (West and East Africa) to infer the migration and domestication history of cowpea dispersal and development. The objective of this study was to systematically analyze genetic variation and relationships among globally cultivated cowpea genotypes and to characterize the population structure of the species, providing information for curators, researchers, and breeders to utilize, conserve, and manage cowpea germplasm accessions and cultivars in breeding and other research programs.
Materials and Methods
A total of 768 cowpea genotypes, collected from 56 countries, were used in this study. Of these, 716 accessions were obtained from the US National Plant Germplasm System (NPGS, http://www.ars-grin.gov/npgs/), and 52 local cultivars and breeding lines were obtained from the University of Arkansas breeding program. Most accessions (567 lines) came from four regions: India (160 accessions), North America (162, including 74 American cultivars), South Africa (133), and West Africa (112). The remaining 201 lines came from Central East Africa (25), East Asia (26), Europe (8), Oceania (9), Central West Asia (66), and Latin America (67) (Table 1, S1 Table).
DNA extraction, sequencing, and SNP calling
Five seeds from each cowpea line were planted in pots in the greenhouse. Leaves from the two-week-old seedlings of each accession were bulked into a single sample. Genomic DNA was isolated from approximately 1 g of leaf tissue from each bulk using a CTAB-based method. DNA quality and concentration were assessed by gel electrophoresis and a NanoDrop 2000 Spectrophotometer (Thermo Scientific, Wilmington, DE, USA). DNA samples with a bright band and a concentration above 400 ng μl⁻¹ were sent to the Beijing Genomics Institute (BGI) for GBS [19, 20] and SNP calling. DNA normalization, library preparation, and GBS were conducted at BGI on a HiSeq 2000. BGI processed the raw sequencing data and called SNPs using the SOAP family of software (http://soap.genomics.org.cn/): SOAPaligner/soap2 was used to align short reads to the cowpea genome reference (cowpea_Genome_0.03.fa), and SOAPsnp v1.05 was used for SNP calling [33, 34]. The cowpea_Genome_0.03.fa reference (6,750 scaffolds or contigs) (http://harvest-blast.org/) was kindly provided by Dr. Timothy J. Close at the University of California, Riverside, U.S.A.
The model-based program STRUCTURE 2.3.1 (http://pritchardlab.stanford.edu/software/structure_v.2.3.1.html) was used to infer population structure. To identify the number of populations (K) that captures the major structure in the data, each run used a burn-in period of 10,000 Markov chain Monte Carlo (MCMC) iterations followed by a run length of 100,000 iterations, under the admixture model with correlated allele frequencies and assuming Hardy-Weinberg equilibrium and independent loci. Ten independent runs were performed for each value of K from 1 to 11. The optimal K was then determined from delta K using Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/).
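The delta K (ΔK) statistic used to choose K can be illustrated with a short calculation. The Python sketch below is not Structure Harvester's code; it computes ΔK for each interior K as the absolute second-order change of the mean log-likelihood LnP(D) across consecutive K values, divided by the standard deviation of LnP(D) among replicate runs at that K. The function name `evanno_delta_k`, the input layout (a dict mapping K to replicate LnP(D) values), and the example numbers are hypothetical. Implementations differ in whether the second-order change is averaged per run or taken from the run means; this sketch uses the run means.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp_by_k):
    """Delta K for each interior K value.

    lnp_by_k: {K: [LnP(D) from each replicate run]}, assuming consecutive
    integer K values and at least two runs per K (hypothetical layout).
    """
    ks = sorted(lnp_by_k)
    means = {k: mean(lnp_by_k[k]) for k in ks}
    delta = {}
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # |L''(K)| from the run means: |L(K+1) - 2 L(K) + L(K-1)|
        second_diff = abs(means[next_k] - 2 * means[k] + means[prev_k])
        spread = stdev(lnp_by_k[k])  # SD of LnP(D) across runs at this K
        delta[k] = second_diff / spread if spread > 0 else float("inf")
    return delta

# Hypothetical LnP(D) values (two runs per K), for illustration only
runs = {1: [-100.0, -101.0], 2: [-80.0, -81.0], 3: [-60.0, -61.0],
        4: [-58.0, -59.0], 5: [-57.0, -58.0]}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # → 3
```

With ten runs for each K from 1 to 11, ΔK is defined only for K = 2 to 10, because the smallest and largest K lack a neighbor on one side.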
For each SNP, the major allele frequency, heterozygosity, gene diversity, and polymorphism information content (PIC) were calculated with PowerMarker V3.25. The same software was used to assess genetic diversity for the entire set of cowpea genotypes and for the geographically based subpopulations, with genetic distances computed by the CS Chord (Cavalli-Sforza and Edwards 1967) method. Analysis of molecular variance (AMOVA) was performed in Arlequin 3.5 on all informative markers. Phylogenetic relationships and principal component analysis (PCA) were generated in TASSEL 5.2.13 to analyze genetic relationships among accessions and to determine the optimal number of clusters. Phylogenetic trees based on genetic distances among regions or countries were constructed with the neighbor-joining method in PowerMarker version 3.25 and visualized in MEGA 6.
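The per-SNP statistics named above have simple closed forms. The Python sketch below computes them for one SNP from diploid genotype calls: major allele frequency, gene diversity (1 − Σp²), observed heterozygosity, and PIC (1 − Σp² − Σ_{i<j} 2p_i²p_j²). It is not PowerMarker's code; PowerMarker may apply small-sample corrections, so its values can differ slightly. The genotype encoding (two-letter strings such as "AG", None for missing) and the function name `snp_summary` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

def snp_summary(genotypes):
    """Summary statistics for one SNP.

    genotypes: diploid calls as two-letter strings ("AA", "AG", "GG");
    None marks a missing call (hypothetical encoding). Assumes at least
    one non-missing call.
    """
    calls = [g for g in genotypes if g is not None]
    allele_counts = Counter(allele for g in calls for allele in g)
    total = sum(allele_counts.values())
    freqs = [count / total for count in allele_counts.values()]

    # Gene diversity (expected heterozygosity): 1 - sum of squared frequencies
    gene_diversity = 1.0 - sum(p * p for p in freqs)
    # Observed heterozygosity: share of non-missing calls with two different alleles
    heterozygosity = sum(g[0] != g[1] for g in calls) / len(calls)
    # PIC additionally subtracts 2 * p_i^2 * p_j^2 for every allele pair i < j
    pic = gene_diversity - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))

    return {
        "major_allele_freq": max(freqs),
        "gene_diversity": gene_diversity,
        "heterozygosity": heterozygosity,
        "pic": pic,
    }

stats = snp_summary(["AA", "AA", "AG", "GG", None])
print(stats["major_allele_freq"])  # → 0.625
```

Averaging these per-SNP values over all loci, or over the accessions of one group, gives summary figures of the kind reported for the whole panel and for each region.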
Single nucleotide polymorphism diversity
A total of 5,828 polymorphic SNPs with less than 50% missing data across the 768 accessions/cultivars were obtained from BGI. These were further filtered by removing SNPs with rare alleles (minor allele frequency below 5%), high missing rates (above 30%), or high heterozygosity (above 70%). The remaining 1,048 SNPs were used for genetic analyses and fell into six SNP types: [AG], 270 SNPs (25.7%); [CT], 246 (23.5%); [GT], 149 (14.2%); [AT], 139 (13.3%); [AC], 127 (12.1%); and [CG], 118 (11.2%). Across the 1,048 SNP loci, the major allele frequency, gene diversity, heterozygosity, and PIC averaged 0.77, 0.32, 0.06, and 0.26, respectively, with wide ranges of 0.50–0.95, 0.09–0.52, 0.0–0.68, and 0.08–0.41, respectively, indicating SNP variation and genetic diversity among the 768 cowpea genotypes.
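The three filters described above can be expressed as a per-SNP predicate. The Python sketch below is a minimal illustration, not BGI's or the authors' pipeline. It assumes diploid calls encoded as two-letter strings with None for missing data, interprets "rare alleles" as a minor allele frequency below 5%, and keeps only biallelic SNPs, which is an added assumption. The names `passes_filters` and `snp_type` are hypothetical; `snp_type` labels a SNP by its two alleles in alphabetical order, matching the [AG], [CT], ... notation in the text.

```python
from collections import Counter

def passes_filters(genotypes, min_maf=0.05, max_missing=0.30, max_het=0.70):
    """Return True if a SNP survives the missing-data, MAF, and heterozygosity filters."""
    n = len(genotypes)
    calls = [g for g in genotypes if g is not None]
    if n == 0 or (n - len(calls)) / n > max_missing:
        return False                      # too much missing data
    allele_counts = Counter(allele for g in calls for allele in g)
    if len(allele_counts) != 2:
        return False                      # assumption: keep biallelic SNPs only
    maf = min(allele_counts.values()) / sum(allele_counts.values())
    het = sum(g[0] != g[1] for g in calls) / len(calls)
    return maf >= min_maf and het <= max_het

def snp_type(genotypes):
    """Label a SNP by its alleles in alphabetical order, e.g. "[AG]"."""
    alleles = sorted({allele for g in genotypes if g is not None for allele in g})
    return "[" + "".join(alleles) + "]"

example = ["AA"] * 10 + ["GG"] * 5 + ["AG"] * 2 + [None] * 3
print(passes_filters(example), snp_type(example))  # → True [AG]
```

Counting `snp_type` labels over the retained SNPs yields a tally of SNP types like the one above.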
Population structure and genetic diversity
The population structure of the 768 cowpea accessions/cultivars was inferred using STRUCTURE 2.3.4, and the peak of delta K was observed at K = 3, suggesting three main populations (clusters Q1, Q2, and Q3) in the cowpea panel (Fig 1A and 1B). The assignment of accessions to populations from the STRUCTURE model is shown in Fig 1B and S1 Table. Using a membership probability of 0.55 as the cutoff for assigning each accession to one of the three populations, a total of 731 accessions/cultivars (95.2%) were assigned to a single population: 288 (37.5% of all accessions/cultivars) to Q1, 260 (33.9%) to Q2, and 183 (23.8%) to Q3 (Fig 1B, S1 Table). The remaining 37 of the 768 accessions (4.8%) were classified as admixtures (S1 Table).
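The assignment rule described above can be sketched as follows. Each row of a STRUCTURE Q-matrix holds an accession's membership coefficients for Q1 to Q3; the accession goes to its highest-scoring cluster if that coefficient reaches the cutoff and is otherwise counted as admixture. This Python sketch assumes the cutoff is inclusive (≥ 0.55) and that the Q-matrix has already been parsed into lists of floats; `assign_clusters` and `cluster_percentages` are hypothetical names.

```python
from collections import Counter

def assign_clusters(q_matrix, labels=("Q1", "Q2", "Q3"), cutoff=0.55):
    """Assign each accession to its top cluster, or "admixture" below the cutoff."""
    assignments = []
    for memberships in q_matrix:
        best = max(range(len(memberships)), key=lambda i: memberships[i])
        if memberships[best] >= cutoff:   # assumption: inclusive cutoff
            assignments.append(labels[best])
        else:
            assignments.append("admixture")
    return assignments

def cluster_percentages(assignments):
    """Percentage of accessions in each cluster, including admixture."""
    counts = Counter(assignments)
    return {label: 100 * n / len(assignments) for label, n in counts.items()}

q = [[0.90, 0.05, 0.05], [0.30, 0.40, 0.30], [0.10, 0.20, 0.70]]
print(assign_clusters(q))  # → ['Q1', 'admixture', 'Q3']
```

Applied to counts of 288, 260, and 183 assigned accessions plus 37 admixtures out of 768, `cluster_percentages` reproduces the 37.5%, 33.9%, 23.8%, and 4.8% reported above after rounding to one decimal.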
(A) Delta K values for different numbers of assumed populations (K) in the STRUCTURE analysis. (B) Classification of 768 accessions into three populations using STRUCTURE 2.3.1. The color code indicates the assignment of accessions to populations; the y-axis shows the membership coefficient for each subgroup, and the x-axis shows individual accessions. (C) Unrooted neighbor-joining (NJ) tree of the 768 accessions drawn with MEGA 6; each colored shape represents one cluster matching the STRUCTURE populations (blue for Q1, red for Q2, yellow for Q3, and green for admixture). (D) Scatter plot of the principal component analysis (PCA) of the 768 accessions, calculated with TASSEL and drawn in Excel; each colored point represents one cluster, Q1 to Q3, as in (C).
Neighbor-joining cluster analysis in MEGA 6 also clearly divided the 768 accessions into three groups (Fig 1C), consistent with the model-based population structure from STRUCTURE. The distribution of all 768 accessions on the first two principal components (Fig 1D) likewise supported their separation into three clusters consistent with the model-based population structure. Together, the model-based ancestry analysis, the phylogenetic tree, and the PCA strongly supported the existence of three well-differentiated genetic populations in cowpea, plus admixed accessions.
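A PCA like the one in Fig 1D can be sketched on a genotype matrix. The Python example below assumes SNPs coded as allele dosages (0, 1, 2), missing calls replaced by the SNP mean, and PCA computed by singular value decomposition of the column-centered matrix. These are common choices but assumptions here, since TASSEL's exact settings are not stated; `genotype_pca` and the toy panel are hypothetical. It requires NumPy.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an accessions x SNPs matrix of 0/1/2 dosages (NaN = missing)."""
    x = np.asarray(genotypes, dtype=float)
    col_means = np.nanmean(x, axis=0)
    x = np.where(np.isnan(x), col_means, x)          # mean-impute missing calls
    x = x - x.mean(axis=0)                           # center each SNP
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # accession coordinates
    explained = s ** 2 / np.sum(s ** 2)              # variance share per component
    return scores, explained[:n_components]

# Hypothetical toy panel: two groups of three accessions differing at four SNPs
toy = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0],
       [2, 2, 2, 1], [2, 2, 1, 2], [2, 1, 2, 2]]
scores, explained = genotype_pca(toy)
```

In the toy panel the first component separates the two groups (their scores have opposite signs) and carries most of the variance; the sign of each component is arbitrary, so only the relative positions of accessions are meaningful.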
Genetic diversity by region
In this study, the 768 cowpea accessions were divided into 11 groups. Nine groups were based on geographical region of origin: South Africa, West Africa, Central East Africa, East Asia, Central West Asia, Europe, Oceania, North America, and Latin America. Accessions from India were placed in a separate group, “India”, because earlier studies considered West Africa and the Indian Subcontinent to be the origins of cowpea domestication [25, 26]. Cultivars from the US were also placed in a separate group, “American Cultivars”, because they were developed as cultivars for farmers (Table 1, S1 Table).
For each of the 11 groups, we recorded the number of accessions and countries and estimated the major allele frequency, gene diversity, heterozygosity, and PIC (Table 1). Nine of the groups had 25 or more cowpea accessions; however, Europe and Oceania had only 8 and 9 accessions, respectively. The countries in each group are listed in Table 1: the accessions in “North America” were all collected from the U.S.A., the “Oceania” group came from Australia, and the special groups “India” and “American Cultivars” came from India and the U.S.A., respectively. Gene diversity ranged from 0.21 in Europe to 0.35 in India, indicating genetic variation within each group. Heterozygosity was 0.05 or 0.06 in every group, meaning that only 5 or 6% of genotype calls were heterozygous and that most alleles in each group were fixed in the homozygous state. PIC ranged from 0.17 in Europe to 0.32 in India, mirroring gene diversity and indicating that the European accessions had the least variation and the Indian accessions the most.
The genetic distances among the 11 groups were calculated using the CS Chord 1967 method in PowerMarker version 3.25. The neighbor-joining tree, visualized in MEGA 6 (Fig 2), was divided into three clusters: North America, Latin America, Oceania, Central East Africa, India, and South Africa formed cluster 1; West Africa alone formed cluster 2; and American Cultivars, East Asia, Central West Asia, and Europe formed cluster 3 (Fig 2). Genotypes in the same cluster had closer genetic backgrounds.
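The CS Chord distance compares the allele frequencies of two groups locus by locus. The Python sketch below uses one common form of the Cavalli-Sforza and Edwards (1967) chord distance, D = (2 / (π r)) Σ_l √(2(1 − Σ_u √(x_lu y_lu))), where x_lu and y_lu are the frequencies of allele u at locus l in the two groups and r is the number of loci. Software packages, possibly including PowerMarker, may scale this distance differently, so the sketch illustrates the idea rather than reproducing PowerMarker's values. The function name `chord_distance` and the input format (one allele-frequency dict per locus) are assumptions.

```python
import math

def chord_distance(freqs_x, freqs_y):
    """Cavalli-Sforza & Edwards chord distance between two populations.

    freqs_x, freqs_y: lists with one {allele: frequency} dict per locus.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same, non-zero number of loci")
    total = 0.0
    for fx, fy in zip(freqs_x, freqs_y):
        alleles = set(fx) | set(fy)
        # cosine of the angle between the square-root frequency vectors
        cos_theta = sum(math.sqrt(fx.get(a, 0.0) * fy.get(a, 0.0)) for a in alleles)
        cos_theta = min(cos_theta, 1.0)   # guard against rounding just above 1
        total += math.sqrt(2.0 * (1.0 - cos_theta))
    return 2.0 * total / (math.pi * len(freqs_x))

# Hypothetical two-locus frequencies for two groups
pop_a = [{"A": 0.9, "G": 0.1}, {"C": 0.5, "T": 0.5}]
pop_b = [{"A": 0.2, "G": 0.8}, {"C": 0.5, "T": 0.5}]
print(round(chord_distance(pop_a, pop_b), 3))  # → 0.244
```

Identical groups give a distance of 0, and the second locus, where both groups share the same frequencies, contributes nothing; a matrix of such pairwise distances is the input to a neighbor-joining tree.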
The colored balls in each region indicate the proportion of cluster Q3 accessions in that region: blue for low (less than 10%) and red for high (greater than 20%) (Table 2).
The population structure within each of the 11 groups was further analyzed using the STRUCTURE 2.3.1 results (Fig 2). Three clusters (Q1, Q2, and Q3) were observed across all 768 tested cowpea genotypes (Fig 1), but the number and percentage of accessions in each cluster differed among the 11 groups (Table 2). Eight of the 11 groups, all except East Asia, Europe, and Oceania, contained all three structured populations (clusters Q1 to Q3); East Asia had Q1 and Q2 but lacked Q3; Europe had only Q1; and Oceania had Q2 and Q3 but no Q1. In cluster Q1, the groups contributing more than 10% of accessions were West Africa (51 accessions; 17.7%), India (49; 17.0%), North America (42; 14.6%), American Cultivars (35; 12.2%), and Latin America (30; 10.4%), together representing 71.9% of Q1 accessions. In cluster Q2, the main contributors were South Africa (58 accessions; 22.3%), West Africa (51; 19.6%), Central West Asia (34; 13.1%), and American Cultivars (34; 13.1%), together representing 68.1% of Q2 accessions. In cluster Q3, the main contributors were India (80 accessions; 43.7%), South Africa (36; 19.7%), and North America (19; 10.4%), together representing 73.8% of Q3 accessions. These differences in cluster composition among the 11 groups, and in group composition among the three clusters, indicate that geography influences cowpea genetic diversity and population structure.
The distribution of accessions among the structured populations (clusters) also differed by group. In American Cultivars and West Africa, most accessions were split between Q1 and Q2, with nearly half in each. In North America, Latin America, and Central East Africa, nearly 50% of accessions belonged to Q1, and the rest were distributed between Q2 and Q3. Europe had 100% of its accessions in Q1. In East Asia, Central West Asia, and South Africa, about half or more of the accessions were in Q2. Half of the Indian accessions (50.0%) and 88.9% of the Oceania accessions were in Q3 (Table 2). These patterns further emphasize that geographical factors play a role in cowpea genetic diversity and population structure.
Based on the percentage of accessions in structured population 3 (Q3), the 11 groups can be divided into two larger groups: (i) a high-Q3 group comprising North America (21.6%), Latin America (25.4%), Oceania (88.9%), India (50.0%), Central East Africa (24.0%), and South Africa (27.1%); and (ii) a low-Q3 group comprising American Cultivars (4.1%), Europe (0%), East Asia (0%), Central West Asia (9.1%), and West Africa (7.1%). This grouping is similar to the phylogenetic relationships in Fig 2, which were computed with PowerMarker and drawn with MEGA 6.
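The two-way grouping above follows directly from the Q3 percentages. The Python sketch below applies the thresholds from the Fig 2 caption (below 10% is low, above 20% is high) to the percentages listed in this paragraph. The function name `q3_class` and the "intermediate" label for values between the thresholds are assumptions; no group in this study falls in that intermediate range.

```python
def q3_class(percent_q3, low=10.0, high=20.0):
    """Classify a group by its percentage of Q3 accessions."""
    if percent_q3 < low:
        return "low"
    if percent_q3 > high:
        return "high"
    return "intermediate"   # hypothetical label; unused in this study

# Q3 percentages as reported in the text
q3_percent = {
    "North America": 21.6, "Latin America": 25.4, "Oceania": 88.9,
    "India": 50.0, "Central East Africa": 24.0, "South Africa": 27.1,
    "American Cultivars": 4.1, "Europe": 0.0, "East Asia": 0.0,
    "Central West Asia": 9.1, "West Africa": 7.1,
}
groups = {}
for region, pct in q3_percent.items():
    groups.setdefault(q3_class(pct), []).append(region)
```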
Genetic diversity by country
The genetic diversity was further analyzed by country. The 768 tested genotypes were originally collected from 56 countries, 22 of which were represented by five or more accessions (S1 Table). A subtotal of 705 accessions from these 22 countries was used to study genetic diversity by country of origin. Genetic distances among the 22 countries were calculated with the CS Chord 1967 method in PowerMarker, and the phylogenetic tree was constructed and viewed in MEGA 6 (Fig 3). The tree separated into three distinct clusters. Cluster 1 contained nine entries: eight countries, five from Asia (Afghanistan, Iran, Pakistan, Turkey, and China), two from West Africa (Cameroon and Niger), and one from Europe (Hungary), plus the American Cultivars. Cluster 2 consisted of four countries: Nigeria, South Africa, India, and the US. Cluster 3 included ten countries: four from Latin America (Brazil, Guatemala, Mexico, and Paraguay), three from the South Africa region (Botswana, Mozambique, and Zimbabwe), and one each from West Africa (Senegal), Central East Africa (Kenya), and Oceania (Australia) (Table 3). The three clusters obtained by country were similar to those obtained by region (Table 3). Eighteen of the 24 countries, all except South Africa, India, the US, Senegal, and two West African countries (Cameroon and Niger), fell into the corresponding three clusters (Table 3), indicating that the region-based and country-based analyses gave consistent results.
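The region and country trees were built with the neighbor-joining method. The Python sketch below shows standard neighbor joining and returns an unrooted tree in Newick format; it is not PowerMarker's implementation, and the function name and input format (a symmetric distance matrix plus labels) are assumptions. At each step it joins the pair minimizing the Q-criterion (n − 2)d(i, j) − r_i − r_j, where r_i is the sum of distances from node i, replaces that pair with a new internal node, and finally joins the last three nodes at a central node.

```python
def neighbor_joining(dist, names):
    """Build an unrooted neighbor-joining tree and return it as a Newick string.

    dist: symmetric matrix (list of lists) of pairwise distances; names: labels.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Q-criterion: join the pair with the smallest (n - 2) * d(i, j) - r_i - r_j
        i, j = min(((a, b) for k, a in enumerate(nodes) for b in nodes[k + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = d[i][j] - li                                    # branch length to j
        u = f"({i}:{li:g},{j}:{lj:g})"                       # new internal node
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes
    # Join the last three nodes at a central node
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

# Hypothetical additive distance matrix for five taxa
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(neighbor_joining(dist, ["a", "b", "c", "d", "e"]))
# → (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

For an additive matrix like this one, neighbor joining recovers the tree exactly; with real genetic distances such as CS Chord values, it gives an approximate tree whose cluster structure is what is compared across regions and countries.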
Three clusters identified among 22 countries plus the US cultivars (USC) (Table 3).
Genotyping by sequencing (GBS) on the Illumina HiSeq revealed high levels of SNP variation among the cowpea samples. The proportions of SNP types in our study were 25.7, 23.5, 14.2, 13.3, 12.1, and 11.2% for [AG], [CT], [GT], [AT], [AC], and [CG], respectively, consistent with previous research that used the 1,536-SNP Illumina GoldenGate genotyping assay. Our results confirm SNP variation in cowpea germplasm and show that the [AG] and [CT] types (transitions) are more prevalent in cowpea than the other (transversion) types.
The majority of genetic variance in the USDA cowpea world collection lies within rather than among geographic regions, and within rather than among countries. Compared with the study by Zannou et al. on accessions from Benin, the genetic variance among populations (regions and countries) in this study was very low. However, similar results have been reported for accessions from Sudan and Ghana. Other studies, such as one on Phleum, also found no significant correlation between accessions and their geographic origins. This discrepancy may reflect differences in the germplasm used.
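The statement that most variance lies within rather than among groups comes from AMOVA. The Python sketch below is a simplified one-level AMOVA for illustration: it treats each accession's SNP dosage vector as a point, uses squared Euclidean distances, partitions the sum of squared deviations into among-group and within-group parts, and converts the mean squares into variance components and Φ_ST. It is not Arlequin's implementation, which also tests significance by permutation; the function name `simple_amova` and the dosage input are assumptions. It requires NumPy.

```python
import numpy as np

def simple_amova(data, groups):
    """One-level AMOVA on squared Euclidean distances between accessions.

    data: accessions x SNPs array (e.g. 0/1/2 dosages); groups: one label per
    accession. Returns variance components and Phi_ST, which can be negative
    when groups do not differ.
    """
    x = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_groups = len(x), len(labels)

    def ssd(block):
        # Sum of squared deviations from the block mean; equals the sum of
        # pairwise squared distances divided by the number of accessions.
        return float(((block - block.mean(axis=0)) ** 2).sum())

    ssd_within = sum(ssd(x[groups == g]) for g in labels)
    ssd_among = ssd(x) - ssd_within
    msd_among = ssd_among / (n_groups - 1)
    msd_within = ssd_within / (n_total - n_groups)
    sizes = np.array([np.sum(groups == g) for g in labels])
    n_coef = (n_total - np.sum(sizes ** 2) / n_total) / (n_groups - 1)
    var_among = (msd_among - msd_within) / n_coef
    var_within = msd_within
    return {
        "var_among": var_among,
        "var_within": var_within,
        "phi_st": var_among / (var_among + var_within),
    }
```

In this one-level design the share of variation among groups is 100 × Φ_ST; a small value, as found here for regions and countries, means most variation lies within groups.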
Population structure analysis divided the 768 cultivated cowpea genotypes into three gene pools (clusters Q1, Q2, and Q3) based on the peak of delta K (ΔK) at K = 3. ΔK was also high at K = 2 (Fig 1A), consistent with previous studies showing that landrace germplasm divides into two gene pools located in two geographical regions of Africa, with gene pool 1 mostly in western Africa and gene pool 2 mostly in eastern Africa. However, the STRUCTURE results suggest that a two-gene-pool model is not the best fit for our panel. The three-gene-pool model fit the population structure analysis best (Fig 1) and also agreed better with the geography of regions and countries in the phylogenetic trees built from PowerMarker genetic distances and drawn in MEGA (Figs 2 and 3). The three gene pools distinguished populations within the three American groups (American Cultivars, North America, and Latin America), the three African regions (Central East Africa, South Africa, and West Africa), Central West Asia, and India, but not within East Asia, Europe, or Oceania (Table 2). These distinctions between populations within limited local areas corroborate prior reports [3, 23, 26]. Although neither the two- nor the three-gene-pool model completely separates populations by continent, the three-gene-pool model performed better within local areas. The weak separation between continents may be caused by (1) exchange and transfer of material during cowpea breeding and (2) the multiple uses of cowpea. These two factors may also explain why the majority of genetic variance lies within rather than among geographic regions in the world cowpea collection [3, 26].
Multiple types of cowpea are often planted on the same continent for use as vegetables, grain, and fodder. This practice would greatly increase diversity within local areas and decrease genetic distances among continents. In addition, differences among regions on the same continent might reflect cowpea uses shaped by local dietary habits.
Based on PIC values, the accessions that originated in India and Central East Africa are the most diverse (0.32 and 0.30), followed by Oceania, North and Latin America, and South and West Africa with medium-high diversity, then American Cultivars, East Asia, and Central West Asia with medium diversity, and Europe with the lowest PIC (0.17). Different growing environments, availability of genetic stocks, and diverse cowpea consumption habits may be responsible for differences in diversity among regions. Compared with a prior landrace study, the diversity observed here was slightly lower, which may be due to long-term cultivation or breeding.
In general, the degree of genetic diversity tends to correlate positively with the number of countries from which the accessions were collected: the more countries of origin, the more genetic diversity is detected. This pattern would explain the high genetic diversity of accessions from South and West Africa, which include 7 and 8 countries, respectively, as well as the lowest genetic diversity in accessions from Europe, which include only 2 countries. Nevertheless, we observed exceptions. India, North America, and Oceania each comprise one country but had high PIC (0.32, 0.25, and 0.28); this might be explained by the large number of accessions from India (160) and North America (88), but not for Oceania, with only 9 accessions. Conversely, the accessions from East Asia (9 countries) and Latin America (17 countries) did not show the expected high PIC. Thus, in our study the number of countries of origin may not be the main factor influencing PIC, and genetic distances among cowpea genotypes are low because of the inherent self-pollination of cowpea. The genetic distance, genetic structure, and genetic diversity analyses in our study were highly consistent: populations from different areas with similar genetic structure generally had smaller genetic distances and similar genetic diversity, and vice versa. For example, the populations from six regions with a high Q3 ratio (Table 2), India, North America, Latin America, Oceania, South Africa, and Central East Africa, not only have high PIC (Table 1) but are also grouped together in Cluster 1 (Fig 2). The same pattern was observed in the other cluster of four regions with a low Q3 ratio (American Cultivars, East Asia, Central West Asia, and Europe). The only exception was West Africa, which had high PIC and a low Q3 ratio and was closely related to Cluster 1 (Q1) and Cluster 3 (Q3).
Within Africa, genetic distances among the regional populations are very small, yet West Africa differs clearly from South Africa and Central East Africa in the proportion of Q3 accessions. One possible explanation for this paradox is that Q3 accessions are highly similar to Q1 accessions. This is supported by the phylogenetic tree (Fig 1C), which places clusters Q1 and Q3 on the same main branch, and by the PCA (Fig 1D), which shows more overlap between Q1 and Q3. A plausible hypothesis is therefore that the Q3 gene pool was derived from the Q1 gene pool through hybridization, gene flow, or cultivar localization during domestication. This hypothesis would explain why the Indian population has the highest proportion of Q3 accessions and the highest PIC in our study. We also found a very short genetic distance between India and Central East Africa, which suggests that cowpea moved with human migration from Central East Africa to India, a known sub-domestication region. This introduction and domestication took place over a long period. Cowpea grown across such vast areas must adapt to complex environmental conditions in terms of temperature, water availability, elevation, and soil type, which may be why India is recognized as the secondary center of cowpea diversity. The most distinctive forms of cowpea, sesquipedalis and biflora, are found only in India and East Asia and differ from the African domesticated forms. One possible explanation is that when cowpea moved farther east into East Asia, it encountered more humid environments with less sunshine, unsuitable for drying pods and grain, which led people in Asia to prefer immature pods as a vegetable. During the introduction and domestication of cowpea in East Asia, these two types were formed and adapted to local dietary habits, especially in China and Southeast Asian countries.
The clustering of India, Oceania, North America, and Latin America also suggests that cowpea in the Americas and Oceania may have come from India during British colonization.
The country cluster analysis (Fig 3), which describes the relationships among regions, showed that West African countries were widely distributed across the tree. This explains why West Africa does not clearly cluster into either group 1 or group 2 (Fig 3). Accessions from Europe clustered on the same branch as two West African countries, which suggests that Europe may have imported cowpea directly from West Africa.
The closest relationship was found between North and Latin America, which also had highly similar genetic structure and PIC, suggesting a shared history of variety introduction and localization. In addition, the low Q3 ratio in the American Cultivars implies that breeding programs in America may have imported and used a large number of accessions from West Africa or Asia. Fang et al. similarly reported that American breeding lines share at least 86% similarity with accessions from West Africa. In contrast, the non-cultivar accessions from America were not close to those from West Africa, consistent with Huynh et al., who compared landraces from the two regions. This may explain the differences in PIC and genetic structure between American non-cultivar and cultivar accessions. The difference in genetic structure and distance between cultivar and non-cultivar accessions indicates that abundant genetic resources, well adapted to local areas and especially to biotic and abiotic stress environments, remain available as breeding materials. The accessions from India were more diverse than those from East Asia, Oceania, North America, and Latin America, which may suggest bottlenecks or founder effects in both directions of dispersal from India during cowpea domestication and diffusion.
Breeding programs generally narrow the genetic variation of crop resources; without the introduction of new germplasm, genetic diversity declines over time. Our study supports the same conclusion: breeding has dramatically reduced the genetic variation of cowpea. We also found that new cowpea resources were created when the crop was moved or domesticated into new environments, for example in East Asia, and these may increase the genetic diversity of breeding populations. With globalization, geographical barriers have been drastically reduced as germplasm has entered a worldwide cultivation system, which offers an opportunity to improve breeding resources. Both phenotype and genotype must be considered in understanding germplasm resources, especially in breeding programs.
Based on genome-wide SNPs, three well-differentiated genetic populations (structured populations, or clusters) were postulated among the 768 worldwide cowpea genotypes in this study. The populations (clusters) were associated with the regions and countries where the cowpea genotypes were collected. Cluster 1 mainly consisted of cowpea genotypes from Asia (Afghanistan, Iran, Pakistan, Turkey, and China), West Africa (Cameroon and Niger), and Europe (Hungary), plus the American Cultivars; Cluster 2 was composed of accessions from South Africa, India, and the US; and Cluster 3 consisted of accessions from Latin America (Brazil, Guatemala, Mexico, and Paraguay), South Africa (Botswana, Mozambique, and Zimbabwe), Central East Africa (Kenya), and Oceania (Australia). This study supports the hypothesis of two candidate areas of origin (West and East Africa) as the first domestication regions of cowpea, with India as a sub-domestication region of cultivated cowpea.
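The cluster compositions above and the Q3 ratios discussed earlier are derived from STRUCTURE membership coefficients (Q values). As a minimal sketch, assuming each accession is assigned to the cluster with its largest Q value and that a region's Q3 ratio is the share of its accessions assigned to Q3, the tally could look like this. The 0.5 cutoff for calling an accession admixed, the function names, and the toy data are hypothetical; the study's exact assignment rule is given in its Methods.

```python
from collections import Counter, defaultdict


def assign_cluster(q_values, min_q=0.5):
    """Return the 1-based cluster with the largest membership, or None if admixed."""
    best = max(range(len(q_values)), key=lambda k: q_values[k])
    return best + 1 if q_values[best] >= min_q else None


def cluster_ratios(accessions, min_q=0.5):
    """accessions: iterable of (region, [q1, q2, q3]).

    Returns {region: {cluster: share of that region's accessions}},
    with None as the key for accessions below the min_q cutoff.
    """
    counts = defaultdict(Counter)
    totals = Counter()
    for region, q in accessions:
        totals[region] += 1
        counts[region][assign_cluster(q, min_q)] += 1
    return {
        region: {c: n / totals[region] for c, n in counts[region].items()}
        for region in totals
    }


# Toy membership coefficients (Q1, Q2, Q3) for five accessions
data = [
    ("India", [0.1, 0.1, 0.8]),
    ("India", [0.2, 0.1, 0.7]),
    ("India", [0.6, 0.2, 0.2]),
    ("Europe", [0.9, 0.05, 0.05]),
    ("Europe", [0.4, 0.3, 0.3]),
]
ratios = cluster_ratios(data)
print(ratios["India"].get(3, 0.0))  # share of Indian accessions assigned to Q3
```

Keeping admixed accessions under a separate `None` key, rather than forcing them into a cluster, makes it explicit how much of each region's total is not confidently assigned; a different cutoff would shift those shares.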
S1 Table. Cowpea accession, name, taxon, region, country, location collected, and cluster assigned in this study.
Cowpea germplasm accessions were provided by the USDA-ARS station at Griffin, GA. The cowpea_Genome_0.03.fa (6,750 scaffolds or contigs) (http://harvest-blast.org/) was kindly provided by Dr. Timothy J. Close, University of California, U.S.A. This work was supported, at least in part, by the USDA National Institute of Food and Agriculture Hatch project accession number 1002423.
- Conceptualization: AS BM DW.
- Data curation: HX AS JQ.
- Formal analysis: HX AS JQ.
- Funding acquisition: AS BM.
- Investigation: HX DM WL JM YW WY.
- Methodology: HX AS JQ.
- Project administration: AS.
- Resources: BM DM.
- Supervision: AS BM DW.
- Validation: HX WL JM YW WY.
- Writing - original draft: HX AS JQ.
- Writing - review & editing: AS BM JQ.
- 1. Behura R, Kumar S, Saha B, Panda MK, Dey M, Sadhukhan A, et al. Cowpea [Vigna unguiculata (L.) Walp]. Methods in molecular biology. 2015;1223:255–64. doi: 10.1007/978-1-4939-1695-5_20 pmid:25300846.
- 2. Muchero W, Diop NN, Bhat PR, Fenton RD, Wanamaker S, Pottorff M, et al. A consensus genetic map of cowpea [Vigna unguiculata (L) Walp.] and synteny based on EST-derived SNPs. Proceedings of the National Academy of Sciences of the United States of America. 2009;106(43):18159–64. doi: 10.1073/pnas.0905886106 pmid:19826088; PubMed Central PMCID: PMC2761239.
- 3. Singh BB. Advances in Cowpea Research. International Institute of Tropical Agriculture; 1997.
- 4. Kapoor-Vijay P, White J. Global Biodiversity Strategy. Conservation biology: a training manual for biological diversity and genetic resources. 1992:13–22.
- 5. Ba FS, Pasquet RS, Gepts P. Genetic diversity in cowpea [Vigna unguiculata (L.) Walp.] as revealed by RAPD markers. Genetic Resources and Crop Evolution. 2004;51:539–50.
- 6. Fang J, Chao C-CT, Roberts PA, Ehlers JD. Genetic diversity of cowpea [Vigna unguiculata (L.) Walp.] in four West African and USA breeding programs as determined by AFLP analysis. Genetic Resources and Crop Evolution. 2007;54(6):1197–209.
- 7. Tan H, Tie M, Luo Q, Zhu Y, Lai J, Li H. A review of molecular markers applied in Cowpea (Vigna unguiculata L. Walp.) Breeding. J Life Sci. 2012;6(2012):1190–9.
- 8. Maréchal R. Étude taxonomique d'un groupe complexe d'espèces des genres Phaseolus et Vigna (Papilionaceae) sur la base de données morphologiques et polliniques, traitées par l'analyse informatique: Conservatoire et jardin botaniques; 1978.
- 9. Meglic V, Staub J. Inheritance and linkage relationships of isozyme and morphological loci in cucumber (Cucumis sativus L.). Theor Appl Genet. 1996;92(7):865–72. doi: 10.1007/BF00221899. pmid:24166552
- 10. Fatokun C, Danesh D, Young N, Stewart E. Molecular taxonomic relationships in the genus Vigna based on RFLP analysis. Theor Appl Genet. 1993;86(1):97–104. doi: 10.1007/BF00223813. pmid:24193388
- 11. Kaga A, Tomooka N, Egawa Y, Hosaka K, Kamijima O. Species relationships in the subgenus Ceratotropis (genus Vigna) as revealed by RAPD analysis. Euphytica. 1996;88(1):17–24.
- 12. Simon MV, Benko-Iseppon AM, Resende LV, Winter P, Kahl G. Genetic diversity and phylogenetic relationships in Vigna Savi germplasm revealed by DNA amplification fingerprinting. Genome. 2007;50(6):538–47. doi: 10.1139/G07-029.
- 13. Malviya N, Sarangi B, Yadav MK, Yadav D. Analysis of genetic diversity in cowpea (Vigna unguiculata L. Walp.) cultivars with random amplified polymorphic DNA markers. Plant Syst Evol. 2012;298(2):523–6.
- 14. Ajibade S, Weeden N, Chite S. Inter simple sequence repeat analysis of genetic relationships in the genus Vigna. Euphytica. 2000;111(1):47–55.
- 15. Ogunkanmi L, Ogundipe O, Ng N, Fatokun C. Genetic diversity in wild relatives of cowpea (Vigna unguiculata) as revealed by simple sequence repeats (SSR) markers. Journal of Food, Agriculture & Environment. 2008;6(3&4):263–8.
- 16. Gupta S, Gopalakrishna T. Development of unigene-derived SSR markers in cowpea (Vigna unguiculata) and their transferability to other Vigna species. Genome. 2010;53(7):508–23.
- 17. Varshney RK, Chabane K, Hendre PS, Aggarwal RK, Graner A. Comparative assessment of EST-SSR, EST-SNP and AFLP markers for evaluation of genetic diversity and conservation of genetic resources using wild, cultivated and elite barleys. Plant Science. 2007;173(6):638–49.
- 18. Deulvot C, Charrel H, Marty A, Jacquin F, Donnadieu C, Lejeune-Hénaut I, et al. Highly-multiplexed SNP genotyping for genetic mapping and germplasm diversity studies in pea. BMC Genomics. 2010;11(1):468.
- 19. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One. 2011;6(5):e19379. doi: 10.1371/journal.pone.0019379. pmid:21573248
- 20. Sonah H, Bastien M, Iquira E, Tardivel A, Légaré G, Boyle B, et al. An improved genotyping by sequencing (GBS) approach offering increased versatility and efficiency of SNP discovery and genotyping. PLoS One. 2013;8(1):e54603. doi: 10.1371/journal.pone.0054603. pmid:23372741
- 21. Iquira E, Humira S, François B. Association mapping of QTLs for sclerotinia stem rot resistance in a collection of soybean plant introductions using a genotyping by sequencing (GBS) approach. BMC Plant Biology. 2015;15(1):5.
- 22. Nimmakayala P, Levi A, Abburi L, Abburi VL, Tomason YR, Saminathan T, et al. Single nucleotide polymorphisms generated by genotyping by sequencing to characterize genome-wide diversity, linkage disequilibrium, and selective sweeps in cultivated watermelon. BMC Genomics. 2014;15(1):767.
- 23. Huynh B-L, Close TJ, Roberts PA, Hu Z, Wanamaker S, Lucas MR, et al. Gene Pools and the Genetic Architecture of Domesticated Cowpea. The Plant Genome. 2013;6(3). doi: 10.3835/plantgenome2013.03.0005.
- 24. Faris DG. The origin and evolution of the cultivated forms of Vigna sinensis. Canadian Journal of Genetics and Cytology. 1965;7(3):433–52. doi: 10.1139/g65-058.
- 25. Pant K, Chandel K, Joshi B. Analysis of diversity in Indian cowpea genetic resources. SABRAO J. 1982;14:103–11.
- 26. Smartt J. Cowpea Research, Production and Utilization. Edited by Singh S. R. and Rachie K. O. Chichester, New York, etc.: John Wiley and Sons (1985), pp. 460, £22.50. Experimental Agriculture. 1986;22(04):431-. doi: 10.1017/S001447970001468X.
- 27. Steele W. Cowpeas: Vigna unguiculata (Leguminosae-Papilionatae). Evolution of Crop Plants NW Simmonds, ed. 1976.
- 28. Baudoin J, Maréchal R. Cowpea taxonomy, origin and germplasm. Cowpea research, production and utilization, John Wiley & Sons, Chichester. 1985:3–9.
- 29. Ng N. Cowpea Vigna unguiculata (Leguminosae-Papilionideae). J, Smartt & NW, Simmonds (Eds), Evolution of crop plants, 2nd edition, Longman, England. 1995.
- 30. Ba FS, Pasquet RS, Gepts P. Genetic diversity in cowpea [Vigna unguiculata (L.) Walp.] as revealed by RAPD markers. Genetic Resources and Crop Evolution. 2004;51(5):539–50.
- 31. Coulibaly S, Pasquet RS, Papa R, Gepts P. AFLP analysis of the phenetic organization and genetic diversity of Vigna unguiculata L. Walp. reveals extensive gene flow between wild and domesticated types. Theor Appl Genet. 2002;104(2–3):358–66. doi: 10.1007/s001220100740 pmid:12582708.
- 32. Rogers SO, Bendich AJ. Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues. Plant Molecular Biology. 1985;5(2):69–76. doi: 10.1007/BF00020088. pmid:24306565
- 33. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–93. doi: 10.1093/bioinformatics/btr509. pmid:21903627
- 34. Li R, Yu C, Li Y, Lam T-W, Yiu S-M, Kristiansen K, et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009;25(15):1966–7. doi: 10.1093/bioinformatics/btp336. pmid:19497933
- 35. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945–59. pmid:10835412
- 36. Earl DA. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources. 2012;4(2):359–61.
- 37. Liu K, Muse SV. PowerMarker: an integrated analysis environment for genetic marker analysis. Bioinformatics. 2005;21(9):2128–9. pmid:15705655
- 38. Cavalli-Sforza LL, Edwards AW. Phylogenetic analysis. Models and estimation procedures. American Journal of Human Genetics. 1967;19(3 Pt 1):233. pmid:6026583
- 39. Excoffier L, Lischer HE. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources. 2010;10(3):564–7. doi: 10.1111/j.1755-0998.2010.02847.x. pmid:21565059
- 40. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406–25. pmid:3447015
- 41. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology and Evolution. 2013;30:2725–9. doi: 10.1093/molbev/mst197. pmid:24132122
- 42. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology. 2005;14(8):2611–20. pmid:15969739
- 43. Zannou A, Kossou D, Ahanchédé A, Zoundjihékpon J, Agbicodo E, Struik P, et al. Genetic variability of cultivated cowpea in Benin assessed by random amplified polymorphic DNA. African Journal of Biotechnology. 2008;7(24).
- 44. Ali ZB, Yao KN, Odeny DA, Kyalo M, Skilton R, Eltahir IM. Assessing the genetic diversity of cowpea [Vigna unguiculata (L.) Walp.] accessions from Sudan using simple sequence repeat (SSR) markers. African Journal of Plant Science. 2015;9(7):293–304. doi: 10.5897/AJPS2015.1313.
- 45. Egbadzor KF, Ofori K, Yeboah M, Aboagye LM, Opoku-Agyeman MO, Danquah EY, et al. Diversity in 113 cowpea [Vigna unguiculata (L) Walp] accessions assessed with 458 SNP markers. SpringerPlus. 2014;3:541. Epub 2014/10/22. doi: 10.1186/2193-1801-3-541 pmid:25332852; PubMed Central PMCID: PMC4190189.
- 46. Asare AT, Gowda BS, Galyuon IK, Aboagye LL, Takrama JF, Timko MP. Assessment of the genetic diversity in cowpea (Vigna unguiculata L. Walp.) germplasm from Ghana using simple sequence repeat markers. Plant Genetic Resources. 2010;8(02):142–50.
- 47. Tanhuanpaa P, Manninen O. High SSR diversity but little differentiation between accessions of Nordic timothy (Phleum pratense L.). Hereditas. 2012;149(4):114–27. doi: 10.1111/j.1601-5223.2012.02244.x. pmid:22967141
- 48. Xu P, Wu X, Wang B, Luo J, Liu Y, Ehlers J, et al. Genome wide linkage disequilibrium in Chinese asparagus bean (Vigna. unguiculata ssp. sesquipedialis) germplasm: implications for domestication history and genome wide association studies. Heredity. 2012;109(1):34–40. doi: 10.1038/hdy.2012.8. pmid:22378357