Genetic Diversity and Population Structure of the Major Peanut (Arachis hypogaea L.) Cultivars Grown in China by SSR Markers

One hundred and forty-six highly polymorphic simple sequence repeat (SSR) markers were used to assess the genetic diversity and population structure of 196 peanut (Arachis hypogaea L.) cultivars that have been extensively planted in different regions of China. These SSR markers amplified 440 polymorphic bands, with an average of 2.99 per marker, and the average gene diversity index was 0.11. Eighty-six rare alleles with a frequency of less than 1% were identified in these cultivars. The largest Fst and genetic distance values were found between cultivars adapted to the southern region and those adapted to the northern region of China. A neighbor-joining tree of cultivars adapted to different ecological regions, constructed from pairwise Nei's genetic distances, showed a significant difference between cultivars from the south and the north. A model-based population structure analysis divided these peanut cultivars into five subpopulations (P1a, P1b, P2, P3a and P3b). P1a and P1b included most of the cultivars from the southern provinces, including Guangdong, Guangxi and Fujian. P2 consisted of cultivars from Hubei province and some from Shandong and Henan. P3a and P3b contained cultivars from the northern provinces, including Shandong, Anhui, Henan, Hebei and Jiangsu, and from the Yangtze River region, including Sichuan province. Cluster analysis, PCoA and PCA based on the marker genotypes revealed five distinct clusters for the entire population that were related to the regions of origin of the germplasm. The results indicated obvious genetic variation between cultivars from the south and the north, as well as distinct genetic differentiation among individual cultivars within the south and within the north. Taken together, these results provide a molecular basis for understanding the genetic diversity of Chinese peanut cultivars.


Introduction
The cultivated peanut, Arachis hypogaea L., is an important oilseed and cash crop worldwide and one of the primary sources of vegetable oil and protein in developing countries. The annual peanut planting area is around 24 million hectares, with an annual production of nearly 35 million tons (http://apps.fao.org/cgi-bin/nph-db.pl?subset=agriculture). China is the largest peanut producer in the world, with over 20% of the planting area and more than 40% of the production. During the last five decades, cultivated peanut has been subjected to intensive artificial selection, resulting in favorable changes in yield, disease resistance, biochemical composition and other agronomic traits [1][2], and remarkable progress has been made in the genetic improvement of peanut yield [1]. However, pedigree analysis has shown that more than 70% of the cultivars share the common ancestors 'Fuhuasheng' and 'Shitouqi' [1][2], suggesting that Chinese peanut cultivars may have a narrow genetic base.
Previous studies focused mainly on morphological and agronomic trait variation [3][4][5][6][7][8] and on trait identification, e.g. disease resistance [9][10], stress resistance [11][12][13] and high oleic acid content [14]. Very few studies have addressed genetic diversity [14][15][16]. Knowledge of genetic diversity is critical for germplasm utilization in peanut breeding. To broaden the genetic variation of cultivated peanut in future breeding programs, a more comprehensive assessment of the genetic diversity and population structure of the released varieties is necessary.
Traditional cluster analysis provides an easy and effective way to determine the genetic diversity of germplasm collections [40]. Several other statistical approaches, including model-based population structure analysis [41], principal component analysis (PCA) [42] and principal coordinate analysis (PCoA) [43], have been developed for analyzing the structure of natural populations with molecular markers and have been widely used in rice [44][45], apple [46], sweet sorghum [47], wheat [48] and cucumber [49]. Cultivar populations are not only natural populations but also germplasm resources with specific desirable traits for genetic improvement. Analyzing their genetic structure and identifying cultivars that carry favorable alleles has therefore become an important part of molecular design breeding [50]. Estimating population structure is a necessary first step in association analysis, because unaccounted structure can produce false positives or spurious associations in association studies of natural populations [40][41][51]. An analysis of 190 soybean accessions in China showed that the cultivar population was composed of seven subpopulations, and linkage disequilibrium (LD) was detected extensively between both syntenic and nonsyntenic markers [52]. In another study, 29 SSR markers were used to analyze the genetic diversity of 150 accessions of cultivated rice from Korea, China and Japan; model-based structure analysis revealed three subpopulations, largely consistent with cluster analysis based on genetic distance [53]. Breseghello and Sorrells (2006) analyzed the population structure of 95 soft winter wheat cultivars from the eastern United States, divided them into four subpopulations, and demonstrated that association mapping could complement and enhance previous QTL information for marker-assisted selection [54].
Understanding the genetic diversity and population structure of peanut cultivars is therefore vital to association mapping and molecular breeding programs in peanut [55].
In the present study, the genetic diversity and population structure of 196 peanut cultivars released in China over the last five decades from three ecological regions (the southern region, the northern region and the Yangtze River region) were analyzed using 146 highly polymorphic SSR markers. Our objectives were to estimate the level of genetic diversity and to characterize the population structure of Chinese peanut cultivars.

Genetic diversity
The 196 Chinese peanut cultivars (Table S1) were genotyped with 210 SSR markers. Sixty-four SSR markers failed to amplify in most of the accessions and were therefore eliminated from the analysis. The remaining 146 SSR markers amplified 440 polymorphic bands, with an average of 2.99 and a range of 2 to 9 per primer pair (Table 1; Table S2). The polymorphism information content (PIC) of the SSR loci ranged from 0.01 to 0.75, with an average of 0.38 (Table 1; Table S2). The gene diversity index ranged from 0.01 to 0.51, with an average of 0.11. Eighty-six rare alleles with a frequency of less than 1% were identified at 66 loci. Among these rare alleles, 13 were found in the cultivar 'Nenghua 3' and 10 in 'Jilinsilihong' (Table 2, Table 3); both accessions were collected from Jilin province. The proportion of cultivars carrying rare alleles was highest among cultivars from the south and lowest among cultivars from the north. The 17 cultivars from Guangdong province carried rare alleles, whereas no rare alleles were found in cultivars from Guizhou province (Table 2).
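The per-locus statistics reported above (number of alleles, major allele frequency, gene diversity, PIC and rare alleles) follow standard formulas. The sketch below is a minimal illustration, not the PowerMarker implementation used in this study. It assumes one allele call per cultivar at each locus (plausible for largely homozygous cultivars, but an assumption here), computes gene diversity as 1 - Σp², computes PIC with the Botstein et al. (1980) formula, and flags alleles with a frequency below 1% as rare. PowerMarker's estimators may include sample-size corrections that this sketch omits. The function name `locus_stats` and the example allele sizes are hypothetical.

```python
from collections import Counter


def locus_stats(genotypes, rare_threshold=0.01):
    """Summary statistics for one SSR locus (illustrative sketch).

    `genotypes` holds one allele call per accession (e.g. fragment size in bp);
    missing calls are given as None and skipped. Assumes at least one call.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    freqs = {allele: count / n for allele, count in Counter(calls).items()}
    p = sorted(freqs.values(), reverse=True)

    sum_p2 = sum(x * x for x in p)
    gene_diversity = 1.0 - sum_p2  # expected heterozygosity, no sample-size correction

    # Botstein et al. (1980): PIC = 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2
    pic = gene_diversity - sum(
        2 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )

    rare = sorted(a for a, f in freqs.items() if f < rare_threshold)
    return {
        "n_alleles": len(p),
        "major_allele_freq": p[0],
        "gene_diversity": gene_diversity,
        "pic": pic,
        "rare_alleles": rare,
    }


# Hypothetical locus scored in 196 cultivars: the 160-bp allele occurs once (1/196 < 1%).
example = locus_stats([150] * 150 + [152] * 45 + [160])
print(example["rare_alleles"])  # [160]
```

Averaging `gene_diversity` and `pic` over all retained loci gives the per-study means; counting how many cultivars carry at least one allele in `rare_alleles` gives the rare-allele proportions compared across regions.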
The average number of alleles per locus was greatest in cultivars from the south and smallest in cultivars from the Yangtze River region. The average gene diversity index was lowest in cultivars from the south, followed by those from the north (Table 2). The average number of alleles per locus increased in the cultivars released in each successive decade, whereas the proportion of cultivars carrying rare alleles decreased. In contrast, the average gene diversity index and the major allele frequency did not change much (Table S3).

Genetic relationships
The largest genetic distance (0.40) was found between cultivars from the south and those from the north. Cultivars from the Yangtze River region were very similar to those from the north (0.12). Among provinces, genetic distances between populations from the south (Fujian, Guangdong, Guangxi) and the north (Shandong, Henan, Hebei, Jiangsu) were larger than those between other populations. The Shandong population showed the smallest genetic distance with the Henan population (0.12), whereas the Jilin population showed the greatest genetic distances with the Anhui, Henan and Jiangsu populations (0.55) (Table 4). Pairwise F-statistics (Fst) can be interpreted as standardized distances between two populations, and the Fst values were largely in accordance with the genetic distances. A neighbor-joining tree of the thirteen provincial populations, constructed from Nei's genetic distances, showed a significant difference between cultivars from the south and those from the north (Fig. 1).
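Nei's genetic distance between two populations is computed from their allele frequencies. The sketch below assumes Nei's (1972) standard distance, D = -ln(Jxy / sqrt(Jx * Jy)), where Jx and Jy are the allele homozygosities of each population and Jxy is the probability of drawing the same allele from both, summed over loci. PowerMarker offers several Nei distances and the exact variant used here is not stated, so treat this as an illustration rather than the study's computation. The input format (one allele-to-frequency dict per locus) and the function name are hypothetical.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance between two populations (sketch).

    pop_x and pop_y are lists with one dict per locus, mapping allele -> frequency.
    Averaging J over loci would cancel in the ratio, so plain sums are used.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    if jxy == 0.0:
        return math.inf  # no allele shared at any locus
    identity = jxy / math.sqrt(jx * jy)  # Nei's genetic identity I
    return -math.log(identity)


# Hypothetical single-locus example: fixed for allele "a" vs. 50:50 "a"/"b".
south = [{"a": 1.0}]
north = [{"a": 0.5, "b": 0.5}]
print(round(nei_standard_distance(south, north), 4))  # 0.3466
```

Filling a matrix with this function for every pair of provincial populations gives the input from which a neighbor-joining tree such as the one in Fig. 1 can be built.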
When cultivars were grouped by release period, Nei's genetic distances based on the 146 SSR loci showed that cultivars released before 1970 had the greatest Nei's genetic distance. Among the subsequent decades, Nei's genetic distances did not change much (Table S4), indicating that the genetic diversity of released peanut cultivars has not changed obviously over time.

Population structure
The population structure of the Chinese peanut cultivars was inferred with STRUCTURE 2.0 based on the 146 SSR markers. The number of subpopulations (K) was determined from the maximum likelihood and ΔK values (Fig. S1). At K = 3, the accessions were divided into three main populations, denoted P1, P2 and P3 (Fig. 2a, Table 5). P1 contained 77 accessions, of which 64 were from the southern provinces (Guangdong, Guangxi and Fujian), 7 from the northern provinces (Shandong, Henan and Jilin) and 6 from the Yangtze River region (Hubei, Hunan and Guizhou provinces); the two cultivars from Jilin province were among those clustered in P1. P2 had 27 accessions, mostly from Hubei, Henan and Shandong provinces. P3 contained 92 accessions: 75 from the northern provinces (Shandong, Henan, Hebei, Jiangsu and Anhui), 1 from Guangxi province in the south, and 16 from the Yangtze River region (10 from Sichuan and 6 from Hubei) (Table 5). At K = 4, P1 split into two subgroups (P1a and P1b) of 42 and 35 accessions, respectively (Fig. 2b, Table 5), with the two Jilin cultivars clustered in P1b. At K = 5, P3 split into two subgroups (P3a and P3b) of 44 and 48 accessions, respectively (Fig. 2c, Table 5). PCoA and PCA based on the marker genotypes revealed five distinct clusters for the entire population that were related to the germplasm regions (Fig. 3, Fig. S2). The first and second principal coordinates explained 88.64% and 3.71% of the molecular variance, respectively, and the first two principal components explained 66.46% and 8.08%, respectively. Furthermore, the neighbor-joining tree showed five branches within the peanut cultivars, fairly consistent with the structure-based membership assignment for most of the cultivars (Fig. 4).
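The percentages of variance explained by the principal coordinates come from an eigendecomposition of a distance matrix among genotypes. The sketch below shows classical (Gower) PCoA with NumPy. It assumes a symmetric distance matrix as input (for example, one minus the proportion of shared alleles between two cultivars) and reports each axis's share of the positive eigenvalues. GenAlEx, which was used in this study, applies its own standardization options, so its percentages need not match this sketch exactly. The function name `pcoa` is hypothetical.

```python
import numpy as np


def pcoa(dist):
    """Classical (Gower) principal coordinates analysis (illustrative sketch).

    Returns (coords, explained): one column of coordinates per retained axis, and
    each axis's share of the sum of positive eigenvalues.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gower matrix

    eigvals, eigvecs = np.linalg.eigh(b)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Keep axes with clearly positive eigenvalues (drop round-off and negative ones).
    positive = eigvals > 1e-10 * abs(eigvals[0])
    coords = eigvecs[:, positive] * np.sqrt(eigvals[positive])
    explained = eigvals[positive] / eigvals[positive].sum()
    return coords, explained


# Hypothetical example: four genotypes evenly spaced along one axis.
points = np.array([0.0, 1.0, 2.0, 3.0])
distances = np.abs(points[:, None] - points[None, :])
coords, explained = pcoa(distances)
print(explained)  # a single axis carries all of the variance
```

Plotting the first two columns of `coords`, coloured by subpopulation, gives a scatter of the kind shown in Fig. 3.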

Analysis of molecular variance
Release years, geographic origins and population structure provided the grouping factors for hierarchical AMOVA. When populations were defined by release year, 34% of the total genetic variance was attributed to differences among release years and the remaining 66% to individual differences within populations (Table 6). When populations were defined by geographic origin, 31% was attributed to geographic origin and the remaining 69% to individual differences within populations (Table 6). When populations were defined by the structure-based groups, 36% was attributed to differences among these groups and the remaining 64% to individual differences within them. These results demonstrate that genetic variance exists at each level: among release years, among geographic origins, among subgroups, and among individuals within populations.
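The variance percentages above come from partitioning squared distances among individuals into among-group and within-group components. A minimal one-level AMOVA in the style of Excoffier et al. (1992) is sketched below. It is not Arlequin's implementation: it assumes a matrix of squared pairwise distances (for 1/0 band data, the squared Euclidean distance equals the number of bands at which two cultivars differ) and omits the permutation test Arlequin uses for significance. The function name `amova` is hypothetical.

```python
import numpy as np


def amova(sq_dist, groups):
    """One-level AMOVA from squared pairwise distances (illustrative sketch)."""
    d2 = np.asarray(sq_dist, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n_total, n_pops = len(groups), len(labels)

    # Sum of squared deviations; the symmetric matrix counts each pair twice.
    ssd_total = d2.sum() / (2 * n_total)
    ssd_within, sizes = 0.0, []
    for lab in labels:
        idx = np.flatnonzero(groups == lab)
        sizes.append(len(idx))
        ssd_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
    ssd_among = ssd_total - ssd_within

    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)
    sizes = np.array(sizes)
    n0 = (n_total - (sizes ** 2).sum() / n_total) / (n_pops - 1)  # mean group size

    var_within = ms_within
    var_among = (ms_among - ms_within) / n0
    total = var_among + var_within
    return {
        "var_among": var_among,
        "var_within": var_within,
        "pct_among": 100 * var_among / total,
        "phi_st": var_among / total,
    }


# Hypothetical example: two groups of two cultivars on a one-dimensional scale.
x = np.array([0.0, 2.0, 4.0, 6.0])
result = amova((x[:, None] - x[None, :]) ** 2, ["A", "A", "B", "B"])
print(round(result["pct_among"], 2))  # 77.78
```

Running the function once with release-year labels, once with province labels and once with STRUCTURE group labels mirrors the three partitions summarised in Table 6.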

Discussion
Comparison of peanut genetic diversity
SSR markers have been extensively used to detect variability among peanut genotypes and to evaluate their genetic diversity. Compared with previous reports [15][16][20][51], the genetic diversity of the 196 Chinese peanut cultivars in this study was low. A study of the U.S. peanut mini-core collection uncovered an average of 8.1 alleles per locus [16][51], and an analysis of 168 peanut accessions from 42 countries on five continents with 27 SSR markers identified 4.29 alleles per locus [16]. In the Chinese core collection, the number of polymorphic bands per locus was 4.0 for var. hypogaea germplasm and 2.2 for breeding lines [15]. In this study, the number of alleles per locus in cultivars was similar to that in breeding lines [15], indicating that released cultivars carry less genetic diversity than landraces; indeed, breeding lines and cultivars (as opposed to landraces) share the same origin through crossing. The gene diversity index in this study (0.11) was also lower than the 0.15-0.18 reported for ssp. hypogaea and ssp. fastigiata in the U.S. peanut mini-core collection [16][51]. Although allelic diversity and the gene diversity index can be used as indicators of genetic variation, such values are relative and depend on the number of polymorphic loci and the relatedness of the genotypes analyzed. Nevertheless, both measures indicated a lower level of genetic diversity in Chinese peanut cultivars than in landraces [14][15], consistent with results reported in rice [44][45][52], wheat [48][53] and soybean [50][54]. Thus, more germplasm should be introduced into Chinese peanut breeding to broaden the genetic diversity of the cultivars.
Some of the SSR markers generated genotype-specific bands, which could serve as molecular identity data for specific genotypes. Eighty-six rare alleles with a frequency of less than 1% were identified in the 196 peanut cultivars using 146 SSR markers. The average number of rare alleles per SSR marker was 0.45, lower than that of the US mini-core collection (1.68) [16][51], which also suggests low genetic diversity in Chinese peanut cultivars. Among these rare alleles, 13 unique rare alleles were found in the accession 'Nenghua 3' and 10 in the accession 'Jilinsilihong'. These results indicate that the two cultivars are distinct from the others, probably because they belong to ssp. fastigiata var. fastigiata, which is rarely planted in China. Many more unique rare alleles may be discovered in ssp. fastigiata, which has been reported to have more genotype-specific bands than ssp. hypogaea [16].

Analysis of genetic relationships of peanut cultivars
Based on their geographic origin, the peanut cultivars were divided into three ecological regions: the southern region, the northern region and the Yangtze River region, which are the three ecological regions used in the national peanut regional trials in China [2]. The south includes Fujian, Guangxi and Guangdong provinces, among others; the Yangtze River region includes Hunan, Hubei, Sichuan and Guizhou provinces, among others; and the north includes Shandong, Henan, Hebei, Jiangsu, Anhui and Jilin provinces, among others. The largest genetic distance was found between cultivars from the south and the north, whereas cultivars from the Yangtze River region were very similar to those from the north. In fact, many Yangtze River cultivars were assigned to the P3 population by structure analysis (Table 5), indicating that cultivars from the Yangtze River region differ from those from the south. Genetic distances between populations from the south and the north were larger than those between other populations, indicating obvious differences between the southern and northern populations.
All populations from the south were clustered together, and most of their cultivars belong to ssp. fastigiata. Similarly, most populations from the north were clustered together, and most of their cultivars belong to ssp. hypogaea [1][2]. Consistently, the US peanut mini-core collection was grouped into two major clusters corresponding to the subspecies fastigiata and hypogaea [16]. Although Jilin lies in the north, the cultivars 'Nenghua 3' and 'Jilinsilihong' from Jilin province are ssp. fastigiata var. fastigiata and were clustered together with those from the south. AMOVA revealed that 31% of the total genetic variance was attributed to geographic origin (Table 6), indicating obvious differences among peanut cultivars from different provinces in China. The average number of alleles per marker increased in the cultivars released in each decade, whereas gene diversity and major allele frequency showed no obvious differences among decades (Table S3). Cultivars released before the 1970s had greater pairwise Fst and Nei's genetic distance estimates than those of the other periods, and these estimates did not change much after the 1970s (Table S4). These results show that the genetic diversity of peanut cultivars has not changed obviously since then. A possible reason is that the early cultivars in this study were landraces and major parents that had not undergone large-scale hybridization, and consequently they presented higher genetic diversity. Many early accessions were subsequently used in cross breeding, which increased the average number of alleles per marker. However, strong artificial selection produced derivatives similar to their parents, so these derivatives added little genetic diversity. According to AMOVA, 66% of the total genetic variance among release years was explained by individual differences within populations (Table 6).
These results show that there are obvious differences among peanut cultivars released in different decades in China.

Population structure of peanut cultivars in China
To identify the optimal number of subpopulations (K) in STRUCTURE, both LnP(D) and ΔK were used [44][45][46][47][48][49][55]. The K value that maximizes the likelihood, reported as LnP(D) by STRUCTURE, is generally considered the optimal number of subdivisions [41]. In this study, however, the distribution of L(K) did not show a clear mode for the true K in the Chinese peanut cultivars (Fig. S1b). Therefore, the ad hoc statistic ΔK was used to overcome this difficulty in interpreting the real K value [56]. The highest ΔK for the 196 peanut cultivars occurred at K = 5 (Fig. S1a). Clustering bar plots for K = 3-5 are shown in Figure 2. At K = 3, the 196 accessions were divided into three subpopulations corresponding to the south, the north and the Yangtze River region, which confirmed the above results (Fig. 1, Table 4). At K = 4, P1 split into two subgroups (P1a and P1b) of 42 and 35 accessions, respectively (Table 5), showing genetic differentiation among cultivars from the south. At K = 5, P3 split into two subgroups (P3a and P3b) of 44 and 48 accessions, respectively (Table 5), indicating genetic differentiation among cultivars from the north. The major substructure groups identified at K = 3 (Fig. 2) were consistent with the clustering results based on genetic distance (Fig. 4). As shown in Figure 2, most of the accessions were clearly assigned to one of the five subpopulations. PCoA and PCA based on the marker genotypes revealed five distinct clusters for the entire population (Fig. 3, Fig. S2), which were related to their germplasm regions. Furthermore, the neighbor-joining tree showed five branches within the peanut cultivars, fairly consistent with the structure-based membership assignment for most of the cultivars (Fig. 4).
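Evanno's ΔK can be computed directly from the replicate LnP(D) values that STRUCTURE reports for each K. The sketch below assumes a dict mapping each K to its list of replicate LnP(D) values and, as is common when ΔK is tabulated, uses the mean LnP(D) at K-1, K and K+1 for the second-order difference, divided by the standard deviation of the replicates at K. ΔK is undefined at the smallest and largest K tested, and at least two replicates per K are needed for the standard deviation. The function name and the LnP(D) values in the example are hypothetical.

```python
import statistics


def evanno_delta_k(lnp_by_k):
    """Evanno et al. (2005) delta K from replicate STRUCTURE runs (sketch).

    lnp_by_k maps K -> list of LnP(D) values, one per replicate run.
    Returns {K: delta K} for every K whose neighbours K-1 and K+1 were run.
    """
    mean = {k: statistics.mean(runs) for k, runs in lnp_by_k.items()}
    delta = {}
    for k, runs in lnp_by_k.items():
        if k - 1 in mean and k + 1 in mean:
            # |L(K+1) - 2 L(K) + L(K-1)| on the mean LnP(D) values
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = second_diff / statistics.stdev(runs)
    return delta


# Hypothetical LnP(D) values from two replicate runs per K.
runs = {
    1: [-1000.0, -1000.0],
    2: [-800.0, -802.0],
    3: [-790.0, -792.0],
    4: [-789.0, -791.0],
}
delta = evanno_delta_k(runs)
print(max(delta, key=delta.get))  # 2
```

The K with the largest ΔK is taken as the most likely number of subpopulations, which is how a value of K = 5 is read off a plot like Fig. S1a even when LnP(D) itself keeps rising.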
Cultivars from Sichuan province were divided between the P3a and P3b populations, which may be attributed to cultivars related to 'Luojiangjiwo' (ssp. hypogaea) [1], whereas most cultivars from the north are related to other ssp. hypogaea parents [1][2].
A narrow genetic base has been reported for peanut cultivars in China, probably because of high selection pressure for good grain quality and repeated use of parents of the same origin with proven yield ability in breeding programs, resulting in significant genetic erosion of the local peanut gene pool [1][2]. The results presented in this study suggest that the genetic base of Chinese peanut cultivars is relatively narrow and that a wider range of accessions should be introduced to improve present varieties in future breeding programs. Nevertheless, the population structure results show abundant genetic differentiation among cultivars within the south and within the north. Cultivars adapted to the local ecological environment are the best parents for breeding, and our results suggest that crosses between P1a and P1b cultivars in the south, or between P3a and P3b cultivars in the north, offer the greatest potential for improving yield.

Plant material
Based on the information in Peanut Varieties and Their Pedigree in China [1] and Peanut Genetics and Breeding in China [2], a total of 196 cultivated varieties from 13 provinces of China were collected (Table S1) and planted at the Oil Crops Research Institute, Chinese Academy of Agricultural Sciences.

DNA extraction
Two or three young, unopened leaves were collected from each accession and used for DNA extraction with the hexadecyltrimethylammonium bromide (CTAB) method [57]. DNA was quantified with a Beckman DU-650 spectrophotometer, and DNA integrity and quality were evaluated by electrophoresis on a 1% agarose gel.

Data analysis
Polymorphic bands were scored as 1 (present) or 0 (absent) for each amplified fragment, and the data were converted into the format required by each analysis program. Major allele frequency, number of alleles per locus, gene diversity and polymorphism information content (PIC) were calculated with PowerMarker V3.25 [46]. Genetic distance (GD) and pairwise F-statistics (Fst) between populations were also calculated with PowerMarker V3.25, and the dendrogram was constructed from the genetic distances using the Neighbour subroutine of PowerMarker V3.25 as described [59].
Population structure was inferred with STRUCTURE 2.0 [41]. The number of subpopulations (K) was varied from 1 to 10 under the no-admixture model with correlated allele frequencies; the burn-in period and the number of iterations were both set to 50,000 for each run, and ten replicate runs were performed for each K. Because the highest posterior probability is difficult to locate until large K values have been examined, both LnP(D) and Evanno's ΔK [56] were used to determine the most appropriate K. ΔK was calculated as ΔK = M[|L(K-1) - 2L(K) + L(K+1)|]/S[L(K)], where L(K) is the LnP(D) at K, M is the mean over the 10 runs and S is their standard deviation.
Principal component analysis (PCA) is a visualization technique commonly used in multivariate statistics that also yields the eigenvectors, the variance explained by each component and the cumulative explained variance; PCA was conducted with the NTSYS package [42]. Genetic similarities between genotypes in each group were determined with a coefficient based on the proportion of shared alleles, and principal coordinates analysis (PCoA) was performed with the software package GenAlEx [43]. The molecular variance among subgroups, provinces and release years was partitioned by analysis of molecular variance (AMOVA) in the software package Arlequin [60].
Figure 4 (Table S1). doi:10.1371/journal.pone.0088091.g004
Figure S1. The modal value of this distribution is the true K.
The ΔK of 10 repeats based on STRUCTURE calculation using SSR data.