Genetic Diversity and Population Structure of Basmati Rice (Oryza sativa L.) Germplasm Collected from North Western Himalayas Using Trait Linked SSR Markers

One hundred forty-one basmati rice genotypes collected from different geographic regions of the North Western Himalayas were characterized using 40 trait-linked microsatellite markers. These primers detected 112 alleles, with a maximum of 5 and a minimum of 2 alleles per locus. The maximum and minimum polymorphic information content values were 0.63 and 0.17, for the primers RM206 and RM213, respectively. The genetic similarity coefficient for most pairs ranged between 0.2 and 0.9, with an average of 0.60 across all possible combinations, indicating moderate genetic diversity among the chosen genotypes. Distance-based phylogenetic cluster analysis of the SSR data divided the genotypes into four groups (I, II, III and IV), whereas the model-based clustering method divided them into five groups (A, B, C, D and E). The results of the two analyses nevertheless agree well with each other in clustering by place of collection and geographic region, except that the local basmati genotypes formed three subpopulations in the structure analysis compared with two clusters in the distance-based clustering. The diverse genotypes and polymorphic trait-linked microsatellite markers from the present study will be used to identify quantitative trait loci/genes for economically important traits, for use in future molecular breeding programmes of rice.


Introduction
Rice (Oryza sativa L.) occupies the premier place among the food crops cultivated around the world; thus rice production and improvement are of interest to the Indian economy. India has the largest acreage under rice (44 million hectares), with an annual production of about 104 million tonnes, ranking second only to China [1]. Rice provides 43 percent of the caloric requirement for more than 70 percent of the Indian population. Rice protein, though small in amount, is of high nutritional value [2]. Basmati rice makes a metallothionein-like protein, rich in the sulfur-containing amino acid cysteine, that aids in iron absorption. Basmati rice is desirable in the international market for its unique quality attributes, such as a distinct and pleasant aroma, fluffy texture of cooked rice, high volume expansion during cooking (characterized by linear kernel elongation with minimum breadth-wise swelling), palatability, easy digestibility and longer shelf life. It has been cultivated for hundreds of years in the foothills of the Himalayas in the North Western (NW) parts of the Indian sub-continent, comprising the states of Haryana, Punjab, Uttaranchal, Western Uttar Pradesh, Jammu & Kashmir, Himachal Pradesh and Delhi. In Jammu & Kashmir, it plays an important role in the livelihood of the people of this hilly and sub-mountainous state.
Little attention has been paid to the improvement of basmati varieties, except for sporadic reports on germplasm evaluation and the genetics of some quality traits. As such, very little information is available on the genetic diversity of traditional basmati rice. With the introduction of high-yielding varieties, the landraces that include basmati quality types are moving out of cultivation. Moreover, basmati varieties are highly mixed with each other and are very difficult to differentiate. Knowledge of genetic diversity and relationships among the basmati rice genotypes commonly grown in the NW Himalayas may play a significant role in breeding programmes to improve production, productivity, quality traits and tolerance to biotic and abiotic stresses, and can also provide valuable information for plant breeders as a parental line selection tool. Thus, estimation and quantification of genetic diversity among the basmati rice germplasm are prerequisites for its genetic enhancement. Morphological and biochemical markers have been used for genetic diversity analysis and for establishing relationships among cultivars. However, these are limited in number, stage specific and highly influenced by environmental conditions, which renders them less popular among researchers. With the advent of PCR-based molecular marker technology, genetic characterization of crop plants has entered a new era. Amongst the various molecular markers, simple sequence repeat (SSR) markers have become a method of choice owing to their high reproducibility, simplicity, ease of scoring, reliability, and co-dominant and multi-allelic nature. Microsatellites, or SSRs, are sequences of a few repeated and adjacent base pairs that are abundant throughout the eukaryotic genome [3]. Variations in the number of repeats can be detected by polymerase chain reaction (PCR), using primers (20-30 base pairs) designed to be complementary to the conserved sequences flanking the microsatellite.
These markers have been used for genetic diversity analysis, genotypic identification and population structure estimation in several rice genetic studies [1,. It has been hypothesized that the use of random markers for assessing genetic diversity might not reflect the functionally useful variation prevalent in the coding regions of the genome [25], a crucial requisite for breeding programmes. For the selection of suitably diverse parental lines, it is pertinent to study and compare the pattern of genetic diversity using random vis-à-vis trait-linked simple sequence repeat markers, which would confirm their suitability for assessing genetic diversity. Understanding genetic diversity and population structure would be vital to association mapping and molecular breeding programmes in basmati rice.
In the present study, the genetic diversity and population structure of 141 basmati rice accessions collected from the NW Himalayas, including landraces, farmer's varieties, elite cultivars and advanced breeding lines, were analyzed using 40 highly polymorphic trait-linked SSR markers. Our objectives were to estimate the levels of genetic diversity and to characterize the population structure of the NW Himalayan basmati germplasm.

Plant material
The present study material consisted of 141 basmati rice accessions representing landraces, farmer's varieties, elite cultivars and advanced breeding lines collected from different basmati growing regions of India (Table 1 and Fig 1). These accessions were planted at Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Chatha, Jammu & Kashmir, India, following panicle to row method to maintain genetic purity. The detailed basic information about the availability of germplasm used in the present study is summarized in S1 and S2 Texts.

DNA extraction
Two grams of fresh leaf tissue were collected from each genotype for DNA extraction. Total genomic DNA was isolated from each genotype by the CTAB method [26]. DNA samples were quantified using a Nanodrop (mySPEC, Scientific GmbH, Germany), and DNA quality was assessed by 0.8% agarose gel electrophoresis. Highly concentrated DNA samples were further diluted in 10:1 Tris-EDTA to a working concentration of 50 ng/μl and stored at 4°C for PCR-based marker analysis.
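The dilution to a 50 ng/μl working concentration follows the usual C1 × V1 = C2 × V2 relation. A minimal Python sketch is given below; the stock concentration and final volume are hypothetical values chosen for illustration and are not taken from the study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (stock volume, diluent volume) in microlitres via C1*V1 = C2*V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical example: a 400 ng/ul stock diluted to 50 ng/ul in 100 ul total.
stock_ul, te_ul = dilution_volumes(400.0, 50.0, 100.0)
print(f"{stock_ul:.1f} ul stock + {te_ul:.1f} ul Tris-EDTA")
```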

PCR assay
PCR amplification was performed on each of the 141 basmati rice genotypes using primers for each SSR locus. A total of 40 pairs of rice primers flanking the microsatellite regions were selected from previously developed and published markers [27]. A detailed description of the primers is available at www.gramene.org/markers/microsat/. Primer pairs were selected from each chromosome. Each PCR reaction was prepared with 50 ng of rice genomic DNA, 0.2 μg of 3' and 5' end primers, 200 mM of each dNTP, 10X PCR buffer containing 50 mM KCl, 10 mM Tris HCl (pH 8.9), 2.0 mM MgCl2 and one unit of Taq polymerase in a total volume of 25 μL, individually for all 40 primer pairs. The thermal cycler was programmed for one step at 94°C for 4 min, followed by 1 min at 94°C, 1 min 30 s at 55°C and 1 min at 72°C, and a final cycle of 10 min at 72°C. PCR amplification with each primer was performed three times, and only reproducible and distinct bands were scored and analysed. Amplified products were separated on 3.5% agarose gel and stained with ethidium bromide. A 100-bp DNA ladder (Life Technologies-GIBCO BRL) was used to estimate the size of each band.

SSR marker analysis
Amplified fragments of different sizes were considered as different alleles. DNA bands amplified by a given primer were scored as present (1) or absent (0) for all the samples under study. To determine the utility of these markers, the number of amplicons/alleles per marker, major allele frequency, polymorphic information content (PIC), effective multiplex ratio (EMR) / resolving power (RP), discrimination power (DP) and marker index (MI) were calculated. The PIC value of each primer was calculated using the formula $\mathrm{PIC} = 1 - \sum_{i=1}^{n} P_{ij}^{2}$ [27], where $P_{ij}$ is the frequency of the $i$th allele of marker $j$ and the sum runs over its $n$ alleles. Marker index, the product of PIC and EMR, was calculated [3]. Resolving power (RP) and discrimination power (DP) of each primer combination were calculated using standard methods [28,29]. Pairwise Jaccard's similarity coefficients [30] were computed using the NTSYS-pc version 2.02e package (Applied Bio-Statistics, Inc., Setauket, NY, USA), and this similarity matrix was used in cluster analysis with the unweighted pair-group method with arithmetic averages (UPGMA) and the sequential, agglomerative, hierarchical and nested (SAHN) clustering algorithm to obtain a dendrogram. The genetic similarity coefficient was calculated for each pair of genotypes [31] to determine the effectiveness of the SSR loci in distinguishing each of the 141 genotypes.
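As a minimal sketch of the PIC calculation, the Python snippet below computes allele frequencies, PIC and major allele frequency for a single marker. The allele calls are hypothetical, the function names are illustrative, and treating each genotype as carrying one allele per locus is an assumption made here for simplicity, not a description of the original scoring pipeline.

```python
from collections import Counter


def allele_frequencies(calls):
    """Relative frequency of each allele among the non-missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        raise ValueError("no allele calls available for this marker")
    counts = Counter(observed)
    return {allele: n / len(observed) for allele, n in counts.items()}


def pic(calls):
    """PIC = 1 - sum of squared allele frequencies for one marker."""
    return 1.0 - sum(p * p for p in allele_frequencies(calls).values())


def major_allele_frequency(calls):
    """Frequency of the most common allele at the marker."""
    return max(allele_frequencies(calls).values())


# Hypothetical allele sizes (bp) for ten genotypes at one SSR locus.
calls = [150, 150, 150, 150, 150, 150, 160, 160, 160, 170]
print(round(pic(calls), 2), major_allele_frequency(calls))
```

With allele frequencies of 0.6, 0.3 and 0.1, the PIC is 1 - (0.36 + 0.09 + 0.01) = 0.54 and the major allele frequency is 0.6.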

Population structure analysis
Model-based cluster analysis was performed to infer genetic structure and to define the number of clusters (gene pools) in the dataset using the software STRUCTURE version 2.3.4 [32]. The number of presumed populations (K) was set from 2 to 10, and the analysis was repeated 5 times. A burn-in period of 50,000 and 100,000 Markov chain Monte Carlo (MCMC) replicates were used, with a model without admixture and with correlated allele frequencies [33]. The run with the maximum likelihood was used to assign individual genotypes to groups. Within a group, genotypes with affiliation probabilities (inferred ancestry) ≥80% were assigned to a distinct group and those with <80% were treated as "admixture", i.e., these genotypes seem to have a mixed ancestry from parents belonging to different gene pools or geographical origins. The significance of the differentiation among the populations identified by STRUCTURE 2.3.4 was further investigated by an analysis of molecular variance (AMOVA) with Arlequin 3.5 [34]. Pairwise population differentiation among the five sub-populations was also estimated using Arlequin 3.5 [34]. Another dendrogram among the five subpopulations (generated through structure analysis), based on unbiased genetic distance [35], was constructed by UPGMA (unweighted pair-group method with arithmetic average) using POPGENE version 1.31.
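The 80% assignment rule described above can be sketched in Python as follows. The affiliation probabilities are hypothetical values, not STRUCTURE output from this study, and the function name is illustrative.

```python
def assign_genotype(membership, threshold=0.80):
    """Return the best-supported subpopulation label, or 'admixture'."""
    label, probability = max(membership.items(), key=lambda item: item[1])
    return label if probability >= threshold else "admixture"


# Hypothetical affiliation probabilities for two genotypes at K = 5.
q_matrix = {
    "genotype_1": {"A": 0.92, "B": 0.03, "C": 0.02, "D": 0.02, "E": 0.01},
    "genotype_2": {"A": 0.55, "B": 0.05, "C": 0.05, "D": 0.05, "E": 0.30},
}
groups = {name: assign_genotype(q) for name, q in q_matrix.items()}
print(groups)
```

Here genotype_1 is assigned to subpopulation A, while genotype_2, whose highest probability is below 0.80, is treated as admixture.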

SSR Polymorphism among basmati rice varieties
All 141 basmati rice accessions were genotyped with 40 trait-linked microsatellite markers, which were selected for their ability to produce amplified product at optimum concentration, their level of polymorphism among the varieties and the consistency of their banding pattern. Of the 40 trait-linked microsatellite markers, two (RM130 and RM571) were monomorphic, revealing one allele at each locus in all the genotypes. A total of 114 alleles were scored from these primer pairs, and 95 percent were found polymorphic. The gel picture showing the banding pattern of the 141 basmati rice genotypes with the RM3 marker is given in Fig 2. These loci were used to discriminate the morphologically similar genotypes, and they distinguished all the genotypes.
The values of polymorphism information content, resolving power, major allele frequency, discrimination power and marker index across all 141 genotypes are given in Table 2. The highest PIC value (0.63) was observed for the primer RM206 and the lowest (0.17) for the primer RM213 (Table 2), with an average of 0.405. The MI values ranged from 0.34 to 3.14 with an average of 1.22. The RP is a feature of a marker that indicates the discriminatory potential of the primer. RP ranged from 0.34 to 1.76 with an average of 1.01 for the polymorphic markers. For the polymorphic markers, the major allele frequency ranged from 0.55 to 0.91 with an average of 0.74 (Table 2 and Fig 3).
The DP values ranged from 0.16 to 0.62 with an average of 0.41. The number of alleles per locus varied from 2 to 5 with an average of 3 alleles per locus (Table 2).
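To illustrate how a resolving power value can arise from 1/0 band scores, the sketch below uses one widely used definition, RP = Σ Ib with band informativeness Ib = 1 - 2 × |0.5 - p|, where p is the proportion of genotypes showing the band. That this is the exact method behind the cited references is an assumption, and the band matrix is hypothetical.

```python
def band_informativeness(p):
    """Ib = 1 - 2*|0.5 - p|, where p is the share of genotypes showing the band."""
    return 1.0 - 2.0 * abs(0.5 - p)


def resolving_power(band_matrix):
    """Sum Ib over bands; each row is one band scored 1/0 across genotypes."""
    total = 0.0
    for band in band_matrix:
        p = sum(band) / len(band)
        total += band_informativeness(p)
    return total


# Hypothetical 1/0 scores for a primer with three bands across eight genotypes.
bands = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # p = 0.500, Ib = 1.00
    [1, 1, 0, 0, 0, 0, 0, 0],  # p = 0.250, Ib = 0.50
    [1, 1, 1, 1, 1, 1, 1, 0],  # p = 0.875, Ib = 0.25
]
print(resolving_power(bands))
```

Bands present in about half of the genotypes contribute most to RP, whereas bands that are nearly fixed or nearly absent contribute little.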

Genetic relationship
To determine the genetic relationships among the basmati rice genotypes, the SSR data were analysed using NTSYSpc version 2.02e. The genetic similarity coefficients in the genotype comparison matrix were moderate. In the distribution analysis of the 9870 pairwise comparisons (Fig 4), most coefficients ranged between 0.2 and 0.9, with an average of 0.60. A dendrogram was also constructed among the five subpopulations generated through structure analysis, using POPGENE version 1.31, to determine the relationships among them. The five populations were grouped into two clusters (Z and X): populations A (pop1), B (pop2) and E (pop5) were grouped in cluster Z, and populations C (pop3) and D (pop4) in cluster X. All the local basmati genotypes were grouped in cluster Z, and the genotypes other than local basmati were grouped in cluster X (Fig 7).
Table 2 note: Functional genes as modified from [25], [36], [37], [38], [39]. doi:10.1371/journal.pone.0131858.t002
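The pairwise comparison underlying this section can be sketched in Python. The snippet computes Jaccard similarity coefficients from 1/0 band profiles and confirms that 141 genotypes give 9870 unordered pairs. The profiles are hypothetical, the function names are illustrative, and the UPGMA clustering step performed in NTSYSpc is not reproduced here.

```python
from itertools import combinations
from math import comb


def jaccard(a, b):
    """Jaccard similarity: bands shared by both profiles / bands present in either."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


def pairwise_similarity(profiles):
    """Similarity for every unordered pair of genotypes."""
    return {
        (g1, g2): jaccard(profiles[g1], profiles[g2])
        for g1, g2 in combinations(sorted(profiles), 2)
    }


# Hypothetical 1/0 band profiles for three genotypes.
profiles = {
    "G1": [1, 0, 1, 1, 0, 1],
    "G2": [1, 0, 1, 0, 1, 1],
    "G3": [0, 1, 0, 1, 1, 0],
}
sims = pairwise_similarity(profiles)
mean_similarity = sum(sims.values()) / len(sims)
n_pairs = comb(141, 2)  # number of pairwise comparisons among 141 genotypes
print(sims, round(mean_similarity, 3), n_pairs)
```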

Analysis of molecular variance
The five populations generated from structure analysis were also subjected to analysis of molecular variance (AMOVA) to estimate the percentage of variation among and within populations. Of the total genetic variance, 39.40% was attributed to differences among the structure-based populations, and the remaining 60.60% was explained by individual differences within populations (Table 3). Pairwise Fst values showed significant differentiation among all pairs of sub-populations, ranging from 0.0756 to 0.6873, suggesting that all five groups were significantly different from each other (Table 4). Sub-populations D and E were more differentiated from each other as per the Fst estimates (Table 4). In summary, the results of the AMOVA and Fst analyses were in good agreement with those obtained through the phylogenetic tree-based, similarity coefficient distribution and structure analyses, and confirmed the presence of statistically moderate genetic diversity and high population structure. This is a critical factor to consider before carrying out association mapping (AM) analysis.
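As a worked illustration, and assuming a two-level AMOVA (among populations and within populations), the overall fixation index equals the among-population share of the total variance. The symbols below are introduced here for illustration and are not taken from Table 3.

```latex
\Phi_{ST}
  = \frac{\sigma^{2}_{\text{among}}}{\sigma^{2}_{\text{among}} + \sigma^{2}_{\text{within}}}
  = \frac{39.40}{39.40 + 60.60}
  \approx 0.394
```

Under this assumption, the overall value of about 0.39 lies within the reported range of pairwise Fst estimates (0.0756 to 0.6873).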

Discussion
The genetic improvement of yield and other economically important traits in crop species depends upon the genetic diversity available within the species. The cultivated varieties of basmati rice arose through human selection from the available genetic diversity in various environments and human cultures. Modern breeding in the last two centuries has resulted in varieties that are more uniform, less stable and adapted to better-controlled and more limited environments. This has led to the popularization of a few genotypes among farmers, including those of basmati rice, and hence to a narrow genetic base. As a result, the crop has become more prone to biotic and abiotic stresses. Basmati rice improvement requires the identification of highly diverse germplasm and highly polymorphic molecular markers, which in turn can be effectively utilized for the mapping of genes/QTLs for economically important traits and their subsequent use in molecular breeding. Hence the present study was initiated to assess the genetic base of the basmati germplasm commonly grown in the north western Himalayas. Identification of diverse genotypes using molecular markers is advantageous over the conventional approach [39]. SSR markers have been widely applied in genetic diversity analysis, genotypic identification and population structure estimation in several rice genetic studies, including studies of basmati rice [1,4-11,40-45]. In the present study, 38 out of 40 markers were polymorphic and produced unique allelic profiles for the 141 basmati rice genotypes. In total, 112 alleles were detected among the 141 rice genotypes, with an average of 3 alleles per locus and an average polymorphism information content (PIC) of 0.41. The genetic diversity observed in the present study is comparable to that of an earlier study [1], which detected 4.8 alleles per locus and an average PIC value of 0.50.
Three alleles per locus with an average PIC value of 0.41 among 88 Indian rice varieties collected from different agro-climatic regions of India were also reported [9]. Similarly, the average PIC value of 0.44 was observed among 43 Thai and 57 IRRI germplasm of rice [46]. In another study, an average PIC value of 0.45 was observed among the 183 Indonesian rice landraces on the Islands of Borneo [47]. A slightly lower genetic diversity was reported with an average of 2.75 alleles per locus and average PIC value of 0.38 among 40 rice accessions of Pakistan [8]. Similarly, a lower SSR diversity was also observed in a study with 36 polymorphic HvSSRs in which they detected 2.22 alleles per locus and an average PIC value of 0.25 in 375 Indian rice varieties collected from different regions of India [7].
In the present study, the average genetic similarity (GS) was 0.60, with most values ranging between 0.2 and 0.9 (Fig 4), reflecting a moderate degree of genetic diversity among the genotypes used. This average GS is comparable to that of an earlier study [1], in which an average GS of 0.55 was reported among 82 accessions including both Indian and exotic rice. Genetic similarities ranging from 0.21 to 0.92 were also observed among 155 japonica rice accessions [19]. Similarly, an average GS of 0.59 was reported among 88 rice accessions that included landraces, farmer's varieties and popular basmati lines from India, using 50 SSR markers [9]. A lower average GS of 0.39 was observed among 40 elite basmati and non-basmati rice accessions of Pakistan [8]. This is because a smaller number of diverse rice germplasm lines was used in that diversity study.
The dendrogram showed that all 141 genotypes of basmati rice were grouped into four major clusters (Fig 5). The genotypes clustered well according to their place of collection and geographical region (Figs 5 and 6). The genotypes from Ko Brahimna Samba, Koul Ramgarh Samba, R. S. Pura, Ramgarh and Sainia were grouped in cluster III. Similarly, the genotypes from Badyal, Chatha, Bishnah and Hansley Chak were grouped in cluster II. Thus, most of the local basmati genotypes fell in clusters II and III, suggesting relatively low genetic diversity among these genotypes. This is because similar breeding material was used for the development of these genotypes; in other words, they share the same ancestry. However, the varieties from SKUAST-J, IARI, Kathua, Kaul Haryana, GBPUAT and some from PAU were grouped in cluster IV. The varieties from Palampur and Meerut, some from IARI and GBPUAT, and most of the genotypes from PAU were grouped in cluster I. Hence the varieties from IARI, GBPUAT and PAU were present in both cluster I and cluster IV, which were distant in the dendrogram. This is because different types of material were used for the breeding of these varieties.
The population structure analysis revealed five subpopulations: A, B, C, D and E. Here too, the grouping of the genotypes corresponds well to the place of collection and geographic region. The local basmati genotypes were grouped in three subpopulations, A, B and E. The genotypes from Badyal, Chatha, Bishnah, Hansley Chak and Sarore, and some from Ko Brahimna Samba and R. S. Pura, were grouped in subpopulation A. Similarly, the genotypes from Koul Ramgarh Samba, Kathua and Sainia, and some from Ko Brahimna Samba and R. S. Pura, were grouped in subpopulation E. Genotypes from Ramgarh and some from R. S. Pura were grouped in subpopulation B. The varieties from SKUAST-J and Kaul Haryana, and some from PAU, were grouped in subpopulation C. The varieties from Palampur, Meerut, GBPUAT and IARI, and a few from PAU, were clustered in subpopulation D. Additionally, the presence of statistically significant population structure was confirmed by the AMOVA and Fst analyses. These findings are in accordance with an earlier study that reported variation among groups (35.28%) and within groups (64.72%), with pairwise Fst estimates ranging from 0.204 to 0.680 [46]. Similarly, variation among populations (34%) and within populations (66%) has also been reported [25]. The results obtained through structure analysis and distance-based clustering agree well with each other, except that the local basmati rice genotypes formed three subpopulations in the structure analysis compared with two clusters in the distance-based clustering.
From the similarity coefficient distribution, dendrogram, structure, AMOVA and Fst analyses, it is evident that the studied NW Himalayan basmati rice germplasm has a moderately diverse genetic base and high population structure. Hence the most divergent genotypes identified in this study can be utilized in future basmati rice breeding programmes. The diverse genotypes and highly polymorphic functional SSR markers identified during this study can also be used for the mapping of QTLs/genes for different biotic and abiotic stresses as well as for quality traits of basmati rice. As is evident from its population structure analysis, the basmati germplasm studied here can also be effectively utilized in association mapping (AM) analysis for grain quality traits, which is our future objective.
Supporting Information
S1 Text. Availability of germplasm for research purposes. (DOC)
S2 Text. Detailed information about the availability of material. (PDF)