Comparison of SSR and SNP Markers in Estimation of Genetic Diversity and Population Structure of Indian Rice Varieties

Simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers, the two most robust marker types for identifying rice varieties, were compared for assessment of genetic diversity and population structure. A total of 375 rice varieties from various regions of India, archived at the Indian National GeneBank, NBPGR, New Delhi, were analyzed using thirty-six markers each of hypervariable SSR (HvSSR) and SNP, distributed across the 12 rice chromosomes. A total of 80 alleles were amplified with the SSR markers, with an average of 2.22 alleles per locus, whereas 72 alleles were amplified with the SNP markers. Polymorphic information content (PIC) values for HvSSR ranged from 0.04 to 0.5 with an average of 0.25. In the case of SNP markers, PIC values ranged from 0.03 to 0.37 with an average of 0.23. Genetic relatedness among the varieties was studied using an unrooted tree; all the genotypes were grouped into three major clusters with both SSR and SNP markers. Analysis of molecular variance (AMOVA) indicated that most of the diversity was partitioned among and within individuals rather than between populations. Principal coordinate analysis (PCoA) with SSR markers showed that genotypes were uniformly distributed across the two axes, with 13.33% of cumulative variation, whereas with SNP markers the varieties fell into three broad groups across two axes, with 45.20% of cumulative variation. Population structure was tested using K values from 1 to 20, but there was no clear population structure; therefore the Ln(PD)-derived Δk was plotted against K to determine the number of populations. For SSR the maximum Δk was at K=5, whereas for SNP the maximum Δk was at K=15, suggesting that the resolution of population structure was higher with SNP markers, but that SSR markers were more efficient for diversity analysis.


Introduction
Rice (Oryza sativa L.) is a staple food crop worldwide and accounts for 21, 14 and 2% of the global energy, protein and fat supply, respectively [1]. It serves as a model plant for genetic breeding and genomics research. Rice is rich in genetic diversity at both interspecific and intraspecific levels. Three subspecies, indica, japonica and javanica, constitute a large reservoir of rice germplasm including a variety of local landraces and cultivars [2,3]. Knowledge of the extent of genetic variation and of the genetic relationships between genotypes is an important consideration for designing effective breeding and conservation programmes.
Molecular markers allow precise and rapid varietal identification and have proved to be an efficient tool for crop germplasm characterization, collection and management. Earlier, RAPD, ISSR and AFLP markers were used very frequently for fingerprinting and characterization of varieties and germplasm accessions of different crop species. Since these markers can be used without prior genomic information on the target crop, they were generally the markers of choice. After the year 2000, however, locus-specific markers such as simple sequence repeats (SSRs) became preferred for cultivar identification in many crops, such as grape [4], potato [5], rape [6], rice [7], almond [8], apple [9] and wheat [10]. With the sequencing of several genomes and the possibility of revealing single nucleotide polymorphism (SNP) markers en masse, SNPs are gaining importance in diversity studies [11,12]. The primary advantages of these markers are that they occur in genomes at a much higher frequency than SSRs, with close to one SNP being observed per 140 bp in rice [13], and that they can be genotyped in high-throughput systems with a high multiplex ratio. The polymorphisms of SSRs and SNPs are generated via different mechanisms (replication slippage for SSRs vs. point mutation for SNPs), and the two marker types can therefore provide different views of the structure of a given population. Single nucleotide polymorphisms are valuable markers for the construction of high-resolution genetic maps, for the study of population structure, and for the discovery of marker-trait relationships in association-mapping experiments. The application of SNP markers to plant cultivar identification has been reported in grape [14], grapevine [15], melon [16] and rice [17].
The present study was conducted to compare the two most important and preferred genetic markers, SSRs and SNPs, for assessment of genetic variability and population structure among 375 rice varieties that were either tested for distinctness, uniformity and stability (DUS-tested) or released and notified (RV) under the Indian system.

Plant Materials
Seed samples of 231 DUS-tested varieties, 130 DUS-tested/released varieties and 14 released varieties of rice from different parts of India were received from the Indian National Genebank, National Bureau of Plant Genetic Resources (NBPGR), New Delhi. The details of each variety, along with passport data (national ID, i.e. Indigenous Collection (IC) number, state, local name, pedigree, region), status (DUS/RV) and important traits, are given in Table S1.

DNA Extraction from Rice Seed
Six to nine seeds of each variety were dehusked and used for DNA isolation using the QIAGEN DNeasy Plant Mini Kit (Hilden, Germany). Kernels were ground into a fine powder using a tissue lyser (Tissue lyser II, Retsch, Germany) with a tissue lyser adapter set (QIAGEN). DNA was extracted as per the manufacturer's protocol.

Genotyping of Rice Varieties using SSR Markers
One hundred and twenty highly variable SSR (HvSSR) marker loci with repeat lengths of 51-70 bp, located across all twelve chromosomes of rice [18], were selected for initial screening. Ten markers located on the long and short arms of each rice chromosome were selected so that the whole genome was covered effectively.
Gradient PCR was set up for each primer with selected rice samples to standardize the temperature of amplification (Ta).
Out of the 120 HvSSR primers, the 36 that showed good amplification were selected for the final study.
Genomic DNA of all 375 varieties was diluted to prepare working stocks of 10 ng/µl. The PCR reaction was set up in a total volume of 10 µl containing 2 µl genomic DNA (10 ng/µl), 1 µl of 10X buffer, 0.8 µl of 25 mM MgCl2, 0.2 µl of 10 mM dNTPs, 0.2 µl of each primer (10 nmol), 0.2 µl of Taq DNA polymerase (Fermentas, Life Sciences, USA) and 5.6 µl distilled water. Amplification was performed in a thermocycler using the following program: initial denaturation at 94°C for 4 min, followed by 36 cycles of 94°C for 30 s, Ta for 45 s and 72°C for 1 min, and a final extension at 72°C for 10 min. The amplified products were resolved on a 4% Metaphor agarose gel for 4 h at a constant 120 V. Gel images were recorded using a gel documentation system (Alpha Imager®, USA).

Genotyping of Rice Varieties using SNP Markers
The Sequenom MassARRAY system (Sequenom Inc., San Diego, CA, USA) uses a matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometer for accurate detection of SNPs in a high-throughput manner (www.sequenom.com). Sequenom MassARRAY multiplex assays were designed for 36 SNPs (iPLEX Gold chemistry), representing conserved single-copy rice genes [19], taking three genes per rice chromosome. The 36-plex assays were designed and validated by Sequenom Corporation (San Diego). The 30-mer pre-amplification primers and variable-length genotyping primers generated by the Assay Design 3.1 software were procured and used for the validation of SNPs according to the Sequenom user manual.

Statistical Analyses
The SSR profiles were scored based on the size (bp) of fragments amplified across all 375 varieties. Weak gel bands of negligible intensity and smeared bands were excluded from the final data analysis. For SNPs, the MassARRAY Typer 3.4 software was used for visualization of SNPs and allele calling. The major allele frequency, gene diversity, heterozygosity and PIC for each locus were calculated for both SSR and SNP markers using PowerMarker 3.5 [20]. In addition, genetic distances [21] across the genotypes and a neighbor-joining (NJ) tree were calculated using PowerMarker 3.5 [20]. The dissimilarity matrix generated by PowerMarker was used to construct an unweighted neighbour-joining tree using DARwin software 5.0.158 [22].
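The per-locus statistics named above can be illustrated with a minimal Python sketch. It uses textbook definitions (gene diversity as 1 - sum of squared allele frequencies, and the commonly used Botstein-type PIC formula); the estimators actually implemented in the software used for the study may differ, for example through sample-size corrections, so this is only an illustration under those assumptions. All function names and the toy genotypes are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes.

    genotypes: list of (allele1, allele2) tuples; None marks a missing call.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue
        counts.update(genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def major_allele_frequency(freqs):
    """Frequency of the most common allele."""
    return max(freqs.values())

def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2) (no sample-size correction)."""
    return 1.0 - sum(p * p for p in freqs.values())

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(
        2.0 * p[i] ** 2 * p[j] ** 2
        for i in range(len(p))
        for j in range(i + 1, len(p))
    )
    return 1.0 - sum(x * x for x in p) - cross

def observed_heterozygosity(genotypes):
    """Share of called genotypes that carry two different alleles."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for a, b in called if a != b) / len(called)

# Toy, invented genotypes for one bi-allelic locus (None = missing call).
toy_locus = [("A", "A"), ("A", "G"), ("G", "G"), None]
freqs = allele_frequencies(toy_locus)
print(major_allele_frequency(freqs), gene_diversity(freqs),
      pic(freqs), observed_heterozygosity(toy_locus))
```

Keeping allele frequencies as a plain dictionary lets the same functions serve both bi-allelic SNP loci and multi-allelic SSR loci.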
Principal Coordinate Analysis (PCoA) and Analysis of Molecular Variance (AMOVA) were performed using the software GenAlEx V6.5 [23]. For the SNP data, the bases were numerically coded as A=1, C=2, G=3 and T=4, and missing data were coded as 0, as suggested in the GenAlEx V6.5 user manual [23]. The software STRUCTURE V2.3.1 was applied to infer historical lineages that show clusters of similar genotypes [24]. The membership of each genotype was estimated for a range of genetic clusters from K=1 to K=20 with the admixture model and correlated allele frequencies. Each K was replicated 3 times. Each run was implemented with a burn-in period of 100,000 steps followed by 100,000 Markov chain Monte Carlo (MCMC) replicates [24]. Ln(PD) was derived for each K and then plotted to find the plateau of the ΔK values [25]. The online programme "Structure Harvester" (http://taylor0.biology.ucla.edu) was used to calculate the final population structure. The proportion of the genome of each individual that belongs to each inferred population (admixture) was also estimated.
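The ΔK step can be sketched in Python as follows. This is a hedged illustration of the Evanno-type statistic, computed here as the absolute second-order difference of the mean Ln(PD) across replicates divided by the standard deviation of Ln(PD) at that K; it is not the code of the online tool used in the study, and the function name and the toy Ln(PD) values are invented for the example.

```python
from statistics import mean, stdev

def delta_k(lnpd_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnpd_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    Delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    result = {}
    for k in sorted(lnpd_by_k):
        if k - 1 not in lnpd_by_k or k + 1 not in lnpd_by_k:
            continue  # the first and last K have no second-order difference
        sd = stdev(lnpd_by_k[k])
        if sd == 0:
            continue  # Delta K is undefined when replicates are identical
        second_diff = abs(
            mean(lnpd_by_k[k + 1])
            - 2 * mean(lnpd_by_k[k])
            + mean(lnpd_by_k[k - 1])
        )
        result[k] = second_diff / sd
    return result

# Invented Ln P(D) values, three replicates per K (not data from the study).
toy_lnpd = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -892.0, -888.0],
    4: [-885.0, -886.0, -884.0],
}
dk = delta_k(toy_lnpd)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because ΔK needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested (here K=1 and K=20).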

Results
The present study was conducted on 375 indica rice varieties, which included DUS-tested as well as released and notified varieties from eighteen major rice-growing states of India and varieties released and notified by the Central Varietal Release and Notification Committee (CVRC) of India. These 375 varieties include 5 landraces, 369 modern varieties and one hybrid variety (KRH-2), representing five regions of India where rice is grown as a major crop (Table S1). To compare the efficiency of SSR and SNP markers in assessing genetic diversity and population structure, an equal number of loci (thirty-six each) of SSR and SNP were used and compared at the statistical, genetic relatedness and population structure levels.

Statistical Comparison of HvSSR and SNP Markers
The temperature of amplification (Ta) for the 36 HvSSR primers ranged from 51.9°C to 61.3°C, and these temperatures were used for generating amplification profiles of the rice varieties. The number of alleles amplified per SSR primer varied from 2 to 4 (Table 1). The maximum number of alleles was amplified by primer HvSSR12-39 (4 alleles). A total of 80 alleles were amplified, with an average of 2.22 alleles per locus, in the 375 varieties. The PIC value for HvSSR primers ranged from 0.04 for HvSSR06-16 to 0.5 for HvSSR05-09, with an average of 0.25. Gene diversity ranged from 0.05 to 0.58 with an average of 0.3. Heterozygosity was zero for five loci (HvSSR05-30, HvSSR06-16, HvSSR08-14, HvSSR09-26 and HvSSR10-03). Maximum heterozygosity was observed at locus HvSSR09-55 (0.73), and the average heterozygosity across all 36 loci was 0.12. The major allele frequency across the 36 markers ranged from 0.49 to 0.97 with an average of 0.78 (Figure 1a, Table 1).
Unlinked SNP markers located on the short arm, centromeric region and long arm of all 12 rice chromosomes were developed and used for diversity analysis. The chromosome number, primer ID and physical position on the rice chromosome are given in Table 2. Because SNPs are bi-allelic, a total of 72 alleles were amplified, with an average of 2 alleles per locus, in all tested rice varieties, and the major allele frequency ranged from 0.56 to 0.98 with an average of 0.80 (Figure 1b, Table 2). Alleles were scored as wild, heterozygous and alternate across all 375 rice varieties, and the percentage of each was calculated for all 36 loci (Table 2). The highest proportion of heterozygous calls (58.6%) was found with marker 11-1849, located on chromosome 11. The PIC value was highest for primers 08-4218-5_C_129 and 1011927_C_178 (0.37) and lowest for primer 01-3916-1_C_156 (0.03), with an average PIC value of 0.23. Gene diversity ranged from 0.03 to 0.49 with an average of 0.28. Heterozygosity across the 36 loci ranged from 0.02 to 0.64 with a mean value of 0.19.

Comparison of HvSSR and SNP Markers in Genetic Relatedness Study
All the HvSSR amplicons generated across the 375 varieties were assessed for genetic distance, and the dissimilarity matrix was used for cluster development using the neighbour-joining (NJ) method. In the unrooted tree (Figure 2a) and phylogram (figure not shown), the rice genotypes were grouped into three major clusters. Cluster1 was further sub-grouped into cluster1a with 66 varieties, cluster1b with 13 varieties and cluster1c with 14 varieties. Similarly, cluster2 was sub-grouped into four clusters, cluster2a, cluster2b, cluster2c and cluster2d, with 37, 29, 56 and 12 varieties, respectively. Cluster3 was the largest, containing 52 varieties in cluster3a, 87 varieties in cluster3b and 9 varieties in cluster3c. Varieties from the different regions, labelled with different colours, were represented in all the major clusters and even in the sub-clusters. Further, the traits of some of these varieties, as recorded in the passport data, were examined against their cluster membership to find any trait-based grouping. Interestingly, varieties with resistance to blast disease were grouped in cluster1c and cluster2d (Table S2).
Similarly, all 36 SNP loci were scored across all the tested rice varieties. Genetic distance was calculated and an NJ tree was constructed using the dissimilarity matrix. An unrooted tree (Figure 2b) as well as a phylogram (figure not shown) was generated to find genetic relationships among the rice varieties. All varieties were grouped into three major clusters. Cluster2, the largest cluster, was further subdivided into three sub-clusters, 2a, 2b and 2c (Figure 2b). Cluster2a contained 127 varieties, cluster2b 76 varieties and cluster2c 43 varieties, whereas cluster1 and cluster3 contained 75 and 54 varieties, respectively. The grouping pattern of the rice varieties using SNP markers was similar to that obtained with SSRs. However, 24 of the 127 varieties in cluster2a were from Kerala, which could be a notable observation regarding geographical isolation. No significant trait-based grouping was observed on the basis of the information available as passport data.
Analysis of Molecular Variance (AMOVA) with HvSSR showed that only 1% of the diversity existed among regions, whereas 4% existed at the population level. Most of the diversity was observed among individuals (70%) and within individuals (25%) (Table 3, Figure 3a). In the case of SNP markers, no diversity existed among regions, whereas 1% existed at the population level (Table 3, Figure 3b); most of the diversity was partitioned among individuals (67%) and within individuals (32%).
Principal Coordinate Analysis (PCoA) with SSR markers showed that large diversity existed in Indian rice varieties. The varieties were uniformly distributed across the two axes (Figure 4a). The first three axes explained 13.33% of cumulative variation (Table 4). In the PCoA, all varieties were labelled with different colours according to their region (Figure 4a); the intermixing of colours across the coordinates further supports the unrooted tree in showing no location-specific grouping of the samples. In the case of SNP markers, by contrast, the varieties fell into three broad groups across the first two axes (Figure 4b). The first three axes for SNP explained 45.20% of cumulative variation (Table 4). Here too, the intermixing of colours across the coordinates supports the SNP-based unrooted tree in showing no location-specific grouping.
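How a "percentage of variation explained" per axis arises in PCoA can be shown with a minimal NumPy sketch of classical principal coordinate analysis (double-centering the squared distance matrix, then eigendecomposition). It is a generic textbook version under the assumption of a symmetric distance matrix; the options used in the software for this study (for example distance type or standardization) are not reproduced here, and the function name and toy distances are hypothetical.

```python
import numpy as np

def pcoa(dist, n_axes=3):
    """Classical PCoA on a symmetric distance matrix.

    Returns (coordinates, percent_explained) for the first n_axes axes.
    Percentages are relative to the sum of non-negative eigenvalues.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Gower double-centering of the squared distances.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1]  # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    positive = np.clip(eigval, 0.0, None)  # drop negative eigenvalues
    coords = eigvec[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = 100.0 * positive[:n_axes] / positive.sum()
    return coords, explained

# Toy example: four invented samples lying on a line at positions 0, 1, 2, 3.
positions = [0.0, 1.0, 2.0, 3.0]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
coords, explained = pcoa(toy_dist, n_axes=2)
print(coords[:, 0], explained)
```

In this toy case all variation lies on one axis; with real marker data the variation is spread over many axes, which is why the first three axes can account for very different shares in different marker systems.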
In the present study, 29 autumn rice varieties, also known as 'Aus' in West Bengal, were compared with the indica subpopulation. Based on AMOVA with SSR markers, 1% variation (Table S3a, Figure S1a) with an Fst value of 0.009 (Table S3b) was found between the two subpopulations, but no such population difference was observed with SNP markers (Tables S4a and S4b, Figure S1b).

Table 2. List of SNP primers used for genotyping of 382 rice accessions along with base call, gene diversity, heterozygosity and PIC.

Comparison of HvSSR and SNP Markers in Population Structure Study
To study the population structure, the model-based programme STRUCTURE was used to determine genetic relationships among individual rice varieties. This model assumes that there are K populations and that the loci are independent and at Hardy-Weinberg equilibrium. In the case of SSR, K=1 to K=20 populations were tested. Ln(PD) kept increasing with increasing population number, but there was no clear population structure; therefore the Ln(PD)-derived Δk was plotted against K to determine the number of populations using the online software "Structure Harvester". The maximum Δk was found at K=5 (Figure 5a), and this was taken as the number of populations for the 375 Indian rice varieties. Population1 contained 79 varieties, population2 63, population3 93, population4 57 and population5 83. Further, using the structure analysis (Figure 6a), varieties in each population were categorised as pure or admixture: varieties with a membership score of more than 0.80 were considered pure and those with less than 0.80 admixture. Population1 contained 39 pure and 40 admixture varieties, population2 29 pure and 34 admixture, population3 42 pure and 51 admixture, population4 19 pure and 38 admixture, and population5 31 pure and 52 admixture; in total, 160 pure and 215 admixture varieties were identified (Figure 6a).
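The pure/admixture categorisation described above amounts to a threshold on each variety's largest membership coefficient. A minimal Python sketch is given below; the variety names and membership values are invented, and treating a score of exactly 0.80 as admixture is an assumption, since the text only specifies "more than" and "less than" 0.80.

```python
def classify_variety(q_values, threshold=0.80):
    """Return (1-based population index, label) for one variety.

    q_values: membership coefficients across the K inferred populations.
    A variety is 'pure' if its largest coefficient exceeds the threshold,
    otherwise 'admixture' (a score equal to the threshold counts as admixture).
    """
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    label = "pure" if q_values[best] > threshold else "admixture"
    return best + 1, label

def summarise(q_matrix, threshold=0.80):
    """Count pure and admixture varieties per assigned population."""
    counts = {}
    for q_values in q_matrix.values():
        pop, label = classify_variety(q_values, threshold)
        counts.setdefault(pop, {"pure": 0, "admixture": 0})[label] += 1
    return counts

# Invented membership coefficients for three hypothetical varieties (K = 3).
toy_q = {
    "variety_A": [0.92, 0.03, 0.05],
    "variety_B": [0.45, 0.40, 0.15],
    "variety_C": [0.10, 0.85, 0.05],
}
summary = summarise(toy_q)
print(summary)
```

Each variety is assigned to the population with its highest coefficient, so the pure and admixture counts within a population always add up to that population's size.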
Similarly, in the case of SNP markers, Δk values were plotted against K to determine the number of populations, and the maximum Δk was found at K=15 (Figure 5b). Further, K=15 was closer to the actual number of Indian states considered for the present study. Pure and admixture varieties in the populations were decided as described above for SSR markers. At K=15, 66 pure and 309 admixture varieties were present (Figure 6b). The observation of only 66 pure individuals at K=15 is further supported by the pedigree data, because only a few varieties were developed using local material whereas the majority of modern varieties were bred using genetic material introduced from different locations (Table S1). The grouping of the Aus and indica subpopulations was also analysed with both SSR and SNP markers, but no specific grouping of Aus apart from the indica subpopulation was observed (Figures 6a and 6b).

Figure 2. NJ tree constructed for (a) SSR and (b) SNP data based on regions south (red), north-east (sky blue), east (blue), north (green), west (pink) and CVR (orange).
doi: 10.1371/journal.pone.0084136.g002

Discussion
The molecular basis of polymorphism and its distribution across the genome are quite different for SNP and SSR markers. Hence, the utility of SSR or SNP markers in crop improvement will depend on the quality of information they provide with respect to parameters of genetic diversity and population structure. This is the first such study in which the SSR and SNP marker systems were assessed for their efficiency in estimating genetic diversity and population structure in a large collection of Indian rice varieties. The study is also important because India is the centre of origin for rice, and varieties released for cultivation in the five major regions are expected to be diverse because they are released by different states according to their suitability to the respective agro-climatic conditions. A comprehensive analysis of SSR and SNP markers in such a diverse set of varieties may assist rice breeders in deciding which of the two is appropriate for rice germplasm characterization and for designing breeding strategies.
The SSR vs SNP comparison in the Indian rice collection for statistical parameters, such as number of alleles amplified per primer (2.2/2.0 alleles per primer), gene diversity (0.30/0.28), heterozygosity (0.12/0.19) and PIC value (0.25/0.23), revealed that both marker systems generated almost similar information when an equal number of loci (36 each) were studied. Earlier reports suggest that more SNP loci should be studied when comparing against the diversity revealed by SSRs [26]. Further, comparison of our findings with previous studies in rice using both SSR and SNP markers showed mixed results; in a number of cases, low numbers of alleles were reported for SSR markers. The low number of alleles recorded in this study may be mainly due to the limited resolution of the Metaphor agarose used for separation of amplified products. Various studies in rice using agarose/PAGE gels have reported lower numbers of alleles [27][28][29]. Another reason for the reduced number of alleles may be the exclusion of monomorphic and spurious bands from the analysis. The low average PIC value (0.25) observed with SSR markers indicates lower genetic diversity among the rice varieties considered in our study. High average PIC values for SSRs have mainly been reported with diverse germplasm lines, because these are genetically more diverse than varieties. PIC depends upon many factors, such as the breeding behaviour of the species, the genetic diversity in the collection, the size of the collection, the sensitivity of the genotyping method and the genomic location of the primers used. Low PIC values with SSR markers were also found in other rice collection studies [29,30]. For the SNP markers, PIC values ranged from 0.03 to 0.37 with an average of 0.23. PIC values in a similar range were reported for SNP markers in a collection of rice varieties by Chen et al. [31]. Moreover, due to the bi-allelic nature of SNPs, their PIC values can range from 0 to 0.5.
For SSR markers, which are multi-allelic, the PIC value can exceed 0.5 and can go up to 1.0. Therefore, in our study, SNP markers with a mean PIC value of 0.23 were more informative, relative to their attainable maximum, than SSR markers with a mean value of just 0.25. Furthermore, the efficiency of both markers was compared in assessing genetic relatedness among the varieties on the basis of their clustering pattern in the unrooted tree. Although the unrooted trees constructed using either SSR or SNP markers grouped the varieties into three major clusters, the numbers of varieties in the clusters differed between the two trees. These findings were not surprising, as the broad pattern of grouping is expected to be more or less similar irrespective of the type of marker used for a genetic relatedness study. Similar findings were also reported in a European rice collection by Courtois et al. [17]. Traits available in passport data were regarding geographical isolation. Lack of geographical isolation observed in the present study was mainly due to the frequent introduction of genetic material from different locations for the development of modern rice varieties. AMOVA also indicated that at the regional level only 1% of the diversity exists, whereas at the population level 4% exists with HvSSR and 1% with SNP; with both markers, most of the diversity is partitioned among individuals (70% for SSR and 67% for SNP) and within individuals (25% for SSR and 32% for SNP). The AMOVA shows that HvSSR markers explained the partitioning of variation better than SNP markers. Similar partitioning of variation at the population and sub-population levels has been reported in rice [32]. The PCoA plot analysis of rice varieties using SSR vs SNP generated interesting results.
Though the broad pattern of distribution of varieties in the PCoA plots was similar with both markers, a closer look revealed three major clusters of rice varieties with SNP markers; such grouping was not found with SSR markers. A similar observation has been reported in wheat with SNP markers [33]. The proportion of variance explained by the first three coordinates was higher for SNP (45.20%) than for SSR (13.33%), which is in accordance with findings in maize [34] and wheat [33]. Overall, at the genetic relatedness level, SSR markers were more informative than SNP markers. Therefore, it may be concluded on the basis of this study that HvSSR markers were more effective for genetic diversity analysis. SSR markers have been a powerful tool in genetic diversity analyses because they are neutral, multi-allelic and co-dominant in nature (Lapitan et al. 2007) [27].
Based on SSR and SNP marker studies, indica rice has been subdivided into two subpopulations, Aus and indica [3,35,36]. In the present study, SSR markers supported this grouping, but with SNP markers no such clear distinction between the two subpopulations was observed. There are three seasons for growing rice in India, viz. autumn, winter and summer. Autumn rice is known as 'Aus' in West Bengal, 'Ahu' in Assam, 'Beali' in Orissa, 'Bhadai' in Bihar, 'Virippu' in Kerala and 'Kuruvai/kar/Sornavari' in Tamil Nadu. The pedigree of modern varieties also shows that autumn rice has frequently been used as one of the parents in the development of modern varieties; for example, Ratna, Manoharsali and Annada (Table S1) have been used for the development of Shanti, Himalaya-2, Nagarjuna, Chandan, Kapilee and Luit. This may be another reason for the weak distinction between the two subpopulations. A similar trend was observed in the population structure analysis, where Aus did not show distinct grouping from the indica subpopulation (Figures 6a and 6b).
At the population level, no clear population structure for the rice varieties was observed with either SSR or SNP markers, which may be due to large genetic variation or frequent intermixing of rice varieties in crossing programmes across the regions. Therefore, an ad hoc number of populations was determined using Δk, and different numbers of populations were found with SSR (K=5) and SNP (K=15) markers. The genetic structure of populations has been previously reported in rice [37][38][39][40][41]. The number of populations observed in these studies ranged from K=3 to K=7 (Jin et al. 2010). The higher value (K=7) is primarily due to the higher number and more diverse set of germplasm [41].
The present study generated a population structure with five clusters using SSR and fifteen clusters using SNP markers. The fifteen population clusters and the large number of admixture varieties found with SNP indicate that population structure can be better explained with SNP markers, because a large number of diverse parents have been used to create variation in the released varieties of Indian rice. Since admixture reflects diverse parents, which themselves have diverse ancestry in breeding history and domestication, it may be the main reason for the variation present in the population [42]. Because the SNP-based population structure subdivided the rice varieties into more clusters than the SSR-based one, the SNP marker system appears able to delineate population structure at a fine level in crops.
In conclusion, although SSRs are multi-allelic and SNPs bi-allelic, and the two have different distribution patterns over the genome, in the present study an equal number of loci (thirty-six each) amplified almost equal numbers of alleles for SSR (80) and SNP (72) markers. However, the unique features of SNP markers, such as their abundance in the genome, their ability to reveal polymorphism due to variation at the single-base level and their development from conserved single-copy rice genes [19], enabled these markers to present a different diversity spectrum as well as a different population structure in Indian rice varieties compared with SSR markers. At the population structure level, SNP markers resolved a larger number of populations, whereas at the diversity level SSR markers grouped the samples better, even at the trait level. For this reason, SNP markers should preferably be used for determination of population structure in crops. Moreover, SNP markers are mostly derived from genes; as a result, genetic diversity assessed using these markers reveals functional variation and may potentially be exploited for marker-trait association studies. Additionally, the SNP markers in the present study were derived from genomic regions of rice having synteny with the wheat genome and therefore may be equally useful for assessing genetic diversity, population structure and other marker-based studies in wheat.

Table S1. List of rice varieties genotyped using SSR and SNP markers.