Population Structure, Diversity and Trait Association Analysis in Rice (Oryza sativa L.) Germplasm for Early Seedling Vigor (ESV) Using Trait Linked SSR Markers

Early seedling vigor (ESV) is the essential trait for direct seeded rice to dominate and smother the weed growth. In this regard, 629 rice genotypes were studied for their morphological and physiological responses in the field under direct seeded aerobic situation on 14th, 28th and 56th days after sowing (DAS). It was determined that the early observations taken on 14th and 28th DAS were reliable estimators to study ESV as compared to56th DAS. Further, 96 were selected from 629 genotypes by principal component (PCA) and discriminate function analyses. The selected genotypes were subjected to decipher the pattern of genetic diversity in terms of both phenotypic and genotypic by using ESV QTL linked simple sequence repeat (SSR) markers. To assess the genetic structure, model and distance based approaches were used. Genotyping of 96 rice lines using 39 polymorphic SSRs produced a total of 128 alleles with the phenotypic information content (PIC) value of 0.24. The model based population structure approach grouped the accession into two distinct populations, whereas unrooted tree grouped the genotypes into three clusters. Both model based and structure based approach had clearly distinguished the early vigor genotypes from non-early vigor genotypes. Association analysis revealed that 16 and 10 SSRs showed significant association with ESV traits by general linear model (GLM) and mixed linear model (MLM) approaches respectively. Marker alleles on chromosome 2 were associated with shoot dry weight on 28 DAS, vigor index on 14 and 28 DAS. Improvement in the rate of seedling growth will be useful for identifying rice genotypes acquiescent to direct seeded conditions through marker-assisted selection.


Introduction
Rice (Oryza sativa L.) is widely cultivated under irrigated and rainfed conditions, with cultivated varieties and landraces that specifically adapt to these situations. In India, more than 50% of rice areas under rainfed conditions are cultivated as direct seeded rice (DSR). The existing rainfed upland varieties are of short duration type, drought tolerant and low yielding and possess early vigor trait. Direct seeding is also a common practice in low land rice of India and weed is the major problem in those situations. Early seedling vigor (ESV) trait of rice will help in competing with weed and this trait is needed in lowland rice. While, irrigated rice varieties are high yielding and relatively poorer in early vigor as compared to rainfed varieties. In rice growing areas of South and South East Asia, ESV trait has been exploited in rainfed upland cultivar development but not to the same extent in rainfed low land and irrigated situations, mainly due to the narrow genetic base of selecting attributes primarily for higher grain yields. Attempts were made to introgress ESV from upland landraces to high yielding cultivars; however, significant progress could not be achieved due to complex nature of the trait [1]. Further, only a small fraction of available landraces has been used in practical breeding. Traditional native landraces are widely grown under rainfed lowland conditions by resource poor farmers in India due to lack of suitable improved varieties for such harsh conditions. Most of these landraces are less productive, but they possess weed-smothering ability by early vigor and excellent growth adaptation features to direct seeded conditions. Due to unpredictable rainfall patterns, direct seeded rice has become regular practice under rainfed situation. Conversely, due to labor scarcity and unavailability of timely irrigation water, the farmers in the irrigated ecosystems are also adapting to dry direct seeding techniques. However, currently available modern rice varietal architecture with semi-dwarf stature and reduced seedling vigor is not amenable to dry direct seeded conditions [1]. It is clear that, ESV is relatively restricted to upland varieties as compared to transplanted rice varieties. Therefore, for DSR ESV is an important trait to achieve higher biomass, smothering weed ability with higher grain yields [2].
For better understanding of ESV trait, it is important to know the genetic relationship among a large set of cultivated genotypes by using DNA markers efficiently. Among the different DNA markers, simple sequence repeats (SSR) are most commonly used, as they are hyper variable, co-dominant, robust, multi-allelic in nature, chromosome specific and greatly facilitated linkage map construction and use in plant breeding. SSR markers are widely employed to assess genetic diversity and genetic structure in rice and several other crops. However, limited information is available on classification of ESV in rice cultivars using DNA markers. Molecular markers-based genetic maps facilitate applied genetics and breeding program [3].
ESV and its component traits being complex quantitative nature, their involvement with several physiological processes and influences of G x E interactions makes these traits to achieve slower progress for genetic enhancement. Therefore, integration of marker tools into breeding schemes provides an opportunity to handle such challenges and facilitates markerassisted selection. Continuous variation could be observed for ESV within a segregating population and frequency of favorable alleles at many loci could also be achieved by selection. QTL mapping based on segregating population helps to identify loci for quantitative and qualitative traits with limited number of segregating alleles at any locus and low resolution typically in the range of 10-30cM [4]. To overcome such situations, association mapping based on the linkage disequilibrium is popularly adapted to achieve a higher resolution and targets multiple alleles at individual loci by exploiting historical recombination events available in germplasm and by identifying association between marker and phenotype of larger number of traits. Till date, very few researches on association studies in rice for ESV have been done. Therefore, the present study examined to explore; i) the characteristic differences for ESV of 629 eco-geographical rice genotypes, ii) genetic diversity of the selected genotypes using molecular markers, and iii) genomic regions associated with ESV related traits.

Experiment 1
The total of 629 rice (Oryza sativa. L) genotypes (Fig 1) were representative of breeding lines and landraces, comprising of 25 tropical japonica, 57indica landraces, 127 breeding lines, and 427 ARC (Assam Rice Collection) accessions cited in S1 Table. The seeds were obtained from Gene bank of National Rice Research Institute (NRRI), Cuttack, India. The experiment was conducted at NRRI during 2013 dry season (experiment 1) in upland situation (20°.48'N, 85°.86'E). The experimental site soil had 34.4% sand, 29.1% silt, 32.4% clay with bulk density of 1.46 Mg/m 3 . Seeds of each of the genotype were direct sown in 4 m long row with 15 cm apart and 20 cm between rows in augumented block design. Pre-emergence herbicide Pendimethalin was sprayed at the recommended rate within 24 hr of irrigation and maintained weed-free through the growing period by hand weeding. Crop was raised with recommended fertilizers doses of 80 kg nitrogen ha -1 , 40 kg phosphorus ha -1 , and 40 kg potash ha -1 . The field was maintained under non-saturated aerobic condition with supplemental surface irrigation.
In order to study the variability among genotypes for ESV, several vegetative traits were observed. Fourteen days after sowing (DAS), six rice plants in the middle row from each genotype were selected to study the shoot length (cm), number of leaves, root length (cm) and shoot dry weight (DW) (g). The same set of traits was observed on 28 and 56 DAS with number of tillers per plant and leaves per tiller. Growth rate of seedlings were measured to assess ESV by vigor index (VI) as suggested by Maguire [5].

VI ¼
Rate of germination ð%Þ Seedling dry weight ðmgÞ While, germination rate (GR) was calculated as Where, X n is the number of germinated seeds on the n th day and Y n is the number of days from the first experimental day. Absolute growth rate (AGR) was calculated as Where, h 1 and h 2 are the plant height at times t 1 and t 2 respectively. Relative growth rate (RGR) was determined by measuring plant dry weight periodically during growth and is represented as mg g -1 day -1 .
Where, W 1 and W 2 are the plant dry weights at times t 1 and t 2 respectively. Crop growth rate (CGR) measured the plant dry weight of a particular ground area at a regular interval of time divided by land area and represented as g m -2 day -1 .
Where w 1 and w 2 are the plant dry weights at times t 1 and t 2 respectively, P = spacing (m 2 ). An initial descriptive statistics, including mean, standard deviation, minimum and maximum values, coefficient of variation (%), skewness, kurtosis, and distribution pattern was performed by box plot technique. PCA was performed to estimate Euclidean distance between two genotypes in the multivariate space and adaptation of genotypes to specific environment [6][7]. These analyses were performed using the Windostat 7.5 software (Indostat Services, Hyderabad, India.). Discriminate function analysis was applied to find the best variable(s) that can discriminate high and low seedling vigor genotypes. For this, the genotypes were divided arbitrarily into three groups based on the shoot length and dry weight (Tables 1 and 2).
Based on the broad trend observed in the PCA and phenotypic data. Forward stepwise discriminate function analysis (DFA) was carried out to understand the combination of variables, which could best explain the grouping in STATISTICA ver.10 (Stat soft, Inc, Tulsa, ok, USA). In DFA, a larger eigen value indicates that the discriminant function is more useful in distinguishing between the groups. Ninety-six genotypes were selected representing high, medium and low vigor genotypes with almost equal proportions from each group were selected for the panel from the results of the experiment 1. Groups were defined by following the scale presented in Table 1. The selected genotypes were raised in earthen pots (20 cm height and 15 cm diameter) containing upland soil and farmyard manure in 3:1 ratio in three replications during 2014 dry season. Three plants were maintained per pot. The same set of 96 genotypes was raised on upland situation in augumented design with similar agronomic practices in four-meter row length as followed in experiment 1. Five plants from each genotype were selected for measurement of shoot length (cm), root length (cm), leaf length (cm), leaf width (cm), number of leaves, shoot dry weight (g), root dry weight (g), and VI on 14 th and 28 th DAS. By using these variables, RGR, AGR and CGR were derived to measure growth rate of seedlings as mentioned in experiment 1. PCA based clustering was performed to find out the pattern of phenotypic diversity and variables contributions to diversity using the Windostat 7.5 software. DNA isolation and selection of SSR markers. The experimental material comprised of selected 96 genotypes that were direct sown in the pot and grown in net house at NRRI, Cuttack. After 20 days of sowing, young leaves were harvested and genomic DNA was isolated from the bulked leaf sample following CTAB method [8]. The quantity was estimated by spectrophotometer and agarose gel electrophoresis using lambda DNA of known concentration. The isolated DNA samples were diluted in T 10 E 1 buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) to obtain the final concentration of 15 ng/μl for amplification.
A priori information was used to select ESV linked 52 SSR markers from the reports of earlier mapping population studies and the distribution of markers covered all the chromosomes to illustrate the diversity [1].
PCR amplification and electrophoresis. The PCR amplification was executed with 20 μl of PCR mixture contained 30 ng of genomic DNA, 1x PCR buffer, 200 μM dNTP mix, 4 picomoles of each forward and reverse primers, 2 mM of MgCl 2, and 1U of Taq DNA polymerase (MBI, Fermentas, USA). Template DNA initially denatured at 94°C for 5 min followed by 36 cycles of denaturation at 94°C for 1 min, annealing at 55-67°C (varies upon primers) for 1 min and extension at 72°C for 5 min followed by final extension at 72°C for 10 min. The amplification products were separated on 2.5% agarose gel containing ethidium bromide using 1x TBE buffer. DNA fragments were visualized under UV transillumination using Alpha Innotech gel documentation system (Flour Chem TM 5500, Alpha Innotech, USA).
Data analysis. Amplicons were scored according to their product size for each genotype and primer combination. Number of alleles (N), major allele frequency (A), observed heterozygosity (H o ), expected heterozygosity (He), and PIC, for each SSR locus were determined by using Power marker 3.25 [9]. A PIC value of each marker was determined as suggested by Botstein et al. [10].
Further, the allelic data were subjected to estimation of genetic distances among genotypes using simple matching coefficients by bootstrapping 10,000 times and they were clustered using neighbor joining method [11]. PCoA was performed to highlight the resolving power of the ordination and the first two components were used to represent the genotypes in the graphical form. PCoA and dissimilarity matrix were performed by using DARwin software version 5.0 [12]. Further, Analysis of molecular variance (AMOVA) was performed to describe variance components among individuals and the population differentiation among the seven assumed sub populations using GeneAlEx 6.41 program [13] with 1000 permutations. Genetic differentiation among the assumed sub population was analyzed using Nei's gene diversity statistics [14] using POPGENE program version 1.31 [15]. To identify the genetic structure of given population and assign individuals to populations, the software STRUCTURE version 2.3.3 [16] were used. To derive the optimal number of groups (K), STRUCTURE was run with K varying from 1 to 10, with five runs for each K value. To determine the true value of K, ad hoc statistic ΔK was followed [17]. Parameters were set to 1,00,000 burn-in periods and 10,000 Markov Chain Monte Carlo (MCMC) replications after burn-in with an admixture and allele frequencies correlated model. Association analysis. To find the genetic relatedness between phenotypic performances of the rice accessions and SSR makers, two statistical models were used; i) General linear model (GLM) and ii) Mixed linear model (MLM) in TASSEL version 5.0 [18]. Significantly associated markers with traits were identified on the traits of their R 2 and p-value.

Experiment 1
Variation in ESV related morphological traits on 14, 28and 56 days after sowing. In the present investigation, distribution pattern of ESV traits of 629 genotypes were presented in S1A-S1H Fig . The higher level of difference could provide an opportunity to select genotypes with ESV. Inter-quartile range (IQR) is a measure of variability. Among all the traits studied, lower IQR was observed for DW on 14 DAS and RGR between 28 and 56 DAS followed by VI on 14 DAS. Germination rate and number of leaves/tiller on 56 DAS have significant number of outliers. Among the traits studied on 14 DAS, shoot length varied from 9.4to 26.6cm as measured in 'Sharvesh'(9.40 cm) and 'ARC 10797' (26.58 cm)respectively with mean of 16.75 cm and 'ARC 10797' exhibited 1.58 times higher than the mean value. Coefficients of variation (CV) indicating variability among the genotypes were high for VI on 56 DAS (57.93%) followed by CGR between 28 th and 56 th (52.14%), DW on 56 DAS (51.44%) and VI on 28 DAS (42.23%). Higher CV suggests that, the selected material in this experiment exhibited higher variability.
Grouping pattern of genotypes and association between variables. Principal component analysis (PCA) was performed to identify association between traits, responsible traits for ESV and grouping pattern of rice genotypes on the basis of the traits. The first two PCs accounted for about 71.47% of the total variability with >1.0 eigen value. The biplot in Fig 2 shows a strong relationship between the DW and VI (r = >0.900), dry weight and shoot length (r = >0.400) and VI and CGR (r = >0.800) irrespective of different days of observation (14, 28 and 56 DAS). Cultivar-by-trait biplot indicated that the traits were grouped into two, based on days of observation. Traits observed on 14 and 28 DAS, were grouped together and separated from traits observed on 56 DAS (Fig 2). Genotypes with early vigor and growth on 14 and 28 DAS were grouped together in quadrant 2, whereas genotypes with high biomass on 56 DAS were grouped in quadrant 3. The discriminate function is statistically significant with high canonical correlation (r = 0.8398). Further, it discriminated high and low seedling vigor genotypes and found that shoot length on 28 DAS showed the lowest partial lambda, the highest F-remove value and standardized coefficient, followed by shoot weight on 28 DAS. Traits and grouping pattern of selected genotypes. The selected 96 genotypes were raised under net house and field condition during dry season. PCA was executed for all the 96 genotypes under both situations to identify trends among low and high early seedling vigor genotypes and the traits responsible for source of the variability (Fig 3A and 3B). The PC 1 accounted for about 60.71% and PC 2 accounted for 21.22% of the total variance (totaling 81.93%) under net house condition. From Fig 3A, it is inferred that the longest vector loading such as AGR, leaf length and shoot length on 28 DAS are the major discriminators. However, under field condition RGR, leaf length and shoot length on 14 DAS are the major discriminators ( Fig 3B). In PCA under field condition, first and second component explained 75.08% and 15.88% (totaling 90.96%) of the variance respectively. The PCA under both conditions, groups the genotypes into two groups early vigor genotypes on right side and non-early vigor genotypes on left side.
SSR marker segregation. In the present experiment, 96 rice genotypes include 85 landraces and 11 modern/breeding lines representing six major rice growing states of India. Initially the experiment was begun with 52 SSR markers to assess the genetic relatedness. Among them, 39 markers generated polymorphic bands ( Table 3). The number of alleles amplified per marker varied from 2 to 7 with an average of 3.28 per locus in 96 genotypes. PIC values ranged from 0.04 (RM3839) to 0.37 (RM148) with a mean of 0.24. Observed heterozygosity (H o ) ranged from 0.04(RM3839) to 0.97 (RM148) with an average heterozygosity across all 39 loci was 0.42. None of the loci exhibited heterozygosity as zero. The expected heterozygosity (gene diversity) (H e ) ranged from 0.04 to 0.50 with an average of 0.30. Major allele frequency was also calculated for all 39 loci, which ranged from 0.52 to 0.98 with an average of 0.79 (Table 4).
Genetic relatedness by cluster and principal coordinate analysis (PCoA). Cluster analysis was carried out to assess genetic distance and the dissimilarity matrix-using neighbor joining method. In the Unrooted tree genotypes were grouped into three major clusters (Fig 4). Cluster 1 has 59 genotypes, they sub grouped into cluster 1a and cluster 1b with 43 and 16 genotypes respectively. Similarly, cluster 2 was sub grouped into cluster 2a that contains 18 genotypes and cluster 2b with 10 genotypes. Cluster 3 was the smallest containing 9 genotypes. On the basis of geographical location, no significant grouping was observed. However, sub group 1b and 2b and cluster 3 having genotypes of landraces alone. Further, sub group 1a and 2a clustered genotypes of both improved and landraces. The genotypes of tropical japonica were grouped into two sub groups as 2a and 2b.
PCoA with SSR markers were used to determine the genetic relatedness among the genotypes (Fig 5). The first three axis of differentiation explained 19.98% of the total variation. In PCoA, genotypes were presented in colors corresponding to the clusters observed in unrooted tree. Similar to clusters analysis, PCoA also did not group the genotypes based on the locationspecific and intermixing of colors across the coordinates were observed. However, improved genotypes were inclined to stay on the quadrant I and IV; tropical japonica AC 38892 and AC 38556 were on the opposite quadrant, indicating their dissimilarity.
Analysis of molecular variation (AMOVA) and Nei's genetic distance. The assumed populations were subjected to AMOVA. AMOVA revealed that maximum variation was among individuals with greater variance (99%), while sub-populations were grouped together, the variance was only 1% and the F st (0.016) was non-significant. The inbreeding co-efficient F IS (-0.419) and F IT (-0.396) were found to be 1.000. It indicated that individuals from different populations are genetically more closely related than within population. Nei's genetic distance was estimated between assumed seven populations. The pairwise Nei's genetic distance ranged from 0.0024 (lowest) between population of pop 6 and pop 7 to 0.0316 (highest) between population of pop 3 and pop 4.   Population structure. STRUCTURE software was used to determine relationship among the genotypes studied and distinguished 96 genotypes into two populations with ΔK value of 31 (Fig 6). In population 1, 46 genotypes and rest of the 50 genotypes were grouped into population 2. Two tropical japonica, 11 improved lines, 26 ARC lines and seven landraces were grouped into one population with proportion of 47.91%. While, population 2 has the membership proportion of 52.08% consisted of 44 ARC lines, five landraces and one improved genotype IR72.By structure analysis, pure or admixture genotypes were categorized. The genotype with score >0.80 was considered as pure and <0.80 as admixture. Among the 96 genotypes, 27 genotypes were pure and 19 were admix in population 1, while 34 were pure and 16 were admix in population 2. In total, 61 were pure and 35 were identified as admixture and the same can be visualized in graphical representation (Fig 7). The fixation index (F st ) values of two population ranged between 0.0218 (population 1) and 0.3397 (population 2), while allele frequency divergence between two population was 0.057. Association of SSRs with early seedling vigor traits. The results of association analysis using GLM and MLM (Q+K) are presented in the Table 5. The structured association mapping revealed 16 and 10 marker-trait association for 12 and 11 ESV related parameters were observed by GLM and MLM approach respectively. GLM analysis of marker-trait association among the population revealed markers RM13 and RM334 with leaf length observed on 14 DAS located on chromosome 1 and 5 respectively. By accounting population structure and relative kinship effects (Q+K model), RM13 marker associated leaf length on 14 DAS with 7.3% of phenotypic variability. GLM and MLM analysis together they detected RM223 strongly associated with leaf length on 28 DAS with 10.3% and 8.4% phenotypic variability respectively. There were two common markers associated with leaf width on 14 DAS distributed over chromosome 8 and 7, of which RM230 on chromosome 8 had described 5.05% of the phenotypic variability. For root length on 14 DAS, two QTLs were detected among the population by both analyses, and RM3839 (chromosome 4) was associated with root length followed by RM230 (chromosome 8). RM249 and RM6091 on chromosome 4 and 8 had described 4.8% and 3.8% of the phenotypic variability respectively for root length on 28 DAS by GLM analysis. Followed by MLM, RM249 exhibited association with root length on 28 DAS with 5.04% of phenotypic Genetic Diversity and Marker Trait Association Analysis in Rice for ESV variability. Three SSR markers (RM13, RM3348 and RM16) related to shoot length on 14 DAS were detected by GLM. Among them RM13 and RM16 accounting 6.05% and 5.14% phenotypic variability for shoot length detected by MLM. None of the markers were found to be associated with shoot length on 28 DAS and dry shoot weight on 14 DAS by GLM and MLM analyses. Marker RM341 was found to be associated with vigor index QTLs as detected by GLM and MLM with >4.5% phenotypic variability.

Discussion
Genetic variability is the key determinant for any breeding program. The sporadic reports on germplasm evaluation of landraces and genetics of ESV traits in rice encouraged us to take up this study. Traditional native landraces those grown under rainfed are competitive to weeds. By   introduction of modern high yielding with low ESV semi dwarf varieties, weed competitive landraces are almost extinct and disappearing in farmers' field. Therefore, the present experiment was executed to identify the suitable genotypes with high ESV and to find the loci responsible for those traits of the gene by association mapping. We began with a panel of 629 rice accessions (S1 Table). Landraces included in this study were from seven states of India viz., Assam, Odisha, Uttar Pradesh, Karnataka, Tamil Nadu, West Bengal and Bihar. The early seedling vigor trait distribution pattern of the panel of genotype studied on 14 th , 28 th and 56 th days after sowing (DAS), exhibited that germination rate, number of leaves on 14 DAS, DW on 56 DAS, VI on 56 DAS and CGR distribution was asymmetrical. VI and DW on 56 DAS and CGR between 28 and 56 DAS exhibited maximum variability and portrayed that the population was heterogeneous.

Spatial distribution of genotypes and variation of ESV traits
In this study, association between dry weight and shoot length or vigor index suggests that accumulation of dry weight is an important trait contributing for ESV. Cultivar-by-trait biplot analysis grouped the vectors of the variables of 14 and 28 DAS into a separate group (highly correlated). Similarly, variables of 56 DAS had formed another group (highly correlated). The reason might have been due to higher differences in the trait variability observed before 28 and 56 DAS. Further, it clarifies that the observation taken on 14 and 28 DAS was the reliable estimator to study ESV in rice than observed on 56 DAS. The position and perpendicular projection of genotypic points onto variable vectors grouped early vigor genotypes on 14 and 28 DAS in quadrant 2 and these genotypes were moderately vigorous on 56 DAS than genotypes captured in quadrant 1 and 4. Genotypes captured in quadrant 3, exhibited vigorous growth on 56 DAS with moderate early vigor than genotypes grouped in quadrant 1 and 4. The PCA analysis of selected 96 genotypes under net house and field condition classified early vigor genotypes on right of the biplot and non-early vigorous genotypes on left side of the plot. This is evident from the genotypes like 'Sabita' (a known early vigor genotype) [32] positioned on right side with other early vigor genotypes viz., Varshadhan, Vandana, AC4387 and Pyari. In contrary, genotypes KB46, AC 38399 and ARC 10656 moved on the left side of the plot, where they exhibited reduced vigor index and dry weight.

Diversity estimation, genetic relationships and grouping of genotypes
The genetic architecture of diverse rice accessions can be assessed by model based and distance based clustering approach using the SSRs or SNPs marker system [33][34][35][36][37]. This is the first such study where ESV trait linked SSR markers were utilized to assess genetic diversity and population structure in 96 germplasm lines of Indian rice collections. We are confident enough that, comprehensive analysis of trait linked SSR markers in such a diverse collection will be helpful for breeders to design breeding strategies for direct seeded situation with ESV.
The 39 SSRs generated a total of 128 alleles with an average of 3.28 alleles per locus (Table 4) [35]. Several factors were reported to improve average PIC values of SSRs such as inclusion of diverse lines, size of collection, breeding behaviors of the species, genotypic method and location of markers in the genome [35]. Lower PIC value observed in the present study might be due to inclusion of large number of ARC accessions. One unique and nine rare alleles were observed in the present study. Unique allele can be considered for diagnostic of a particular genotype and increase in number of rare alleles contributes for overall genetic diversity. The proportion low or high frequency allele also contributes to overall genetic diversity. In our case of study, high frequency allele has 58.7% contribution over low frequency allele.
Genetic relatedness among the accessions was assessed on the basis of their clustering pattern in the unrooted tree. The unrooted tree grouped varieties into three major clusters with varying number of genotypes per cluster. The clustering pattern of genotypes obtained in the present study was found to be trait based grouping and no geographical isolation clustering has been observed. Inclusion of large number of ARC collections might be the reason for lack of geographical isolation. Accessions of subgroup 1b possessing 11 ESV traits measured on 14 and 28 DAS were grouped together with higher crop growth rate (CGR). The genotypes were grouped together under the subgroup1b were tall and possessed high vigor index (VI) as preferred for ESV. While, cluster 3 positioned opposite to subgroup 1b grouped the genotypes of least vigor type and exhibited inferiority for 11 ESV traits including vigor index. Singh et al. [35] also reported similar grouping pattern in Indian rice collections for biotic stress. The grouping pattern of the PCoA corresponded well with the pattern of Neighbor Joining Tree. Genotypes of subgroup 1b (deep blue color) tends to stay in quadrant 3, whereas, corresponding genotypes of cluster 3 (purple color) moved apart to the left extreme of PCoA plot. However, few exceptions were observed in position of genotypes compared to cluster analysis due to the complexity of the variations could not explained in two dimensional graphical representation, as the first two PCs explained the total variation less than 80% [40].
Source of variation, population structure and marker-trait association AMOVA study indicated higher partitioning of variation within individuals and lower portion existed among populations. Estimates of the fixation indices revealed that F st indicates little divergence existing between subgroups of the population. F IT is the overall inbreeding coefficient of an individual within the total population. Negative value of F IT indicates higher proportion of heterozygote genotypes on the level of sub populations and all evaluated individuals as well. Negative F IS value also suggests that there is an excess of heterozygotes in these loci (out breeding) compared with Hardy Weinberg Equilibrium (HWE) expectations and reflects the genetic variability of general population. The reason for heterozygotes might be inclusion of landraces in the present study and these genotypes might have heterozygous allele for the studied trait.
The model based approach by structure analysis revealed two subpopulations. The genetic structure of populations has been previously reported in rice to vary from 2 to 8 subpopulations [33,37,[41][42][43][44]. Courtois et al. [33] and Das et al. [42] have detected two and four subgroups in their study population of 425 and 91 accessions of rice landraces respectively. Threshold level to identify accessions belonging to a specific subpopulation varies among different scientific groups of 60-80% [33,45,46]. In the present study, 35 landrace accessions were categorized as admixtures with the stringent threshold of 80%. Based on criterion of maximum membership probabilities, 96 accessions were grouped into two populations of 46 and 50 genotypes each. This finding is in accordance with the previous studies where they obtained two distinct subgroups in their core collection of rice [33,47]. In the current rice collection panel, the use of trait linked SSR markers might also be the reason for the formation of less number of groups. Besides, grouping of the genotypes into two were noticed based on the traits of ESV observed on 14 th and 28 th DAS. Between the two sub-populations, second sub-population exhibited higher shoot length, root length, number of leaves, leaf length, dry weight and VI on both DAS than the sub-population-1. However, genotypes of admixtures were observed to have superior in early vigor trait than pure lines. Since, this study includes large set of landraces, which might have diverse ancestry still possessing heterozygosity for the trait studied in the present experiment.
The application of association mapping for ESV will be helpful in deciding the molecular markers to be used in stacking the multiple QTLs responsible for this trait. The significant SSR markers of GLM explained on an average of 5.0% and MLM demonstrated 5.2% of phenotypic variation. Maximum 10.3% and 8.4% phenotypic variation was explained by RM223 for leaf length on 28 DAS by GLM and MLM respectively. Similarly, Q-Q plot exhibits the significant association markers for leaf length on 28 DAS (Fig 8). A QTL for leaf length on 14 DAS detected in the present study on chromosome 1 (RM13) was observed in both GLM and MLM models.
SSR genomic region RM230 and RM125 mapped on chromosome 8 and 7 were significantly associated with QTL controlling leaf width variation on 14 DAS. Increasing leaf width by utilizing favorable allele at this locus would likely to be useful for increasing photosynthetic area of seedling and weed smothering ability. On the other hand, the same marker (RM230) allele was significantly associated with root length on 14 DAS. Previously, Zhang et al. [19] reported this marker on chromosome 8 as contributing to root length andRM3839 was also observed to be significantly associated with root length on 14 DAS by both GLM and MLM approach. QTLs qRL-1 and qRL-2 of root length were reported on chromosome 1 and 2 respectively from RIL population of Xiushui 79/C-Bao [48]. The root dry weight QTL was detected on chromosome 5 (by linked marker allele RM249) on 14 DAS coincided with QTL of root length on 28 DAS. For root dry weight on 28 DAS, significant association of RM250 was found by both GLM and MLM on chromosome 2. Further, a QTL qRDW10-1was identified for root dry weight on chromosome of 10 with LOD score of 5.6 from RIL population of Zhenshan 97/Ming hui63 [21]. Two significant QTLs (chromosome 1 and 3) were identified for shoot length on 14 DAS. Marker alleles RM13 on chromosome 1 and RM16 on chromosome 3 were positively associated with shoot length on 14 DAS. Moreover, previous report of Zhang et al. [19] supports the association of genomic region RM13 with QTLs for shoot length. Hence, selection of these alleles can be expected to result in genetic improvement for this trait to obtain vigorous growth in the early stage of seedling under direct seeded condition. Cairns et al. [28] observed a QTL qSL-2for shoot length on chromosome 2 from back cross population of Vandana/Moroberekan. In our study, allele RM224 located on chromosome 11 was significantly associated with shoot dry weight on 28 DAS and another tightly linked marker allele (RM341) showed a stronger association with the same trait and detected by both GLM and MLM on chromosome 2. However, Zhou et al. [27] and Cui et al. [21] observed QTL qFV-5-1 and qDW5 for shoot dry weight on chromosome 5 from Lemont/Teqing and IR64/Azucena mapping population respectively. Interestingly, RM341 coincided with QTLs for vigor index on chromosome 2 at 14 and 28 DAS and showed a stronger association detected by both GLM and MLM. Conversely Diwan et al. [23] observed QTL qVI on chromosome 5 with LOD score of 4.4. Therefore, marker-assisted selection in favor of RM341 could increase the shoot length and vigor index and might also contribute favorably to grain yield and biomass under direct seeded conditions.

Conclusion
Screening large number of accessions in the present study for early seedling vigor brings out valuable information. We conclude from this study that the vigor index should be studied on 14 th and 28 th DAS. The genotypes utilized for studying diversity pattern by employing traitlinked markers have categorized early vigor and non-early vigor genotypes. The results of distance based approach and principal coordinate analysis were in accordance with model based structure analysis. Further, the study has identified significantly associated markers and markers with pleiotropic effects indicating that association mapping served as an effective tool. The putative QTLs, identified in this study, need to be validated before they could be applied for marker assisted breeding in rice. Genotypes identified from this study possessing favorable alleles can be utilized for improving early seedling vigor after functionally characterizing them.
Supporting Information S1 Fig. (a-h). Distribution and differences for shoot length, number of leaves, root length, DW, VI, G. rate, AGR, CGR and RGR under direct seeded situation during 14, 28 and 56 DAS among 629 rice genotypes. Acronyms used: DW-shoot dry weight, VI-vigor index, G. rate-Germination rate, AGR-Absolute growth rate, CGR-Crop growth rate, RGR-Relative growth rate and DAS-days after sowing. (TIF) S1 Table. The list of the rice genotypes used in the study.