Identification of QTLs for high grain yield and component traits in new plant types of rice

A panel of 60 genotypes comprising New Plant Types (NPTs) along with indica, tropical and temperate japonica genotypes was phenotypically evaluated for four seasons in irrigated situation for grain yield per se and component traits. Twenty NPT genotypes were found promising with an average grain yield varying from 5.45 to 8.8 t/ha. A total of 85 SSR markers were used in the study to identify QTLs associated with grain yield per se and related traits. Sixty-six (77.65%) markers were found to be polymorphic. The PIC values varied from 0.516 to 0.92 with an average of 0.704. A moderate level of genetic diversity (0.39) was detected among genotypes. Variation to the tune of 8% within genotypes, 68% among the genotypes within the population and 24% among the populations were observed (AMOVA). This information may help in identification of potential parents for development of transgressive segregants with very high yield. The association analysis using GLM and MLM models led to the identification of 30 and 10 SSR markers associated with 70 and 16 QTLs, respectively. Thirty novel QTLs linked with 16 SSRs were identified to be associated with eleven traits, namely tiller number (qTL-6.1, qTL-11.1, qTL-4.1), panicle length (qPL-1.1, qPL-5.1, qPL-7.1, qPL-8.1), flag leaf length (qFLL-8.1, qFLL-9.1), flag leaf width (qFLW-6.2, qFLW-5.1, qFLW-8.1, qFLW-7.1), total no. of grains (qTG-2.2, qTG-a7.1), thousand-grain weight (qTGW-a1.1, qTGW-a9.2, qTGW-5.1, qTGW-8.1), fertile grains (qFG-7.1), seed length-breadth ratio (qSlb-3.1), plant height (qPHT-6.1, qPHT-9.1), days to 50% flowering (qFD-1.1) and grain yield per se (qYLD-5.1, qYLD-6.1a, qYLD-11.1).Some of the SSRs were co-localized with more than two traits. The highest co-localization was identified with RM5709 linked to nine traits, followed by RM297 with five traits. Similarly, RM5575, RM204, RM168, RM112, RM26499 and RM22899 were also recorded to be co-localized with more than one trait and could be rated as important for marker-assisted backcross breeding programs, for pyramiding of these QTLs for important yield traits, to produce new-generation rice for prospective increment in yield potentiality and breaking yield ceiling.

Introduction Rice (Oryza sativa L.) is a staple food crop sustaining more than 3.5 billion people in the globe. In current scenario, rice productivity is increasing at a rate of 1% per year which is less than the required rate of 2.4% per year to double the global production by 2050 [1]. Considering a glimpse of the history, a quantum jump in productivity was achieved due to the green revolution in mid-sixties, which drastically enhanced the rice production of the world. However, a ceiling of productivity potential is reported by and large in semi-dwarf inbred indica genotypes since release of IR-8 [2], in spite of substantial improvement in yield stability, per day productivity and grain quality [3]. A breakthrough in productivity barrier is necessitated because of increasing competition for natural resources such as, land, water and others given population explosion coupled with expanding industrialization, urbanization and diversion of agricultural land [1,4,5]. This is further aggravated with the abnormal change in weather and climate with significant influence on crop productivity and quality [6,7].
Rice scientists are facing many challenges for doubling rice production by 2050. Irrigated rice has a share of 75% of total rice production in the world, although it has a share of about 55% of the total rice area [2]. Therefore, improvements and modification of rice genotypes for this ecology are supposed to have significant impact on rice productivity in future. During the past decade, there has been a significant slowdown in the production potential of modern rice cultivars. In this context, physiologists and breeders hypothesised that this stagnation could be overcome by improving the plant type. The existing plant type bears several unproductive tillers in high tillering type and limited sink size i.e., small panicles. The excessive leaf area causes mutual shading, low light and a reduction in canopy photosynthesis [8,9]. Apart from that, there are several bottlenecks viz., spikelet sterility, short panicle length, limited grain numbers, lodging susceptibility, etc. Moreover, there is also the loss of genetic diversity in improved varieties for which breeders are facing difficulties in finding divergent gene pools. Modern high yielding rice varieties have been associated with some unfavourable traits/alleles, which may be sensitive to biotic and abiotic stresses and may be responsible for lowering grain yield [9].
In this context, IRRI scientists developed "New Plant Type" (NPT-2 nd generation) by recombining some suitable features of tropical japonicas with indica [9]. The main idea behind the 2 nd generation NPT was development of high yielding super rice varieties, which could be able to produce high yield lines endowed with stability. Some of the NPTs performed exceedingly well and produced even more than 10t ha -1 in the Philippines [2]. During the process of development, some of advanced generation segregating materials were shared with NRRI, India. The materials were subjected to further selection at NRRI under irrigated ecology, as appreciable variability was still available, with an objective of developing promising NPTs suitable for the climate specific to eastern region in particular and country in general. Trait specific selections were done for few generations to establish fixed lines, i.e., NPT selections (NPTs). In this context, NPTs were evaluated systematically under observational yield trial (OYT) for one season and the number was narrowed down subsequently. Advanced Yield Trial (AYT) followed it for four wet seasons at NRRI. Some of the highly promising NPTs were identified with high-quality agronomic traits like higher grain number per panicle, panicle length, panicle weight, grain size, ear bearing tiller number along with ideal plant height. Some of NPTs performed exceptionally well and showed the productivity of more than 10.0 t ha -1 during dry season 2011 [10].
With this backdrop, we wish to proceed for development of still higher yielding genotypes or super rice kind of crop ideotype utilizing the existing set of highly promising NPTs, which should have the productivity potential, at least 20% higher than the popular rice and check varieties. The target was utilization of one of the most promising gene pools through conventional as well as molecular approach. The focus is to accumulate the thousands of minor QTLs with additive genetic variance along with major ones. Here, the extent of genetic variation and relationships between genotypes are more important for designing effective breeding strategy [11].
The association mapping (AM) is an useful tools in identifying QTLs/genes associated with different traits in plant species. It utilizes natural variation [12], and is supposed to have great potential to evaluate and characterize a wide range of alleles. Several researchers have reported the utility of association analysis in the identification of QTLs for different traits in rice, viz., grain yield [13], grain yield under water deficit [14], grain quality traits [15], agronomic traits [16,17], grain yield under reproductive phase drought stress [18], panicle architecture and spikelet's/ panicle [19], plant height and grain yield [20].
However, there are meagre reports on QTLs association for grain yield and yield-related complex traits particularly on NPT. The genes/ QTLs related to high grain yield would be of great help in breaking yield ceiling. Moreover, it would be beneficial in identifying traits specific donors for designing effective breeding strategy for the development of super rice. The present study was undertaken to identify QTLs associated with grain yield and yield-related agronomic traits using diverse genotypes.

Plant materials
A panel comprising sixty rice genotypes, including 48 NPTs, six highly popular released indica varieties, three temperate japonica and three tropical japonica, most of them significantly diverse and distinct, were used for identification of QTLs for 11 yield-related traits through association studies (S1 Table). The NPTs were from 41 NPT populations collected from IRRI at the advance segregating stage. From those populations, conscious trait specific single plant selections (SPS) were made basically for yield-related traits for few generations and finallỹ 500 promising fixed SPS were identified and evaluated in OYT at ICAR-National Rice Research Institute (NRRI) (coordinates 20.4539˚N, 85.9349˚E). Subsequently, the number was drastically narrowed down to 48 strictly based on yield and important agronomic traits. Forty-eight best performing NPTs were evaluated under four environments along with 12 checks (6 indica, 3 tropical japonica and 3 temperate japonica) and these were further studied for molecular diversity and QTL association.

Phenotyping
All the 60 genotypes were grown in two replications, each genotype covering 5.04 m2 area (800 m2 total plot size), following Randomized Complete Block Design (RCBD) during wet seasons of 2011, 2012, 2013 and 2014 (S2 Table). The phenotypic data of yield per se and yieldrelated traits were recorded at different phenological stages. Normal management practices and plant protection measures were taken during crop growth. The genotypes were harvested at maturity, i.e., after 30 to 35 days of flowering. The post-harvest data were recorded after the crops were harvested, threshed and dried. This study primarily focused on 11 yield and yieldrelated traits, namely, days to 50% flowering (DFF), plant height (PH), tiller number (TL), panicle length (PL), flag leaf length (FLL), flag leaf width (FLW), no of fertile grains (FG), total no of spikelets per panicle (TG), 1000-grain weight (TGW), seed length-breadth ratio (SLBR) and grain yield t/ha (YLD). The yield per se was measured by weighing the plot yield (4m 2 each) at 13% moisture level and converted it to tons/ha. Other yield contributing traits were measured using standard procedure. The seed length-breadth ratio was measured using Anndarpan machine and software developed by CDAC, Govt. of India [21]. The phenotypic data were used for statistical analyses viz. SD, CV, ANOVA, correlation, regression, and principal component analysis (PCA), Bi-plots using XLSTAT software version 2019.1 (Addinsoft, Paris, France). The ClustVis, an online web tool (http://biit.cs.ut.ee/clustvis/) was used for analysis of phenotypic traits. The visualizing clustering of multivariate data of yield per se and yield-related traits were analyzed by Heat-map and PCA [22]. The ClustVis has been written in the Shiny web application framework by using R package version 0.10.2.1 for R statistics software [15,23].

Genotyping
The genomic DNA was isolated from 3-4 g of fresh leaf tissues of each rice genotype following Cetyl Trimethyl Ammonium Bromide (CTAB) method (Murray and Thompson, 1980). The extracted genomic DNA samples were dissolved in TE buffer (10 mM Tris-base, 1 mM EDTA, pH-8.0). The quality and quantity of DNA of each sample were measured by agarose gel electrophoresis and spectrophotometer. The SSR markers were selected on the basis of the previous report associated with different yield QTLs [24][25][26][27][28][29][30] and polymorphic contents (http:// www.gramene.org). The polymerase chain reaction (PCR) was performed in a 20μl reaction mixture containing 5 pM (pico-mole) of forward and reverse primers of each SSR locus, 200 mM of each dNTP, 0.5 U of Taq DNA polymerase, 10 mM Tris-HCl (pH = 8.3), 50 mM KCl and 1.5 mM MgCl 2 . The PCR amplification was carried out in a thermal cycler (Veriti 96, Applied Biosystems, USA) as per the following cycling parameters: initial denaturation at 94 0 C for 3 min, followed by 35 cycles of denaturation at 94 0 C for 1 min, annealing at 55−67 0 C (depending upon primer) for 1 min and extension at 72 0 C for 1.5 min and final extension at 72˚C for 5 min. The amplified products were separated on 2.5% -3% agarose gels using 1X TBE buffer and stained with ethidium bromide (0.5 μg/μl). The gels were observed under UV radiation and were photographed using a gel documentation system (G-Box, Syngene, USA) to detect amplified fragments. The size of amplified bands was determined based on the migration relative to molecular weight size markers (50 bp DNA ladder, MBI Fermentas, Lithuania).

Genetic diversity
The amplified bands were scored as present (1) or absent (0) for each genotype and micro-satellite marker combination. Each band was considered as an allele. The data were entered into a binary matrix as discrete variables and subsequently used for assessing allelic and molecular diversity such as number of total alleles (TA), unique alleles (UA), rare alleles (RA), expected alleles (Ne), polymorphism information content (PIC), gene diversity, homozygosity (Ho) and heterozygosity (He) by using Power-Marker Ver 3.25 [31]. The polymorphism information content (PIC) was calculated using the formula, "PICi ¼ 1 À P n i¼0 P 2 ij " where Pij is the frequency of j th allele for the i th marker and summation extend over k alleles [32].
The genotypic data of 66 polymorphic markers were utilized for genetic diversity analysis. Jaccard's similarity coefficient was calculated by using the NTSYS-PC software package [33,34]. Cluster analysis was performed using UPGMA and sequential agglomerative hierarchal nested (SAHN) module of NTSYS-PC. The Nei's pairwise genetic distance neighbourjoining [35] and Shannon's diversity index (I) was calculated using POPGENE v 1.32 (http:// www.ualberta.ca/fyeh) and MEGA 6 software. The Power-Marker was used for better visualization and understanding the clustering pattern of genotypes. The estimation of population differentiation among and within the genotypes was analyzed by Principal coordinates analysis (PCoA) and AMOVA by using software GeneAlEx 6 version 6.501 [36]. AMOVA was used to assess molecular variance within and between populations at 999 permutations.

Structure analysis
Bayesian model-based clustering analysis available in STRUCTURE software 2.3.4 was used for data analysis to obtain possible population structure [37,38]. The software provides the likelihood, classifies according to their population types, and assumes as K. The highest likelihood was interpreted by the corresponding estimate of the basic number of clusters [38]. Each genotype was burned 10,000 and 150,000 steps, followed by 100,000 and 150,000, respectively, using Monte Carlo Markov Chain replicates (MCMC). The K-value was run for 10 times with a K-value ranging from 1 to 10. The optimum K-value was determined by plotting the log posterior probability data to the given K-value. The ΔK value was estimated using the parameters described by Evanno et al. (2005) [39] using online software program Structure Harvester v6.0 (http://btismysore.in/strplot). In structure, the value of K is not constant because the distribution of L (K) does not show a clear cutoff point for the true K. An ad hoc measure of ΔK was used to detect the numbers of the subgroups. Some independent replicates, the admixture model and allele frequency model (length of burn-in + length of an MCMC repetitions x, number of independent replicates) were also calculated [13,38,40].

QTLs association
The GLM, MLM, Quantile-Quantile (Q-Q) plot and Manhattan plot were used for association analysis of 11 yield related traits, by incorporating Q+K matrices using TASSEL version 5.2.9 [41]. The p-values at <0.005 level of significance were used to determine the significant association of SSR markers. In GLM and MLM, association analysis was performed at 1000 permutations for the correction of multiple testing [42,43]. The False Discovery Rate (FDR) was calculated using SPSS statistical v20. (http://www-01.ibm.com/support/docview.wss?uid= swg21476447) at the 5% threshold level for multiple testing to standardise p-value [44]. The false-positive markers-traits association was controlled by applying models Q, K, and Q+K that were compared with each other using quantile-quantile (Q-Q) plot [45].

In-silico study
The in-silico study was carried out for analysis of previously reported QTLs and genes associated with respective traits using computer and web-based servers. For this study, several webservers were used i.e. http://www.gramene.org/, https://www.ncbi.nlm.nih.gov/ and https:// rapdb.dna.affrc.go.jp/ etc. This study helps in searching association between QTLs and genes in rice population.

Phenotypic variation
The grain yield in rice is considered to be one of the most important traits in crop improvement which is influenced by several complex traits. The present study focused on ten yieldrelated traits that directly or indirectly control the grain yield. The set of 60 genotypes was phenotypically evaluated for grain yield and associated traits under irrigated situation for four consecutive wet seasons. A wide range of phenotypic variations was observed in all the grain yield and 10 yield-related traits (Table 1).
Grain yield varied from 1.19 t/ha (Curinga, 2013) to 9.89 t/ha (N-129, 2014) with mean yield varying from 1.82 t/ha to 8.8 t/ha. Similarly, a wide range of phenotypic variation was observed for all the traits as could be observed from the value of range and mean as depicted in Table 1. Other statistical parameters viz., CV, P value and correlation with grain yield were also calculated to show their importance along the validity. Some genotypes produced appreciably higher grain yield in respective seasons. Among them, N-129 produced highest grain yield in all the four seasons (9.12 t/ha in 2011; 9.89 t/ha in 2012; 6.59 t/ha in 2013 (productivity affected due to Cyclonic Storm Phailin) and 9.60 t/ha in 2014). It was followed by R-261 (8.  Table). The average grain yield of four seasons of these 20 genotypes varied from 5.45 to 8.8 t/ ha, while popular varieties such as IR64 and MTU1010 produced the maximum yield of 4.80 to 4.99 t/ha (Fig 1A, 1B and 1C, S2 Table).  The CV%, correlation and linear regression analysis of all traits were calculated at 5% level of significance (Table 1, S3 Table). Eight traits viz. DFF, PH, PL, TL, FLW, FG, TG and TGW were positively correlated with grain yield (mean of four-season data) (S3 Table). Similarly, linear regression showed a positive association of six yield contributing traits (PL, DFF, TL, FG, FLL, FLW) with grain yield while four traits (PH, TG, TGW, and SLBR) showed a negative association with grain yield (S4 Table, Fig 2A). The standardized coefficient plots showed that PH, SLBR and TG, were negatively associated grain yield, whereas DFF, TL, PL, FLL, FLW and FG were positively associated with grain yield (S4 Table, Fig 2A). The standardized plot also showed that PH, SLBR, TG and TGW were not directly involved in controlling grain yield. Their significance level might be influenced by environmental factors. The standardized coefficient plot was shown with positive bars for genotypes, which showed grain yield of more than 6.0 t/ha. Similarly, a negative bar of traits having less grain yield is not associated with the traits (S4 Table, Fig 2A).
The PCA Biplot analysis was carried out for focusing on dominant phenotypic traits ( Fig  2B). The first PC1 explained 45.67%, while PC2 explained an additional 12.18% of the phenotypic variance. The analysis indicated that the traits viz., TGW, SLBR, TL, FLL, FLW, PL and YLD were predominant for the genotypes situated in the green circle in quadrant I (S5 Table). Similarly, the genotypes belonging to the pink circle were having a preponderance of traits viz., DFF, TG, FG and PH. However, these traits were dominant in quadrant IV (lower right side) ( Fig 2B).
The circle represents genotypes are close to each other and have many similarities between them. The genotype belonging to the green and pink circles produced approximately 6-10 t/ha grain yield. The Red and Blue circles were represented by genotypes having less grain yield, i.e. 3-4 t/ha and it was distinguished from corresponding right-side circles. The present study reported a broad range of grain yield in different years, ( Table 1, Fig 1). The current study identified sixteen NPT genotypes, which performed better than standard check variety IR64 and MTU1010 (Fig 1). The three best NPT genotypes were identified and the highest grain yield was recorded in N-129 i.e. 9.12 t/ha (WS 2011), 9.59 t/ha (WS 2012), 6.59 t/ha (WS 2013) and 9.6 t/ha (WS 2014), respectively. N-8 showed second highest grain yield in four seasons i.e. 6.63 t/ha (2011), 6.88 t/ha (2012), 6.34 t/ha (2013) and 8.86 t/ha (2014). Similarly, thirdhighest grain yield was reported for R-255 i.e., 7.30 t/ha (2011), 6.72 t/ha (2012), 5.85 t/ha (2013) and 8.72 t/ha (2014) in four consecutive years. Therefore, these genotypes could be used as a donor for yield-related specific traits.
Heat map helps to understand the specific diversity and dominance pattern between genotypes and traits. Heat map showed that dominant traits were grouped into two significant clusters of genotypes based on certain similarities. Among the traits, the 1 st cluster included FG, TG, YLD, TL, FLW, and SLBR, whereas the 2 nd cluster comprised TGW, PH, FLL, DFF and PL, which were found very important for specific genotypes (Fig 3). In map red colour traits, i.e., FG, TG, PH and DFF in their particular clusters were showed dominance for respective genotypes. The trait, thousand grain weight (TGW) was dominant in most of the NPT genotypes which belonged to 2 nd cluster (Fig 3).

Genetic diversity
The Nie's pairwise genetics analysis by Neighbour-joining tree grouped all the 60 genotypes into three clusters/populations (POP1, POP2 and admixture) (Fig 4A). The first cluster included four sub-clusters containing Indica, Trop. Japonica, Tem. Japonica and one NPT genotypes. However, first and second clusters included all of the NPT genotypes, while some of NPTs were found as an admixture in the second cluster ( Fig 4A). The NPT genotypes have been derived from indica, tropical and temperate japonica. Hence, two types of populations were observed along with admixture cluster. Both the populations in the trees were observed to be distinctly different ( Fig 4A, Table 3, S6 Table).  The Principal Coordinate Analysis (PCoA) differentiated all the 60 genotypes and separated NPTs from indica and japonica genotypes (Fig 4B). Japonica and indica genotypes were grouped separately and slightly different from each other. However, many NPT genotypes were grouped into the separate cluster as per neighbour-joining cluster and structure analysis. The PCoA percentage of molecular variance explained by three axes was found to be 12.43%, 7.93%, and 7.45%. In PCoA, the 4 th quadrant group showed that some NPTs having an admixture of indica and japonica populations were intermixing (Fig 4B).
The UPGMA cluster analysis grouped all the 60 genotypes into two major clusters at 54% genetic similarity. The first major cluster (I) was sub-grouped into four sub-clusters, i.e., A, B, C, and D with similarity index varying from 0.54 to 0.92. These sub-clusters contained all the six indica, three tropical japonica, and three temperate japonica and one NPT, respectively ( Fig 5, Table 3). Second major cluster (II) contained 47 NPTs with similarity index varied from 0.56 to 0.91. Further, it was sub-grouped into four sub-clusters i.e., E, F, G, and H containing 5, 37, 4 and 1 genotypes, respectively. The sub-cluster D and H contained only one genotype each, i.e., N-110 and N-129, respectively (Fig 5, Table 3).

Analysis of molecular variance (AMOVA)
The two populations obtained through STRUCTURE analysis were used to know the genetic variation between and within clusters using AMOVA. The analysis indicated that there was 8% variation within individuals (genotypes), 68% among individuals within a population, whereas 24% among populations (Fig 6, Table 4). The F-statistics on all three groups was found to be highly significant (P<0.001). The overall Fst (Fst = 0.240) had significant (P<0.001) genetic variation among the three populations (Fig 6, Table 4).

Population structure
The true value of K was identified according to the maximum value of LnP (D) (Pritchard et al., 2000). Structure harvester of Evano table (http://taylor0.biology.ucla.edu) analysis showed that at K = 2, the ΔK = 179.57, where value was the highest in both independent burns [39]. The ΔK values were decreased from K = 2 to 10 in general but had a moderate value of 56.29 at K = 4. At K = 4, all the 60 genotypes were divided into four subpopulations, POP1, POP2, POP3, and POP4, which contained 6 indica, 3 temperate japonica, 3 tropical japonica, and 48 NPT genotypes, respectively (Fig 7). The populations POP2, POP3, and POP4 showed  admixture types. Both Pritchard's and Evonne's methods confirmed the K-value as 2. Furthermore, analysis of POP gene software showed that 60 genotypes were grouped into two major populations, followed by one admixture group (Fig 7, S1 Fig). The STRUCTURE analysis grouped the genotypes into two types of populations at K = 2, while at K = 3 and K = 4, 60 genotypes were grouped into three and four types of populations, respectively.
The phenotypic variance contributed by QTLs/SSRs were found to be 13.51% (RM20285) to 26 Fig 8). More than 25% phenotypic variance was explained      Twenty SSR markers were found to be significantly associated with more than one trait (Table 6). RM5709 was found to be associated with nine traits while RM275 was found to be associated with five traits. Similarly, RM5575, RM204 and RM1133 were found to be associated with four traits each, while RM154, RM168, RM20285, RM5711, RM447, RM22899 were found to be associated with three traits each. Nine SSR markers were found to be associated with two traits (Table 6).
Nineteen SSR markers were found to be associated with different traits in more than one season. Four SSRs i.e., RM5709, RM5575, RM20285 and RM234 were found to be associated with PL, PH, SLBR and YLD and common for 4 seasons. Among them, RM5709 is co-localized   (Tables 5 and 6). The association of traits with markers could be confirmed in the 2D plot and Q-Q Plot (Fig  8, S2 Fig, Table 5, Table 6). The QQ-plot showed a similar distribution of marker-trait association for 11 traits (S2 Fig). The GLM Manhattan plot shows 26 SSR markers associated with grain yield at p-value at 0.05 (S3 Fig). However, seven SSRs were associated with grain yieldrelated traits in MLM Manhattan plot at p-value 0.005 (S4 Fig). The lowest p-value 8.73E-04 was found with marker RM5709 for plant height trait, followed by 0.00158 (Thousand-grain weight) with RM263 and 0.00266 (Flag leaf width) with RM5709 (Table 5).

In-silico study for marker co-localization
The present study has used the computer and web-based servers' big data to confirm the association of co-localizing genes and QTLs linked with yield-related traits of rice. Twenty SSRs were identified that co-localized with grain yield and related traits ( Table 6). Using in-silico approach, it was found that 10 out of 20 co-localized SSR markers were in agreement with previous reports. These 10 co-localized SSRs viz., RM154, RM5709, RM5575, RM20285, RM204, RM3827, RM5711, RM447, RM22899 and RM171 were found to be very important because of their association with grain yield-related traits. RM5709 found to be highest co-localized on 4 th chromosome (associated with 9 yield-related traits viz., DFF, PH, PL, FLL, FLW, FG, TG, YLD and SLBR), followed by RM297 (associated with PL, FLL, FLW, TGW and YLD) and RM5575 (associated with PL, FLW, TGW and YLD) ( Table 6).

Phenotypic variance
Improving rice yield potential is one of the primary breeding objectives in many countries for several decades [5]. In 1960s and 1980s, the green revolution was initiated with the development of semi-dwarf High Yielding Varieties (HYVs) like IR 8 and IR 36 [2,9,[54][55][56][57]. The main objective of the green revolution was to fulfil and achieve self-sufficiency in food requirement, which helped the developing countries around the world especially in South Asia. It was realized in rice due to development of semi-dwarf, lodging resistant and fertilizer responsive high yielding varieties. It led to stability in rice production and mitigating the hunger of growing population. It was accomplished in mid-sixties with the development of miracle variety IR8. Since then, a stagnant in yield potential of semi-dwarf indica inbred varieties was noticed in indica inbreds, which needs to be cracked [54].
New Plant Types (NPTs) was a potential approach for breaking the yield ceiling. The initial effort on NPT was made by IRRI scientists to develop 2 nd generation NPT genotypes accumulating favorable alleles from tropical japonicas and popular indica for yield-related traits with multi-environment testing [5,54]. The main idea behind NPT development was to develop a plant type endowed with combination of unique traits that would help for efficient photosynthesis and biomass partitioning for very high grain yield in irrigated ecology. In this process, favorable tropical japonica genes were accumulated in indica background in second-generation NPT lines. IRRI scientists identified highly potential genotypes, i.e., IR 72967-12-2-3 which reportedly produced 10.16 ton/ha [9]. Our main target area for breaking yield ceiling was in eastern zone of India, which has many climatic constraints particularly low light due cloudy weather in wet season. In current study, the advance generation 2nd generation NPTs were phenotypically screened for high grain yield and associated traits in four seasons at NRRI, Cuttack, India. Phenotyping is the most crucial step for crop improvement. Identification of suitable transgressive segregants for the specific quantitative trait in any crops is a challenging task for the breeders. At the outset, a set of such elite materials of NPTs was chosen (advance generation segregating materials) as initial materials. Trait specific selection and evaluation of these materials subsequently led to identification 48 NPTs with variable grain yield, which were subjected to multi-environment testing. In this study, potential 20 NPTs were identified with an average yield in the range of 5.45 to 8.8 tons/ha. These genotypes could be utilised directly or as prospective parents based on yield per-se.
PCA Bi-plot analysis showed association with major yield-related traits (Fig 2B, S5 Table). The PC1 and PC 2 explained 45.67% and 12.18% of the total variance, respectively. Similar variability were also reported for PC1 and PC2 viz. 35.2% and 14.4%, respectively [58]. The distribution pattern of traits clearly differentiated the genotypes and relative importance of traits, which influenced the grain yield (YLD). The positive relation was observed among genotypes in the first quadrant, which showed the importance of traits viz., PL, FLL, FLW, TL, YLD, SLBR, and TGW particularly in NPTs. The genotypes were associated with respective traits in a 1 st quadrant, which could be responsible for better average grain yield. The first quadrant consisted of important traits and the genotypes endowed with those traits predominantly, hence could be selected as donor parents for specific traits in NPTs. Similarly, traits viz., TG, FG, DFF, and PH were predominant for the genotypes in quadrant IV (S5 Table), and it could be selected as a donor based on a number of fertile spikelets and effective tiller number.
The present study reported that dominant phenotypic traits such as PL, TL, FLL, FG and TG, had a positive correlation with yield. However, the more focused selection should be done for those traits (PH, FLW, TGW and SLBR) that are showing weak correlation with grain yield, because of environmental effects (Table 1, S3 Table). The best genotypes were assigned for grain yield based on phenotypes which are N-129, N-8, and R-255 (Fig 1A and 1B, S2 Table, S3 Table). The dominant specific traits and genotypes for high grain yield could be selected for designing effective breeding strategy. This would be helpful to the breeder for the proper choice of a parent/donors for bi-parental/multi-parental mating vis-à-vis during the process of selection in segregating generations. Therefore, present study reports phenotyping followed by different statistical analysis which suggests trait-specific genotypes for prospective parents in the hybridization programs for breeding super rice [22,58] The Heatmap shows a data matrix in the form of the colour pattern due to the numeric differences in multivariate data. In ClustVis, hierarchical clustering can be optionally applied on specific traits those were linked with particular genotypes and observations [22]. The Heatmap analysis showed the order of colour merging with the specific traits that are playing a vital role in the association for targeted trait. The different colouring patterns were linked with respective traits starting from deep green to red (Fig 3). Apart from green, the white and light colour traits indicated relatively poor association with yield-related traits viz., PH, DFF and SLBR traits. As the colour moves along the colour chart from green to red, the association with yield improves with grain yield that means the red colour especially was strongly associated. In this context, the order of association based on the colour intensity starts with FG, followed by TG, TGW, FLL, TL, PL and FLW. The respective genotypes have been depicted with their strong associated traits colours. The dense colour gives ideas for a strong character. So that breeder can easily choose donor parents that are actively associated with specific traits [22] (Fig 3).

Allelic and genetic diversity
The utility of SSR markers for population structure, diversity and association mapping depends on the quality of information they provide. The allelic and genetic diversity helps the breeder to understand genetic constitution of germplasm makeup and target donor selection for designing effective future breeding strategy. The 66 out of 85 SSR markers (77.64%) showed polymorphism, which amplified 154 alleles. Similarly, Anandan and his team reported the 39 polymorphic SSRs which amlified 128 alleles [59]. The average PIC value in this study was found to be 0.70, which was similar to previous reports [22,58,[60][61][62][63]. The lower rate of the average PIC was reported in association studies by several workers {0. 31 [64]; 0.47 [65,66]}. The PIC value showed a positive correlation with the total number of alleles (S6 Table, S7  Table). Similar findings were reported by previous researchers [42,60].

Population structure
The population structure analysis helps to understand and differentiate the types of population groups existing in a set of genotypes. The population structure based on Bayesian clustering model [15,66,67] has been most frequently used to correct spurious associations. The delta K value was measured by ad-hoc and based on the relative rate of change in likelihood LnP (D). The Delta k = 2 was set to get a higher likelihood optimal number of LnP (D) among groups.
The 60 genotypes were differentiated into four sub-populations. Similar sample sizes were used by several researchers in association analysis in rice [68,69] and alfalfa [70]. The UPGMA cluster analysis grouped 60 genotypes into two major groups at 54% of genetic similarity. The Nei's pairwise genetic distance showed three types of populations, i.e. POP1, POP2, and one admixture population. Similarly, at K = 2, STRUCTURE analysis could differentiate entire populations into two subpopulations (Fig 7). Genotypes in these populations along with high mean values could be utilized as potential parents for transgressive segregants with high yield and yield attributing traits towards breaking yield ceiling The mining through the Power Marker into the details of individual groups revealed that the first population (POP1) contained hardcore NPTs, which was distinctly different from indica (Ind), temperate japonica (Temp.) and tropical japonica (Trop.) (2 nd population, POP2, which also includes few NPTs along with Ind, Temp. and Trop.). However, all NPTs contain the genomes of indica as well as temperate and tropical japonicas. Moreover, the population has one admixture group, which lies in between the two classes comprising the characters of both populations. Therefore, a targeted hybridization between consciously selected parents of these two distant groups might result in transgressive segregants with super rice traits for future yield enhancement. At K = 4, the population was clustered into four groups viz., Ind (1st), Trop (2nd), Temp (3rd) and NPTs (4th) according to their genotypic and evolutionary significance. However, this study suggested that popular varieties clustered together according to their ecology, morphology and inter-varietal hybrid fertility of rice varieties in indica and japonica [58,71]. Here, almost all the NPTs were grouped separately, except one i.e., N-129. Moreover, the genotypes in the 4 th group comprised the genomes from indica, and japonica and supposed to have a relationship with the first three clusters. The population cluster 1, 2 and 3 were purer and divergent, but in the 4 th cluster, genotypes were intermediates of first three clusters. This could help breeders in devising necessary breeding strategy and choosing parents for yield improvement.

QTL association
QTL association has been widely used for the identification and mapping of QTLs for various traits such as tolerance to biotic and abiotic stresses, quality and grain yield in different crops [11,12,18,30,48,72]. This study also targets findings new QTLs, alleles, and genes [73] and validate the previously reported QTLs. The present association study was conducted on 60 diverse genotypes panel and 85 SSR markers The present association study focused on identification QTLs associated with yield and related traits in relatively small population and with limited markers [74]. Therefore, our study is analogous to previous reports with a small, focused group of genotypes and limited marker pairs combination [12,13,59,66,68,[74][75][76].
In association studies, both GLM and MLM models are used. Population stratification and cryptic relationships are two common reasons for the inflation of false-positive association [38]. GLM model has more false positive association as compared to MLM model analysis [19,41,70,72,77]. It does not consider to influence the population structure and kinship [70,78]. MLM model has higher accuracy and a smaller number of spurious marker-trait association with genotypes as compared to GLM model. This model is having a powerful algorithm, which systematically increases power, improves calibration and reduce computational cost to structured populations generally used for SNPs in GWAS [45,72]. The MLM model integrates structure and kinship matrix (Q+K) which supposedly corrects the false-positive error to the tune of 62.5%. Hence, MLM model has been popularly used in several cases for marker-traits association [12,18,43,45,71,[79][80][81][82][83].
In association mapping, mixed model (Q+K) showed a significant improvement in goodness of fit and reducing spurious associations. The K and Q matrix corrected the association between eleven phenotypic traits with markers [43,70] at permutation value is 1000 at p<0.005 for GLM and MLM for the level of significance. In association mapping, p-value plays an essential role because it controls over the level of false-positive association between traits and markers. It means that if the p-value is minimized, there is less chance of a false positive association of markers with respective traits and vice-versa [45]. The value of p is in agreement with previous reports [70][71][72]84]. However, some researchers reported their results at p<0.05-0.01 value, which is much higher compared to our study, where the number of markers is appreciably high [43,70].
Twenty SSRs were having association with more than one traits and have been reported in the gramene-database (http://www.gramene.org/) by earlier studies (Table 6) [27,28,48,87]. These were significantly associated with yield controlling complex traits viz., PH, DFF, SLBR, TL, PL, FLL, FLW, FG, TG, TGW and YLD and supposed to have played a significant role in yield enhancement (Table 5, Table 6, Fig 8). Similar reports by Zhang and team (2014) revealed the pleiotropic effect of gene LSCHL-4, in influencing increment of leaf chlorophyll, enlargement of flag leaf size, higher panicle branch and grains per panicles [30]. The previous report suggests that specific marker association with more than one trait might be either due to the linkage of genes or pleiotropic effects of a single locus [88][89][90]. However, variation in population structure, QTL detection methods and environmental conditions restrict our choices to compare the newly identified QTLs with the already reported QTLs. Therefore, we do have a need for further study on these potential trait-specific genotypes. It would lead to design effective breeding strategy for introgression of high grain yielding traits associated QTLs into popular rice varieties for obtaining super rice targeting to overcome yield ceiling.
The in-silico study of co-localization of SSR markers with respective traits would be helpful to the breeders to confirm the association of trait-specific SSRs. The present study has reported 10 SSR markers, which are associated with grain yield-related traits and found with gene IDs ( Table 6). Most of the SSRs were co-localized with more than two traits. The highest co-localization was identified in RM5709 linked with nine traits followed by RM297 co-localized with five traits. Similarly, RM5575, RM204 and RM22899 were also recorded to be co-localized with more than one trait and could be rated as important. This marker could be useful in marker-assisted backcross breeding program to produce next-generation super rice.

Conclusion
Breaking yield ceiling in rice warrants conscious selection of parents. Sufficient variability was available in NPT genotypes, because of the genetic distance of parents using tropical japonicas and indicas, leading to fixation of distinct lines. In the present study, microsatellite markers were used for association studies for grain yield and ten yield-related traits. Few highly-potential genotypes with high yield along with variation in morpho-physiological traits were identified after conducting the trial in four consecutive years. Wide variations were found in all the traits, which would be helpful for the identification of genotypes required for bi-parental/multi-parental crosses in developing super rice genotypes with higher grain yield. The STRUCTURE and tree diagrams were helpful in the classification of populations into distinct clusters vis-à-vis uniqueness among them and helped in the identification of a diverse gene pool for necessary parental selection for targeted transgressive segregants. The in-silico study reported twenty SSR markers; those were associated with more than one trait. Nineteen, SSR markers were found to be associated with different traits in more than one season. Four SSRs such as, RM5709, RM5575, RM20285 and RM234 were found to be associated with PL, PH, SLBR and YLD and common for four seasons. More than 25% phenotypic variance was explained by QTLs bracketing RM5709 for each of nine traits, DFF, PH, PL, FLL, FLW, FG, TG, SLBR and YLD. RM5709 would be useful for transfer above nine traits into popular rice varieties. The present study reported that 16 SSRs were linked with 11 yield traits and were found to be associated with 30 novel QTLs. Some of the QTLs are very important viz., TL (qTL-6.1), PL (qPL-1.1, qPL-5.1, qPL-7.1, qPL-8.1), FLL (qFLL-9.1), FLW (qFLW-5.1, qFLW-8.1), TGW (qTG-2.2, qTGW-5.1, qTGW-8.1), SLBR (qSlb-3) and YLD (qYLD-5.1, qYLD-6.1a, qYLD-7.1, qYLD-11.1) because of their association with important yield contributing traits or yield per se for multiple seasons. Hence, these could be pyramided in elite background for realization of higher yield and breaking yield ceiling. The study would be immensely helpful for selecting target donors with requisite traits for designing effective future breeding strategy for super rice.