Resequencing and variation identification of whole genome of the japonica rice variety "Longdao24" with high yield

Japonica rice is grown mainly in northern China, which accounts for more than half of the country's total japonica rice cultivation area. High yield, good grain quality and early heading date are the main breeding targets and commercial traits in this region. We performed re-sequencing and genome-wide variation analysis of a typical northern japonica rice variety, Longdao24, and its parents (Longdao5 and Jigeng83) using Illumina sequencing technology. A total of 53.17 G clean bases were generated, and more than 96.8% of the reads were mapped to the reference genome sequence. An overall average effective depth of 43.67× coverage was achieved. Using the genomic sequence of the japonica cultivar Nipponbare as the reference, we identified 420,475 SNPs, 95,624 InDels and 14,112 SVs in the Longdao24 genome. We identified 361,117 SNPs and 81,488 InDels between the Longdao24 and Longdao5 genomes, and 428,908 SNPs and 97,209 InDels between the Longdao24 and Jigeng83 genomes. Twenty-two yield-related genes, twenty-two grain quality-related genes and thirty-nine heading date genes were analyzed in Longdao24. The alleles of Gn1a, EP3, SCM2, Wx, ALK, OsLF and Hd17 came from the female parent Longdao5. The alleles of qGW8, SSIVa, SBE3, SSIIIb, SSIIc, DTH2, Ehd3 and OsMADS56 came from the male parent Jigeng83. These results will help to elucidate the genetic basis of yield, grain quality and early heading date in the northern rice of China.


Introduction
Given continuing population growth and increasing competition for arable land between food and energy crops, the next century may witness serious global food shortages. Because rice is the world's most important staple food crop, rice yield needs to increase. To substantially increase rice yield, Japan initiated a super high-yielding rice breeding program in 1981 that aimed to raise rice yield by 50% within 15 years. However, this target has not yet been realized [1,2]. In 1989, the International Rice Research Institute (IRRI) launched a new plant-type breeding program to develop super rice. However, this target was not reached either [1][2][3]. To meet the food demand of the Chinese people in the 21st century, a program to breed super rice by combining morphological improvement with the utilization of inter-subspecific heterosis was set up by the Ministry of Agriculture and the Ministry of Science and Technology in 1996 and 1997, respectively [1,2,4]. Today, nearly 80 super rice varieties have been released, and some of them show high grain yields of 12-21 t/hm² in field experiments [4]. The core of super rice breeding is the effective use of germplasm resources and favorable genes. For example, most of the northern japonica super rice varieties carry the DEP1 gene, which controls the large erect panicle [5,6]. Although considerable progress has been made on the genetic basis of yield, grain quality and heading date of super rice [5][6][7], the molecular mechanism is still poorly understood.
To meet global food demand, advances in the molecular genetics of complex traits in rice will be important for increasing yield in the post-genomics era [8]. The draft genomic sequences of two rice subspecies, japonica (Nipponbare) and indica (93-11), were released [9,10], and the final genome sequence of Nipponbare was later completed by the International Rice Genome Sequencing Project [11]. These achievements provided a high-quality reference genome for re-sequencing rice genomes using high-throughput sequencing technologies [12], also known as next-generation sequencing (NGS) technologies. NGS has been widely used in rice genomics and molecular breeding studies [13]. It has enabled researchers to perform accurate genetic polymorphism analysis and to discover important genes in rice varieties [14]. Several super rice varieties or their parents, including Liangyoupei 9 [15], Shanyou 63 [16] and Shennong 265 [17], have been re-sequenced. However, Heilongjiang super rice varieties such as Longdao 5, Songgeng 9 and Longgeng 31 have not yet been studied. In this study, we resequenced a newly bred elite japonica rice variety, Longdao24, which combines high quality, super high yield and early heading date in Heilongjiang Province. Our results provide several candidate genes that may account for the super high yield, good quality and early heading date of rice in Heilongjiang Province. The results therefore lay the groundwork for long-term efforts to uncover important genes and for future variety improvement.

Plant materials
Longdao24 (Fig 1) was selected from a cross between two super japonica rice varieties, Longdao5 and Jigeng83. Longdao5 is a leading cultivar widely grown in the southern and eastern parts of Heilongjiang Province and has high disease resistance and super high yield. Jigeng83 is a leading cultivar widely grown in the central part of Jilin Province and has super high yield.

Mapping of reads to the reference
The clean sequencing reads were aligned to the temperate japonica Nipponbare reference genome, the unified-build release Os-Nipponbare-Reference-IRGSP-1.0 [20], using the BWA software with default parameters and a small modification. The alignment results were then merged and indexed as BAM files [21]. Average sequencing depth and coverage were calculated from the alignment results. The mapped reads were then used to detect SNP, InDel and SV polymorphisms.
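As an illustration of the depth and coverage calculation mentioned above, the following minimal sketch computes both values from a list of per-position read depths. It assumes the depths have already been extracted from the BAM file (for example with a tool such as `samtools depth -a`); the function name `depth_and_coverage`, the `min_depth` threshold and the toy numbers are hypothetical, and the exact definition of "effective depth" used in this study may differ.

```python
from typing import Iterable, Tuple


def depth_and_coverage(depths: Iterable[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth).

    `depths` holds one read-depth value per reference position,
    including positions with zero reads.
    """
    total = 0
    covered = 0
    n_positions = 0
    for d in depths:
        n_positions += 1
        total += d
        if d >= min_depth:
            covered += 1
    if n_positions == 0:
        raise ValueError("no positions supplied")
    return total / n_positions, covered / n_positions


# Toy data (hypothetical): ten reference positions, two without any reads.
toy_depths = [40, 45, 0, 50, 42, 0, 44, 46, 43, 40]
mean_depth, coverage = depth_and_coverage(toy_depths)
print(f"mean depth = {mean_depth:.1f}x, coverage = {coverage:.0%}")
```

On a real genome the depths would be streamed from a file rather than held in a list, which is why the function accepts any iterable.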

Detection of SNP, InDel and SV polymorphisms
First, we used the GATK software [22] to detect SNPs and InDels. Second, SVs were detected using the BreakdancerMax.pl software [23] with its default parameters. In our results, the types of SVs include insertion (INS), deletion (DEL), inter-chromosomal translocation (CTX), deletion including insertion (IDE), intra-chromosomal translocation (ITX) and inversion (INV). In our analysis, InDels were defined as insertions or deletions of 1 to 10 bp in length.
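The 1 to 10 bp InDel definition above can be expressed as a simple length rule. The sketch below is only an illustration: it assumes VCF-style reference and alternative allele strings, and the function name `classify_by_length` and its labels are hypothetical rather than part of GATK or BreakDancer.

```python
def classify_by_length(ref: str, alt: str, max_indel_len: int = 10) -> str:
    """Label a variant from the lengths of its REF and ALT alleles.

    Insertions or deletions of 1..max_indel_len bp are called InDels,
    following the definition used in the text; longer events are only
    flagged as exceeding that cutoff.
    """
    if len(ref) == len(alt):
        # Equal lengths: a single-base or multi-base substitution.
        return "SNP" if len(ref) == 1 else "MNP"
    size = abs(len(ref) - len(alt))
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    if size <= max_indel_len:
        return f"InDel ({kind}, {size} bp)"
    return f"longer than InDel cutoff ({kind}, {size} bp)"


# Hypothetical alleles: a substitution, a 9-bp deletion and an 11-bp insertion.
print(classify_by_length("A", "G"))
print(classify_by_length("A" + "C" * 9, "A"))
print(classify_by_length("A", "A" + "T" * 11))
```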

Annotation of SNPs, InDels and SVs
SNPs and InDels were annotated with the SnpEff software [24]. SNPs, InDels and SVs were localized according to the annotation of the reference genome databases. Polymorphisms in gene regions and in other genome regions were annotated as genic and intergenic, respectively. The SNPs, InDels and SVs were classified according to their localization. SNPs in the CDS were further separated into synonymous and non-synonymous. Functional annotation was obtained by BLAST searching each gene containing non-synonymous SNPs or SVs against the NR [25], SwissProt [25], GO [26], COG [27] and KEGG [28] databases.
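The distinction between synonymous and non-synonymous SNPs can be illustrated with the standard genetic code. The sketch below is a toy classifier, not the SnpEff implementation: it assumes the affected codon and the position of the SNP within it are already known, and the function name `classify_cds_snp` is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_cds_snp(codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Classify a SNP at 0-based `pos_in_codon` of a reference codon.

    A change to or from a stop codon ('*') is also reported as
    non-synonymous in this simplified sketch.
    """
    codon = codon.upper()
    mutated = codon[:pos_in_codon] + alt_base.upper() + codon[pos_in_codon + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# AAC (Asn) -> AAG (Lys) changes the amino acid; GGT -> GGC keeps Gly.
print(classify_cds_snp("AAC", 2, "G"))
print(classify_cds_snp("GGT", 2, "C"))
```

A full annotation tool additionally needs the gene model (exon boundaries, strand and reading frame) to find the codon in the first place, which this sketch takes as given.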

Frequency distributions of different variation
We then examined genome-wide variations. A total of 420,475 SNPs, 95,624 InDels and 14,112 SVs were identified in the Longdao24 genome. The frequency distributions of SNPs and InDels in the Longdao24 genome are shown in Fig 2 and [...]

Characteristics of SNPs, InDels and SVs
In total, 22,250, 65,858, 137,848 and 107,288 SNPs were found in intergenic, intron, upstream and downstream regions, respectively. A further 9,054 SNPs were located in the 5'UTR [...] (Fig 5A). In total, 5,623, 17,609, 34,717 and 27,132 InDels were found in intergenic, intron, upstream and downstream regions, respectively. A further 4,465 InDels were located in the 5'UTR and 3'UTR regions, and 426 InDels were detected near splice sites. A total of 5,602 InDels were located in the CDS regions, among which 3,401 caused frame shifts and 2,092 caused codon changes (Fig 5B).

Variation annotation and gene categories
A total of 16,831 variations in 12,772 genes, including 10,615 non-synonymous SNP mutations, 3,727 frame shifts and 2,489 SVs, caused by indels, transversions and transitions, might influence the expression of the relevant proteins. All tested and functionally annotated genes were classified into gene ontology (GO) categories (Fig 6), which divided the genes into three main categories: cellular components (20 groups), molecular function (17 groups) and biological process (17 groups). Classification of the gene variations against the COG database by BLAST showed that the largest number of variations (1,935) fell in the replication, recombination and repair category. Compared with the reference genome, most variations were found in genes for genetic information processing and metabolism.
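Since the SNPs above comprise both transitions and transversions, a short sketch of how the two classes are distinguished may be useful. This is a generic illustration under the usual definition (purine to purine or pyrimidine to pyrimidine is a transition, otherwise a transversion); the function names and the toy SNP list are hypothetical and are not results from this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    ts = sum(1 for ref, alt in snps if substitution_type(ref, alt) == "transition")
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


# Toy SNP list (hypothetical): three transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C")]
print(ts_tv_ratio(toy_snps))
```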

Variation analysis on high yield genes
We investigated 22 cloned yield-related genes [7] that might explain the high yield of Longdao24. Among Longdao24, Longdao5 and Jigeng83, SNPs and InDels in the CDS (Table 2) were detected in only four of these genes: Gn1a (related to spikelet number), EP3 (related to erect panicle and spikelet number), SCM2 (related to lodging resistance and spikelet number) and qGW8 (related to grain size and grain quality).
Four non-synonymous SNPs and a codon-insertion InDel were identified at the Gn1a locus. Two non-synonymous SNPs and a codon change plus deletion InDel were identified at the SCM2 locus. Three non-synonymous SNPs and a codon-deletion InDel were identified at the qGW8 locus. One frame-shift InDel was identified at the EP3 locus. The alleles of Gn1a, EP3 and SCM2 came from the female parent Longdao5. The allele of qGW8 came from the male parent Jigeng83.

Variation analysis on starch quality genes
We investigated 22 cloned grain quality-related genes [29] in Longdao24 (Table 3). A total of fourteen SNPs and one InDel were detected in the CDS of seven grain quality genes among Longdao24, Longdao5 and Jigeng83. Eleven SNPs and one InDel were detected at the GBSSII (granule-bound starch synthase II), SSIIIb (soluble starch synthase IIIb) and SSIIc (soluble starch synthase IIc) loci between Longdao24 and Nipponbare. GBSSII in Longdao24 had only one SNP. SSIIIb and SSIIc in Longdao24 had six SNPs plus one InDel and four SNPs, respectively. The other three SNPs were detected at the SSIVa and SBE3 loci among Longdao24, Longdao5 and Jigeng83. The allele at the GBSSII locus was the same in Longdao24, Longdao5 and Jigeng83, but different from that in Nipponbare. The four alleles of SSIVa, SBE3, SSIIIb and SSIIc came from the male parent Jigeng83.

Variation analysis on heading date genes
Eight (DTH2, OsLF, Hd17, Hd3a, Hd1, Hd2, Ehd3 and OsMADS56) of the 39 heading date genes had 24 variations (19 SNPs and 5 InDels) among Longdao24, Longdao5 and Jigeng83 (Table 4). The single non-synonymous SNP in the DTH2 allele of Longdao24 was the same as in Nipponbare and Jigeng83, but different from Longdao5. The Hd1 allele in Longdao24 had 6 non-synonymous SNPs, one frame-shift InDel and three long insertions (36 bp, 69 bp and 51 bp). Hd2, Ehd3 and OsMADS56 in Longdao24 each had 3 non-synonymous SNPs. OsLF in Longdao24 had one non-synonymous SNP and one InDel. Only one non-synonymous SNP was detected in each of Hd3a and Hd17. The SNPs and InDels at the Hd3a, Hd2 and Hd1 loci were detected between Nipponbare and the other three varieties. The allele of OsLF came from the female parent Longdao5. The other four alleles, of DTH2, Hd17, Ehd3 and OsMADS56, came from the male parent Jigeng83.
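The parental origin of an allele, as reported above, can in principle be assigned by comparing the genotypes of the offspring and both parents at the variant sites of a gene. The sketch below illustrates that logic only; it is not the procedure used in this study, and the function name `allele_origin` and the toy genotypes are hypothetical.

```python
from typing import Sequence


def allele_origin(child: Sequence[str], female: Sequence[str], male: Sequence[str]) -> str:
    """Assign the origin of a gene's allele from genotypes at its variant sites.

    Each argument lists the bases (or InDel alleles) observed at the same
    ordered set of variant sites within one gene.
    """
    if not (len(child) == len(female) == len(male)):
        raise ValueError("genotype lists must cover the same variant sites")
    matches_female = list(child) == list(female)
    matches_male = list(child) == list(male)
    if matches_female and matches_male:
        return "shared by both parents"
    if matches_female:
        return "female parent"
    if matches_male:
        return "male parent"
    # E.g. recombination within the gene or a genotyping error.
    return "unresolved"


# Toy genotypes (hypothetical) at three variant sites of one gene.
offspring = ["A", "T", "-"]
female_parent = ["A", "T", "-"]
male_parent = ["G", "C", "ACGTACGTA"]
print(allele_origin(offspring, female_parent, male_parent))
```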

Development of DNA markers to confirm polymorphisms
Although the accuracy of deep sequencing has greatly improved, variations of interest still need to be confirmed by other methods. Therefore, we selected one InDel and one SNP polymorphism to develop DNA markers (Table 5). We designed the InDel marker based on the 9-bp deletion at the SCM2 locus. Completely matching genotypes were observed in Longdao24 and its parents: Longdao24 and Longdao5 carried this 9-bp deletion, whereas Jigeng83 did not (Fig 7A). We also developed a CAPS marker to verify the SNP (2235191) at the Hd17 locus. Completely matching genotypes were also observed in Longdao24 and its parents (Fig 7B).
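A CAPS marker works because a SNP creates or destroys a restriction enzyme recognition site, so the two alleles give different digestion patterns. The sketch below shows that check on made-up sequences; the enzyme used for the Hd17 marker is not stated here, so the EcoRI site (GAATTC) and both toy alleles are purely hypothetical, and only the forward strand is searched, which is sufficient for a palindromic site.

```python
from typing import List


def site_positions(seq: str, site: str) -> List[int]:
    """Return 0-based start positions of a recognition site in `seq`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def is_caps_candidate(allele_a: str, allele_b: str, site: str) -> bool:
    """True if the two alleles differ in the number of recognition sites."""
    return len(site_positions(allele_a, site)) != len(site_positions(allele_b, site))


# Toy alleles (hypothetical) that differ by one SNP inside an EcoRI site.
allele_with_site = "TTAGGAATTCCGTA"
allele_without_site = "TTAGGAGTTCCGTA"
print(site_positions(allele_with_site, "GAATTC"))
print(is_caps_candidate(allele_with_site, allele_without_site, "GAATTC"))
```

In practice the amplicon and primers also have to be chosen so that the resulting fragment sizes can be separated on a gel, which this check does not consider.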

Discussion
Breeding value of the re-sequencing data of a super high-yield rice variety
Marker-assisted selection (MAS) is expected to provide higher efficiency, reduced cost and a shorter breeding scheme than conventional methods. However, most important complex traits in rice, including yield, quality and stress tolerance, are controlled by quantitative trait loci (QTLs). Isolating and characterizing the genes underlying QTLs is a key step for MAS. Whole-genome re-sequencing technology has made genome-wide variation analysis possible in rice breeding varieties. Longdao24 has been widely used as a breeding parent because of its high yield, good quality, early heading date and strong stress tolerance.

The genetic basis of high yield in Longdao24
Developments in whole-genome re-sequencing technology have provided new tools for discovering and tagging novel useful alleles for improving target traits and for manipulating those genes in rice breeding programs through MAS. We identified four yield-related genes (Gn1a, EP3, SCM2 and qGW8) with genetic variations in Longdao24. The first is Gn1a, which controls cytokinin accumulation in inflorescence meristems and increases the number of reproductive organs, resulting in enhanced grain yield [30]. Comparison of the DNA sequences between the cultivars revealed several nucleotide changes, including a 6-bp insertion in the first exon and four nucleotide changes resulting in amino acid variation in the first and fourth exons of the Longdao24 and Longdao5 allele (Fig 8A). The Gn1a allele of Longdao24 and Longdao5 was similar to that of Habataki, which has been reported to carry the increasing Gn1a allele. The second is EP3, which controls panicle architecture and enhances grain yield in rice [31]. Comparison of the DNA sequences between the cultivars revealed a 2-bp deletion in the first exon of the Longdao24 and Longdao5 allele (Fig 8B). The EP3 allele of Longdao24 and Longdao5 was a novel allele. The third is SCM2, which enhances culm strength and increases spikelet number [32]. Comparison of the DNA sequences between the cultivars revealed several nucleotide changes, including a 9-bp deletion in the second exon and two nucleotide changes resulting in amino acid variation in the first exon of the Longdao24 and Longdao5 allele (Fig 8C). The SCM2 allele of Longdao24 and Longdao5 was the same as that of Habataki, which has been reported to carry the increasing SCM2 allele.
The last is qGW8, which is related to grain size and grain quality [33]. Comparison of the DNA sequences between the cultivars revealed several nucleotide changes, including a 10-bp insertion in the first exon, a 3-bp deletion in the first exon, and three nucleotide changes resulting in amino acid variation in the first and third exons of the Longdao24 and Jigeng83 allele (Fig 8D). The qGW8 allele of Longdao24 and Jigeng83 was the same as that of HJX74, which has been reported to carry the yield-increasing haplotype. Thus, the super high yield of Longdao24, formed by its large half-erect panicle and lodging resistance, may be caused by these four increasing alleles of the high-yield genes Gn1a, EP3, SCM2 and qGW8.

The genetic basis of early heading in Longdao24
Heading date is a major determinant of rice adaptability to regional and environmental conditions, and it has been a major selection target in rice breeding programs. To elucidate the key heading date genes in Longdao24 for adaptation to Heilongjiang Province (43°26'-53°33'), the northern limit of rice cultivation, we analyzed 39 cloned heading date-related genes. DTH2, OsLF, Hd17, Hd3a, Hd1, Hd2, Ehd3 and OsMADS56 had variations among Nipponbare, Longdao24, Longdao5 and Jigeng83. The alleles of Hd3a, Hd1 and Hd2 in Longdao24, Longdao5 and Jigeng83 were different from those in Nipponbare. This might be the major reason for the adaptation of these three varieties to the northern limit of rice cultivation. Hd1 and Hd3a are two key genes controlling heading date in rice [34]. Comparison of the Hd3a DNA sequences between the cultivars revealed only one nucleotide change resulting in amino acid variation, in the fourth exon of the Longdao24, Longdao5 and Jigeng83 allele (Fig 9A). This Hd3a allele is novel and has not been reported before. The single SNP caused an amino acid change at the carboxyl end of the predicted protein: Asn (AAC) in Nipponbare was changed to Lys (AAG). The Hd1 allele in Longdao24, Longdao5 and Jigeng83 was also a novel allele. Four InDels and six SNPs were identified by comparing the Hd1 DNA sequences between the cultivars. Three long insertions and four nucleotide changes resulting in amino acid variation were identified in the first exon, and a 2-bp insertion and two nucleotide changes resulting in amino acid variation were identified in the second exon (Fig 9B). The Hd2 alleles in Longdao24, Longdao5 and Jigeng83 all had three nucleotide changes resulting in amino acid variation, in the first, fourth and eighth exons (Fig 9C). These variations are found in many European and Asian rice cultivars that flower extremely early under natural long-day conditions [35]. The DTH2 allele in Longdao24 was the same as that in Jigeng83 (Fig 9D). The DTH2 SNP in the third exon was reported as a key functional difference in rice landraces from the northern limit of rice cultivation in the world [36]. We also found that the alleles of Hd17, Ehd3 and OsMADS56 in Longdao24 came from Jigeng83 (Table 4), whereas only the OsLF allele came from Longdao5 (Table 4). Taken together, these results suggest that Hd3a, Hd1 and Hd2 might be the major reason for the adaptation of these varieties to the northern rice cultivation region, while DTH2, OsLF, Hd17, Ehd3 and OsMADS56 might underlie the adaptation of different varieties to different areas within the northern rice cultivation region.

The genetic basis of grain quality in Longdao24
Starch, the major component of the rice grain, is mainly composed of amylose and amylopectin. The starch synthesis pathway is an ideal system for examining the evolution of biochemical pathways. Over 20 genes involved in the rice starch synthesis pathway have been identified so far [37]. We investigated 22 cloned starch synthesis-related genes in Longdao24 (Table 3). We identified variations between the parents of Longdao24 only in SSIVa, SBE3, SSIIIb and SSIIc. All four alleles of SSIVa, SBE3, SSIIIb and SSIIc came from the male parent Jigeng83. These genes encode starch synthases (SSIVa, SSIIIb and SSIIc) and a branching enzyme (SBE3). These enzymes are involved in amylopectin biosynthesis in rice endosperm and have low to medium effects on variation in starch traits [38]. More work is therefore needed, for example on polymorphisms or expression levels of these genes, to explore the genetic basis of grain quality in Longdao24.