Genome-Wide DNA Polymorphisms in Seven Rice Cultivars of Temperate and Tropical Japonica Groups

Elucidation of the rice genome is expected to broaden our understanding of genes related to the agronomic characteristics and the genetic relationship among cultivars. In this study, we conducted whole-genome sequencings of 6 cultivars, including 5 temperate japonica cultivars and 1 tropical japonica cultivar (Moroberekan), by using next-generation sequencing (NGS) with Nipponbare genome as a reference. The temperate japonica cultivars contained 2 sake brewing (Yamadanishiki and Gohyakumangoku), 1 landrace (Kameji), and 2 modern cultivars (Koshihikari and Norin 8). Almost >83% of the whole genome sequences of the Nipponbare genome could be covered by sequenced short-reads of each cultivar, including Omachi, which has previously been reported to be a temperate japonica cultivar. Numerous single nucleotide polymorphisms (SNPs), insertions, and deletions were detected among the various cultivars and the Nipponbare genomes. Comparison of SNPs detected in each cultivar suggested that Moroberekan had 5-fold more SNPs than the temperate japonica cultivars. Success of the 2 approaches to improve the efficacy of sequence data by using NGS revealed that sequencing depth was directly related to sequencing coverage of coding DNA sequences: in excess of 30× genome sequencing was required to cover approximately 80% of the genes in the rice genome. Further, the contigs prepared using the assembly of unmapped reads could increase the value of NGS short-reads and, consequently, cover previously unavailable sequences. These approaches facilitated the identification of new genes in coding DNA sequences and the increase of mapping efficiency in different regions. The DNA polymorphism information between the 7 cultivars and Nipponbare are available at NGRC_Rices_Build1.0 (http://www.nodai-genome.org/oryza_sativa_en.html).


Introduction
Genome sequence data have greatly increased with significant advances in next-generation sequencing (NGS) technology.Millions of DNA polymorphisms, including single nucleotide polymorphisms (SNPs) and insertion-and-deletion polymorphisms (InDels), have been obtained using this information by using highthroughput methods and bioinformatic tools.SNPs and InDels have been used as DNA markers.In particular, numerous SNPs have facilitated whole-genome genotyping and genome-wide association studies in animals and higher plants [1][2][3][4][5][6][7][8][9].DNA markers are a powerful tool for marker-assisted breeding, quantitative trait locus analysis, and genome-wide association analyses [10,11].Rice is one of the most important crops worldwide; the genome sequence of 2 rice groups, japonica and indica, have been established by the International Rice Genome Sequencing Project (IRGSP) [12,13] and the Beijing Genomics Institute [14].To date, the whole genomes of 4 cultivars of the japonica group (Koshihikari, Omachi, Eiko, and Rikuu-132) have been sequenced using the NGS technology and mapped to the Nipponbare reference sequences [5,6,15].Recently, by using only about 16 genome coverage, the genomes of 1,083 indica and japonica cultivars were sequenced by NGS, revealing the relationship of rice cultivars spread across wide regions [4].
NGS is also an important technology in the field of genomics.Advances in bioinformatics have facilitated the production of complete genome sequences of bacteria, viruses, and yeasts, as well as individual human chromosomes, through de novo assembly of short-reads sequenced by NGS [16].However, at present, further technological developments are needed to allow complete whole genome sequencing using this method in biological species having a large genome size.Bioinformatics has been useful in genetic analyses, such as identification of mutations in functional genes, in not only microorganisms but also plants.Several bioinformatic studies on rice and Arabidopsis have been reported, e.g.identification of T-DNA insertion loci in activation-tagged Arabidopsis lines, mapping the physical location of point mutations in Arabidopsis genes, and mapping various transposable elements in the rice genome [17][18][19].Bioinformatic tools have thus become important in the identification of genes.
Complete sequencing of the rice genome by NGS technology alone remains challenging, because of a low sequence homology that hinders mapping to the reference sequences in many regions.
Consequently, there are gaps in the sequence of the sample genome.These unavailable sequences might contain important functional genes, which remain unknown; obtaining these unavailable sequences may facilitate more efficient use of NGS data.Such sequences might be higher in the tropical japonica cultivar that is considerably different from the temperate japonica reference cultivar [20].Unavailable sequences have also resulted from biased sequences.Increasing the efficacy of unavailable sequences might be useful for enhancing the value of NGS data.
In this study, we sequenced 5 temperate japonica cultivars: Yamadanishiki and Gohyakumangoku as well as Omachi, the sequencing information for which has been previously reported, are used for sake brewing.Kameji is one of the oldest landrace like Omachi.Koshihikari and Norin 8 are modern cultivars and similar to reference Nipponbare.One tropical japonica cultivar Moroberekan, which is cultivated in West Africa and considered to confer durable resistance to rice blast disease [21] as well as rice yellow mottle virus [22], was also sequenced.Among the sequences of these cultivars, only those of Koshihikari and Omachi were supported by deep sequencing data [6,15].By sequencing various cultivars, we showed that deep sequencing over 306 the genome improved data acquisition in gene regions, particularly in the coding DNA sequences (CDSs).In addition, the use of unmapped reads obtained by assembling contigs helped in identifying new CDSs even in the cultivar Moroberekan, which showed the highest number of SNPs.Identification of more CDSs might increase the value of NGS technology.

Whole-genome Sequencing and in silico Mapping
The genomic DNA of 6 cultivars, namely, Yamadanishiki, Kameji, Gohyakumangoku, Koshihikari, Norin 8, and Moroberekan, were sequenced using the NGS technology.More than 170 million short-reads from each cultivar were generated.Short-reads of Omachi that had been produced in a previous study [15] were also used in these analyses.Short-reads from the genome of each cultivar were mapped to the Nipponbare reference genome, and 81.5-94.0%and 2.2-13.4% of the total short-reads were successfully mapped to chromosomal and organelle genomes, respectively (Table S1).The short-reads that were mapped to the genome were classified into 2 groups: unique or multiple locations.The short-reads uniquely mapped to chromosomes varied from 69.7% to 83.1% of the generated reads among cultivars and corresponded to over 9.9 Gb of the Nipponbare genome (Table S1).Average sequencing depths of the uniquely mapped reads in each cultivar were from 326 to 566 across the entire genome and covered 83.4-90.1% of the Nipponbare reference genome IRGSP1.0 (Table 1).The highest coverage (%) with a sequencing depth of $5 was found in Koshihikari, and the lowest one was found in Moroberekan.The coverage of uniquely mapped reads suggested an increasing tendency in modern cultivars similar to Nipponbare compared to that in old cultivars and tropical japonica cultivar.Thus, more than 83% of the whole genome sequences of various cultivars were covered using NGS with the Nipponbare genome as reference.

Sequencing of Gene Regions using NGS Technology
Although the coverage level described above seemed to be sufficient to obtain remarkable numbers of DNA markers, obtaining further sequences of gene regions would be ideal.CDSs of approximately 45,000 genes have been annotated in the Nipponbare reference genome IRGSP-1.0 [13].To improve the information available for gene regions, we investigated the relation between sequencing depth and sequence coverage in the CDSs of gene regions by using the Omachi reads as a test case.Few Omachi short reads were randomly eliminated to produce various sequencing depths that corresponded to 506, 406, 306, 206, and 106 coverage of the genome.The short-reads of each set were mapped onto the Nipponbare reference genome, and the coverage of the CDSs was calculated.Figure 1 shows the number of genes that were covered by short-reads in 90% of the regions.A 90% limit was used to account for possible deletions in the CDSs.In all, 23,616 genes had 90% or more of the CDS covered by short-reads when the sequencing depth was 106 the genome; this was only approximately half of the genes in the genome (Figure 1).When the sequencing depth was 206 the genome, the number of genes obtained was 1.44 times as that obtained with a sequencing depth of 106 the genome.When the sequencing depth was increased to 306 the genome, the number of genes increased to 37,793, which was 1.6 times the number covered with a depth of 106 the genome and was equivalent to approximately 85% of the genes in the genome.The number of genes did not show any marked increase with a sequencing depth of 406or 506, yielding 1.05 and 1.08 times the number of genes obtained with a sequencing depth of 306 the genome, respectively.This result suggested that a sequencing depth of at least 306 the genome was required, for example, to identify a specific gene with mutations in the CDSs by using whole genome sequencing.All 7 cultivars were covered with a sufficient sequencing depth for future analyses of genes by using this criterion (Table 1).
However, the genomes of the 7 cultivars harboured different number of CDSs (Table 2).In the 7 cultivars, most genes showed 100% coverage of CDSs and those with CDSs covered in excess of 80% accounted for .90% of the total number of genes.The maximum and minimum numbers of genes with completely covered CDSs was 40,296 in the Koshihikari and 33,229 in Kameji cultivars, respectively; the sequencing depths for these cultivars were 416 and 406 the genomes, respectively.The sequencing depth of the Moroberekan genome (326) was the lowest, and CDSs of 1,188 genes were not covered by any shortreads; this number was more than twice compared to that in other cultivars.However, in this genome, the number of genes with 100% coverage of CDSs was 39,193, which was the second highest number of genes after Koshihikari.These results indicated that, in more than 90% of the total genes, .80% of the CDSs could be determined by using whole genome sequencing with a sequencing depth of 306 the genome.

Comparison of SNPs and InDels among the Different Cultivars and the Nipponbare Genome Sequences
SNPs and InDels were detected when the sequence of each cultivar was compared with the Nipponbare reference genome sequences IRGSP1.0 (Table 3).The numbers of SNPs detected in the cultivars in a descending order are as follows: Moroberekan (827,448), Gohyakumangoku (142,087), Omachi (132,799), Yamadanishiki (123,657), Koshihikari (115,455), Norin 8 (85,291), and Kameji (76,967).Moroberekan showed 10-fold more SNPs than those found in Kameji.The number of SNPs with a call rate of .90% was proportional to that of insertions and deletions in those cultivars, except for Kameji (Table 3).The number of SNPs observed in the Kameji genome was about half the number expected on the basis of the number of insertions and deletions.False-positive SNPs were eliminated by filtering out SNPs with a call rate of ,90%.When the call rate of .70% was used, the number of SNPs was almost equal to that expected from the number of insertions and deletions in Kameji (Figure S1).Although the cause for this phenomenon remains unknown, approximately 80,000 SNPs obtained are usable as DNA markers (Table 3).The quality of the detected SNPs was evaluated by analysing the genome of each cultivar using SNP array technology.The SNP validation rates reached .99.5% (Table S2), indicating that the SNPs could effectively be used as DNA markers.
The SNPs obtained from the 7 cultivars were annotated using the gene set of IRGSP1.0.On the basis of the average values of the 7 cultivars, 17.661.4% of the total SNPs and 17.460.7% of the total InDels were located within a gene (Table 4).Among them, 5.560.7%SNPs and 2.460.3%InDels were located in CDSs.In all, there were 3.060.4%non-synonymous SNPs.The percentages of SNPs and InDels belonging to each category were similar in the 7 cultivars.These data are available at a GBrowse, NGRC_Rices_Build1.0 (corresponding to IRGSP1.0)(http:// www.nodai-genome.org/oryza_sativa_en.html).

Relationship between Cultivars by Comparison of SNPs
The variation among cultivars was investigated by assessing SNPs and InDels across individual chromosomes (Table 3).In temperate japonica cultivars, chromosomes 4 and 11 had high densities of SNPs and InDels, whereas chromosome 9 had low densities of variants in all cultivars, except Gohyakumangoku.SNP density across chromosome 9 in Gohyakumangoku was about 5 times higher than that in other temperate japonica cultivars.Some cultivars showed a characteristic pattern of SNP location on chromosomes.The precise origins of these differences in chromosomes were detected by comparing the total number of SNPs per megabase in the chromosomes of each cultivar (Figures S2-S13).There were 7 regions in which all cultivars shared high density, with over 500 SNPs/Mb; these included 24-25 Mb, 15-17 Mb, 15-16 Mb, 25-26 Mb, 2-3 Mb, 6-8 Mb, and 27-28 Mb on chromosomes 2, 3, 4, 7, 10, 11, and 12, respectively.In Moroberekan, several regions showed extremely high densities of SNPs, such as 8-13 Mb on chromosome 4, 6-10 Mb on chromosome 10, and 11-23 Mb on chromosome 12, whereas the densities of SNPs in these regions in the other cultivars were low.Our analysis approach was unable to detect regions with less than 50 SNPs/Mb in any of the 7 cultivars.
Comparison of SNPs and InDels indicated that the pair of cultivars Omachi and Yamadanishiki had the closest relationship among all the cultivars.However, there were differences in SNP distributions between these 2 cultivars at 27 regions, indicating a more than 5-fold difference in the number of SNPs.In particular, 7 regions showed marked differences, i.e. 8-9 Mb on chromosome 2, 2-5 Mb on chromosome 4, 0-2 Mb on chromosome 7, 9-11 Mb and 14-16 Mb on chromosome 8, 10-12 Mb on chromosome 9, and 19-20 Mb on chromosome 10.Between Omachi and the other cultivars, the number of these differing regions increased.In the cases of Kameji, Koshihikari, and Moroberekan, 55, 122, and 248 regions, respectively, were markedly different, again indicating a possible genomic relationship.

Analyses of Unmapped Sequence Reads
In terms of CDSs, whole genome sequencing with a depth of 306 the genome provided over 80% coding sequences in 90% of genes; however, in each of the cultivar, there still remained many CDSs that were not covered by any short-reads.The unavailable sequences corresponding to these CDSs were expected to be present in the unmapped reads, which varied from 2.6-7.0% of the total reads in the 7 cultivars (Table S1).The unavailable sequences were reduced and annotated CDSs were increased by generating contigs by assembling unmapped reads and remapping these to the Nipponbare genome.The effect of read length on assembly quality was determined by preparing Omachi datasets  containing different read lengths by trimming an original length of 74 bases.Among these trimmed reads, reads having quality value over 20 were used to decrease the rate of mis-assembly.The percentages of reads with such quality were increased with decreasing read length (Table S3).In the case of reads of original length, only 2% reads had such quality, but the trimmed 40 base reads contained 8.7% reads with such quality.In general, the sequence quality decreases with advanced read sequencing by using NGS; hence, filtering out the low-quality reads following trimming would decrease the rate of mis-assembly.The quality of the assembly was shown in terms of output parameters in the VelvetOptimizer program and included the number of assembled reads; Velvet hash values; total number of contigs; N50 length; longest contig length; number of contigs longer than 1,000 bases; and total bases in contigs longer than 1,000 bases (Table S3).Significant trimming of the read length affected the number and length of contigs that could be constructed.Among these trimmed read lengths, reads with a length of 50 bases could generate the largest number and longest contigs in terms of N50 length.Under this condition, the total numbers of contigs that could be constructed were 6,903, which corresponded to approximately 80% of applied reads.The total length of all contigs was 2,479,877 bases.Application of the same approaches to other cultivars resulted in .2,500contigs (Table S3).Notably, assembling of Moroberekan reads could yield over 20,000 contigs.
The resultant contigs were annotated to confirm that they were similar to the Nipponbare sequences.A sequence similarity search was conducted against the Nipponbare genome by using the BLAST algorithm, with an E value threshold of 10 210 [23].Many contigs showed similarity to sequences in Nipponbare chromosomes 1-12.Of the total, 17-28% contigs were aligned to unique regions of the Nipponbare genome, and the remaining 47-71% contigs were aligned to multiple regions (Table 5).The degree of unique hit sequences in the contig sequences varied in cultivars.However, it reached over 80% in approximately 30% of contigs aligned on the unique regions in each of the cultivars (Figure S14).About 2-3% of contigs were aligned to about 1 Mb unanchored sequences, which were categorized to be Nipponbare sequences unaligned onto chromosomes 1-12.
The quality of contigs was validated by selecting and sequencing 12 Omachi contigs showing similarity to unique regions of the Nipponbare genome (Table S4).Among these, the sequences of 10 contigs were successfully amplified using primers shown in Table S4 and showed matches between Sanger and NGS sequences.
The contigs, including with over 80% unique hit sequences, were distributed unevenly across the chromosomes (Figure 2).Specifically, more contigs mapped to chromosomes 4 and 11 compared to other chromosomes, except in Moroberekan, and fewest contigs mapped to chromosome 9, followed by chromosome 5.In Moroberekan, such contigs were distributed evenly across the chromosomes.Whether those contigs covered new gene regions was determined by first listing the regions that had many reference nucleotides without mapped short reads in each cultivar.Next, the contigs having unique hit sequences were mapped to listed regions.This approach allowed the identification of a total of 315 contigs containing newly mapped sequences that had not been covered by any short-reads (Table S5).The regions covered by these contigs coded for a total of 119 genes (Table 6, Table S6).A significant number of uniquely hit genes were found in Omachi and Moroberekan, but not in Norin 8.This might indicate that the method used was effective for distantly related cultivars from the reference cultivar.Thus, contig construction using the unmapped reads was very effective in obtaining information on CDSs.
The ''no hit'' contigs in Table 5 were annotated to the sequences of the other japonica rice cultivars by using BLASTn search with Oryza sativa japonica rice.In all, 22-249 contigs were assigned to a part of the sequences in the targets (Table 7).However, in this case, the size of the target was very large compared to that of the contig length.Among the contigs assigned, 7-87 contigs showed similarity with over 80% of the contig length to a part of the target sequences.The descriptions of targets are listed in Table S7.The most hit target was the locus ''Pik,'' which is known to confer rice blast resistance in cultivars of Tsuyuake and Kanto 51 japonica rice [24,25].
Next, the remaining contigs that did not show hits to the japonica rice sequences were searched using BLASTn with Oryza sativa indica rice.Considerable numbers of contigs showed hits to indica rice in all of the 7 cultivars (Table 7).Specifically, in Omachi, Kameji, and Moroberekan, which were older cultivars, and a tropical japonica cultivar, more contigs were aligned to indica rice genome.In Moroberekan, 3 contigs were assigned on a part of the sequences of Kasalath genome, of which the target contained the sequences corresponding to the phosphorus uptake 1 QTL of Nipponbare (Table S7) [26].Thus, the expanding of the target cultivars resulted in increased annotation efficiency.

Discussion
To increase the available genome information in rice, we selected and analysed the whole genomes of 6 different types of cultivars, namely, Yamadanishiki, Gohyakumangoku, Kameji, Koshihikari, Norin 8, and Moroberekan.Comparisons of the genomes of these cultivars, which differ in uses, breeding background, and cultivation area, would be expected to provide interesting and useful information.Previously, we reported numerous polymorphisms between the Nipponbare reference IRGSP build4 and Omachi genome sequences [15].Significant differences in the number of SNPs and InDels and their annotation have not detected between the results of IRGSP build4 and build5 (data not shown).However, the remarkable changes were observed in the number of SNPs and InDels of Omachi by IRGSP1.0 than in those by IRGSP build4 (Tables 1, 3, [15]).The genome sequence and the annotation data of IRGSP1.0 probably had significantly improved from the previous database.In this study, we used Nipponbare reference IRGSP1.0.Over 83% of the whole genome of each of the cultivars was successfully covered, and this 80% sequences comprised .90% of the total genes in each genome (Tables 1, 2).Thus, the Nipponbare genome sequence was useful as a reference for gene regions determined using NGS in various japonica rice genomes, including tropical japonica Moroberekan.Large numbers of SNPs and InDels were identified between each cultivar and the Nipponbare reference genome sequence (Table 3).The sequence data and polymorphisms are available at NGRC_Rice_Build 1.0 and are expected to be useful for genomic and genetic analyses.
In a previous report, many polymorphisms between the Nipponbare reference and Omachi genome sequences had been found with a sequencing depth of approximately 206 the genome [15].This sequencing depth was sufficient to detect DNA polymorphisms that could be used as DNA markers.However, approximately 20% of the total number of genes had not yet been covered under this condition, and this was deemed insufficient coverage of CDSs in gene regions.When the relation between sequence coverage and sequencing depth was investigated, we found that the number of sequenced genes approached saturation with a sequencing depth of 306the genome (Figure 1).Under this condition, .90% of the total genes in Omachi were accounted for among genes with .90% of CDSs covered.This indicated that A similarity search of the contig sequences was conducted against the Nipponbare genome chromosomes 1-12 and unanchored sequences.The numbers in the categories of unique, multi-hits, and no hits show the total number of contigs classified.doi:10.1371/journal.pone.0086312.t005 increasing the sequencing depth was needed to sufficiently cover CDSs and is a promising approach to increase efficiency of identifying genes responsible for a phenotype.
Considering the sequence coverage of CDS regions of genes in each cultivar, CDSs of Koshihikari were covered almost completely, whereas those of Kameji were covered least well (Table 2).Interestingly, although the CDS regions of 1,188 Moroberekan genes were not covered by any short-reads (twice as many as those of the other temperate japonica cultivars), many genes with CDS regions covered completely were detected similar to that of Koshihikari.In Moroberekan, 39,181 genes were completely sequenced, which was more than that of other japonica cultivars (Table 2).This finding suggests that a majority of CDS regions in Moroberekan have close similarity to a reference, likely reflecting the concept that genic regions are better conserved than intergenic regions, and that high densities of polymorphisms are found in intergenic regions.Moroberekan results further indicated that the Nipponbare sequences could be used as a reference for cultivars having a different breeding background than that of japonica rice in Japan, such as Moroberekan.Applying NGS technology to elucidate the genomes of even more distantly related cultivars would yield interesting results.
Increased completion of a whole-genome sequence is preferable for genetic analyses.However, there were regions that could not be sequenced by the method used in this study.These regions varied among cultivars depending on the genetic distance from the reference sequence (Table 1).Because the sequenced short-reads were reconstructed on a reference sequence on the basis of homology between the genomes, the excluded regions are expected to include regions containing multiple SNPs, long insertions and deletions, repetitive regions, undefined bases in the reference genome, and sequences homologous to the chloroplast and mitochondrial genomes.Sequencing of the unavailable regions of multiple SNPs and InDels might be helpful in the identification of cultivar-specific gene functions, because some genes related to the characteristics of a given cultivar included them, for example, some disease-resistance genes containing leucine-rich repeat domains [24,25].In order to obtain complete sequence, we attempted to decrease the unmapped regions by assembling the unmapped short-reads (Table S5).This analysis resulted in a total 3,718 contigs containing unique and similar sequences with the Nipponbare genome.Unexpectedly, about 90% of these contigs were assigned to the regions covered by the original short read mapping.The reason why a majority of contigs assembled by unmapped short reads were mapped to known regions was not clear.One of the reasons for this might be the difference in the search criteria used for the short-read mapping and sequence similarity searching of contigs.If the unavailable reads had many SNPs and an incorrect base by error over an acceptable range, the resultant contigs might have sufficient sequence similarity to be aligned on similar regions.Actually, the number of gene sequences covered in the 7 cultivars  The newly mapped contigs contained newly mapped sequences of Nipponbare that had not been covered by any short-reads.doi:10.1371/journal.pone.0086312.t006could be increased using this approach.Newly covered sequences were detected in the 315 contigs, which contained a total of 119 new gene regions (Table S6).Especially, the number of these regions in Moroberekan was larger than that in temperate japonica.
Interestingly, the genes, including the newly mapped sequences, tended to be enriched for kinase genes (Table S6) and contained a leucine-rich repeat domain that affords disease resistance in plants.
High frequencies of SNPs have been reported in those genes from rice, Arabidopsis, and maize genomes [27][28][29].Thus, construction of contigs using unmapped reads could cover gaps that remained by using original short-read mapping and might allow to facilitate gene identification.More contigs were annotated by expanding the target of sequence similarity search from Nipponbare to other japonica and indica sequences.Many contigs were found to show hits to the rice blast resistance Pik locus with variation of sequences in different japonica cultivars [24,25].The sequence similarity search to indica rice cultivars revealed that, in the old cultivars Omachi and Kameji and toropical japonica cultivar Moroberekan, contigs aligned to sequences of indica rice were more than those in the other cultivars.This might be related their genetic distance from indica cultivars (Table 7).Further investigation is warranted to understand the biological function of hit sequences included in the contigs.However, contigs formed by assembling unmapped short reads might increase their value.
Our results suggested a genetic relationship among the 7 cultivars.There seemed to be an increasing tendency of coverage of uniquely mapped reads in the modern cultivars similar to Nipponbare.Old cultivars, Kameji and Omachi, and tropical japonica cultivar Moroberekan showed slightly low coverage (Table 1).The increased numbers of genes not covered and of polymorphism detected seemed to indicate the increasing distance from Nipponbare, where Kameji was an exception (Tables 2, 3).Actually, about 4-10-fold numbers of SNPs, insertions, and deletions were detected in Moroberekan.The number of unmapped reads also tended to increase in cultivars with increasing distance from Nipponbare (Table S1).In Omachi, Kameji, and Moroberekan, many contigs assembled from those unmapped reads showed similarity to indica rice cultivars (Table 7).These results suggested a genetic relationship among the cultivars.
In conclusion, in this study, we analysed the whole genomes of 6 different cultivars using NGS technology and obtained a total 1,503,704 SNPs and 368,504 InDels in temperate and tropical japonica cultivars by using Nipponbare genome as a reference.A public database of this genome information has been developed.Additionally, we defined 2 approaches that efficiently used reads obtained by NGS.First, by using the Omachi sequence data, we showed that a sequencing depth of 306 would be required to sufficiently cover CDSs to allow genetic analyses.Furthermore, we could improve mapping efficiency to obtain contigs by assembling unmapped reads, resulting in the detection of new genes.Our attempts might be helpful, for example, in increasing the value of NGS reads for detecting causal genes of particular phenotypes.Taken together, the sequence data and DNA polymorphism results obtained provide a basis for advancing research in genetic and genomic analyses of rice.
Genomic DNA samples were extracted from 5 or 10 plants with Nucleon PhytoPure (GE Healthcare BioSciences, Little Chalfont, UK) and used for the preparation of sequencing libraries, according to the manufacturer's protocols (Illumina, San Diego, CA, USA).Fragments of the libraries were paired-end sequenced using Genome Analyzer IIx and HiSeq2000 (Illumina).The length of all sequences generated was 74-100 nucleotides.NGS reads of these genomes have been submitted to DDBJ (http://www.ddbj.nig.ac.jp/index-e.html)under the following accession number DRA000897.The Omachi NGS reads were previously generated [15].

Sequence Mapping and SNP/InDel Identification
Filtering rules were applied to select reads that were mapped to the rice chromosomal genome (Oryza sativa L. cv.Nipponbare, the Os-Nipponbare-Reference-IRGSP-1.0, http://rapdb.dna.affrc.go.jp/download/irgsp1.html,Rice Genome Sequencing Project, 2008) using ELAND (optional software for the Illumina GA pipeline system ver.1.6-1.8).Mapping to the chromosomal and organelle genomes (O.sativa ssp.japonica group chloroplasts and Nipponbare mitochondria, DDBJ accession numbers X15901 and DQ167400, respectively) was performed using the Burrows-Wheeler alignment tool (BWA) software algorithm, allowing 2 mismatches from the 1 st to 35 th bases of the read [32].BWA outputs were analysed using SAMtools software [33].For SNP and InDel discovery, the algorithm implemented in SAMtools considered only reads that aligned to a unique location of the Nipponbare genome.Three filters were then applied to decrease the rate of false-positive SNPs and InDels: a target depth $5; target minimum base quality of 30; and call rate of a polymorphism .90%for SNPs and .30%for InDels.The physical positions of each sequence, including SNP and InDel information, were integrated into the annotated gene set of IRGSP1.0.SNPs and InDels in gene regions and other genomic regions were annotated as genic and intergenic, respectively.The genic SNPs and InDels were classified as CDS, UTR59, and UTR39 (UTRs), or other (introns), according to those positions.SNPs in the CDSs were separated into synonymous and nonsynonymous (amino acid substitutions).These data of IRGSP1.0

Figure 1 .
Figure 1.Relationships between sequencing depth and sequence coverage in the coding sequences.Omachi reads that were originally sequenced using a sequencing depth of 586 the genome were randomly eliminated to produce adjusted sequencing depths of 506, 406, 306, 206, and 106the genome.The x-and y-axes show the sequencing depth and the number of genes covered with short-reads in over 90% of the coding sequences, respectively.doi:10.1371/journal.pone.0086312.g001

Figure 2 .
Figure 2. Configuration comparison of unique hit contigs among chromosomes.The numbers in the bars show the number of classified contigs.doi:10.1371/journal.pone.0086312.g002

Table 1 .
Coverage and sequencing depth of mapped reads with reference to the Nipponbare chromosomal genome IRGSP1.0.

Table 2 .
Numbers of genes with defined sequence coverage in each cultivar.

Table 3 .
Densities of SNPs and InDels on individual chromosomes detected between the various cultivars and the Nipponbare genome IRGSP1.0.

Table 4 .
Annotation of SNPs, insertions, and deletions.SNPs, insertions, and deletions on the IRGSP rice pseudomolecules were classified as genic and intergenic, and locations within gene models were annotated.The number of SNPs, insertions, and deletions in each class is shown.doi:10.1371/journal.pone.0086312.t004

Table 5 .
Annotation of contig sequences assembled from unmapped reads.

Table 6 .
Number of newly mapped genes by contigs in the Nipponbare genome.

Table 7 .
Numbers of contigs showing sequence similarity to Oryza sativa japonica and Oryza sativa indica. of the contig sequences that were unaligned to regions of Nipponbare sequence was sequentially conducted against Oryza sativa japonica and Oryza sativa indica by using NCBI BLASTn search.The numbers in the categories of hit to japonica or indica, and no hit show the total number of contigs classified.doi:10.1371/journal.pone.0086312.t007