Development of Novel Microsatellite Markers for the BBCC Oryza Genome (Poaceae) Using High-Throughput Sequencing Technology

Wild species of Oryza are extremely valuable sources of genetic material that can be used to broaden the genetic background of cultivated rice and to increase its resistance to abiotic and biotic stresses. Until recently, there was no sequence information for the BBCC Oryza genome; therefore, no specific markers had been developed for this genome type. The lack of suitable markers made it difficult to search for valuable genes in the BBCC genome. The aim of this study was to develop microsatellite markers for the BBCC genome. We obtained 13,991 SSR-containing sequences and designed 14,508 primer pairs. The most abundant repeat type was hexanucleotide (31.39%), followed by trinucleotide (27.67%) and dinucleotide (19.04%). A total of 600 markers were selected for validation in 23 accessions of Oryza species with the BBCC genome. A set of 495 markers produced clear amplified fragments of the expected sizes. The average number of alleles per locus (Na) was 2.5, ranging from 1 to 9. The genetic diversity per locus (He) ranged from 0 to 0.844 with a mean of 0.333. The mean polymorphism information content (PIC) was 0.290, and ranged from 0 to 0.825. Of the 495 markers, 12 were only found in the BB genome, 173 were unique to the CC genome, and 198 were also present in the AA genome. These microsatellite markers could be used to evaluate the phylogenetic relationships among different Oryza genomes and to construct a genetic linkage map for locating and identifying valuable genes in the BBCC genome, and would also be useful for marker-assisted breeding programs that include accessions with the AA genome, especially Oryza sativa.


Introduction
The Oryza genus comprises more than 22 species with 10 recognized genomic types, six of which are diploid genome sets (2n = 24, AA, BB, CC, EE, FF, and GG) and four of which are tetraploid (2n = 4x = 48, BBCC, CCDD, HHJJ, and HHKK) [1]. According to their genome constitution, species in this genus can be classified into four main complexes [2]: the Oryza ridleyi complex (including the HHJJ genome); the Oryza granulata complex (including the GG genome); the Oryza officinalis complex (including the BB, CC, BBCC, CCDD, and EE genomes); and the Oryza sativa complex (including the AA genome). There are two cultivated Oryza species, Oryza sativa and Oryza glaberrima. Asian cultivated rice (Oryza sativa) is one of the most important food crops in the world, and serves as a primary food source for more than half of the world's population [3]. In the field, cultivated rice plants are continuously damaged by various biotic and abiotic factors. The planting of modern varieties with resistance and/or tolerance genes is one of the best strategies to control pests in rice production. Some populations of wild species of Oryza have been identified as extremely valuable resources that can be used to broaden the genetic background of cultivated rice and to increase its resistance to adverse factors.
The BBCC Oryza genome (2n = 4x = 48) is characteristic of allotetraploid wild species with two homologous genomes, B and C. Three species have this genome type: Oryza malampuzhaensis, which is found in India; Oryza minuta, which is endemic to the Philippines and Papua New Guinea; and Oryza punctata (tetraploid, 2n = 48), which is widely distributed in Africa. The BBCC genome is related to the BB and CC genomes [1]. Only diploid Oryza punctata (2n = 24) has the BB genome [4,5], while Oryza officinalis, Oryza rhizomatis, and Oryza eichingeri have the CC genome. These species are regarded as donors of genes that promote resistance to rice blast, bacterial leaf blight, brown planthopper, and white-backed planthopper [6,7].
However, the transfer of valuable genes from these wild species to Oryza sativa via crossing has proved to be extremely difficult because of low seed set, hybrid sterility, and the lack of chromosome recombination [8]. There is no doubt that appropriate gene identification technologies will promote the use of genetic material from these wild species. The traditional method to identify the genomes of Oryza was to observe chromosome pairing behavior at meiotic metaphase I in interspecific hybrids [9,10]. However, this process was affected by genetic and environmental factors [11,12]. Subsequently, genomic in situ hybridization (GISH) was used to identify genomes [13], followed by multicolor genomic in situ hybridization (McGISH), an improved method that used two different genomic probes simultaneously [14]. Both GISH and McGISH were complex methods with demanding technical requirements. More recently, DNA molecular techniques, especially simple sequence repeat (SSR) markers, have proved to be simple and highly effective methods for genetic analysis. A large number of SSR markers have been developed for Oryza sativa [15,16]. While some of the SSRs developed for Oryza sativa could be amplified from other AA genomes in the Oryza genus, they were not suitable for cross-amplification from Oryza species with different genome types [17], as in previous cross-amplification studies of Miscanthus sinensis (Poaceae) and its relative [18] and of Narcissus papyraceus (Amaryllidaceae) and its relatives [19]. Since there had been no sequence information available for the BBCC genome, no specific markers had been developed for it. This made it difficult to explore the BBCC genome to find valuable genes, and to study the phylogenetic relationships among diverse members of the Oryza genus.
Hence, the goal of this study was to develop the first set of microsatellite markers for the BBCC Oryza genome using next-generation sequencing (NGS) technology. These microsatellite markers could be used to evaluate the phylogenetic relationships among different Oryza genomes and to construct a genetic linkage map for locating and identifying valuable genes in the BBCC genome, and would also be useful for marker-assisted breeding programs that include accessions with the AA genome, especially Oryza sativa.

Plant materials and DNA extraction
We chose seven Oryza species including 48 accessions (

Microsatellite loci search and SSR primer development
Genome libraries were constructed from the accession W303 (Oryza minuta) using the shotgun method, and then sequenced using the Illumina HiSeq 2000 sequencer (Illumina Inc., San Diego, CA, USA). The genome of W303 (European Bioinformatics Institute; Accession number: PRJEB5091) was assembled using Phusion2 [20] and Phrap [21]. The N50 length of the entire assembly was calculated for the initial contigs, with small contigs (< 1000 bp) excluded.
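To illustrate the N50 statistic mentioned above, the following is a minimal Python sketch. The function name `n50` and the toy contig lengths are hypothetical and are not taken from the W303 assembly; only the 1000 bp cutoff mirrors the exclusion of small contigs described in the text.

```python
def n50(lengths, min_length=1000):
    """Return the N50 of contig lengths, ignoring contigs shorter than min_length.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total (filtered) assembly length.
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above the length cutoff")
    half = sum(kept) / 2
    running = 0
    for n in kept:
        running += n
        if running >= half:
            return n


# Hypothetical contig lengths in bp (illustration only).
toy_contigs = [500, 1200, 3000, 8000, 2500, 900, 4100]
print(n50(toy_contigs))
```

In this toy case the two contigs below 1000 bp are dropped before the cumulative sum is taken, so they affect neither the total length nor the reported N50.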
The SSRs were identified using the software MISA (MIcroSAtellite identification tool, http://pgrc.ipk-gatersleben.de/misa/). The primers for each unique SSR were designed using Primer 3.0 (http://sourceforge.net/projects/primer3/). The primer design parameters were as follows: length from 18 bp to 23 bp with 21 bp as the optimum; annealing temperature between 55°C and 63°C with 60°C as the optimum; GC content from 40% to 60% with 50% as the optimum; and PCR product size between 80 bp and 250 bp.
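The search and filtering steps above can be sketched in a few lines of Python. This is not MISA or Primer3: it is a simplified stand-in that finds perfect repeats with a regular expression and checks only the primer length and GC-content windows listed above. The minimum repeat counts in `MIN_REPEATS`, the function names, and the example sequences are all assumptions for illustration; the thresholds used in the study are not stated in the text, and annealing temperature is not modelled here.

```python
import re

# Hypothetical minimum repeat counts per motif length (di- to hexanucleotide).
# These are NOT taken from the paper.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return all(
        motif != motif[:k] * (len(motif) // k)
        for k in range(1, len(motif))
        if len(motif) % k == 0
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)


def primer_ok(primer):
    """Check the length (18-23 bp) and GC content (40-60%) windows only."""
    primer = primer.upper()
    if not primer:
        return False
    gc = 100.0 * (primer.count("G") + primer.count("C")) / len(primer)
    return 18 <= len(primer) <= 23 and 40.0 <= gc <= 60.0


# Hypothetical test sequence: an (AG)8 and a (CCG)5 repeat with short flanks.
example = "TTGC" + "AG" * 8 + "CCATA" + "CCG" * 5 + "TTA"
print(find_ssrs(example))
print(primer_ok("ACGTACGTACGTACGTACGT"))
```

The `is_primitive` check prevents a dinucleotide run such as (AG)12 from also being reported as a tetranucleotide (AGAG)6 repeat.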

SSR genotyping
The PCR amplifications were carried out with a 2720 thermal cycler (Applied Biosystems, Foster City, CA, USA) in 10 μL reaction mixtures. Each reaction contained 1.0 μL 10× buffer, 1.0 μL 2 mmol/L dNTPs, 1.0 μL 25 mmol/L MgCl2, 0.6 μL each of forward and reverse primer (10 μmol/L), 0.1 μL 5 U/μL Taq polymerase, and 20 ng template DNA. The PCR cycling

Statistical analysis
The average number of alleles per locus (Na), the genetic diversity per locus (He), and the polymorphism information content (PIC) were calculated with the PowerMarker software [22]. All 48 accessions were clustered using the Neighbor-Joining (NJ) method according to Nei's unbiased genetic distance [24] with 100 bootstrap replications, using Oryza sativa as the outgroup, and the resulting tree was viewed in the TreeView program [23].
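As a rough illustration of the three per-locus statistics, the sketch below computes Na, He, and PIC from a list of observed alleles at one locus. It uses the uncorrected forms He = 1 − Σp_i² and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², which are commonly used definitions. It is an assumption that these match the values reported here, since PowerMarker may apply sample-size corrections. The function name `locus_stats` and the allele labels are hypothetical.

```python
from collections import Counter


def locus_stats(alleles):
    """Return (Na, He, PIC) for one locus from a list of observed alleles."""
    counts = Counter(alleles)
    n = sum(counts.values())
    freqs = [c / n for c in counts.values()]

    # Gene diversity (expected heterozygosity), without sample-size correction.
    he = 1.0 - sum(p * p for p in freqs)

    # Cross term of the PIC formula: sum over allele pairs i < j.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return len(counts), he, he - cross


# Hypothetical allele calls (fragment sizes in bp) at a single locus.
na, he, pic = locus_stats([150, 150, 154, 154])
print(na, he, pic)
```

A monomorphic locus gives Na = 1 and He = PIC = 0, which corresponds to the lower bounds of the ranges reported for the 495 markers.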

Data from sequencing and microsatellite loci detected
As shown in Table 1, the total length of the assembled sequences > 1000 bp was 480,470,380 bp (n = 225,883) (http://www.ricedata.cn/down/W303_fasta.rar). The average sequence length was 2,128 bp, with a maximum length of 41,615 bp and no sequences shorter than 1,000 bp.
In total, 16,197 SSR loci were identified with discrete repeats accounting for 97% and compound repeats (C* type and C type) accounting for only 3%. We obtained 13,991 SSR-containing sequences, and 1,814 sequences contained more than one SSR. There were 503 SSRs present in compound formation (Table 2). Finally, 14,508 primer pairs were designed.

Characterization of microsatellite markers for the BBCC genome
We designed 14,508 primer pairs, and selected a set of 600 SSR markers based on proportional distribution (Figure 1). We tested the ability of the 600 primer sets to amplify SSRs from 23 accessions with the BBCC genome. Of the 600 primer pairs, 50 did not produce amplicons, probably because of mutations at the SSR locus, and 55 did not amplify fragments of the expected size, probably because of In/Del mutations at the SSR locus. Of the remaining 495 microsatellite markers (Table S2, http://www.ricedata.cn/down/SSR_data.xlsx), 156 were monomorphic, and 339 were polymorphic. There were 223 single-copy and 272 multicopy markers. The mean Na value was 2.5 with a range from 1 to 9. The He value varied from 0 to 0.844 with a mean of 0.333. The mean PIC was 0.290, and ranged from 0 to 0.825. Among these markers, 46 were unique to Oryza minuta, five were unique to Oryza punctata, and none were specific to Oryza malampuzhaensis. The genetic diversity of Oryza minuta was lower than that of Oryza punctata (Table 3

Cross-amplification from other related genomes
Next, we evaluated the suitability of these 495 markers for use in other closely related species. Of the 495 markers, only 12 (2.4%) were specific to the BB genome, 173 (34.9%) were specific to the CC genome, and 299 (60.4%) were common to the BB, CC, and BBCC genomes. Eleven markers (2.2%) were neither in the BB nor the CC genome. Most interestingly, 198 markers (40.0%) were also present in the AA genome.
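The percentages above follow directly from the marker counts. The short sketch below recomputes them and shows one way the four categories could be assigned from presence/absence calls; the `classify` helper and its boolean inputs are hypothetical and are not the authors' scoring procedure.

```python
def classify(in_bb, in_cc):
    """Assign a BBCC-derived marker to a category from amplification calls
    (True = amplified) in BB and CC genome accessions."""
    if in_bb and in_cc:
        return "BB and CC"
    if in_bb:
        return "BB only"
    if in_cc:
        return "CC only"
    return "neither"


# Marker counts as reported in the text.
counts = {"BB only": 12, "CC only": 173, "BB and CC": 299, "neither": 11}
total = sum(counts.values())

# Share of the 495 validated markers in each category, to one decimal place.
percent = {k: round(100.0 * v / total, 1) for k, v in counts.items()}
aa_percent = round(100.0 * 198 / total, 1)  # markers also present in AA

print(total, percent, aa_percent)
```

The four categories are mutually exclusive and sum to 495, whereas the 198 markers shared with the AA genome overlap with them and are therefore counted separately.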
The phylogenetic tree (Figure 2) grouped the 48 accessions into two significant, distinct clusters. Cluster I consisted of the BB, CC, and BBCC genome species, and cluster II consisted of the AA genome species. Cluster I was further divided into two groups, one consisting of species with the BBCC and BB genomes, and the other consisting of species with the CC genome. Within the BBCC genome, Oryza minuta and Oryza punctata formed different subgroups. Oryza malampuzhaensis was more closely related to Oryza minuta than to Oryza punctata. Among the species with the CC genome, Oryza eichingeri was more closely related to Oryza officinalis than to Oryza rhizomatis. In cluster II, Oryza sativa indica and Oryza sativa japonica were clearly divided into two groups. The groups in the NJ tree were consistent with the intrinsic relationships among Oryza species [17], and further confirmed the usefulness of the newly developed microsatellite markers in genetic analyses.

Table 3. Details of 46 and 5 microsatellites specific to Oryza minuta and Oryza punctata, respectively (column headings include Locus and Repeat motif).

Discussion
We developed the first set of microsatellite markers for the BBCC Oryza genome. The SSRs were located in both coding and non-coding regions, and therefore, they would be useful for genetic and evolutionary analyses, high-throughput mapping, and marker-assisted plant improvement strategies. In this study, 82.5% of the selected markers (495 of 600) produced clear amplified fragments of the expected sizes. This was within the range of amplification success rates (60-90%) reported elsewhere [25]. Among these markers, 12 were specific to the BB genome and 173 were unique to the CC genome. Thus, these unique microsatellite markers could be developed as probes to identify different species and various genomes. We evaluated the transferability rates of the markers in different Oryza species. The transferability rate between Oryza minuta and Oryza punctata was 89.7%. This was higher than that for Oryza species with the BB, CC, and BBCC genomes (60.4%), and that between AA and BBCC genomes (40.0%). These high transferability rates suggested that different species or genomes within the Oryza genus were closely related.
Our results showed that the hexanucleotide repeat motif (31.4%) was the most abundant repeat type, followed by trinucleotide (28.0%) and dinucleotide (19.3%). These findings differed from those of previous studies in which dinucleotide or trinucleotide repeats were reported to be the most abundant motifs in genomes of cultivated rice [16,26], and pentanucleotide repeats (30.5%) were the most abundant type in Gossypium raimondii [17]. The nature of the microsatellites obtained was related not only to the thresholds used to define the microsatellites, but also to genome organization, since heterogeneity could lead to differences in microsatellite size [27]. The most common hexanucleotide motif was AAAAAG/CTTTTT (4.0%), which made up a much lower proportion than that of the most common motif in faba bean, ACACGC/CGTGTG (49.5%) [28]. The main trinucleotide repeats were AGG/CTT and CCG/CGG, representing 16.3% of all of the trinucleotide repeats analyzed. The most common trinucleotide repeats were AGG/CTT in Amorphophallus [25], and CGG/GCC in cultivated rice [16,26]. These results provided further evidence that the CCG/CGG motif was very common in monocots [29]. This reflected the strong conservation of synteny among genomes of diverse monocots, and could result from a high GC content and codon bias [30,31].
In previous studies, mitochondrial restriction fragment length polymorphisms (RFLPs) [32] and inter simple sequence repeat (ISSR) [33] markers had been used to study genetic relationships among members of the Oryza genus. However, these analyses could only distinguish the AA genome from other types, and could not separate other related genomes, such as the BB, CC, and BBCC genomes. In contrast, the SSR markers developed from the BBCC genome were able to differentiate the AA, BB, CC, and BBCC genomes, distinguished the BB and CC genomes from the BBCC genome, and even identified various species within the AA, CC, and BBCC genomes. The relationships predicted from analyses using these markers were consistent with the established evolutionary relationships among members of the Oryza genus [17]. A newer marker type, the single nucleotide polymorphism (SNP), is now available and has gained increasing popularity. However, in terms of the genetic information provided, SNPs are simple bi-allelic co-dominant markers and can be considered a step backwards when compared with the highly informative multiallelic microsatellites [34].
The NJ tree further revealed that the BB genome species were more closely related to species with the BBCC genome than to those with the CC genome, consistent with the view that the BB genome was the maternal parent of the BBCC genome [35,36] and that CC species evolved later [37]. Oryza malampuzhaensis and Oryza officinalis shared similar morphologies; in fact, Oryza malampuzhaensis was once considered to be a subspecies of Oryza officinalis [38]. However, there were clear differences in the panicle and spikelet between these two species [14]. Our results showed that Oryza malampuzhaensis was more closely related to Oryza minuta than to Oryza officinalis, consistent with the fact that Oryza malampuzhaensis was an allotetraploid with the BBCC genome [39] while Oryza officinalis was a diploid with the CC genome.

Conclusions
We present the first set of microsatellite markers from the nuclear BBCC Oryza genome. Our results showed that the high-throughput sequencing approach was useful for obtaining many high-quality SSR markers. These markers can be used to study the origins and evolutionary relationships among members of the Oryza genus, and could also be used to construct physical maps and for map-based gene cloning from the BBCC genome to identify valuable genes. Furthermore, they could be used for marker-assisted trait selection in cultivated rice breeding programs. Using the existing sequence information, further analysis will focus on the development of SNPs, a newer type of marker.