Development of Genic and Genomic SSR Markers of Robusta Coffee (Coffea canephora Pierre Ex A. Froehner)

Coffee breeding and improvement efforts can be greatly facilitated by the availability of a large repository of simple sequence repeat (SSR)-based microsatellite markers, which provide efficiency and high resolution in genetic analyses. This study aimed to improve SSR availability in coffee by developing new genic and genomic SSR markers using in-silico bioinformatics and a streptavidin-biotin based enrichment approach, respectively. The expressed sequence tag (EST) based genic microsatellite markers (EST-SSRs) were developed using the publicly available dataset of 13,175 unigene ESTs, which showed a distribution of 1 SSR/3.4 kb of the coffee transcriptome. Genomic SSRs, on the other hand, were developed from an SSR-enriched small-insert partial genomic library of robusta coffee. In total, 69 new SSRs (44 EST-SSRs and 25 genomic SSRs) were developed and validated as suitable genetic markers. Diversity analysis of selected coffee genotypes revealed these to be highly informative in terms of allelic diversity and PIC values, and eighteen of these markers (∼27%) could be mapped on a robusta linkage map. Notably, the markers described here also revealed very high cross-species transferability. In addition to the validated markers, we have also designed primer pairs for 270 putative EST-SSRs, which are expected to provide another ca. 200 useful genetic markers considering the high success rate (88%) of marker conversion of similar pairs tested/validated in this study.


Introduction
Coffee trees belong to the genus Coffea, which comprises two main cultivated species, C. arabica L. (2n = 4x = 44) and C. canephora Pierre ex A. Froehner (diploid, 2n = 2x = 22), yielding arabica and robusta coffees, respectively. Arabica coffee is known for excellent cup quality but suffers from a narrow genetic base, owing to its domestication history, and from susceptibility to diseases and pests. In contrast, robusta coffee, though poorer in quality, has better adaptability to various stresses. To keep pace with a changing environment and with market preferences, there is a continuous need for genetic improvement of coffee, which unfortunately is severely constrained by the inherently slow pace of tree breeding using conventional methods, and by a variety of other factors [1,2]. The situation demands new, easy and practical technologies that can provide acceleration, reliability and directionality to breeding efforts, as well as characterization of the cultivated/secondary gene pool for proper utilization of the available germplasm in coffee genetic improvement programs. In this context, genetic markers based on DNA polymorphism become important, as they have proven to be of immense value in the characterization and genetic improvement of plant germplasm resources.
Among different types of DNA markers, microsatellite or SSR markers are the most suitable for studying genetic diversity, population structure and phylogenetic relationships, constructing framework linkage maps, QTL interval mapping, marker-assisted selection (MAS), etc., thereby aiding the genetic improvement of crop plants [1]. In the last few years a number of efforts have led to the development of a few hundred SSR markers in coffee [2][3][4][5][6][7][8][9][10][11][12], but these are insufficient to realize the full potential of markers for mapping/linkage studies in coffee, more so in arabicas, which have an extremely narrow genetic base. Moreover, most of the described markers are poorly validated, especially for their utility in the cultivated genepool comprising arabica and robusta coffees. The situation thus calls for new efforts to generate additional validated markers that can be of practical use in marker-based genetic studies and coffee breeding.
With advances in genomic studies, there has been a huge increase in the number of EST sequences in the public domain. These provide an easy and cost-effective opportunity to identify and develop EST-based SSR markers, which have the additional advantages of assessing functionally relevant genetic diversity [13,14] and of very high cross-species transferability [8,11]. In this study, we used the coffee EST database containing 13,175 unigenes [15] to identify SSRs in the expressed part of the coffee genome, and used these to develop novel coffee-specific EST-SSRs for use as efficient genetic markers. We describe here 44 new validated genic SSRs, and another set of 270 putative markers of the same type that need further validation. In addition, we describe 25 new genomic SSR markers that were developed from an SSR-enriched, small-insert partial genomic library constructed using an affinity-capture approach.

Plant material and DNA extraction
The plant material used for the validation of SSR markers comprised a set of 16 elite coffee genotypes belonging to C. arabica (tetraploid arabicas) and C. canephora (diploid robustas), and 14 related wild species belonging to Coffea and Psilanthus [2], available in the Coffee Germplasm Bank maintained at the Central Coffee Research Institute, Balehonnur, Karnataka, India. Fresh leaf samples collected from each genotype were used for DNA isolation as described by Aggarwal et al. [16]. DNA isolated from the robusta variety CxR was used to construct the SSR-enriched small-insert genomic library.
Microsatellite screening of coffee transcriptome, identification of SSRs and marker development
An EST database of robusta coffee comprising 13,175 unigene ESTs [15] was downloaded from the ftp site (ftp://ftp.sgn.cornell.edu/coffee/) maintained by the Sol Genomics Network (SGN, http://www.sgn.cornell.edu/coffee.pl). The database was used for: (i) identification and localization of SSRs using the microsatellite search module MISA (MIcroSAtellite, http://www.pgrc.ipk-gatersleben.de/misa), with the criteria being a minimum repeat core of 12 bp, consideration of base complementarity, and a minimum distance of 50 bp between two SSRs; (ii) selection of 'usable/candidate SSRs' for marker development, i.e. those carrying a repeat core at least 18 bp long, that is, a minimum of nine repeat units for dinucleotide repeats (DNRs), six for trinucleotide repeats (TNRs), five for tetranucleotide repeats (TtNRs), four for pentanucleotide repeats (PNRs), or three for hexanucleotide repeats (HNRs); (iii) design of primer pairs for the selected usable SSR sequences using the Primer3 tool embedded in MISA and/or GENETOOL Lite version 1.0 (http://www.biotools.com/downloads/brochures/GeneTool2.pdf); and (iv) standardization of PCR conditions followed by validation of working primer pairs for genetic studies as described earlier [2].
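The detection and "usable SSR" criteria in steps (i) and (ii) can be approximated with regular expressions. The sketch below is not MISA itself: it assumes perfect repeats of 2-6 bp motifs, ignores mononucleotide repeats, derives the 12-bp detection thresholds by rounding up to whole repeat units, and does not model the 50-bp compound-SSR rule or the grouping of complementary motifs. All names (`find_ssrs`, `SSR`, the threshold tables) are hypothetical.

```python
import re
from dataclasses import dataclass

# Minimum repeat units to *detect* an SSR: 12-bp core rounded up to whole units
# (assumption; mononucleotide repeats are ignored in this sketch).
DETECT_MIN_UNITS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 2}
# Minimum repeat units for a 'usable/candidate' SSR, as listed in the Methods.
USABLE_MIN_UNITS = {2: 9, 3: 6, 4: 5, 5: 4, 6: 3}


@dataclass
class SSR:
    start: int  # 0-based start of the repeat core
    motif: str
    units: int

    @property
    def length(self) -> int:
        return len(self.motif) * self.units

    @property
    def usable(self) -> bool:
        return self.units >= USABLE_MIN_UNITS[len(self.motif)]


def is_compound_motif(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AAA')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> list[SSR]:
    """Find perfect SSRs of 2-6 bp motifs meeting the detection thresholds."""
    seq = seq.upper()
    hits = []
    for k, min_units in DETECT_MIN_UNITS.items():
        # A k-mer followed by at least (min_units - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_motif(motif):  # e.g. 'AGAG' is reported as 'AG'
                continue
            hits.append(SSR(m.start(), motif, len(m.group(0)) // k))
    # Resolve overlapping detections by keeping the longest repeat
    hits.sort(key=lambda h: (-h.length, h.start))
    kept: list[SSR] = []
    for h in hits:
        if all(h.start + h.length <= o.start or o.start + o.length <= h.start for o in kept):
            kept.append(h)
    return sorted(kept, key=lambda h: h.start)


seq = "GCTTAC" + "AG" * 10 + "TTCGA" + "AAG" * 5 + "C"
for ssr in find_ssrs(seq):
    print(ssr.start, ssr.motif, ssr.units, "usable" if ssr.usable else "too short")
```

In this toy sequence the (AG)10 core (20 bp) passes the 18-bp "usable" threshold, whereas (AAG)5 (15 bp) is detected but would not be taken forward for primer design.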

Construction of an SSR-enriched small-insert genomic library/ development of genomic SSRs
A partial genomic DNA library enriched for microsatellite repeats was constructed using the methods described earlier [17]. Briefly, the method involved: one-step restriction digestion of genomic DNA with HaeIII (NEB) and ligation of the resulting fragments to a double-stranded MluI adaptor (Mlu-F: CTC TTG CTT ACG CGT GGA CTA and Mlu-R: pTAG TCC ACG CGT AAG CAA GAG CAC) [18]; amplification of the restricted-ligated DNA pool using the Mlu-F primer; and SSR enrichment of the amplified DNA pool by liquid-phase hybridisation (in 6X SSC) with streptavidin-coated paramagnetic beads (Dynal) bound to a biotinylated equimolar pool of four oligos, (CA)15, (GA)15, (GAA)15 and (CAA)15. This was followed by PCR amplification of the hybridized/trapped genomic DNA fragments and construction of a partial genomic library in a TA vector (Invitrogen) as per the manufacturer's instructions. A number of positive (white) recombinant clones were picked at random from the library, amplified and sequenced on both strands using M13 universal primers on an ABI 3730 DNA Analyzer (Applied Biosystems, USA). The sequences were aligned and edited using Autoassembler (Applied Biosystems, USA). The SSR-positive sequences were identified and used for the development of new genomic SSRs as described earlier [2].
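The digestion and adaptor design can be checked in silico. The sketch below uses the two adaptor oligos given above and the HaeIII site GG^CC (a blunt cutter); it is an illustration only, and the helper names are hypothetical. It shows that Mlu-F pairs with the first 21 nt of Mlu-R, leaving a 3-nt 3' overhang (CAC) on Mlu-R at one end and a blunt end carrying the phosphorylated 5' end of Mlu-R at the other, which is consistent with ligation to blunt HaeIII fragments.

```python
HAEIII_SITE = "GGCC"  # HaeIII recognition site; it cuts GG^CC, leaving blunt ends
MLUI_SITE = "ACGCGT"  # MluI recognition site carried by both adaptor oligos

MLU_F = "CTCTTGCTTACGCGTGGACTA"
MLU_R = "TAGTCCACGCGTAAGCAAGAGCAC"  # 5'-phosphorylated ('p') in the Methods

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def haeiii_digest(seq: str) -> list[str]:
    """Split a sequence at every GG^CC site (blunt cut between GG and CC)."""
    fragments, prev = [], 0
    pos = seq.find(HAEIII_SITE)
    while pos != -1:
        fragments.append(seq[prev:pos + 2])
        prev = pos + 2
        pos = seq.find(HAEIII_SITE, pos + 1)  # pos + 1 also catches adjacent sites
    fragments.append(seq[prev:])
    return fragments


# Mlu-F anneals to the first 21 nt of Mlu-R; the rest of Mlu-R is a 3' overhang
print(MLU_R[:len(MLU_F)] == revcomp(MLU_F))  # True
print(MLU_R[len(MLU_F):])                     # CAC
print(haeiii_digest("AAGGCCTTGGCCAA"))        # ['AAGG', 'CCTTGG', 'CCAA']
```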
The amplified PCR products generated using all the new SSRs were resolved using capillary-based ABI 3730 DNA Analyzer and were precisely sized for major, comparable and conspicuous peaks using GeneMapper 3.7 (Applied Biosystems), using default parameters.

Statistical, genetic and diversity analysis
The data for EST-SSRs and genomic SSRs were analyzed separately for various genetic parameters, viz., mean, standard deviation, expected heterozygosity (H_e), Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) using Arlequin ver. 3.1 [19], and for polymorphism information content (PIC) and private alleles (PAs) using Convert ver. 1.3.1 [20]. Cross-taxa transferability (T_mark) was calculated over 15 species (excluding C. canephora) as the proportion of primers showing successful amplification vis-à-vis all the tested primers, whereas primer conservance (C_taxa) was calculated as the proportion of species displaying successful amplification vis-à-vis all the tested markers.
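The per-locus statistics and transferability measures above can be written out explicitly. The sketch below assumes the widely used definitions H_e = 1 - Σp_i² (without the small-sample correction that tools such as Arlequin typically apply) and the Botstein et al. (1980) PIC formula; it is not necessarily the exact computation in Convert or Arlequin. For T_mark and C_taxa it assumes one reading of the definitions: T_mark per taxon as the fraction of markers that amplified in it, and C_taxa per marker as the fraction of taxa in which it amplified. Function names are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """Gene diversity H_e = 1 - sum(p_i^2), no small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Botstein et al. (1980): PIC = 1 - sum p_i^2 - sum_{i<j} 2 p_i^2 p_j^2."""
    return expected_heterozygosity(freqs) - sum(
        2 * (p * p) * (q * q) for p, q in combinations(freqs, 2)
    )


def transferability(amplified):
    """amplified: dict marker -> dict taxon -> bool (True = successful amplification).

    Returns (T_mark per taxon, C_taxa per marker) as proportions.
    """
    markers = list(amplified)
    taxa = sorted({t for calls in amplified.values() for t in calls})
    t_mark = {t: sum(amplified[m].get(t, False) for m in markers) / len(markers) for t in taxa}
    c_taxa = {m: sum(amplified[m].get(t, False) for t in taxa) / len(taxa) for m in markers}
    return t_mark, c_taxa


print(expected_heterozygosity([0.5, 0.5]), pic([0.5, 0.5]))  # 0.5 0.375
calls = {"M1": {"T1": True, "T2": True}, "M2": {"T1": True, "T2": False}}
print(transferability(calls))
```

For a locus with two equally frequent alleles, PIC (0.375) is lower than H_e (0.5) because PIC discounts matings between identical heterozygotes, from which the parental allele of an offspring cannot be inferred.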
Genetic diversity analysis to infer generic relatedness/affinities was performed over informative polymorphic markers (Pms) for the cultivated genotypes and related species using MicroSatellite Analyzer [21] with Nei's genetic distance [22]. The genetic distance matrices were used to construct Neighbour-Joining (NJ) consensus trees using Phylip ver. 3.6 [23], which were viewed using Treeview ver. 1.6.6 [24].
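For reference, Nei's (1972) standard genetic distance is D = -ln(J_XY / √(J_X J_Y)), where J_X, J_Y and J_XY are the averages over loci of Σx_i², Σy_i² and Σx_i y_i. The minimal sketch below assumes allele frequencies are supplied per locus (for a single genotype, 1.0 for a homozygote or 0.5/0.5 for a heterozygote); it does not reproduce MicroSatellite Analyzer's input handling or bootstrapping, and the function name is hypothetical. The resulting pairwise matrix is what an NJ program would take as input.

```python
import math


def nei_distance(pop_x, pop_y):
    """Nei's (1972) standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    pop_x, pop_y: dict locus -> dict allele -> frequency (summing to 1 per locus).
    Only loci scored in both units are used.
    """
    loci = pop_x.keys() & pop_y.keys()
    if not loci:
        raise ValueError("no shared loci")
    jx = jy = jxy = 0.0
    for locus in loci:
        fx, fy = pop_x[locus], pop_y[locus]
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        jxy += sum(fx[a] * fy.get(a, 0.0) for a in fx)
    # Averaging over loci cancels in the ratio, so the sums can be used directly
    identity = jxy / math.sqrt(jx * jy)
    return math.inf if identity == 0 else -math.log(identity)


homozygote = {"L1": {"A": 1.0}}
heterozygote = {"L1": {"A": 0.5, "B": 0.5}}
print(round(nei_distance(homozygote, heterozygote), 4))  # 0.3466, i.e. 0.5 * ln 2
```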
We also attempted to map the new markers on a robusta linkage map using JoinMap ver. 4.0 [25], as described earlier [2,26], with a grouping LOD score of 5.0.
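To give a feel for what a LOD threshold of 5.0 means, the sketch below computes the classic recombination-fraction LOD for a test-cross-type segregation with n informative progeny and k recombinants: LOD = log10[r^k (1 - r)^(n - k) / 0.5^n], with r = k/n. Note that JoinMap's grouping step by default uses an independence LOD (a test of independent segregation) rather than this statistic, so the numbers are illustrative only; the function name is hypothetical.

```python
import math


def recombination_lod(n: int, k: int) -> float:
    """LOD for linkage vs free recombination (r = 0.5), test-cross-type data.

    n: informative individuals; k: recombinants; r_hat = k / n.
    """
    r = k / n
    lod = n * math.log10(2)  # -log10(0.5^n)
    if k > 0:
        lod += k * math.log10(r)
    if k < n:
        lod += (n - k) * math.log10(1 - r)
    return lod


for k in (0, 10, 30, 50):
    print(k, round(recombination_lod(100, k), 2))
```

With 100 progeny, 10 recombinants (r = 0.1) give a LOD of about 16, well above 5.0, whereas 50 recombinants (r = 0.5) give a LOD of 0, i.e. no evidence of linkage.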

Results
In this study, we undertook in-silico analysis of a robusta coffee transcriptome to identify and develop coffee-specific EST-SSR markers. Simultaneously, we also developed genomic SSRs by constructing a small-insert, SSR-enriched partial genomic DNA library. The new markers (44 EST-SSRs and 25 genomic SSRs) were validated for their utility in genetic studies using a panel of elite coffee genotypes, and for cross-species transferability using related coffee taxa. The details of these new markers, viz. locus designation, primer sequences, repeat motifs, amplification temperature, amplicon size, and SGN ID or GenBank accession numbers, are given in Tables 1 and 2. Details of the additional primer pairs for 270 putative EST-SSRs, which were designed but still need to be validated, are provided in Table 3.

Types, Frequency and Distribution of SSRs in the coffee transcriptome
The coffee EST unigene database analyzed here comprised 13,175 unigenes with a total length of 8,923 kb and an average length of ca. 677 bases/unigene [15]. These ESTs were found to contain a total of 2,589 SSRs (having a minimum repeat core of 12 bp) [...] (Table S2). Among the individual SSRs, the most abundant EST-SSR motif was AG, followed by AAG.
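The summary figures quoted above follow directly from the database totals; the short calculation below reproduces the mean unigene length and the SSR density (1 SSR/3.4 kb) reported in the Abstract.

```python
n_unigenes = 13_175      # unigene ESTs in the SGN coffee set
total_length_kb = 8_923  # total length of those unigenes
n_ssrs = 2_589           # SSRs detected (repeat core >= 12 bp)

mean_unigene_bp = total_length_kb * 1_000 / n_unigenes  # ~677 bp per unigene
kb_per_ssr = total_length_kb / n_ssrs                    # ~3.4 kb per SSR

print(f"mean unigene length: {mean_unigene_bp:.0f} bp")  # 677 bp
print(f"SSR density: 1 SSR per {kb_per_ssr:.1f} kb")     # 3.4 kb
```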

Development of microsatellite markers from usable EST-SSRs
Only 483 (18.7%) of the 2,589 identified SSRs had a repeat core of ≥18 bp and were used for marker conversion. Primer pairs could be designed for 320 of these SSRs, of which 50 randomly chosen pairs were tested further in validation studies. These included SSRs with DNRs (30%), TNRs (64%), PNRs and HNRs (2% each), and complex SSRs (see Table 1 for marker ID, primer sequences, repeat motifs, amplicon size, sequence ID and functional identity). Of the 50 selected primer pairs, 44 were successfully amplified as single-locus SSR markers, indicating an 88% primer-to-marker conversion ratio. Given this high conversion ratio, another ca. 200 useful genetic markers are expected from the remaining 270 putative EST-SSRs (primer IDs CCESSR51 to CCESSR320) that are listed in Table 3.
[...] (Table S3). From the 56 SSR-positive sequences, a total of 41 primer pairs could be designed successfully (five pairs containing two SSRs each). Of these, 25 pairs (encompassing 28 SSRs) resulted in robust PCR amplifications (Table 2), and all of them could further be validated as single-locus markers, indicating a ca. 61% primer-to-marker conversion ratio.
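The proportions and the projected yield above can be reproduced from the counts in the text. A straight extrapolation of the 88% EST-SSR conversion rate to the 270 untested pairs gives about 238 markers, so the "ca. 200" quoted here is a conservative round figure.

```python
identified, usable = 2_589, 483           # EST-SSRs found / with a >=18-bp core
tested, converted, remaining = 50, 44, 270
genomic_pairs, genomic_markers = 41, 25

usable_share = usable / identified        # ~18.7%
conversion = converted / tested           # 88% primer-to-marker conversion
expected_extra = remaining * conversion   # ~238 if the 88% rate held exactly
genomic_conversion = genomic_markers / genomic_pairs  # ~61%

print(f"usable: {usable_share:.1%}, EST conversion: {conversion:.0%}")
print(f"expected extra markers: {expected_extra:.0f}")
print(f"genomic conversion: {genomic_conversion:.0%}")
```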

Validation of EST-SSRs for use in genetic studies
All 44 new EST-SSRs yielded good amplicons and exhibited low to medium allelic diversity when tested on a panel of 16 elite robusta and arabica genotypes (Figure 1). Overall, a maximum of six and seven alleles (N_A), with averages of 2.1 and 3 alleles/SSR, were obtained for the tested markers, of which 65.9% and 81.8% were polymorphic/informative for the tetraploid and robusta genotypes, respectively (Table 4). Fifteen markers in the tetraploids and eight in the robustas were monomorphic. Moreover, 14 markers yielded double alleles (i.e. the consistent presence of two allelic amplicons across the tested samples), indicative of duplicated loci, in all the tested tetraploid arabicas. In general, no private alleles were evident, except in one robusta genotype (Sln274) for marker CCESSR14.
The PIC values were comparable between the tested tetraploids and diploid robustas (0.19-0.67 and 0.11-0.77, respectively), and no significant differences were seen in observed or expected heterozygosity (H_o/H_e: t = 0.70, P = 0.49; and t = 0.68, P = 0.40) for the new markers. However, significant differences were observed in the total number of amplified alleles (N_A: t = 3.74, P < 0.005), as well as in the behaviour of the polymorphic markers (Pms) when tested for HWE and LD in the tetraploids and the robusta genotypes (Table 4). In general, more markers were in HWE in robustas, and only a relatively small proportion of markers exhibited LD and heterozygote excess and/or deficiency, in comparison with the tetraploid arabicas.
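Arlequin tests HWE with an exact test; as a simpler stand-in, the sketch below implements the Pearson chi-square goodness-of-fit test to Hardy-Weinberg proportions for a diploid, multi-allelic locus, which illustrates how observed genotype counts are compared with the expected p_i² and 2p_ip_j frequencies. It is not Arlequin's procedure, and with small samples or many rare alleles the chi-square approximation is unreliable, which is why exact tests are preferred for SSR data. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Pearson chi-square goodness-of-fit to Hardy-Weinberg proportions.

    genotypes: list of (allele1, allele2) tuples for diploid individuals.
    Returns (chi-square statistic, degrees of freedom).
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected count: n * p^2 for homozygotes, 2 * n * p * q for heterozygotes
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    k = len(freqs)
    return chi2, k * (k - 1) // 2


# Heterozygote deficit: 40 heterozygotes observed where 50 are expected
sample = [("A", "A")] * 30 + [("A", "B")] * 40 + [("B", "B")] * 30
print(hwe_chi_square(sample))  # (4.0, 1)
```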

Validation of genomic SSRs for use in genetic studies
A total of 25 putative genomic SSRs were also validated as genetic markers (Table 5). When tested on the panel of 16 elite robusta and arabica (tetraploid) genotypes, five of these markers were monomorphic in arabicas and one in robustas. Twelve of the polymorphic markers in arabicas resulted in double alleles (putative duplicated loci). In total, a maximum of seven and eight alleles (N_A), with averages of 2.7 and 4.3 alleles/marker, were obtained for the tested polymorphic markers, of which 32% and 96% were informative in arabicas and robustas, respectively (Figure 1). The PIC values varied considerably, with a mean PIC value of 0.47 (range 0.12-0.78) for tetraploids, which was significantly lower than the 0.60 (0.12-0.85) observed for robustas (Table 5). Further, Student's t-test revealed significant differences in N_A (t = 4.09, P = 0.00) but non-significant [...]. It was also notable that while >83% of the Pms were in HWE and only a few markers showed significant heterozygote deficiency, to varying extents, in both arabicas and robustas, the proportion of marker pairs that exhibited LD was significantly higher in arabicas (28.0%; 8 of 28 pairs) than in robustas (14.2%; 36 of 254 marker pairs).
Table 4. Allelic diversity attributes of the newly developed 44 EST-SSRs when tested over cultivated and wild related coffee genera.
[...] were not considered for calculation of various estimates as these appear to be fixed, exhibiting no segregation.
doi:10.1371/journal.pone.0113661.t004
Table 5. Allelic diversity attributes of the newly developed 25 genomic SSRs when tested over cultivated and wild related coffee genera.

Mapping of new EST-and genomic SSRs
The 69 new SSR markers were also tested for their suitability for linkage mapping. In total, 11 of the 44 EST-SSRs (25%) and seven of the 25 genomic SSRs (28%) could be mapped onto an existing first-generation framework linkage map of robusta coffee [2,26]. This map comprised a total of 374 mapped markers (71 SSRs, 185 RAPDs and 118 AFLPs) on 11 major and 5 minor linkage groups. The new markers developed in the present study were mapped using the existing SSRs on the map as anchor markers. The 18 new markers that could be mapped occupied positions on eight distinct linkage groups: eight markers on CLG03; two markers each on CLG06, CLG11 and CLG15; and one marker each on CLG02, CLG04, CLG05 and CLG08 (Tables 1 & 2). The positions of these 18 markers on the robusta linkage groups, along with the positions of the SSRs used as anchors (CM62, CM115, CM12, CM100, Cof_EST01_150, CaM46, CaM44, CM39_302, CM39_273), are shown in Figure 2.

Cross-species/cross-genera transferability and marker conservation
The new SSR markers, when tested on 13 related Coffea and two Psilanthus species, exhibited robust cross-species amplification with alleles of comparable sizes in the tested taxa (Figure 1, Tables 4 & 5, Table S4). The EST-SSRs showed 100% transferability across the tested Coffea and Psilanthus spp., whereas the genomic SSRs showed 96% amplification and transferability for Coffea spp. and 92% for the related Psilanthus spp. The analysis also revealed some private alleles (PAs), which could possibly be species-specific (Tables 4 & 5).

Generic affinities within/between cultivated and wild coffee germplasm by new SSRs
The SSR allelic data were examined for their utility in ascertaining genetic diversity and generic inter-relationships within the cultivated as well as the wild coffee genepool. The average genetic distance values calculated using the EST-SSR allelic data were, in general, significantly lower than, though comparable to, those obtained using the genomic SSRs for the tested arabicas, robustas and different Coffea and Psilanthus species. The NJ phenetic tree generated from the genetic distance estimates of the EST-SSR allelic data clearly resolved the tested germplasm into two distinct clusters, one representing all the tetraploid arabicas and the other comprising all the diploid robusta genotypes (Figure 3a), with significant branch support. The selections formed a single cluster within the tetraploid cluster, while pure arabicas and hybrid selections appeared in distinct sub-clusters. Similarly, in the clustering analysis of 14 related species (12 Coffea and two Psilanthus spp.; Figure 3b) along with two genotypes each from C. arabica and C. canephora, the tetraploid Erythrocoffea (C. arabica) and the diploid Erythrocoffea (C. canephora) formed coherent clusters. Moreover, the grouping of the related taxa, in general, followed their botanical sections, with a few exceptions. Although all the entries from Erythrocoffea fell into one cluster, this cluster also contained two entries from Pachycoffea (C. dewevrei with C. canephora, and C. liberica with C. congensis). The remaining four Pachycoffea species (C. excelsa, C. arnoldiana, C. aruwemiensis, C. abeokutae) grouped with each other with good bootstrap support. C. salvatrix, a Mozambicoffea, also grouped with these Pachycoffea species, while the other three Mozambicoffea species (C. racemosa, C. eugenioides, C. kapakata) and the two Paracoffea species (Psilanthus spp.) appeared as independent, strongly supported groups. The single Melanocoffea species tested in this study, C. stenophylla, did not group with any of the above species clusters but was closer to the Coffea species than to the Psilanthus spp.
Similar results were obtained using the data from genomic SSRs (CCRMs, data not shown).

SSR motifs in coffee transcriptome, and development of new EST-SSR markers
In the present study, 15.4% of the coffee ESTs were found to contain SSRs, which is comparable with our earlier study [11] but much higher than the 2.7-10.8% reported for 18 representative dicotyledonous species [27] and the 7-10% reported for monocot species [28]. Notwithstanding this apparent enrichment, the SSRs in the coffee transcriptome were very similar to those of other plant species in several respects: 1. greater abundance of TNRs than DNRs; 2. abundance of AG among the DNRs, followed by AT; 3. CG as the least abundant DNR; 4. abundance of AAG among TNRs (as in other dicots); 5. predominance of GC-rich TNRs (but not CCG/GGC) over non-GC-rich TNRs.
A total of 18.7% of the detected EST-SSRs were found to be suitable candidates for primer design, a comparatively lower proportion (<50%) than we reported earlier [11]. The main attributes that rendered the majority of the identified SSRs unsuitable for marker development were a shorter repeat core (<18 bp) and/or flanking sequences of low complexity (AT/GC-rich and/or prone to secondary structure formation) or of short length, which seriously constrained the design of optimal primer pairs. However, the primer-to-marker conversion ratio in this study (ca. 88%) was higher than in many earlier similar studies in other crops. Such differences in marker conversion ratios are expected owing to differences in the quality of the designed primers, the GC content of the genome, genome complexity and/or genome size [13].
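The flank-related reasons for discarding SSRs can be expressed as a simple pre-filter applied before primer design. The sketch below is hypothetical: the 50-bp minimum flank length and the 30-70% GC window are illustrative thresholds, not the parameters used in this study, and it does not assess secondary structure, which primer-design tools such as Primer3 evaluate separately.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def flanks_ok(left: str, right: str, min_len: int = 50,
              gc_min: float = 0.30, gc_max: float = 0.70) -> bool:
    """Hypothetical filter: both flanks long enough and not strongly AT- or GC-rich."""
    return all(
        len(flank) >= min_len and gc_min <= gc_fraction(flank) <= gc_max
        for flank in (left, right)
    )


print(flanks_ok("ACGT" * 15, "ACGT" * 15))  # True: 60-bp flanks, 50% GC
print(flanks_ok("AT" * 30, "ACGT" * 15))    # False: AT-rich left flank
print(flanks_ok("ACGT" * 5, "ACGT" * 15))   # False: left flank only 20 bp
```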

SSR enrichment and development of genomic SSRs
The genomic DNA library constructed in this study yielded a very high proportion of SSR-positive sequences, with a very low degree of redundancy (9.1%; 6 of the 66 identified SSR-positive sequences). This is notable, as in earlier similar studies the apparently high success rates were generally confounded by a high degree of redundancy [29]. Similarly, the proportion of SSR-positive sequences found suitable for primer design was also higher (87%) in our study than the average of 54 ± 3% recorded in other species [30]. These observations suggest that the enrichment approach used in this study may be a desirable strategy for efficiently capturing SSRs, even in genomes like coffee that are relatively poor in SSR motifs [2].

Utility of new EST-and genomic SSRs as genetic markers
SSRs are desirable markers for studying genetic diversity, germplasm characterization, constructing reference panels/bar codes for the individualization of genotypes, linkage mapping, population biology, and the taxonomic relationships of related taxa [2]. It is therefore important to validate new markers for their utility in genetic studies, a step that unfortunately has been lacking in the majority of published studies describing the development of coffee-specific SSR markers.
Various genetic parameters, viz. allelic diversity, PIC, H_o, H_e, HWE and LD, calculated for all the new EST and genomic SSRs, together with their mappability on the linkage map, amply suggest their utility as genetic markers (see Tables 4 & 5). In general, the extent and pattern of allelic/genetic diversity revealed by the new markers conform to those reported earlier for coffee genomic SSRs [5,6,31] and EST-SSRs [8,11].
Genetic parameters and tests such as H_o, H_e, LD and HWE are important indicators of the origin, evolution and distribution of diversity in the available genepool. The heterozygosity measures (H_o, H_e) for the new SSR markers indicated heterozygote deficiency in the tested germplasm. The HWE and LD analyses of the polymorphic markers were in general agreement with our earlier observations with genomic as well as EST-SSRs [5,8,11]. Overall, these studies indicated that the tested robusta germplasm comprised allogamous, relatively unrelated genotypes, while the autogamous tetraploids comprised mostly hybrid varieties/selections with overlapping/shared pedigrees. The results thus suggest that the new markers are suitable for reliably ascertaining genetic diversity in the coffee gene pool.

Cross-species/cross-generic transferability
All the new EST- and genomic SSR markers showed very high and robust cross-species/cross-generic amplification, with alleles of comparable sizes, when tested on 12 other Coffea and two Psilanthus taxa. The data revealed that the markers described here show much higher transferability across taxa than earlier published genomic/EST-SSR markers [2,5,7,11]. This is significant, as successful cross-species amplification is generally restricted to related species within a genus and declines when markers are tested on different genera [32]. Further, it was interesting to note that the new SSRs that were monomorphic/uninformative for the tested arabica/robusta germplasm exhibited considerable polymorphism across the tested related taxa (the only exceptions were markers CCESSR16 and CCESSR18, which showed very low conservation even across the Coffea spp.). Thus the new SSR markers described here strengthen the possibility of their use as Conserved Orthologous Sets (COS) for genetic characterization of different related wild coffee taxa, and also for coffee taxonomic/synteny studies.

Diversity analysis and genetic relatedness within/between Coffea and Psilanthus species
The EST- and genomic SSRs described in this study grouped all 16 genotypes (representing the cultivated genepool) in a phenetic clustering that was consistent with their species status and known pedigrees (Figure 3a). Similarly, the analysis of 14 Coffea and two Psilanthus species revealed generic affinities that were largely in agreement with their known taxonomic relationships (Figure 3b), based on their geographical distribution as well as Chevalier's botanical classification [33]. Importantly, the analysis distinctly separated the two Paracoffea species (P. bengalensis and P. wightiana) from all the other Coffea spp. These results are similar to those of earlier published studies undertaken to ascertain species relationships using SSRs [2,7,8,11] and other marker approaches [34][35][36]. They thus demonstrate that the new SSR markers developed in the present study can be considerably informative in exploring the taxonomic relationships of the coffee species complex.

Conclusions
The present study describes a total of 69 new validated SSRs: 44 EST-SSRs developed from the coffee transcriptome using an in-silico methodology, and 25 genomic SSRs developed using an SSR enrichment approach. In addition, it provides primer pairs for a further 270 putative EST-SSRs. Analysis of the identified SSR-positive ESTs also provided insights into the relative abundance and distribution pattern of different SSR motifs in the coffee transcriptome, which was found to be relatively rich in SSRs. Among the identified EST-SSRs, TNRs followed by DNRs were more abundant than other SSRs, and among the different SSR motifs, AG was the most abundant. All 69 markers were found to be polymorphic in the tested coffee/related germplasm, and their utility as efficient genetic markers was demonstrated for diversity analysis, germplasm individualization, linkage mapping, cross-species transferability and taxonomic studies. As many of these SSRs showed very high cross-species transferability, they can aid in conservation, management and the resolution of taxonomic relationships as Conserved Orthologous Sets (COS) for Coffea and Psilanthus species and, more importantly, serve as efficient and informative genetic landmarks on molecular linkage maps.
Table S1. Summary statistics of screening of the coffee unigene ESTs for SSRs. doi:10.1371/journal.pone.0113661.s001 (PDF)