Saturation of an Intra-Gene Pool Linkage Map: Towards a Unified Consensus Linkage Map for Fine Mapping and Synteny Analysis in Common Bean

Map-based cloning and fine mapping to find genes of interest, as well as marker assisted selection (MAS), require good genetic maps with reproducible markers. In this study, we saturated the linkage map of the intra-gene pool population of common bean DOR364×BAT477 (DB) by evaluating 2,706 molecular markers including SSR, SNP, and gene-based markers. On average the polymorphism rate was 7.7% due to the narrow genetic base between the parents. The DB linkage map consisted of 291 markers with a total map length of 1,788 cM. A consensus map was built using the core mapping populations derived from inter-gene pool crosses: DOR364×G19833 (DG) and BAT93×JALO EEP558 (BJ). The consensus map consisted of a total of 1,010 mapped markers, with a total map length of 2,041 cM across 11 linkage groups. On average, each linkage group on the consensus map contained 91 markers, of which 83% were single copy markers. Finally, a synteny analysis was carried out by comparing our highly saturated consensus map with the soybean pseudo-chromosome assembly. A total of 772 marker sequences were compared with the soybean genome, and 44 syntenic blocks were identified. The linkage group Pv6 presented the most diverse pattern of synteny with seven syntenic blocks, and Pv9 showed the most consistent relations with soybean with just two syntenic blocks. Additionally, a co-linear analysis using common bean transcript map information against soybean coding sequences (CDS) revealed the relationship with 787 soybean genes. The common bean consensus map has allowed us to map a larger number of markers and to obtain a more complete coverage of the common bean genome. Our results, combined with synteny relationships, provide tools to increase marker density in selected genomic regions to identify closely linked polymorphic markers for indirect selection, fine mapping or positional cloning.


Introduction
A linkage map indicates the position and relative genetic distances between markers along chromosomes and is based on the principle that genes and markers segregate via chromosome recombination during meiosis [1]. Therefore, genes or markers that are close or tightly-linked will be transmitted together from parent to progeny more frequently than genes or markers that are located further apart. Genetic linkage maps are an essential prerequisite for studying the inheritance of both qualitative and quantitative traits, to develop markers for marker assisted selection (MAS), for fine mapping and map-based cloning of genes of interest, and for comparative genomic studies. However, the utility of the linkage map information is often limited to the genetic background of the mapping population.
In common bean (Phaseolus vulgaris L.), the first linkage maps were developed with small numbers of linkage groups and included genes controlling mostly morphological and pigmentation traits such as flower and seed color or seed pattern [2,3]. The advent of DNA based markers, including restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and simple sequence repeats (SSR) [4,5], led to more detailed maps.
The construction of a consensus map combining the information of multiple segregating populations from diverse genetic backgrounds offers the opportunity to map a larger number of loci than in most single crosses, thus increasing the number of potentially useful markers across divergent genetic backgrounds and providing greater genome coverage, in addition to providing opportunities to validate marker order [23]. The consensus map captures more markers, genes or QTL than could be mapped in a single population study because of the limited marker and phenotypic polymorphisms found within a single population [24]. For common bean, a consensus map would collate loci discovered using populations developed within the Mesoamerican [25,26,27] or Andean gene pools [28], where it has only been possible to develop low density maps because of low polymorphism rates.
Consensus maps have been developed in several crops using different methodologies, such as a visual approach in wheat (Triticum aestivum L. em. Thell.) [29] or pooling the marker data of different mapping populations of maize (Zea mays) to generate a "pooled map" [30]. The software JoinMap [31] weights pairwise genetic distances based on population structure and size, and has become a very popular consensus map tool in several crops such as soybean (Glycine max) [32], rye (Secale cereale L.) [33], melon (Cucumis melo L.) [34] and cotton (Gossypium spp.) [35].
Graph theory is now being utilized as an approach to identify the most accurate consensus map [24,36]. Each map is modeled as a directed acyclic graph (DAG) in which nodes represent mapped markers and edges define the order of adjacent markers. Based on shared vertices, the DAGs are merged into a consensus map. Earlier this year, the MergeMap software was developed [37], in which ordering conflicts (cycles) are resolved parsimoniously, an approach that showed improved performance in terms of accuracy and run time when compared to other programs. This software has been successfully used for the construction of a consensus map from six populations based on 1,375 SNP markers in cowpea (Vigna unguiculata) [38] and for three mapping populations with 2,943 SNP markers in barley (Hordeum vulgare) [39].
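The graph-based idea described above can be illustrated with a minimal sketch. It is not the MergeMap algorithm: it only merges marker orders that are mutually consistent and reports a conflict when a cycle is found, whereas MergeMap [37] resolves such conflicts parsimoniously. The function name and the marker names in the usage lines are hypothetical.

```python
from collections import defaultdict, deque


def merge_marker_orders(maps):
    """Merge per-population marker orders into one consensus order.

    maps: list of lists, each holding marker names in map order for one
    population. Adjacent markers define directed edges; shared markers
    act as the common vertices that join the graphs.
    Raises ValueError if the combined orders contain a cycle (a conflict).
    """
    successors = defaultdict(set)
    indegree = {}
    for order in maps:
        for marker in order:
            indegree.setdefault(marker, 0)
        for left, right in zip(order, order[1:]):
            if right not in successors[left]:
                successors[left].add(right)
                indegree[right] += 1

    # Kahn's topological sort; sorting keeps the output deterministic.
    ready = deque(sorted(m for m, d in indegree.items() if d == 0))
    consensus = []
    while ready:
        marker = ready.popleft()
        consensus.append(marker)
        for nxt in sorted(successors[marker]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(consensus) != len(indegree):
        raise ValueError("conflicting marker orders (cycle in the graph)")
    return consensus


# Hypothetical marker orders from three populations.
print(merge_marker_orders([["A", "B", "D"], ["A", "C", "D"], ["B", "C"]]))
# prints ['A', 'B', 'C', 'D']
```

A topological sort yields only one of possibly several orders compatible with the input maps, and the sketch ignores map distances entirely.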
Synteny analysis is the comparison of genetic maps between species rather than between populations and usually requires whole genome sequences. In terms of legume genomics, soybean, medicago (Medicago truncatula) and lotus (Lotus japonicus) are three legumes that have complete or almost complete genome sequence information. These genome sequences have been useful to compare genomes, and to transfer information from genome sequence information to other crop species. However, the ability to transfer knowledge between species depends on both the evolutionary distance between species, and the rate and nature of changes in the genome over time [40].
Therefore, analyses that have compared common bean and sequenced legumes reported syntenic blocks of various sizes [12,21,22,41]. However, in these studies the common bean information came from low or medium saturated linkage maps developed using a single bi-parental population. Here, we report the saturation of the linkage map from a Mesoamerican population, DOR364×BAT477, using SSR and gene based markers, followed by comparisons to the inter-gene pool linkage maps of the crosses DOR364×G19833 and BAT93×JALO EEP558 to finally build a saturated consensus map. Additionally, the consensus map was compared with the genome of the soybean. Syntenic relationships were defined which provide in silico evidence for the position of new markers that can be used for fine mapping projects and positional cloning.

Plant material
The population DOR364×BAT477 consists of 113 F5:7 recombinant inbred lines (RILs) as described in [27]. For map saturation, the first 92 lines were selected. The DNA of the population and the parents was extracted using 5 g of tissue as described in [42]. The extraction quality was checked by 1% agarose gel electrophoresis, and the DNA was quantified with Quantity One® v 4.0.3 software (Bio-Rad) using a DNA lambda ladder as a size reference. Finally, DNA was diluted to a final concentration of 5 ng/µl.

Map saturation
The parental genotypes were evaluated with 2,706 common bean DNA based markers including SSRs based on EST libraries and BAC end sequences, as well as gene-based markers from a total of 24 sources of markers (Table S1). Among these, the legume anchor markers (LEG) reported by [21] were evaluated in both the DB and in an additional population, DOR364×G19833, as described below. The electrophoresis and PCR parameters for SSR and gene based markers were as described previously in [7] and [12], respectively. Polymorphic markers were then evaluated on the entire DB mapping population. The linkage groups were named after previous reports [22,41].

Linkage analysis
Segregation data were used to place the new markers on the DOR364×BAT477 population linkage map described in [27]. Linkage analysis was conducted with the Kosambi mapping function using the software application Mapmaker 2.0 for Windows [43]. The markers were placed in the established linkage groups with the 'try' and 'compare' commands with a minimum LOD of 4.0. All linkage maps were drawn using MapChart [44].
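For readers unfamiliar with the Kosambi mapping function mentioned above, the following minimal sketch shows the standard conversion between a recombination fraction r and a map distance in cM, d = 25 ln[(1 + 2r)/(1 - 2r)]. It illustrates the formula only and is not taken from the Mapmaker implementation; the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Convert a recombination fraction r (0 <= r < 0.5) to centimorgans."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse conversion: map distance in cM to recombination fraction."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)


# A recombination fraction of 0.10 corresponds to roughly 10.1 cM.
print(round(kosambi_cm(0.10), 1))
```

For small r the distance in cM is close to 100r, and the two diverge as r approaches 0.5 because the function allows for crossover interference.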

Consensus map
The core mapping populations were derived from inter-gene pool crosses: DOR364×G19833 (DG; n = 87) and BAT93×JALO EEP558 (BJ; n = 79). These were used to build a consensus map with the less saturated DOR364×BAT477 population (DB). The DG linkage map was developed by the CIAT Bean Project [7,10,11,12,19,20]. The BJ linkage map was developed using reported map information [21,22]. The consensus map was constructed with MergeMap [37]. The consensus map coordinates from MergeMap were normalized to the arithmetic mean cM distance [39] for each linkage group using data reported for the three individual maps. The consensus map and the relationships with the single linkage maps were drawn using MapChart [44].
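The normalization step is described only briefly, so the sketch below shows one plausible reading of it, stated as an assumption: the consensus coordinates of a linkage group are rescaled linearly so that its total length equals the arithmetic mean of the lengths of that linkage group in the individual maps. The function name, marker names and lengths are hypothetical, and the procedure actually used in [39] may differ in detail.

```python
def normalize_positions(consensus_positions, individual_lengths):
    """Rescale consensus positions of one linkage group.

    consensus_positions: dict of marker name -> raw consensus coordinate.
    individual_lengths: lengths (cM) of the same linkage group in each
    individual map.
    Assumption: a linear rescaling so that the group spans the arithmetic
    mean of the individual lengths, starting at 0 cM.
    """
    target = sum(individual_lengths) / len(individual_lengths)
    start = min(consensus_positions.values())
    span = max(consensus_positions.values()) - start
    if span == 0:
        return {marker: 0.0 for marker in consensus_positions}
    scale = target / span
    return {marker: (pos - start) * scale
            for marker, pos in consensus_positions.items()}


# Hypothetical linkage group: raw consensus length 300, individual
# map lengths of 160, 200 and 180 cM (mean 180 cM).
raw = {"m1": 0.0, "m2": 150.0, "m3": 300.0}
print(normalize_positions(raw, [160.0, 200.0, 180.0]))
```

A linear rescaling preserves marker order and relative spacing, so only the absolute cM scale changes.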

Synteny analysis
The first genomic synteny analysis was conducted using a total of 772 marker sequences from the consensus map, downloaded in FASTA format from NCBI and compared with the soybean (version Glyma1) genome sequence following the methodology reported by [12] with some modification. The common bean sequences were aligned against the chromosome based assembly of soybean using local blastn. Graphics were drawn with MapSynteny, an in-house software created with the Visual Basic Script programming language in a Microsoft Excel™ environment (available upon request from the corresponding authors). The genic synteny analysis was carried out by aligning the marker sequences against the public common bean EST assembly from the Bean Gene Index (Dana-Farber Cancer Institute, DFCI) (March 24, 2011). A total of 491 tentative consensus (TC) sequences were aligned against the coding sequences (CDS) of soybean, with the same blast parameters described above. The relationships of the homeologous segments within the soybean genome were then drawn with Circos software version 0.54 [45].
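The selection of the best soybean matches per marker can be sketched as follows, assuming the blastn results are available in the standard 12-column tabular format (BLAST+ output format 6). The e-value cutoff of 1e-5 is a placeholder, because the threshold used in the study is not given in this text. The function name and the example records are hypothetical, and the sketch ranks individual alignments without merging several alignments to the same chromosome.

```python
from collections import defaultdict


def top_two_hits(blast_lines, max_evalue=1e-5):
    """Keep the two best-scoring hits per query from tabular BLAST output.

    Columns (format 6): qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    max_evalue is an assumed cutoff, not the value used in the study.
    Returns a dict: query -> list of (subject, subject_start), best first.
    """
    hits = defaultdict(list)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        subject_start = int(fields[8])
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue <= max_evalue:
            hits[query].append((bitscore, subject, subject_start))

    best = {}
    for query, rows in hits.items():
        rows.sort(key=lambda row: -row[0])  # highest bit score first
        best[query] = [(subject, start) for _, subject, start in rows[:2]]
    return best


# Hypothetical records: the second marker has only one hit that passes.
example = [
    "BM1\tGm08\t90.0\t200\t20\t0\t1\t200\t1000\t1199\t1e-50\t180",
    "BM1\tGm05\t88.0\t200\t24\t0\t1\t200\t5000\t5199\t1e-40\t150",
    "BM2\tGm10\t85.0\t150\t22\t0\t1\t150\t300\t449\t1e-30\t120",
    "BM2\tGm20\t70.0\t50\t15\t0\t1\t50\t900\t949\t0.5\t20",
]
print(top_two_hits(example))
```

Retaining two hits per marker reflects the expectation that each common bean region corresponds to two homeologous soybean segments.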

Parent marker survey
At the beginning of this study, the DOR364×BAT477 linkage map consisted of 186 markers, comprising 60 SSRs and 126 dominant AFLP or RAPD markers [27]. With the aim of increasing the marker saturation in this linkage map, a total of 2,706 markers were evaluated between the parents DOR364 and BAT477, including 1,136 genomic SSR, 866 genic SSR and 393 gene-based markers (Table S1). Averaged over all markers, the polymorphism rate was low at 7.7%, with monomorphism for several sets of markers [46,47,48,49].
The polymorphism frequency was higher in genomic than in genic SSR. A polymorphism rate higher than 10% was obtained for genomic SSR reported by [9,10,50]. Interestingly, the SSR markers developed by Buso et al. [51] had the highest polymorphism rate of 40%. In contrast, few polymorphisms were found for genic SSR. The most polymorphic genic SSRs were the set developed by Hurtado (unpublished) with a polymorphism rate of 6.6%. On average, the polymorphism rate for the DB population was 3.6% for genic SSRs and 10.7% for genomic SSRs. The same low polymorphism rate was found with gene based markers using the single-strand conformation polymorphism (SSCP) technique. On average, the polymorphism frequency was 1.6%. In summary, 111 new markers comprising 100 SSR markers and 11 gene-based markers were polymorphic and were mapped along with 120 of the dominant markers originally used in the previous analysis with the DB population [27].

Segregation analysis
A new DB map was developed by incorporating these 111 markers with the previous segregation analysis of 180 markers [27]. A total of 291 markers were placed in the linkage map, including AFLP, RAPD, SSR and gene-based markers (Figure 1, Table 1). The SSR and RAPD were the most abundant markers in the linkage map, with 160 and 98 markers, respectively. Specifically, 74% of the Pv1 markers were SSRs. The total map length was 1,789 cM and linkage group size ranged from 80 cM (Pv9) to 277 cM (Pv4) with an average of 163 cM per linkage group (Figure 1, Table 1).
In general, the marker loci were well distributed within the linkage groups with an average of 26 markers per linkage group. The number of marker loci per linkage group ranged from 10 on Pv11 to 54 on Pv2. The average distance between markers was 6 cM, ranging from 4.6 cM on Pv6 to 8.4 cM on Pv7. Based on Chi square tests (P < 0.05), segregation distortion was found at the top of linkage group Pv4, which represented preferential transmission of the DOR364 allele. Some gaps greater than 20 cM were still present in linkage groups Pv3, Pv4, Pv5, Pv7, Pv9, Pv10 and Pv11, despite the addition of the new markers.
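The segregation distortion test mentioned above can be sketched as a chi-square goodness-of-fit test with one degree of freedom. The sketch assumes the expected ratio of the two parental alleles among RILs is 1:1, which is the usual expectation but is not stated explicitly in this text, and it uses 3.841 as the critical value for P = 0.05. The function name and the counts in the usage line are hypothetical.

```python
def segregation_distortion(n_parent_a, n_parent_b, critical=3.841):
    """Chi-square test (1 df) of a 1:1 allele ratio at one marker.

    n_parent_a, n_parent_b: numbers of lines carrying each parental
    allele (missing data excluded).
    critical: chi-square critical value; 3.841 corresponds to P = 0.05
    with one degree of freedom.
    Returns (chi2, distorted).
    """
    total = n_parent_a + n_parent_b
    if total == 0:
        raise ValueError("no genotyped lines for this marker")
    expected = total / 2.0  # assumed 1:1 segregation
    chi2 = ((n_parent_a - expected) ** 2 +
            (n_parent_b - expected) ** 2) / expected
    return chi2, chi2 > critical


# Hypothetical marker scored on 92 lines: 60 carry one parental allele.
chi2, distorted = segregation_distortion(60, 32)
print(round(chi2, 2), distorted)
```

Applied marker by marker, a run of adjacent distorted loci favouring the same parent indicates a distorted region rather than a scoring artefact at a single marker.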

Consensus map
Due to the low polymorphism rate found in the DB population, a consensus map was developed in order to increase the marker saturation and to improve the marker order. The DG and BJ  (Table 2). On average, each linkage group shared 18 anchor markers with a range from 39 (Pv2) to 7 (Pv5) (Table 2). In total 1,010 markers were placed in the consensus map, including 446 SNP, 392 SSR, 99 RAPD, 45 RFLP, 22 AFLP, 5 STS and one phenotypic marker (Figure 3). On average the consensus map consisted of 91 markers per linkage group with a maximum of 151 on linkage group Pv2 and a minimum of 67 on linkage group Pv9. The total map length was 2,041 cM, while linkage groups ranged in size from 131 cM (Pv10) to 276 cM (Pv2) with an average of 185 cM per linkage group. The average distance between markers was 2 cM, and the largest gaps were 21 and 25 cM in linkage groups Pv9 and Pv4, respectively. Moreover, even though marker order among the four maps (consensus, DB, DG and BJ) was reliable, some slight differences were observed between the consensus and single maps (Figure 2).
The SNP and SSR markers were well distributed throughout the linkage groups. In general the SNP markers were more frequent, with the exception of linkage groups Pv4 and Pv10, where the SSR markers were more frequent (Figure 3). In linkage groups Pv6, Pv8, Pv9 and Pv11, more than 50% of the markers were SNPs.

Synteny analysis with soybean
A total of 772 marker sequences distributed in the common bean consensus map were aligned with the soybean 1.01 genome [52]. The soybean genome is thought to have undergone two duplications that occurred approximately 59 and 13 million years ago, resulting in homeologous relationships between segments of the 20 soybean chromosomes [52]. Therefore, the two highest hits were selected for the synteny analysis [12,41]. As such, 506 and 470 soybean orthologous sequences were identified with the first and second hit, respectively.
The number of identified sequences differed between the first and second hits because the second hit sometimes did not meet the e-value threshold. The most syntenic loci were found on Pv2, with 156 orthologous sequences, whereas Pv10 had the fewest loci with only 50. On average, 88 hits were found per linkage group. A total of 87 synteny groups were found corresponding to 44 common bean regions (Table 2, Figure 4a). The linkage group Pv6 contained seven syntenic blocks while Pv9 only contained two. Some syntenic gaps were noted at the top of linkage groups Pv4 and Pv6 and at the end of Pv3 and Pv10 (Figure 4a).
Using transcript information, a total of 491 common bean TC sequences were also compared against soybean CDS sequences. A total of 405 and 382 soybean genes were identified in the first and second hit, respectively. On average 71 genes per linkage group were found, ranging from 121 genes on linkage group Pv2 to 44 on linkage group Pv10. Figure 4b represents the collinear gene blocks among the 20 soybean chromosomes and 11 linkage groups in common bean. The five most saturated syntenic blocks were Gm8/Gm5 with 58 genes on Pv2, Gm6/Gm4 with 40 genes on Pv9, Gm20/Gm10 with 36 genes on Pv7, Gm12/Gm11 with 32 genes on Pv11 and Gm2/Gm14 with 30 genes on Pv8 (Table 2).

Linkage map saturation
The first objective of this study was to saturate the linkage map of the intra-gene pool population DB. However, marker screening in the parents revealed a low polymorphism rate. The low polymorphism reported here was consistent with the results using AFLP, RAPD and SSR in the construction of the original DB framework map [27]. On average, the genomic SSR polymorphism rate was 9.5%, a lower rate than observed for other intra-gene pool populations. Low to medium polymorphism rates were found using the Mesoamerican population BAT881×G21212 (30%) [26] and the Andean population G19833×AND696 (31%) [28]. In contrast, using inter-gene pool populations, researchers have reported polymorphism rates of 56% for the DG population [7] and 42% and 55.7% for the BJ population [7,9]. In addition, higher polymorphism rates were reported for the DG population using other markers [10,20]. The low polymorphism reported here could be explained by the fact that the genotypes DOR364 and BAT477 belong not only to the Mesoamerican gene pool but also to the same Mesoamerican race, thus showing less polymorphism compared with other intra-gene pool populations developed from members of different races [53]. However, despite this narrow genetic base, these genotypes exhibit contrasting physiological behavior in key agricultural traits like drought [27], low phosphorus stress and symbiotic nitrogen fixation [25].

Consensus map
Efforts to compare linkage maps in common bean based on RFLP and SSR markers were reported previously in integrated mapping by [6,7]. Here we report the first consensus map in common bean built from a Mesoamerican intra-gene pool population and inter-gene pool (DG, BJ) populations. The consensus map was created using MergeMap [37], which has recently been used for other species [38,39,54]. Other approaches have been used in the past to construct consensus maps, most commonly using the JoinMap software [31]. Both methodologies were compared using the same set of data [37], and MergeMap was found to be more accurate in terms of marker order, and significantly faster than JoinMap. Similar comparisons reported that MergeMap appeared to outperform JoinMap in terms of marker order consistency between integrated maps [54]. However, JoinMap tended to produce more accurate estimates of genetic distances. Another drawback of JoinMap is that changes in marker order and distances were observed when using linkage maps generated by MapMaker software. JoinMap uses all pairwise estimates above the defined LOD threshold to establish map length, whereas MapMaker establishes map length using only adjoining marker pairs to calculate the sum of adjacent distances [33].
The common bean consensus map exhibits a higher marker density than previous linkage maps reported for bi-parental populations. Maps based on the DG population with 280 [19] and 288 [12] markers have been reported. Likewise, using the BJ population, 275 markers have been placed on the common bean genetic map [22]. Here, a consensus map with just over one thousand markers distributed on 11 linkage groups with a mean distance of 2 cM between adjacent loci was developed. In terms of marker order, the consensus map had few changes as compared to the individual maps. These small differences could be explained by different recombination events among population parents, small progeny size in any single population, and a generally increased recombination rate in terminal regions of linkage groups [55,56].
Therefore, the consensus marker order is more reliable, because a much higher number of individuals and a higher number of recombination events were taken into account when combining the three populations. Similar results were reported when a consensus map was developed for grape (Vitis vinifera L.) based on three populations [55]. Likewise, a consensus map built from three populations of Brassica napus produced a highly saturated map with 5,162 genetic markers [54]. In addition, the length of our consensus map is 2,041 cM, slightly longer than the single maps of the DB and BJ populations and previously reported maps [6,7,8,9,27,28]. Consensus maps with increased map size have been reported in other species [23,55,56]. Part of this increase may be due to an improved coverage of the ends of the chromosomes [56]. In our consensus map two gaps greater than 20 cM remain. These areas of low marker density may correspond to genomic regions of similar ancestry or identity by descent in the populations used in this study. Similar gaps were obtained in the consensus map of sorghum (Sorghum bicolor L.) [23] in regions of low polymorphism that were identical by descent.

Synteny relationship
The large and consistent synteny blocks reported here resulted from an extended consensus map based on mapping information from three mapping studies in common bean [12,21,41]. The syntenic groups identified here (Table 2) are consistent with the previous reports and allow us to extend the syntenic analyses of these two species, as well as to confirm the homeologous segment analysis in soybean that has been extensively reported on. Interestingly, almost the entire Pv7 linkage group showed a strong relationship with the syntenic block Gm10/Gm20. These results are corroborated by soybean genome analysis, where chromosome 20 is highly homologous to the long arm of chromosome 10 [52], suggesting that Gm10, Gm20, and Pv7 are good candidates to identify ancestral chromosomal duplication of legume genomes.
Another good candidate for evolutionary genomics is the linkage group Pv9, which showed very strong relationships with the synteny blocks Gm6/Gm4 and Gm15/Gm19. That the one-to-two relationship does not extend over the entire Pv chromosomes further supports the conclusion by McClean et al. [41] that the large scale order of soybean chromosomes is the result of chromosome breakage/union events, possibly directly associated with the tetraploidization event in the genome history of soybean.
Synteny-based analysis in cereals has allowed the identification of seven shared duplications which led to the modeling of a common ancestral genome structure of 33.6 Mb structured in five protochromosomes containing 9,138 protogenes. This type of analysis provided new insights into the evolution of cereal genomes from their extinct ancestors [57] and this approach provides a reference tool for improved gene annotation and cross-genome marker development.

Common bean breeding application
A consensus map in common bean increases the genome coverage and makes it possible to compare locations of major genes controlling important phenotypic traits or QTL positions between populations from multiple crosses. This is especially useful in populations with low polymorphism, such as those from crosses within the Andean or Mesoamerican gene pools, where genetic map saturation is difficult to obtain [27]. One of the uses of combining consensus maps with synteny relationships is to provide tools to increase marker density in selected genomic regions.
Such increases in marker density can be used to identify closely linked polymorphic markers for indirect selection, fine mapping or map-based cloning. Examples of the advantage of consensus maps and their synteny analysis in other species have recently been reported in cereals. A meta-QTL analysis in sorghum (Sorghum bicolor L.) revealed that QTL and genes were located in heterochromatin regions [58]. In bread wheat (Triticum aestivum L.), a major nitrogen use efficiency (NUE) ortho-metaQTL is conserved at orthologous positions in wheat, rice, sorghum and maize [59]. In legumes, the consensus map in cowpea (V. unguiculata) was utilized for synteny based candidate gene identification and definition of QTL location for Macrophomina phaseolina resistance [60].
Finally, given that more than 50% of the markers in the consensus map we have constructed for common bean correspond to coding regions, this study provides an excellent functional framework for candidate gene dissection, expression network analysis, or analysis of legume genome evolution.

Supporting Information
Table S1 Summary of the markers evaluated in the DOR364×BAT477 (DB) population. (DOCX)

Figure 4. Synteny relationships between common bean and soybean. (a) Associations between common bean and soybean linkage groups through sequence based markers are shown. The colored boxes represent the homologies with chromosome segments from the soybean genome, with each soybean chromosome assigned a specific color. The boxes to the right side of the linkage group are the first similarity matches, while those to the left side are the second similarity matches. (b) Schematic representation of the genic synteny relationship of the common bean transcript map with the CDS of soybean. Each line represents the direct relationship with a specific soybean gene. doi:10.1371/journal.pone.0028135.g004