High-density multi-population consensus genetic linkage map for peach

Highly saturated genetic linkage maps are extremely helpful to breeders and are an essential prerequisite for many biological applications such as the identification of marker-trait associations, mapping quantitative trait loci (QTL), candidate gene identification, development of molecular markers for marker-assisted selection (MAS) and comparative genetic studies. Several high-density genetic maps, constructed using the 9K SNP peach array, are available for peach. However, each of these maps is based on a single mapping population and has limited use for QTL discovery and comparative studies. A consensus genetic linkage map developed from multiple populations provides not only a higher marker density and a greater genome coverage when compared to the individual maps, but also serves as a valuable tool for estimating genetic positions of unmapped markers. In this study, a previously developed linkage map from the cross between two peach cultivars ‘Zin Dai’ and ‘Crimson Lady’ (ZC2) was improved by genotyping additional progenies. In addition, a peach consensus map was developed based on the combination of the improved ZC2 genetic linkage map with three existing high-density genetic maps of peach and a reference map of Prunus. A total of 1,476 SNPs representing 351 unique marker positions were mapped across eight linkage groups on the ZC2 genetic map. The ZC2 linkage map spans 483.3 cM with an average distance between markers of 1.38 cM/marker. The MergeMap and LPmerge tools were used for the construction of a consensus map based on markers shared across five genetic linkage maps. The consensus linkage map contains a total of 3,092 molecular markers, consisting of 2,975 SNPs, 116 SSRs and 1 morphological marker associated with slow ripening in peach (SR). The consensus map provides valuable information on marker order and genetic position for QTL identification in peach and other genetic studies within Prunus and Rosaceae.


Introduction
A genetic linkage map represents positions and genetic distances of molecular markers on chromosomes allocated based on segregation data and recombination events of individuals in a mapping population [1,2]. Genetic maps are important tools for a vast number of genetic applications and are widely used in plant breeding programs, genetics and genomics studies. In particular, these maps are crucial for a better understanding of marker-trait associations through quantitative trait loci (QTL) mapping, discovery of genes associated with economically important fruit quality and disease resistance traits, and successful deployment of molecular markers in plant breeding programs via marker-assisted selection (MAS) [3][4][5]. In addition, linkage maps provide an important foundation for other biological applications including candidate gene identification, map-based gene cloning, genome evolution, comparative genomics studies and genome assembly [6][7][8][9][10][11]. High-resolution maps which cover the entire genome with co-segregating, reproducible and high-throughput markers at short intervals are most valuable because of the increased resolution that leads to more effective QTL mapping, candidate gene detection, and more precise estimates of QTL effect [5,12,13].
Peach is a recognized model for Rosaceae genetics and genomics with a wealth of publicly available resources [14,15]. Recent advances in next-generation high-throughput sequencing and genotyping techniques, such as development of the IPSC 9K peach array [16], have permitted rapid development of high-quality genetic linkage maps [17][18][19][20].
Multiple maps developed for the same species usually contain many common markers, which can be used as anchor points for consensus map integration [4,5,32,33]. Highly saturated genetic maps with markers evenly distributed across linkage groups and no regions of low marker density are most suitable for the construction of a consensus map. Consensus maps developed from multiple populations provide a higher marker density and a greater genome coverage when compared to the individual maps. They also serve as valuable tools for estimating genetic positions, detecting inconsistencies among maps, and comparing marker distributions and QTL locations [5]. Consensus maps could also aid estimation of genetic positions of unmapped markers (markers without a genetic position) included in genotyping arrays. This is especially important in pedigree-based QTL analyses [34], which require precise genetic positions of the markers to accurately detect QTLs in pedigree-related individuals when development of mapping populations is improbable. To assign genetic positions to unmapped markers, the common approach was to use a genome-wide mean as a conversion factor [35]. To overcome the problem of using a static conversion factor, Fresnedo et al. [30] developed a consensus RosBREED [36] linkage map (RC 1) for peach, predicting genetic distances by incorporating the physical and genetic positions of 68 markers from the Prunus bin map [37]. However, this map was developed by calculating genetic positions using polynomial equations, not by merging individual peach linkage maps.
In the Rosaceae family, a consensus map was developed for pear [38] and two integrated linkage maps have been reported in apple based on merging five and three populations [5,39]. Although a peach consensus map was previously reported [25], it was constructed using only two peach linkage maps and the GoldenGate genotyping platform which is less commonly used in the peach community compared to the IPSC 9k SNP array.
In this study, we report the improvement of the previously developed peach linkage map 'Zin Dai' x 'Crimson Lady' (ZC 2 ) by genotyping additional progenies. In addition, a consensus peach linkage map was created based on the improved ZC 2 map and four other unrelated high-density maps using two algorithms (MergeMap and LPmerge). The consensus map provides valuable information on marker order and genetic position and will be useful in future pedigree-based QTL analyses in peach.

Plant material and DNA extraction
An F 2 mapping population obtained from selfing an individual from the cross between 'Zin Dai' and 'Crimson Lady' (ZC 2 ) was previously reported [17]. A map was constructed based on 25 selected seedlings, genotyped with the 9K peach SNP array [16]. In this paper, we have genotyped an additional set of 65 individuals (for a total of 90 individuals) for the development of an improved genetic linkage map. DNA was isolated from young and healthy leaf tissue as described previously by Dellaporta et al. [52]. The concentration and purity of the DNA were measured with a NanoDrop ND-1000 spectrophotometer. The final concentrations of all DNA samples were adjusted to 50 ng/μl for high-throughput genotyping.

Genotypic data
DNA samples for a total of 65 'Zin Dai' × 'Crimson Lady' seedlings and parental genotypes were submitted to the Research Technology Support Facility at Michigan State University (East Lansing, MI, USA) for genotyping by the peach Illumina 9K SNP array v1. The iScan data output files were analyzed as previously described by Frett et al. [17]. Briefly, the GenomeStudio software was used to verify the quality of all samples and SNPs observed. Markers with a GenTrain score above 0.4 were inspected. The failed and monomorphic markers were excluded, whereas the polymorphic SNPs were further inspected for clustering analysis. Markers with more than the three expected clusters (AA, AB and BB) or missing in at least one of the parental genotypes were excluded from further analysis. SNP markers for which the proportion of missing genotypes was greater than 10% were not considered for map construction.

SNP-based linkage map construction
The improvement of the existing SNP-based genetic linkage map was based on combining polymorphic SNP marker data observed in this study with previously mapped marker data from the ZC 2 mapping population [17]. A genetic linkage map was constructed using SNPs homozygous for alternate alleles in the two grandparents (AA in one and BB in the other) as well as SNPs homozygous in one grandparent and heterozygous in the other. F 2 population type codes were applied [53].
SNP markers that mapped to the same location (identical markers) were grouped into single bins to reduce map complexity for linkage analysis. From each bin, a single SNP with no missing data across the progeny was used for linkage analysis.
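The binning step described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study: the genotype coding (one character per individual, "-" for a missing call), the function names, and the rule that a marker with missing calls joins the first bin it is compatible with are all assumptions made for the example.

```python
from typing import Dict, List

MISSING = "-"  # assumed code for a missing genotype call


def compatible(a: str, b: str) -> bool:
    """True if two genotype strings agree wherever both have a call."""
    return len(a) == len(b) and all(
        x == y or MISSING in (x, y) for x, y in zip(a, b)
    )


def bin_markers(genotypes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group co-segregating markers into bins.

    Each bin is keyed by a representative marker with no missing calls.
    Markers with missing calls that match no bin are left out.
    """
    bins: Dict[str, List[str]] = {}
    rep_by_pattern: Dict[str, str] = {}
    # First pass: complete markers define the bins.
    for name, calls in genotypes.items():
        if MISSING not in calls:
            rep = rep_by_pattern.setdefault(calls, name)
            bins.setdefault(rep, []).append(name)
    # Second pass: incomplete markers join the first compatible bin.
    for name, calls in genotypes.items():
        if MISSING in calls:
            for rep in bins:
                if compatible(calls, genotypes[rep]):
                    bins[rep].append(name)
                    break
    return bins


# Hypothetical genotype data for five individuals.
example = {
    "snp_a": "ABHHA",
    "snp_b": "ABHHA",
    "snp_c": "AB-HA",
    "snp_d": "BBHAA",
}
example_bins = bin_markers(example)
```

Only the bin representatives (here snp_a and snp_d) would be passed on to linkage analysis; the remaining markers inherit the position of their representative.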
Linkage map construction was performed with the JoinMap 4.1 (Kyazma, NL) software applying the Maximum Likelihood (ML) function [53]. The parameters used for map construction were as follows: a minimum logarithm of the odds (LOD) score of 3.0 was used to assign markers to linkage groups, with a maximum recombination fraction of 0.4, a goodness-of-fit jump threshold of 5.0 and a triplet threshold of 1.0. Markers exhibiting segregation distortion were identified by applying the chi-square (χ²) goodness-of-fit test (p < 0.05) and were also integrated into the map. A graphical presentation of the improved SNP-based genetic linkage map of the ZC 2 progeny, consisting of eight linkage groups, was generated with MapChart version 2.3 software [54]. Marker genetic distances on the linkage groups are presented in centimorgans (cM).
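The segregation distortion test can be illustrated with a short Python sketch. It assumes a co-dominant marker scored in three classes with the standard 1:2:1 expectation for an F 2 population; the function name and the example counts are hypothetical, and the study itself relied on the test as implemented in JoinMap.

```python
import math


def segregation_distortion_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a chi-square goodness-of-fit test against a 1:2:1 ratio."""
    total = n_aa + n_ab + n_bb
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * total, 0.50 * total, 0.25 * total)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three classes give 2 degrees of freedom, for which the chi-square
    # survival function has the closed form exp(-x / 2).
    return math.exp(-chi2 / 2.0)


# Hypothetical genotype counts for 90 F2 individuals.
p_balanced = segregation_distortion_p(22, 46, 22)   # close to 1:2:1
p_distorted = segregation_distortion_p(10, 45, 35)  # excess of one homozygote
```

A marker would be flagged as distorted when its p-value falls below 0.05, as is the case for the second set of counts.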

Comparison of an improved ZC 2 linkage map with the peach genome sequence v2.0
The genetic position of each SNP marker mapped to the ZC 2 linkage map was aligned with its physical position on the peach genome v2.0 sequence [55] using MapChart 2.3 [54], as previously described by Frett et al. [17].

Consensus map construction
Genetic distances of SSR and SNP markers, as well as the slow ripening locus (Sr), mapped across four integrated F 2 linkage maps: PI91459('NJ Weeping') × 'Bounty' (WB) [18], 'O'Henry' × 'Clayton' (OC) [19], 'Venus' × 'Venus' (VxV) [20] and 'Texas' × 'Earlygold' (T×E) [56] were obtained from the Genome Database for Rosaceae (GDR) [14][15]. The MergeMap software [40] and the LPmerge R package [41] were used to merge the four previously reported genetic linkage maps with the improved (ZC 2 ) map developed in this study. To prepare input data for MergeMap and LPmerge, the SNP markers that were non-collinear in comparison with the peach genome were removed from individual maps. For LPmerge, the maximum interval parameter K varied from 1 to 4, and the composite map with the lowest root mean square error (RMSE) was selected. The consistency of all marker names across the five linkage maps was verified to avoid marker duplications on the consensus map. The consensus map was constructed by merging a single linkage group (LG) from all five maps at a time, following the protocol reported by Khan et al. [5]. A weight of 1.0 was applied to all linkage groups across all maps. The RMSE in marker order between the consensus maps and the input maps was calculated with the R package hydroGOF [57], as described in Westbrook et al. [47], and the consensus map with the lowest average RMSE was used for further analysis. The physical positions of all markers mapped to the consensus peach linkage map were compared to the peach genome sequence v2.0 [55] and visualized in MapChart 2.0 [54].
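The selection of a consensus map by RMSE can be sketched as follows. This Python example is only an illustration of the idea: the study computed RMSE with the R package hydroGOF, whereas here the error is computed on the rank order of shared markers, which is an assumption, as are the function names and the toy marker lists.

```python
import math
from typing import Dict, List


def order_rmse(consensus: List[str], input_map: List[str]) -> float:
    """RMSE between the ranks of shared markers in two ordered marker lists."""
    shared = set(consensus) & set(input_map)
    if not shared:
        raise ValueError("maps share no markers")
    cons_order = [name for name in consensus if name in shared]
    input_order = [name for name in input_map if name in shared]
    cons_rank = {name: i for i, name in enumerate(cons_order)}
    input_rank = {name: i for i, name in enumerate(input_order)}
    squared = sum((cons_rank[n] - input_rank[n]) ** 2 for n in shared)
    return math.sqrt(squared / len(shared))


def pick_consensus(candidates: Dict[int, List[str]],
                   input_maps: List[List[str]]) -> int:
    """Return the key (e.g. the interval parameter K) with the lowest mean RMSE."""
    def mean_rmse(k: int) -> float:
        return sum(order_rmse(candidates[k], m) for m in input_maps) / len(input_maps)
    return min(candidates, key=mean_rmse)


# Hypothetical marker orders for one linkage group.
inputs = [["m1", "m2", "m3", "m4"], ["m2", "m3", "m5"]]
candidate_maps = {
    1: ["m1", "m3", "m2", "m4", "m5"],  # swaps m2 and m3
    2: ["m1", "m2", "m3", "m4", "m5"],  # agrees with both inputs
}
best_k = pick_consensus(candidate_maps, inputs)
```

In this toy case the second candidate preserves the order of every input map, so its mean RMSE is zero and it is selected.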

Estimating the genetic position (cM) for unmapped SNP markers in the 9K SNP array
A Perl script was developed to estimate the genetic positions for the unmapped SNP markers in the 9K SNP array using the peach consensus map as a reference. The term "unmapped" designates the markers from the genotyping array that were not mapped in any of the individual maps used for building the consensus map. The genetic position for each unmapped marker was estimated from the two closest mapped SNPs in the peach consensus map reported in this study. The equations use the following terms: delta_bp is the distance in bp between the mapped SNPs in the consensus map; delta_cM is the distance in cM between the mapped SNPs in the consensus map; snp_bp is the physical position of the peach SNP being estimated (in bp); snp1_bp and snp2_bp are the physical positions (bp) of the mapped SNPs immediately to the left and right, and snp1_cM and snp2_cM are their corresponding genetic positions (in cM).
In cases where a SNP was located beyond the last mapped SNP, the delta_cM from the last two mapped SNPs on the linkage group was used, and snp2_bp was set to the position at the scaffold end.
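The equations themselves are not reproduced in this text. The terms defined above are consistent with a linear interpolation between the two flanking mapped SNPs, and the Python sketch below shows that reading. It is an assumption, not the in-house Perl script: the function name and example coordinates are hypothetical, and SNPs outside the mapped range are handled here by extending the slope of the nearest pair of mapped SNPs, which simplifies the scaffold-end rule described above.

```python
from bisect import bisect_left
from typing import List, Tuple


def estimate_cm(snp_bp: int, mapped: List[Tuple[int, float]]) -> float:
    """Estimate a genetic position (cM) from flanking mapped SNPs.

    `mapped` holds (physical position in bp, genetic position in cM) pairs
    for one linkage group, sorted by bp, with distinct bp values.
    """
    if len(mapped) < 2:
        raise ValueError("need at least two mapped SNPs")
    positions = [bp for bp, _ in mapped]
    i = bisect_left(positions, snp_bp)
    # Clamp the index so that SNPs outside the mapped range reuse the
    # first or last pair of mapped SNPs (a simplifying assumption).
    i = min(max(i, 1), len(mapped) - 1)
    snp1_bp, snp1_cm = mapped[i - 1]
    snp2_bp, snp2_cm = mapped[i]
    delta_bp = snp2_bp - snp1_bp
    delta_cm = snp2_cm - snp1_cm
    return snp1_cm + (snp_bp - snp1_bp) * delta_cm / delta_bp


# Hypothetical mapped SNPs on one linkage group.
mapped_snps = [(1_000_000, 2.0), (3_000_000, 6.0), (4_000_000, 7.0)]
inside = estimate_cm(2_000_000, mapped_snps)   # between the first two SNPs
beyond = estimate_cm(5_000_000, mapped_snps)   # past the last mapped SNP
```

Because the slope is taken locally from each pair of flanking SNPs, the estimate follows regional variation in recombination rate rather than applying a single genome-wide conversion factor.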

The improved linkage map for ZC 2 population
The construction of the improved SNP-based linkage map was based on heterozygous SNPs observed in this study combined with SNP marker data previously reported by Frett et al. [17]. A total of 1,478 SNPs were informative in the ZC 2 progeny. Out of those, 2 SNPs were unlinked (0.1%) and 1,476 were used for map construction. Maximum Likelihood mapping successfully mapped 1,476 SNP markers with 351 unique positions (S1 Table; Fig 1).
The revised linkage map of the ZC 2 progeny spanned a total genetic distance of 483.3 cM, with linkage group 1 (LG1) being the longest (95.3 cM) and LG5 the shortest (31.2 cM). The highest number of SNPs mapped to a single linkage group was 263 on LG7 and the lowest was 40 on LG5. The number of unique map positions on a single linkage group ranged from 63 on LG6 to 17 on LG5. The largest gap was observed in LG1 (24.2 cM) between SNP_IGA_103771 and SNP_IGA_120926 (Table 1). SNP marker density per linkage group ranged from 0.96 to 2.58 cM/marker, with an average of 1.38 cM/marker.
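The reported average of 1.38 cM/marker is consistent with dividing the total map length by the number of unique map positions rather than by the total number of SNPs. A minimal Python check, in which the function name is an assumption and the numbers are those reported above:

```python
def mean_marker_spacing(map_length_cm: float, unique_positions: int) -> float:
    """Average genetic distance per unique map position (cM/marker)."""
    return map_length_cm / unique_positions


# Values reported for the improved ZC2 map: 483.3 cM and 351 unique positions.
zc2_spacing = round(mean_marker_spacing(483.3, 351), 2)
```

Dividing by all 1,476 mapped SNPs instead would give about 0.33 cM/marker, because many SNPs share the same map position.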

Comparison of the ZC 2 linkage map with the peach physical map v 2.0
The ZC 2 map covers approximately 82.7% of the peach genome v2.0 (Table 2). LG3 had the largest coverage (96%), while the lowest coverage (26%) was observed on LG5. The improved ZC 2 genetic map had 97.8% of all SNP markers in agreement with their positions on the scaffolds of the peach genome v2.0, with differences in the marker order identified in LGs 1, 2, 3 and 6 (Table 2, Fig 2). LG3 had the highest number of non-collinear SNP markers (28). The recombination rate of different chromosomes was estimated as the quotient between the genetic distance (cM) covered by the corresponding LG and the size in Mb of the chromosome fragment covered with markers. This value ranged from 2.20 cM/Mb on LG6 to 6.53 cM/Mb on LG5, almost a three-fold difference in the recombination rate of the corresponding genomic regions (Table 2).
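The recombination rate quotient described above can be written as a small Python function. The function name and the example coordinates are hypothetical and are not taken from Table 2.

```python
def recombination_rate(genetic_length_cm: float,
                       first_marker_bp: int,
                       last_marker_bp: int) -> float:
    """Recombination rate (cM/Mb) over the chromosome fragment covered by markers."""
    covered_mb = (last_marker_bp - first_marker_bp) / 1_000_000
    return genetic_length_cm / covered_mb


# Hypothetical linkage group: 50 cM spanning markers from 1 Mb to 21 Mb.
example_rate = recombination_rate(50.0, 1_000_000, 21_000_000)
```

Note that the denominator is the marker-covered fragment, not the full chromosome length, so a linkage group with low physical coverage can still show a high rate.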

Comparison of the two versions of the ZC 2 map
The reconstruction of the ZC 2 linkage map resulted in a higher number of mapped markers, from 1,335 SNPs on the existing map [17] to 1,476 SNPs on the improved ZC 2 map. The number of unique SNP positions mapped increased from 190 in the previous map to 351 in the improved map. In addition, the SNP marker density in the improved map (1.38 cM/marker) was higher than that reported in the previous one (2.3 cM/marker). The improved genetic linkage map consisted of eight linkage groups, corresponding to the number of scaffolds in the peach genome, while the previous map consisted of 14 linkage groups.

Consensus genetic map of peach
Four previously published bi-parental linkage maps, WB [18], OC [19], VxV [20], TxE [56] and an improved ZC 2 map developed in this study, were used to construct the consensus peach map. The number of markers mapped on these maps ranged from 1,948 in TxE to 877 in WB.
SNPs that mapped to positions non-collinear with their physical position on the peach genome were removed from individual maps, and 3,092 markers, including 2,975 SNPs, 116 SSRs and one morphological marker (SR) associated with slow ripening in peach [31], were included in the consensus map. A total of 1,416 SNPs were common to at least two linkage maps, with 2,547 anchor points (Table 3). There were 457 anchor points between the VxV and TxE maps, while only 98 anchor points were observed between the WB and ZC 2 maps. LG4 had the highest number of anchor points (648), while the lowest number was detected in LG5 (100). The highest number of common markers among the LGs was observed in LG4 (325) and the lowest in LG5, with only 70 common markers.
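The distinction between common markers and anchor points can be illustrated with a short Python sketch. It assumes that an anchor point is one marker shared by one pair of maps, which is consistent with the pairwise counts reported above and with the anchor point total exceeding the number of common markers; the function name and the toy marker sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def common_and_anchors(maps: Dict[str, Set[str]]) -> Tuple[int, Dict[Tuple[str, str], int]]:
    """Count markers present in at least two maps and pairwise shared markers."""
    occurrences: Dict[str, int] = {}
    for markers in maps.values():
        for name in markers:
            occurrences[name] = occurrences.get(name, 0) + 1
    common = sum(1 for count in occurrences.values() if count >= 2)
    pairwise = {
        (a, b): len(maps[a] & maps[b])
        for a, b in combinations(sorted(maps), 2)
    }
    return common, pairwise


# Hypothetical marker sets for three maps.
toy_maps = {
    "A": {"m1", "m2", "m3"},
    "B": {"m2", "m3", "m4"},
    "C": {"m3", "m5"},
}
n_common, anchors = common_and_anchors(toy_maps)
total_anchors = sum(anchors.values())
```

Here two markers are common to at least two maps, but they contribute four anchor points in total because a marker shared by three maps is counted once for each pair of maps.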

Consensus genetic maps built by MergeMap and LPmerge algorithms
Consensus maps were successfully developed by the MergeMap and LPmerge algorithms. However, mismatches in marker order between the two versions of the consensus map were observed (S2 and S3 Tables). The MergeMap consensus map had a genetic distance of 830.62 cM, with the length of individual LGs ranging from 86.96 cM (LG5) to 143.95 cM (LG1) (S2 Table). The average distance between markers was 0.92 cM and the largest gap was 8.8 cM on LG2. There were 906 uniquely mapped positions, ranging from 156 on LG1 to 76 on LG5 (Table 4; S1 Fig).
The consensus map built with the LPmerge algorithm spanned 537.92 cM, with the length of individual LGs ranging from 46.6 cM (LG3) to 96.05 cM (LG1) (S3 Table). The average distance between markers was 0.78 cM and the largest gap of 7.31 cM was observed on LG5 (Table 4; Fig 3). The number of uniquely mapped positions was 693, with the lowest in LG3 (59) and the highest in LG1 (121). The LPmerge peach consensus map had the lowest average RMSE and is hereafter referred to as the peach consensus map (S4 Table).

Comparison of the peach consensus map with the peach physical map v2.0
The physical length of the peach consensus map was estimated to cover approximately 98% of the pseudomolecules of peach genome v2.0 with most of the scaffolds having a coverage above 95%, except for scaffold 5 (91.0%). The recombination rate of different chromosomes ranged from 1.63 cM/Mb on LG4 to 3.77 cM/Mb on LG5. The consensus map was collinear with the peach genome revealing complete agreement in the SNP marker order (S2 Fig; Table 5). The peach consensus map was used as a reference with a Perl script (developed in-house) to calculate genetic positions of markers from the peach 9K array, and the genetic positions of 6,019 unmapped SNP markers were provided in S5 Table.

The improved linkage map for ZC 2 population
Genotyping of 65 additional F 2 individuals from the cross 'Zin Dai' and 'Crimson Lady' improved the existing ZC 2 map [17] and resulted in a map with better resolution and more unique marker positions (Table 1), as well as higher marker density (from 2.4 to 1.38 cM/marker). The first version of the map covered 61.6% of the pseudomolecules of the peach genome, while the improved map covered 82.7%. Genetic length (483.3 cM) and SNP density (1.38 cM/SNP) of the improved ZC 2 map were similar to previously reported SNP maps in peach [25,28,19]. The new ZC 2 map had a higher marker density than the other maps based on the 9K SNP array [28,19]. The observed gaps on LGs 1 and 6 (24.2 and 23.4 cM, respectively) agreed with those reported by Yang et al. [19] and Frett et al. [17], who used the same genotyping strategy.
Marker order comparison between the ZC 2 genetic map and the physical map, based on peach genome v2.0, revealed discrepancies in marker positions across LGs 1, 2, 3 and 6. Non-collinearity in other peach maps has been reported when both the peach genome v1 [17,18,24,25] and v2.0 [28] were used for comparison. Non-collinearity in marker order could be due to specific characteristics of the population, such as size, presence of chromosome rearrangements, and/or linkage mapping and genotyping errors. It could also indicate misassemblies in the peach genome sequence v2.0 [55]. The improved ZC 2 map provides an excellent resource for mapping QTLs associated with fruit quality and phytochemical compounds, since the ZC 2 progeny segregate for many traits including flowering and ripening time, blush, fruit size, flesh adhesion and texture, and phytochemical content [58]. Thus, the improved ZC 2 map provides a valuable tool for future work to better understand genetic mechanisms that control these traits in peach.

Table 3. Comparison between five peach genetic maps for common markers and anchor points across different linkage groups used to construct a consensus genetic map. Columns: Linkage Maps, LG1 to LG8.

Consensus genetic map of peach
The peach research community has been using a Prunus genetic map based on an interspecific cross between almond 'Texas' and peach 'Earlygold' (TxE) [3,56,59] as a reference for establishing linkage group orientation and comparative QTL studies. Prior to the availability of the peach genome sequence, the TxE map was a valuable tool as a source of mapped and transferable markers (mainly SSRs and RFLPs) for the construction of low density maps and the comparison between intraspecific peach and other Prunus species maps [59]. The release of the peach genome sequence [55,60] triggered the development of the 9K peach SNP array [16] and promoted genetic studies in peach using a common genotyping strategy [17,18,19,20,56]. This established the foundation for the development of the peach consensus map reported in this study.
The five highly saturated maps used for building the consensus peach map were based on SSR and SNP markers [20,56] or exclusively SNP markers [17,18]. The high number of common markers (1,416) and anchor points (2,547) facilitated the integration of the individual linkage maps into the consensus map and provided reliable information about SNP marker order and genetic distance in the consensus map. The number of anchor points observed in the peach consensus map was higher than that observed in the consensus maps developed for apple [5,39] and pear [38]. The MergeMap algorithm resulted in a consensus map with a greater genetic length (830.62 cM) and a lower marker density (0.92 cM/marker) compared to the LPmerge algorithm (537.92 cM and 0.78 cM/marker, respectively). A possible explanation for the observed differences between the two algorithms is that MergeMap assigned unique positions to most of the markers, while LPmerge binned markers into the same map positions. Thus, the non-binning attribute of MergeMap produced a greater genetic length of the consensus map [47]. Overestimated genetic length in consensus maps constructed by MergeMap was previously reported in pear [38], barley [43], Pinus taeda and Pinus elliottii [47]. On the other hand, the genetic length of the LPmerge peach consensus map was within the range of the five individual maps used in this study (336.0-536.6 cM). In addition, each algorithm ordered markers differently in the consensus map, resulting in non-collinearity of the MergeMap peach consensus map with peach genome v2.0. A possible explanation is that the simplified consensus graphs of MergeMap were not ordinally equivalent to the original linkage maps used for building the consensus map [61]. The LPmerge map had the lowest RMSE compared to the input maps and was chosen as the consensus map.
The peach consensus map described here exhibited approximately 98% coverage and full SNP collinearity with the pseudomolecules/scaffolds of the peach genome v2.0 [55], which is similar to the coverage obtained with consensus maps developed for apple [5] and pear [38]. The high level of genome coverage and collinearity supports the correct positioning of the markers in the consensus map, which emerges as a reliable tool for future genetic studies such as QTL mapping and candidate gene analyses [5]. This is, to our knowledge, the most comprehensive peach consensus map constructed thus far. Although two consensus peach maps have been previously reported, their application is limited either by a small number of genotypes providing recombination events and a genotyping platform less common in the peach community [25], or by being developed not by merging individual peach linkage maps but by calculating genetic positions [30]. The consensus map reported in this study is an alternative source of information for calculating genetic positions of unmapped markers in the 9K peach SNP array and for pedigree-based QTL mapping [34].

Conclusions
In this study, we genotyped 65 additional F 2 individuals using the 9K SNP array and significantly increased the resolution of the previously published ZC 2 map. Using the improved ZC 2 map with four other high-density linkage maps (all genotyped with the 9K SNP array), we developed a high-resolution consensus map for peach using the LPmerge algorithm. The peach consensus linkage map contains a total of 3,092 molecular markers (2,975 SNPs, 116 SSRs and 1 morphological marker associated with slow ripening in peach) and 2,547 anchor points, and covers approximately 98% of the physical length of the peach genome v2.0. This consensus genetic linkage map represents the most comprehensive peach map available to date and could serve as a new reference map for peach. The consensus map provides valuable information on marker order and genetic position for QTL identification and molecular marker development in peach and for other genetic studies within Prunus and Rosaceae.