Development and Evaluation of a Genome-Wide 6K SNP Array for Diploid Sweet Cherry and Tetraploid Sour Cherry

High-throughput genome scans are important tools for genetic studies and breeding applications. Here, a 6K SNP array for use with the Illumina Infinium® system was developed for diploid sweet cherry (Prunus avium) and allotetraploid sour cherry (P. cerasus). This effort was led by RosBREED, a community initiative to enable marker-assisted breeding for rosaceous crops. Next-generation sequencing of diverse breeding germplasm provided 25 billion basepairs (Gb) of cherry DNA sequence. From this sequence, genome-wide SNPs were identified for sweet cherry and for the two sour cherry subgenomes, derived from sweet cherry (avium subgenome) and P. fruticosa (fruticosa subgenome). Anchoring the reads to the peach genome sequence, recently released by the International Peach Genome Initiative, predicted the relative physical locations of the 1.9 million putative SNPs detected, which were preliminarily filtered to 368,943 SNPs. Further filtering was guided by results for a 144-SNP subset examined with the Illumina GoldenGate® assay on 160 accessions. A 6K Infinium® II array was designed with SNPs evenly spaced genetically across the sweet and sour cherry genomes. SNPs were developed for each sour cherry subgenome by using minor allele frequency in the sour cherry detection panel to enrich for subgenome-specific SNPs, and then targeting each SNP to one subgenome according to the alleles observed in sweet cherry. The array was evaluated using panels of sweet (n = 269) and sour (n = 330) cherry breeding germplasm. Approximately one third of array SNPs were informative for each crop. A total of 1825 polymorphic SNPs were verified in sweet cherry, 13% of which were originally developed for sour cherry. Allele dosage was resolved for 2058 polymorphic SNPs in sour cherry, one third of which were originally developed for sweet cherry.
This publicly available genomics resource represents a significant advance in cherry genome-scanning capability that will accelerate marker-locus-trait association discovery, genome structure investigation, and genetic diversity assessment in this diploid-tetraploid crop group.


Introduction
Within Prunus (Rosaceae), two cherry species, sweet cherry (P. avium) and sour cherry (P. cerasus), are highly valued for their excellent-quality fruit. These two species represent a natural diploid-tetraploid series, with the tetraploid sour cherry (2n = 4x = 32) having arisen through natural hybridization between sweet cherry (2n = 2x = 16) and the wild tetraploid ground cherry (P. fruticosa) [1,2]. In cherry, linkage maps have been constructed mainly for the genetically less complex sweet cherry and are based primarily on simple sequence repeat (SSR) markers [3,4,5,6]. The usefulness of these maps for other studies, such as quantitative trait locus (QTL) discovery, is limited by the low throughput of SSR genotyping and, in many cases, by low marker density and the low level of SSR polymorphism in cultivated sweet cherry germplasm.
High-throughput, low-cost next-generation sequencing (NGS) is a powerful approach for identifying single nucleotide polymorphisms (SNPs) that can be used as markers in high-density genome scans. This approach has been used successfully to develop genome-scan platforms for diploid crops such as apple [7], maize [8], and rice [9], and for polyploid crops such as potato [10,11] and wheat [12]. Although low diversity in a crop still reduces the proportion of markers that are polymorphic, the sheer number of markers that NGS can efficiently generate overcomes this historical limitation. A publicly available 9K SNP array for peach, recently developed by an international consortium, is widely used for advanced genetic studies of peach [13]. The development of this array was facilitated by the release of the peach (Prunus persica) reference genome sequence by the International Peach Genome Initiative [14]. The high quality of the peach sequence was reinforced by high congruence between the genetic and physical map positions of peach array SNPs (I. Verde, pers. comm.).
The extensive synteny conserved among diploid Prunus species [3] suggests that the peach genome sequence can serve as a template for developing genome-wide marker sets for Prunus crops that lack a high-quality genome sequence. For example, comparisons of available sweet cherry genetic maps with the high-density peach × almond 'T×E' Prunus reference genetic map [3] identified extensive co-linearity [4], yet complete co-linearity could not be rigorously tested because the largest sweet cherry mapping population used consisted of just 118 individuals [6]. Dirlewanger et al. [15] estimated the genome size of sweet cherry to be 338 Mb.
Herein we use a cherry-peach comparative genomics strategy to develop a moderate-density cherry SNP array, relevant for sweet and sour cherry breeding germplasm, from SNPs discovered with next-generation sequencing platforms. This effort was led by RosBREED, a community initiative to enable marker-assisted breeding for rosaceous crops [16], which also led the recent development of equivalent SNP arrays for apple [7] and peach [13].

Materials and Methods
The workflow and design parameters described below are summarized in Figure 1.
Whole genome re-sequencing of cherry breeding accessions
A SNP detection panel of 16 sweet and 8 sour cherry accessions was chosen for whole-genome, low-coverage re-sequencing (Table 1). The accessions were founders, intermediate ancestors, or important parents used in U.S. breeding programs. For each accession, paired-end libraries were prepared following the manufacturer's protocols (Illumina Inc., San Diego, CA, USA). For sweet cherry, equimolar amounts of four libraries were pooled (creating four pools in total), whereas for sour cherry, with twice the genome size of sweet cherry, equimolar amounts of two libraries were pooled (creating another four pools in total) (Table 1). Each library pool was sequenced in one lane of an Illumina GA II with 80 cycles per read at the Center for Genome Research and Biocomputing (CGRB; Oregon State University, Corvallis, OR, USA). The raw sequence data were kept separate for each cherry accession and aligned to the reference genome of 'Lovell' peach [14] using SOAP [17] with parameters M = 4 (find best hits for each seed, ≤2 mismatches allowed), r = 1 (repeats aligned once, at random), and v = 2 (≤2 mismatches allowed).

Detection and Stage 1 filtering of SNPs
SNPs in the re-sequenced accessions were detected using SOAPsnp (http://soap.genomics.org.cn/soapsnp.html) as recommended by Li et al. [17]. The detected SNPs were filtered (''Stage 1'' filtering) and retained if: (1) SOAPsnp's ''quality score'' metric for the consensus genotype was greater than 30; (2) the sequencing depth at the putative SNP position was at least 8; and (3) the sequencing depth at the putative SNP position was no greater than 1254, corresponding to the average read depth of all SNPs plus three standard deviations. Exonic, intronic, and intergenic regions were defined using the Peach v1.0 'dhLovell' genome annotation [14]. This filtration yielded ''Stage 1 SNPs''.
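The three Stage 1 criteria amount to a simple record filter. The sketch below is illustrative only: the `SnpCall` record and its field names are hypothetical stand-ins for values parsed from SOAPsnp output (parsing is not shown), and the depth ceiling can either be passed in directly (1254 in this study) or recomputed as the mean depth plus three standard deviations.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SnpCall:
    """Hypothetical record for one putative SNP parsed from SOAPsnp output."""
    chrom: str
    pos: int
    quality: float  # SOAPsnp consensus-genotype quality score
    depth: int      # sequencing depth at the putative SNP position


def depth_ceiling(calls, n_sd=3.0):
    """Mean read depth over all SNPs plus n_sd standard deviations (needs >= 2 calls)."""
    depths = [c.depth for c in calls]
    return mean(depths) + n_sd * stdev(depths)


def stage1_filter(calls, min_quality=30, min_depth=8, max_depth=None):
    """Keep calls with quality > min_quality and min_depth <= depth <= max_depth."""
    if max_depth is None:
        max_depth = depth_ceiling(calls)
    return [
        c for c in calls
        if c.quality > min_quality and min_depth <= c.depth <= max_depth
    ]
```

With the fixed ceiling used here, the call would be `stage1_filter(calls, max_depth=1254)`. The upper depth bound is there to drop positions with unusually high coverage, which typically come from repetitive sequence collapsed onto one reference location.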

SNP validation with the GoldenGateH assay
A set of 144 SNPs was chosen from among the Stage 1 SNPs to validate the efficiency of SNP detection and to adjust subsequent filtering parameters (Figure 1). The initial choice comprised 80 Stage 1 SNPs evenly spaced over the sweet cherry genome and 40 SNPs evenly spaced over the sour cherry genome. Because no whole genome sequence was available for cherry, the whole genome sequence of peach, i.e., the eight pseudomolecules of the haploid chromosomes and linkage groups (LGs) of the Peach v1.0 'dhLovell' genome assembly [14], was used as the proxy cherry genome. One sweet cherry SNP was chosen within 100 thousand basepairs (kb) of each end of each LG. An even number of further sweet cherry SNPs was then chosen between these ends, evenly spaced physically along each LG according to the LG genetic distances of [3], corresponding to one SNP every 2-4 million basepairs (Mb). The spacing of the 80 sweet cherry SNPs across all LGs averaged 3.00 Mb (standard deviation of 0.52 Mb), ranging from a minimum average of 2.40 Mb (±0.11 Mb) for LG8 to a maximum average of 3.8 Mb (±0.09 Mb) for LG2. A total of 40 sour cherry SNPs (many of which were also SNPs for sweet cherry) were chosen equidistant between pairs of sweet cherry SNPs (first and second, third and fourth, etc.), such that the average physical distance between sour cherry SNPs was twice that between sweet cherry SNPs. For both crop sources of the 120 SNPs, 40% were chosen to be located in exons (CDS) of annotated genes, 20% in introns, 20% in 5′ or 3′ untranslated regions (UTR; outside genes but within 2 kb of start or stop codons), and the final 20% in intergenic regions. A further 16 sweet cherry SNPs and eight sour cherry SNPs spanned an 863 kb region on LG2 between the simple sequence repeat markers CPSCT038 and BPPCT034, the location of a major trait locus associated with fruit size [18].
While these further 24 trait locus-targeted SNPs were chosen for variation in genic regions where possible, preference was given to achieving uniform physical spacing in the designated windows. Approximately 20% of the 144 validation SNPs were planned to be accession-specific, i.e., their minor allele would be detected in only one re-sequenced accession of the detection panel. Twenty-two of the sweet cherry SNPs and nine of the sour cherry SNPs met this criterion within accessions of their respective crops, and a further eight sweet cherry SNPs were accession-specific within sour cherry accessions of the detection panel. Accession-specific SNPs were from 12 of the 16 sweet cherry accessions and five of the eight sour cherry accessions. The 144 SNPs also deliberately included a wide range of minor allele frequencies (MAFs).
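The physical spacing scheme for the validation SNPs (SNPs near each LG end, evenly spaced targets in between, and sour cherry SNPs midway between successive pairs of sweet cherry SNPs) can be sketched as nearest-candidate selection. This is a hypothetical simplification: it works in peach physical coordinates only, ignores the genic-region quotas, and the function names are illustrative rather than the authors' scripts.

```python
from bisect import bisect_left


def even_targets(first, last, n):
    """n target positions evenly spaced from first to last inclusive (n >= 2)."""
    step = (last - first) / (n - 1)
    return [first + i * step for i in range(n)]


def pick_nearest(candidates, targets):
    """For each target, pick the closest candidate position not already chosen.

    Assumes there are at least as many candidates as targets.
    """
    pool = sorted(candidates)
    chosen = []
    for t in targets:
        i = bisect_left(pool, t)
        # The closest candidate is at the insertion point or just before it.
        options = pool[max(i - 1, 0):i + 1]
        best = min(options, key=lambda p: abs(p - t))
        chosen.append(best)
        pool.remove(best)
    return chosen


def midpoint_targets(chosen):
    """Targets midway between the 1st and 2nd, 3rd and 4th, ... chosen SNPs."""
    return [(a + b) / 2 for a, b in zip(chosen[0::2], chosen[1::2])]
```

Feeding the sweet cherry picks to `midpoint_targets` and then to `pick_nearest` over sour cherry candidates yields roughly half as many sour cherry SNPs at about twice the spacing, as described above.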
To examine associations between filtering parameters and genotyping efficiency, validation panels of 79 sweet cherry (Table S1) and 81 sour cherry (Table S2) accessions were genotyped for the 144 SNPs described above. Individuals in the validation panels were founders, intermediate ancestors, and important breeding parents of modern cherry cultivars, and included the 24 accessions of the SNP detection panel. Genomic DNA was purified from each accession using the E-Z 96 Tissue DNA Kit (Omega Bio-Tek, Inc., Norcross, GA, USA). DNA was quantified with the Quant-iT™ PicoGreen® Assay (Invitrogen, Carlsbad, CA, USA) using the Victor multiplate reader (Perkin Elmer Inc., San Jose, CA, USA). Concentrations were adjusted to a minimum of 50 ng/µl in 5 µl aliquots and submitted to the Research Technology Support Facility at Michigan State University (East Lansing, MI, USA), where the GoldenGate® assay was performed following the manufacturer's protocol (Illumina Inc.). After amplification, PCR products were hybridized to VeraCode microbeads via the address sequence for detection on a BeadXpress Reader. SNP genotypes were scored with the Genotyping Module of GenomeStudio Data Analysis software v2010.3 [19].

SNP final choice for RosBREED 6K array
Figure 1. Workflow for SNP detection, validation, and final choice in development of the RosBREED 6K cherry SNP array v1. Stage 1 filtered 1.9 million cherry SNPs anchored to the peach genome to almost 40K SNPs. More stringent filtering criteria in Stage 2, guided by a prior validation step with a small SNP subset examined for a range of potential filters, putatively enriched the quality of the remaining 32K SNP pool. Finally, the 6K array SNPs were chosen from among Stage 2 SNPs by attempting to achieve even genetic spacing over species genomes and subgenomes with pre-determined proportional allocations, after preferential inclusion of certain SNPs. ADT = Illumina's Assay Design Tool. MAF = minor allele frequency. doi:10.1371/journal.pone.0048305.g001
Two further rounds of filtering were applied, informed by the validation results. Stage 1 SNPs were converted to Illumina Assay Design Tool (ADT) format with custom scripts and their ADT scores calculated; only SNPs with a score ≥0.90 were retained. A/T and C/G transversions were removed from further consideration, thus retaining only SNPs of normalization bin ''C'' (for Infinium® II compatibility). SNPs with MAF <0.2 were discarded, as were those not supported by between 10 and 30 reads and by at least five detection panel accessions for sweet cherry or four accessions for sour cherry. For sweet cherry, non-intragenic SNPs were removed, i.e., only exonic and intronic SNPs were considered for the array. For sour cherry, owing to a filtering error, non-intragenic rather than intragenic SNPs were included in the final array. SNPs not assigned to the first eight pseudomolecules, which represent peach's eight chromosomes, were excluded. Non-uniquely anchored SNPs (those for which at least 90% of the 60 bp flanking sequence was anchored to more than one peach genome location) were also removed, as were SNPs with any other detected SNP within 20 bp.
For sour cherry, SNPs predicted to be between the avium and fruticosa subgenomes (i.e., AABB) or polymorphic in both subgenomes (i.e., ABAB) were removed (strategy described below). This filtering process yielded ''Stage 2 SNPs'' ( Figure 1).
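The Stage 2 criteria can be combined into one predicate. This is a minimal sketch under stated assumptions: the `Stage1Snp` record, its field names, and the integer chromosome numbering (1-8 for the peach pseudomolecules, anything else treated as unplaced) are hypothetical, the 10-30 read window is taken as inclusive, and the sour cherry subgenome screen is handled separately.

```python
from dataclasses import dataclass

# A/T and C/G SNPs cannot be resolved by Infinium II two-colour chemistry.
AMBIGUOUS_PAIRS = ({"A", "T"}, {"C", "G"})


@dataclass
class Stage1Snp:
    """Hypothetical per-SNP summary assembled from detection-panel data."""
    chrom: int            # peach pseudomolecule number (1-8); other values = unplaced
    pos: int
    alleles: tuple        # e.g. ("A", "G")
    adt_score: float      # Illumina ADT design score
    maf: float            # minor allele frequency in the detection panel
    reads: int            # reads supporting the SNP
    n_accessions: int     # detection-panel accessions supporting the SNP
    region: str           # "exon", "intron", "utr" or "intergenic"
    unique_anchor: bool   # flanking sequence anchored to a single peach location


def has_close_neighbour(snp, positions, window=20):
    """True if another detected SNP lies within `window` bp on the same chromosome."""
    return any(c == snp.chrom and p != snp.pos and abs(p - snp.pos) <= window
               for c, p in positions)


def passes_stage2(snp, crop, positions):
    """Apply the Stage 2 criteria; `crop` is "sweet" or "sour"."""
    min_accessions = 5 if crop == "sweet" else 4
    return (
        snp.adt_score >= 0.90
        and set(snp.alleles) not in AMBIGUOUS_PAIRS
        and snp.maf >= 0.2
        and 10 <= snp.reads <= 30
        and snp.n_accessions >= min_accessions
        # The intragenic requirement was applied to sweet cherry only.
        and (crop != "sweet" or snp.region in ("exon", "intron"))
        and 1 <= snp.chrom <= 8
        and snp.unique_anchor
        and not has_close_neighbour(snp, positions)
    )
```

Here `positions` is the list of (chromosome, position) pairs for all detected SNPs, so the 20 bp neighbour check also sees SNPs that failed other filters.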
Subgenome specificity of sour cherry SNPs was deduced by the following strategy. MAF within the sour cherry detection panel was used as a proxy for allele dosage. SNPs with a ratio close to 1:1 between the two alternative alleles were assumed to represent either SNPs between the avium and fruticosa subgenomes (i.e., AABB), which are not expected to segregate, or SNPs segregating in both subgenomes (i.e., ABAB); neither type was desirable for the final array. Sour cherry SNPs were tested for divergence from a 1:1 ratio by Chi-squared analysis, and those significantly different (p<0.05) were assumed to represent subgenome-specific SNPs (i.e., ABAA or AAAB). Finally, to determine whether each such SNP was polymorphic within the avium or the fruticosa subgenome, sequence information from the sweet cherry detection panel accessions was consulted: presence of the rare sour cherry allele in sweet cherry was taken to indicate polymorphism within the avium subgenome, whereas absence of the allele in sweet cherry indicated a fruticosa subgenome SNP.
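The dosage screen and subgenome assignment can be sketched as follows. The sketch assumes the test is applied to counts of the two alleles (for example, allele observations pooled across the sour cherry detection panel); the exact counts used in the study are not specified, and the function names are hypothetical. For one degree of freedom, the chi-squared survival function reduces to erfc(√(x/2)), so no statistics library is needed.

```python
from math import erfc, sqrt


def chi2_1to1_pvalue(n_ref, n_alt):
    """Chi-squared goodness-of-fit p-value for two allele counts vs. a 1:1 ratio.

    Assumes n_ref + n_alt > 0.
    """
    expected = (n_ref + n_alt) / 2
    stat = ((n_ref - expected) ** 2 + (n_alt - expected) ** 2) / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return erfc(sqrt(stat / 2))


def assign_sour_snp(n_ref, n_alt, rare_allele_in_sweet, alpha=0.05):
    """Return "discard", "avium" or "fruticosa" for a sour cherry SNP."""
    if chi2_1to1_pvalue(n_ref, n_alt) >= alpha:
        # Near 1:1: AABB (fixed between subgenomes) or ABAB (both subgenomes).
        return "discard"
    # Skewed ratio: putatively subgenome-specific (ABAA or AAAB).
    return "avium" if rare_allele_in_sweet else "fruticosa"
```

For the preferentially included LG2 SNPs described below, the same function would be called with `alpha=0.10`.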
Members of the international cherry genomics community requested the inclusion of 487 SNPs in the final array. These ''preferentially included SNPs'' were: 304 sweet cherry RosCOS SNPs [6]; 150 Stage 1 SNPs spanning 3 Mb at an LG2 fruit size locus [18] that passed the Stage 2 filtering criteria, except that intergenic SNPs, SNPs of any MAF, and sour cherry SNPs significantly different from 1:1 dosage at p<0.10 were allowed; 21 GoldenGate®-validated SNPs with high MAF; and 12 pre-validated SNPs from other research programs.
Choosing SNPs for the 6K SNP array took into account the final number designated to each crop (approximately 75% to sweet cherry and 25% to sour cherry), their estimated genetic locations, the subgenome targeted for sour cherry, and the two sources of available SNPs (i.e., Stage 2 SNPs and preferentially included SNPs). Preferentially included SNPs were automatically included, leaving 5513 SNPs to be chosen. Within the ~75% of SNPs allocated to sweet cherry and the ~25% allocated to sour cherry, SNPs were evenly spaced genetically across each crop's genome. For sour cherry, approximately half of the chosen SNPs were targeted to span the avium subgenome evenly and half to span the fruticosa subgenome evenly. The genetic location of each Stage 2 SNP in the cherry genome was estimated by physically anchoring 80 genetically mapped markers (RosCOS, SSR, and CAPS from [6]) to the peach whole genome sequence and then calibrating the genetic location of SNPs between each pair of unambiguously mapped markers according to their physical locations. Genetic locations of Stage 2 SNPs in the T×E Prunus reference genetic map [3] were also determined using the same approach, anchored by 153 SSR and RFLP markers.
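Calibrating a SNP's genetic position from its peach physical coordinate amounts to linear interpolation between the two flanking anchored markers. This is a minimal sketch, assuming the anchors on a chromosome are co-linear and sorted by physical position. Clamping positions outside the anchored span to the terminal marker's cM value is our assumption, because the handling of chromosome ends is not described.

```python
from bisect import bisect_right


def interpolate_cm(pos, anchors):
    """Estimate the genetic position (cM) of a physical position (bp).

    anchors: (physical_bp, genetic_cM) pairs for mapped markers on one
    chromosome, sorted by physical position with no duplicate positions.
    """
    xs = [bp for bp, _ in anchors]
    i = bisect_right(xs, pos)
    if i == 0:               # before the first anchor: clamp
        return anchors[0][1]
    if i == len(anchors):    # at or beyond the last anchor: clamp
        return anchors[-1][1]
    (x0, y0), (x1, y1) = anchors[i - 1], anchors[i]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)
```

Because only the flanking pair of anchors is used, local differences in recombination rate between anchor intervals are carried through to the SNP estimates. Within an interval, however, recombination is assumed to be uniform.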

SNP array evaluation
The RosBREED cherry 6K SNP array v1 was evaluated with panels of sweet cherry (n = 269) and sour cherry (n = 330) germplasm that included a diversity of cultivars, ancestors, founders, and progeny individuals. Together these formed a complex pedigree structure linking North American cultivated germplasm of these crops (Tables S3 and S4). The array, employing exclusively Illumina Infinium® II design probes and dual-color channel assays (Infinium HD Assay Ultra, Illumina), was used for genotyping following the manufacturer's recommendations. SNP genotypes were determined using the GenomeStudio Genotyping Module v2010.3 [19]. All DNA samples were above the GenCall score threshold of 0.15 and were therefore used in further analyses following the protocols and instructions provided [20]. For sweet cherry, after clustering with the GenomeStudio built-in clustering algorithm, Gentrain2 [21], all SNPs were visually examined for an expected maximum of three clusters (AA, AB, and BB) and then classified as failed, monomorphic, or polymorphic. The AB scores were converted into base-pair calls by referencing the Top strand [22]. In contrast to diploid sweet cherry, five genotypes were possible for each SNP in tetraploid sour cherry: AAAA, AAAB, AABB, ABBB, and BBBB. Some sweet cherry accessions (n = 105) were included in the sour cherry GenomeStudio file to help discern the homozygous classes (''AAAA'' and ''BBBB'') and the balanced heterozygous class (''AABB''). Clusters were manually checked and adjusted to the expected genotypic classes following the GenomeStudio polyploid protocol [23]; this version of GenomeStudio (v2010.3) allows more than three clusters (five in the case of sour cherry) to be defined manually.
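In this study, the five tetraploid classes were assigned by manual cluster editing in GenomeStudio. As a rough, hypothetical analogue, and not GenomeStudio's algorithm, a sample can be assigned to the nearest of five cluster centres on GenomeStudio's normalized theta axis, with a no-call when it lies too far from every centre. The centre values and the `max_dist` threshold are illustrative assumptions. Real tetraploid clusters are not necessarily evenly spaced in theta, which is why the centres are supplied rather than fixed.

```python
DOSAGE_CLASSES = ("AAAA", "AAAB", "AABB", "ABBB", "BBBB")


def call_dosage(theta, centres, max_dist=0.08):
    """Assign a tetraploid genotype from a normalized theta value.

    centres: five theta values, one per class in DOSAGE_CLASSES order
    (e.g. positions set after manual cluster editing). Returns None
    (no-call) if theta is more than max_dist from every centre.
    """
    if len(centres) != len(DOSAGE_CLASSES):
        raise ValueError("expected one centre per dosage class")
    dists = [abs(theta - c) for c in centres]
    best = min(range(len(centres)), key=dists.__getitem__)
    return DOSAGE_CLASSES[best] if dists[best] <= max_dist else None
```

The diploid sweet cherry controls are useful for placing centres. Their AA, AB, and BB clusters mark where the AAAA, AABB, and BBBB sour cherry classes should fall, which leaves only the two unbalanced heterozygote centres to be placed between them.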
SNP informativeness in sour cherry was classified using the same criteria as for sweet cherry (i.e., failed, monomorphic, or polymorphic), except that a fourth class, termed ''unresolved polymorphic'', was used for polymorphic markers that exhibited ambiguous clusters. For each polymorphic SNP (excluding unresolved SNPs), the MAF among all cultivar and advanced selection accessions of the evaluation panel was determined with GenomeStudio for sweet cherry and manually for sour cherry.
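The informativeness classes and the MAF calculation can be sketched for genotype calls of either ploidy. The classification here is hypothetical: the study classified SNPs visually, and this sketch uses a high no-call rate (threshold `max_nocall`, an illustrative assumption) as a stand-in for ambiguous clusters. MAF counts each letter of a genotype string as one allele copy, so the same function serves diploid ("AB") and tetraploid ("AAAB") calls. To match the text, the calls should first be restricted to cultivar and advanced selection accessions.

```python
def snp_class(calls, max_nocall=0.1):
    """Classify a SNP from per-sample genotype calls (None = no-call).

    A high no-call rate is used here as a stand-in for ambiguous clusters.
    """
    called = [g for g in calls if g is not None]
    distinct = set(called)
    if len(called) < (1 - max_nocall) * len(calls):
        return "unresolved polymorphic" if len(distinct) > 1 else "failed"
    return "polymorphic" if len(distinct) > 1 else "monomorphic"


def minor_allele_frequency(calls):
    """MAF from genotype strings of any ploidy, e.g. "AB" or "AAAB"."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no genotype calls")
    n_alleles = sum(len(g) for g in called)
    freq_b = sum(g.count("B") for g in called) / n_alleles
    return min(freq_b, 1 - freq_b)
```

Note that a SNP fixed as AABB in every sour cherry accession is classed as monomorphic here, consistent with such between-subgenome differences not segregating.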

SNP detection
A total of 15.7 Gb of sweet cherry and 9 Gb of sour cherry DNA sequence was obtained from 308.6 million 80-base reads generated for the 16 sweet cherry and eight sour cherry accessions (Table 1). Excluding 'Black Republican', which generated a relatively low number of reads (1.6 million), the sweet cherry reads generated on the Illumina platform averaged 3.1× coverage per accession and ranged from ~8.8 million reads in 'Ambrunes' to 20.8 million reads in 'Windsor'. The sour cherry reads averaged 1.9× coverage per accession (assuming a genome size of 599 Mb) and ranged from 5.1 million reads in selection 23 23 (13) to 26.1 million reads in selection R1 (1). Approximately 14.1% of the cherry reads were aligned to the Peach v1.0 'dhLovell' whole genome sequence [14]. A total of 1,900,695 SNPs anchored to the peach genome were identified, including polymorphisms between peach and cherry; approximately a third of these were within-cherry SNPs. Of these, 1,005,660 SNPs (52.9%) passed the Stage 1 filtering criteria, of which 368,948 were within-cherry SNPs: 97,019 within sweet cherry (i.e., polymorphic among sweet cherry detection panel accessions), 320,816 within sour cherry, 63,831 within both crops (included in the previous two figures), and 7472 between but not within sweet and sour cherry (i.e., homozygous in each crop for different alleles) (Figure 1).
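The coverage figures follow from dividing sequenced bases by genome size. The arithmetic below recomputes the per-accession values from the totals in this paragraph, assuming the 338 Mb sweet cherry genome size [15] and the 599 Mb sour cherry size stated above. Because the totals are rounded, the results agree with the reported values only approximately.

```python
def coverage(total_bases, genome_size_bp):
    """Fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


READ_LENGTH = 80       # bases per read
SWEET_GENOME = 338e6   # sweet cherry genome size (Dirlewanger et al. [15])
SOUR_GENOME = 599e6    # sour cherry genome size assumed in the text

# Whole-panel coverage.
sweet_total = coverage(15.7e9, SWEET_GENOME)   # ~46x
sour_total = coverage(9e9, SOUR_GENOME)        # ~15x

# Per-accession coverage; 'Black Republican' (1.6 M reads) is excluded for sweet cherry.
sweet_per_acc = coverage(15.7e9 - 1.6e6 * READ_LENGTH, SWEET_GENOME) / 15  # ~3.1x
sour_per_acc = sour_total / 8                                             # ~1.9x

# Share of anchored SNPs passing Stage 1.
stage1_pass_pct = 100 * 1_005_660 / 1_900_695  # ~52.9%
```

Sour cherry accessions reached a similar per-accession depth to sweet cherry only because two, rather than four, libraries were pooled per lane, which compensated for the roughly doubled genome size.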

SNP validation
SNP performance in the GoldenGate® assay, for a subset of SNPs screened over 160 sweet and sour cherry accessions, depended on SNP source, crop screened, and various parameter scores (Table 2). Sweet cherry SNPs screened with sweet cherry accessions performed similarly to sour cherry SNPs screened with sour cherry accessions, with approximately a third of attempted SNPs polymorphic in both cases (Table 2). Evenly spaced and trait locus-targeted SNPs performed similarly for sweet cherry SNPs/accessions. However, for sour cherry the trait locus-targeted SNPs were more often polymorphic (50% of SNPs compared with the average of 35%) and failed less often (13% compared with the average of 33%). Accession-specific SNPs performed relatively poorly for both cherry types, as did SNPs in UTRs and intergenic regions. Exonic regions provided the best-performing SNPs (42% and 47% polymorphism for sweet and sour cherry, respectively), while intronic SNPs performed only ~80% as well as exonic SNPs. SNPs with ADT scores ≥0.90 were more often polymorphic than SNPs with lower ADT scores. Similarly, SNPs with intermediate MAFs (11-40% for sweet cherry and 21-30% for sour cherry) were more often polymorphic than SNPs with extreme MAFs. Sweet cherry SNPs were less successful on sour cherry accessions (20% polymorphism), and none of the 48 sour cherry SNPs were polymorphic among sweet cherry accessions (Table 2).

SNP final choice
Stage 2 filtering resulted in 31,945 SNPs suitable for the final array. Of these, 28,562 were derived from polymorphism within sweet cherry while the other 3383 were expected to be within sour cherry subgenomes, 1730 indicative of polymorphism within the avium subgenome and 1653 within the fruticosa subgenome. Partial Stage 2 filtering of the 487 preferentially included SNPs resulted in 413 suitable SNPs.
SNP choices for the 6K array consisted of 4408 sweet cherry SNPs (227 of those being RosCOS SNPs) and 1552 sour cherry SNPs (788 avium and 764 fruticosa). After a small degree of loss due to technical issues in array manufacturing by Illumina Inc., 5696 SNPs were included on the final array, with 4214 (74%) targeting the sweet cherry genome (221 RosCOS) and 1482 (26%) targeting the sour cherry genome that consisted of 752 for the avium subgenome and 730 for fruticosa (Table 3). Sweet cherry SNPs targeting each chromosome averaged 527 and ranged from 392 (chromosome 8) to 902 (chromosome 1), which depended on the known genetic length of each chromosome (Table 3). Sour cherry chromosomes of the avium subgenome were targeted with an average of 94 SNPs ranging from 66 (chromosome 5) to 164 (chromosome 1) SNPs, while the fruticosa chromosomes were targeted with an average of 91 SNPs and ranging from 73 (chromosome 4) to 161 (chromosome 1) SNPs (Table 3). Included RosCOS SNPs were well distributed across the genome, accounting for 3.8 to 6.7% of the sweet cherry SNPs for any given chromosome (Table 3).
Approximately a third of the SNPs on the RosBREED cherry 6K SNP array v1 were observed to be polymorphic in the sweet and sour cherry evaluation panels, and some of the SNPs developed for one crop were successful for the other. Informative SNPs were randomly distributed over the genome such that genetic variation was successfully sampled at medium density in any region of the genome (Figure 2).
A total of 1825 SNPs were polymorphic across the sweet cherry evaluation panel of breeding germplasm, representing 32% of the SNPs on the array (Table 3). Most of the remaining SNPs (63% of the array) were monomorphic in sweet cherry, while 4.7% failed (Table 3). Sweet cherry RosCOS SNPs were highly polymorphic (92% of those tested), far more often than non-RosCOS sweet cherry SNPs (38%). Of the 4214 SNPs specifically chosen to target the sweet cherry genome, 1589 (38%) were polymorphic in that crop's evaluation panel, while the remaining 236 SNPs polymorphic in sweet cherry were sour cherry SNPs: 221 avium-targeted (29% of those SNPs) and only 15 fruticosa-targeted (2.9% of SNPs targeted to that subgenome). A large proportion of the sweet cherry-targeted SNPs (2432, 58%) were monomorphic in that crop, while few failed (193, 5%).
The 1825 SNPs polymorphic in the sweet cherry evaluation panel provided an average physical spacing of approximately 120 kb between SNPs across the sweet cherry genome, considering the peach genome as the proxy for cherry (Table 4). Some gaps were closed by the 236 sour cherry SNPs polymorphic in sweet cherry. Chromosomes 5 and 8 had the smallest average gap length between polymorphic SNPs, each just below 100 kb ( Table 4).
The largest single gaps between polymorphic SNPs in the sweet cherry genome were on chromosomes 1 and 2, at just under 1.8 Mb each. These largest physical gaps represented estimated genetic gaps of only 0.73 cM (T×E reference map; [3]) or 1.64 cM (sweet cherry RosCOS map; [6]) for chromosome 1, and only 1.28 cM (T×E reference map) or 1.97 cM (sweet cherry RosCOS map) for chromosome 2. The largest estimated genetic gap between polymorphic SNPs occurred elsewhere on chromosome 1, at 5.1 cM, and was the only estimated gap >5 cM (Table 4). The average genetic coverage achieved by polymorphic SNPs across the sweet cherry genome was one SNP per 0.29 cM (Table 4).
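Per-chromosome gap summaries like those above reduce to differences between consecutive sorted marker positions. This is a minimal sketch that works on physical positions on the peach proxy genome or on estimated cM positions. Whether the reported averages also count the segments before the first and after the last marker is not stated, so excluding chromosome ends here is an assumption.

```python
def gap_stats(positions):
    """Average and largest gap between consecutive markers on one chromosome.

    positions: marker coordinates (bp or cM) on a single chromosome; at least
    two are required. Segments before the first and after the last marker
    (chromosome ends) are not counted.
    """
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    if not gaps:
        raise ValueError("need at least two positions")
    return sum(gaps) / len(gaps), max(gaps)
```

Computing the same statistics twice, once in bp and once in cM, shows that a large physical gap need not be a large genetic gap, as for the ~1.8 Mb gaps on chromosomes 1 and 2.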
Half of the polymorphic SNPs in sweet cherry had a MAF
In a ~1.2 Mb region spanning the self-incompatibility S locus on Prunus LG6 in the sweet cherry cross 'New York 54' × 'Emperor Francis', offspring representing the four possible nonrecombinant haplotypes were identified (Figure 4A). One individual, 3 (56), was identified as carrying a recombination between 'New York 54' haplotypes, with the recombination site localized to a ~400 kb, ~1.5 cM interval (T×E reference map). In this sweet cherry cross, all sour cherry-derived SNPs were homozygous in this region.
SNP haplotypes constructed for the same S locus region could also be followed from parents to offspring in the sour cherry cross 'Ujfehertoi Furtos' × 'Surefire' (Figure 4B). The two sour cherry cultivars shared two of their four S-alleles (S4 and S35), for which their SNP haplotypes were identical across the examined region; the S4 haplotype was also shared with the sweet cherry cultivar 'Emperor Francis' (Figure 4A and 4B). A recombination between the S4 and S139 haplotypes of 'Surefire' was identified in progeny individual 27-03-29, localized to a ~100 kb, ~0.5 cM interval. Nine of the 11 sweet cherry-derived SNPs were informative in this cross because they were heterozygous within at least one of the subgenomes. In this sour cherry cross, three of the four sour cherry-targeted SNPs were heterozygous only in the fruticosa subgenome, while the polymorphism observed for the fourth, ss490550286, was between but not within the avium and fruticosa subgenomes (Figure 4B). One sweet cherry-targeted SNP, ss490556254, was polymorphic in the sour cherry cross, but its dosage could not be resolved. Another sweet cherry-targeted SNP, ss490556251, was polymorphic in both the avium and fruticosa subgenomes (Figure 4B; also indicated in Figure 2).
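Localizing a recombination, as for individuals 3 (56) and 27-03-29, amounts to finding where an offspring switches from matching one parental haplotype to the other at consecutive informative SNPs. The sketch below is hypothetical. It assumes that the allele inherited from the parent of interest is already known at each SNP, which in practice is inferred from both parents' genotypes. It also does not model genotyping errors, so a single miscalled SNP would appear as two switches.

```python
def crossover_intervals(hap1, hap2, inherited, positions):
    """Locate switches in parental origin along a chromosome segment.

    hap1, hap2: the two phased haplotypes of one parent (one allele per SNP).
    inherited: the allele the offspring received from that parent at each SNP.
    positions: SNP coordinates in the same order.
    Only SNPs where hap1 and hap2 differ are informative. Returns a list of
    (left, right) position pairs bracketing each detected switch.
    """
    origin = []  # (position, 1 or 2) at informative SNPs
    for a1, a2, x, pos in zip(hap1, hap2, inherited, positions):
        if a1 != a2 and x in (a1, a2):
            origin.append((pos, 1 if x == a1 else 2))
    return [(p0, p1) for (p0, o0), (p1, o1) in zip(origin, origin[1:]) if o0 != o1]
```

The width of each returned interval is set by the spacing of informative SNPs, which is why denser polymorphic SNPs narrow recombination sites to intervals of a few hundred kb or less.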

Public availability of SNP information
All 5696 SNPs on the RosBREED cherry 6K SNP array v1 (4214 sweet cherry and 1482 sour cherry; Tables S5, S6, S7) were deposited in NCBI's dbSNP repository [24], available at www.ncbi.nlm.nih.gov/projects/SNP, and at the Genome Database for Rosaceae (GDR; [25]) at www.rosaceae.org. GDR provides a downloadable Excel file on the peach genome project page containing the genomic locations, flanking sequences, and web links to a GBrowse viewer for these cherry SNPs on the peach genome.

Discussion
Table 5. SNP informativeness in sour cherry for the eight sets of chromosomes based on whether the SNP was derived from polymorphism in sweet cherry or in one of the two sour cherry subgenomes (i.e., avium or fruticosa). [Table body: number of SNPs by chromosome and SNP source.] The 330 sour cherry accessions evaluated are listed in Table S3.
The RosBREED cherry 6K SNP array v1 was determined to be useful for various genetic studies of cultivated cherry, despite the use of a relatively low number of detection panel accessions, each sequenced at low depth. The SNP detection panel accessions, 16 for sweet cherry and eight for sour cherry, achieved a total depth of genome coverage (46.3× for sweet cherry and 15× for sour cherry) that was lower than that of recent SNP detection and array development efforts for peach and apple, two of cherry's rosaceous relatives. The recently developed International Peach SNP Consortium peach 9K SNP array v1 used 56 accessions that achieved 118× total genome coverage, and 84.3% of its SNPs were informative when evaluated on diverse breeding germplasm [13]. The 27 accessions used for the International RosBREED SNP Consortium apple 8K SNP array v1 achieved 89× total genome coverage, and 70.6% of its SNPs were informative [7]. However, sequencing depth per accession was similar among all these efforts, averaging 3.1× genome coverage per accession for sweet cherry and 2.0× for sour cherry (Table 1), 2.2× for peach [13], and 3.3× for apple [7]. Per-accession coverage for sour cherry was comparable because it was sequenced at twice the depth of sweet cherry, offsetting its doubled genome size. As the proportion of informative cherry SNPs was only about half that of peach and apple (only 38% and 46% of those targeted to sweet cherry and sour cherry breeding germplasm, respectively), the relative limitation of the cherry SNP detection panel was either the number or the relevance of its accessions. The number of accessions was a compromise based on the available budget and the intended allocation of 75% of the array to sweet cherry and 25% to sour cherry, using state-of-the-art NGS technology (Illumina GA II). Because the array was intended to characterize genetic variation in cultivated cherry, the few accessions of the detection panel were carefully chosen to efficiently represent cherry breeding germplasm.
For both sweet and sour cherry, choices were based on prior knowledge of geographic origin, pedigree, SSR and RosCOS SNP diversity (sweet cherry, [6]) or isozyme diversity (sour cherry, [26]), and the potential to confer disease resistance [27,28]. The choice of detection panel appears to have been suitable, because the array performed as well on the original detection panel accessions as on other breeding germplasm, including material unconnected by pedigree. With the large number of SNPs on the array, the observed levels of polymorphism and heterozygosity translate into unprecedented resolution for assaying genetic variation in cherry. In sweet cherry, for example, based on the polymorphism and heterozygosity observed in the sweet cherry evaluation panel, the array is expected to reveal 1500-2000 polymorphic genome-wide SNPs for any given set of cultivars and 400-700 SNPs heterozygous within any given cultivar. This estimate is supported by [29], who reported 515 to 634 heterozygous SNPs for each of four cultivars when this 6K array was used as the basis for the two highest-density genetic maps of a Prunus species to date. Therefore, the set of SNPs on the array, developed from a small but diverse set of breeding-relevant accessions, is expected to be generally informative for cherry germplasm in cultivation and in breeding programs. The proportion of polymorphic markers verified in the sour cherry germplasm was nearly double that found in the sweet cherry set. However, dosage could be resolved for only approximately half of these SNPs. Therefore, despite sour cherry having twice the number of chromosomes (and with intergenic rather than intragenic SNPs used for this crop), the final number of resolvable polymorphic sour cherry SNPs was only slightly greater than that for sweet cherry (2058 versus 1825).
Given that only 314 of these markers were chosen to putatively target the fruticosa subgenome, the avium subgenome of sour cherry is expected to have considerably denser marker coverage than the fruticosa subgenome. However, some of the sweet cherry SNPs that were polymorphic in sour cherry probably lie within the fruticosa subgenome, given that polymorphism does transfer between subgenomes: of the 236 sour cherry SNPs polymorphic in sweet cherry, 215 were avium-targeted and the remaining 21 were fruticosa-targeted. Here, a comparative genomics approach relied heavily on the peach reference genome sequence, both for SNP detection (aligning sequence reads) and for the final selection of SNPs for the array (evenly spanning the "cherry" genome rather than at random positions). Cherry and peach belong to the same genus and share the same chromosome number and apparent genetic colinearity [3,4,6]. However, the two crops are in divergent subgenera [30] and are not cross-compatible; therefore, there may be substantial genomic differences at the micro-syntenic and DNA sequence levels. Sequence-level divergence may account for the low proportion (14.1%) of raw cherry reads that could be aligned to the peach genome under the strict alignment criteria used here, meaning that only sequences from the regions most conserved between the cherry and peach genomes were interrogated for SNPs. If the 86% of unaligned cherry reads reflect substantial sequence-level divergence in localized regions (spanning a few cM or more), the array will be non-informative for those regions of the cherry genome. Similarly, large-scale deletions in the peach genome relative to cherry would leave regions unsampled. If such deletions occurred at chromosome ends, even successful genetic mapping of these cherry SNPs would not reveal the missing regions. Translocations between the two subgenera that occur between previously genetically mapped markers, on a scale of up to millions of base pairs or several cM, are not expected to affect the array's genome coverage.
Because the genetic positions used to space SNPs evenly across the "cherry" genome were based on RosCOS SNPs previously mapped in the sweet cherry genome, SNP spacing should not be affected by any differences in recombination rate between peach and cherry.
The high degree of observed monomorphism appears to be due to the limited total sequencing depth of the SNP detection panel. Of the sweet cherry SNPs that did align to the peach genome and passed the various filters for inclusion on the array, the large proportion that were monomorphic (58%) and the low failure rate (5%) in sweet cherry accessions indicate that sequences were accurately aligned to unique locations in the peach genome but that many detected SNPs were false positives. The low failure rate (1%) of sour cherry SNPs in sour cherry accessions likewise indicates successful sequence alignment to the peach genome, while the relatively low monomorphism rate (13%) suggests that most of the originally detected sour cherry SNPs were true SNPs for that crop. However, if the unresolved polymorphic sour cherry SNPs were actually monomorphic, with their variable genotype clustering (apparent polymorphism) caused by sequence variation in flanking sequences, the adjusted monomorphism rate for sour cherry (53%) would be similar to that of sweet cherry. For peach and apple, combined monomorphism and failure rates were lower, totaling 16% for peach [13] and 28% for apple [7], probably due to the greater total sequencing depth of their detection panels, as discussed above. Indeed, there is a strong linear relationship between polymorphism rate (P, in %) and total sequencing depth (D, in × genome coverage) among the three diploid crops apple, peach, and sweet cherry (P = 0.75D, R² = 0.96). This suggests that, with similar filtering parameters, false positives due to sequencing errors and assay failures due to undetected polymorphism in SNP-flanking sequences could be effectively avoided with a total sequencing depth of at least 133×, the depth at which the fitted P reaches 100%.
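As a rough illustration, the fitted relationship P = 0.75D can be evaluated at a given total depth and inverted to find the depth at which predicted polymorphism reaches 100%. This is a minimal sketch under stated assumptions: P is taken in percent and D in × genome coverage (units inferred from the reported values, since 0.75 × 133 ≈ 100), predictions are capped at 100%, and the function names are hypothetical.

```python
# Assumption: P (%) = 0.75 * D (x total genome coverage of the detection panel).
SLOPE = 0.75


def predicted_polymorphism(depth_x: float) -> float:
    """Predicted % of array SNPs that are polymorphic, capped at 100%."""
    return min(100.0, SLOPE * depth_x)


def min_depth_for_rate(target_pct: float) -> float:
    """Total detection-panel depth (x) at which the linear fit reaches target_pct."""
    return target_pct / SLOPE


# 100x is the estimated depth for a hypothetical sweet-cherry-only detection panel.
print(f"100x -> {predicted_polymorphism(100):.1f}% polymorphic")
print(f"depth needed for 100% polymorphism: {min_depth_for_rate(100):.0f}x")
```

The cap reflects that a rate cannot exceed 100%; the linear fit itself was derived from only three crops, so extrapolation well beyond their depths is speculative.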
This prediction is an expected consequence of the pseudo-random genomic sampling that underlies shotgun sequencing: detecting individual SNPs requires deep sequencing at individual bases, and detecting SNPs genome-wide requires broad sequencing across the genome. Therefore, sufficiently deep sequencing is required not only to identify individual SNPs of potential value for downstream assays but also to discover nearby SNP-flanking polymorphisms that can cause SNP assay failures.
The above considerations suggest some alternative array development strategies. The number of polymorphic SNPs on the cherry array could have been doubled by focusing the available SNP detection resources on one crop (sweet or sour cherry), thereby doubling the sequencing depth for that crop. Owing to their shared genomic background, such a single-crop array would still have been useful for the other crop to some extent. Based on simple extrapolations from our observed cross-species results (Tables 3 and 5), a 6K sweet cherry array based on a detection panel of only sweet cherry accessions (achieving an estimated 100× total genome coverage and a 75% polymorphism rate) would be expected to provide ~4300 polymorphic SNPs for sweet cherry and ~1900 for sour cherry, the latter highly skewed toward the avium subgenome. Focusing on sour cherry SNP detection instead, a 6K sour cherry array would be expected to provide ~4300 polymorphic SNPs for sour cherry (equally distributed over both subgenomes) but only ~900 for sweet cherry. Because SNP polymorphism was desired for both crops, both hypothetical scenarios are inferior to the dual-crop SNP detection strategy actually employed, which achieved 1825 polymorphic SNPs in sweet cherry and 2058 in sour cherry. However, the 2058 SNPs polymorphic in sour cherry are likely skewed somewhat toward the avium subgenome, because most were developed for sweet cherry (P. avium). A better strategy for the dual-crop array would have been to strongly bias the choice of sour cherry SNPs toward the fruticosa subgenome; in hindsight, targeting all sour cherry SNPs to the fruticosa subgenome would have better balanced subgenome coverage.
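A back-of-envelope check of the ~4300 figure for a sweet-cherry-only array can be made by multiplying a number of targeted SNPs by the estimated 75% polymorphism rate. This sketch assumes (our assumption, not necessarily the authors' extrapolation) that the rate applies uniformly to the 5696 SNPs effectively targeted on the array, i.e. the 4966 P. avium plus 730 P. fruticosa targets counted in the following paragraph; the helper name is hypothetical.

```python
# Counts of SNPs effectively targeted on the array (reported in the text).
AVIUM_TARGETED = 4966
FRUTICOSA_TARGETED = 730


def expected_polymorphic(n_snps: int, rate: float) -> int:
    """Expected number of polymorphic SNPs, rounded to the nearest whole SNP."""
    return round(n_snps * rate)


array_snps = AVIUM_TARGETED + FRUTICOSA_TARGETED
# Assumed rate for a ~100x sweet-cherry-only detection panel.
sweet_only_rate = 0.75
print(array_snps, expected_polymorphic(array_snps, sweet_only_rate))
```

The result lands close to the ~4300 quoted above; it ignores assay failures and the cross-crop estimates, which depend on the transferability rates reported separately.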
The RosBREED cherry 6K SNP array v1 appears to be informative beyond cherry breeding and cultivated germplasm. The avium subgenome of sour cherry is believed to represent a subset of P. avium species diversity that predates modern cultivated sweet cherry [2]. Tellingly, 29% of avium-subgenome sour cherry SNPs (215) were polymorphic for sweet cherry, and 33% of sweet cherry SNPs (1380) were polymorphic for sour cherry. Therefore, ~30% of the SNPs on the array are expected to be polymorphic for any diverse set of sweet cherry germplasm, including wild populations. A similar transferability rate may extend to wild P. cerasus populations for the 1482 sour cherry SNPs. Transferability of SNP polymorphism between the distinct species P. avium and P. fruticosa (effectively targeted with 4966 and 730 SNPs on the array, respectively) is estimated to be only a tenth of that within P. avium, given that only about 10% of the sour cherry SNPs polymorphic for sweet cherry accessions were originally developed for fruticosa. The ~200 RosCOS SNPs performed exceptionally well because of their previous validation in cherry germplasm [6]. These RosCOS markers provide a valuable tool for comparative genomics in Rosaceae, such as comparing functional genetic variation across genera. More than 100 RosCOS SNPs are expected to be polymorphic in apple, out of the 128 such markers included on the apple 8K array [7]. While only 14 RosCOS SNPs were polymorphic for peach [13], hundreds of other markers polymorphic in both peach and cherry provide sufficient anchors for comparative genomics within the genus Prunus [3,4].
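The "tenth" estimate can be made explicit by comparing per-target transfer rates for avium-targeted and fruticosa-targeted sour cherry SNPs. This is a sketch under one labeled assumption: the number of avium-targeted sour cherry SNPs is inferred here as 1482 − 730 = 752 (it reproduces the reported 29% rate, 215/752), and the 21 fruticosa-targeted SNPs polymorphic in sweet cherry come from 236 − 215. Constant and function names are hypothetical.

```python
# Counts from the text; AVIUM_TARGETED_SOUR is an inferred value (assumption).
SOUR_SNPS = 1482
FRUTICOSA_TARGETED = 730
AVIUM_TARGETED_SOUR = SOUR_SNPS - FRUTICOSA_TARGETED  # assumed 752
POLY_IN_SWEET = 236        # sour cherry SNPs polymorphic in sweet cherry
POLY_IN_SWEET_AVIUM = 215  # of which avium-targeted


def transfer_rate(polymorphic: int, targeted: int) -> float:
    """Fraction of targeted SNPs that were polymorphic in the other crop."""
    return polymorphic / targeted


avium_rate = transfer_rate(POLY_IN_SWEET_AVIUM, AVIUM_TARGETED_SOUR)
fruticosa_rate = transfer_rate(POLY_IN_SWEET - POLY_IN_SWEET_AVIUM, FRUTICOSA_TARGETED)
print(f"avium-targeted: {avium_rate:.1%}, fruticosa-targeted: {fruticosa_rate:.1%}")
print(f"relative transferability: {fruticosa_rate / avium_rate:.2f}")
```

Because the avium- and fruticosa-targeted sets are of similar size, the ratio of per-target rates is close to the simple share of fruticosa SNPs among those polymorphic in sweet cherry (21/236).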
Close examination of haplotype segregation for a 16-SNP region at the S locus (Figure 4) indicated that the array should enable monitoring of all recombination events in cherry germplasm across any two successive generations examined at a time. Unique haplotypes, and recombinations between them, were successfully detected and localized around the S locus in both sweet and sour cherry. The unprecedented resolution of genetic variation in cherry germplasm revealed by genome scanning with the array opens the door to fine-scale QTL dissection and other linkage-based analyses, identification of incorrect pedigree records, and deduction of identity by descent for chromosomal segments across the genome. In the region examined, shared haplotypes, indicating strong evidence of common ancestry, were identified both within a crop (S4 and S35 in sour cherry) and between the two crops (S4). The S19, S4, and S139 alleles of sour cherry are known to be avium-derived, as these S alleles are present in sweet cherry germplasm [31,32,33]. For example, the S4 allele of the sweet cherry parent 'Emperor Francis' was identical to the S4 haplotype present in both sour cherry parents. In contrast, the S35, S36a, and S36b alleles are considered fruticosa-derived because they have not been observed in sweet cherry germplasm [34]. Ongoing reconstruction of the two sour cherry subgenomes using the SNP data and a linkage mapping approach is expected to be complicated by the lack of balanced subgenomes within sour cherry germplasm, as sour cherry is known to be a segmental allotetraploid [34,35]. The identification of seven sour cherry S-allele haplotypes provided initial insight into this reconstruction.
However, because the S-locus region is known to be subject to recombination suppression and, as a result, to be a site of polymorphism accumulation, this level of subgenome-balanced polymorphism should not be extrapolated genome-wide.
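To illustrate how a recombination event can be localized between adjacent SNPs once haplotypes are known, here is a minimal sketch; it is not the authors' pipeline. It assumes phased parental haplotypes and a known transmitted haplotype, written as hypothetical allele strings, and brackets each crossover by the nearest informative SNPs (those where the two parental haplotypes differ) on either side.

```python
from typing import List, Optional, Tuple


def parental_origin(gamete: str, hap_a: str, hap_b: str) -> List[Optional[str]]:
    """Label each SNP 'A' or 'B' by which parental haplotype the gamete matches.

    SNPs where both parental haplotypes carry the same allele are uninformative
    and labelled None, as are alleles matching neither haplotype.
    """
    labels: List[Optional[str]] = []
    for g, a, b in zip(gamete, hap_a, hap_b):
        if a == b:
            labels.append(None)
        elif g == a:
            labels.append("A")
        elif g == b:
            labels.append("B")
        else:
            labels.append(None)  # e.g. a genotyping error
    return labels


def crossover_intervals(labels: List[Optional[str]]) -> List[Tuple[int, int]]:
    """Return (left, right) SNP indices bracketing each switch in parental origin."""
    intervals: List[Tuple[int, int]] = []
    last_idx: Optional[int] = None
    last_label: Optional[str] = None
    for i, lab in enumerate(labels):
        if lab is None:
            continue
        if last_label is not None and lab != last_label:
            intervals.append((last_idx, i))
        last_idx, last_label = i, lab
    return intervals


# Hypothetical 8-SNP example: two parental haplotypes and the transmitted haplotype.
hap_a = "AAGTCCGA"
hap_b = "GACTTCAG"
gamete = "AAGTTCAG"
print(crossover_intervals(parental_origin(gamete, hap_a, hap_b)))
```

In the example the crossover falls between SNP indices 2 and 4; the uninformative SNP at index 3 cannot narrow it further, which is why dense, evenly spaced SNPs improve localization.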

Conclusion
The cherry SNP array described here will foster genetic studies in the Rosaceae and help bridge the gap between genomics and breeding in cherry, because breeding germplasm was the basis of both SNP detection and SNP selection for the final array. The RosBREED cherry 6K SNP array v1 is commercially available from Illumina, and we expect that it will be used worldwide for genetic studies in cherry and related species. The SNP markers included in the cherry 6K Illumina arrays are available for download in Excel format and viewable in GBrowse at the Genome Database for Rosaceae (GDR; http://www.rosaceae.org).