Development of a minimal KASP marker panel for distinguishing genotypes in apple collections

Accurate identification of named accessions in germplasm collections is extremely important, especially for vegetatively propagated crops, which are expensive to maintain. An inexpensive, reliable, and rapid genotyping method is therefore essential because it avoids the need for laborious and time-consuming morphological comparisons. Single Nucleotide Polymorphism (SNP) marker panels containing large numbers of SNPs have been developed for many crop species, but such panels are much too large for basic cultivar identification. Here, we have identified a minimum set of SNP markers sufficient to distinguish apple cultivars held in the English and Welsh national collections, providing a cheaper and automatable alternative to the markers currently used by the community. We show that SNP genotyping with a small set of well-selected markers is as efficient as microsatellites for the identification of apple cultivars and has the added advantages of automation and reduced cost when screening large numbers of samples.

DOI 10.1007/s11295-015-0920-8) and this reference should be cited and commented on, since the number of SNPs analyzed is a key factor in correctly assigning ploidy level. This is a limitation of the small SNP set described here and has to be indicated and discussed.
-Null alleles: null alleles are quite frequent even in SNP markers and have already been invoked as a reason for Mendelian errors in pedigrees (e.g. in Vanderzande et al. 2019 or in Muranty et al. 2020; see also Di Guardo et al. 2015, doi: 10.1093). This hypothesis should be considered and discussed for markers BA07b and BA022, where more disagreement was observed between the results obtained here and those from the SeqSNP platform. For BA07b, the 71 individuals called A:G with SeqSNP and C:C here could in fact be C:null (the corresponding "T" probe may have an unknown additional SNP close to its 3' end in a significant proportion of the genotyped accessions, and such an additional SNP would not create an issue with the SeqSNP approach). The same argument could be invoked for marker BA022: the 181 individuals called C:T by SeqSNP would be C:null with the KASP approach. In both cases, C:null individuals should be located close to C:C individuals on the intensity graph, although they could be located a bit lower on the X-axis (or Y-axis). To extend this interpretation, some of the miscalls observed for these 2 markers could actually be individuals with the genotype 'null:null' (and this could also be the case for other markers, albeit probably at a very low frequency).
-Error rate: the error rate has been computed by comparing the KASP genotyping data with the SeqSNP genotyping data of Harper et al. (2019). For the varieties in common, the comparison could be extended to the Axiom_Apple480K array genotyping data recently made available by Muranty et al. (2020) (see their 'Availability of data and materials': https://data.inra.fr/dataset.xhtml?persistentId=doi:10.15454/IOPGYF).
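The cross-platform comparison suggested above can be sketched as follows. This is only an illustrative outline: the function name, the dictionary layout keyed by (accession, marker), the "X:Y" call strings, and the use of None for a failed call are all assumptions made here, not the format of the authors' files or of the Axiom_Apple480K dataset.

```python
def disagreement_rate(calls_a, calls_b):
    """Fraction of shared (accession, marker) pairs, called on both
    platforms, whose genotypes differ. Allele order is ignored, so
    'A:G' and 'G:A' count as the same call."""
    compared = 0
    differing = 0
    for key in calls_a.keys() & calls_b.keys():
        a, b = calls_a[key], calls_b[key]
        if a is None or b is None:  # skip failed calls (miscalls)
            continue
        compared += 1
        if sorted(a.split(":")) != sorted(b.split(":")):
            differing += 1
    return differing / compared if compared else float("nan")


# Hypothetical calls for two accessions and two markers (invented data).
kasp = {("acc1", "M1"): "C:C", ("acc1", "M2"): "A:G",
        ("acc2", "M1"): "C:T", ("acc2", "M2"): None}
other = {("acc1", "M1"): "C:C", ("acc1", "M2"): "G:A",
         ("acc2", "M1"): "C:C", ("acc2", "M2"): "A:A"}
print(disagreement_rate(kasp, other))
```

Computing the rate per marker rather than overall (by grouping the keys on the marker name) would show directly whether markers such as BA07b and BA022 also stand out against a third platform.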

Specific points:
Title: The term 'characterising' in the title is probably excessive. Interestingly, this term is used only in the title and nowhere in the text. Elsewhere the authors use 'distinguishing', which is much better suited to the actual capacity of the SNP set (e.g., 'characterizing' could be used if the SNP set allowed robust genetic structure analysis or ploidy determination).
Line 41: add: "representing about 2,200 distinct …"
Line 41 and afterwards: better not to use the term 'lines'. "Line" is generally not used for apple germplasm description or genetic diversity … Much better to use 'accession' everywhere in the text and in the table/figure captions. In some situations, 'samples' or 'individuals' could be used instead of 'lines'.
Line 44: define the acronym MUNQ at first use.

Results
Lines 59-60: the worksheet 'KASP probes' of Supplementary Excel file S1 lacks some explanations and indications. In particular, 'chromStart' and 'chromEnd' are somewhat obscure, and the colored cells are not explained.
Line 68: Table 2 is too detailed to be kept in the main text. All individuals with more than 2 fails should be summed so that the table has only 4 rows (0, 1, 2, >2 fails). The detailed version could be given in the Supplementary information. The red color in Table 2 would then no longer be needed.
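The condensed version of Table 2 requested above amounts to a simple binning of the per-accession fail counts. A minimal sketch, assuming the counts are available as a plain list of integers (the function name and the example values are invented for illustration):

```python
def summarise_fails(fails_per_accession):
    """Collapse per-accession counts of failed markers into the four
    classes proposed for the condensed table: 0, 1, 2 and >2 fails."""
    classes = {"0": 0, "1": 0, "2": 0, ">2": 0}
    for n in fails_per_accession:
        key = str(n) if n <= 2 else ">2"
        classes[key] += 1
    return classes


# Hypothetical fail counts for eight accessions (not manuscript data).
print(summarise_fails([0, 0, 1, 3, 2, 7, 0, 1]))
```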
Lines 80-81: Make a second sentence instead of a comment in brackets.
Line 97: what is "S1
Lines 130-133: these two clusters deserve a more thorough analysis to understand the reasons for the discrepancy and thus better specify the advantages and limits of each platform. This aspect is fully within the scope of this article.
Line 142: use 'Accessions Studied' and 'Accessions with Nominal Reps' instead of 'Lines studied' and 'Accessions with Known Replicates' (in reference to the S1 table), or is the worksheet 'Accessions with Known Replicates' missing from S1?
The issue with Table 4 is that some of the accession groups described in the worksheet 'Accessions with Nominal Reps' can be counted twice (or more), depending on counting rules that should be better described in the text … For example, for 'N° Reps = 4', the group 'Pren Glass' has to be counted in 'Odd One', but could then also be counted in the column '1 SNP diff' after discarding 'Pren Glas3' from this group …
Line 164: the '6 replicates' category is not present in the worksheet 'Accessions with Nominal Reps' and should be removed from Table 4 …
In S1, the worksheet 'Table 4' is equivalent to Table 3 of the text: this should be corrected.
Lines 186-187: this statement is a bit odd considering the many synonyms found in apple. Perhaps it is again the meaning of the term 'accession' that is inappropriate here (?). The results described in the next sentence are fully expected in apple germplasm collections …
Lines 192-193: the worksheet 'Paired Singletons' was not found in S1. The section at lines 190-210 therefore cannot be checked … However, filtering on MUNQ 163 in the worksheet 'Accessions studied' returned 21 accessions with this MUNQ (instead of the 15 stated at line 197), and all but one (2032 - Vegi Cox) exhibit the same SNP profile when miscalls are not considered. Likewise, filtering on MUNQ 901 returned 19 accessions with this MUNQ (instead of the 17 stated at line 199), and all exhibit the same SNP profile when miscalls are not considered.
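The check described above (same SNP profile "when miscalls are not considered") can be made explicit as follows. This is a sketch under stated assumptions: the function name, the marker names, the "X:Y" call strings, and None for a miscall are illustrative choices, not the layout of the S1 worksheet.

```python
def discordant_markers(profile_a, profile_b):
    """Return the sorted list of markers at which two accessions both
    have a call and the calls differ. Markers with a miscall (None) in
    either accession are ignored, as is allele order ('C:T' == 'T:C')."""
    diffs = []
    for marker in profile_a.keys() & profile_b.keys():
        a, b = profile_a[marker], profile_b[marker]
        if a is None or b is None:
            continue
        if sorted(a.split(":")) != sorted(b.split(":")):
            diffs.append(marker)
    return sorted(diffs)


# Hypothetical profiles for two accessions sharing a MUNQ (invented data).
acc_1 = {"M1": "C:T", "M2": "A:A", "M3": None}
acc_2 = {"M1": "T:C", "M2": "A:G", "M3": "G:G"}
print(discordant_markers(acc_1, acc_2))
```

Applying this pairwise within each MUNQ group would list the accessions, and the markers, responsible for any within-group differences.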
Lines 204-208: even if these cases are rare, they should be considered more carefully, since they could reflect a limit of the discriminating power of the SNP set. One solution would be to run SSR markers, to check the genotypes with another type of genetic marker and to verify consistency with the MUNQ assignment.
Lines 211-212: again, this sentence is not a result; it contains references and thus belongs in the Introduction or the Discussion. At the least, remove the references or introduce this paragraph differently.

Discussion
Line 235 (and elsewhere in the text): should this be called 'error rate' or 'disagreement rate'? With 'error rate', the SeqSNP platform is treated as the truth and KASP as the source of errors … is that correct? The argument is more balanced at line 238 …
Lines 264-266: this assertion is not supported by previously reported results …
Lines 271-272: as in a statistical test where a false discovery rate is accepted (e.g., risk alpha = 5%), you could compute and report the probability of wrongly declaring two accessions distinct when they are in fact identical yet show n discordant SNPs, as (error rate)^n. This probability will vary with the error rate considered (line 259: 0.42%-2.9%). E.g., for a conservative assessment (error rate = 2.9%), the risk of false discrimination between 2 accessions with 2 discordant SNPs is (0.029)^2 = 0.000841. This computation does not account for close relatedness … Other, more elaborate statistics exist.
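The computation suggested for lines 271-272 can be tabulated for the two error rates quoted at line 259. The sketch below assumes, as the simple formula does, that genotyping errors are independent across markers; the function name is introduced here for illustration only.

```python
def false_discrimination_risk(error_rate, n_discordant):
    """Probability that n discordant SNPs between two truly identical
    accessions all arise from genotyping error, assuming errors are
    independent across markers: (error rate)^n."""
    return error_rate ** n_discordant


# Lower and upper error rates quoted at line 259 of the manuscript.
for rate in (0.0042, 0.029):
    for n in (1, 2, 3):
        risk = false_discrimination_risk(rate, n)
        print(f"error rate {rate:.2%}, {n} discordant SNP(s): {risk:.3g}")
```

Such a table would let readers choose the number of discordant SNPs required to declare two accessions distinct at a risk level they find acceptable.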
Lines 297-298: the situation could also be the reverse, i.e. an error in the SSR data, or in collecting the correct material at the time of leaf collection by F. Fernandez-Fernandez … Here again, where does the truth lie: in the SSR data or in the SNP data? It can vary from case to case …
Line 302: first, these results should be presented in the Results section beforehand and not only here; second, the worksheet 'DEFRA comparison' is not present in Excel table S1 …