Assignment of frost tolerant coast redwood trees of unknown origin to populations within their natural range using nuclear and chloroplast microsatellite genetic markers

In view of climate change and the expected shifts in temperature and precipitation, some introduced timber species are promising candidates for cultivation in Germany and elsewhere in Europe to produce valuable wood products and support sustainable forestry. The Californian coast redwood (Sequoia sempervirens [D. Don] Endl.) is one such species because of its excellent wood properties and high growth rate. It is sensitive to freezing temperatures, but several trees of unknown origin introduced to Germany decades ago have demonstrated high frost tolerance, and cuttings propagated from them have been planted in botanic gardens and arboreta across Germany. Knowledge of their origin within the natural distribution range could help identify potential genetic resources of frost resistant coast redwood genotypes. Therefore, the trees of unknown origin in Germany (G) and two reference data sets with known origin, one representing the “Kuser provenance test” established in 1990 in France (F) and one comprising samples collected in California (C), were genotyped using 18 microsatellite markers: 12 nuclear (nSSR) and six chloroplast simple sequence repeat (cpSSR) markers. The number of haplotypes found in the data sets based on the six cpSSR markers was surprisingly high. These markers were used to assign the German frost resistant trees (G) to the two reference data sets (F and C). The genetic structure among the California samples (C) based on nSSR and cpSSR markers was very weak and mainly reflected northern and southern clusters separated by the San Francisco Bay, a geographic barrier between coast redwood populations, which confirms previously published data. The frost tolerant trees (G) could not be confidently assigned to single native populations, only to either the northern or the southern cluster. Nevertheless, the existing frost tolerant genotypes can already be used to establish commercial coast redwood plantations for future German forestry.


The set was partitioned either into the 17 watersheds according to Douhovnikoff and Dodd [31] or into two groups of northern and southern (south of the Napa Valley) watersheds, respectively.

[...] markers were used in this study to genotype samples in all three data sets F, C and G (S3 Table). The same touch-down PCR program was used for all 18 PCR primer pairs following the protocol described in Breidenbach et al. [37]. The PCR products were separated and visualized using the ABI genetic analyser 3130xl with GENSCAN ROX 500 as an internal size standard.

Verification of the nuclear microsatellite (nSSR) markers
The PCR primer nucleotide sequences for the nSSR markers were mapped to the S. sempervirens draft nuclear and chloroplast genome assemblies, which recently became publicly available, to verify the annealing sites of the microsatellite markers used in this study.

[...] The grouping of the ramets representing the same clone was used for all further genotype scoring.

The NJ results were confirmed with the function "assignClones" of the R-package "polysat" using a mismatch threshold of 0.2 [44]. The markers were finally ranked between 7 and 10 (Table). The NJ tree (NJT) based on the final ranking combination of the markers and 1000 bootstraps is presented in S3 Fig. In addition, for the data sets F and C, scored genotypes were also converted into a presence-absence matrix, and Nei's genetic distances ([45] after [46]) were calculated between watersheds (F) and populations (C) using the AFLPsurv software with 1000 permutations [47].
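The conversion of scored genotypes into a presence-absence matrix can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the sample names, locus names and allele sizes below are hypothetical, and each genotype is assumed to be recorded simply as the set of allele sizes observed per locus, which sidesteps the unknown allelic dosage in a polyploid.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical genotypes: sample -> locus -> set of observed allele sizes (bp).
# Allele dosage is deliberately ignored; only presence or absence is recorded.
Genotypes = Dict[str, Dict[str, Set[int]]]


def presence_absence_matrix(
    genotypes: Genotypes,
) -> Tuple[List[str], List[Tuple[str, int]], List[List[int]]]:
    """Return (samples, columns, matrix) with one column per (locus, allele)."""
    columns = sorted(
        {
            (locus, allele)
            for loci in genotypes.values()
            for locus, alleles in loci.items()
            for allele in alleles
        }
    )
    samples = sorted(genotypes)
    matrix = [
        [1 if allele in genotypes[s].get(locus, set()) else 0
         for locus, allele in columns]
        for s in samples
    ]
    return samples, columns, matrix


example: Genotypes = {
    "tree1": {"locusA": {150, 154}, "locusB": {201}},
    "tree2": {"locusA": {150}, "locusB": {201, 205, 209}},
}
samples, columns, matrix = presence_absence_matrix(example)
for name, row in zip(samples, matrix):
    print(name, row)
```

Each row can then be treated like a dominant-marker profile, which is the kind of input that band-based distance software expects.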

Each genetic distance matrix was used to generate an NJT with the PHYLIP v3.69 software [48]. The consensus tree was visualized using the FigTree software [49].

Genotyping of the chloroplast microsatellite (cpSSR) markers and haplotype network
Due to the haploid nature of chloroplasts, genotyping of the cpSSR markers was easier and unambiguous. Based on all three data sets, a haplotype network was built using the Goldstein distance [50] in the program EDENetwork v2.28 [51]. The program calculates a weighted network based on the Goldstein distance between haplotypes with an automatically calculated percolation threshold of 2.67 [52].
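The idea behind a thresholded haplotype network can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reimplementation of EDENetwork: the distance between two haplotypes is taken here as the squared difference in repeat number averaged over loci (whether the program averages or sums is not stated in the text), links at or below the threshold are kept, and the three six-locus haplotypes are hypothetical. The threshold of 2.67 is the value reported above.

```python
from itertools import combinations
from typing import Dict, Tuple

Haplotype = Tuple[int, ...]  # repeat numbers at each cpSSR locus


def goldstein_distance(h1: Haplotype, h2: Haplotype) -> float:
    """Mean squared difference in repeat number across loci (assumed form)."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be scored at the same loci")
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) / len(h1)


def threshold_network(
    haplotypes: Dict[str, Haplotype], threshold: float
) -> Dict[Tuple[str, str], float]:
    """Keep only the links whose distance does not exceed the threshold."""
    edges = {}
    for (n1, h1), (n2, h2) in combinations(sorted(haplotypes.items()), 2):
        d = goldstein_distance(h1, h2)
        if d <= threshold:
            edges[(n1, n2)] = d
    return edges


# Hypothetical haplotypes scored at six loci.
haplotypes = {
    "H1": (10, 12, 9, 15, 11, 8),
    "H2": (11, 12, 9, 15, 11, 8),
    "H3": (14, 12, 9, 15, 11, 12),
}
edges = threshold_network(haplotypes, threshold=2.67)
print(edges)
```

In this toy example only H1 and H2 remain connected, because H3 differs from both by several repeat units at two loci.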

Genetic assignment using the chloroplast microsatellite (cpSSR) markers
The individuals from the German data set G were assigned to the data sets F and C using the GeneClass2 software [53]. To do that, eight closely located sampling sites (< [...] Hintsteiner et al. [16] using the computation criteria of Rannala and Mountain [54], the simulation algorithm of Paetkau et al. [55] with 10 000 resampled individuals, and a Type I error of 0.01 [55,56]. NJTs were based on Nei's genetic distance ([45] after [46]) and 1000 bootstraps for the reference data sets C and F and were generated using the R-packages "adegenet" [57,58] and "poppr" [59,60].

[...] within these two groups using the same STRUCTURE settings.
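The Bayesian criterion of Rannala and Mountain behind the assignment step can be sketched for haploid cpSSR haplotypes as follows. This is a simplified illustration, not the GeneClass2 implementation: it assumes that the predictive probability of allele k at locus j in a reference population is (n_k + 1/K_j) / (n + 1), where n_k is the count of that allele, n the number of reference haplotypes and K_j the number of distinct alleles at the locus (counted here over all reference populations plus the individual, which is an assumption). The resampling step of Paetkau et al. is omitted, and the populations and haplotypes are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]


def allele_counts(reference: Dict[str, List[Haplotype]]) -> Dict[str, List[Counter]]:
    """Count alleles per locus within each reference population."""
    return {pop: [Counter(col) for col in zip(*haps)] for pop, haps in reference.items()}


def assignment_scores(
    individual: Haplotype, reference: Dict[str, List[Haplotype]]
) -> Dict[str, float]:
    """Relative likelihood of the individual in each reference population."""
    counts = allele_counts(reference)
    # K_j: distinct alleles at locus j across references plus the individual.
    k_per_locus = []
    for j, allele in enumerate(individual):
        alleles = {allele}
        for pop_counts in counts.values():
            alleles.update(pop_counts[j])
        k_per_locus.append(len(alleles))
    log_lik = {}
    for pop, pop_counts in counts.items():
        total = 0.0
        for j, allele in enumerate(individual):
            n = sum(pop_counts[j].values())
            total += math.log((pop_counts[j][allele] + 1 / k_per_locus[j]) / (n + 1))
        log_lik[pop] = total
    # Normalise in a numerically stable way so the scores sum to one.
    max_ll = max(log_lik.values())
    weights = {pop: math.exp(ll - max_ll) for pop, ll in log_lik.items()}
    norm = sum(weights.values())
    return {pop: w / norm for pop, w in weights.items()}


# Hypothetical two-locus reference data.
reference = {
    "north": [(10, 12), (10, 12), (11, 12)],
    "south": [(14, 15), (14, 15), (13, 15)],
}
scores = assignment_scores((10, 12), reference)
print(scores)
```

The pseudo-count 1/K_j keeps the likelihood above zero when the individual carries an allele that is absent from a reference population, which matters when reference samples are small.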

Mapping of the PCR primer nucleotide sequences for the SSR markers against the draft coast redwood nuclear and chloroplast genome assemblies found annealing sites facing each other in a correct configuration allowing amplification for all six chloroplast markers and eight out of 12 nuclear markers (see S4 Table for [...]

The STRUCTURE analysis also suggested very low differentiation with a maximum of two clusters (K = 2) for the data sets F and C (Fig.). Additional STRUCTURE analysis performed separately within the northern and southern subpopulations did not find additional clusters and confirmed that the differentiation observed in the data sets F and C was mainly due to the differences between northern and southern populations. The "locprior" function had little effect on the STRUCTURE results; therefore, only results obtained with this function engaged are presented in Fig. 6, and results obtained with the "locprior" function disengaged can be found in S7 and S8 Figs.

[...] Accuracy of the assignments was evaluated by the quality index (QI), which was low for both F (QI = 7 %) and C (QI = 16 %).

Individuals with identical genotype representing the same clone SF71 were presented as a single entry in Table 1. Most of the assignments of German trees to the southern or northern populations divided by the San Francisco Bay in the reference data set C with scores above 48 % were in agreement with the assignments in the reference data set F [...]

[...] the Sequoiafarm, which were half- or full siblings according to the owner, were assigned to various watersheds (F) and locations (C).

Discussion
The low population differentiation within the natural distribution range of S. sempervirens, based on the supposedly selectively neutral nuclear and chloroplast markers and on both reference data sets F and C, confirmed the results of previous studies [24,25,31]. The NJT for the data set F based on 12 nSSR markers in our study was largely consistent with the NJT calculated for the same 12 watersheds (A-Q) by Douhovnikoff and Dodd [31] using clones from the Russell Reserve. To facilitate comparison between these two phylogenetic trees, the watershed I (Mendocino County) was also used as the root of the NJT. The discrepancies between the two trees could be explained by the very low bootstrap support for most clusters in both trees and by the use of pairwise F_ST values for clustering in Douhovnikoff and Dodd [31] instead of the Nei's genetic distance used in our study. Moreover, the calculations in Douhovnikoff and Dodd [31] were based on only half as many nSSR markers (6 vs. 12), and only two of these markers were used in both studies.
The STRUCTURE analyses based on 12 nSSR and six cpSSR markers confirmed the NJT results of low differentiation, but the analyses based on the cpSSR markers were able to identify the San Francisco Bay as a border between two main clusters of populations. However, these results also had only low statistical support (S7 and S8 Figs). The STRUCTURE analyses within each of the two areas, north and south of the Bay, did not reveal any additional clusters in either data set F or C. Our data confirmed the San Francisco Bay as a border, as suggested earlier by Brinegar [30] based on a single chloroplast marker. This is also consistent with one of the two borders identified by Sawyer et al. [70] based on soil conditions and on water availability provided by precipitation and fog.
The lack of strong population structure and the low genetic differentiation can explain the inconsistent assignment of German trees to populations in both reference data sets C and F and the low QI for them. Reliable assignment requires stronger differentiation between populations in a reference data set and a sufficient sample size for each reference population [15,71,72]. Sample sizes, which ranged from 4 to 47 individuals per population or watershed, were possibly insufficient for some populations in the reference data sets C and F. However, considering the results of the STRUCTURE analyses that suggested northern and southern clusters, all but five German trees were assigned correctly to these clusters.
Potential errors in the records on the origin of trees in the reference populations also need to be taken into account when considering the reliability of the assignment of individuals to an origin, because individuals with wrongly identified origin within a reference population can decrease the assignment quality [56]. The possibility that trees with wrongly identified origin and non-local genotypes were included in the two reference sets was quite high because coast redwood is a heavily used timber species with a long tradition of planting, replanting, and plant material transfer [73]. Trees from areas north of the San Francisco Bay might have been planted in the south and vice versa. This is difficult to detect and could only be verified in a comprehensive and detailed population genetic study of natural coast redwood populations in California and Oregon.
The reliable clone identification in the data set G (S6 Fig) confirmed results of previous studies based on allozyme and AFLP markers [24,74].
The general difficulty of accurately genotyping microsatellite markers in polyploid organisms precludes analyses based on their allele frequencies [33,75-78]. This also applies to coast redwood, where genotyping problems could be aggravated further by a high probability of somatic mutations in basal sprouting shoots of these extremely long living trees, which can result in different genotypes in different tissues and in clones originating from the same tree [33]. The assumption of Hardy-Weinberg equilibrium is also problematic because of the common clonal growth in coast redwood populations [66], where trees within 40 m of each other can belong to the same clone [74].
The correct estimation of null allele frequencies and of allelic dosage are two major difficulties associated with genotyping polyploid organisms using microsatellite markers [79].
In our study, the risk of null alleles should be reduced, since null alleles are in general less frequent in EST-SSRs due to their location in more conserved sequences [79].

The number of chloroplast haplotypes found in this study was exceptionally high (150) based on six cpSSR markers genotyped in 579 samples in all data sets F, C and G combined. It confirmed the higher variation of cpSSR than nSSR markers usually observed in conifers [80], although the mutation rate of chloroplast microsatellite loci is lower [81]. However, cluster analysis did not reveal any geographic differentiation other than the northern and southern groups of populations. Similar results were observed in another conifer species, Abies nordmanniana, which had a similarly high number of haplotypes (111) genotyped in 361 individuals, although the sampling range covered an area several times larger than that for coast redwood [82].
It should be noted, however, that haplotype differentiation usually reflected the geographic origin of populations in other conifer species [83-85]. At the same time, the differentiation between populations based on cpSSR markers is usually lower than that based on nSSRs, which can be explained by the paternal inheritance of chloroplasts in S. sempervirens [32] and long distance gene flow via pollen. Ribeiro et al. [86] found similar results when comparing population differentiation based on AFLP markers with that based on paternally inherited cpSSR markers in the wind pollinated conifer Pinus pinaster. In addition, Petit et al. [87] showed that, for various conifer species, genetic differentiation based on bi-parentally inherited markers correlated with differentiation based on paternally inherited markers due to the similar gene flow vectors, but the latter was in general lower [88].

Conclusions and future directions
Coast redwood forest used to have a continuous distribution along the Pacific coast in California before intensive logging started at the beginning of the nineteenth century [89]. Therefore, current coast redwood populations are considered remnant and fragmented populations.
However, being a long living clonal tree with a high somatic mutation rate, coast redwood has maintained its high genetic diversity despite multiple bottlenecks [8,32].
The combination of low sexual reproduction and local adaptation, which could be insufficient to cope with the predicted future climate change, will increase the pressure on coast redwood [8].
California has become drier in the last 2000 years [9], and the frequency of fog, which is very important for coast redwood, has declined during the last century [36,90]. Considering these threats, O'Hara et al. [8] emphasized the necessity of finding drought tolerant genotypes, especially for southern populations. The genetic and physiological mechanisms behind drought resistance are similar to those behind frost tolerance [91]. For the same reason, geographic variation in drought tolerance in tree species often overlaps with variation in frost tolerance [92]. One of the mechanisms of frost resistance is the prevention of ice crystal formation via osmotic regulation of the cells, which is also important in drought resistance [93]. The identification of water stress resistant genotypes would benefit both Californian and German forestry. Tolerant genotypes would not only provide Germany with a valuable timber species in view of climate change, but would also represent suitable resources for ex-situ conservation programs for coast redwood, as already suggested for the sister species Sequoiadendron giganteum [2].

[...] inferred without (a) and with (b) the "locprior" function (K = 6 and K = 2 for C and K = 2 for F, respectively). L(K) and ∆K statistics generated by the ClumPAK software are also presented.