Assessing site signal preservation in reference chronologies for dendro-provenancing

Regional differences in tree growth can be used to approximate the geographical provenance of ring-width series (‘dendro-provenancing’). This method relies on cross-dated ring-width series (reference chronologies) that are thought to represent the radial growth signal of trees in a given region. Reference chronologies are often established from ring-width series of living tree populations. Frequently, they are too short to allow for investigating the provenance of historical wood. Thus, references are extended by ring-width series from buildings and art-historical objects that exhibit best matching growth patterns with the living tree references. Yet, series from other provenances may erroneously be included. Thereby the local or regional growth signal of the references is progressively contaminated, but this has received little attention to date. I investigate this contamination risk using a simulation approach that allows for generating pseudo site chronologies that preserve the relevant statistical properties of the real site chronologies. While the exact provenance of historical wood is unknown, for simulated ring-width series the provenance is unambiguous. Hence, pseudo reference chronologies may be established while monitoring the signal mixture. Specifically, 15 site chronologies of Norway spruce (Picea abies (L.) H. Karst.) from northeastern Switzerland were used to generate 15 pseudo site growth signals that span 1000 years. The simulation demonstrates that quasi uncontaminated references can be established in ideal circumstances for the study area. However, the thresholds for the similarity in between-series correlation must be very high. Even then, contaminated pseudo references occurred in rare cases during the simulation. Yet, elevation-specific pseudo references were established with lower thresholds. Simulation currently offers the only approach for assessing the contamination risk of reference chronologies, and it allows for elucidating the conditions under which acceptable levels of contamination can be guaranteed. Therefore, the present approach paves the way towards a practical simulation tool for dendro-provenancing.

1 Introduction resolution and the reliability of dendro-provenancing are to be improved, the potential for establishing uncontaminated reference chronologies needs to be assessed prior to field work (i.e. prior to the establishment of historical references) and prior to the determination of provenance.
Aside from living tree chronologies, only for simulated series the site origin is retraceable. Moreover, the length of living tree chronologies is biologically limited. In simulation, however, pseudo historical series of arbitrary length can be generated whose site provenances are known. Consequently, the contamination risk can be monitored at a greater temporal depth when pseudo reference chronologies are established. Thus, an evaluation of signal preservation in reference chronologies is best approached via simulation.
The basic concept underlying this paper can be summarized as follows: Firstly, the most relevant statistical properties of recently (i.e., in 2015 and 2016) collected site chronologies are characterized and an adequate simulation model is developed. Secondly, this model is used to generate a set of simulated historic pseudo ring-width series. The latter are samples from an underlying set of pseudo site chronology signals that preserve the covariance of the original site chronologies. Thus, the (dis-)similarities of growth that exists between the original chronologies is reflected in the simulation. Thirdly, once the pseudo historical tree-ring data have been generated, an algorithm is used that mimics the extension of site chronologies by adding best matching pseudo historical series. Finally, the composition of the generated pseudo reference chronologies is evaluated. The critical point is that the site provenance is known for all pseudo historical series, as they were sampled from the respective underlying pseudo site signals. Hence, it is possible to examine the composition of the pseudo references and determine the degree of signal mixture. The current case study is limited to the material gathered earlier [15], but the approach is transferable to other dendro-provenancing studies.
The objectives of this study relate to the simulation model, the classification algorithm and the evaluation of the algorithmically generated pseudo reference chronologies, and are as follows: 1. To introduce and evaluate a simulation model for generating pseudo historical ring-width chronologies.
2. To develop an algorithm that mimics the process of establishing reference chronologies for dendro-provenancing. 3. To examine the signal composition of the pseudo reference chronologies and assess the potential for establishing real reference chronologies in the study area.

Real ring-width dataset
Real ring-width datasets are needed to initialize the simulation. The dataset used for this purpose comprises 401 ring-width series that represent a set of 15 site chronologies of Norway spruce (Picea abies (L.) H. Karst.) with a replication of 15 to 32 ring-width series per site [15]. Further details on the sampling design, measurement, cross-dating and meta data are given in [15]. The ring-width dataset is provided in S1 File (R package data chrono.rwl in S1 File).
For the simulation, the actual site name is unimportant. Hence, only abbreviations are used to label the pseudo site signals. However, the letters preceding the dot in the abbreviations (e.g. as used in subsection 'Pref composition') are identical to the real site abbreviations introduced in [15]. Thus, the pseudo sites signals can be traced back to their underlying real site signals for more detailed scrutiny.

Simulation
All calculations and statistical analyses were done in R (3.6.0) [25]. Specifically, an ad hoc R package was written to perform the simulation. The complete code can be found in S1 File.
Auto-regressive (henceforth: ar) residual series effectively preserve the signal that is relevant for dendro-provenancing [15]. Thus, ar residual chronologies were used for the simulation. Unlike in raw ring-width series, in ar residuals the medium-(5-15 years) and low-frequency (>15 years) growth variability is quasi absent [26]. The individual ar residual series hence represent the year-to-year (high-frequency) growth variability of each individual tree ring-width series. Thus, for each year of an ar chronology, there is a sample equal to the number of trees that were sampled and that featured a ring in the respective year ( Fig 1A in S2 File). These yearly distributions of ar residuals may be approximated by normal distributions, as, theoretically, ar residuals are series of normally distributed random shocks [27,28]. Hence, as for any other normally distributed variable, the yearly means and standard deviations of the theoretical population distributions of the ar residuals can be estimated by calculating the sample means and standard deviations of the empirical distributions of the ar residuals [28].
Thus, as a first step towards a simulation model of the real tree-ring data, for each of the 15 raw ring-width chronologies an auto-regressive residual chronology was calculated using the default settings of the detrender-function in the R package dplR, i.e. by fitting an auto-regressive model to the raw ring-widths and choosing the model that minimized Akaike's Information Criterion [29,30]. This choice of preprocessing was supported by the high classification performance of this method in the study of [15]. When requiring a minimal replication of 15 data points per year of the chronology, the 15 resulting ar chronologies covered a common period of 63 years (i.e., from 1951 to 2014). Subsequently, series of empirical mean values and standard deviations were calculated for these 63 years. This resulted in a 63 × 30 matrix containing 15 column vectors of yearly mean values (rmv) and 15 column vectors of yearly standard deviations (rsd).
To sidestep the limited length of the real dataset, a 1000 × 30 matrix was generated whose columns had zero covariance and followed a standard normal distribution (Fig 1B in S2 File). This matrix was transformed to a matrix with the same scale and covariance as the real data matrix. The transformation matrix was generated by decomposing the covariance matrix of the real rmv-rsd matrix via its eigenvalues [31,32].
The R implementation of the function rmvnorm (MASS package) assumes a multivariate normal distribution of the underlying variables [32]. Thus, the transformation was valid for covariance matrices calculated from such variables only. For the rmv vectors, a multivariate normal distribution was reasonable to assume because the indices of the ar residuals were themselves normally distributed [27,28]. For the rsd vectors, however, a multivariate log-normal distribution seemed more appropriate (Fig 1 in S2 File). Hence, the rsd were log-transformed (natural logarithm) prior to the calculation of the covariance matrix, whose decomposition was used to linearly transform the uncorrelated random data matrix (Fig 1B in  S2 File).
After the linear transformation, the generated matrix contained 30 column vectors of length 1000, with 15 columns for the yearly pseudo mean values (pmv) and 15 for the logarithms of the yearly pseudo standard deviations (ln(psd)) of each pseudo chronology. Next, the ln(psd) were raised to their power to provide the yearly standard deviations of the pseudo chronologies (psd). Finally, the resulting matrix contained all measures of central tendency and dispersion necessary to parameterize the respective sequences of yearly normal distributions for each pseudo site chronology (cf. matrix P, Fig 1B in S2 File).

Pseudo historical series.
To generate individual pseudo series (ps), random samples were sequentially drawn from the sequences of normal distributions defined by the parameters in the pseudo signal matrix. Thus, ps with a length of 1000 years were generated. The ps were cut into partitions (pseudo historical series, phs) that were of similar lengths as the series found in a typical dataset of historical ring-width series. Here, the dataset of the Dendrochronological Laboratory of the City of Zurich was used to determine the partitioning (R package data distr.sl in S1 File).
To partition a 1000 years ps, series lengths were drawn randomly from the pool of series lengths of Picea abies (length � 50 years) until the sum of the drawn series lengths was � 1000 years. Then the 1000 years ps was cut into subseries. Often, this resulted in the last partition being shorter than 50 years, which were discarded during later analyses (cf. section PREF-Constructor Algorithm, below). The sampling of ps and their subsequent partitioning into phs was repeated until a replication of 30 phs per year and pseudo chronology was achieved.
2.2.2 Pseudo object chronologies. Generally, object-based chronologies are the core components of reference chronologies for dendro-provenancing rather than single tree-ring series. Such object-based chronologies already represent an aggregate (i.e. the mean) of several ring-width series, which were sampled in historical objects (e.g. the roof beams of a building). If timber sources for the construction of respective objects were spatially limited, object-based chronologies represent mean local tree growth [10]. To allow for a comparison of pseudo reference chronologies established from phs and pseudo reference chronologies generated from pseudo object chronologies (poc/pocs), the above procedure of generating phs was slightly modified, as described below.
Instead of drawing a single ps, 6 ps were sampled from the sequence of yearly normal distributions of a respective pseudo chronology signal. This replication was chosen arbitrarily but seems to be a realistic assumption for the average number of samples taken per historical object in a real setting [33]. Subsequently, the 6 ps were averaged prior to the partitioning (as described in the previous section). Thus, the resulting pocs represent an object chronology with a replication of 6 individual ps per year each. This reflects an ideal scenario, in which the object chronologies are mean value chronologies calculated from phs that originated from the same pseudo site signal. Again, as with the phs approach the sampling and partitioning was repeated until a replication of 30 pocs per year and pseudo chronology was achieved.
In a real setting, there is a chance for historical building chronologies to contain a mixture of different site signals [10]. Thus, some simulations were done with pocs containing a mixture of ps originating from different pseudo signals. Specifically, the ratio of ps originating from the correct, 'local' pseudo signal was lowered. Thus, this so-called on-site ratio (osr) was lowered from 1 (all 6 ps on-site) to 0.83, i.e. 1 of the 6 ps originates from an off-site pseudo signal, to 0.67, i.e. 2 of the 6 ps are off-site ps.

PREF-Constructor algorithm
To mimic the establishment of reference chronologies by adding historical ring-width series to initial living tree chronologies and automate this process, a Pseudo Reference Constructor Algorithm (PREF-Constructor) was developed and implemented in the ad hoc R package mentioned above (S1 File). The algorithm generates pseudo reference chronologies (pref/ prefs) by adding phs/pocs to the best fitting initial pref, i.e. to the One-Nearest-Neighbor chronology [34][35][36]. Thus, prefs are established that cover 1000 years in ideal circumstances ( Fig  1).
The 15 initial prefs of this study represent the 15 mean values calculated from 30 ps of length 150 years per pseudo site chronology. These 30 ps per initial pref were sampled in addition to the already simulated pseudo dataset. This mimics a scenario in which a dendrochronologist starts out with 15 unmixed mean site chronologies (the initial prefs) and extends these by adding the best matching series from the hypothetical laboratories' historical dataset (here, the simulated phs/poc dataset).
Best matches were determined statistically, i.e. by calculating a t-value for each Pearson's correlation coefficient that resulted from pairwise comparisons between a candidate series (phs/poc) and the 15 initial prefs [37,38]. This method in combination with ar prewhitening had a high performance in a previous study [15]. Because here the ps resemble ar residual series, a comparable performance was expected.
Potential best matches were required to fulfill threshold criteria. Specifically, the thresholds for t-values that were tested were t � 5 (t5), t � 10 (t10), t � 15 (t15) and t � 20 (t20) at a minimum overlap of 50 years. If several matches fulfilled the threshold criteria, the respective candidate series was classified to the best matching initial pref, i.e. the chronology that yielded the highest t-value for the pairwise comparison.
In a real dataset, potential cross-dating errors complicate the classification. The PREF-Constructor assumes that the cross-dating of each candidate phs/poc is correct. Thus, best matches need to be found for one correlation position only.
Once the best matching initial pref had been determined for all phs/pocs, the series were labelled (i.e. classified) according to the label of the best matching initial pref. After this initial run, the first prefs were established from the classified phs/pocs. Subsequently, the phs/pocs that were yet unclassified were compared to the prefs just established. Again, all series that fulfilled the threshold criteria were classified and added to the respective pref. Series that remained unclassified were checked again during later runs, after the establishment of the prefs had advanced. Thus, the algorithm proceeded until all series were classified, or until no unclassified series fulfilled the threshold criteria anymore, i.e. no classifications were possible any more (Fig 1).

Evaluation
One thousand pseudo datasets comprising 15 pseudo chronologies each were generated to evaluate 1) the simulation model and 2) the PREF-Constructor results. This 1000-fold repetition of the simulation was done for the phs approach as well as for the poc approach and with different settings for the thresholds. Thus, the variability in pref composition that existed between the different repetitions due to random sampling was quantified to avoid unfounded interpretations based on the results of a single simulation repetition.
In the following two subsections, performance indicators are introduced that provide a basis for quantifying the stability of the pseudo signals across all repetitions. Moreover, methods are presented that allow for the comparison of PREF-Constructor classifications between different simulation settings.

Stability of the pseudo signals.
The simulated pseudo signals must remain stable across different simulation repetitions. Instability would indicate that the individual pseudo series (ps) were sampled from a different set of pseudo site signals in each repetition of the simulation. This would render the 1000 pseudo datasets incomparable. Thus, the stability of the intercorrelation between the pseudo mean values (pmv) or the sample pseudo mean value chronologies (spmv), respectively, was investigated.
Δ{pmv-rmv}. To quantify the stability of the linear transformation that yielded the pseudo mean values (pmv/pmvs) for the simulation, the pairwise correlations (i.e., the correlation matrix) between the pmv vectors were calculated for each of the 1000 simulation repetitions. Subsequently, the correlation matrix for the mean values of the real site chronologies (rmv/ rmvs) was calculated, and the range of differences between the entries of the correlation matrices calculated for the pmvs and for the rmvs (Δ{pmv-rmv}) was determined.
Δ{spmv-rmv}. The linear transformation that yielded the pmvs was expected to be stable, except for rounding errors [31]. However, more instability was anticipated for the mean values calculated from phs chronologies. Even if the PREF-Constructor algorithm were to classify each phs correctly, the pseudo reference chronologies (pref) calculated from these phs would not perfectly reproduce the intercorrelation of the pmv. This was to be expected because the phs are random realizations of the respective pseudo signals (section Simulation). Thus, the variability introduced due to random sampling needed to be quantified. Similar to Δ{pmv-rmv}, the Δ{spmv-rmv} indicates the range of differences between the correlation matrix entries of the rmv and the sample pseudo mean value chronologies (spmv) of each of the 1000 simulation repetitions. Thus, the spmv are the hypothetical mean value chronologies that result from averaging the phs according to a perfect classification. No further investigations were done on the stability of the poc chronology signals because no further instability results from averaging phs into pocs (for an on-site ratio of 1).

Pref composition.
Pref composition was investigated by calculating the percentage of generated series that were classified during each simulation repetition. Of these classified series, the percentage of correctly classified series was determined. To investigate the dynamic establishment of the prefs, the off-site poc/phs contamination was calculated for each run of classifications executed by the PREF-Constructor (section PREF-Constructor Algorithm). That is, the percentage of phs/pocs originating from a pseudo site signal other than the pseudo signal that had been used to initialize the respective pref was calculated in every run.
Hypothetically, while the generated prefs may represent mixtures of pseudo site signals they may still carry watershed and/or elevation-specific pseudo signals. Therefore, the phs/pocs were grouped according to contrast groups other than pseudo site provenance. Subsequently, the off-contrast contamination was calculated, i.e. the percentage of phs/pocs originating from a contrast group other than the group that the initial pseudo signal of the respective pref was attributed to. To define these contrast groups, the elevation bands and watersheds/regions of the real site chronologies [15] were used to group the different pseudo site signals: 1. Elevation contrasts (site abbreviations refer to names in [15]; cf. section Real ring-width dataset): • Sihl (pseudo sites: fri.pseudo, furg.pseudo, gw.pseudo, kar.pseudo, sw.pseudo).

Stability of the pmv/spmv intercorrelation
The range of deviations attributable to variability in the pmv intercorrelation between simulation repetitions, i.e. Δ{pmv-rmv}, was effectively zero, primarily being due to rounding errors (-5�10 −15 to 5�10 −15 ). Moreover, the range of deviations that resulted from the random sampling of phs across simulation repetitions, i.e. Δ{spmv-rmv}, was small. The largest positive deviation was 0.039, the largest negative deviation was -0.083 for any spmv correlation matrix entry.

phs approaches.
When setting the t-value threshold to 10 (t10) and 15 (t15), respectively, the percentage of generated series classified lay below 1.2% with the phs approach ( Table 1). The number of PREF-Constructor runs executed per simulation repetition was very low (often < 3 runs, Table 1). Only the t5 threshold resulted in prefs that were long and well replicated enough to allow for further analysis of their composition.
However, the t5 prefs were quickly contaminated. By the end of the 2nd PREF-Constructor run, contamination was <20% for all prefs in 95% of the simulation repetitions (Table 2). By the end of the final PREF-Constructor run, pref1 and pref13 were the only prefs that were contaminated by less than 20% in more than 80% of the repetitions (Fig 2A). These were also the shortest and most sparsely replicated prefs overall ( Table 2). Contamination was higher for the longer and better replicated prefs. For example, prefs no. 4, 5, 6, 7, 10 and 11 were contaminated by more than 20% off-site series in >40% of the simulation repetitions (Fig 2A).
Regarding contrasts other than the site contrast, prefs initialized with high-elevation signals only attracted phs sampled from high-elevation pseudo signals. Low-and medium-elevation signals, however, were often mixed ( Fig 2B). The same was true for signals belonging to different watersheds (Fig 2D, Fig 2 in Table 1 in S3 File). Remarkably, in 891 out of 1000 simulation repetitions, 2 full-length prefs were developed, one of which consisted of high-elevation phs and the other of a mixture of medium and low-elevation phs. The rest of the prefs were predominantly short (1-333 years) or at least shorter than 667 years (Table 3).
3.2.2 poc approaches. t15 threshold. When setting the on-site ratio to 1 (osr-1) and the tvalue threshold to 15 (t15), between 19.82% and 46.33% of the generated series were classified by the PREF-Constructor algorithm. Of these, between 71.41% and 99.93% were classified correctly (Table 1), depending on the simulation repetition. For those PREF-Constructor runs that were replicated by all repetitions, the contamination was <20% until run no. 6 for all prefs generated (Table 4). Moreover, even when calculating the contamination after the last PRE-F-Constructor run had been executed, pref6 and pref11 were the only prefs that exhibited a contamination >20% with a notable frequency, i.e. in more than 5% of the repetitions ( Fig  3A). For the other prefs, highly contaminated cases (>50%) occurred only rarely (<5% of the repetitions, Table 2 in S3 File).
On average, per repetition 4 prefs reached the full length of 1000 years. The majority of the prefs were >333 years and often >500 years long. However, in each repetition, on average 6 of the prefs reached <334 years ( Table 3, Table 2 in S3 File). The ranking of shortest to longest pref tended to differ considerably by repetition (e.g. pref1 in certain repetitions was among the longest and in other repetitions among the shortest prefs). However, over all simulation repetitions, certain prefs developed systematically better than others (  Table 2 in S3 File). Among the frequently longer prefs were no. 2, 3, 5, 6 and 7, with a median length >700 years (calculated at the last PREF-constructor run replicated by all 1000 simulation repetitions, Table 4). The shorter prefs were no. 1, 4, 8, 9, 10, 13, 14 and 15, as reflected by a generally lower median length (<500 years; calculated at the last fully replicated PREF-constructor run, Table 4). These prefs also had a lower replication, i.e. a median mean replication <20 series by the end of the last (13th) fully replicated PREF-Constructor run (Table 4; for full tables of minimum, median and maximum lengths and mean replication, respectively, cf. Table 2 in S3 File).
Lowering the on-site ratio to 0.83 (osr-0.83) and 0.67 (osr-0.67), respectively, resulted in a drop of classification rates (Table 1). Consequently, fewer PREF-Constructor runs were  Table 4 in S3 File, S4 File). t20, t10 and t5 threshold. Short and sparsely replicated prefs resulted also when increasing the t-value threshold to t20 at osr-1, i.e. the maximum on-site ratio (Figs 8-9 in S2 File, Table 5 in S3 File, S4 File). As for t15 at osr-0.67, prefs often remained in their initial state. Thus, no simulations featuring alternative on-site ratios were executed for t20.
For t10 at osr-1, the contamination risk varied considerably between prefs. After the first three PREF-Constructor runs, in 95% of the repetitions most prefs, i.e. no. 1, 2, 3, 4, 5, 7, 8, 9,  Table 5). However, in the bulk of repetitions, the majority of prefs were contaminated progressively with every subsequent PRE-F-Constructor run (Table 5). Nevertheless, usually long (median 453 to 1000 years) and well replicated (median 20 to 28.7) prefs were established with t10 at osr-1 (medians calculated at PREF-constructor run 15, Tables 3 and 5 Table 6 in S3 File, S4 File, Fig 10 in S5 File). Decreasing the on-site ratio for t10 did not affect the length and replication of the prefs as heavily as for t15 (Tables 1 and 3). In summary however, the same observations were made, i.e. the number of correctly classified series decreased the lower the on-site ratio (Table 1,  A very high contamination risk was observed when lowering the threshold to t5, even while keeping the on-site ratio at its maximum, i.e. osr-1 (Tables 1, 3 Table 9 in S3 File, S4 File, Fig 13 in S5 File). Thus, for the poc approach at t5, the simulations featuring lower onsite ratios were omitted.
For all t-value thresholds and on-site ratios investigated, grouping the pseudo signals according to the elevation bands of the underlying real sites revealed that prefs initialized with pseudo signals from high elevation sites did generally not attract poc series that were sampled from medium-or low-elevation pseudo signals (Figs 4C, 5C, 6C and 7C). However, prefs frequently attracted pocs from watersheds other than the pseudo site signal used to initialize the respective pref (Figs 4D, 5D, 6D and 7D).

Adequacy of the simulation model for investigating the contamination risk of reference chronologies
This study assesses a fundamental assumption of dendo-provenancing, i.e. that regional reference chronologies reflect regional tree-growth. Although initially plausible, a systematic evaluation of this assumption is needed prior to dendro-provenancing because non-regional series (e.g. from rafted timbers) may contaminate references (c.f. Introduction). Here, simulating the process of establishing historical reference chronologies was chosen to assess the degree to which local/regional signals can be retained. Algorithmically, pseudo references were

PLOS ONE
Assessing site signal preservation in reference chronologies for dendro-provenancing   established using different thresholds for the minimal similarity required for a given series to be included in the respective reference. This approach's success depends on how accurately the site-specific, elevation-specific and regional (dis-)similarities of tree growth in the study area are modelled. The covariance matrix of the real site chronologies (i.e. the 15 Norway spruce site chronologies) represents said growth (dis-)similarities. Hence, the simulated pseudo site signals must evince a covariance matrix that closely reflects the covariance matrix of the real chronologies. Specifically, quasi no deviation was found between the covariance matrix entries of the real chronologies and the simulated pseudo signals (i.e. Δ{pmv-rmv}). Moreover, the pseudo chronologies (sampled from the pseudo signals) consistently reproduced the covariance matrix of the real chronologies in all replications of the simulation (c.f. small range of Δ{spmv-rmv}, Results section). Thus, the 1000 randomly sampled pseudo datasets essentially reflect the same set of site signals. Consequently, these datasets can be used to assess the risk of signal contamination in the study area.

Implications and benefits for dendro-provenancing
For the study region, results indicate that the regional signal of reference chronologies is highly susceptible to contamination. This problem may also be acute in other regions but has received  little attention in previous research [5][6][7][8][9][10][11][12][13][14][18][19][20][21]. Especially, the contamination risk rises if object chronologies (i.e. the basic components of reference chronologies) already represent a mixed site signal. In the simulation, this was investigated by replacing one or two of the six series that entered a pseudo object chronology with an off-site series (i.e. pocs with osr-0.83 and osr-0.67, respectively). Lowering the on-site signal ratio in the pseudo object chronologies diluted their original pseudo site signal and resulted in a quick decrease of the length and replication of the pseudo reference chronologies constructed using such mixed signal object chronologies.
In the context of sampling real timbers from historical objects, the assumption of an unmixed site signal (i.e. osr-1) seems too optimistic [33]. Realistically, imported timbers occasionally will enter object chronologies [39]. Unfortunately, the simulation was not robust when off-site series were incorporated in pseudo object chronologies. Hence, researchers are  best advised to scrutinize the assumption of local timber supply for each object investigated when attempting to establish object chronologies for dendro-provenancing. For the area studied here, reference chronologies that reflect local to regional growth seem procurable only if unmixed object chronologies are used.
Alternatively, establishing regional reference chronologies using single ring-width series seems ineffective and bears a high contamination risk. The pseudo reference chronologies established with the single series (phs) approach were quickly contaminated. In the median, three quarters of the pseudo historical series were either unclassified or wrongly classified (Table 1). Moreover, the pseudo site signal was too weak in the pseudo historical series, thus the t-value threshold could not be raised above t �5. Hence, at least for small-scale studies like [15], establishment of reference chronologies and provenancing based on single ring-width series is likely to be highly problematic.  A t-value threshold of t �15 was necessary for the robust construction of long and uncontaminated pseudo reference chronologies. This is an extreme restriction for the minimal similarity required for valid matches. In the literature, t �15 is an uncommon threshold: often, researchers applied more or less rigorously fixed thresholds between t �9 and t �11 [7,8,[40][41][42][43]. Some studies even considered t-values <9 to narrow down the area of provenance [44][45][46][47]. Strictly speaking, the t-value thresholds are not directly comparable between different studies. Firstly, there exist species specific differences in between series similarity [20,22]. Yet, these are negligible for illustrating the strict restrictions for the minimal between series similarity that are defined by the requirement t �15 in this study. Secondly, the t-value is influenced by preprocessing and the overlap over which the correlation coefficient is calculated [48,49]. However, the most commonly used t-value calculation methods involve transformations that enhance the high-frequency signal [17,18]. Thus, the t-value for a correlation coefficient still provides a good approximation of the similarity between ring-width series that was deemed sufficient for a valid match between reference and candidate series in the respective studies cited above (i.e., [7,8,[40][41][42][43][44][45][46][47]).
The simulation approach that was developed in the present study shows that, even with t �15 and unmixed pseudo object chronologies, only one third of the pseudo object chronologies in median was correctly classified (i.e. 34.67% classified to a pseudo reference chronology, of those 96.76% to the correct pseudo reference chronology; Table 1). Lower tvalue thresholds evinced a higher risk of establishing contaminated pseudo reference chronologies. For example, although the t �10 threshold exhibited a low contamination risk for establishing pseudo reference chronologies in the majority of the repetitions of this simulation study, there were several repetitions in which pseudo reference chronologies that were uncontaminated in most other repetitions were severely contaminated. Particularly, variability was extremely high among the 5% most contaminated simulation repetitions. Thus, no t-threshold can be considered "secure". Even for t �15 with unmixed pseudo object chronologies, contaminated pseudo reference chronologies resulted in rare cases. Using the best match to extend pseudo reference chronologies represents only one implementable classifier, i.e. the One-Nearest-Neighbor classifier based on t-values for the Pearson's  correlation coefficient (c.f. Methods section). The choice of this classifier over other possibilities was supported by the results of [15]. In future applications, the current approach could be extended by implementing other time series classifiers and/or measures of proximity in the PREF-Constructor algorithm [34].
Regardless of the specific classifier, distance or similarity measurement in use, a simulation approach offers the means for assessing the level of statistical proximity necessary for lowering the risk of establishing contaminated reference chronologies within a given dataset. The tthresholds determined here are neither universally valid nor directly transferable to other study regions. Because computational and hardware resources were limited, thresholds had to be set for t-values and overlap. Preferably, the calculation would have been continuous. In any real dendro-provenancing study, the dataset displays other clusters of between ring-width series similarities [50][51][52][53]. In ideal circumstances, the between-sites signal differences in an area are pronounced enough to allow for site-specific similarity clusters. The range of t-value thresholds that capture the respective level of similarity necessary to establish site-specific, elevation-specific or regional reference chronologies, respectively, should be investigated independently for each study region. This is only possible via simulation, but it seems to have been largely disregarded up to now [5][6][7][8][9][10][11][12][13][14]54].
Moreover, thresholds or t-value ranges that were determined based on simulation require re-evaluation when geographically expanding the dataset of a study region or when investigating other tree species. On the one hand, the pseudo signals for Norway spruce generated here appear disparate when applying the t �15 threshold. For neighboring sites, however, narrowing the sampling grid by adding new site chronologies may gradually dissolve the between-site signal differences. On the other hand, certain signal differences may turn out to be more robust and may persist even when new data are added. For example, pronounced elevation specific growth signals have been reported by several studies [55][56][57][58][59][60]. In the simulation presented here, high-elevation and medium-to low-elevation pseudo site signals proved robust and were separated even when using single series (phs approach at t �5) or mixed signal object chronologies (poc approach at t �10 and osr-0.67).

Conclusions
Simulation is paramount for future dendro-provenancing studies. In any study region, the contamination risk of local reference chronologies is unknown a priori, since the site provenance of historical timbers remains ambiguous. Thus, this risk can be objectively assessed by simulation only.
In the dataset studied here, uncontaminated pseudo reference chronologies were established only with very strict settings for the t-value and on-site ratio thresholds. Consequently, a high minimal similarity between the object chronologies that enter a reference chronology is required. Moreover, these object chronologies must represent unmixed local site signals.
The approach presented here can be extended in several ways: For example, the input covariance matrix of the real sites could be replaced by any covariance matrix. Also, the thresholds for t-values, on-site ratio and overlap may be replaced by continuous inputs. Also, other classifiers than the current One-Nearest Neighbor classifier could be implemented. However, such developments require high-performance programming techniques and/or hardware. These were beyond the resources available to the author and also beyond the scope of this paper.
Another valuable path for future research would focus on validating or revising the simulation model presented here. For all of the 1000 simulated years, the same intercorrelation structure between the pseudo site signals was assumed. In reality, some temporal variability in the intercorrelation between sites is likely to occur, which could be reflected in the model but was ignored here for the sake of simplicity.
In addition to elaborating the simulation model, it could be extended to incorporate multivariate tree-ring time series of proxies such as wood-density or of stable isotopes like δ 18 O and δ 13 C [61][62][63][64]. Multivariate approaches were shown to increase cross-dating success [61,62]. Thus, such an approach is likely to enhance the spatial resolution of dendro-provenancing and lower the risk of local reference chronology contamination.
Without an adequate evaluation of the contamination risk of reference chronologies, the basis of dendro-provenancing remains inscrutable. The approach presented here is a first and not necessarily the most practicable solution. The depicted avenues of future research hopefully spur methodological innovations in the field that will lead to a more elaborated simulation tool for practical application in dendro-provenancing.