A Space-For-Time (SFT) Substitution Approach to Studying Historical Phenological Changes in Urban Environment

Plant phenological records are crucial for predicting plant responses to global warming. However, many historical records are either short or replete with data gaps, which pose limitations and may lead to erroneous conclusions about the direction and magnitude of change. In addition to uninterrupted monitoring, missing observations may be substituted via modeling, experimentation, or gradient analysis. Here we have developed a space-for-time (SFT) substitution method that uses spatial phenology and temperature data to fill gaps in historical records. To do this, we combined historical data for several tree species from a single location with spatial data for the same species and used linear regression and Analysis of Covariance (ANCOVA) to build complementary spring phenology models and assess improvements achieved by the approach. SFT substitution allowed increasing the sample size and developing more robust phenology models for some of the species studied. Testing models with reduced historical data size revealed thresholds at which SFT improved historical trend estimation. We conclude that under certain circumstances both the robustness of models and accuracy of phenological trends can be enhanced although some limitations and assumptions still need to be resolved. There is considerable potential for exploring SFT analyses in phenology studies, especially those conducted in urban environments and those dealing with non-linearities in phenology modeling.


Introduction
Historical trends in plant phenology (environment-mediated chronology of periodic life-history events (phenophases)) have become widely used for assessing climate change and its effects on ecological processes [1][2][3][4]. Such analyses rely on long-term records of phenological observations accumulated within national networks or collected by individual researchers [5][6][7][8]. Global warming most often advances spring (earlier budbreak and earlier emergence of leaves and flowers) and slightly delays autumn phenophases (later leaf coloration and leaf drop) with the overall effect of growing season lengthening in mid-to higher latitudes [9][10][11]. Yet, in some areas warmer cold period conditions may prevent the fulfillment of chilling requirements and, as a consequence, delay spring phenophases [12].
Historical records often span relatively short periods or have missing data preventing researchers from making meaningful statistical inferences when they regress phenophase timing on years [13]. Consequently, it may be problematic to evaluate historical trends in phenology and extrapolate them to the future because some records may miss the full magnitude of climatic cycles or extreme events. This raises concerns about the validity of some published literature with conclusions based on less than 10 years of data [14].
A typical approach to overcome the problem of scarce phenological time-series has been to combine data from multiple stations and either derive a mean regional signal [9] or use alternative methods of time-series optimization, which also allows reducing the noise and removing outliers in data [15][16][17]. Recently, records other than direct phenology observations, including dated historical photographs and herbarium specimens, have been used to estimate historical changes in phenology [18,19]. However, the temporal accuracy of such data is generally significantly lower. The exclusive reliance on statistical techniques, that do not require combining data from multiple stations, is also an option for estimating missing values. They include imputation-based procedures, weighting procedures, and model-based procedures. In the imputation-based approach missing values are filled in using any of the imputation techniques (hot-deck, mean, and regression), followed by data analysis using standard methods [20]. The more advanced approach is model-based procedures, such as the expectation-maximization (EM) maximum likelihood method [21]. However, when too many important observations are missing, obtaining such data can only be done through controlled field experiments, mainly artificial warming of natural plant populations or communities [2,22,23], or by making use of naturally occurring latitudinal/altitudinal gradients of tem-perature [5,24]. A recent meta-analysis of warming experimentation in phenology research revealed its lower sensitivity compared to long-term observations [25]. Some of the limitations of experimentation and gradient analysis can be alleviated by integrating them in a single study [26]. Experimental approach is feasible in herbaceous and shrub communities, but it is much more difficult to apply in forest stands. Furthermore, it is challenging to experiment with effects of cooling, which may be of interest if data for historical lows are not available. On the other hand, the use of spatial climatic gradients may be a more achievable option in evaluating tree responses to changes in climate.
It has been suggested that elevated temperatures and CO 2 concentrations in urban areas provide a low-cost alternative to experimental studies of climate change effects [27]. From a landscape ecology perspective, the urban-rural temperature gradient, known as the Urban Heat Island (UHI), may be considered as a broad-scale experiment that provides an opportunity not only for assessing the effects of global warming but also for filling gaps in phenological records. While phenological observations have been traditionally conducted in the vicinity of human settlements, most such data are spatially isolated point features. Several recent studies focused specifically on spatial variation of temperature and phenology across urban areas and reported that variations in spring phenology were correlated with temperature patterns [27][28][29]. Most such investigations support the prevailing view that urban warming leads to earlier onset of spring phenophases and longer growing season, with a few exceptions [30]. We propose that scarce historical records can be combined with data collected during one or more growing seasons over many locations along urban-rural temperature gradients. Phenological models developed using this combined phenology and climate data can help us restore missing observations of the past and thus estimate phenology trends more accurately. This should be straightforward if monitored plants commonly grow in a metropolitan area as either managed or remnant urban vegetation. Otherwise, experimental approach [27,29] may be an option. Use of spatial data for inferring temporal dynamics has a long tradition in ecology. The approach is commonly known as the space-for-time (SFT) substitution whereby spatially separated sites selected based on either ecological or environmental gradients serve as proxies for predicting ecological timeseries, e.g. vegetation succession [31,32]. Therefore, the primary goal of our study is to test the usefulness of the SFT in studying urban phenology. Specifically, we combine historical temperature and spring phenological records of several tree species with temperature data and phenological observations of same species conducted during two growing seasons at multiple sites along an urban-rural temperature gradient. We use these combined datasets to build complementary spring phenology models and compare them to those constructed from historical data alone. Historical trends of phenology are then estimated using these models.

Materials and Methods
Our study area comprises the capital of Inner Mongolia province of China, Hohhot (40u49' N, 111u41' E), which occupies 2158 km 2 and has a population of 2 million people. Hohhot is located in cold semi-arid climate characterized by mean annual temperature of 6.6uC and mean annual precipitation of 394 mm and lies at an average elevation of 1050 m above sea level (Fig. 1). The whole area is in the typical steppe zone with natural vegetation dominated by C4 grasses with semi-shrubs and woodland and shrub communities on mountain slopes.
Our field campaign was designed to collect phenological and temperature data during two growing seasons (2010-2011). Considering that potential differences in temperature are influenced by land use, we selected trees in urban parks, urban dense forests (city botanical gardens, plant nurseries), edges of busy streets and major roads, and agricultural hedgerows. No specific permits were required for conducting our field research because all sites were in public areas and did not have endangered or protected species. Three representative healthy individuals of each species growing in close proximity were sampled and tagged at each site. Sites were visited 2-3 times a week during the early to late spring and about 1-2 times a week in the early summer. Julian days of occurrence of phenophases were recorded by several observers following a standardized protocol and by taking digital photographs. In final analyses we use phenophase timing averaged between the three individuals of each species.

Temperature Data Collection
Historical temperature data for Hohhot were obtained from the China Meteorological Data Sharing Service System of the World Data Center (WDC) using online data retrieval tools (http://cdc.cma.gov.cn/). To collect data at different sites across the city we used 1-Wire iButtonH devices including 36 ThermochronH DS-1921G (61uC accuracy and 0.5uC precision) and 8 HydrochronH DS-1923 (60.5uC accuracy and 0.0625uC precision) data loggers. Accuracies of all devices were cross-checked prior to deployment by calibrating in the common room environment to the scientific grade Dynamet (temperature accuracy 60.25uC) weather station (Dynamax Inc.). Data loggers were deployed in March before maximum daily temperatures reached 0uC. We mounted them at 2 m height on one centrally located tree (one iButton per site) by drilling cavities in tree trunks, spraying paint to protect cavity walls, and inserting PVC pipes (3-cm diameter and 4-5 cm in length). Cavities were facing north to minimize direct solar illumination. Data loggers were placed inside these pipes and cavities shielded by two layers of plastic mesh to allow for ventilation and camouflage from potential vandals and birds. To deal with natural defenses of trees against the wounds we inspected trees during regular visits and removed, if necessary, new cambium growth on outer edges of the cavities. This turned out to be necessary for some of the trees only once in June. Temperature data were recorded at 3-hour interval and converted to daily maximum, minimum, and mean temperature for further analyses. To check whether historical data from Hohhot weather station can be used as a proxy for meteorological conditions in the arboretum, we used nearest (within several hundred meters of each location) spatial sites in similar park-like environments ( Fig. 1) and compared their temperature seasonal patterns during 2011. We found remarkably similar temperature patterns with absolute differences predominantly under the accuracy level of our data loggers, which suggests historical temperature data can be justifiably used to model historical phenology.

Statistical Analyses of Spring Phenology
Phenological models are often based on linear relationships between thermal regime and developmental rates of plants and range from two(three)-parameter 'spring warming' regression [34] to more physiologically realistic complex models [35]. There are variants with regards to the form of the explanatory variable included in the 'spring warming' model. While some researchers consider mean temperature for periods of different length preceding phenological events [10,36], others employ the concept of accumulated growing degree-days (AGDD) [35,37] of the form: where T m is the daily mean temperature and T b is the base temperature (typically 0uC or 5uC). Days with T m ,T b do not contribute to the sum. The GDD accumulated since the first day of year (DOY) until the DOY at which a specified phenological event occurs signifies a phenophase forcing temperature.
We used a simple 'spring warming' AGDD-based linear regression model fitted first to historical data and then to the combined historical and spatial data using the standard ordinary least squares method. In cases where sample size allowed we also constructed regressions from just the spatial data for the purpose of comparison. The model does not take dormancy into account and assumes the chilling requirement necessary for dormancy release has been fulfilled at the start of temperature accumulation. We chose T b = 5uC as the most commonly used threshold for cereal crops and woody plants of temperate regions [38]. The starting date for GDD accumulation was shortly after March 20 (DOY 80) for both 2010 and 2011. Both the starting date and T b are essentially identical to those derived by optimization for deciduous forest of the northeastern United States [39]. It is also consistent with some previous observations in Germany suggesting high sensitivity of spring phenology to late spring (March-May) temperatures [39]. We tested the idea that timing of phenophases may be influenced more strongly by AGDD either slightly before or after the average date of phenophase occurrences. This was done by plotting correlations between DOYs of each phenophase and AGDDs with a daily time step and selecting the nearest optimum DOY at which correlation is the highest. Further details on this approach are given in [40,41].
To assess whether combined spatial and historical phenology data improve predictions based on historical data alone we used combined dataset with the added indicator variable specifying each group (''s'' = spatial and ''t'' = temporal) and the Analysis of Covariance (ANCOVA) [42,43]. ANCOVA has the general form where y ij is the value of phenophase timing for the j th observation in the i th group (spatial vs temporal); m is the overall mean (constant) of phenophase timing; a i is effect of i th group on phenophase timing, defined as the difference between each of the group mean and the overall mean; b i is the slope term for group i; x ij is the value of AGDD for the j th observation from the i th group; X i is the mean value of AGDD for group i; and e ij is the random error [43]. If the regression slope is the same for both groups (b C ), the model is formulated as Finally, if there is no interaction and no group effect, the model reduces to a linear regression where groups are ignored and slope b T is the fit to all the data.
By fitting the three models sequentially and using corresponding null hypotheses and the F statistic, we checked whether individual regression lines for the two groups are 1) not parallel (significance of the interaction effect), 2) parallel (significance of the indicator variable), or 3) coincident (single fitted line for the two groups) (Fig. 3). We assessed normality of all variables with Shapiro-Wilk tests. Linearity assumption was checked by fitting a regression with slope = 0 to the residuals. We examined plots of predicted values and residuals to inspect for homoscedasticity. The Durbin-Watson test was used to check for autocorrelation in the residuals.
After selecting best models from combined data (the SFT approach) we applied them to temperature records  and estimated historical gaps in phenology observations. We used these gap-filled time-series to estimate linear trends and compared them to those fitted to historical data alone. This was done, again, by combining these two datasets and fitting separate linear regressions to whole time series and using the same ANCOVA approach for comparing regression slopes. The independent variable here is the year of observation. In addition, we assessed the effect of historical data sample size on these regressions by eliminating one data point at a time, starting from the earliest, and fitting corresponding models. Regression assumptions were checked as described above. All statistical analyses were performed in SAS JMP 7.0.2 software.

Temporal vs Spatial Variation of Temperature and Phenology Data
Spring temperature in Hohhot exhibited a warming trend over the last five decades, although patterns of change in phenology may not be easy to detect (Fig. 2). Note that the spring in 2010 (last point on the graph) was abnormally cooler than the previous 15 years while in 2011 (not shown) it was as warm as in most recent years. To assess whether variability of spatial data exceed or at the level of historical data variability we compared phenology (Table 1) and AGDD (Fig. 4) data statistics. Mean historic AGDD stays close to the mean of urban sites and higher than average AGDD in park and rural locations. However, historic standard deviation is consistently higher (Fig. 4). Both temporal and spatial variation increases with the season progression, although spatial variation increases slower, especially after DOY 120 (Fig. 4). Temporal variation of phenophase timing of all species is also consistently higher than the spatial variation (Table 1). Moreover, this pattern does not always correlate with sample size, e.g. STD of P. bolleana is lower despite larger N ( Table 1). The comparison of statistical distributions of temperature and phenology suggests our spatial data should probably serve as auxiliary to fill some of the gaps in historical data but cannot be used to completely substitute historical records because spatial variation is consistently lower for both variables.

Differences in Regression Slopes of Combined Models
There is no difference in the way the timing of phenophases from historical data and that from spatial data change with increasing AGDD (Table 2, note large p-values for the interaction term effect). Small p-values for the indicator variable effect and ttest results of the least squares means (not shown here) suggest two parallel lines can be fit to the combined dataset for about half of the phenophases. This information was used to fit regressions using combined historical and spatial data. Those phenophases with high p-values for the indicator variable effect were fitted using the parallel lines model, while others were fitted using the single line model. We then evaluated outputs from these regressions with those constructed using just the historical data by considering statistical significance and standard error (SE) of slopes and percent of phenophase timing variance explained by these models (Table 3). Most slopes and SE are similar when compared between the two groups of models (historical vs combined). We note some improvements, mainly because of considerable increases of R 2 adj . Combining temporal and spatial data for P. bolleana increased the degrees of freedom and improved R 2 adj of leaf budburst and first leaf appearance phenophases (Table 3). Similar improvements are evident when leaf budburst and flowering phenology of Salix babylonica are examined. Historical correlations of all Ulmus pumila phenophases with AGDD were quite low. Combining historical with spatial observations increased R 2 adj of leaf phenophases to some degree but the slope of leaf budburst remained nonsignificant ( Table 3).

Analysis of Temporal Trends of Phenology
Comparing between linear fits to historical data and to the full time-series of predictions from best combined (SFT) models suggests no difference in slopes, although the former are often not significant due to small data sizes (Table 4). Phenology observations of P. bolleana started only in 1991, so large slope differences are mostly the result of temporal mismatch between the two datasets compared (Fig. 2). This confirms high dependence of trend estimation on such factors as starting year, end year, and duration of analysis [14]. Furthermore, linear trends of time-series predicted by historical data alone and those by best combined models are also similar in their rate of change (Fig. 5, first pair of bars in each graph) suggesting historical data are actually sufficient for model building and trend estimations. All slopes are negative and translate into phenophase advancement of 2-3 days for P. bolleana and S. babylonica and one day per decade for U. pumila. The advantage of combining spatial and temporal data becomes evident when historical data size is further reduced (Fig. 5). Resulting slopes are shown for historical data reduced by half or more (and the corresponding reduction of combined data), except for U. pumila whose full sequence is not shown here. Most slopes of linear trends fitted to predictions from combined data remain at the same level, but they are less stable for the LA phenophase of P. bolleana and BF phenophase of S. babylonica. Slopes of linear trends fitted to predictions from historical data are more sensitive to data size reduction, especially for BL, BF, and PF phenophases of S. babylonica and LA phenophase of U. pumila. Several statistical differences between slopes for the two species (Fig. 5) are the result of this instability and support the idea that combining spatial and temporal data is beneficial, especially when the number of historical records is really small, e.g. 6 or 4. Interestingly, the significant difference in slopes for LA of P. bolleana is caused by slope instability for the combined dataset.

Observed Improvements in Phenology Models and Estimation of Historical Trends Achieved by SFT Substitution
SFT substitution appears to be expedient for phenology modeling of large woody plants for which overall historical sample sizes are either small or the continuity of time-series is interrupted by substantial gaps. This is supported by our findings for P. bolleana and S. babylonica that each had only 8-10 observations for different phenophases. By collecting additional data we were able to fill gaps in the lower AGDD and later phenophase timing space of scatter plots for these variables, as well as adding more points to the middle (Fig. 3). It eventually improved phenology models but did not actually affect estimates of historical rates of change in phenophase timing. As expected, significant differences in slopes of trend lines were detected only for the reduced dataset of S. babylonica, though P. bolleana did not fit this pattern. The results illustrate the importance of data availability for just one or two earliest years for prediction of realistic linear trends. Our findings for the other species are variable. Two other species of poplar were not as common in the urban landscape, so adding scanty spatial data did not affect regression models constructed with historical data. Although U. pumila had one of the most complete historical records, we could not identify strong enough relationships of phenophase timing with temperature. Combining it with spatial data provided only a small improvement.
The highest percent variance was consistently explained by AGDD in either BL or F1 models. We attribute this to the more accurate and unambiguous detection of these phenophases by different observers, while human errors are presumably higher for the LA phenophase. During our field campaign flowering phenology was only observed in S. babylonica and U. pumila. Combining these observations with temporal data improved Figure 3. Graphical representation of ANCOVA analysis for the leaf emergence phenophase. Historical data are depicted by blue star symbols and spatial data by red crosses. Linear fits (either a single or two parallel lines) are selected based on ANCOVA. X axis is the growing degree days accumulated to a specified day of year. doi:10.1371/journal.pone.0051260.g003 phenology models sufficiently, but percent of variance explained by temperature remained low or relatively moderate.
Chen and Xu [40] recently demonstrated high contrasts in historical trends of growing season onset of U. pumila across the entire temporal zone of China. In particular, among the four stations centered on our study area, two showed significant advancing trends for the beginning of growing season (4-7 days/ decade), while one station showed a delayed trend of 3 days/ decade, and another exhibited almost no change (stations 17-20 in Table 1 of Chen and Xu (2011)). However, with one exception, none of those trends were statistically significant, which is also demonstrated by our results for this tree ( Table 4). As noted earlier, these trends are weak because of high year-to-year variation in phenophase timing and its lower correlation with AGDD. When we substituted historical data gaps with those estimated from our regression models, trend lines became statistically significant, but slopes remained similar.

Potential for Improving the SFT Substitution in Phenology Research
We believe there are still underexplored opportunities in bringing temporal and spatial perspectives together, which deserve further research efforts in phenology studies. Most our spatial data did not cover whole ranges of historical data so it could not be used to predict historical dynamics per se, but it was useful as complementary data. We envision most urban temperature/ phenology gradients surveyed within a single year cannot surpass variability of historical records. Combining data from two or more years, as well as incorporating sharp altitudinal gradients [44], where possible, is expected to alleviate this shortcoming. Collecting spatial phenological data is labor intensive and may sometimes be thwarted by logistical challenges. Recent technological advances can alleviate some of these problems. With the increase in remote sensing capabilities, introduction of advanced monitor-ing devices, and development of cyber-infrastructure [45], high quality spatial data on phenology can be collected.
In situations when historical data are too few to construct robust statistical models and make inferences, the sheer benefit of the SFT approach is in boosting the sample size and gaining statistical power. This was partly demonstrated by our analyses although we did not have independent datasets to conduct true validation, which is one of the major shortcomings of the current study. We also believe the success of the SFT substitution depends much on both temporal and spatial data structures and illustrate this in the hypothetical examples (Fig. 6). The two upper graphs show favorable situations when adding spatial data sufficiently improves phenology models. This is especially true for the upper right case where both the total number of data points is substantially increased and the range of spatial data far exceeded the historical range. Two lower graphs are examples of unfavorable situations when adding spatial data is unlikely to help in producing useful models, especially in the lower right case. In addition, if long-term observations are terminated because of land use changes or the loss of trees at a site, the SFT may serve as an actual physical substitute for the discontinued time-series.
Since a linear relationship between temperature and phenology is often assumed, it can be theoretically predicted with a limited number of points. However, if the true relationship is non-linear, a sufficient number of observations are required to build a reasonable model. By the same token, historical trends of phenology do not need to be linear. For example, one may be interested in analyzing historical change by fitting a locally weighted scatterplot smoothing (LOESS) regression, which requires a large, densely sampled dataset. The SFT substitution has great potential for handling such non-linearity, provided that the combined dataset meets such data density requirements.
Finally, the statistical framework in our study was quite simple, but the SFT substitution for estimation of trends in plant phenology can certainly benefit from more elaborate approaches, which include linear models combined with Gaussian Mixture   Models [46], Bayesian statistics [13], penalized regression [47], or Generalised Additive Models for Location, Scale and Shape (GAMLSS) [48].

Theoretical Assumptions and Practical Limitations
SFT substitution implementation in phenology research is faced with a number of assumptions and uncertainties. The most underlying assumption is that plants respond to spatial patterns of temperature in the same way they respond to year-to-year temperature variation. Besides, climate-based phenology models rely on the simplifying assumption of both temporal and spatial stationarity, i.e. phenology responses to environmental cues are stable through time and across space [49]. The validity of these assumptions remains to be tested. Another important limitation of our study is likely correlated error structure in repeated phenological observations from multiple locations [50], which may increase the likelihood of type I error in hypothesis testing and create problems in statistical inference. Drawing upon the discussion of models with missing data in repeated (longitudinal)   observations Kelly [50] suggested several statistical methods to deal with the issue, which should be considered in future research. In our study we did not consider other environmental cuesphotoperiod, chilling requirement, or precipitation. The effect of photoperiod is only relevant when studies are conducted at broad scales, or if they involve late successional plant species known to be under photoperiodic control [51]. Besides, photoperiod does not vary from one year to another, and it is unlikely to influence interannual variation of phenology in a single place [52]. Precipitation was not considered in our study, partly because of our inability to measure its spatial variation across the urban area and partly because of widespread irrigation. For simplicity and for the ease of comparison between spatial and temporal data we assumed all these cues are not as critical as forcing temperature, a primary trigger of spring phenology in temperate regions [6,[53][54][55]. The relative importance of environmental factors in triggering phenophases, exact molecular and physiological mechanisms of phenology, and the relative role of genetic differences vs plastic responses to environmental heterogeneity are still open-ended questions for these and most other plant species [56]. Weaker correlations of phenophase timing with thermal regime found for some trees in our study are likely because of the omission of some these factors.
Although we selected mature trees, we did not control for internal factors of phenology -tree age and genotype. Species identity in our study was checked without examining their genetic structures. This is another important limitation, which we plan to address directly in future research. Previous analysis using common garden experiments [57] suggested temporal deciduous trees exhibit high phenotypic plasticity, i.e. effects of locational differences may override the effect of genetic differences of species.
Major uncertainties in phenology modeling are related to the fact that phenological data are derived from visual observations, not from instrumental measurements [46]. Observations are prone to bias and misinterpretation of protocols, which increase with the increase of the number of observers. Finally, one particular difficulty of studying urban phenology is rapid land use changes and unexpected management decisions. These processes change plant distribution patterns and microclimate of each site, which may further affect the reliability of long-term studies. An additional challenge is related to safeguarding meteorological and photographic instrumentation in places of high population density.

Conclusions
In our study, we proposed and tested the usefulness of the indirect SFT substitution whereby quality spatial phenology and temperature data obtained in the same urban landscape are used as a leverage to fill gaps in scarce historical records and build complementary phenology models. Because of the legacy of phenological monitoring in or near populated places urban regions are well suited for applying the SFT substitution and should receive more attention from researchers developing approaches to examining phenology and climatic trends. Not only they create temperature gradients (UHI), but also modify other factors that may affect phenology, including CO 2 enrichment, environmental pollution, reduced pollination by insects, and, potentially, light pollution.
Potential benefits in applying the SFT substitution in phenology studies are summarized as follows. The approach allows increasing the overall sample size and developing more accurate statistical models of phenology when historic data are scarce. Important data gaps in phenological records can be filled in, provided that a strong relationship of phenophase timing with environmental triggers (temperature) is established. Slopes of historical linear trends may be estimated more accurately. We conclude there is considerable potential for exploring SFT substitution analyses in phenology studies, especially those conducted in urban settings.