Application of Genetic Algorithm to Predict Optimal Sowing Region and Timing for Kentucky Bluegrass in China

Temperature is a predominant environmental factor affecting grass germination and distribution. Various thermal-germination models for predicting grass seed germination have been reported, in which the relationship between temperature and germination was defined with kernel functions such as quadratic or quintic functions. However, their prediction accuracies warrant further improvement. The purpose of this study is to evaluate the relative prediction accuracies of genetic algorithm (GA) models, which are automatically parameterized with observed germination data. The seeds of five Poa pratensis (Kentucky bluegrass, KB) cultivars were germinated under 36 day/night temperature regimes ranging from 5/5 to 40/40°C with 5°C increments. Results showed that optimal germination percentages of all five tested KB cultivars were observed under a fluctuating temperature regime of 20/25°C. Meanwhile, the constant temperature regimes (e.g., 5/5, 10/10, 15/15°C) suppressed the germination of all five cultivars. Furthermore, the back propagation artificial neural network (BP-ANN) algorithm was integrated to optimize temperature-germination response models from these observed germination data. The integrated GA-BP-ANN (back propagation aided genetic algorithm artificial neural network) models significantly reduced the Root Mean Square Error (RMSE) values from 0.21~0.23 to 0.02~0.09. To provide a more reliable prediction of optimum sowing time for the tested KB cultivars in various regions of the country, the optimized GA-BP-ANN models were applied to map spatial and temporal germination percentages of the bluegrass cultivars in China. Our results demonstrate that the GA-BP-ANN model is a convenient and reliable option for constructing thermal-germination response models since it automates model parameterization and has excellent prediction accuracy.


Introduction
Seed germination rate is often used to evaluate the suitability of an environment for the cultivation of different plant species [1][2][3]. A mathematical model that accurately predicts the effect of temperature on germination percentage helps to reduce failures of grass establishment caused by inappropriate sowing dates or by mismatching grass species with climate zones; it is therefore particularly useful in selecting appropriate grass species and sowing times. Several mathematical models have been developed to simulate the germination response to temperature based on experimental data [4][5][6][7][8][9]. These previous models were mainly used to predict: (I) the time required, under a constant temperature condition (cumulative temperature), for the expected germination of a specific variety [4][5][6], and (II) the germination percentage under a fluctuating temperature regime [10][11][12]. It is well documented that natural fluctuations in temperature between day and night can be required for initiating and/or facilitating seed germination [8]; such diurnal temperature fluctuations are therefore frequently adopted to generate data for building prediction models [13]. To date, the core functions of the published temperature-germination response models [11,12] rely either on the estimation of populations' thermal response parameters [14][15][16][17][18] or on the optimization of polynomial equations using iterative curve fitting [6,10]. The functions constructed by different scientists usually differ from each other because their parameters are selected to fit the germination data of a particular batch of seeds. In addition, different researchers' preferences in selecting core functions and parameters for fitting the germination models might generate different outputs. Since seed germination is influenced by various factors, multi-objective evolutionary algorithms should be developed for germination response models.
In recent years, the genetic algorithm (GA) has been widely used as a non-dominated sorting based multi-objective evolutionary approach that does not require specifying a set of sharing parameters [19][20][21]. To simplify and standardize the selection of the core function for a temperature-germination model, we propose a GA-based data mining approach that automatically generates a core function describing the correlation between seed germination and temperature. In addition, the back propagation (BP) algorithm is employed to optimize the GA-based temperature-germination model [10].
Poa pratensis L. (Kentucky bluegrass, KB) has long been used in lawns because of its excellent agronomic characteristics [22][23][24][25]. Compared to other winter-season turfgrass species, it has fairly low irrigation requirements due to its good tolerance to drought stress [26][27][28]. It is also adapted to a moderate range of salinity and alkali stresses [29][30][31][32]. Although KB can survive a wide range of temperature conditions, its germination is very sensitive to extreme temperatures [33][34][35][36]. Hence, KB is an ideal species for optimizing temperature-germination response models. In addition, the optimized models would directly help decision-makers select optimal sowing regions and times for this broadly cultivated grass. To further facilitate KB cultivation in China, GIS temperature data covering the whole nation were used to generate accurate and quantitative suitability maps for the cultivation of KB cultivars, which have already been imported and are quickly gaining acceptance in many areas of the country [37,38]. Briefly, the means of minimum and maximum temperature on the national geo-grid of China over a 25-year period, obtained from NASA, were fed into the optimized GA-BP-ANN temperature-germination prediction functions to calculate suitability values for the tested cultivars.
The objectives of this paper are to: (i) provide an automatic approach (GA-BP-ANN) for revealing seed temperature-germination relationships, and (ii) use the GA-BP-ANN based temperature-germination models to predict the suitability of five KB cultivars ('Midnight II', 'Diva', 'Rugby II', 'Leopard' and 'Sapphire') throughout the national temperature grids of China. In short, we attempt to construct a new approach for grass suitability evaluation and to provide decision-makers with useful information for selecting reasonable sowing regions and times for KB.

Germination response to diurnal fluctuations of temperature
The germination responses of the five tested KB cultivars were similar to each other (Tables 1-5), and the mean germination percentages under all the tested temperature regimes did not differ significantly among the five cultivars (P > 0.05). All five could germinate at cool-period temperatures ranging from 10 to 30°C combined with warm-period temperatures ranging from 15 to 35°C. Optimum germination is usually defined as a germination percentage not lower than the maximum germination minus one-half of its confidence interval (P = 0.05). By this definition, optimum germination was reached when the seeds were grown under a regime with a cool-period temperature of 10~25°C combined with a warm-period temperature of 25~35°C. The germination percentages of all five cultivars were lower than 50% under constant temperatures within the thermal range from 10 to 30°C, while no germination was registered at constant 35°C. The maximum germination was observed when the warm-period temperature was 5~10°C above the cool-period temperature of the fluctuating thermal regime (Tables 1-5). In other words, fluctuating temperature regimes in the range from 15 to 35°C promote KB germination.
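The optimum-germination criterion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the replicate percentages are hypothetical, "one-half of its confidence interval" is read as the half-width of a t-based 95% interval around the best regime's mean, and the function name `optimum_regimes` is not from the paper.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% t critical value for 3 replicates (df = 2).
T_CRIT_DF2 = 4.303


def optimum_regimes(germination, t_crit=T_CRIT_DF2):
    """Return regimes whose mean germination is within the optimum band.

    germination maps (cool, warm) temperature pairs to replicate percentages.
    A regime is optimal if its mean is not lower than the maximum mean minus
    the half-width of that maximum's confidence interval.
    """
    means = {regime: mean(reps) for regime, reps in germination.items()}
    best = max(means, key=means.get)
    best_reps = germination[best]
    half_ci = t_crit * stdev(best_reps) / sqrt(len(best_reps))
    threshold = means[best] - half_ci
    return sorted(regime for regime, m in means.items() if m >= threshold)


# Hypothetical replicate percentages, not the values in Tables 1-5.
example = {
    (20, 25): [90, 92, 94],
    (15, 30): [88, 90, 89],
    (25, 25): [40, 42, 44],
}
print(optimum_regimes(example))
```

With these made-up numbers, the constant 25/25°C regime falls far below the threshold, while both fluctuating regimes are retained.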

Performance of different temperature-germination response models
The performances of GA-BP-ANN temperature-germination response models generated in this study were compared with the previously published regression approaches including general quadratic and BP-ANN based quintic equations [10,12].
The RMSE values, which have been proposed as statistical indicators for the evaluation and comparison of multi-dimensional models [39,40], also showed similar patterns among the different KB cultivars (Table 7). For every KB cultivar, the RMSE values ranked as follows: GA-BP-ANN < BP-quintic < General quintic < BP-quadratic < General quadratic models. This suggests that the GA-BP-ANN models provide the best fit for simulating the temperature-germination response of the five tested KB cultivars. In addition, the back propagation (BP) algorithm is an effective optimization tool for the tested non-linear regression models.

Table 3. Cumulative seed germination of 'Rugby II' at different days in 36 temperature regimes (50 seeds in total). Table axes: cool period temperature (°C), 16 h; warm period temperature (°C), 8 h.

The cultivation suitability of a grass cultivar is defined by its acceptable germination percentage in the planned cultivation area for a particular period of time. Daily means of minimum and maximum earth surface temperature over a 25-year period in each cell of the Chinese map grid were fed into the new GA-BP-ANN temperature-germination functions to predict a germination percentage for the tested cultivars in different months (Figures A~D in S1 File). The predicted germination percentages were subsequently converted to suitability values for the tested cultivars within each grid cell of the map via FreeMicaps (Figs 1-5). Among all the tested KB cultivars, the suitability of 'Rugby II' was the narrowest on both spatial and temporal scales (Fig 3). In contrast, 'Leopard' had the widest suitability on both spatial and temporal scales (Fig 4). Considering germination capability, none of the five tested cultivars should be sown before March (Figs 1-5). Sowing should also not be later than October, since the seedlings would then face cold stress in the following months.
Our results also showed that the fluctuation in temperature between day and night is an essential factor in facilitating seed germination of KB. The best supporting evidence is that KB is documented to have very low germination in Hainan (the southernmost region on the map) and Taiwan (the southeastern island on the map), where the day/night temperatures remain around 20°C without substantial fluctuation from December to March [41]; all five KB cultivars were also predicted to have very low germination in these two provinces during that period (Figs 1-5, Figures A~D in S1 File).

Table 5. Cumulative seed germination of 'Sapphire' at different days in 36 temperature regimes (50 seeds in total).

Discussions
Mathematical models that correlate environmental factors with germination responses of grasses have contributed significantly to the selection of suitable grasses for regions with different environmental conditions [42][43][44]. These prediction models provide efficient approaches to evaluate desirable characteristics and even to identify new traits that help a candidate grass species prevail in new environmental conditions [45]. They can also help elucidate the relationship between genotype and germination-related phenotypes, supporting rapid expansion in the cultivation of various grass cultivars [45]. Furthermore, these environmental factor-germination response models can provide data to forecast how changes in agricultural systems could influence grass germination [10,46]. Grass scientists have already developed various models for simulating seed germination responses to temperature conditions [10][42][43][44]. This research constructs simulation models based on GA-BP-ANN, which automatically generates regressions directly from the input experimental data. When combined with a visual suitability map, these new regression models could give decision-makers a reliable approach to selecting grass species and planning seeding times [11,12]. The back propagation (BP) network is the most widely used network for simulating nonlinear relationships [47]. BP-based simulation belongs to supervised learning; its training process has two phases: forward propagation and backward propagation [48]. In forward propagation, the weight and threshold values of each layer are calculated by iteration and passed through the three-layer BP network. In backward propagation, the output error is used to revise these weight and threshold values [49].
In this study, the BP algorithm was used to smooth the surface of the nonlinear temperature-germination responses, and it proved effective in optimizing the core functions of both the quintic equation and the GA.
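The two training phases of a three-layer BP network can be illustrated with a minimal sketch. It is not the network used in this study: the layer sizes, learning rate, sigmoid activation, and the five data points (temperatures divided by 40, germination as a fraction) are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_in, n_hidden, rng):
    """Random weights and zero thresholds for an input-hidden-output network."""
    return {
        "w1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Forward propagation: inputs -> hidden activations -> output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"])
    return hidden, out


def backward(net, x, target, lr):
    """Backward propagation: push the output error back to revise weights."""
    hidden, out = forward(net, x)
    delta_out = (out - target) * out * (1.0 - out)
    for k, h in enumerate(hidden):
        # Hidden-layer error uses the output weight before it is updated.
        delta_h = delta_out * net["w2"][k] * h * (1.0 - h)
        net["w2"][k] -= lr * delta_out * h
        net["b1"][k] -= lr * delta_h
        for i, xi in enumerate(x):
            net["w1"][k][i] -= lr * delta_h * xi
    net["b2"] -= lr * delta_out


def rmse(net, data):
    return math.sqrt(sum((forward(net, x)[1] - y) ** 2 for x, y in data) / len(data))


def train(net, data, epochs, lr):
    for _ in range(epochs):
        for x, y in data:
            backward(net, x, y, lr)


# Hypothetical (cool/40, warm/40) -> germination fraction pairs.
DATA = [((0.5, 0.625), 0.9), ((0.375, 0.75), 0.85), ((0.625, 0.625), 0.4),
        ((0.25, 0.25), 0.2), ((0.75, 0.75), 0.3)]

net = init_network(2, 4, random.Random(0))
before = rmse(net, DATA)
train(net, DATA, epochs=3000, lr=0.5)
after = rmse(net, DATA)
print(f"RMSE before training: {before:.3f}, after: {after:.3f}")
```

The sketch updates weights one observation at a time; batch updates or a different activation function would follow the same forward/backward structure.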
The quadratic response surface used to be a dominant method for analyzing grass seed germination performance under a series of temperature regimes, especially for testing the impact of diurnal temperature treatments on seed germination [11,12]. However, the two-dimensional response surface could not show the global fitting error between the quadratic function and the experimental data [10,12], and its high prediction errors (RMSE ranging from 0.21 to 0.23) cannot be ignored. Consistent with our previous study [10], the BP-optimized quadratic/quintic equations showed significantly lower fitting errors and higher confidence than their corresponding general quadratic/quintic ones (Table 7). That might be because the temperature-germination correlation is nonlinear [10]. Notably, the GA-BP-ANN models for all five tested cultivars showed the lowest prediction errors (RMSE ranging from 0.02 to 0.09). Moreover, the GA-BP-ANN models provide us with a more reasonable prediction of region/season suitability, especially in warm regions with little temperature fluctuation between day and night (e.g., Hainan and Taiwan). The observed germination percentages in warm regimes (Tables 1-5) are very low when the temperature decrease from T1 to T2 is less than 5°C. We introduced a temperature-germination response model, the GA-BP-ANN, in this study for predicting the optimal sowing region and timing of five KB cultivars. It performs better than several published models, including general quadratic regression and BP-ANN based quadratic/quintic equations, in two main aspects. Firstly, the construction of the temperature-germination response is simplified, since fitting the proposed GA-BP-ANN models requires no selection of core functions (such as the quintic equation in the previous study [10]) or parameters. Secondly, the GA-BP-ANN models showed a better fit (lower RMSE values) than the previously developed BP-ANN models.
However, the present version of the GA-BP-ANN model for germination response still has potential for improvement in several aspects. For example, field experiment data on seed temperature-germination response should be collected to further validate these GA-BP-ANN models of KB cultivars. As suggested by Hardegree and Van Vactor, both field-variable and chamber-variable temperature-germination response data should be included in the regression equations [50][51][52][53]. Moreover, the advantages of GA-BP-ANN should be exploited to build prediction models of plant responses to other environmental factors that also influence seed germination. Bradford [54] quantified seed germination behavior under a wide array of environmental conditions, such as temperature and water potential, so as to build general germination-response models of grass seed. In addition, plant responses to environmental factors at different growth stages, especially the seedling stage, should be tested under laboratory conditions in future research, and the results could be applied to building GA-BP-ANN models that predict the growth performance of crops in the field. GA-BP-ANN model predictions of crop responses to different environmental factors at various developmental stages would provide more reliable information for selecting optimal planting regions and sowing times for various grasses and crops.

Conclusions
In this study, we tested the influence of diurnal temperature fluctuations on seed germination of five KB cultivars ('Midnight II', 'Diva', 'Rugby II', 'Leopard', 'Sapphire'). Optimum germination for the five tested cultivars was observed at four fluctuating temperature regimes: 20/25, 15/30, 20/30, and 25/30°C. Germination percentages of all five cultivars were lower than 50% at constant temperature regimes ranging from 20 to 30°C.
Both the automatic GA-BP-ANN and other regressions were used to simulate the grass temperature-germination response function in the current study. Since the GA-BP-ANN method is independent of empirical assumptions, artificial bias is not a concern for its predictions. When used by different researchers, the GA-BP-ANN method should produce unbiased results, even though the researchers might have different preferences in their selection of the core equation. Therefore, this GA-BP-ANN approach provides a user-friendly way to tackle the temperature-germination problem and achieve high prediction accuracy.
Based on the experimentally derived GA-BP-ANN models and available climate data, a national seed-suitability map of China was generated for the five KB cultivars. The suitability of 'Rugby II' was found to cover the narrowest spatial and temporal ranges, while 'Leopard' covered the widest ranges (Figs 3 and 4). In addition, the tested KB cultivars should be sown between March and October.

Seeds and conditioning
The KB cultivars widely used in China ('Midnight II', 'Diva', 'Rugby II', 'Leopard', and 'Sapphire') were tested in this study. The grass seeds were purchased from Shanghai Chunyin Turf Inc. (Shanghai, China) and stored at room temperature until use. Seed viability was evaluated on moistened filter paper at 25/25°C on receipt of the seeds and again after the germination experiments [12]. The loss of viability during storage was found to be less than 1% [10,12].
Seeds were surface sterilized by soaking in 0.01% HgCl2 for one minute and rinsed four times with sterilized distilled water. The seeds were then placed on moistened filter paper in petri dishes and grown in incubators set to 36 different regimes of diurnal temperature fluctuation: 16 hours of day time at temperature T1 and 8 hours of night time at temperature T2. Both T1 and T2 ranged from 5 to 40°C with 5°C increments [12]. Germination counts were conducted daily until no further germination occurred, after about 15~20 days (S2 File). In each experiment, three replicates of 50 seeds were tested.
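As a small illustration of how daily counts translate into the cumulative germination and germination percentage reported for each regime, consider the sketch below. The daily counts are hypothetical and the function names are illustrative only; the 50 seeds per replicate and the three replicates follow the protocol above.

```python
def cumulative_germination(daily_counts):
    """Running total of newly germinated seeds, one value per counting day."""
    total = 0
    cumulative = []
    for count in daily_counts:
        total += count
        cumulative.append(total)
    return cumulative


def germination_percentage(replicate_daily_counts, seeds_per_replicate=50):
    """Final germination percentage pooled over all replicates of one regime."""
    germinated = sum(sum(counts) for counts in replicate_daily_counts)
    sown = seeds_per_replicate * len(replicate_daily_counts)
    return 100.0 * germinated / sown


# Hypothetical daily counts for three replicates under one regime.
replicates = [[0, 3, 10, 5], [0, 2, 12, 6], [1, 4, 9, 6]]
print(cumulative_germination(replicates[0]))
print(round(germination_percentage(replicates), 2))
```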

Genetic algorithm
The genetic algorithm (GA) is an iterative stochastic optimization approach inspired by nature's evolutionary genetics: the fittest individual has the highest chance of survival [55,56]. The GA method has been widely used to solve nonlinear optimization problems, including those found in computational biology [57][58][59]. In this study, the GA method is used to generate the fittest mapping function between the independent variable (the temperature matrix T) and the dependent variable (the germination percentage matrix G). As mentioned above, the GA approach simulates the survival of the fittest individuals in the population, controlled by the definition of a fitness score [39]. An initial individual is a map function generated randomly to describe the relationship between the selected temperature matrix T' and its corresponding germination percentage matrix G', where $T' = (t_{1j}, t_{2j})$, $G' = (g_j)$, and $j < i$. Different functions represent different solutions to the temperature-germination problem. In general, the genetic algorithm comprises the following modules in sequential order: initial population selection, fitness function evaluation, selection of individuals, population reproduction, and crossover and mutation of individuals [55]. In addition, the back propagation artificial neural network was integrated for solution optimization [10]. To construct the GA-BP-ANN model, the 108 observed temperature-germination data pairs for each cultivar (36 regimes with three replicates each) were divided into a training set (90 observations, 83%) and a test set (18 observations, 17%). The training set was chosen to cover all temperature regimes from 5/5 to 40/40°C. Model validation was performed on test regimes such as 5/5, 40/40, and 25/25°C, which represent severe and moderate temperature conditions for seed germination [39]. The root mean square error (RMSE) was used to evaluate the fitness and predictive capability [60].
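The sequence of GA modules listed above (initial population, fitness evaluation, selection, reproduction, crossover, mutation) can be sketched as follows. This is a simplified stand-in, not the implementation used in this study: each individual here is a vector of four coefficients of a fixed bilinear function rather than a freely generated map function, the fitness score is the RMSE on hypothetical training pairs, and the tournament selection, elitism, and all numeric settings are assumptions.

```python
import math
import random


def model(genes, x):
    """Hypothetical candidate function: bilinear in the two scaled temperatures."""
    t1, t2 = x
    return genes[0] + genes[1] * t1 + genes[2] * t2 + genes[3] * t1 * t2


def fitness(genes, data):
    """RMSE between predicted and observed germination (lower is fitter)."""
    return math.sqrt(sum((model(genes, x) - y) ** 2 for x, y in data) / len(data))


def tournament(pop, scores, rng, k=3):
    """Selection: pick the fittest of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda idx: scores[idx])]


def crossover(a, b, rng):
    """Single-point crossover of two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genes, rng, rate=0.2, scale=0.1):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0, scale) if rng.random() < rate else g for g in genes]


def evolve(data, pop_size=40, generations=60, n_genes=4, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind, data) for ind in pop]
        best_idx = min(range(pop_size), key=lambda idx: scores[idx])
        history.append(scores[best_idx])
        children = [pop[best_idx][:]]  # elitism: keep the fittest unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    scores = [fitness(ind, data) for ind in pop]
    best_idx = min(range(pop_size), key=lambda idx: scores[idx])
    history.append(scores[best_idx])
    return pop[best_idx], history


# Hypothetical (T1/40, T2/40) -> germination fraction training pairs.
TRAIN = [((0.625, 0.5), 0.9), ((0.75, 0.375), 0.85), ((0.625, 0.625), 0.4),
         ((0.25, 0.25), 0.2), ((0.75, 0.75), 0.3), ((1.0, 1.0), 0.0)]

best, history = evolve(TRAIN)
print(f"best RMSE: first generation {history[0]:.3f}, last {history[-1]:.3f}")
```

Because the fittest individual is carried over unchanged, the best fitness score never worsens from one generation to the next, which makes the convergence easy to inspect through `history`.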
As a reference for the performance of the GA-BP-ANN models, previously published regression methods, namely the general quadratic/quintic and BP-optimized quadratic/quintic models (Figures E~I in S1 File, S3 File), were also used to simulate temperature-germination responses [61]. The generalized quadratic equation was [12]: $Y_1 = A_0 + A_1 T_1 + A_2 T_2 + A_3 T_1^2 + A_4 T_2^2 + A_5 T_1 T_2$, where $Y_1$ is the predicted germination percentage, $A_0$ is the intercept, $A_1$ through $A_5$ are coefficients, and $T_1$ and $T_2$ are the two temperatures of the diurnal fluctuation regime. The general quintic equation was previously described [10]. Briefly, it could be presented as $Y_1 = \ldots + A_0'$, where $Y_1$ is the predicted germination percentage, $A_0'$ is the intercept, and $f(A)$ is the coefficient function.
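For illustration, the six coefficients of the generalized quadratic equation can be estimated by ordinary least squares and scored by RMSE. The sketch below solves the normal equations with Gaussian elimination; it assumes temperatures scaled to the 0-1 range and uses synthetic observations generated from made-up coefficients, so it shows the fitting procedure rather than reproducing any result of this study.

```python
import math


def features(t1, t2):
    """Terms of the quadratic surface: 1, T1, T2, T1^2, T2^2, T1*T2."""
    return [1.0, t1, t2, t1 * t1, t2 * t2, t1 * t2]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(samples):
    """Least-squares estimate of A0..A5 from (T1, T2, Y) samples."""
    rows = [features(t1, t2) for t1, t2, _ in samples]
    ys = [y for _, _, y in samples]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, t1, t2):
    return sum(c * f for c, f in zip(coef, features(t1, t2)))


def rmse(coef, samples):
    return math.sqrt(sum((predict(coef, t1, t2) - y) ** 2
                         for t1, t2, y in samples) / len(samples))


# Synthetic check: made-up coefficients on a 5 x 5 grid of scaled temperatures.
TRUE_COEF = [0.1, 0.5, -0.2, -0.3, 0.05, 0.4]
grid = [i / 4 for i in range(5)]
samples = [(t1, t2, predict(TRUE_COEF, t1, t2)) for t1 in grid for t2 in grid]
coef = fit_quadratic(samples)
print([round(c, 4) for c in coef], rmse(coef, samples))
```

On noise-free synthetic data the fit recovers the generating coefficients; with real germination data the RMSE measures how far the quadratic surface is from the observations.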

Spatial mapping
Grass suitability was represented by the germination percentage. The grass suitability maps were created using the FreeMicaps software (http://bbs.121323.com/guojf/FreeMicaps20111001.rar). Similar to the Surfer software [62], FreeMicaps uses point (station) data on a grid that is compatible with GIS. The temperature values of regions adjacent to each station were generated using the Cressman interpolation method [63,64]. Means of minimum and maximum daily earth surface temperature on the grid of 313 weather stations (Figures A~D in S1 File) over a 25-year period (from 1983 to 2007) were calculated from data sets obtained from the website of the US National Aeronautics and Space Administration (NASA) (http://power.larc.nasa.gov/cgi-bin/cgiwrap/solar/sse.cgi?grid@larc.nasa.gov#s11). The mean minimum and mean maximum daily temperatures over the grid were used as the T2 and T1 variables, respectively, in the GA-BP-ANN functions for calculating germination percentages (grass suitability).
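The idea behind Cressman weighting can be sketched as a single-pass, distance-weighted average with the weight $w = (R^2 - d^2)/(R^2 + d^2)$ for stations within the influence radius $R$ and zero beyond it. This is a simplification under stated assumptions: the full Cressman scheme applies successive corrections to a first-guess field with shrinking radii, distances here are planar rather than great-circle, the station coordinates and temperatures are hypothetical, and nothing is implied about how FreeMicaps implements the method.

```python
import math


def cressman_weight(distance, radius):
    """Cressman weight: 1 at the station, falling to 0 at the influence radius."""
    if distance >= radius:
        return 0.0
    r2, d2 = radius * radius, distance * distance
    return (r2 - d2) / (r2 + d2)


def cressman_interpolate(point, stations, radius):
    """Weighted mean of station values around `point`.

    stations is a list of ((x, y), value) pairs. Returns None when no
    station lies within the influence radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in stations:
        distance = math.hypot(point[0] - sx, point[1] - sy)
        weight = cressman_weight(distance, radius)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


# Hypothetical stations: (x, y) grid position and mean minimum temperature (°C).
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
print(cressman_interpolate((1.0, 0.0), stations, radius=3.0))
print(cressman_interpolate((0.5, 0.0), stations, radius=3.0))
```

A grid cell midway between two stations receives the average of their values, while cells closer to one station are pulled toward that station's value.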