High-Resolution Gene Flow Model for Assessing Environmental Impacts of Transgene Escape Based on Biological Parameters and Wind Speed

Environmental impacts caused by transgene flow from genetically engineered (GE) crops to their wild relatives mediated by pollination are longstanding biosafety concerns worldwide. Mathematical modeling provides a useful tool for estimating frequencies of pollen-mediated gene flow (PMGF) that are critical for assessing such environmental impacts. However, most PMGF models are impractical for this purpose because their parameterization requires actual data from field experiments. In addition, most of these models are usually too general and ignored the important biological characteristics of concerned plant species; and therefore cannot provide accurate prediction for PMGF frequencies. It is necessary to develop more accurate PMGF models based on biological and climatic parameters that can be easily measured in situ. Here, we present a quasi-mechanistic PMGF model that only requires the input of biological and wind speed parameters without actual data from field experiments. Validation of the quasi-mechanistic model based on five sets of published data from field experiments showed significant correlations between the model-simulated and field experimental-generated PMGF frequencies. These results suggest accurate prediction for PMGF frequencies using this model, provided that the necessary biological parameters and wind speed data are available. This model can largely facilitate the assessment and management of environmental impacts caused by transgene flow, such as determining transgene flow frequencies at a particular spatial distance, and establishing spatial isolation between a GE crop and its coexisting non-GE counterparts and wild relatives.


Introduction
The potential environmental impact caused by transgene flow from a genetically engineered (GE) crop to its non-GE counterparts (crop-to-crop) and to wild relatives (crop-to-wild) through pollen-mediated gene flow (PMGF) has aroused great biosafety concerns worldwide, can be directly measured in situ. However, this type of modeling still requires some critical parameters, such as the gene flow coefficient (GFC) [25] and synthetic parameter [27] that must be generated from the field experiments. The requirement of these parameters makes the PMGF modeling impractical to use and dependent of field experiments. Therefore, it is necessary to develop a more practical model that can predict gene flow frequencies more accurately but without data from actual PMGF field experiments.
To achieve the accurate description of PMGF, biological and climatic factors that determine the PMGF frequency should be considered in the modeling. Also, it should be more practical to predict gene flow frequencies only involving measurable parameters that are independent of any PMGF field experiment in the modeling. Rong et al. 2010 [26] developed a quasi-mechanistic model to estimate PMGF frequencies in rice, which included both biological (e.g., outcrossing rate of a pollen recipient, and cross compatibility between a pollen donor and recipient) and climatic (e.g., wind speed and humidity) factors. The model by Rong et al. 2010 [26] attempted to estimate PMGF frequencies more reliably than those only including climatic parameters. However, the model included a decay parameter of an exponential function-a β value that describes the pollen dispersal pattern, for which one needs to conduct field experiments to generate relevant data. In addition, the β values generated only from a few field experiments may not represent the decay parameter for all environmental conditions. Therefore, this quasi-mechanistic model [26] still has its constraint to estimate PMGF frequencies.
To overcome this constraint, we adopted the inverse Gaussian function [36] to replace the exponential dispersal function of the quasi-mechanistic model [26]. Through this replacement, we can avoid the inclusion of the experiment-dependent β values for PMGF modeling, and all the necessary parameters can be measured directly from the target environment. Here, we report the establishment of a modified quasi-mechanistic PMGF model using the inverse Gaussian function. The accuracy of the developed model was validated by comparing its simulation results with five sets of published PMGF data from field experiments. The modified model can provide a useful tool to assess the environmental impact of transgene flow and design suitable biosafety management strategies by accurately estimating PMGF frequencies without field experiments.

Procedure of model establishment
To establish a PMGF model, we used the inverse Gaussian function (Eq 1) by Katul et al. 2005 [36], which describes the wind dispersal pattern of small particles.
This function was used to predict the dispersal of released seeds, including the tiny ones such as Senecio jacobaea (~2 mm) and Solidago rigida (~3 mm) [36] from the plants onto the ground. In our case, this function is used to predict PMGF, in which the updraft pollen grains disperse from donors' flowers to recipients' flowers for monoclinous plants, and from male flowers (donors) to female flowers (recipients) for diclinous plants (Fig 1). Consequently, we made a slight modification to calculate the two key parameters included in the inverse Gaussian function of Katul et al. 2005 [36]: the scale parameter λ and the location parameter μ. As indicated by Katul et al. 2005 [36], the two parameters were estimated by the release height, wind speed, and deposition velocity. Here, we estimate the relative pollen release height, which is defined as the height between the position of updraft pollen grains of donor's flowers and the position of recipient's flowers for monoclinous plants (Fig 1, left panel), and the height between the position of updraft pollen grains of donor's male flowers and the position of recipient's female flowers for diclinous plants (Fig 1, right panel). Therefore, the equations (5b) and (9) of Katul et al. 2005 [36] to calculate λ and μ were modified as follows: where h is the relative pollen release height (see Fig 1 for details); Ū is the average wind speed; κ is the von Karman's constant (determined as 0.4) [36,37]; σ w is the standard deviation of vertical wind speed; and V t is the deposition velocity of pollen grains. The wind speed (Ū) can be directly measured, and the standard deviation of vertical wind speed (σ w ) is approximately equal to 1.1 × uÃ [36]. Let Ū equals to the wind speed at the plant height, then uÃ is calculated using the following equation [25,37]: where H is the plant height; d is the zero-plane displacement, which was commonly determined as 0.63 × H [25]; and z 0 is the roughness length, which was determined as 0.13 × H [37]. The deposition velocity of pollen grains (V t ) can either be obtained from previous studies or estimated by Stoke's Law as follows [38,39]: where d p is the diameter of a pollen grain (m); g is the gravitational acceleration (9.8 m s −2 ); ρ p is the density of a pollen grain (kg m −3 ); ρ a is the air density, which is equal to 1.178 kg m −3 under usual conditions; and η is the dynamic viscosity of air (~1.8×10 −5 kg m −1 s −1 ) [40]. Through the above modification, we can describe the pollen dispersal pattern at different distances using the inverse Gaussian function which is based on the measurable biological and wind speed parameters. Therefore, we can establish a new model to predict PMGF frequencies using the inverse Gaussian function to replace the negative exponential function in the equation (7) developed by Rong et al. 2010 [26].

Model validation
We used five sets of published data representing five major crop species (rice, wheat, maize, oilseed rape, and mustard) from field experiments [41][42][43][44][45] to test the accuracy of our modified PMGF model. The first set of data included PMGF frequencies generated from a transgenic herbicide-resistant rice (Oryza sativa) line to a wild rice (O. rufipogon) strain [41]. The second set of data included PMGF frequencies from a blue-grained wheat variety (Triticum aestivum) to a common wheat variety [42]. The third set of data included PMGF frequencies from a transgenic herbicide-resistant maize line (Zea mays) to its non-transgenic counterpart [43]. The fourth set of data included PMGF frequencies from a transgenic herbicide-tolerant oilseed rape line (Brassica napus) to a non-transgenic line [44]. The fifth set of data included PMGF frequencies from a transgenic canola (Brassica napus, AACC genomes) line resistant to herbicide (glyphosate) to its relative mustard (B. juncea, AABB genomes) for assessing PMGF between different species with divergent genomes [45].
All the parameters used for the model validation were listed in Table 1. Among these, the relative pollen release height was estimated as 0.5 m for the monoclinous wheat and oilseed rape assuming the updraft height of pollen grains from donor's flowers is 0.5 m, using the updraft height of pollen grains of rice as a reference [46], because the hermaphrodite flowers of donors and recipients in these cases (wheat and Brassica species) are nearly at the same height Table 1. Detail information of the five sets of gene flow data from experimental fields used for the validation of the modified model.

Type of data
Relative  [42,44,45]. The relative pollen release height was estimated as 0.2 m for the crop-to-wild rice gene flow, because the recipient's flowers (wild rice) were~0.3 m taller than the donor's flowers (cultivated rice) [41]. The relative pollen release height was estimated as 1.5 m for the diclinous maize case, because the male flowers of maize were~1.0 m higher than their female flowers [43]. The parameter of deposition velocity of pollen grains (V t ) for the first and third sets of data (rice and maize cases) was obtained from the published data [47,48], and that for the second and fourth sets of data (wheat and oilseed rape case) was estimated based on the pollen diameters of rice (40-50 μm), wheat (48-57 μm), maize (80-100 μm), and oilseed rape (24-26 μm) [49][50][51]following Eq 4. In addition, for the parameter of cross compatibility (δ AB ) between a pollen donor and recipient, we set δ AB as 100% for the second, third, and fourth sets of data because their pollen donors and recipients are the same species. For the first set of data, we set δ AB as 70% following the literature by Oka who estimated the cross compatibility between cultivated rice and wild O. rufipogon [52], because the cultivated rice was used as the pollen donors and wild rice as the pollen recipients [41]. For the fifth set of data, we set t B as 15.0% according to previous studies [53][54][55] and calculated δ AB from t B and observed PMGF data, and the wind speed was set as a moderate value of 3 m s −1 because it was not provided in this experiment [45]. In simulating the first set of data using Eq 5, we replaced the parameter of the recipient pollen density (D B ) by the adjusted recipient pollen density (δ AB ×D B ) where pollen competition between GE and non-GE rice lines was considered because the recipient field included both wild and non-GE rice [41].
The analyses of covariance and correlation were applied to validate the accuracy of the model. The analysis of covariance was conducted to test the consistency of slopes (p s values) and the consistency of intercepts (p i values) between PMGF frequencies obtained from the published field experiment (represented by empty circles) and model simulation (represented by curves) at different spatial distances. Correlation analysis was also conducted to test the goodness-of-fit between the published field experiment data and model simulation results (r values). All the statistical analyses were conducted using the software SPSS ver. 22.0 (SPSS, Chicago, Illinois, USA).

Model
We replaced the exponential function in the equation (7) of the PMGF model by Rong et al. 2010 [26] with the adjusted inverse Gaussian function of Katul et al. [36]. As a result, a modified quasi-mechanistic PMGF model is established, and can be used to predict the gene flow frequency (F AB (x)) from a donor (A) to a recipient (B) as follows: where t B is the outcrossing rate of a recipient plant; x is the spatial distance from the edge of a donor field to the measured points of recipient fields; b is the length of the donor field; R is the spatial distance between donor and recipient fields (see Fig 1 in Rong et al. 2010 [26]); and φ(x) is the cumulative distribution function of the inverse Gaussian function, which can be calculated as follows [56]: where the symbol F denotes the cumulative distribution function of the standard normal distribution.
Here, the modified quasi-mechanistic PMGF model (Eq 5) is used to calculate PMGF frequencies for the dataset collected from a parallel donor-to-recipient field design, such as in the case of rice-wild rice and canola-mustard gene flow [41,45]. However, we found that the model (Eq 5) can also be applied for dataset collected from the donor-surrounded-by-recipient field design such as in the cases of wheat, maize, and oilseed rape [42][43][44]. This indicates that the modified PMGF model can produce well-fitted predicting results for datasets from the two types of field experimental designs.

Validation
In general, the simulation results (represented by the curves) based on our modified PMGF model from this study fitted significantly well with the observed PMGF frequencies (represented by the circles) from the five sets of published data [41][42][43][44][45] generated from the field experiments for rice-wild rice, wheat-wheat, maize-maize, oilseed rape-oilseed rape, and canola-mustard gene flow (Figs 2-6).
For the rice data validation, the PMGF frequencies estimated based on the model simulation fitted perfectly with the frequencies obtained from the two independent crop-to-wild gene flow field experiments at the Guangzhou and Sanya sites (Fig 2). The p s and p i values between model-simulated and experiment-observed PMGF frequencies are 0.888 and 0.660 for data from the Guangzhou site, and 0.961 and 0.930 for data from the Sanya site, respectively. These values showed no significant differences (>0.05) in the slope (p s value) and intercept (p i value) of the curves representing PMGF frequencies at different spatial distances between the modelsimulated and field-experimental results (Fig 2), indicating the similar PMGF trends of the model-simulated and experiment-observed data. In addition, the correlation coefficients (r) between the model-simulated and experiment-observed PMGF frequencies were 0.965 (p<0.001) and 0.979 (p<0.001) for data from the Guangzhou and Sanya sites, respectively. Significant positive correlation was observed between the PMGF frequencies obtained from the model-simulated and field-experimental results.
For the wheat data validation, the PMGF frequencies estimated based on the model simulation fitted perfectly with the frequencies obtained from the two independent crop-to-crop gene flow field experiments conducted in 2000 and 2001 (Fig 3). The p s and p i values between model-simulated and experiment-observed PMGF frequencies are 0.899 and 0.803 for the data from 2000 experiment, and 0.732 and 0.699 for the data from 2001 experiment, respectively. These values showed no significant differences (>0.05) in the slope (p s value) and intercept (p i value) of the curves representing PMGF frequencies at different spatial distances between the model-simulated and field-experimental results (Fig 3), suggesting the similar PMGF trends of the model-simulated and experiment-observed data. In addition, the correlation coefficients (r) between the model-simulated and experiment-observed PMGF frequencies were 0.989 (p<0.001) and 0.993 (p<0.001) for the data from 2000 and 2001 experiments, respectively. Significant positive correlation was observed between the PMGF frequencies obtained from the model-simulated and field-experimental results.
For the maize data validation, the PMGF frequencies estimated based on the model simulation fitted perfectly with the frequencies obtained from the two independent crop-to-crop gene flow field experiments conducted in 2006 and 2007 (Fig 4). The p s and p i values between model-simulated and experiment-observed PMGF frequencies are 0.741 and 0.664 for the data These values showed no significant differences (>0.05) in the slope (p s value) and intercept (p i value) of the curves that represented PMGF frequencies at different spatial distances between the model-simulated and field-experimental results (Fig 4), indicating the similar PMGF trends of the model-simulated and experiment-observed data. In addition, the correlation coefficients (r) between the model-simulated and experiment-observed PMGF frequencies were 0.987 (p<0.001) and 0.990 (p<0.001) for the data from 2006 and 2007 experiments, respectively. Significant positive correlation was observed between the PMGF frequencies from the model-simulated and field-experimental results.
For the oilseed rape data validation, the PMGF frequencies estimated based on the model simulation fitted perfectly with the frequencies obtained from the two independent crop-tocrop gene flow field experiments conducted in 2008 and 2009 (Fig 5). The p s and p i values between model-simulated and experiment-observed PMGF frequencies are 0.779 and 0.830 for the data from 2008 experiment, and 0.805 and 0.939 for the data from 2009 experiment, respectively. These values showed no significant differences (>0.05) in the slope (p s value) and intercept (p i value) of the curves that represented PMGF frequencies at different spatial distances between the model-simulated and field-experimental results (Fig 5), indicating the similar PMGF trends of the model-simulated and experiment-observed data. In addition, the correlation coefficients (r) between the model-simulated and experiment-observed PMGF frequencies were both 0.996 (p<0.01) for the data from 2008 and 2009 experiments. Significant positive correlation was observed between the PMGF frequencies from the model-simulated and field-experimental results. For the validation of canola-mustard gene flow data, the PMGF frequencies estimated based on the model simulation fitted well with the frequencies obtained from the two independent gene flow field experiments (Fig 6). The p s and p i values between model-simulated and experiment-observed PMGF frequencies are 0.116 and 0.343 for the data from the site 1 (Fig 6A), and 0.096 and 0.307 for the data from the site 2 (Fig 6B), respectively. These values showed no significant differences (>0.05) in the slope (p s value) and intercept (p i value) of the curves that represented PMGF frequencies at different spatial distances between the model-simulated and field-experimental results, indicating the similar PMGF trends of the model-simulated and experiment-observed data. In addition, the correlation coefficients (r) between the modelsimulated and experiment-observed PMGF frequencies were 0.996 (p<0.001) and 0.563 (p = 0.071) for the data from the site 1 and site 2, respectively.

Discussion
In this study, we established a new pollen-mediated gene flow (PMGF) model based on the quasi-mechanistic model of Rong et al. 2010 [26], by replacing the exponential function in the quasi-mechanistic model with the adjusted inverse Gaussian function of Katul et al. 2005 [36]. By such a modification, the new PMGF model can circumvent the input of a decay parameter (the β value) that must be generated from separate PMGF field experiments. Our modified model includes the following biological parameters: pollen diameter and relative pollen release height of a donor, outcrossing rate of a pollen recipient, cross compatibility between a pollen donor and recipient, spatial distance between donor and recipient fields, and the length (depth) of a donor field, in addition to the parameter of wind speed. All of these parameters can either be measured directly in the target locations or obtained from published studies/databases. In contrast to the model of Rong et al. 2010 [26], some climatic factors such as the temperature and relative humidity were not considered in our modified PMGF model, although they may affect PMGF frequencies [26,57,58,59]. This is because the relationship between the two factors and PMGF frequencies has not yet been accurately determined. Consequently, we neglected the temperature and relative humidity in our modified PMGF model. Once the relationship is clearly determined in the future, we can include the two parameters in the PMGF model. Thus, our modified PMGF model can be applied to accurately calculate/predict the gene flow frequencies of plant species mediated by pollen at different spatial distances under various environments, independent of PMGF field experiments. The new feature of our modified model makes the estimation of PMGF frequencies more practical and is easier to use, provided that the required biological parameters and wind speed data are available.
To validate the modified PMGF model, we compared the model-simulated PMGF pattern and the gene flow frequencies obtained from the field experiments. The results of model-simulated and experiment-generated PMGF showed a high level of goodness-of-fit, suggesting the high predicting power for gene flow frequencies using the modified PMGF model. The high probability (p s ) and (p i ) values (>0.05) indicate the consistency of the slopes and the intercepts between the PMGF frequencies obtained from the model simulation and the gene flow data from the five sets of field experiments. The high correlation coefficients (r values, >0.90) indicate the consistency of PMGF frequencies between the results obtained from the model simulation and the five sets of gene flow data, except for the site 2 of the canola-mustard PMGF experiment in the fifth set of data [45], which showed slightly lower r value (~0.56). Notably, Accurate Gene Flow Model Based on Biological Parameters and Wind Speed the slightly weaker correlation between the model-predicted result and observed data from the site 2 is probably due to the unexpectedly low PMGF frequencies (0.03%) observed at the close distance (0 m), which is unusual in PMGF experiment. These results indicate the accuracy and effectiveness of the modified model for predicting PMGF frequencies. For example, based on the field experiment, Langhof et al. 2010 suggested the spatial isolation distance of 50 m between a GE maize and its non-GE counterpart with the size of 200 × 200 m 2 because the frequency of transgene flow was reduced to <0.9% (the European standard for a gene flow frequency between co-existing GE and non-GE crops) at this isolation distance [60]. Similarly according to our model simulation, the predicted average PMGF frequency from a donor to a recipient maize field with the same situation was <0.7% (data is not shown) at the spatial distance of 50 m. The two examples showed considerable consistency between the field-experimental and model-simulated results. In addition, the gene flow frequencies of different wheat varieties generated from a field experiment were <0.1% at the spatial distance of 27 m [61], which was highly consistent with the frequencies generated from model simulation. All these indicate the usefulness of our modified model for the accurate prediction of PMGF frequencies.
Noticeably at the relatively long spatial distances, the model-predicted PMGF frequencies are slightly higher than the gene flow frequencies obtained from field experiments in some cases. This is probably due to the viability of pollen grains in the air, which can considerably influence the actual gene flow frequencies at the long spatial distance. Pollen grains of many plant species, particularly wind-pollination species, lose their viability quickly after being released to the air [62,63]. The number of viable pollen grains might be considerably reduced after a long-distance travel in the air. Therefore at the long spatial distance, the model-predicted PMGF frequency becomes greater than actually obtained gene flow frequencies from field experiments. Nevertheless, the predicted PMGF frequencies based on our modified model can provide useful information for assessing and monitoring environmental consequences caused by (trans)gene flow, particularly for the worst scenario assessment [26,64] and for establishing a safe isolation spatial distance between co-existing GE and non-GE crops [8,63].
With the rapid expansion and commercial production of GE crops over the world, transgene flow and its environmental impact has aroused increasing biosafety concerns worldwide [65]. The effective assessment and monitor of the environmental impact caused by transgene gene flow are important to guarantee the safe and sustainable application of GE crops. To estimate the frequency (exposure) of transgene flow is the critical and first step for assessing and monitoring the environmental impact [66,67]. Our modified PMGF model provides a practical tool to meet the objective of transgene flow estimation, particular for wind-pollinated plant species. This model can be applied to estimate the frequency of crop-to-crop and crop-to-wild transgene flow relatively accurately for target plant species in various locations, based only on the required biological parameters and wind speed data that can be easily obtained. The application of our modified PMGF model can make the prediction of transgene flow more effective and timely, because the modeling has excluded the expensive and time-consuming gene flow filed experiments to generate necessary parameters. In addition, the strategic design of a spatial isolation distance can be made between coexisting GE and non-GE crops based on the model predicted transgene flow frequencies to minimize the transgene "contamination" to a permitted threshold value [60,68]. Similarly, a proper spatial isolation distance can be established between a GE crop and its wild relative species to restrict transgene flow based on the model predicted result for reducing the potential environmental impact [69]. This can largely facilitate the biosafety assessment and management concerning the transgene flow and its environmental impact. Furthermore, the model-based prediction of gene flow can also be applied in the studies of evolutionary and invasive biology [70,71].
In conclusion, we established a PMGF model with a great advantage and potentially broader use for predicting pollen-mediated gene flow, provided that the desired biological parameters and wind speed data are available. This model retains the feature of accuracy of Rong et al. 2010 model [26], but has added a new feature that no longer requires the input of field experimental data to generate parameters as in previous models [26,28,35]. The validation results indicate the accuracy of the model prediction for gene flow of various plant species. Because this PMGF model is practical to use and free of field experiment, it can be broadly applied in assessing (trans)gene flow. Therefore, this PMGF model can be useful for biosafety assessment of potential environmental impact caused by transgene flow, in addition to addressing related questions in evolutionary biology.