Identification of factors affecting rice yield gap in southwest China: An experimental study

Knowledge of the relative importance of the factors influencing the rice yield gap is crucial to rice production, especially in southwestern China, where the topography is extremely complex. In the current study, rice yield data from a total of 76 experiments were collected in 2008 and 2009 in Chongqing, southwest China. At each location, two treatments (with and without fertilizer) were carried out with three replications each, and the yield gap was calculated as fertilized yield minus unfertilized yield. Seventeen influencing factors covering variety, fertilization, climate, terrain, and soil properties were obtained at each location. A regression tree (RT) model was employed to investigate the relative importance of the influencing factors to rice yield gap variability. Pearson correlation analysis suggested that the rice yield gap was positively correlated with sunshine hours and with phosphorus and potassium fertilizers, and negatively correlated with soil available nitrogen content. The RT results showed that the selected influencing factors explained about 74.1% of rice yield gap variation, and that variety had the greatest influence on yield gap variation, followed by the other factors. Our regional-scale findings suggest that more precise fertilization recommendations should be formulated on the basis of comprehensive factors (e.g., soil, climate, terrain, variety), which could provide reasonable guidance to farmers and government for rice production.


Introduction
China is one of the largest rice-producing countries, with a total planting area of about 30.3 million hectares, accounting for about 30% of the total rice output in the world (FAO) [1]. Rice is a major grain crop after wheat, and its yield is closely related to food security and the sustainable development of society. In recent years, a great deal of effort has been devoted to keeping pace with people's increasing food requirements. However, rice production is generally considered to be constrained by many limiting factors, such as variety, climate, terrain, and soil properties [2][3][4]. Previous studies reported that variety updating and improved management practices contributed to increasing rice yield [5][6][7][8]. They pointed out that the adoption of new varieties could enhance the harvest index and overcome the negative effects of climate change. Other studies, however, reported that climate had a significant effect on rice production worldwide [4][9][10][11][12]. For example, Sarker et al. [13] showed that temperature-related indicators (maximum and minimum temperature) had a more significant impact on rice yield than rainfall in Bangladesh. In India, night temperature and radiation showed significantly negative and positive influences on rice yield, respectively [14]. In China, Tao et al. [15] demonstrated that appropriate temperature was positively correlated with rice yield, and summer drought was found likely to decrease rice yield in Chongqing [16].
Additionally, crop growth depends on soil nutrients. Soil properties, such as available nitrogen, available phosphorus, and available potassium, exert a strong effect on crop yields. Relevant research suggested that the spatial variability of rice yield was mainly caused by soil chemical properties [17].
Nevertheless, soil properties and climate are closely related to terrain factors [18,19]. Thus, variability in rice yield caused by terrain has been observed [20,21]. For instance, in southwestern China, Li et al. [3] found that rice yield was strongly affected by rising elevation because of the decrease in accumulated temperature.
In recent years, increases in rice yield resulting from the application of fertilizer have been observed in long-term observational experiments [22,23]. Shrestha and Deb [24] reported that fertilizer not only offset the gap between yield with and without fertilizer, but also overcame the negative effect of climate on rice growth. However, preliminary observation found that the increase in rice yield varied markedly among experiments.
Decision trees are gaining favor in various fields for exploring non-linear relationships between independent and dependent variables [25][26][27][28][29][30]. They are non-parametric methods and can automatically deal with both categorical and continuous variables. Decision trees are scalable to large problems and can handle smaller data sets than artificial neural networks [31]. Classification and regression tree (CART) is a typical decision tree algorithm for predicting a continuous variable (regression) or a categorical variable (classification). A particular benefit of CART is its cross-validation feature, which attempts to detect over-fitting [25]. One of the outcomes of CART is the relative importance of the independent variables to the response, which can be used to investigate the factors controlling the rice yield gap (fertilized yield minus unfertilized yield). Therefore, the main objectives of this study are to (1) analyze the relationship between the influencing factors and the rice yield gap between fertilization and no fertilization, and (2) quantify the relative importance of the influencing factors to the yield gap.

Study area
Chongqing (105°11'~110°11'E, 28°10'~32°13'N) is located in southwestern China and covers 8.24×10^4 km^2 (Fig 1). It is one of the most important rice production regions in China. Chongqing is characterized by hills and mountains, with elevation varying from 145 to 2763 m. The climate is moderate subtropical, with abundant climatic resources. The annual rainfall is about 1200 mm and is mainly concentrated between April and September. Average annual sunshine hours vary from 1000 to 1650 h, and temperature ranges from 6 to 38°C. Paddy soil is the most widely distributed soil across the study area.
required for these locations/activities. A total of 76 field trials were conducted across the study area (Fig 1). Each plot had an area of 20 m^2 and was surrounded by four guard rows (Fig 2). For each field trial, two treatments were performed: with fertilizer (nitrogen, phosphorus, and potassium fertilizers) and without fertilizer. To reduce random error, each treatment had three replications. Rice is usually transplanted in April and harvested in September. For each trial, the rice yield of each treatment was taken as the mean of its three replications. Rice yield with fertilizer was usually higher than yield without fertilizer. The yield gap was then calculated as fertilized yield minus unfertilized yield.
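As an illustration of the yield gap calculation described above, the following minimal Python sketch averages the three replications of each treatment and subtracts the unfertilized mean from the fertilized mean. The function name `yield_gap` and the replicate values are hypothetical and are not data from the trials.

```python
from statistics import mean


def yield_gap(fertilized_reps, unfertilized_reps):
    """Yield gap (t/ha): mean fertilized yield minus mean unfertilized yield."""
    return mean(fertilized_reps) - mean(unfertilized_reps)


# Hypothetical replicate yields (t/ha) for a single trial; not study data.
fertilized = [8.1, 7.9, 8.3]
unfertilized = [5.9, 6.2, 6.0]

gap = yield_gap(fertilized, unfertilized)
print(f"yield gap = {gap:.2f} t/ha")
```

Averaging the replications before taking the difference gives one yield gap value per trial, which is the unit of analysis used in the rest of the study.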
Three main hybrid rice series, Q-you (QY), Zhongyou (ZY), and Gangyou (GY), are widely planted in Chongqing. Specifically, the QY series includes the varieties QY-1, 5, 6, 8, 12, and 108; the GY series includes GY-158, 188, 364, 615, 6366, 725, 825, and 881; and the ZY series includes ZY-177, 36, 838, and 9801. In this work, the rice varieties preferred by local farmers were assigned to the trials (Table 1). For each trial, the same variety was planted in both treatments. The fertilization rates recommended by the Soil Testing and Formulated Project (a fertilization technology in China) are summarized in Table 2; they were determined from the pattern of fertilizer uptake by the crop, the nutrient-supplying capability of the soil, and fertilizer use efficiency. Therefore, fertilization rates differed among trials.
Soil properties. Soil samples were collected from the cultivated horizon at a depth of 0-20 cm using a manual coring tube before cultivation. At each site, 15 sub-samples were mixed in a bag, and about 1 kg of the mixed soil was extracted as the representative soil sample. After air drying, the soil samples were analyzed for chemical properties by conventional soil agro-chemical analysis methods. Specifically, organic matter (OM), available nitrogen (AvN), available phosphorus (AvP), and available potassium (AvK) were measured by the glass soil bath-K2Cr2O7 titration method [32], the micro-diffusion method [33], the Mo-Sb colorimetric method [34], and flame photometry, respectively. Parent material was not included in this paper because it had no significant effect on rice growth and grain yield in our study area. Meanwhile, pH, with a mean value of 6.02, also showed no obvious difference because rice is generally planted in a sub-acid environment.

Table 1. The numbers and distributions of three series. (Columns: Series, Variety, County, Number.)

Climate variables. For the years 2008-2009, daily climate data recorded at 34 stations in Chongqing were obtained from the National Meteorological Information Center (NMIC), China Meteorological Administration (CMA). Four climate variables, namely mean temperature (Temp), daily temperature difference (diurnal maximum temperature minus minimum temperature, hereafter TDiff), sunshine hours (sun hours), and rainfall, were calculated and used in this paper. Each climate factor was examined for the rice growth period (April to September), and no missing data were found over the study area. Monthly averages of the daily observed climate data during the rice growth period in 2008 and 2009 are shown in Fig 3. The monthly maxima of rainfall and sunshine hours occurred in August and July, respectively. There was no significant difference between 2008 and 2009 in mean temperature, mean temperature difference, total sunshine hours, or total rainfall during the growth period. Thin plate spline (TPS) methods have recently been widely applied to interpolate climate parameters [35][36][37][38][39][40][41]. Previous works demonstrated that TPS with latitude, longitude, and elevation performed better than other methods, such as ordinary kriging, inverse distance weighting, and multiple linear regression with latitude, longitude, and elevation plus ordinary kriging. Therefore, TPS with latitude, longitude, and elevation was applied to interpolate the climate parameters over the study area [40,41]. The climate variables were then extracted at the 76 trial points to analyze their effects on rice yield gap variation.
Topographic variables. A digital elevation model (DEM) with a spatial resolution of 30 m was used in the current study. Five commonly used terrain indicators, namely elevation, slope, aspect, topographic wetness index (TWI), and topographic position index (TPI), were generated from the DEM. TWI and TPI were calculated as follows:
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right)$$
where ln indicates the natural logarithm, $\alpha$ represents the upslope area per unit contour length, and $\beta$ represents the local slope angle acting on a cell [42].
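The two indices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: TWI is taken as ln(α / tan β), and TPI (defined in the following paragraph as the central elevation minus the mean elevation around it) is computed here over a square window that excludes the central cell. The window shape, the `min_tan` guard for flat cells, the function names, and the small DEM are all hypothetical choices, not the settings used in the study.

```python
import math


def twi(upslope_area, slope_rad, min_tan=1e-6):
    """Topographic wetness index ln(alpha / tan(beta)).

    min_tan avoids division by zero on flat cells (an assumption).
    """
    return math.log(upslope_area / max(math.tan(slope_rad), min_tan))


def tpi(dem, row, col, radius=1):
    """Central elevation minus the mean elevation of the surrounding cells.

    A square window of the given radius (in cells) is assumed, and the
    central cell is excluded from the neighborhood mean.
    """
    neighbors = []
    for r in range(max(0, row - radius), min(len(dem), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(dem[0]), col + radius + 1)):
            if (r, c) != (row, col):
                neighbors.append(dem[r][c])
    return dem[row][col] - sum(neighbors) / len(neighbors)


# Hypothetical 3 x 3 elevation grid (m); not taken from the study DEM.
dem = [[300.0, 310.0, 320.0],
       [305.0, 350.0, 325.0],
       [310.0, 315.0, 330.0]]

center_tpi = tpi(dem, 1, 1)               # positive: higher than surroundings
cell_twi = twi(100.0, math.atan(0.1))     # upslope area 100, slope tan = 0.1
print(center_tpi, cell_twi)
```

A positive TPI marks a cell that stands above its surroundings, and a large TWI marks a cell with a large contributing area and a gentle slope.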
$$\mathrm{TPI} = Z_0 - \bar{z}$$
where $Z_0$ and $\bar{z}$ represent the elevation at the central point and the mean elevation around it within a determinate radius (R), respectively. A positive value of TPI denotes that the elevation at the central point is higher than the mean of its surroundings, and vice versa. According to Weiss [43], six

Methods
Statistical analysis. Descriptive statistics were used to examine the variation of the rice yield gap and the influencing factors. Pearson correlation analysis, a widely used method, was performed to explore the relationship between the rice yield gap and the influencing factors. Analysis of variance (ANOVA) combined with multiple comparisons by Tukey's honestly significant difference was applied to test differences in yield gap and influencing factors among the three variety series (QY, GY, ZY) and among topographic positions. Root mean squared error (RMSE) and mean absolute error (MAE) were used to evaluate the accuracy of the prediction model. The formulas were defined as:
$$r = \frac{\sum_{i=1}^{n}(f_i - \bar{f})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(f_i - \bar{f})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(f_i - y_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|f_i - y_i\right|$$
where $f_i$ and $y_i$ represent the prediction and observation ($i = 1, 2, 3, \ldots, n$; $n = 76$), respectively, and $\bar{f}$ and $\bar{y}$ are their means.
Regression tree. Regression tree (RT), proposed by Breiman et al. [25], is a nonparametric statistical method. RT automatically selects the variables holding the most information. In the regression tree, the least squared deviation (LSD) was employed as the impurity measure for the splitting rules; it aims to minimize the variance within groups and maximize the variance among groups. Given a data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the regression model was calculated as follows:
$$f(x) = \sum_{j=1}^{m} c_j \, I(x \in R_j)$$
where $R_1, R_2, \ldots, R_m$ represent the units into which the input space was divided, and $c_1, c_2, \ldots, c_m$ represent the fixed value output in each unit, respectively. The LSD criterion function is defined as:
$$R(t) = \sum_{i=1}^{N(t)} \left(y_i - \bar{y}(t)\right)^2$$
where $t$ and $N(t)$ represent a node and the number of samples in it, respectively, and $\bar{y}(t)$ represents the mean value of the response variable in the node.
$$Q(s,t) = R(t) - R(t_L) - R(t_R)$$
where $R(t_L)$ and $R(t_R)$ represent the sums of squares of the left and right child nodes, respectively. The split $s$ that maximizes $Q(s,t)$ was selected.
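The splitting rule can be illustrated with a short Python sketch. It assumes that R(t) is the within-node sum of squared deviations and that the criterion is Q(s,t) = R(t) - R(tL) - R(tR), searched over candidate thresholds of a single continuous predictor. The function names and the data are hypothetical, and this is not the SPSS implementation used in the study.

```python
def sum_sq(values):
    """Sum of squared deviations from the node mean, i.e. R(t)."""
    if not values:
        return 0.0
    node_mean = sum(values) / len(values)
    return sum((v - node_mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, Q) maximizing Q = R(t) - R(tL) - R(tR)."""
    parent = sum_sq(y)
    best = (None, 0.0)
    levels = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(levels, levels[1:]):
        threshold = (low + high) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        q = parent - sum_sq(left) - sum_sq(right)
        if q > best[1]:
            best = (threshold, q)
    return best


# Hypothetical predictor values and yield gaps (t/ha); not study data.
predictor = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
gaps = [1.8, 2.0, 1.9, 2.9, 3.1, 3.0]

threshold, q = best_split(predictor, gaps)
print(threshold, round(q, 3))
```

A full tree repeats this search over every predictor at every node and stops according to the growth limits (node sizes and depth), which is what produces the ranking of variables.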
RT has been applied in various fields, such as groundwater level prediction [44], plant litter decomposition [45], and crop yield [30]. In the current study, RT was employed to investigate the relative importance of climate, soil properties, terrain, and management factors affecting rice yield gap variation. The optimal RT parameters were obtained after several trial runs: the parent-node and child-node numbers were set to 4 and 2, respectively, and the tree depth was 3. To avoid overfitting, ten-fold cross-validation was applied to examine the model performance [46][47][48]. Samples were randomly separated into ten subsets, each of which contained all three variety series. The performance of RT was compared with that of multiple linear regression (MLR, Eq (10)):
$$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i \qquad (10)$$
where $y$ is the yield gap, $n$ is the number of independent variables ($x$), and $\beta_0$ and $\beta_i$ are the intercept and regression coefficients. All calculations were done in SPSS 19.0 and Excel 2016.
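The fold construction described above (ten random subsets, each containing all three variety series) can be sketched as a stratified assignment. The Python below is a minimal illustration, not the procedure actually run in SPSS; the function name `stratified_folds`, the random seed, and the series counts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds, spreading each class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # random order within each class
        # Deal members round-robin, continuing where the last class stopped,
        # so that fold sizes stay balanced.
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)
        offset += len(members)
    return folds


# Hypothetical series labels for 76 trials; the counts are illustrative only.
labels = ["QY"] * 30 + ["GY"] * 30 + ["ZY"] * 16
folds = stratified_folds(labels)

for i, fold in enumerate(folds):
    print(i, len(fold), sorted({labels(j) if False else labels[j] for j in fold}))
```

Each fold then serves once as the validation set while the remaining nine are used to fit the tree. Every fold is guaranteed to contain a series only if that series has at least k members.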

Descriptive statistics
Descriptive statistics of the yield gap and the influencing factors are summarized in Table 3. The coefficient of variation (CV) was used to examine the variability of the variables. The yield gap varied from 0.2 to 4.1 t/ha, showing moderate variability with a CV of 37.4%. All climate parameters presented low variability across the study area. Among the terrain indicators, elevation, aspect, and TWI showed moderate variability (CV = 35-64%), while slope presented strong variability. Among the soil properties, AvN and AvP showed the lowest and highest variability, with a changing magnitude of 24.9 mg/kg and 81.5 mg/kg, respectively. The fertilizer rates had low variability. The results of analysis of variance (ANOVA) combined with multiple comparisons of the yield gap and influencing factors for each series are summarized in Table 4. The yield gap for QY, with a mean value of 2.55 t/ha, was significantly higher than those of GY and ZY, indicating that variety had a significant effect on the rice yield gap. The fertilizer rates applied to ZY were significantly higher than those applied to GY, while the mean temperature and sun hours for ZY were significantly lower than those for GY. Other factors showed no obvious difference among series, indicating that the marked difference in yield gap might be attributable, to some extent, to differences among varieties. Additionally, the yield gap did not differ significantly among slope positions (p > 0.05).
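For reference, the coefficient of variation used above is the standard deviation expressed as a percentage of the mean. The short Python sketch below assumes the sample standard deviation (n - 1 denominator); the paper does not state which form was used, and the values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical yield gaps (t/ha); not study data.
example_gaps = [1.0, 2.0, 3.0]
cv = coefficient_of_variation(example_gaps)
print(f"CV = {cv:.1f}%")
```

Because CV is unit-free, it allows variables measured on different scales, such as elevation in metres and AvP in mg/kg, to be compared directly.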

Correlation analysis
The rice yield gap was negatively correlated with soil AvN, and positively correlated with sun hours, P2O5, and K2O (Table 5). No significant correlations were observed between the yield gap and the other factors.
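The sign of such a relationship can be checked with a direct implementation of the Pearson coefficient. The Python sketch below is illustrative only: the AvN and yield gap values are hypothetical and merely mimic a negative association; they are not taken from Table 5.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical soil AvN (mg/kg) and yield gap (t/ha) pairs; not study data.
avn = [80.0, 100.0, 120.0, 140.0]
gap_values = [3.0, 2.6, 2.1, 1.9]

r = pearson_r(avn, gap_values)
print(f"r = {r:.3f}")
```

A coefficient near -1 or +1 indicates a strong linear association; it does not capture the non-linear, interacting effects that the regression tree is intended to detect.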

Regression tree
The relationship between the observed yield gaps and those estimated by the regression tree is given in Fig 4. The model performed well, with R^2 of approximately 0.741 (p < 0.05), RMSE of 0.41 t/ha, and MAE of 0.33 t/ha. The values of R^2, RMSE, and MAE for MLR were 0.332 (p > 0.05), 0.67 t/ha, and 0.55 t/ha, respectively. According to these statistical indicators, the regression tree performed much better than MLR. The tree produced by the RT method is shown in Fig 5. The root of the tree was split by variety. The upper sub-tree includes QY, GY, and ZY varieties, while only some QY and GY varieties were separated into the lower sub-tree. At the second level, TDiff (8.85°C) and aspect (305.4°) were used as split variables: the yield gap was higher with larger TDiff and smaller aspect, suggesting that a relatively higher daily temperature difference is beneficial to rice yield to some extent, and that a larger aspect is a disadvantage for increasing yield. At the terminal nodes of the tree, larger TDiff accompanied by higher soil AvP content could increase rice yield, while larger TDiff with more sunshine hours had an adverse influence, showing that the growth of rice was affected by integrated factors. Meanwhile, GY 158 had a larger yield gap than QY 1 and QY 108; a possible reason is that the photonasty of rice varies with variety.
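The two error measures reported above can be computed as follows. This Python sketch uses the usual definitions of RMSE and MAE; the predicted and observed values are hypothetical and are not the study results.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predictions and observations."""
    n = len(observed)
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(predicted, observed)) / n)


def mae(predicted, observed):
    """Mean absolute error between predictions and observations."""
    n = len(observed)
    return sum(abs(f - y) for f, y in zip(predicted, observed)) / n


# Hypothetical predicted and observed yield gaps (t/ha); not study data.
predicted = [2.0, 3.0, 1.0, 2.5]
observed = [2.5, 2.5, 1.5, 3.0]

print(rmse(predicted, observed), mae(predicted, observed))
```

RMSE penalizes large errors more heavily than MAE, so RMSE is always at least as large as MAE, as it is for both models reported here.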

Identification of factors for rice in China
The relative importance of factors affecting rice yield gap
The relative importance of the factors affecting the rice yield gap, as generated by RT, is shown in Fig 6. Variety was the most important factor, with a relative importance of 100%. Sun hours and daily temperature difference (TDiff) were the second and third most important factors, respectively. Specifically, the climate variables were ranked in the order sun hours > TDiff > mean temperature > rainfall. For soil properties, the rank order was AvN > AvP > AvK > OM. For terrain factors, the rank order was elevation > TPI > aspect > slope > TWI. Among fertilizer factors, N fertilizer played the most important role in the yield gap, followed by K and P fertilizers. On average, the rank order of the studied factors was variety > climate > soil properties > terrain > fertilization.
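A relative importance of 100% for the top factor suggests that importance scores are expressed relative to the most important variable. The Python sketch below shows that normalization under this assumption; the raw scores are hypothetical and only the ordering of the three named factors follows the text.

```python
def normalize_importance(raw_scores):
    """Scale importance scores so that the largest one equals 100%."""
    top = max(raw_scores.values())
    return {name: 100.0 * score / top for name, score in raw_scores.items()}


# Hypothetical raw importance scores; not values from Fig 6.
raw_scores = {"variety": 4.0, "sun hours": 3.0, "TDiff": 2.0}

relative = normalize_importance(raw_scores)
ranking = sorted(relative, key=relative.get, reverse=True)
print(relative, ranking)
```

Normalized values support ranking factors within one tree, but they are not comparable across models fitted to different data.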

Discussion
In the current study, about 74.1% of rice yield gap variation was explained by the RT model. The unexplained variation might be due to other management factors, such as pests, weeds, and ploughing, which were not input to the model. The model performed better than MLR, which had a lower R^2 (0.332, p > 0.05) and higher RMSE (0.67 t/ha) and MAE (0.55 t/ha). Additionally, RT is a non-parametric method and can deal with nonlinear relationships between the independent variables and the response variable [49]. RT can also provide the relative importance of the independent variables to the dependent variable. It has therefore been widely employed to explore the relationships between crop yield variation and soil parameters, management practices, and climate [30,50,51]. The results produced by the regression tree and ANOVA suggested that variety had the most important influence on the rice yield gap in this study area. Liu et al. [4] revealed that variety updating had a significant impact on the improvement of rice yield and could compensate for the negative effect of climate change; similar results were also reported by Liu et al. [7]. Among the climate parameters, sunshine hours and daily temperature difference, which had higher values of relative importance, were of great importance to the rice yield gap. This finding is supported by previous reports [2,52,53]. For example, Xiong et al. [2] suggested that sufficient sunshine benefited leaf photosynthesis and the transformation of carbohydrates, which increased the grain weight of rice. Liu et al. [53] reported that temperature difference had a direct influence on rice yield in southwest China. A higher temperature difference was beneficial to the accumulation of dry matter [8]. Another possible reason for the significant effect of temperature difference on the rice yield gap is heat stress [54], which affects not only soil respiration [55] but also the uptake of nutrients by rice [56].
Although soil properties and terrain indicators had lower values of relative importance, the effects of these factors on variation in the rice yield gap cannot be ignored. Among soil properties, AvN played a leading role in rice yield gap variation, suggesting that the rice yield gap was sensitive to AvN. A possible explanation is the direct effect of AvN on grain yield through increasing the number of panicles [57] and improving stomatal conductance, net photosynthesis, and transpiration [58]. Notably, the negative response of the yield gap to AvN in Table 5 is likely due to inherent soil fertility, which leaves little room for rice yield to improve [59]. A negative correlation was also found between yield increase and soil fertility supply [60]. Among terrain variables, elevation was the most important factor limiting the rice yield gap. It is well known that elevation is significantly related to climate and soil properties [61,62]. Therefore, elevation affects crop yield indirectly [63].
In this paper, our result that fertilization had no significant impact on rice yield gap variation may be contrary to earlier reports [8,64]. For instance, Xu et al. [65] demonstrated that nutrient management played the most important role in improving rice yield. On the one hand, a possible explanation for our finding is the small difference among trials in the government-guided fertilization rates (Table 2), indicating that more precise fertilizer recommendations that account for regional characteristics should be enacted to meet the requirements of rice growth and decrease pollution from chemical fertilizer. On the other hand, it is likely due to the spatial variation in fertilizer use efficiency caused by soil, climate, and terrain among regions [66]. Therefore, in the current study, although similar rates of fertilizer were applied, the rice yield gap varied among regions.

Conclusion
An agricultural production system is a complicated and unique system affected by numerous factors, such as climate, soil properties, and terrain. Based on detailed experimental data, the results obtained from the RT model suggested that the selected factors could account for about 74.1% of rice yield gap variation. More detailed management factors, such as pests and weeds, which were not included in the present paper, might explain the remaining variation. This paper identified the most important factors limiting rice yield gap variation and is expected to provide reasonable advice for government in southwestern China in predicting the developmental trend of rice production. We found that variety and climate were the most important factors limiting the improvement of rice yield, which calls for breeding research to develop new varieties with stronger adaptability to the complex environment, especially to climate warming. Meanwhile, government should enact more precise fertilization strategies that adapt to changes in soil, climate, and terrain within the region. Furthermore, our results may provide valuable information for other crops in specific districts. Further efforts should focus on systematic and detailed analysis of other factors limiting the rice yield gap.