Remotely Sensed Rice Yield Prediction Using Multi-Temporal NDVI Data Derived from NOAA's-AVHRR

Grain-yield prediction using remotely sensed data have been intensively studied in wheat and maize, but such information is limited in rice, barley, oats and soybeans. The present study proposes a new framework for rice-yield prediction, which eliminates the influence of the technology development, fertilizer application, and management improvement and can be used for the development and implementation of provincial rice-yield predictions. The technique requires the collection of remotely sensed data over an adequate time frame and a corresponding record of the region's crop yields. Longer normalized-difference-vegetation-index (NDVI) time series are preferable to shorter ones for the purposes of rice-yield prediction because the well-contrasted seasons in a longer time series provide the opportunity to build regression models with a wide application range. A regression analysis of the yield versus the year indicated an annual gain in the rice yield of 50 to 128 kg ha−1. Stepwise regression models for the remotely sensed rice-yield predictions have been developed for five typical rice-growing provinces in China. The prediction models for the remotely sensed rice yield indicated that the influences of the NDVIs on the rice yield were always positive. The association between the predicted and observed rice yields was highly significant without obvious outliers from 1982 to 2004. Independent validation found that the overall relative error is approximately 5.82%, and a majority of the relative errors were less than 5% in 2005 and 2006, depending on the study area. The proposed models can be used in an operational context to predict rice yields at the provincial level in China. The methodologies described in the present paper can be applied to any crop for which a sufficient time series of NDVI data and the corresponding historical yield information are available, as long as the historical yield increases significantly.

Different methods have been developed to predict crop yields using remotely sensed data, and the most common approach is, by generating regression model, to develop direct empirical relationships between the NDVI measurements and the crop yield [15,19,45,57]. These approaches assume that measures of the photosynthetic capacity from spectral-vegetation indices are directly related to crop yield. This assumption is used because many of the conditions that affect crop growth, development and ultimately yield could be captured through spectra measurements such as the NDVI [64]. By using long-term historical-yield data as a dependent variable and remotely sensed data as an independent variable, a statistical regression function was generated to perform crop-yield predictions, whereas the actual crop yields depend on many more factors than the presence of spectral-vegetation indices [37]. Tilman et al. [65] noted that increased yields in cereal are mainly the result of greater inputs of fertilizer, water and pesticides, new crop species, and the improvement of management over the last decades. For all developing countries, modern varieties accounted for 21% of the growth in crop yields during the early Green Revolution period [66]. In Asia, rice production has more than doubled as a result of the expansion of cultivated area, the adoption of modern cultivars, increased investments in irrigation, and an increased use of fertilizer over the past 4 decades [67]. Hafner [68] found that linear growth has been the most common trend in maize, rice, and wheat yields for 188 nations over the past 40 years. This scenario also occurs in China. Although the inter-annual variability of NDVI (probably due to unexpected weather conditions or disasters) can reveal crop yield fluctuations [19,59]; however, remotely sensed-NDVI cannot detect those human-induced factors that resulted in increase of rice yield. Therefore, to monitor and predict crop-yield cannot use NDVI measurements solely.
For unit-yield estimation, using one simple regression function (usually known as: Y = a+b * NDVI) would be incompatible as the advance of years, because simple regression would be likely neglect those man-induced factors in yield increase. However, few studies have analyzed the time trends of crop yields, which reflect the influence of technology development, fertilizer application, and management improvement. Moreover, the regression model between statistical data and NDVI cannot be extendable [19,45] because cropping system and rice yield level is natural conditiondependent in China.
In consideration of social factors and regional differences for remotely sensed crop yield estimation in China, the objective of the present paper was to develop a methodological framework that may be adopted for the regional-, national-and international-scale prediction of crop yields. This methodology was based on a time series analysis of historical-yield information. Paddy rice was chosen to test the proposed methodology. To accomplish this objective, we needed to: (1) geographically regionalize rice cultivation area for remotely sensed monitoring; (2) analyze the historical trends in the grain yield of rice; (3) decompose the remotely sensed yield of rice from the long-term historical data; (4) select the optimal predictors, based on a correlation analysis between the remotely sensed yield and the AVHRR-derived NDVIs; (5) construct prediction models for rice yield; and (6) evaluate the potential for rice-grain-yield prediction in China using AVHRR NDVI data as predictors.

The Remote-Sensing dataset
The research presented in this paper relies on a time series of AVHRR NDVI composite imagery from July 1981 to December 2006, derived from the National Oceanic and Atmospheric Administration's (NOAA) series of Advanced Very High Resolution Radiometer (AVHRR) instruments, with a spatial resolution of 8 km, by the NASA Global Inventory Monitoring and Modeling Systems (GIMMS) group at the Laboratory for Terrestrial Physics. There are two 15-day composites per month: the first (15a) is a maximum value composite from the first day to 15 th of the month; and the 15b composite is from days 16 till the end of the month. All data are available from the University of Maryland Global Land Cover Facility (http://glcf.umiacs.umd. edu/data/gimms/). Pinzon et al. [69] and Tucker et al. [70] described in detail how the GIMMS data set was developed. A number of improvements have been made on the GIMMS NDVI database, with respect to previous NDVI data sets, including corrections for: (1) sensor degradation; (2) inter-sensor differences; (3) solar-illumination angle and sensor-view angle effects due to satellite drift; (4) volcanic stratospheric aerosol corrections for 1982-1984 and 1991-1994; (5) missing data in the Northern Hemisphere during winter, using interpolation; and (6) short-term atmospheric aerosol effects, atmospheric water-vapor effects, and cloud-cover physics [69,70]. This data set is considered to be the most accurate, longterm AVHRR data record [71]. By comparing these data to new, improved coarse-resolution remotely sensed data from SPOT Vegetation instrument and MODIS instruments, recent study confirmed its suitability for long-term vegetation studies [72].

NDVI Variables
A large number of studies found a close relationship between crop yields and NDVI variables. The theory is: the NDVI value presents the yield level corresponding to every single pixel. Therefore, a simple regression function can be explained the yield: yield = a*NDVI + b; then the total yield can be obtained by multiplying planting area. By literature review, previous studies suggest three types of NDVI variables: original NDVI [13,23,42,63], cumulative NDVI [8,23,38,42,45,63,73,74], and average NDVI [34,45,63]. The cumulative NDVI and the corresponding average NDVI for the same period were highly correlated because of the linear nature of the operations involved. Only the original NDVIs and the average NDVIs were selected as input data for the prediction models in the present paper.
NDVI variables around the time of the maximum are strongly correlated with final yields [31,35,75]. Specifically, the rice yield is most determined by crop conditions during the heading (i.e. peak phenological phase of growth); and yield-reflectance relationships are typically the strongest after mid-season. In contrast, NDVI value changes that occur outside of the rice-growing period maybe not positively related to yield [52]. These relationships within changes of NDVI value suggests that the NDVIs during the midto-late growing period should be a good indicator of rice yield; meanwhile this phenomenon provides an approach to discriminate rice planting area from remote sensing image. Therefore, the first step of this study was to extract the maximum NDVI during the rice-growth period (NDVI max ) for each studied province from the remote sensing dataset from the year 1982 to 2006. The maximum NDVI is equal to the peak value of the seasonal NDVI profile. Then, six other original NDVIs were calculated: the first, second,  third and fourth biweekly NDVIs prior to the NDVI max (NDVI maxb4 ) and the first and second biweekly NDVIs after the NDVI max (NDVI maxa2 ). These seven biweekly composites span 3 months of raw AVHRR imagery, corresponding to the ricegrowth period. Focusing on the NDVI response during the ricegrowth period helps to identify rice-specific vegetation changes. Hochheim and Barber [27] also found that NDVI estimators with longer integration periods minimized variability in yield prediction. Therefore, based on the seven original NDVIs, twentyone average NDVIs, clustered around the time of the peak NDVI, were calculated using a rigorous arithmetic mean framework ( Table 2). In total, 28 NDVI variables were generated. They include all of the possible combinations of the original seven NDVIs.  Figure 1). Unfortunately, rice planting area and yield information for Hong Kong-Macao-Taiwan areas was not available. According to NBSC crop statistical data (see Table 3), Eastern China was the region with the highest rice acreage and production levels (9808.60 kha and 64984.00 kt, respectively) in 2009. Central China ranked second in both rice acreage and production (6703.60 kha and 46215.00 kt, respectively). The third-largest rice cultivation and production area was Southwestern China (4448.10 kha and 31214.00 kt, respectively). Southern China and Northeastern China ranked fourth and fifth, respectively, in both rice acreage and production (4402.

Description of Study Area
We divided China into 7 regions together with 5 representative provinces selected to convey the information of paddy rice planting area: Heilongjiang (HLJ) in Northeastern China, Hunan (HN) in Central China, Jiangxi (JX) in Eastern China, Sichuan (SC) in Southwestern China, and Guangxi (GX) in Southern China. These provinces were selected as the study areas for the present research because these locations: (1) represented the typical cropping system in China, (2) are located in primary rice-production regions, and (3) are geographically and climatologically different (see Figure 1 and Table 4). The life span, cropping system, and planting schedule are all depend on regional hydro-thermal condition. The general information on life span, cropping system, total annual rainfall (mm), annual accumulated temperature (uC), area (kha), and production levels (kt) for the selected provinces is shown in Table 4

Calibration of Rice-Yield Prediction Models
The gradual trend in yields is due to the influence of technological development, fertilizer application, and improved management on the rice cultivation. The results of this analysis suggest that the most common trend of rice yield is a linear growth. The province-specific intercepts account for spatial variations in rice management and soil quality; province-specific time trends account for yield growth due to technology gains. This indicates us the yield is composed from the intrinsic and extrinsic factors. Therefore, we decomposed the historical rice yield Y into the trend yield Y t and the remotely sensed yield Y RS , using the following equation: To quantify past trends in yields, many different yield de-trend methods have been reported, including: least-squares regressions [76,77], moving averages [78,79], exponential algorithms [80], and polynomial regressions [81]. For rice-yield predictions in the present investigation, a linear regression model and a moving average are both generated to fit each separated provincial rice dataset (also see in Figure 2): where Y t is the trend yield in a given province during a given year (kg ha 21 ), t represents the year of harvest (the year 1979 was numeral 1979, 1980 was numeral 1980, etc., until 2009 was numeral 2009), a and b are the province-specific linear regression coefficients.
In our study, a moving average is used with historical crop-yield data to smooth out short-term fluctuations and highlight longerterm trends. Rice yields were de-trended using their deviations from the 5-year moving average. The mean changes in provincial historical rice yield (Y), the trend yield (Y t ) and the remotely sensed yield (Y RS ) were calculated for each period as an average of the changes from each single preceding year to the next by using a moving average method. Generally, the moving average method is used to calculate arithmetic mean of each five of the entire dataset: y i-n , y i-n+1 , …, y i , …., y i+n . Such method has been usually employed in meteorological data analysis to remove the stochastic errors from long-time series of data. Hence, an algorithm for a 5-year moving average is as follows: t,n{3 z2Y t,n{2 z3Y t,n{1 z4Y t,n 5 Y t,n~{ Y t,n{4 zY t,n{2 z2Y t,n{1 z3Y t,n 5 Where Y t, i is the trend yield in a given province during a given year (kg ha 21 ); n represents the number of data points; i represents the year of the harvest (e.g. the year 1979 was numeral 1, 1980 was numeral 2, etc., until 2006 it should be numeral 31); Y t,1 and Y t,2 are the trend yields for the first two harvested years; then Y t, n-1 and Y t, n are the trend yields for the last two harvested years within the 5 years. Consequently, the trend yield Y t was obtained. To remove the technological influences, it is necessary to remove the yield trend to produce a new time series that is directly related to the NDVIs. We defined this new time series as the remotely sensed yield. According to Eq. (1), the remotely sensed yield can be calculated by the following equation: Next, correlation analysis was performed between the remotely sensed yield and the NDVI variables. The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r in Eq. (5) represents the samples' correlation coefficient; x and y represent the remotely sensed yield and the NDVI variables respectively; n is the number of data pairs.
Statistical regression models are the most commonly used method for crop-yield prediction based on remotely sensed data [8,36]. They do not require numerous inputs and can be performed directly; also because it requires little computing power and the selected variables are distinctive and non-overlapping. Therefore, each of the provincial Y RS and NDVI dataset was analyzed separately by means of stepwise regression techniques. These models were constructed via the 'STEPWISE' regression process which was available in software Statistical Product and Service Solutions (SPSS) 17.0 [82]. The probability significance thresholds for the entry and retention of candidate independent variables in the model were both set to a = 5%.

Evaluation of Rice-Yield Prediction Models
The rice-yield prediction models were evaluated using the following indicators: Root mean square error (RMSE): Coefficient of determination (R 2 ): F-value (F): and relative error (RE): Together with the above, where n is the number of comparisons; k is the number of predictors; Y i is the statistical rice yield; Y Y is the average rice yield, and Y 0 i is the predicted yield.  data of rice yield together with average yield growth trend from 1979 to 2009 in five provinces of China is summarized in Table 5.

Rice Yield Trend Analysis
As analysis above (see Figure 2), the social input and advance of technology account for the linear trend of the rice-yield growth, whereas such human-induced factors could not be detected using remotely sensed data. To overcome this problem and make riceyield prediction methods more robust and easily exportable, one possible strategy is to integrate remote-sensing data with the rice yield time series analysis. De-trending is necessary to properly identify the remote-sensible effects in these panel datasets. Therefore, before the rice-yield predicting models are established using remotely sensed variables as predictors, we suggest that the statistical yield should be decomposed into the trend yield and the remotely sensed yield, methodology was described in Section 2.5.

Correlation Coefficients between the Remotely Sensed Yield and NDVI Variables
The correlation coefficients between Y RS and the NDVI variables for the rice-growth period from the fourth 15-day period before NDVI max (NDVI maxb4 ) to the second 15-day period after NDVI max (NDVI maxa2 ) for each of the studied provinces are summarized in Table 6. By comparing the correlation coefficients (Column 2 and 3 in Table 6), the Y RS that was de-trended by linear regression performed better than the Y RS that was de-trended by a 5-year moving average against the NDVI variables.
The correlation coefficients between the Y RS that were detrended by a 5-year moving average and the NDVI variables were generally low in HLJ, HN, and JX. For SC, the correlation coefficients were significant at the 0.01 level between the Y RS that was de-trended by a 5-year moving average and NDVI maxb4 , and the correlation coefficients were significant at the 0.05 level between the Y RS that was de-trended by a 5-year moving average and NDVI maxb3 , mNDVI maxb4-b3 , mNDVI maxb4-b2 , mNDVI maxb4-b1 , mNDVI maxb4-max , and mNDVI maxb3-b2 . The correlation coefficients were significant at the 0.01 level between the Y RS that was detrended by a 5-year moving average and NDVI maxb4 , NDVI maxb3 , and NDVI maxb4-b3 .

Remotely Sensed Yield-Prediction Models
Conclusions drawn in the yield-trend analysis and the correlation analysis between Y RS and the NDVI variables encouraged us to attempt to build a simple remotely sensed yield-prediction model for rice based on the NDVI variables. According to the correlation coefficient result summarized in Table 6, the Y RS values that were de-trended by linear regression were used as dependent variables in HLJ, HN, JX, and SC. The Y RS values that were de-trended by a 5-year moving average were used as dependent variables in GX. The NDVIs were used as independent variables. These models were constructed through the 'STEP-WISE' regression process in SPSS software. Each model contains variables using the data period from 1982 to 2004. The correlation coefficients of the selected models ranged from 0.42 to 0.92, and all models were significant at the 0.01 level, except for HLJ which is significant at the 0.05 level (see Table 7). This means that increases in NDVI during the rice-growth period are generally related to the final rice-grain yield. The influence of NDVI always had a positive impact on yield. These results are consistent with numerous previous studies [34,36,42,75]. Data from 2005 to 2006 were used for model validation.

Validation of Rice-Yield Prediction Models
The remotely sensed yield (Y RS ) of rice was calculated using the NDVI variables required by each model described in Table 7. The final rice yield (Y) was the sum of the trend yield (Y t ) and the remotely sensed yield (Y RS ).  values and the observed values was close to the diagonal (intercept = 0, slope = 1), and the coefficients of determination for the five study areas ranged from 0.84 to 0.98, indicating that the reliability of the forecasts are very high.
The yield data for 2005 and 2006 were not included in the model construction and instead were used to evaluate the prediction models independently. These data provide independent estimates of the predictive power of the selected models (Table 8).  The differences between the predicted values and the official statistical values were 5% or less in seven out of ten years. These results demonstrate the potential of a NDVI rice-yield estimate that is based on model calibration with historical data at the provincial level. However, it is noticeable that the predicted relative errors were greater than 10%, but less than 19% in both 2005 and 2006 for SC and in 2006 in HLJ when compared with the official statistical data. These error rates are likely due to a number of contamination sources that can confound the potential relationship between NDVIs and rice yield. For instance, cloud and atmospheric-moisture contamination can influence the NDVI signal. Vegetation signals from before or after the selected NDVIs can impact the final yield of rice.

Conclusion
This study focused on the obvious and important role that advance of technology plays in rice yields increase. The results of this analysis suggest that the most common trend of rice yields in China during the years 1979-2009 is a linear growth. In the light of rice-yield trend could not be detected directly by a satellite remote sensor therefore, yield de-trended analysis was necessary to properly identify the remote-sensible effects and obtain an accurate prediction for rice yield. Only with de-trending analysis could we interpret the NDVI's evolution as being mainly due to variations in the photosynthetic activity and growth conditions of rice and then predict the rice yield using NDVI variables.
The AVHRR-based indices explored in the present research were useful for the remotely sensed rice yield-prediction in major rice cultivation areas of China. This method allowed us to have a fine provincial estimate which satellite image could be difficult to obtain, or else a similar cost and a similar time frame data is easily available. However, it is cautious to restrict these analysis to those areas where the common trend of the crop yield is linear growth for the period considered.
The two steps for de-trending the statistical yield to obtain new time series, that are the trend yield (Y t ) and the remotely sensed yield (Y RS ); And by constructing the prediction models of Y RS using NDVI variables enabled the development of a robust, simple, remotely sensed data-based model that was applicable at the provincial level in China. We believe the approach introduced here has a wide applicability to other rice-producing countries as well as other crops, such as wheat and corn.
More empirical studies should be performed on the use of AVHRR-derived NDVI time series as predictors for crop yield to enhance the understanding its forecasting capacity and limitations, and to validate the methods of remotely sensed yield estimation further. A future study should also include the application of a longer AVHRR NDVI time series in combination with other data sets such as SPOT-VEG, MODIS and SeaWiFS, especially in the event of one of these dataset's unexpected absence.