Prediction of Potato Crop Yield Using Precision Agriculture Techniques

Crop growth and yield monitoring over agricultural fields is an essential procedure for food security and agricultural economic return prediction. The advances in remote sensing have enhanced the process of monitoring the development of agricultural crops and estimating their yields. Therefore, remote sensing and GIS techniques were employed, in this study, to predict potato tuber crop yield on three 30 ha center pivot irrigated fields in an agricultural scheme located in the Eastern Region of Saudi Arabia. Landsat-8 and Sentinel-2 satellite images were acquired during the potato growth stages and two vegetation indices (the normalized difference vegetation index (NDVI) and the soil adjusted vegetation index (SAVI)) were generated from the images. Vegetation index maps were developed and classified into zones based on vegetation health statements, where the stratified random sampling points were accordingly initiated. Potato yield samples were collected 2–3 days prior to the harvest time and were correlated to the adjacent NDVI and SAVI, where yield prediction algorithms were developed and used to generate prediction yield maps. Results of the study revealed that the difference between predicted yield values and actual ones (prediction error) ranged between 7.9 and 13.5% for Landsat-8 images and between 3.8 and 10.2% for Sentinel-2 images. The relationship between actual and predicted yield values produced R2 values ranging between 0.39 and 0.65 for Landsat-8 images and between 0.47 and 0.65 for Sentinel-2 images. Results of this study revealed a considerable variation in field productivity across the three fields, where high-yield areas produced an average yield of above 40 t ha-1; while, the low-yield areas produced, on the average, less than 21 t ha-1. Identifying such great variation in field productivity will assist farmers and decision makers in managing their practices.


Introduction
Achieving the maximum crop yield at the lowest investment is an ultimate goal of farmers in their quest towards an economically efficient agricultural production. Early detection of problems associated with crop yield can greatly help in reducing the loss and reaching the targeted yield and profit. Potato is classified as being the fourth major staple around the globe, which is still quickly attaining importance [1]. The growing interest in potato, along with the diminishing agricultural lands, introduces the need for germplasm yield enhancement, better crop protection and much more efficient and productive management systems. Prediction of potato crop yield prior to the harvest period can be very useful in pre-harvest and marketing decision making. Many studies [2,3] showed that traditional methods of crop yield estimation could lead to poor crop yield assessment and inaccurate crop area appraisal. In addition, these methods normally depend on rigorous field data collection of crop and yield, which is a costly and time-consuming process.
Remote sensing (RS) and global positioning system (GPS) technologies can be used to assess the temporal variation in crop dynamics, including crop yield and its spatial variability [4]. Visible (blue, green and red) and near infrared (NIR) portions of the electromagnetic spectrum have already proven their effectiveness in accessing information on crop type, crop health, soil moisture, nitrogen stress and crop yield [5][6][7][8][9][10][11][12][13]. Advancement in RS techniques enhanced the use of multispectral images as an effective tool in determining and monitoring vegetation conditions, crop stress and crop yield prediction. Liu and Kogan [14] revealed that remote sensing data offered exceptional spatial and temporal land surface characteristics, including the environmental impacts on crop growth. Numerous studies have reported that there could be a good correlation between the vegetation indices provided by the RS techniques and the crop yield and biomass [14][15][16][17]. A crop yield research that is conducted at a regional scale, which employs coarse or low-resolution satellite images, can provide a broader information on the crop canopy conditions and crop yield estimates. Hence, decisions in the quantitative export and import of the product within the region could be made in assured way.
Prediction of crop yields is typically associated with certain agronomic variables (density, vigor, maturity and disease), which can be used as yield indicators. Remote sensing offers a close diagnosis of plant health; however, the spectral reflectance of the crop is dependent on phenology, stage type and crop health. Several studies [4,6,[18][19][20][21][22] have focused on crop growth analysis using normalized difference vegetation index (NDVI) to enhance precision agriculture. Research in plant life monitoring has proven that NDVI is associated with the leaf area index (LAI) and the photosynthetic activity of crops. The NDVI is an indirect way of measuring the primary productivity through its quasi-straight line relation using the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) [23] and [24]. Also, Baez-Gonzalez [6] used Landsat ETM (enhanced thematic mapper) data with an NDVI model to estimate corn yield, where a prediction error of 9.2% in the yield was determined. Yang [25] used the United States Department of Agriculture (USDA) EPIC model to predict yield, where the difference between recorded and predicted yield was less than 10%. Baez-Gonzalez [18] modeled a corn yield with NDVI generated from NOAA Advanced High-Resolution Radiometer (AVHRR) images. A study by Gopalapillai and Tian [19] reported correlation coefficient (r) values varying from 0.13 to 0.98 for predicting corn yield from nine different fields using a span of two-year datasets. They used aerial pictures of the corn plots and calculated NDVI to predict yield, where the average correlation coefficient (r) between the NDVI and the yield over all the nine fields was determined at 0.54. On the other hand, soil adjusted vegetation index (SAVI) was used in some studies as it showed a tendency to minimize soil brightness, a phenomenon which has been addressed by Miura [26] and Lamb [27]. Jayanthi [28] carried out research on yield estimation of potato integrating the SAVI from high resolution airborne multispectral imagery and developed various yield models. Huete [29] introduced a soil calibration element in the NDVI equation to take into account the very first order optical interactions between soil and vegetation. Bala and Islam [30] used TERRA MODIS images to estimate a potato yield, where a prediction error of 15% was determined using ground truth data collected from 50 fields.
In addition to providing a decision support tool and revenue expectation, the predicted yield maps can be used as spatial databases for the implementation of variable rate technology (VRT) systems to achieve a precise application of field-level inputs in order to optimize production across the entire field. Therefore, this study was designed to provide a means of early prediction of potato yield (i.e. prior to the harvest period) using multispectral satellite remote sensing on a field scale. However, the specific objectives of this study were (i) to obtain an empirical equation for the early prediction of potato yield using multispectral images in conjunction with field collected potato yields, (ii) to determine the suitable growth stage for early prediction of potato yield, and (iii) to classify the obtained yield maps into distinct zones for the implementation of precision agriculture activities.

Study area
This study was conducted on three 30 ha center pivot irrigated agricultural fields of Saudi Agricultural Development Company (INMA) in Wadi Al-Dawasir area south of Riyadh, the capital city of Saudi Arabia. The study area was located within the latitudes of 19.90°and 20.33°N and the Longitudes of 44.81°and 44.95°E (Fig 1).
Wadi Al-Dawasir region is one of the major irrigation water abstractions from Al-Wajid Aquifer. Agriculture in this region is dominated by developed agriculture enterprises that operate modern pivot irrigation systems. The size of the center pivot fields varies from 30 to 60 ha, where one farm can contain hundreds of fields irrigated with a number of aquifer wells. The main crops grown in winter are wheat, potato, tomato and melon. Fodder crops, including the biennial multi-cut crops of alfalfa and Rhodes grass, are grown throughout the year; however, inactive during winter. Meteorological features of the region are speckled. Diurnal temperature varies from 6°C (winter) to 43°C (hot summers) with an annual mean temperature of 27.4°C. The mean annual rainfall is around 37.6 mm [31].

Satellite Images
Cloud-free satellite images of landsat-8 and sentinel-2 (Table 1) for the study period (January 26 th , 2016 to March 14 th , 2016) were downloaded from the archives of USGS Earth Explorer website (http://earthexplorer.usgs.gov/). Landsat-8 (the Operational Land Imager (OLI)) was calibrated using the data-specific utilities of ENVI (Ver. 5.3) software program, in which the sensor digital numbers were converted into spectral radiance in order to measure the amount of electromagnetic radiation reflected from a spot on the surface. The spectral radiance (Lλ) was determined using the calibration coefficients from the image metadata. Subsequently, reflectance images were generated from the obtained radiance. Various atmospheric correction techniques (dark object removal, haze removal, cloud masking, etc.) were applied to correct the sensor radiance for atmospheric effects by mathematically modeling the physical inclinations of the radiation as it passes through the atmosphere. Image enhancement and histogram stretch (linear) were also carried out. Sentinel-2, in turn, is based on a satellite constellation deployed in polar sun-synchronous orbit. While ensuring data continuity of previous Spot and Landsat multi-spectral missions, Sentinel-2 can even offer broad improvements, such as a unique blend of global coverage with a wide field of view (290 km), a very high revisit (5 days with two satellites), a high resolution (10 m, 20 m and 60 m) and multi-spectral imagery (13 spectral bands in visible and shortwave infra-red domains). Image provided for this study was level-1C, which was an ortho-rectified top of atmosphere reflectance with a sub-pixel multi-spectral and multi-date registration; a cloud and land/water mask was associated to the product. The Level-1C processing is enhanced by algorithms of resampling and cloud and land/ water masks computations, which make it a stage that lead to the computation of level-1B (Top of Atmosphere (TOA) reflectance) product. The spectral bands span from the visible and the near infrared to the short wave infrared. Four bands at 10 m ground resolution include the classical blue (490 nm), green (560 nm), red (665 nm) and near infrared (842 nm) bands specialized in land applications.
Vegetation indices (VIs), such as NDVI and SAVI, were extracted from the pre-processed images and used for yield prediction. In addition, the cumulative value of the generated indices (CNDVI and CSAVI) was also examined for the development of potato yield prediction algorithms. The NDVI and SAVI were calculated using red (ρ RED ) and near-infrared (ρ NIR ) spectral bands of Landsat-8 and Sentinel-2 as in Eqs 1 and 2 [32] and [33]. Where, L is the canopy background adjustment factor. An L value of 0.5 in reflectance space was found to minimize soil brightness variations and eliminate the need for additional calibration for different soils [28]. As depicted in Fig 2, the obtained satellite images were analyzed for vegetation indices and used for the determination of sample locations for in-situ potato yield collection. The field collected potato yield data points were correlated to the corresponding VIs in order to generate algorithms for the early prediction of potato yield for the given fields.
Sampling strategy for in-situ potato yield collection Three center-pivot irrigated fields (67-S, 68-S and 44-S) were considered as sample fields for this study. In each of the three study fields, stratified random sampling-based sixty locations (60 points: 45 for model generation and 15 for validation) were determined for in-situ potato (tuber) yield data collection. These locations were distributed across each of the experimental fields using the randomizer function of ENVI (Ver. 5.3) software program according to a prescription map generated based on vegetation cover variability and were located in the field with the help of a GPS receiver (Trimble GeoXH). The cumulative NDVI (CNDVI) layer, which is a compilation of the four extracted NDVI images during the growth stages, was used as a source for the stratification base. The stratified random sampling provided by ENVI software, which also called proportional or quota random sampling, involves splitting up the population (the complete classification image) into homogeneous subgroups (the individual classes). After that, a basic random sample in every subgroup was selected. The proportionate sampling type, which was used in this study, tends to create sample sizes that are proportional to the size of the classes (the greater the class, the more samples are to be drawn from it).
Intensive fieldwork from April 10 th to April 16 th , 2016 was carried out 2 to 3 days prior to the harvest time of each field to assure the steadiness of crop status. In-situ collection of potato yield (actual yield) was achieved by harvesting potato over a 3 m 2 area at each sampling point. The harvested potatoes were weighed and up-scaled to the common yield unit (t ha -1 ). The composite samples selected from the aggregates were analyzed for the determination of dry matter (DM) using the hydrometry test. Out of the collected 60 samples, 75% were used for model generation and the rest (25%) were considered for model validation.

Prediction of potato yield
In order to reveal the relationship between actual yield and VIs and to generate an empirical equation for the prediction of potato yield, a scatter plot was applied between the actual yield and the RS based generated VIs (NDVI, SAVI, CNDVI and CSAVI) across the growth period. The yield samples (45 yield points) availed for model construction were analyzed for Pearson correlation (linear) coefficient against the single-date VI (NDVI and SAVI) and the multi-date cumulative VIs (CNDVI and CSAVI). Moreover, growth stage-wise correlation coefficients were analyzed for the best fit empirical equations; hence, the most suitable VI for the prediction was identified accordingly. The optimum growth stage, at which the VIs were highly correlated to the yield, was assessed for the determination of the appropriate time of potato yield prediction prior to the harvest period.

Model validation and accuracy assessment
The empirical models used for the prediction of potato yield were validated against the in-situ potato yields (actual yields). For this purpose, a scatter plot was applied between the 25% of the actual yield (apart from the samples used for model generation) and the predicted yields, and then analyzed for Pearson correlation coefficient (R 2 ). Performance indicators, such as the root mean square error (RMSE) and the mean bias error (MBE) were determined in addition to the Nash-Sutcliffe Efficiency (NSE), which is a normalized statistic that establishes the relative value of the residual variance in comparison to the measured data variance [34].

Zonation of predicted yield maps
In the view of implementing precision agriculture practices and to provide an insight of the field productivity variation for better future management, the predicted yield maps were classified into five distinct zones based on their productivity. These zones included very high (above 40 t ha -1 ), high (35-40 t ha -1 ), medium (30-35 t ha -1 ), low (20-30 t ha -1 ) and very low (below 20 t ha -1 ) productivity zone. The pixel value, at each location in the image, represented the average yield (t ha -1 ) within the pixel dimensions. Pixels with equal or close yield values were compiled to assess the zonal yield.

Results and Discussion
In-situ potato yield The potato yield samples collected at the three fields from the pre-allocated points were statistically analyzed to assess the spatial variability of the yield and the productivity of different zones within one field. Table 2 presents some statistics of the collected potato samples.

Potato yield prediction models
Linear regression analysis showed that the relationships between the actual crop yield and VIs were varying throughout the growth period (Table 3). Results revealed that the early and late stages of crop life showed the least correlations. This can be attributed to that at the early crop stages, reflection from vegetation cover is highly noised by the soil surface. On the other hand, at late stages (i.e. when the tubers are completely riep), potato leaves tend to turn into yellow (i.e. reduced chlorophyll) [35]. However, the best-fit models were obtained with NDVI, SAVI, CSAVI and CNDVI. The highest correlation (R 2 ) value was observed to be found 60 to 70 days after planting dates of the potato crop (Table 3). For Landat-8 data, the best-fit models were obtained with CSAVI, NDVI and SAVI for fields 67-S, 68-S and 44-S, respectively. However, for sentinel-2, SAVI provided the best-fit for all of the studied three fields. Table 4 presents the obtained prediction equations that have been used to generate potato yield maps. Prediction algorithms from Landsat-8 showed a preference of using different VIs for yield prediction (CNDVI, NDVI and SAVI for fields 67-S, 68-S and 44-S, respectively). This can be attributed to the effect of the relatively coarse (30 m) spatial resolution which hinders the optimum representation of the vegetation cover reflection. In addition, the variation in crop age over the three fields (difference in cultivation dates) could be a cause of different indices selection. However, the yield was found to be correlating well with the SAVI (as in Sentinel-2) because SAVI minimized the soil background reflection effects caused by natural variability in surface reflection [29].
What can be revealed from the equations is that each prediction equation must be handled as sensor and location specific. Hence, for each field there are two prediction equations from two different satellite images. In general, the mid-growth stage (60-70 days after planting) was found to be the best time for yield prediction for potato crop. Accuracy of models used in potato yield prediction Accuracy of the obtained empirical (yield prediction) models was assessed against the actual yield. The highest correlation was observed for the field number 67-S, where both cumulative SAVI (Landsat-8) and single-date SAVI (Sentinel-2) resulted in a similar R 2 value of 0.65. Validation results for field 68-S showed R 2 values of 0.49 and 0.45, when potato yield was correlated with NDVI (Landsat-8) and SAVI (Sentinel-2), respectively. Furthermore, for field 44-S, the R 2 values of 0.39 and 0.47 were obtained by incorporating SAVI from Landsat-8 and Sentinel-2, respectively (Fig 3).
Results of the performance indicators used in the validation of prediction model are provided in Table 5. The correlation between the yield predicted by the developed models and the actual yield revealed consistent similarity in terms of yield spatial and quantitative distribution with high significances.
The obtained validation accuracies were in agreement with previous findings, where the prediction was conducted on different crops using different space borne sensors [33] and [36]. As stated by [28], yield prediction accuracy increased with the increase of the density of observation points at the field. Given that statement, it was noticed from the accuracy investigation that high values of R 2 could be obtained at a field when zonal-management strategy was applied in terms of sufficient observation points within each field zone. Temporal variability also affects the validation accuracy, where the results of applying the obtained model for the subsequent growth periods would determine the applicability of the tested model for a continuous prediction of potato yield. Table 6 represents a comparison between the actual yields and the predicted potato yields of the entire fields (67-S, 68-S and 44-S) after applying the prediction equations to the whole  field's pixels (not only the 45 locations). Thus, the prediction errors were described as results of comparing the total actual collected yields versus the total predicted ones. Total field's yield data were acquired after performing the full harvest of the three fields. From Table 6, it can be seen that the analyzed image from Sentinel-2 produced a better prediction accuracy for fields 67-S and 68-S (prediction errors of 3.8% and 7.5%, respectively) compared to that produced by Landsat-8 images (prediction errors of 7.9% and 13.5%, respectively). This was attributed to the high spatial resolution of Sentinel-2 images, which enabled the reduction of the generalization in crop spectral reflection and delineation of sharp field boundaries, so that noise from the neighboring soils was removed. However, the prediction errors from Landsat-8 (9.1%) and Sentinel-2 (10.2%) in field 44-S were very much indifferent.

Zonal analysis of the classified yield maps
The predicted yield maps of the three study fields (Fig 4) were developed from the Landsat-8 and Sentinel-2 satellite images employing the different indices shown in Table 4 and Fig 3. These maps showed a considerable variation in yields within each field as represented by the five classes. For all the study fields, it can be observed that the very-low-yield class was mainly distributed across the field boundaries. This was attributed to the fact that pixels of this yield class were located at the cross-boundary between the vegetation and the bare soil areas (field's The yield classes of Landsat-8 and Sentinel-2 maps were described based on pixel count that associated with class creation. By comparing the yield potentials within the very-high-yield and the very-low-yield classes, a variation of yield ranging from 20 to 40 t ha -1 was found to exist within each field. The least yield values (0 to 20 t ha -1 ) were observed along the fields boundary, where the cumulative reflectance of soil and vegetation was detected at the field's end. The class-wise histograms of potato yield distributed across the studied fields were developed (Fig 5). From Fig 5, it can be seen that the most frequent productivity was achieved within the high-yield classes of fields 67-S and 68-S, with total absence of the very-high-yield class in field 68-S, which were observed to have average values of 37 t ha -1 for both fields and a yield range of 37 to 38 t ha -1 for field 44-S. On the other hand, the most productive areas (very-highyield classes) were observed to be less frequent compared to the high-yield classes. These results highlighted the importance of improving the productivity over the three least yield classes (medium, low and very low) to increase the total field productivity through the use of potato yield monitoring techniques and a suitable management. On the other hand, it is believed that expanding the analysis to cover the environmental and soil influential factors can assist in the enhancement of the yield productivity. The normal distribution shown by graphs (e) and (f) revealed that the field 44-S produced a broad variety of yields within the five yield classes. Production with such variability could complicate the planning for the zone management in terms of following a specific type agricultural activity or applying fixed additive doses.
In order to demonstrate the importance of reclaiming the zones with the low-yields, a quantitative assessment of the predicted potato yield derived using both satellites (Landsat-8 and Sentinel-2) over the three fields was conducted. Fig 6 shows the relative variation in area versus yield classes for Landsat-8 and Sentinel-2 maps. The resulting analysis highlighted the potential of using precision agriculture in the enhancement of the productivity through managing the low yield locations for more profitability (vertical expansion in crop yield). From Landsat-8 yield maps, it was calculated that the area of very-high-yield class (yield ! 40 t ha -1 ) occupied 25% of the total area of field 67-S (Fig 6a); however, the same class occupied 68% of the total area of field 44-S. On the other hand, no such class (very-high-yield zone) was predicted for field 68-S, which indicated a lower field productivity that was limited to less than 40 t ha -1 . Sentinel-2 maps, however, revealed that the very-high-yield classes (yield ! 40 t ha -1 ) occupied 30%, 4% and 62% of the total area for fields 67-S, 68-S and 44-S, respectively (Fig 6b). Prediction by Sentinel-2 of the very-high-yield class in field 68-S contradicted with the results from Landsat-8 for the same field. However, the actual yield observations for field 68-S agreed with the Sentinel-2 predictions. The contradiction could be attributed to the coarse spatial resolution of Landsat-8, in which the surface reflection was associated with multiple factors, such as soil, weeds, etc.

Conclusions
A field study was conducted to provide an efficient and practical means of predicting potato tuber yield cultivated under a center pivot irrigation system using remote sensing techniques. The following conclusions are inferred from the study: • The study provided an effective method to predict potato yield and map its spatial variability.
Images from Landsat-8 (30 m resolution) and Sentinel-2 (10 m resolution) satellite vehicles were downloaded for the study area free of charge. The use of vegetation indices (VIs) extracted from the satellite images was found to provide an effective and an efficient means of potato yield prediction.
• Prediction of crops yield through analyzing their bioactivities using satellite images could avail considerable information to improve productivity, such as recognizing the variation in field productivity across the field. It has been noticed in this study that high-yield areas produced average yields of above 40 t ha -1 ; however, the low-yield areas produced, on the average, yields of less than 21 t ha -1 . Displaying and highlighting such great variations in field productivity can help farmers and decision makers be aware of the problem; hence, adjust their management practices.