Site-specific machine learning predictive fertilization models for potato crops in Eastern Canada

Statistical modeling is commonly used to relate the performance of potato (Solanum tuberosum L.) to fertilizer requirements. Prescribing optimal nutrient doses is challenging because many variables are involved, including weather, soils, land management, genotypes, and the severity of pests and diseases. Where sufficient data are available, machine learning algorithms can be used to predict crop performance. The objective of this study was to determine an optimal model predicting nitrogen, phosphorus and potassium requirements for high tuber yield and quality (size and specific gravity) as impacted by weather, soils and land management variables. We exploited a data set of 273 field experiments conducted from 1979 to 2017 in Quebec (Canada). We developed, evaluated and compared predictions from a hierarchical Mitscherlich model, k-nearest neighbors, random forest, neural networks and Gaussian processes. Machine learning models returned R² values of 0.49–0.59 for tuber marketable yield prediction, which were higher than the Mitscherlich model R² (0.37). The models predicted medium-size tubers (R² = 0.60–0.69) and tuber specific gravity (R² = 0.58–0.67) more accurately than large-size tubers (R² = 0.55–0.64) and marketable yield. Response surfaces from the Mitscherlich model, neural networks and Gaussian processes were smooth and agreed more with actual evidence than the discontinuous curves derived from k-nearest neighbors and random forest models. When conditioned to obtain optimal dosages from dose-response surfaces given constant weather, soil and land management conditions, the models disagreed in some cases. Due to their built-in ability to develop recommendations within a probabilistic risk-assessment framework, Gaussian processes stood out as the most promising algorithm to support decisions that minimize economic or agronomic risks.


Large-size tubers
* NPK factorial design or others where N (nitrogen), P (phosphorus) and K (potassium) were kept constant.

We matched the duration from planting to harvest, but the class names differed. The preceding crops were categorized as in Parent et al. [7] as grasslands, legumes, cereals, low-residue crops and high-residue crops. Toponymic names, geographical coordinates and years were recorded at each site. Fertilizers other than N, P or K, fertilizer source, dosage and application method, seeding density and date, harvest date, tuber marketable yield (excluding tubers < 2.5 cm in diameter), tuber size distribution (small, medium, large) and specific gravity (SG) were recorded. The N fertilizers were either all applied at seeding or split-applied between seeding and hilling. The P fertilizers were banded at planting. The K fertilizers were band-applied or split-applied before planting and at planting. We added 17 trials conducted in 2016 and 2017 in the Outaouais, Centre-du-Québec, and Lac-Saint-Jean regions. We reported the growing season lengths provided by scouting teams covering the period from seeding to harvest and not strictly corresponding to the theoretical CFIA [53] growth duration as shown for cultivars Superior, Goldrush, Krantz and FL 1533 from the trials used for model analysis (Table 2).
[...] side, $c_j^-$ is the compositional vector at the left-hand side, $c_j^+$ is the compositional vector at the right-hand side, and $g()$ is the geometric mean function.

The proportion of the textural components and the carbon content formed the soil texture simplex. The balances are presented in Table 5. We followed the [denominator parts | numerator parts] notation [71].
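To make the balance notation concrete, the following is a minimal sketch of how a single balance could be computed from the quantities named above (the two groups of parts and the geometric mean function). It assumes the standard isometric log-ratio form, in which the log ratio of the geometric means is scaled by the square root of r·s/(r + s), where r and s are the numbers of parts in the two groups; that scaling, the function names and the example values are assumptions for illustration and are not taken from the study.

```python
import math


def geometric_mean(parts):
    """Geometric mean of strictly positive parts."""
    return math.exp(sum(math.log(p) for p in parts) / len(parts))


def ilr_balance(denominator_parts, numerator_parts):
    """Balance written [denominator parts | numerator parts].

    Assumed standard form: sqrt(r*s/(r+s)) * ln(g(numerator) / g(denominator)),
    with r parts on the left-hand (denominator) side and s parts on the
    right-hand (numerator) side.
    """
    r = len(denominator_parts)
    s = len(numerator_parts)
    scale = math.sqrt(r * s / (r + s))
    ratio = geometric_mean(numerator_parts) / geometric_mean(denominator_parts)
    return scale * math.log(ratio)


# Hypothetical soil texture simplex (mass fractions), not data from the study.
clay, silt, sand, carbon = 0.20, 0.30, 0.47, 0.03

# Hypothetical balance [clay, silt, sand | carbon].
texture_vs_carbon = ilr_balance([clay, silt, sand], [carbon])
print(f"[clay, silt, sand | carbon] = {texture_vs_carbon:.3f}")
```

A balance is negative when the geometric mean of the numerator parts is smaller than that of the denominator parts, as in this hypothetical example where carbon is a small fraction relative to the textural components.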

Shannon Diversity Index for rainfall

Rd is daily rainfall, n is the number of days and Tm is daily mean temperature.

Typically, values greater than 0.5 are considered acceptable [92]. The MAE is the average of the absolute differences between predictions and observations as in equation [...]

Mitscherlich, NN and GP models generated smooth response curves, while the KNN and RF models generated stepped curves. The marketable yield was non-responsive to P application in the RF model. There was also no effect of K fertilization on the yield shown by the Mitscherlich and RF models. All models for the P trial somewhat underestimated marketable yield while response curves followed data for N.

[...] (Fig 4) showed increasing response to N fertilization across models, while response was globally poor for P and K. For the [S | M] balance, responses increased with increasing fertilizer doses, except for P and K trial data fitted with the GP model (Fig 5). There was also a poor response for the K trial with SG (Fig 6). The SG response decreased from zero K levels and increased then decreased as P dosage increased. For N trials, SG slightly increased then decreased as N dose increased in the RF model, but was non-responsive with the other models.
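As an illustration of the evaluation metrics mentioned earlier in this section, the sketch below computes the MAE as the average of the absolute differences between predictions and observations. It also computes R², assuming the usual coefficient-of-determination form (one minus the ratio of residual to total sum of squares), which is an assumption because the section does not spell it out. The function names and the yield values are hypothetical.

```python
def mean_absolute_error(observed, predicted):
    """Average of the absolute differences between predictions and observations."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed form: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical marketable yields (Mg/ha), not data from the study.
observed_yield = [30.0, 40.0, 50.0]
predicted_yield = [32.0, 38.0, 50.0]

mae = mean_absolute_error(observed_yield, predicted_yield)
r2 = r_squared(observed_yield, predicted_yield)
print(f"MAE = {mae:.2f} Mg/ha, R2 = {r2:.2f}")
```

The MAE is expressed in the units of the target variable, so it complements R², which is unitless.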

The prediction of optimum fertilizer doses and optimum or maximum outputs showed some disagreements for the case presented (Fig 7). There should be a single economic optimal dose or agronomic optimal dose at each site each year. Some models were more consistent than others in deriving optimal doses depending on the target variable. At extremely low predicted N, P or K doses, it could be challenging to manage the fertilization program at low economic risk for producers, who generally consider that the cost of over-fertilization is low compared to the cost of under-fertilization [37, 38]. The probabilistic prediction capability of Gaussian processes may help to determine credible dosage.
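To illustrate how a single economic optimal dose can follow from a smooth response curve, here is a minimal sketch for a Mitscherlich-type curve. It assumes the common parameterization yield = A · (1 − exp(−R · (E + dose))), which may differ from the hierarchical model fitted in the study, and it defines the economic optimum as the dose at which the marginal yield gain equals the fertilizer-to-crop price ratio. All parameter values, prices and function names are hypothetical.

```python
import math


def mitscherlich(dose, asymptote, rate, environment):
    """Assumed Mitscherlich form: A * (1 - exp(-R * (E + dose)))."""
    return asymptote * (1.0 - math.exp(-rate * (environment + dose)))


def economic_optimal_dose(asymptote, rate, environment, price_ratio):
    """Dose where the marginal yield gain equals the price ratio.

    price_ratio = fertilizer cost per kg / crop price per Mg, so that the
    profit expressed in yield units is yield(dose) - price_ratio * dose.
    Setting its derivative to zero gives
    dose = ln(A * R / price_ratio) / R - E, floored at zero.
    """
    dose = math.log(asymptote * rate / price_ratio) / rate - environment
    return max(0.0, dose)


# Hypothetical values: A in Mg/ha, R in ha/kg, E in kg/ha.
A, R, E = 40.0, 0.02, 30.0
ratio = 1.2 / 250.0  # hypothetical fertilizer $/kg over tuber $/Mg

optimal = economic_optimal_dose(A, R, E, ratio)
print(f"Economic optimal dose: {optimal:.1f} kg/ha")
print(f"Predicted yield: {mitscherlich(optimal, A, R, E):.1f} Mg/ha")
```

Because this curve is concave, the optimum is unique. Stepped curves such as those from KNN and RF do not have a well-defined derivative, which is one reason optimal doses from those models can be less consistent.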