Abstract
Statistical modeling is commonly used to relate the performance of potato (Solanum tuberosum L.) to fertilizer requirements. Prescribing optimal nutrient doses is challenging because of the involvement of many variables including weather, soils, land management, genotypes, and severity of pests and diseases. Where sufficient data are available, machine learning algorithms can be used to predict crop performance. The objective of this study was to determine an optimal model predicting nitrogen, phosphorus and potassium requirements for high tuber yield and quality (size and specific gravity) as impacted by weather, soils and land management variables. We exploited a data set of 273 field experiments conducted from 1979 to 2017 in Quebec (Canada). We developed, evaluated and compared predictions from a hierarchical Mitscherlich model, k-nearest neighbors, random forest, neural networks and Gaussian processes. Machine learning models returned R2 values of 0.49–0.59 for tuber marketable yield prediction, which were higher than the Mitscherlich model R2 (0.37). The models were more likely to predict medium-size tubers (R2 = 0.60–0.69) and tuber specific gravity (R2 = 0.58–0.67) than large-size tubers (R2 = 0.55–0.64) and marketable yield. Response surfaces from the Mitscherlich model, neural networks and Gaussian processes returned smooth responses that agreed more with actual evidence than discontinuous curves derived from k-nearest neighbors and random forest models. When conditioned to obtain optimal dosages from dose-response surfaces given constant weather, soil and land management conditions, some disagreements occurred between models. Due to their built-in ability to develop recommendations within a probabilistic risk-assessment framework, Gaussian processes stood out as the most promising algorithm to support decisions that minimize economic or agronomic risks.
Citation: Coulibali Z, Cambouris AN, Parent S-É (2020) Site-specific machine learning predictive fertilization models for potato crops in Eastern Canada. PLoS ONE 15(8): e0230888. https://doi.org/10.1371/journal.pone.0230888
Editor: Vassilis G. Aschonitis, Hellenic Agricultural Organization - Demeter, GREECE
Received: March 9, 2020; Accepted: July 19, 2020; Published: August 7, 2020
Copyright: © 2020 Coulibali et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files. There is no restriction on sharing of data and/or materials.
Funding: ZC is partly funded by the Natural Sciences and Engineering Council of Canada (CRDPJ 385199-09 and DG-2254 - https://www.nserc-crsng.gc.ca), the Quebec Ministry of Agriculture, Fisheries and Food (IA216581 - https://www.mapaq.gouv.qc.ca), Centre SEVE (https://centreseve.recherche.usherbrooke.ca/), Patate Dolbec Inc. (https://patatesdolbec.com/), Groupe Gosselin FG (http://gosseling2.com), Agriparmentier Inc., Ferme Daniel Bolduc Inc. (http://fermedanielbolduc.com/), Patate Laurentienne, Ferme Bergeron-Niquet, and Patates Lac-St-Jean (http://plsj.ca/). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist. All the funders (Natural Sciences and Engineering Council of Canada, Quebec Ministry of Agriculture, Fisheries and Food, Centre SEVE, Patate Dolbec Inc., Groupe Gosselin FG, Agriparmentier Inc., Patate Laurentienne, Ferme Bergeron-Niquet, and Patates Lac-St-Jean) have declared that no competing interests exist. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
1. Introduction
Modeling provides a quantitative understanding of how crop systems operate [1]. Site-specific simulations of fertilizer requirements to obtain high local potato yield and quality rely on models’ ability to detect subtle variations in factors affecting plant growth and environment and to learn from the past to make predictions [2]. Several crop models have been developed with different degrees of sophistication, scale, and representativeness [2]. Mechanistic models have been published for potato cropping systems [3, 4]. Semi-mechanistic growth models could be used to downscale tuber yield assessment from regional to field levels [5, 6]. Multilevel modeling can assist in selecting a set of relevant parameters that impact tuber yield and fertilizer requirements, but can hardly predict site-specific nutrient requirements [7].
Several variables can impact fertilization at optimum tuber yield: soil type and quality [8, 9], organic fertilizers [10, 11], preceding crops [12–16], weather conditions [17], irrigation [18], timing, location and chemical form of the fertilizer applied [19], pests and diseases [20] and genetic factors such as cultivar longevity and growth rate [21, 22]. Air temperature, photoperiod, day length, intercepted radiation, water abundance, precipitation, root development and crop management were reported to be the driving variables for potato growth and development [8, 9, 23–26]. While the nitrogen (N) requirement of potato crops compares with other high N-demanding crops, phosphorus (P) uptake depends largely on close contact between roots and soil particles that, in turn, depends on soil texture, buffering capacity and moisture content [27, 28]. Due to a shallow system of fine roots and small biomass [29], especially in compacted soils [8, 9], potato is sensitive to nutrient and water stresses [30].
The N, P and K (potassium) requirements are thought to be cultivar- and market-specific [31–33]. Specific gravity (SG) is of particular concern for North-American processors [34, 35]. Other characteristics, such as tuber size and grade are also valued [34]. No model has yet addressed K requirements accounting for interactions between genetics, environment and management [36].
Growers tend to over-fertilize because of the potential economic loss from under-fertilizing [37, 38]. While N can cause nitrate contamination [39–42] and P the eutrophication of surface waters [43–45], K has no known deleterious effect on the quality of natural and drinking water. Attempts have been made to synthesize the results of fertilizer experiments using meta-analysis to derive N optima for specific soil texture and pH groups [46] or multilevel modeling combining soil, climate indices and management variables [7]. Even where field trials could identify nutrient optima [47], such optima cannot be generalized to conditions different from those of particular experiments [48, 49].
Although experimental data grow continuously in size and quality, integrating and analyzing them to make the best-informed decisions remains beyond researchers’ ability. Machine learning is an emerging technology that can aid in the discovery of rules and patterns in large sets of data [50]. The technology bypasses intermediate processes otherwise explicitly explained by a mechanistic modeling system and makes predictions directly based on input data [51]. Machine learning methods can combine fertilizer dosage, genetics, environmental and land management variables to predict tuber yield and quality. Classical models such as Mitscherlich are limited to plant-nutrient relationships [52].
We hypothesized that (1) genetics, environment and local land management practices are the main drivers of fertilizer requirements, (2) k-nearest neighbors, random forest, neural networks and Gaussian processes are more accurate in predicting marketable yield than classical Mitscherlich predictive models, and (3) the machine learning algorithms are equally able to predict economic optimal or agronomic optimal fertilizer doses. The objective of this study was to develop, evaluate and compare the performance of machine learning models in predicting N, P and K requirements for potato.
2. Methodology
2.1 Data set
The Quebec (Canada) potato data set is a collection of field fertilizer trials conducted from 1979 to 2015 between the US border (45th parallel) and the Northern limit of cultivation (49th parallel). We added 17 trials conducted in 2016 and 2017. Fig 1 shows the location of experimental sites.
The trials with maximum yield less than 28 Mg ha-1 were discarded to avoid extreme cases of diseases, management failures or catastrophic weather events. The data set contains 4254–5913 observations from 208–273 field trials, depending on the number of missing values found in the target variable. Most experiments have been carried out from 1991 (Table 1). The number of trials, the number of samples, minimum and maximum number of blocks and treatments are given in S1 and S2 Tables according to the study year and the fertilizer tested.
There were 48 cultivars classified as early (65–70 days), early mid-season (70–90 days), mid-season (90–110 days), mid-season late (110–130) or late maturity (130 days and more) as suggested on the website of the Canadian Food Inspection Agency [54], with 4%, 13%, 62%, 12% and 9% of the samples respectively. The growing season lengths were provided by scouting teams covering the period from seeding to harvest. The names of the cultivar maturity classes consigned in the data set do not strictly match those of the Canadian Food Inspection Agency [54]. The preceding crop was categorized as in Parent et al. [7] as grasslands, legumes, cereals, low-residue crops and high-residue crops (S3 Table). The data set also includes fertilizers other than N, P or K (classified as NA), fertilizer dosage and application method, seeding density and date, harvest date, tuber marketable yield (excluding tubers < 2.5 cm in diameter), tuber size distribution (small, medium, large) and specific gravity.
2.2 Experimental procedures
The experiments included four to six treatments arranged mostly in a randomized complete block design with a minimum of three replications of each treatment (S1 Table). One trial conducted in 1987 had two replications and 8% to 10% of the experiments were arranged as factorial design combining N, P and K fertilizers. We also retained one trial where N, P and K were fixed at their grower-optimum level (S2 Table). Each experimental unit consisted of four or six rows measuring 6 or 8 m in length, with an average row spacing of 0.915 m and within-row spacing varying with cultivar. The potato seeds were planted in May (except in the Outaouais region, where planting occurred in June) then harvested in September. Median plant density was 36000 plants ha-1 in N trials, 33100 plants ha-1 in P trials, 36400 plants ha-1 in K trials, and 43700 plants ha-1 in factorial NPK trials. The N doses varied from 0 to 260 kg N ha-1 with varying steps, and P was applied at a dosage of 0 to 130 kg P ha-1 with varying steps. The K was applied at a dosage of 0 to 350 kg K ha-1 with varying steps. The P and K fertilizers could be converted to P2O5 and K2O by multiplying P by 2.291 and K by 1.205. Nitrogen fertilizers were either entirely applied at planting or split-applied between planting and hilling. Phosphorus fertilizers were banded at planting. Potassium fertilizers were band-applied or split-applied before planting and at planting. No animal manure or compost had been applied in the spring and the preceding fall. Other practices were managed uniformly by the grower.
At harvest, 3-m-long ridges in the middle two rows of each plot were dug and hand harvested. Tubers were divided into four categories as follows: culls, small (S), medium (M) or large (L), depending on the smallest diameter size measured with a ruler. The size cut-offs varied with cultivars and market. The marketable yield was calculated as total yield minus culls (tubers < 25 mm in size). Tubers with external defects such as secondary growth and soft rot were discarded. A representative sample of 20 medium-size tubers from each plot was used to determine tuber specific gravity.
2.3 Soil characteristics
2.3.1 Basic soil composition.
Composite soil samples from the 0–20 cm layer were collected in the spring of the study year before planting to determine the initial soil physicochemical characteristics. Particle size distributions were measured as % clay (0–0.002 mm), % silt (0.002–0.05 mm), and % sand (0.05–2 mm) by sedimentation [55] or laser diffraction [56]. Where soil textural classes were not recorded, central values computed for sand, silt, and clay percentages (S4 Table) using the Quebec soil data set [57] were assigned as proxies.
Soil carbon concentration was determined using the Walkley-Black method [58] or Dumas combustion (Leco Instrument, Saint-Louis, MO). The two methods are closely related as in Eq 1 [59]:
(1)
Because soil particle-size distribution and organic matter content are compositional data, they were transformed into isometric log-ratios (ilr) to avoid self-redundancy, non-normal distribution and scale dependency [60]. The ilr transformation consists in log ratios of the geometric means of hierarchically-arranged components and groups of components, and can be interpreted as balances [61]. The hierarchical arrangement of components follows a balance scheme where balances split groups of components sequentially until each group contains a single part. Each balance is computed as in Eq 2:
$$\mathrm{ilr}_j = \sqrt{\frac{r_j s_j}{r_j + s_j}} \, \ln \frac{g(c_j^{+})}{g(c_j^{-})} \qquad (2)$$
where for the jth balance in [1,…, D-1] (D is the length of the compositional vector), rj is the number of parts on the left-hand side, sj is the number of parts on the right-hand side, cj- is the compositional vector at the left-hand side, cj+ is the compositional vector at the right-hand side, and g() is the geometric mean function. Hence, the textural components and carbon content were balanced as [Sand, Silt, Clay | C], [Clay | Sand, Silt] and [Silt | Sand]. We followed the [denominator parts | numerator parts] notation [62].
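As an illustration of the balance scheme described above, the following minimal Python sketch computes the three texture and carbon balances for a single soil sample. It assumes the usual balance formula (a normalized log ratio of the geometric means of the numerator and denominator groups); the function names and the sample composition are hypothetical and are not taken from the authors' code.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def ilr_balance(denominator_parts, numerator_parts):
    """Balance written [denominator | numerator], assuming the standard ilr formula."""
    r = len(denominator_parts)  # number of parts on the left-hand side
    s = len(numerator_parts)    # number of parts on the right-hand side
    scale = math.sqrt(r * s / (r + s))
    ratio = geometric_mean(numerator_parts) / geometric_mean(denominator_parts)
    return scale * math.log(ratio)

# Hypothetical soil composition (percent), for illustration only
sand, silt, clay, carbon = 70.0, 20.0, 8.0, 2.0

balances = {
    "[Sand, Silt, Clay | C]": ilr_balance([sand, silt, clay], [carbon]),
    "[Clay | Sand, Silt]": ilr_balance([clay], [sand, silt]),
    "[Silt | Sand]": ilr_balance([silt], [sand]),
}
```

Because each balance is a log ratio, rescaling all parts by a common factor (for example, expressing them as fractions instead of percentages) leaves the values unchanged.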
2.3.2 Soil pH.
Soil pH was measured in water (1:1, v/v) or in a 0.01 M CaCl2 solution (1:1 v/v) [63]. The pHCaCl2 was converted into pHH2O where required, as in Eq 3 [64]:
(3)
2.3.3 Soil Mehlich-3 extractable P, K, Al, Mg and Ca.
Soil P was extracted using the Mehlich-3 method [65] or Bray-2 converted to P Mehlich-3 values using the Khiari et al. [43] equation as in Eq 4:
(4)
Soil Al was extracted using the Mehlich-3 method or, where not available, from the typical Al-Mehlich-3 value of soil series as reported by Tabi et al. [57]. Soil K, Ca and Mg were extracted using the ammonium acetate method or its closely-related Mehlich-3 extractant [66]. The P concentration was determined colorimetrically [67] or by inductively coupled plasma (ICP). The K concentration was determined by flame emission or ICP, and Ca, Mg, and Al concentrations were quantified by atomic absorption spectrometry or ICP.
Soil chemical compositions were partitioned into two simplexes S(P, Al) and S(K, Ca, Mg). The ilr variables were [Fv | Al, P], [Al | P] on the one hand and [Fv, Mg, Ca | K], [Fv | Mg, Ca], [Mg | Ca] on the other.
2.3.4 Soil profiles.
The soils in our data set were classified according to the Canadian Soil Classification Working Group [68] and ordered along a gleyzation-podzolization gradient using tools of pedometrics [69]. Soil profile reflects the influence of subsoil on crop growth, in particular its impact in regulating the availability of water [70]. The continuous expressions for Quebec potato soil types defined by Leblanc et al. [69] and used by Parent et al. [7] i.e., poorly-drained loam, poorly-drained sand and well-drained sand, were balanced as [Gleyed | Podzolized] and [Loamy gleyed | Sandy gleyed].
2.4 Weather data
Weather data were collected from the Environment Canada information system [71] using geographical coordinates for each site. The selected weather indexes were the cumulative precipitation–PPT, the Shannon Diversity Index for rainfall distribution–SDI [72], the mean temperature, and the number of growing degree days–GDD.
The cumulative precipitation was computed as the sum of daily rainfall from planting to harvest. The Shannon Diversity Index measures precipitation evenness, based on the fraction of daily rainfall relative to the total rainfall in a given time period (in days). An SDI of 1 implies complete evenness, i.e., equal amounts of rainfall on each day of the period, while an SDI of 0 implies complete unevenness, i.e., all rain on one day [72]. The mean temperature was computed from the planting date to the harvest date. The growing degree days index was computed from daily mean temperatures using 5°C as the baseline temperature (i.e., the sum of daily mean temperatures greater than or equal to 5°C only). Weather variables were computed as in Table 2 for the period between planting and harvest dates using the historical weather data of the past 5 years (from the corresponding study year) at each site.
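The weather indices described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the SDI is assumed to be the Shannon entropy of daily rainfall shares normalized by the logarithm of the number of days, and the growing degree days function follows the wording of the text by default, with a hypothetical `subtract_base` option for the common variant that accumulates only the excess above the baseline. All function names are illustrative.

```python
import math

def cumulative_precipitation(daily_rain_mm):
    """Sum of daily rainfall over the period."""
    return sum(daily_rain_mm)

def shannon_diversity_index(daily_rain_mm):
    """Rainfall evenness: -sum(p * ln p) / ln(n), with p the daily share of total rain."""
    n = len(daily_rain_mm)
    total = sum(daily_rain_mm)
    if n < 2 or total <= 0:
        return 0.0  # assumption: undefined cases are reported as 0
    shares = [r / total for r in daily_rain_mm if r > 0]
    return -sum(p * math.log(p) for p in shares) / math.log(n)

def growing_degree_days(daily_mean_temp_c, base_c=5.0, subtract_base=False):
    """Sum daily mean temperatures at or above the baseline.

    With subtract_base=True, only the excess above the baseline is accumulated.
    """
    kept = [t for t in daily_mean_temp_c if t >= base_c]
    if subtract_base:
        return sum(t - base_c for t in kept)
    return sum(kept)

def mean_temperature(daily_mean_temp_c):
    """Average of daily mean temperatures over the period."""
    return sum(daily_mean_temp_c) / len(daily_mean_temp_c)
```

With these definitions, four days of equal rainfall give an SDI of 1, and a period in which all rain falls on a single day gives an SDI of 0, matching the two limiting cases stated in the text.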
2.5 Selection of features
2.5.1 Predictive features.
The study focused on potato yield-impacting factors reported by Parent et al. [7]. Candidate variables were soil Mehlich-3 P, K, Mg, Ca, Al and Fe composition, soil pH, and soil profile classes expressed as balances across soil textural gradients and across gleyzation-podzolization processes as in Leblanc et al. [69]. The length of the growing season, the preceding crop categories, seeding density and N, P and K fertilizer dosages were used as land management variables. The average 5-yr temperature (T), PPT, GDD and SDI were used as weather features.
The importance of features can be assessed by assigning them a score based on how useful they are at predicting a target variable. We assessed feature importance using the ExtraTreesRegressor estimator from the scikit-learn Python package [73] on the training set of each target variable.
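The scoring step can be illustrated with a minimal sketch on synthetic data. The feature names, the data, and the hyperparameters below are assumptions made for the example; only the use of scikit-learn's `ExtraTreesRegressor` and its `feature_importances_` attribute comes from the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
feature_names = ["dose_N", "soil_pH", "noise"]  # hypothetical feature names
X = rng.uniform(size=(300, 3))

# Synthetic target driven mostly by the first column
y = 10.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

model = ExtraTreesRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importance scores; they sum to 1 across features
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Features with scores close to zero across all target variables are candidates for removal before modeling.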
2.5.2 Target variables.
The data set is a collection of several experiments with specific objectives. Target variables were total yield, yield fractions, and SG. We separated marketable yield fractions with respect to tuber size as follows: large (L), medium (M) or small (S) size. Because these three fractions must add up to 100% of the marketable yield, they were treated as compositions. These compositional variables were transformed into isometric log-ratios of large-size tubers divided by the geometric mean of small- and medium-size tubers [M, S | L], and medium-size tubers divided by small-size tubers [S | M]. Since analysis of compositional data based on log-ratios of parts is not suitable when zeros are present in a data set [74], we first imputed zero observations [75], which were reported mostly for large-size tubers. The detection limit was fixed at 65%. Table 3 summarizes the variables used for modeling. Tuber SG was determined by the weight-in-air to weight-in-water method [76] as in Eq 5:
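$$\mathrm{SG} = \frac{W_{\mathrm{air}}}{W_{\mathrm{air}} - W_{\mathrm{water}}} \qquad (5)$$
where W_air is the tuber weight in air and W_water is the tuber weight in water.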
2.6 Data preprocessing
The data were partially preprocessed in the R 3.6.2 statistical computing environment [77]. The tidyverse 1.3.0 package [78] was used for general data handling and visualization. The compositions 1.40–3 package [79] functions helped to transform compositional data into isometric log-ratios, and the robCompositions 2.2.0 package [80] helped to robustly impute missing values. The replacement of zeros in tuber sizes was performed using zCompositions 1.3.3–1 package [75].
The data preprocessing continued in Python 3.8.1 [81]. The data set used to model tuber SG was cleaned of outliers using the Python SciPy package version 1.4.1 [82]. We used a z-score, i.e., the signed number of standard deviations by which an observation lies above the mean of the measured variable, computed on the multivariate data set. The score threshold was set at 3. The data were handled in Python using NumPy version 1.17.5 [83] and pandas 1.0.0 [84] libraries. The matplotlib 3.1.3 package [85] was used for data visualization.
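A z-score filter of this kind can be sketched as follows. The text does not specify how scores were combined across columns, so this example assumes that a row is kept only when the absolute z-score is below the threshold in every column; the function name and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import stats

def drop_outliers(data, threshold=3.0):
    """Keep rows whose absolute z-score is below the threshold in every column."""
    z = np.abs(stats.zscore(data, axis=0))  # column-wise z-scores
    keep = (z < threshold).all(axis=1)
    return data[keep]

rng = np.random.default_rng(1)
sample = rng.normal(size=(200, 2))  # synthetic data
sample[0] = [25.0, 0.0]             # one planted extreme row
cleaned = drop_outliers(sample)
```

The planted row is removed because its first coordinate lies many standard deviations from the column mean.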
All the quantitative variables were scaled and centered to obtain zero mean and unit variance. The categorical variables were encoded by expanding their levels into binary columns, each of which takes the value 1 for members of the corresponding group and 0 otherwise.
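The two preprocessing steps can be illustrated with pandas on a hypothetical miniature table. The column names and values are invented for the example, and the population standard deviation (`ddof=0`) is an assumption about how unit variance was obtained.

```python
import pandas as pd

# Hypothetical miniature table; column names and values are illustrative
df = pd.DataFrame({
    "dose_N": [0.0, 60.0, 120.0, 180.0],
    "soil_pH": [5.2, 5.6, 5.9, 6.1],
    "maturity": ["early", "mid-season", "late", "mid-season"],
})

# Center and scale quantitative columns to zero mean and unit variance
numeric_cols = ["dose_N", "soil_pH"]
scaled = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)

# One binary column per category level: 1 for membership, 0 otherwise
dummies = pd.get_dummies(df["maturity"], prefix="maturity", dtype=int)

features = pd.concat([scaled, dummies], axis=1)
```

In a train/test workflow, the means and standard deviations would normally be estimated on the training set only and then reused for the testing set.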
2.7 Training and testing data sets
Schemes for partitioning data into training and testing sets vary between studies. Fortin et al. [6] used 60% for training and 40% for testing. Parizeau [86] suggested 50%, 20% and 30% for training, validation and testing, respectively. Crisci et al. [87] used a 75%–25% split while Chantre et al. [88] used an 82%–18% partition for training and testing, respectively. In this paper, the corresponding total input/output data pairs were divided into 70% for training and 30% for testing and model accuracy assessment. Soman and Bobbie [89] found shorter learning times and the highest accuracies with such split proportions. Moreover, self-contained and representative data collection is an important step to ensure the sufficiency and integrity of the training data [90]. Thereby, we partitioned the data set according to whether the tested element was N, P, K, factorial design or another element (Mg, Ca). Thereafter, data were split at block level to avoid testing models on blocks comprising training samples.
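Splitting at block level so that no block contributes to both sets can be sketched with scikit-learn's `GroupShuffleSplit`. This is one possible way to obtain such a split, not necessarily the authors' procedure; the block labels and data are synthetic, and the preliminary partition by tested element is omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_blocks, plots_per_block = 20, 5
block_id = np.repeat(np.arange(n_blocks), plots_per_block)  # hypothetical block labels
X = rng.normal(size=(block_id.size, 3))  # synthetic features
y = rng.normal(size=block_id.size)       # synthetic target

# Roughly 70% of blocks for training and 30% for testing
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=block_id))

train_blocks = set(block_id[train_idx])
test_blocks = set(block_id[test_idx])
```

Note that the proportions apply to blocks rather than to individual observations, so the share of observations in each set can deviate from 70% and 30% when blocks differ in size.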
2.8 Training models
2.8.1 Machine learning algorithms.
Four machine learning models were trained to derive an optimal model: k-nearest neighbors (KNN), random forest (RF), neural networks (NN) and Gaussian processes (GP). Model parameters were tuned using the randomized search with cross-validation method (RandomizedSearchCV) of the scikit-learn library version 0.22.1 [73].
2.8.2 Mitscherlich model.
We used a Mitscherlich-related 3D response surface for three variables inspired by Dodds et al. [91] in the multilevel modeling scheme of Parent et al. [7]. The Mitscherlich-related multilevel response surface was used as a predictive model for comparison with machine learning algorithms. The model was trained using the following equation:
(6)
where Y is the target variable i.e., marketable yield, A (for Asymptote) is the value of the target variable toward which the curve converges at increasing dosage, E (for Environment) describes the fertilizer-equivalent N (EN), P (EP) and K (EK) doses from the environment, and R (Rate) is the steepness of the curve relating each fertilizer equivalent environmental supply to Asymptote. The first-level parameters (A, E and R) were modeled as linear combinations of the predictors with random effect added to the intercept of the Asymptote. To make comparison with preceding models, the model performances were computed without any random effect (level = 0). The Mitscherlich multilevel model was fitted in R 3.6.2.
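A response surface consistent with the parameter definitions above can be sketched as a product of three saturating terms, one per nutrient, each of the form 1 − exp(−R × (E + dose)). The functional form, the function name, and all parameter values below are assumptions made for illustration; in the multilevel model, A, E and R would themselves be linear combinations of the predictors.

```python
import math

def mitscherlich_npk(dose, asymptote, env, rate):
    """Product of three saturating terms, one per nutrient (assumed form)."""
    y = asymptote
    for nutrient in ("N", "P", "K"):
        # Environmental supply is expressed in fertilizer-equivalent units
        y *= 1.0 - math.exp(-rate[nutrient] * (env[nutrient] + dose[nutrient]))
    return y

# Hypothetical parameter values, for illustration only
env = {"N": 40.0, "P": 30.0, "K": 60.0}   # fertilizer-equivalent supply, kg ha-1
rate = {"N": 0.02, "P": 0.05, "K": 0.03}  # steepness of each response
asymptote = 45.0                          # Mg ha-1

low = mitscherlich_npk({"N": 0.0, "P": 0.0, "K": 0.0}, asymptote, env, rate)
high = mitscherlich_npk({"N": 200.0, "P": 100.0, "K": 200.0}, asymptote, env, rate)
```

Under this form, predicted yield increases monotonically with each dose and approaches the asymptote without exceeding it, which is why such a surface yields smooth response curves.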
2.9 Evaluation of model performance
In all cases, the goodness-of-fit measure or predictive capacity of the developed models was based on the coefficient of determination (R2), the mean absolute error (MAE) and the root-mean-square error (RMSE). The R2 evaluates the proportion of variance in the target variable explained by the model as in Eq 7:
$$R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2} \qquad (7)$$
where yi is the observed target variable value, ŷi is the predicted target variable value, and ȳ is the mean of the observed target variable. The best possible score of R2 is 1 (or 100%), but the score may also be negative when the model is arbitrarily worse. Higher R2 values indicate less error variance. A constant model that always predicts the expected value of y disregarding the input features would yield an R2 score of 0 [73]. Typically, values greater than 0.5 are considered acceptable [92]. The MAE is the average of the absolute differences between predictions and observations as in Eq 8:
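$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (8)$$
where n is the number of observations. The MAE attributes equal weight to individual errors and is less sensitive than R2 or RMSE to large prediction errors. The RMSE is the square root of the average of squared differences between predictions and observations computed as in Eq 9:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \qquad (9)$$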
The RMSE attributes high weight to large errors due to squaring. Both MAE and RMSE indicate prediction errors in the units of variable of interest. Zero values indicate a perfect fit. Values less than half of the standard deviation of measured data were considered low [93]. The trained models were used to predict optimal N, P and K doses using some left-out experimental sites data.
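The three performance metrics can be written directly from their standard definitions, as in the following minimal sketch. The function names and the four observed and predicted values are hypothetical; scikit-learn provides equivalent functions in `sklearn.metrics`.

```python
import math

def r2_score(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(observed, predicted):
    """Average of absolute differences between predictions and observations."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)

def root_mean_square_error(observed, predicted):
    """Square root of the average squared difference."""
    mse = sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)

# Hypothetical yields (Mg ha-1), for illustration only
observed = [30.0, 35.0, 40.0, 45.0]
predicted = [32.0, 34.0, 41.0, 43.0]

r2 = r2_score(observed, predicted)                  # 1 - 10/125 = 0.92
mae = mean_absolute_error(observed, predicted)      # 6/4 = 1.5
rmse = root_mean_square_error(observed, predicted)  # sqrt(10/4), about 1.58
```

The RMSE is never smaller than the MAE; the gap between them widens as a few large errors come to dominate, which is why similar RMSE and MAE values suggest errors of comparable magnitude.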
2.10 Economic or agronomic optimal doses
The optimal nutrient input is the one returning yield of high-quality tubers [32], where profitability is maximized and the environmental footprint minimized [94, 95]. To compute the optimal economic N, P, K doses at a given site, all the predictive features, but not N, P and K doses, were held constant (fixed input data). The row of fixed input variables was stacked (reproduced) 1000 times to obtain a table with 1000 identical rows. We generated 1000 random N-P-K combinations of doses from uniform distributions of plausible doses varying between zero and 250 kg ha-1 for N, 110 kg ha-1 for P and 208 kg ha-1 for K. The table was altered in such a way that only N-P-K dosage changed following the random combinations.
A fertilizer cost was computed for each N-P-K triplet. Unit fertilizer costs were set at $1.20 CDN kg-1 for N, $1.10 CDN kg-1 for P and $0.90 CDN kg-1 for K. Tuber price was set at $250 CDN Mg-1 (1 Mg = 1000 kg) as in Parent et al. [7]. No environmental footprint effect was used because of a lack of reliable sources, although it could have been implemented as an increase in the cost of unit dosage. The difference between tuber revenue and fertilizer cost provided the marginal benefit from fertilizing. Economic optimal N-P-K dosage was reached where the net return was maximum. For tuber size and SG, an agronomic optimal N-P-K fertilizer dosage was deduced where the target variable reached a maximum.
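The random-search procedure for the economic optimum can be sketched as follows. The dose ranges, unit costs and tuber price are those given in the text; the `predict_yield` function is a hypothetical stand-in for a trained model evaluated with all other features held constant, and its parameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000
max_dose = np.array([250.0, 110.0, 208.0])  # kg ha-1 for N, P, K
unit_cost = np.array([1.20, 1.10, 0.90])    # $ CDN kg-1 for N, P, K
tuber_price = 250.0                         # $ CDN Mg-1

# Random N-P-K triplets drawn uniformly between zero and the maximum dose
doses = rng.uniform(0.0, max_dose, size=(n_draws, 3))

def predict_yield(dose_table):
    """Hypothetical stand-in for a trained model with other features fixed."""
    env = np.array([40.0, 30.0, 60.0])   # illustrative values only
    rate = np.array([0.02, 0.05, 0.03])  # illustrative values only
    return 45.0 * np.prod(1.0 - np.exp(-rate * (env + dose_table)), axis=1)

yield_mg_ha = predict_yield(doses)                     # Mg ha-1
fertilizer_cost = doses @ unit_cost                    # $ CDN ha-1
net_return = tuber_price * yield_mg_ha - fertilizer_cost

# Economic optimum: the triplet with the highest net return
best = int(np.argmax(net_return))
optimal_dose = doses[best]
```

For an agronomic optimum, the same table of doses would be used, but the triplet maximizing the predicted target variable would be selected instead of the one maximizing net return.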
Our results are reproducible by using the codes, data and package requirements provided in a GitHub repository at https://git.io/JvYxd.
2.11 Model interpretation data
We randomly selected four trials in the testing set for model interpretation (Table 4). The trials showed soil pH levels ranging between the adequate limits of 5.2 to 6.2 for potato crops according to the Centre de Référence en Agriculture et Agroalimentaire du Québec [96]. The phosphorus saturation environmental index (P/Al)Mehlich3 classified the sites at extremely low environmental risk for P trials (1.4% to 1.6%), medium risk for N trial (11.1%) and very high risk for K trial (28.7%). Soil potassium levels showed extremely low (71.5 mg kg-1) and very low (83.1 mg kg-1) levels for P trials, medium level for K trial and high level for N trial [97].
3. Results
3.1 Feature importance
The feature importance, computed using the ExtraTreesRegressor estimator, revealed that the N fertilizer dose was by far the most informative feature in the marketable yield prediction models, followed by soil type, air temperature, length of growing season and soil texture. To predict large-size tuber yield ([M, S | L] balance), the N dose remained the most informative feature, followed by soil type and texture. Tuber planting density exceeded other features for medium-size tubers ([S | M] balance), followed by N dose, soil elements (P and Al Mehlich-3) and soil type. For tuber SG, weather indices, i.e., Shannon diversity index, total rainfall and temperature, returned the highest scores (Fig 2). Preceding crops were not informative across target variables and were deleted before modeling.
3.2 Model tuning parameters
The tuning parameters varied within the models depending on target variables (Table 5). The parameters were tuned during modeling using the scikit-learn random search method with 5-fold cross-validation. For each target variable the corresponding training set was used.
The basic assumption in the KNN algorithm is that similar samples should return similar output (class or value) [98]. The two parameters to tune are the distance function which determines the similarity, and the optimal number of neighbors (similar known observations, k) to use for assigning the unknown output. The regressions were run with 19 nearest neighbors (k = 19) for yield, tuber size [M, S | L] balance and SG prediction models. For the [S | M] balance prediction model, k was set at 18 neighbors. With uniform weights, all the points in each neighborhood are weighted equally while with an inverse distance weight, closer neighbors have a greater influence than neighbors which are further away.
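Tuning the number of neighbors, the weighting scheme and the distance function by random search with 5-fold cross-validation can be sketched with scikit-learn's `RandomizedSearchCV`. The search space, the number of iterations, the scoring metric and the synthetic data below are assumptions made for the example, not the authors' settings.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))                        # synthetic features
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)  # synthetic target

# Hypothetical search space
param_distributions = {
    "n_neighbors": list(range(1, 21)),
    "weights": ["uniform", "distance"],
    "p": [1, 2],  # Minkowski power: 1 = Manhattan, 2 = Euclidean
}

search = RandomizedSearchCV(
    KNeighborsRegressor(),
    param_distributions=param_distributions,
    n_iter=10,        # number of random parameter combinations tried
    cv=5,             # 5-fold cross-validation
    scoring="r2",
    random_state=0,
)
search.fit(X, y)
best = search.best_params_
```

Random search evaluates only a sample of the parameter combinations, which keeps tuning affordable when the grid is large; the same pattern applies to the other estimators by swapping the model and the search space.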
The parameters of an RF include mainly the number of decision trees in the forest and the number of features considered by each tree when splitting a node. The optimization procedure set the number of trees in the forests to 92, 12, 17 and 19 for yield, tuber size [M, S | L] balance, tuber size [S | M] balance and SG prediction models, respectively. The number of features considered for splitting at each leaf node was selected automatically.
A NN is characterized by its architecture, the training algorithm and the activation function. We used a multilayer perceptron in which neurons are organized in layers: an input layer where data are fed into the system, one or more hidden layers where the learning takes place, and an output layer where the decision/prediction is given [99]. We tuned the number of neurons for one hidden layer, and the activation function. A hyperbolic tangent activation function was selected for all the target variables prediction models. The tuned numbers of the hidden layer neurons were 100, 200, 100 and 200 for yield, tuber size [M, S | L] and [S | M] balances, and tuber SG respectively.
GPs are defined by a mean function m(x) and a kernel or covariance function generating the covariance matrix k(xi,xj) between pairs of random outputs. A white noise (σ2) can optionally be added to the kernel [100]. The Matern kernel without white noise returned the lowest error for each target variable. Different noise levels were found to be optimal: 0.195 for marketable yield prediction model, 0.136 for tuber size [M, S | L] balance, 0.031 for [S | M] balance, and 0.932 for tuber SG. Because all the target variables were scaled and centered, mean functions m(x) were null.
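A Gaussian process with a Matern kernel can be sketched with scikit-learn as follows. This is an illustration on synthetic data: the kernel hyperparameters are arbitrary starting values, and representing noise with an added `WhiteKernel` term is an assumption (the `alpha` argument of `GaussianProcessRegressor` is another way to specify a noise level). The target is standardized so that a zero mean function is reasonable.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
X = np.linspace(-2.0, 2.0, 40).reshape(-1, 1)          # synthetic scaled feature
y = np.tanh(X[:, 0]) + rng.normal(scale=0.2, size=40)  # synthetic target
y = (y - y.mean()) / y.std()                           # centered and scaled target

# Matern covariance plus a white-noise term (assumed noise representation)
kernel = Matern(length_scale=1.0, nu=2.5) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, random_state=0)
gp.fit(X, y)

# Predictive mean and standard deviation at new points
X_new = np.linspace(-2.0, 2.0, 9).reshape(-1, 1)
mean, std = gp.predict(X_new, return_std=True)
```

The predictive standard deviation returned alongside the mean is what allows recommendations to be framed in terms of risk rather than as single point estimates.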
3.3 Comparison between models
Model performance to predict marketable yield, tuber-size balances and tuber SG was assessed using R2, MAE, RMSE, response curves shapes and economic optimal N-P-K dosage predictions for each model. For all the models, the predictive accuracy level was not affected after discarding the preceding crop classes.
3.3.1 Goodness of fit.
The model scores at training and testing for the different target variables are presented in Fig 3. There was a large gap between training and testing scores. The difference was lower for the Mitscherlich model, which also showed the lowest coefficient of determination and the highest MAE and RMSE. Its R2 values were 0.35 and 0.37 at training and testing, respectively. The R2 values of machine learning algorithm-based models ranged between 0.78 (NN) and 0.92 (KNN) at training, and between 0.49 (NN) and 0.59 (RF) at testing in predicting marketable yield. With the large-size tuber yield balance [M, S | L], the R2 values ranged between 0.72 (KNN) and 0.87 (RF) at training, and between 0.55 (KNN) and 0.64 (GP) at testing. The medium- versus small-size tuber [S | M] balance and SG prediction models were the most informative, as shown by the highest R2 values at both training and testing. The R2 values ranged between 0.83 (NN) and 0.93 (KNN) at training and between 0.62 (RF) and 0.69 (KNN) at testing in predicting small-size tuber balance, while for SG, they ranged between 0.72 (KNN) and 0.94 (RF), then between 0.58 (KNN) and 0.67 (RF) at training and testing, respectively. In general, model MAE and RMSE were slightly higher when R2 values were low. The practically-similar magnitudes between RMSE and MAE meant that all the individual differences between predictions and observations had equal weight.
3.3.2 Response curves.
The marketable yield response curves are plotted in Fig 4 for each model with respect to the tested nutrient. There were disagreements between models. The Mitscherlich, NN and GP models generated smooth response curves, while the KNN and RF models generated stepped curves. The marketable yield was non-responsive to P application in the RF model. There was also no effect of K fertilization on the yield shown by the Mitscherlich and RF models. All models for the P trial somewhat underestimated marketable yield while response curves followed data for N.
The Mitscherlich model was excluded for the analysis of other target variables. Figs 5–7 show how each model fits responses of tuber size balances ([M, S | L] and [S | M]), and SG, respectively, with respect to N, P or K dosage. The NN and GP models generated smooth curves, while the KNN and RF models generated stepped curves. The [M, S | L] balance (Fig 5) showed increasing response to N fertilization across models, while response was globally poor for P and K. For the [S | M] balance, responses increased with increasing fertilizer doses, except for P and K trials data fitted with GP model (Fig 6). There was also poor response for K trial with SG (Fig 7). The SG response decreased from zero K levels and increased then decreased as P dosage increased. For N trials, SG slightly increased then decreased as N dose increased in the RF model, but was non-responsive with the other models.
3.3.3 Predictions.
The fertilizer recommendations and output predictions varied with the model and the target (Fig 8). The Mitscherlich and NN models predicted negligible economic optimal K doses (11 and 12 kg ha-1 respectively) in marketable yield prediction models, while the site Mehlich-3 K level was classified as very low (83.1 mg kg-1) according to local standards [96]. The RF model suggested the highest cumulative agronomic optimum fertilizer doses, although its outputs were not the highest. With the tuber size [M, S | L] balance prediction model, practicable doses were recommended only by the GP model for P (107 kg ha-1) and the RF model for K (185 kg ha-1), a scheme that is almost similar to the [S | M] balance prediction models. For this output, the GP model recommended only 17 kg P ha-1, while N and K were impracticable (1 kg ha-1 and 4 kg ha-1, respectively). Despite the extremely low environmental risk for P and the low level of soil K, some models predicted negligible doses of P and K mainly for tuber size balances.
3.4 Probabilistic predictions
In addition to point estimates shown by each model, the GP model can return posterior samples. Each sample is a function from which we can compute an economic optimal (marketable yield) or agronomic optimal (size balances or SG) fertilizer dose. Figs 9–12 present the results of 1000 generated samples for each target variable for the selected N, P and K trials. The average GP curve is shown as a black line, with its optimal dosage as a black dot. Five sampled GP curves are plotted as grey lines, with their optimal doses as grey dots. The probability distributions of the 1000 optimal doses are shown under the respective response curves. The figures show that predicted means of optimal dosage (black dot) did not always correspond to the most likely dosage (highest histogram bar) computed after running the sampling process. With yield prediction models (Fig 9), the mean economic optimal dose corresponded to the probabilistic prediction only for the N trial (250 kg N ha-1). For the tuber size [M, S | L] balance (Fig 10), the probabilistic prediction was equal to the mean GP prediction for P trial i.e., 87 kg P ha-1, while N and K trials returned equal predictions with the [S | M] balance prediction models with 0.0 kg ha-1 and 0.70 kg ha-1, respectively (Fig 11). For tuber SG prediction models, none of the probabilistic recommendation matched the mean GP optimal dosage (Fig 12).
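The sampling procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single feature (dose), a squared-exponential kernel as a stand-in for the Matern kernel, made-up training data, and takes the maximum of each sampled curve as its optimum (an agronomic optimum; an economic optimum would also require prices, which are not shown).

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=60.0):
    """Squared-exponential kernel between two 1-D dose vectors (stand-in kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def posterior(x_train, y_train, grid, noise=0.1):
    """Posterior mean and covariance of a zero-mean GP on the dose grid."""
    K = sq_exp_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = sq_exp_kernel(grid, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = sq_exp_kernel(grid, grid) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, 0.5 * (cov + cov.T)  # symmetrise against round-off

def sample_curves(mean, cov, n_samples, seed=0):
    """Draw posterior curves using an eigendecomposition of the covariance."""
    rng = np.random.default_rng(seed)
    w, V = np.linalg.eigh(cov)
    L = V * np.sqrt(np.clip(w, 0.0, None))  # clip tiny negative eigenvalues
    z = rng.standard_normal((n_samples, len(mean)))
    return mean + z @ L.T

# Made-up, centred and scaled responses at six doses (kg ha-1).
x_train = np.array([0.0, 50.0, 100.0, 150.0, 200.0, 250.0])
y_train = np.array([-1.2, -0.3, 0.4, 0.7, 0.8, 0.75])
grid = np.linspace(0.0, 250.0, 51)

mean, cov = posterior(x_train, y_train, grid)
curves = sample_curves(mean, cov, n_samples=1000)

mean_optimum = grid[np.argmax(mean)]              # optimum of the mean curve
sampled_optima = grid[np.argmax(curves, axis=1)]  # one optimum per sampled curve
counts, edges = np.histogram(sampled_optima, bins=10, range=(0.0, 250.0))
most_likely_bin = (edges[np.argmax(counts)], edges[np.argmax(counts) + 1])
```

Comparing `mean_optimum` with `most_likely_bin` mirrors the comparison between the black dot and the highest histogram bar in the figures: the two need not coincide.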
4. Discussion
4.1 Selection of features
Fertilization trials were conducted over a time span of four decades (1979–2017). Although agricultural practices, soil conditions and analytical techniques have undergone substantial changes over time, Valkama et al. [101] have shown that the differences between old and recent experiments in yield responses are not statistically important. Moreover, where the analytical techniques for the same element differed, correlation equations were available to convert the data to a single technique before data analysis. This was the case for soil carbon converted from Walkley-Black to Leco CNS (Eq 1), soil pH processed with CaCl2 converted to pH water (Eq 3), and P-Bray-2 converted to P-Mehlich-3 (Eq 4). Because the experimental procedures were similar and the measurement methods could be converted uniformly, we considered the data set suitable for machine learning.
The feature selection function selects a subset of variables for a learning algorithm to focus attention on the subset, especially when dealing with a large number of explanatory variables. The model-based approach incorporates the correlation structure between predictors and provides scores that indicate how useful or valuable each feature is in model building. Features with low or no importance could be removed without affecting model performance [73]. The preceding crops categories i.e., grassland, small grains, legumes, low-residue crops and high-residue crops, as categorized by Parent et al. [7], returned zero (for tuber SG) or faintest scores (for other target variables) and were thus removed despite a substantial body of literature on the advantages of crop rotation to the next crop. Nonetheless, Zebarth et al. [102] stated that the amount of nitrogen mineralized from organic matter during the growing season cannot be predicted accurately. Torma et al. [103] found that the N supplied by soil and crop residues (maize, potato, silage maize, soybean, sunflower, winter rape, winter wheat) ranged from 20 to 132 kg ha-1, while the phosphorus ranged from 2 to 24 kg ha-1 and potassium from 13 to 218 kg ha-1. Rangarajan [104] stated that nutrient availability to the next crop depends on whether the entire plant or only the root system is left in the field, and on how environmental conditions govern the rate of organic matter decomposition.
For marketable yield and tuber size balances prediction models, the N dose was the most informative feature, probably because of its close relation to photosynthesis [105]. Applied in excess, it delays tuber maturity, stimulates foliage production, increases plant susceptibility to diseases and reduces tuber SG [106]. Crop yield is also determined by environmental conditions driving the physical, chemical and biological reactions [107] that are important in empirical or mechanistic models [4, 7–9, 108].
The selection process retained soil profile characteristics and weather events as major features. Levy and Veilleux [109] reported the effects of air and soil temperatures on potato growth mechanisms and tuber yield. Leblanc et al. [69] pointed out soil drainage conditions for loamy-gleyed profiles (poorly-drained loam), sandy-gleyed profiles (poorly-drained sand) and sandy-podzolized profiles (well-drained). Soil compaction has a negative impact on root extension and water movement i.e., the reduction of nutrient uptake potential leading to a severe reduction of tuber yield [8]. Xu et al. [110] developed pedotransfer functions for potato grown on light-textured soils that could be useful in future models.
Dry matter production of potato crops is determined by the length of the growth cycle [111], which turned out to be a valuable feature. Camire et al. [112] stated that long growing season favors high-yielding late-season cultivars. Rex [113] found a close relationship between delayed harvest date and total yield, main-size marketable tubers and SG.
Seeding density was the most informative feature of the medium- to small-size tubers balance. Seeding density differentiates the number of tubers harvested, the weight of the tubers and the size distribution; higher plant densities promote higher yields in small and medium sizes [113–115].
The feature selection algorithm showed the impact of weather indices on tuber SG. The Shannon diversity index, total rainfall and temperature yielded the highest scores, in decreasing order. Al Soboh et al. [116] reviewed the factors affecting SG loss in crops of crisping potato and stressed that irrigation during early growth stages increases tuber dry matter content. Specific gravity could be reduced substantially if heavy rain occurred at the end of the season before harvest. They stated that potatoes grown during a period of increasing day length, temperature and light intensity produce tubers of high SG. In this study, GDD considered only daily mean temperatures higher than or equal to 5°C, as used by Parent et al. [7]. Moulin et al. [117] used a baseline of 7°C and an upper limit of 30°C. Moreover, the general trend of SG response curves with respect to fertilization supported the results of Belanger et al. [118], Zebarth et al. [19] and Laboski and Kelling [119]. Excessive application doses of N and K along with high soil levels of either nutrient may reduce SG. Phosphorus application may increase tuber solids when soil test P levels are low. Specific gravity was not influenced by the relatively high levels of N and P used by Dubetz and Bole [120], while Maier et al. [121] found contrasting effects between trials.
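The effect of the temperature thresholds mentioned above can be illustrated with a short Python sketch of growing degree-days (GDD). The text only states which daily means were considered; accumulating the excess over the base temperature is the conventional definition and is an assumption here, as is capping temperatures at the upper limit. The function name and temperature values are hypothetical.

```python
def growing_degree_days(daily_mean_temps, base=5.0, upper=None):
    """Sum the daily mean temperature excess over `base`, skipping days below it."""
    total = 0.0
    for t in daily_mean_temps:
        if upper is not None:
            t = min(t, upper)  # cap at the upper limit when one is given (assumed handling)
        if t >= base:
            total += t - base
    return total

temps = [3.0, 5.0, 12.5, 18.0, 31.0]  # made-up daily means, in degrees C

gdd_base5 = growing_degree_days(temps)                             # 5 °C base, no cap
gdd_base7_cap30 = growing_degree_days(temps, base=7.0, upper=30.0)  # 7 °C base, 30 °C cap
```

With these made-up values, the 5°C baseline accumulates 46.5 degree-days while the 7°C baseline with a 30°C cap accumulates 39.5, which shows how the choice of thresholds shifts the index.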
The relative importance of a variable in a model is related to its effect on the output through its gradient in the data set. Hence the predominance of N doses, and P and K doses to some extent, could have been caused by the origin of the data set, which is a collection of fertilizer trials, where large gradients of doses are found by design. This study did not address fertilizer source and timing of application. While Marouani et al. [122] found equivalency of ammonium nitrate (33.5% N), urea (46% N), NP fertilizer (33% N–14% P2O5) and NPK fertilizer (27% N–5% P2O5–5% K2O), Petropoulos et al. [123] found that the form of the fertilizer (ammonium sulfate, ammonium sulfate + zeolite, manure, slow release N fertilizer with urease inhibitor) and the cultivar (Kennebec and Spunta) may affect yield and chemical composition of potato tubers, affecting the end use of the product. Flis [124] reported that the peculiarities of potato cultivar, plant root structure, and timing of nutrient uptake affect the selection of a site-specific fertilization regime. Trehan et al. [125] showed that some cultivars exhibit strong symptoms of N, P and K deficiencies compared to others. Potato cultivars may sustain leaf development and nutrient uptake while maintaining maximum tuber growth rates to reach higher final tuber yields with contrasting nutrient requirements [126]. Differential effects of cultivar and fertilizer on tuber yield have also been reported by Daoui et al. [127]. In a previous study, Coulibali et al. [128] found that genetic traits were not compelling enough to set apart clusters of cultivars based on N, P, K, Mg and Ca compositions of diagnostic leaves. The cultivar effect was thus excluded from the present study to keep models parsimonious. In our analysis, we focused on the gradients of N, P and K doses while keeping the other site-specific factors constant.
Nevertheless, predictive features such as biotic factors (length of growing season, preceding crop, and seeding density) could also be predicted and optimized by the models with respect to tuber yield and quality.
4.2 Comparison of models
The performance of a predictive model is evaluated at testing, that is, on an unseen data set. The goodness of fit refers to how closely the model-predicted values match the true or observed values. Overfitting occurs when models perform well at training and badly at testing, while underfitting characterizes a model performing badly in both training and testing. Except for the Mitscherlich model, the model scores at testing showed discrepancies with training, reflecting problems of overfitting. The differences between R2 values were highest for the marketable yield prediction models (Fig 3), reaching 0.40 with KNN. Based on those gaps, one could argue that our models did not generalize well from training to testing data. However, we used a robust approach by comparing different algorithms, tuning the hyperparameters and tuning the models using 5-fold cross-validation. The R2 values at testing varied with respect to target variables but were practically similar between models. The models estimated the proportions of medium- and small-size tubers ([S | M] balance) more accurately than those of large-size tubers ([M, S | L] balance), probably because of the high number of zero weight values among large-size tubers (21%) compared to tubers of small (0.06%) and medium (0.4%) size, at the early stage of our analysis. Imputing zeros to deal with measures where the large size was completely absent [74] improved the prediction quality of this fraction. Except for the Mitscherlich model in predicting yield, the R2 values at testing were greater than 0.50 and could be considered acceptable according to Moriasi et al. [92] for complex systems.
The Mitscherlich model returned a lower coefficient of determination in tuber yield prediction and was discarded for quality analysis (tuber size balances and SG). The KNN, RF, NN and GP algorithms more accurately approximated the unknown functions explaining tuber yield given the predictive features. However, it was difficult to select the best model since scores were practically similar. Cerrato and Blackmer [129] and several others [130–134] described similar ambiguities using classical statistical models.
Figs 4–7 indicated that the calibration and generalization procedures returned smooth response curves for the Mitscherlich, NN and GP models for all the target variables. Except for the low R2 value of the former, the NN and GP models appeared more suitable for making inferences.
The prediction of optimum fertilizer doses and optimum or maximum outputs showed some disagreements for the case presented (Fig 8). There should be a single economic optimal dose or agronomic optimal dose at each site each year. Some models were more consistent than others in deriving optimal doses depending on the target variable. At extremely low predicted N, P or K doses, it could be challenging to manage the fertilization program at low economic risk for producers, who generally consider that the cost of over-fertilization is low compared to the cost of under-fertilization [37, 38]. The probabilistic prediction capability of Gaussian processes may help to determine credible dosage.
4.3 Probabilistic predictions
Sampling from a Gaussian process looks like rolling a die, returning a different function each time. Figs 9–12 showed only five possible functions for each target variable. By sampling the process numerous times, we generated a distribution of economic or agronomic optimal fertilizer doses, as shown by the histograms in the figures. The distributions often show frequent optima at the edges of the NPK grid, i.e., at doses of 0 or 250 kg ha-1. This phenomenon emerges from sampling continuously increasing or decreasing GP samples, which are more frequent when the sample is close to patterns in data where the response to fertilizer is flat. A zero-fertilizer recommendation could be interpreted as a soil sufficiently fertile to supply the crop, or a soil poorly responsive due to other constraints [135] such as pests and diseases [20, 136] or weed damage [137]. Nevertheless, we covered a wide range of factors that may impact potato crop growth and yield without falling into mechanistic modeling. Fertilizer doses above 250 kg ha-1 may be excessive, since the maximum limits according to local standards are 175 kg ha-1 for N, 87 kg ha-1 for P and 199 kg ha-1 for K [96].
To deal with predictions falling at the edges, the optimal fertilizer dosage could be selected within a range of conditional expectation, as done by Khiari et al. [43] when defining the P optimal dose for acid coarse-textured soils. The xth conditional expectation dose is the optimal dose that produces optimal yield x% of the time. For example, the 60th percentile would be the sampled optimal dose that produces optimal yield 60% of the time for a given site. Khiari et al. [43] assessed the 50th and 80th percentiles. The mean, the median (50th percentile) or any other percentile dose could be computed to support decision-making. For example, the mean GP and the probability distribution processes returned the upper bound of the simulation dosage (i.e., 250 kg N ha-1) as the economic optimal dose for the N trial with the marketable yield prediction model (Fig 9). The conditional expectation percentiles showed that a lower dose (i.e., 223 kg N ha-1) could be recommended, producing optimal yield 55% of the time. At the 60th percentile or more, the full dose, i.e., 250 kg N ha-1, must be applied.
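The percentile-based selection described above can be sketched in Python with a nearest-rank percentile over the sampled optimal doses. This is one common way of computing a percentile and is an assumption here; the exact procedure of Khiari et al. [43] is not reproduced. The function name is hypothetical, and the twenty dose values are made up, chosen only to echo the pattern in the text where many samples fall on the upper edge of the grid.

```python
import math

def percentile_dose(optimal_doses, pct):
    """Nearest-rank percentile: smallest sampled dose covering pct% of the samples."""
    ordered = sorted(optimal_doses)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Twenty made-up sampled optimal N doses (kg ha-1); nine sit on the 250 edge.
doses = [150, 160, 170, 180, 190, 200, 205, 210, 215, 220, 223] + [250] * 9

dose_50 = percentile_dose(doses, 50)  # median dose
dose_55 = percentile_dose(doses, 55)
dose_60 = percentile_dose(doses, 60)
```

With these hypothetical samples, the 55th percentile still falls below the grid edge, whereas the 60th percentile reaches 250 kg ha-1, so the choice of percentile directly sets how conservative the recommendation is.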
5. Conclusion
This study assessed machine learning techniques as an alternative for potato fertilizer recommendations at local scale, which are usually handled by statistical models or meta-analysis at regional scale. A large collection of field trial data provided information to fit machine learning models with specific traits of cultivars, soil properties, weather indexes, and N, P and K fertilizer dosage used as predictive features. Five models, Mitscherlich, KNN, RF, NN and GP, were evaluated against optimal economic N, P and K doses derived from yield, or against optimal agronomic N, P and K doses derived from tuber size and SG. The models trained using machine learning algorithms outperformed the Mitscherlich tri-variate response predictive model. The marketable yield prediction coefficient (R2) varied between 0.49 and 0.59, while the Mitscherlich model returned 0.37. The large-size tuber balance was predicted with a coefficient varying between 0.55 and 0.64. The R2 varied between 0.60 and 0.69 in predicting medium-size tuber balance, and between 0.58 and 0.67 for SG. The N, P and K optimal doses could be recommended with respect to marketable yield, tuber size or SG using the NN and GP models, which appeared to be the most suitable for making inferences. Response surfaces were obtained by conditioning the models using N-P-K doses generated from uniform distributions under constant weather conditions, soil properties and land management factors. The GP model stood out for its probabilistic framework in risk estimation for potato fertilizer recommendation in Quebec conditions.
As large amounts of data are being assembled into observational data sets, machine learning models may serve as surrogates for statistical models in making fertilizer recommendations in the context of precision agriculture. To assess model performance under real-world situations, it was an effective strategy to combine historical weather data since accurate future weather data covering the growing season are unavailable. We also focused on using easily available features collected from routine analyses as predictors instead of mechanistic process models. Any biotic factor other than fertilizer, e.g., length of growing season or planting density, could be optimized with our model. Improvement will require more data from many more diverse environments and management scenarios. With more experimental data, the training and testing division could be performed at trial level to improve the model predictive ability. Moreover, since the data for this analysis were collected from small research plots, validation at production-scale fields is needed for decision making.
Supporting information
S1 Table. Description of the marketable yield modeling data set.
https://doi.org/10.1371/journal.pone.0230888.s001
(DOCX)
S2 Table. Description of the data sets used for modeling per trial type.
https://doi.org/10.1371/journal.pone.0230888.s002
(DOCX)
S3 Table. Classification of preceding crops [7].
https://doi.org/10.1371/journal.pone.0230888.s003
(DOCX)
S4 Table. Centroids of soil textural classes derived from the Quebec soils data set [57].
https://doi.org/10.1371/journal.pone.0230888.s004
(DOCX)
S5 Table. Quebec potato data set used for modeling.
‘Potato_df.csv’ file available in ‘data’ repository at https://git.io/JvYxd.
https://doi.org/10.1371/journal.pone.0230888.s005
(CSV)
References
- 1. Sinclair TR, Seligman N. Criteria for publishing papers on crop modeling. Field Crops Research. 2000;68(3):165–72.
- 2. Di Paola A, Valentini R, Santini M. An overview of available crop growth and yield models for studies and assessments in agriculture. Journal of the Science of Food and Agriculture. 2016;96(3):709–14. pmid:26227952
- 3. Marshall B. Decision support systems in potato production. In: Vreugdenhil D, editor. Potato Biology and Biotechnology: Elsevier; 2007. p. 777–800.
- 4. Raymundo R, Asseng S, Cammarano D, Quiroz R. Potato, sweet potato, and yam models for climate change: a review. Field Crops Research. 2014;166:173–85.
- 5. MacKerron DKL. Mathematical models of plant growth and development. In: Vreugdenhil D, editor. Potato Biology and Biotechnology: Elsevier; 2007. p. 753–76.
- 6. Fortin JG, Anctil F, Parent L-É, Bolinder MA. A neural network experiment on the site-specific simulation of potato tuber growth in Eastern Canada. Computers and Electronics in Agriculture. 2010;73(2):126–32.
- 7. Parent SE, Leblanc M, Parent AC, Coulibali Z, Parent LE. Site-specific multilevel modeling of potato response to nitrogen fertilization. Front Environ Sci. 2017;5(81):1–18.
- 8. Stalham MA, Allen EJ, Herry FX. Effects of soil compaction on potato growth and its removal by cultivation. Research review. 2005(R261):1–60.
- 9. Boiteau G, Goyer C, Rees HW, Zebarth BJ. Differentiation of potato ecosystems on the basis of relationships among physical, chemical and biological soil parameters. Canadian Journal of Soil Science. 2014;94(4):463–76.
- 10. Firman DM, Allen EJ. Agronomic practices. In: Vreugdenhil D, editor. Potato Biology and Biotechnology: Elsevier; 2007. p. 719–38.
- 11. Neeteson JJ, Zwetsloot HJC. An analysis of the response of sugar beet and potatoes to fertilizer nitrogen and mineral soil mineral nitrogen. Netherlands Journal of Agricultural Science. 1989;37(2):129–41.
- 12. Li H, Parent LE, Tremblay G, Karam A. Potato response to crop sequence and nitrogen fertilization following sod breakup in a Gleyed Humo-Ferric Podzol. Canadian Journal of Plant Science. 1999;79(3):439–46.
- 13. Sincik M, Turan ZM, Goksoy AT. Responses of potato (Solanum tuberosum L.) to green manure cover crops and nitrogen fertilization rates. American Journal of Potato Research. 2008;85(2):150–8.
- 14. Sharifi M, Zebarth BJ, Porter GA, Burton DL, Grant CA. Soil mineralizable nitrogen and soil nitrogen supply under two-year potato rotations. Plant and Soil. 2009;320(1–2):267–79.
- 15. Zebarth BJ, Arsenault WJ, Moorehead S, Kunelius HT, Sharifi M. Italian ryegrass management effects on nitrogen supply to a subsequent potato crop. Agronomy Journal. 2009;101(6):1573–80.
- 16. Zebarth BJ, Scott P, Sharifi M. Effect of straw and fertilizer nitrogen management for spring barley on soil nitrogen supply to a subsequent potato crop. American Journal of Potato Research. 2009;86(3):209–17.
- 17. Sands PJ, Hackett C, Nix HA. A model of the development and bulking of potatoes (Solanum Tuberosum L.) I. Derivation from well-managed field crops. Field Crops Research. 1979;2:309–31.
- 18. Cambouris AN, St Luce M, Zebarth BJ, Ziadi N, Grant CA, Perron I. Potato response to nitrogen sources and rates in an irrigated sandy soil. Agronomy Journal. 2016;108(1):391–401.
- 19. Zebarth BJ, Leclerc Y, Moreau G, Botha E. Rate and timing of nitrogen fertilization of Russet Burbank potato: Yield and processing quality. Canadian Journal of Plant Science. 2004;84(3):855–63.
- 20. Raman KV, Radcliffe EB. The potato crop: the scientific basis for improvement. 2nd ed. London: Chapman and Hall; 1992.
- 21. Gregory PJ, Simmonds LP. Water relations and growth of potatoes. The potato crop: Springer; 1992. p. 214–46.
- 22. Kooman PL, Fahem M, Tegera P, Haverkort AJ. Effects of climate on different potato genotypes. 2. Dry matter allocation and duration of the growth cycle. European Journal of Agronomy. 1996;5(3–4):207–17.
- 23. Fortin JG, Anctil F, Parent LE, Bolinder MA. Comparison of empirical daily surface incoming solar radiation models. Agricultural and Forest Meteorology. 2008;148(8–9):1332–40.
- 24. Haverkort AJ, Struik PC. Yield levels of potato crops: Recent achievements and future prospects. Field Crops Research. 2015;182:76–85.
- 25. Dessureault-Rompre J, Zebarth BJ, Burton DL, Georgallas A. Predicting soil nitrogen supply from soil properties. Canadian Journal of Soil Science. 2015;95(1):63–75.
- 26. Dessureault-Rompre J, Zebarth BJ, Burton DL, Georgallas A, Sharifi M, Porter GA, et al. Prediction of soil nitrogen supply in potato fields using soil temperature and water content information. Soil Science Society of America Journal. 2012;76(3):936–49.
- 27. Barber SA. Soil nutrient bioavailability: a mechanistic approach: John Wiley & Sons; 1995.
- 28. White PJ, Wheatley RE, Hammond JP, Zhang K. Minerals, soils and roots. Potato Biology and Biotechnology: Elsevier; 2007. p. 739–52.
- 29. Bolinder MA, Katterer T, Poeplau C, Borjesson G, Parent LE. Net primary productivity and below-ground crop residue inputs for root crops: Potato (Solanum tuberosum L.) and sugar beet (Beta vulgaris L.). Canadian Journal of Soil Science. 2015;95(2):87–93.
- 30. Diriba SG. Water-nutrients interaction: exploring the effects of water as a central role for availability & use efficiency of nutrients by shallow rooted vegetable crops: a review. J Agric Crops. 2017;3(10):78–93.
- 31. Dampney P, Wale S, Sinclair A. Potash requirements of potatoes. Review. Project R443, Report 2011/4, Potato Council, Agric. Hortic. Dev. Board, Kenilworth, Warwickshire, UK. 2011.
- 32. Hüwing H. Düngung sichert ertrag und qualität. Land & Fort. 2012:12, 22nd March 2012, 36–38.
- 33. Gianquinto G, Bona S. The significance of trends in concentrations of total nitrogen and nitrogenous compounds. In: Haverkort AJ, MacKerron DKL, editors. Management of nitrogen and water in potato production. Wageningen; 2000. p. 35–54.
- 34. Bohl WH, Johnson SB. Commercial potato production in North America. 2nd ed. Ann Arbor, USA: The Potato Association of America Handbook; 2010. 90 p.
- 35. Kirkman MA. Global markets for processed potato products. Potato Biology and Biotechnology, advances and perspectives: Elsevier; 2007. p. 27–44.
- 36. Hatfield JL, Walthall CL. Meeting global food needs: realizing the potential via genetics x environment x management interactions. Agronomy Journal. 2015;107(4):1215–26.
- 37. Rajsic P, Weersink A. Do farmers waste fertilizer? A comparison of ex post optimal nitrogen rates and ex ante recommendations by model, site and year. Agricultural Systems. 2008;97(1–2):56–67.
- 38. Parent LE. Nouveaux outils de gestion de l'azote dans la production de la pomme de terre. CRAAQ, Colloque sur la pomme de terre 2014. 2014.
- 39. Peralta JM, Stockle CO. Dynamics of nitrate leaching under irrigated potato rotation in Washington State: a long-term simulation study. Agriculture, ecosystems & environment. 2002;88(1):23–34.
- 40. Jiang YF, Zebarth B, Love J. Long-term simulations of nitrate leaching from potato production systems in Prince Edward Island, Canada. Nutrient Cycling in Agroecosystems. 2011;91(3):307–25.
- 41. Zebarth BJ, Danielescu S, Nyiraneza J, Ryan MC, Jiang YF, Grimmett M, et al. Controls on nitrate loading and implications for BMPs under intensive potato production systems in Prince Edward Island, Canada. Ground Water Monitoring and Remediation. 2015;35(1):30–42.
- 42. Zebarth BJ, Ryan MC, Graham G, Forge TA, Neilsen D. Groundwater monitoring to support development of BMPs for groundwater protection: the Abbotsford-Sumas aquifer case study. Ground Water Monitoring and Remediation. 2015;35(1):82–96.
- 43. Khiari L, Parent LE, Pellerin A, Alimi ARA, Tremblay C, Simard RR, et al. An agri-environmental phosphorus saturation index for acid coarse-textured soils. Journal of Environmental Quality. 2000;29(5):1561–7.
- 44. Pellerin A, Parent LE, Tremblay C, Fortin J, Tremblay G, Landry CP, et al. Agri-environmental models using Mehlich-III soil phosphorus saturation index for corn in Quebec. Canadian journal of soil science. 2006;86(5):897–910.
- 45. Pellerin A, Parent LE, Fortin J, Tremblay C, Khiari L, Giroux M. Environmental Mehlich-III soil phosphorus saturation indices for Quebec acid to near neutral mineral soils varying in texture and genesis. Canadian Journal of Soil Science. 2006;86(4):711–23.
- 46. Valkama E, Salo T, Esala M, Turtola E. Nitrogen balances and yields of spring cereals as affected by nitrogen fertilization in northern conditions: A meta-analysis. Agriculture Ecosystems & Environment. 2013;164:1–13.
- 47. Hofman G, Salomez J. Management of nitrogen and water in potato production. In: Haverkort AJ, MacKerron DKL, editors. Pers, Wageningen; 2000. p. 121–35.
- 48. Kyveryga PM, Blackmer AM, Morris TF. Disaggregating model bias and variability when calculating economic optimum rates of nitrogen fertilization for corn. Agronomy Journal. 2007;99(4):1048–56.
- 49. Kyveryga PM, Blackmer AM, Morris TF. Alternative benchmarks for economically optimal rates of nitrogen fertilization for corn. Agronomy Journal. 2007;99(4):1057–65.
- 50. Zhang D, Jeffery JPT. Advances in machine learning applications in software engineering: IGI Global; 2007.
- 51. Qin ZS, Myers DB, Ransom CJ, Kitchen NR, Liang SZ, Camberato JJ, et al. Application of machine learning methodologies for predicting corn economic optimal nitrogen rate. Agronomy Journal. 2018;110(6):2596–607.
- 52. Dahnke WC, Olson RA. Soil test correlation, calibration, and recommendation. In: Westerman RL, editor. Soil Testing and Plant Analysis. 3rd ed. Madison, WI: Soil Science Society of America; 1990. p. 45–71.
- 53. Kahle D, Wickham H. ggmap: Spatial Visualization with ggplot2. The R Journal. 2013;5(5):144–61.
- 54. CFIA. Potato plants characteristics, maturity. Canadian Food Inspection Agency: Canadian Food Inspection Agency; 2015. Available from: http://www.inspection.gc.ca/plants/potatoes/characteristics/eng/1326490397702/1326490477981#mature.
- 55. Gee GW, Bauder JW. Particle-size analysis. In: Klute A, editor. Methods of soil analysis: Part 1—Physical and mineralogical methods (Agronomy M): Soil Science Society of America, Madison, Wisconsin; 1986. p. 383–411.
- 56. Yang XL, Zhang QY, Li XZ, Jia XX, Wei XR, Shao MA. Determination of soil texture by laser diffraction method. Soil Science Society of America Journal. 2015;79(6):1556–66.
- 57. Tabi M, Tardif L, Carrier D, Laflamme G, Rompré M. Inventaire des problèmes de dégradation des sols agricoles du Québec: rapport synthèse. Entente auxiliaire Canada-Québec sur le développement agro-alimentaire Québec Service de recherche en sols. 1990.
- 58. Nelson DW, Sommers LE. Total carbon, organic carbon, and organic matter. Methods of soil analysis Part 2 Chemical and microbiological properties. 1982. p. 539–79.
- 59. Grewal KS, Buchan GD, Sherlock RR. A comparison of three methods of organic carbon determination in some New Zealand soils. Journal of Soil Science. 1991;42(2):251–7.
- 60. Egozcue JJ, Pawlowsky-Glahn V. Groups of parts and their balances in compositional data analysis. Mathematical Geology. 2005;37(7):795–828.
- 61. Morton JT, Sanders J, Quinn RA, McDonald D, Gonzalez A, Vázquez-Baeza Y, et al. Balance trees reveal microbial niche differentiation. mSystems. 2017;2(1):e00162–16. pmid:28144630
- 62. Parent SE, Parent LE, Rozane DE, Natale W. Plant ionome diagnosis using sound balances: case study with mango (Mangifera Indica). Frontiers in plant science. 2013;4:1–12. pmid:23346092
- 63.
Hendershot WH, Lalande H, Duquette M. Soil reaction and exchangeable acidity. In: Carter MR, Gregorich EG, editors. Soil sampling and methods of analysis. 2. 2nd ed1993. p. 201–6.
- 64. Cescas MP. Table interprétative de la mesure du pH des sols du Québec par quatre méthodes différentes. Naturaliste canadien. 1978;105:259–63.
- 65.
Tran TS, Simard RR. Mehlich-III extractable elements. In: Carter MR, editor. Soil Sampling and Methods of Analysis: Canadian Society of Soil Science, CRC Press, Boca Raton, FL; 1993. p. 43–9.
- 66. Michaelson GJ, Ping CL, Mitchell GA. Correlation of Mehlich 3, Bray 1, and ammonium acetate extractable P, K, Ca, and Mg for Alaska agricultural soils. Communications in Soil Science and Plant Analysis. 1987;18(9):1003–15.
- 67. Murphy J, Riley JP. A modified single solution method for the determination of phosphate in natural waters. Analytica chimica acta. 1962;27:31–6.
- 68.
Soil Classification Working Group. Canadian system of soil classification, 3rd Ed. Canadian system of soil classification. 1998:188.
- 69.
Leblanc MA, Gagné G, Parent LE. Numerical clustering of soil series using morphological profile attributes for potato. In: Hartemink AE, Minasny B, editors. Digital Soil Morphometrics. New York, NY: Springer; 2016.
- 70. Piikki K, Wetterlind J, Soderstrom M, Stenberg B. Three-dimensional digital soil mapping of agricultural fields by integration of multiple proximal sensor data obtained from different sensing methods. Precision Agriculture. 2015;16(1):29–45.
- 71. Hutchinson MF, McKenney DW, Lawrence K, Pedlar JH, Hopkinson RF, Milewska E, et al. Development and testing of Canada-wide interpolated spatial models of daily minimum-maximum temperature and precipitation for 1961–2003. Journal of Applied Meteorology and Climatology. 2009;48(4):725–41.
- 72. Tremblay N, Bouroubi YM, Bélec C, Mullen RW, Kitchen NR, Thomason WE, et al. Corn response to nitrogen is influenced by soil texture and weather. Agronomy Journal. 2012;104(6):1658–71.
- 73. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. 12(Oct). 2011:2825−30.
- 74. Martin-Fernandez JA, Barcelo-Vidal C, Pawlowsky-Glahn V. Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology. 2003;35(3):253–78.
- 75. Palarea-Albaladejo J, Martin-Fernandez JA. zCompositions—R Package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems. 2015;143:85–96.
- 76. Young DA, Voisey PW, Dixon N. A specific gravity calculator for potatoes. American Journal of Potato Research. 1964;41(12):401–5.
- 77.
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2019.
- 78. Wickham H. Tidyverse: easily install and load the 'Tidyverse'. R package version 1.2.1. 2017.
- 79. Van den Boogaart KG, Raimon T, Bren M. compositions: compositional data analysis. R package version 1.40–1. 2014.
- 80.
Templ M, Hron K, Filzmoser P. robCompositions: an R-package for robust statistical analysis of compositional data. In: Pawlowsky-Glahn V, Buccianti A, editors. Compositional Data Analysis Theory and Applications. Chichester (UK): John Wiley & Sons; 2011. p. 341–55.
- 81.
Van Rossum G, Drake Jr FL. Python tutorial, technical report CS R9526: Centrum voor Wiskunde en Informatica (CWI) Amsterdam; 1995.
- 82. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0—Fundamental Algorithms for Scientific Computing in Python. arXiv preprint arXiv:190710121. 2019.
- 83. Van Der Walt S, Colbert SC, Varoquaux G. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering. 2011;13(2):22–30.
- 84.
McKinney W, editor Data structures for statistical computing in python. Proceedings of the 9th Python in Science Conference; 2010: Austin, TX.
- 85. Hunter JD. Matplotlib: A 2D graphics environment. Computing In Science Engineering. 2007;9(3):90–5.
- 86.
Parizeau M. Réseaux de neurones. University Laval. 2006:27–51.
- 87. Crisci C, Ghattas B, Perera G. A review of supervised machine learning algorithms and their applications to ecological data. Ecological Modelling. 2012;240:113–22.
- 88. Chantre GR, Blanco AM, Lodovichi MV, Bandoni AJ, Sabbatini MR, Lopez RL, et al. Modeling Avena fatua seedling emergence dynamics: An artificial neural network approach. Computers and Electronics in Agriculture. 2012;88:95–102.
- 89. Soman T, Bobbie PO. Classification of arrhythmia using machine learning techniques. WSEAS Transactions on computers. 2005;4(6):548–52.
- 90. Yuan J, Liu CL, Li YM, Zeng QB, Zha XF. Gaussian processes based bivariate control parameters optimization of variable-rate granular fertilizer applicator. Computers and Electronics in Agriculture. 2010;70(1):33–41.
- 91. Dodds KG, Sinclair AG, Morrison JD. A bivariate response surface for growth data. Fertilizer research. 1995;45(2):117–22.
- 92. Moriasi DN, Arnold JG, Van Liew MW, Bingner RL, Harmel RD, Veith TL. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Transactions of the Asabe. 2007;50(3):885–900.
- 93. Singh J, Knapp HV, Arnold JG, Demissie M. Hydrological modeling of the Iroquois river watershed using HSPF and SWAT 1. Journal of the American Water Resources Association. 2005;41(2):343–60.
- 94. Inman D, Khosla R, Westfall DG, Reich R. Nitrogen uptake across site specific management zones in irrigated corn production systems. Agronomy Journal. 2005;97(1):169–76.
- 95. Fortin JG, Morais A, Anctil F, Parent LE. SVMLEACH—NK POTATO: A simple software tool to simulate nitrate and potassium co-leaching under potato crop. Computers and Electronics in Agriculture. 2015;110:259–66.
- 96.
CRAAQ. Guide de référence en fertilisation. 2ème ed: Centre de Référence en Agriculture et Agroalimentaire du Québec; 2010.
- 97.
Pellerin A. Les grilles de références. In: Parent LE, Gagné G, editors. Guide de référence en fertilisation. 2è ed2010. p. 359–473.
- 98. Mucherino A, Papajorgji P, Pardalos PM. A survey of data mining techniques applied to agriculture. Operational Research. 2009;9(2):121–40.
- 99. Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D. Machine learning in agriculture: a review. Sensors. 2018;18(8):1–29.
- 100.
Rasmussen CE, Williams CKI. Gaussian processes for machine learning. The MIT Press, Cambridge, MA, USA. 2006;38:715–9.
- 101. Valkama E, Uusitalo R, Ylivainio K, Virkajarvi P, Turtola E. Phosphorus fertilization: a meta-analysis of 80 years of research in Finland. Agriculture Ecosystems & Environment. 2009;130(3–4):75–85.
- 102.
Zebarth BJ, Karemangingo C, Scott P, Savoie D, Moreau G. Nitrogen management for potato: general fertilizer recommendations. New-Brunswick Ministry of Agriculture, Fisheries and Aquaculture, Fredericton, NB, Canada. 2007.
- 103. Torma S, Vilcek J, Losak T, Kuzel S, Martensson A. Residual plant nutrients in crop residues—an important resource. Acta Agriculturae Scandinavica Section B-Soil and Plant Science. 2018;68(4):358–66.
- 104.
Rangarajan A. Crop rotation effects on soil fertility and plant nutrition. In: Mohler CL, Johnson SE, editors. Crop Rotation on Organic Farms. University of Maryland: NRAES; 2009.
- 105. Andrews M, Raven JA, Lea PJ. Do plants need nitrate? The mechanisms by which nitrogen form affects plants. Annals of Applied Biology. 2013;163(2):174–99.
- 106.
Hawkesford M, Horst W, Kichey T, Lambers H, Schjoerring J, Møller IS, et al. Chapter 6—Functions of Macronutrients. Marschner's mineral nutrition of higher plants. Third ed. San Diego: Academic Press; 2012. p. 135–89.
- 107.
Feddes RA. Water, heat and crop growth. Wageningen: Veenman; 1971.
- 108.
Griffin TS, Johnson BS, Ritchie JT. A simulation model for potato growth and development: Substor-potato Version 2.0: Michigan State University, Department of Crop and Soil Sciences; 1993.
- 109. Levy D, Veilleux RE. Adaptation of potato to high temperatures and salinity-a review. American Journal of Potato Research. 2007;84(6):487–506.
- 110. Xu Y, Jimenez MA, Parent SE, Leblanc M, Ziadi N, Parent LE. Compaction of coarse-textured soils: balance models across mineral and organic compositions. Frontiers in Ecology and Evolution. 2017;5.
- 111.
Struik PC. Above-ground and below-ground plant development. In: Vreugdenhil D, editor. Potato biology and biotechnology: advances and perspectives: Amsterdam: Elsevier, New York; 2007. p. 219–36.
- 112. Camire ME, Kubow S, Donnelly DJ. Potatoes and human health. Critical Reviews in Food Science and Nutrition. 2009;49(10):823–40. pmid:19960391
- 113. Rex BL. The effect of in-row seed piece spacing and harvest date of the tuber yield and processing quality of Conestoga potatoes in southern Manitoba. Canadian Journal of Plant Science. 1991;71(1):289–96.
- 114.
Ellissèche D. Aspects physiologiques de la croissance et du développement. In: Rousselle P, Robert Y, Crosnier JC, editors. La pomme de terre: production, amélioration, ennemis et maladies, utilisations. PARIS: INRA; 1996. p. 71–124.
- 115. Bussan AJ, Mitchell PD, Copas ME, Drilias MJ. Evaluation of the effect of density on potato yield and tuber size distribution. Crop Science. 2007;47(6):2462–72.
- 116. Al Soboh G, Sully R, Andreata S. Factors affecting specific gravity loss in crisping potato crops in Koo Wee Rup, Victoria. 2002.
- 117. Moulin AP, Cohen Y, Alchanatis V, Tremblay N, Volkmar K. Yield response of potatoes to variable nitrogen management by landform element and in relation to petiole nitrogen—A case study. Canadian Journal of Plant Science. 2012;92(4):771–81.
- 118. Belanger G, Walsh JR, Richards JE, Milburn PH, Ziadi N. Nitrogen fertilization and irrigation affects tuber characteristics of two potato cultivars. American Journal of Potato Research. 2002;79(4):269–79.
- 119. Laboski CAM, Kelling KA. Influence of fertilizer management and soil fertility on tuber specific gravity: a review. American Journal of Potato Research. 2007;84(4):283–90.
- 120. Dubetz S, Bole JB. Effect of nitrogen, phosphorus, and potassium fertilizers on yield components and specific gravity of potatoes. American Potato Journal. 1975;52(12):399–405.
- 121. Maier NA, Dahlenburg AP, Williams CMJ. Effects of nitrogen, phosphorus, and potassium on yield, specific gravity, crisp colour, and tuber chemical composition of potato (Solanum tuberosum L.) cv. Kennebec. Australian Journal of Experimental Agriculture. 1994;34(6):813–24.
- 122. Marouani A, Behi O, Ben Ammar H, Sahli A, Ben Jeddi F. Effect of various sources of nitrogen fertilizer on yield and tubers nitrogen accumulation of Spunta potato cultivar (Solanum tuberosum L.). J of New Sciences, Agriculture and Biotechnology. 2015;13(1):399–404.
- 123. Petropoulos SA, Fernandes Â, Polyzos N, Antoniadis V, Barros L, CFR Ferreira I. The impact of fertilization regime on the crop performance and chemical composition of potato (Solanum tuberosum L.) cultivated in central Greece. Agronomy. 2020;10(4):474–91.
- 124. Flis S. 4R practices for fertilizer management in potatoes. Crops & Soils. 2019;52(2):8–10.
- 125. Trehan SP, Roy SK, Sharma RC. Potato variety differences in nutrient deficiency symptoms and responses to NPK. Better Crops International Potash and Phosphate Institute of Canada (PPIC). 2001;15:18–21.
- 126. Kleinkopf GE, Westermann DT, Dwelle RB. Dry matter production and nitrogen utilization by six potato cultivars. Agronomy Journal. 1981;73(5):799–802.
- 127. Daoui K, Mrabet R, Benbouaza A, Achbani EH. Responsiveness of different potato (Solanum tuberosum) varieties to phosphorus fertilizer. Procedia Engineering. 2014;83:344–7.
- 128. Coulibali Z, Cambouris AN, Parent SE. Cultivar-specific nutritional status of potato (Solanum tuberosum L.) crops. Plos One. 2020;15(3):1–15.
- 129. Cerrato ME, Blackmer AM. Comparison of models for describing corn yield response to nitrogen-fertilizer. Agronomy Journal. 1990;82(1):138–43.
- 130. Angus JF, Bowden JW, Keating BA. Modeling nutrient responses in the field. Plant and Soil. 1993;155:57–66.
- 131. Belanger G, Walsh JR, Richards JE, Milburn PH, Ziadi N. Comparison of three statistical models describing potato yield response to nitrogen fertilizer. Agronomy Journal. 2000;92(5):902–8.
- 132. Bock BR, Sikora FJ. Modified-quadratic/plateau model for describing plant-responses to fertilizer. Soil Science Society of America Journal. 1990;54(6):1784–9.
- 133. Bullock DG, Bullock DS. Quadratic and quadratic-plus-plateau models for predicting optimal nitrogen rate of corn: A comparison. Agronomy Journal. 1994;86(1):191–5.
- 134. Isfan D, Zizka J, Davignon A, Deschenes M. Relationships between nitrogen rate, plant nitrogen concentration, yield, and residual soil nitrate-nitrogen in silage corn. Communications in Soil Science and Plant Analysis. 1995;26(15–16):2531–57.
- 135. Vanlauwe B, Kihara J, Chivenge P, Pypers P, Coe R, Six J. Agronomic use efficiency of N fertilizer in maize-based systems in sub-Saharan Africa within the context of integrated soil fertility management. Plant and Soil. 2011;339(1–2):35–50.
- 136.
Rich AE. Potato diseases. New York: Academic Press; 1983. xiv, 238 p p.
- 137. Mondani F, Golzardi F, Ahmadvand G, Ghorbani R, Moradi R. Influence of weed competition on potato growth, production and radiation use efficiency. Notulae Scientia Biologicae. 2011;3:42–52.