Abstract
In this study, the effect of morphological traits on the fresh herbage yield of the sorghum x sudangrass hybrid grown in Konya province, the largest cereal production area in Turkey, was analyzed with several data mining methods. For this purpose, Artificial Neural Networks (ANN), the Automatic Linear Model (ALM), the Random Forest (RF) algorithm, and the Multivariate Adaptive Regression Spline (MARS) algorithm were used, and the prediction performances of these methods were compared. The computed descriptive statistics were a plant height of 251.22 cm, stem diameter of 7.03 mm, fresh herbage yield of 8010.69 kg da-1, crude protein ratio of 9.09%, acid detergent fiber of 33.23%, neutral detergent fiber of 57.44%, acid detergent lignin of 7.43%, dry matter digestibility of 63.01%, dry matter intake of 2.11%, and relative feed value of 103.02. Model fit statistics, including the coefficient of determination (R2), adjusted R2, root mean square error (RMSE), mean absolute percentage error (MAPE), standard deviation ratio (SD ratio), mean absolute error (MAE), and relative absolute error (RAE), were used to evaluate the prediction abilities of the fitted models. The MARS method was shown to be the best model for describing fresh herbage yield, with the lowest values of RMSE, MAPE, SD ratio, MAE, and RAE (137.7, 1.488, 0.072, 109.718, and 0.017, respectively), as well as the highest R2 (0.995) and adjusted R2 (0.991). The experimental results show that the MARS algorithm is the most suitable model for predicting fresh herbage yield in the sorghum x sudangrass hybrid, providing a good alternative to the other data mining algorithms.
Citation: Tutar H, Celik S, Er H, Gönülal E (2025) Impact of morphological traits and irrigation levels on fresh herbage yield of sorghum x sudangrass hybrid: Modelling data mining techniques. PLoS ONE 20(2): e0318230. https://doi.org/10.1371/journal.pone.0318230
Editor: Agung Irawan, Universitas Sebelas Maret, INDONESIA
Received: October 14, 2024; Accepted: January 14, 2025; Published: February 5, 2025
Copyright: © 2025 Tutar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All datasets generated for this study can be found in the article Figshare (https://doi.org/10.6084/m9.figshare.28027733).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Due to global climate change, temperature and atmospheric CO2 levels are rising, and droughts are becoming widespread [1]. Drought is the primary environmental stressor that restricts crop production in arid and semi-arid regions [2]. Severe drought can significantly reduce crop yield and quality, leading to scarcity [3]. Therefore, more drought-resistant plant varieties and water-saving cultivation systems should be used to better cope with changing climate conditions [4]. In drought-prone environments, sweet sorghum and sorghum x sudangrass (SSG) hybrids can be considered as alternatives to maize and wheat [5]. Sorghum (Sorghum bicolor L. Moench), as a forage crop, is a highly drought-resistant member of the grain family when compared to other crops in this family [6,7].
Sorghum is the fifth most important cereal crop worldwide, serving as a staple food for over 500 million people across more than 30 countries. Globally, it ranks as the fourth most significant grain [8,9]. Sorghum is a rich source of protein and fiber [10], and beyond its nutritional role, it is also utilized as a raw material for bioethanol production [11]. Despite its reputation for resilience, drought stress remains a major challenge, adversely affecting both productivity and nutritional quality in key production areas. Thus, understanding how drought impacts sorghum and how the plant responds is critical for improving its resistance to such stress [12].
Statistical models are widely used in forecasting studies due to their simplicity and low computational cost [13,14]. Data mining algorithms offer a powerful alternative to traditional regression methods, addressing some of their limitations [15]. These methods have lately gained widespread popularity for predicting plant yield and various traits, as well as for classifying plants by species [16,17]. Data mining involves extracting valuable and meaningful information from large datasets and is a relatively new interdisciplinary field that focuses on data analysis and knowledge discovery [18,19]. This approach combines multiple techniques, such as statistical analysis, data visualization, neural networks, knowledge discovery, pattern recognition, and database management [20].
Data mining is an approach that offers practical and effective solutions for productivity and sustainability in the agricultural sector. Data mining techniques used for agricultural yield prediction provide farmers with the opportunity to make more accurate and timely decisions by analyzing multiple factors such as climate changes, soil properties, and plant growth dynamics [21]. These techniques contribute to the efficient use of agricultural inputs by optimizing production processes. In addition, forecasts obtained through data mining can also guide agricultural policy making and help to balance agricultural supply and demand. Thus, data mining contributes to both economic and environmental sustainability by strengthening decision support systems in the agricultural sector [22].
Various studies have been conducted on field crops and other plants using data mining methods. Numerous studies have demonstrated the potential of data mining algorithms for predicting agricultural production systems [23–25]. In studies by [26,27], the relationship between certain climate parameters and maize yield was examined. [28] aimed to evaluate the effectiveness of textural features in distinguishing seeds produced under normal or drought conditions using discriminant models. [29] researched biomass yield prediction for sorghum plants using remote and proximal sensing-based model algorithms.
In this study, the factors affecting fresh herbage yield of sorghum x sudangrass (SSG) hybrid were identified using data mining algorithms based on various plant traits. A comparative study was conducted by evaluating and comparing the performances of different algorithms such as Artificial Neural Network (ANN), Multi-Layer Perceptron Artificial Neural Networks (MLP), Random Forest (RF) algorithm, Multivariate Adaptive Regression Spline (MARS), and Automatic Linear Modeling (ALM).
2. Material and methods
2.1. Research area and datasets
This research was conducted during the 2021 and 2022 growing seasons in Konya province, Türkiye, located at 37° 51´ N, 32° 33´ E. Konya is situated at an elevation of approximately 1006 meters above sea level. The study area has a semi-arid climate, with most of the annual rainfall occurring in the winter months. The rainfall recorded during the vegetation period of the SSG hybrid was 44.0 mm in 2021 and 44.4 mm in 2022. The minimum, maximum, and average temperatures recorded in Konya during the 2021 and 2022 growing seasons align with the long-term average temperatures (Fig 1).
The dataset for the study was collected from the field in 2021 and 2022. The experimental dataset consists of parameters obtained under different irrigation levels (100% (I100), 75% (I75), 50% (I50), and 25% (I25) replenishment of the water depleted from field capacity). The dataset comprises 32 observations (4 irrigation levels × 4 replications × 2 years) with the following parameters: plant height (PH), stem diameter (SD), crude protein ratio (CPR), acid detergent fiber (ADF), neutral detergent fiber (NDF), acid detergent lignin (ADL), dry matter digestibility (DMD), dry matter intake (DMI), and relative feed value (RFV) as input parameters. Fresh herbage yield is the output parameter.
Soil analysis revealed that the study area had a clay-loam texture, free of salt, calcareous, and low in organic matter content (Table 1).
2.2. Experimental details
The field experiment followed a randomized block design with four replications. Each plot consisted of four rows, with a row spacing of 45 cm and a row length of 5 meters. Prior to sowing, 10 kg da-1 of NPK compound fertilizer was applied, and an additional 22 kg da-1 of nitrogen fertilizer was administered when the plants reached a height of 40–50 cm. The treatments consisted of four irrigation levels, in which 100% (I100), 75% (I75), 50% (I50), and 25% (I25) of the water depleted from field capacity was replenished to fill the moisture deficit in the 0–90 cm soil layer, determined as the effective root depth of the SSG hybrid. In 2021, the irrigation water applied for I100, I75, I50, and I25 was 510 mm, 395 mm, 280 mm, and 165 mm, respectively, while in 2022, these values were 480 mm, 370 mm, 260 mm, and 150 mm, respectively. Sowing took place in the first week of June, with harvesting completed by the last week of September. After mowing the plants in each plot, the fresh herbage yield was calculated on a per decare basis. For detailed analysis, ten plants were randomly selected from the harvested samples. The crude protein (CP), acid detergent fiber (ADF), and neutral detergent fiber (NDF) ratios were measured using a NIRS device. Additionally, dry matter digestibility (DMD), dry matter intake (DMI), and relative feed value (RFV) were calculated based on established formulas [30,31].
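The DMD, DMI, and RFV values can be derived directly from the fiber measurements. The sketch below assumes the widely used Rohweder-type equations (the study cites [30,31] for its exact formulas, so treat the coefficients as illustrative); with the study's mean ADF and NDF they reproduce the descriptive statistics reported in the abstract:

```python
def forage_quality(adf: float, ndf: float) -> dict:
    """Dry matter digestibility, intake, and relative feed value from fiber
    fractions (Rohweder-type equations; assumed to match those in [30,31])."""
    dmd = 88.9 - 0.779 * adf   # dry matter digestibility, %
    dmi = 120.0 / ndf          # dry matter intake, % of body weight
    rfv = dmd * dmi / 1.29     # relative feed value (unitless index)
    return {"DMD": dmd, "DMI": dmi, "RFV": rfv}

# The study's mean ADF (33.23%) and NDF (57.44%):
q = forage_quality(33.23, 57.44)
```

With these inputs the function returns DMD ≈ 63.01% and DMI ≈ 2.09%, matching the reported means, and RFV ≈ 102, close to the reported mean of 103.02 (means of ratios and ratios of means differ slightly).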
2.3. Data mining
Tree-structured classification and regression are alternatives to conventional classification and regression methods that do not rely on normality assumptions or user-specified model statements, as do certain earlier methods such as discriminant analysis and ordinary least squares regression [32]. Data taken from various domains inherently consist of highly correlated observations, which coincides with the exponential increase in the volume of data that must be evaluated as technology improves. This phenomenon, known as multicollinearity, affects the performance of both statistical and machine learning methods. Statistical models presented as a potential solution to this problem have not received enough evaluation in the literature. As a result, tackling the multicollinearity problem requires a thorough comparison of statistical and machine learning models [33]. The literature proposes a variety of strategies for dealing with multicollinearity. Although the first recommended solution is to collect additional data, this may not always be feasible owing to cost constraints, or it may even be impossible. The second option is to use non-least-squares techniques (such as ridge, Liu, lasso, and elastic net regression). The third is to modify the model by adding new variables, or groups of variables, derived from the multi-correlated variables [34]. Lastly, as more popular remedies for multicollinearity or other issues (such as outliers) in the field of machine learning, a variety of preprocessing techniques are used, including centering, scaling, normalization, and standardization [33]. Three strategies were put forward by [35] to address the multicollinearity issue in the data mining domain: employing ridge regression, training a two-layer neural network with a teacher, and sequentially adapting a single-layer neural network.
Alternative models (such as the Random Forest algorithm, Multivariate Adaptive Regression Spline, Artificial Neural Network, and Automatic Linear Modeling) will be the main focus of this investigation.
2.3.1. Artificial Neural Network (ANN).
The artificial neural network (ANN) is modeled to replicate the functioning of the human brain and nervous system. These neural network models are mathematical systems, inspired by biological neural networks, designed to mimic the processes of animal brains [36].
2.3.2. Multi-Layer Perceptron Artificial Neural Networks (MLP).
The Artificial Neural Network (ANN) used in this study is a feed-forward Multi-Layer Perceptron (MLP) model, comprising an input layer, a single hidden layer, and an output layer. In this structure, the neurons in the input layer pass signals to the hidden layer, where the neurons are interconnected through weighted connections. Both the hidden and output layers use the same activation function. The MLP was trained for 1000 epochs.
The activation function is crucial because it controls the level of activation and the strength of the output signal of an artificial neuron. Non-linear activation functions are often used as they enable better performance in function approximation [37].
The hyperbolic tangent activation function can be expressed as follows [38]:
f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))    (4)
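A minimal forward pass for the MLP described above (tanh hidden layer, identity output) can be sketched as follows; the weights here are illustrative placeholders, not the fitted values from Table 3:

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, identity output (as in Fig 4)."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# Illustrative weights only -- NOT the fitted values from Table 3:
x = [0.2, -0.1, 0.5]                              # three standardized inputs
w_hidden = [[0.4, -0.3, 0.1], [-0.2, 0.5, 0.3]]   # two hidden neurons
b_hidden = [0.1, -0.1]
w_out, b_out = [0.7, -0.5], 0.05
y = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
```

The actual network uses nine inputs (PH through RFV) and the weights listed in Table 3; the structure of the computation is the same.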
2.3.3. Random Forest (RF) algorithm.
For both regression and classification tasks, the Random Forest (RF) algorithm is a powerful ensemble learning technique [39]. This method addresses the overfitting problem inherent in decision trees [40] by combining Breiman’s Bagging algorithm [41] with decision trees. RF demonstrates strong generalization capabilities and robustness to unseen data, while also performing exceptionally well on training data. By randomly selecting features from the candidate feature set, RF can handle large-scale, high-dimensional datasets and reduces the impact of feature correlations on the model. Additionally, RF enhances model stability by averaging the results of multiple decision trees, thereby mitigating the influence of noise [42].
The creation of a decision tree typically involves three steps: feature selection, decision tree generation, and pruning. The classification error rate is often estimated using the information gain, information gain ratio, or Gini index, which are commonly used as feature selection criteria. In addition to synthesizing multiple input features to enhance the model, the random forest approach can also assess the relative importance of input features. By combining the decision outcomes of several decision trees, the random forest method, as an ensemble learning algorithm, can filter out anomalous data, paving the way for input feature optimization and improved model accuracy. Decision trees, the foundational components of random forests, possess strong generalization capabilities that allow them to address regression problems as well as classification tasks [43].
Furthermore, RF is exceptionally fast, highly resistant to overfitting, and allows users to generate as many trees as desired [44].
The expected values for unknown instances x are calculated by averaging the predictions from all the regression trees, as shown below:
ŷ(x) = (1/B) Σ_{b=1}^{B} t_b(x)    (5)
Bagging involves repeatedly drawing B bootstrap samples from the training data and fitting a regression tree t_b to each sample.
In this study, a Random Forest (RF) model was trained using an ensemble of 500 regression trees, with all available plant traits as predictors and fresh herbage yield as the response variable [45]. The RF model was implemented with the R package "ranger," utilizing default hyperparameter settings. The "ranger" package was selected for its faster data analysis capabilities and more efficient memory usage compared to other widely used random forest packages in R [46].
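The bootstrap-and-average principle behind the RF model (Eq 5) can be illustrated with a toy sketch. The study itself used the R package "ranger"; this pure-Python version with one-split "stump" trees is only a conceptual analogue:

```python
import random

def fit_stump(sample):
    """A one-split regression 'tree': mean y below/above the median x."""
    xs = sorted(x for x, _ in sample)
    t = xs[len(xs) // 2]
    left = [y for x, y in sample if x < t] or [0.0]
    right = [y for x, y in sample if x >= t] or [0.0]
    return lambda x: sum(left) / len(left) if x < t else sum(right) / len(right)

def bagged_predict(data, x, n_trees=500, seed=42):
    """Average the predictions of trees fitted on bootstrap samples (Eq 5)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        preds.append(fit_stump(sample)(x))
    return sum(preds) / n_trees

# Toy data: a low-yield and a high-yield group along one predictor
data = [(1.0, 2.0), (2.0, 2.5), (3.0, 6.0), (4.0, 6.5)]
pred = bagged_predict(data, 3.5)  # blends the group means, leaning high
```

Averaging over many resampled trees is what smooths out the noise of any single tree, which is the stability property described above.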
2.3.4. Multivariate adaptive regression spline (MARS).
In 1991, Friedman introduced the multivariate adaptive regression spline (MARS) technique [47,48]. Typically, a regression pair is represented as (Xi, Yi), where Xi corresponds to one or more independent variables and Yi is the dependent variable. In the MARS model, each independent variable is associated with one or more split points (ti). For Xi ≥ ti, the resulting equation is known as the right-side basis function (BF), while for Xi < ti, the equation is called the left-side basis function. These left and right basis functions (spline functions) establish the relationship between Xi and the dependent variable Yi. The following equations present the mathematical expressions for the right and left basis functions [49].
BF_right = [+(x - t)]_+^q = (x - t)^q if x ≥ t, 0 otherwise    (6)
BF_left = [-(x - t)]_+^q = (t - x)^q if x < t, 0 otherwise    (7)
where q (≥0) is the power to which the splines are raised and which defines the degree of smoothness of the outcome function estimate.
The MARS model may be expressed as follows [50]:

f(x) = a_0 + Σ_{m=1}^{M} a_m B_m(x)    (8)
In this context, f(x) denotes the MARS model, and Bm(x) represents the basis function. The index of the basis function is indicated by m, while the total number of basis functions in the model is denoted by M. The coefficient corresponding to the m-th basis function is labeled am, and x ∈ Rn signifies the vector of predictor variables. The MARS model constructs the basis functions in a product form.
Bm(x) = ∏_{k=1}^{Km} b_km(x_v(k,m))    (9)

In this context, b_km represents the k-th univariate function within Bm(x), and Km indicates the total number of univariate terms multiplied in Bm(x). When Km > 1, Km refers to the degree of the interaction term; if Km = 1, the basis function is univariate. Each basis function contains breakpoints, which serve as the knots of that function. The simplest form of b_km consists of truncated linear functions, structured as follows:
b(x) = (x - t)_+ = x - t if x > t, 0 otherwise    (10)
or
b(x) = (t - x)_+ = t - x if x < t, 0 otherwise    (11)
where the location t is called the knot of the basis function.
The MARS approach selects models using the generalized cross-validation (GCV) criterion [51]. The GCV criterion is used to assess the degree of fit and accuracy of the model [52]. The GCV coefficient [53,54] is calculated as shown in the equation below.
GCV(λ) = [(1/n) Σ_{i=1}^{n} (y_i - f̂_λ(x_i))²] / [1 - M(λ)/n]²    (12)
where n is the number of sample observations, f̂_λ is the fitted MARS model, and M(λ) is the effective number of parameters, which incorporates a cost-complexity penalty C for the basis functions included in the model [55].
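The GCV computation can be sketched in a few lines, treating the penalty term as a single effective-complexity number (an assumption that simplifies the cost-complexity bookkeeping):

```python
def gcv(y, y_hat, complexity):
    """Generalized cross-validation (Eq 12): the mean squared residual
    inflated by a penalty for the effective model complexity M(lambda)."""
    n = len(y)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    return mse / (1.0 - complexity / n) ** 2

# A fit using 2 effective parameters on 4 observations:
score = gcv([3.0, 5.0, 7.0, 9.0], [3.1, 4.8, 7.2, 8.9], complexity=2)
```

MARS grows a deliberately oversized set of basis functions and then prunes back to the subset with the lowest GCV, so a more complex model must reduce the residual error enough to offset the larger denominator penalty.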
2.3.5. Automatic Linear Modeling (ALM).
Automatic Linear Modeling (ALM) enables researchers to automatically select the optimal subset of predictors. To enhance data fit, ALM directly transforms predictors. When conducting ALM analysis, SPSS utilizes techniques such as time rescaling, outlier reduction, category merging, and other methods. While ALM can be applied to small and medium-sized datasets, it is particularly advantageous for large and complex datasets. The benefits of ALM become more evident, especially when dealing with multiple estimators [56–58]. Additionally, the ALM technique is highly effective for selecting and categorizing variables.
Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ... + β_n X_{ni} + ε_i    (13)
where Yi is the dependent (outcome) variable, the Xi are the predictor (independent) variables, β0 is the constant (intercept), βn are the slope coefficients for each predictor, and ε is the error term [59,60].
The goodness-of-fit statistics are summarized as follows [61]:
R2 (Coefficient of Determination): Evaluates how well the model can predict future data and explain the variance within the observed dataset, playing a key role in verifying prediction accuracy.
RMSE (Root Mean Squared Error): Represents the average size of prediction errors, offering insight into the overall error magnitude.
MAE (Mean Absolute Error): Reflects the average of the absolute differences between predicted and actual values, making it easier to interpret error without considering its direction.
Both RMSE and MAE provide important measures of prediction error size, essential for evaluating forecast precision.
RAE (Relative Absolute Error): Provides a normalized error metric by comparing performance against a simple baseline model, making it useful for comparing different models.
Below are the formulas for these goodness of fit statistics [62–64].
Coefficient of Determination:
R² = 1 - [Σ_{i=1}^{n} (y_i - ŷ_i)² / Σ_{i=1}^{n} (y_i - ȳ)²]    (14)
Adjusted Coefficient of Determination:
Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)    (15)
Root-mean-square error (RMSE) given by the following formula:
RMSE = √[(1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²]    (16)
In the methods, the following other model evaluation criteria were also calculated [65–68].
Mean absolute percentage error (MAPE);
MAPE = (100/n) Σ_{i=1}^{n} |(y_i - ŷ_i) / y_i|    (19)
Standard Deviation Ratio;
SD ratio = s_e / s_y    (20)

where s_e is the standard deviation of the prediction errors and s_y is the standard deviation of the observed values.
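The goodness-of-fit statistics above can be computed together; this sketch follows the formulas as described (using sample standard deviations for the SD ratio, which is an assumption about the authors' exact convention):

```python
import math

def fit_statistics(y, y_hat, p):
    """Goodness-of-fit statistics used in the study (Eqs 14-16, 19-20)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    abs_err = [abs(yi - fi) for yi, fi in zip(y, y_hat)]
    r2 = 1.0 - ss_res / ss_tot
    errors = [yi - fi for yi, fi in zip(y, y_hat)]
    mean_e = sum(errors) / n
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in errors) / (n - 1))
    sd_y = math.sqrt(ss_tot / (n - 1))
    return {
        "R2": r2,
        "AdjR2": 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs_err) / n,
        "MAPE": 100.0 / n * sum(e / abs(yi) for e, yi in zip(abs_err, y)),
        "RAE": sum(abs_err) / sum(abs(yi - mean_y) for yi in y),
        "SDratio": sd_e / sd_y,
    }

# Tiny worked example: predictions for [2, 4, 6, 8] with small errors, p = 1
stats = fit_statistics([2.0, 4.0, 6.0, 8.0], [2.1, 3.9, 6.2, 7.8], p=1)
```

Lower RMSE, MAPE, MAE, RAE, and SD ratio, together with higher R² and adjusted R², indicate the better-fitting model, which is how the algorithms are ranked in Table 8.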
The variance inflation factor (VIF), used to diagnose multicollinearity, is obtained from the diagonal of the inverse of the predictor correlation matrix and is calculated as follows:

VIF_i = 1 / (1 - R_i²)

The coefficient of determination R_i² is determined by regressing x_i on the other p - 1 variables. The degree of multicollinearity increases with the VIF of each variable. Generally speaking, VIF values higher than 10 indicate severe multicollinearity, which weakens the model's capacity for estimation and generalization [34].
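For the simplest case of two predictors, the VIF has a closed form, which illustrates why strong pairwise correlations like those in Fig 2 push VIF past the threshold of 10 (the correlation value below is illustrative, not taken from the study's data):

```python
# For two standardized predictors with pairwise correlation r, regressing
# one on the other gives R_i^2 = r^2, hence VIF_i = 1 / (1 - r^2).
def vif_two_predictors(r: float) -> float:
    """VIF for either of two predictors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)

# An illustrative correlation of 0.95 already exceeds the warning level:
v = vif_two_predictors(0.95)  # > 10
```

With more than two predictors the same quantity comes from regressing each x_i on all the others, or equivalently from the diagonal of the inverse correlation matrix.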
The IBM SPSS Automatic Linear Modeling (ALM) and Artificial Neural Network (ANN) analyses were conducted using the SPSS software program (version 25, SPSS Inc., Chicago, IL, USA). The MARS and Random Forest algorithms were implemented in R using RStudio [69].
3. Results
Table 2 provides summary statistics for the plant characteristics of the SSG hybrid grown in Konya province, Türkiye, during the 2021 and 2022 growing seasons. The data were normally distributed according to the Kolmogorov-Smirnov test (p > 0.05). The datasets were also investigated for multicollinearity using diagnostic methods, and the results are given in Table 2. According to the results for the fresh herbage yield data, there is a multicollinearity problem in the data, as several variables (PH, SD, ADF, NDF, ADL, DMD, and DMI) fall below the tolerance value of 0.1 and above the variance inflation factor (VIF) value of 10. The conclusion that there are strong correlations between the variables is further supported by the correlation analysis results shown in Fig 2.
Fig 2 presents the correlation coefficients for the plant traits of the SSG hybrid. Furthermore, Fig 3 shows the outcomes of the Principal Component Analysis (PCA) conducted on the plant characteristics of the SSG hybrid.
Because of the multicollinearity issue, a separate test data set was used to evaluate the models' generalization and prediction abilities after they were trained using the cross-validation method. The results of the study show that the data mining (ANN, ALM, RF, and MARS) algorithms perform well on a data set with the multicollinearity problem that affects statistical models. Where multicollinearity was present, the dimensionality of the input variables was reduced using principal components prior to applying MARS, in order to enhance MARS's capacity to handle multicollinearity. Using the data on the traits of the sorghum plants, the resulting model was found to enhance the accuracy of MARS in the multicollinear situation while maintaining interpretability. The results obtained are summarized as follows.
3.1. Result of Artificial Neural Network (ANN)
The Multilayer Perceptron artificial neural network model was selected due to its suitability for the data. The optimization approach used was the scaled conjugate gradient, with 70% of the data allocated for training and 30% for testing the network. Fig 4 illustrates the connections within the ANN.
Fig 4 illustrates the artificial neural network design, where the identity function is used as the activation function in the output layer, and the hyperbolic tangent is used as the activation function in the hidden layer. Table 3 displays the parameter estimations for the ANN model.
Table 3 shows the connection weights between each neuron as follows:
The connection weight values between the input layer variables PH, SD, CPR, ADF, NDF, ADL, DMD, DMI, and RFV and H(1:1), the first neuron in the hidden layer, are -0.775, 0.194, 0.333, 0.188, -0.715, 0.293, 0.118, 0.336, and 0.848, respectively.
In the ANN model, the learning Sum of Squares Error (SSE) value was 0.783, and the relative error was 0.065. The test’s SSE was 0.051, with a relative error of 0.019. The percentage relevance of the independent variables is displayed in Table 4.
Table 4 shows that the impact values of the independent factors on fresh herbage yield (FHY) in the output layer are as follows: RFV is 0.284, NDF is 0.239, PH is 0.198, DMI is 0.091, CPR is 0.077, ADL is 0.039, SD is 0.025, ADF is 0.024, and DMD is 0.022. Fig 5 displays a percentage column graph illustrating the impact of these factors on the prediction.
Fig 5 illustrates that, in this model, RFV has the largest impact on the fresh herbage yield (FHY) of the SSG hybrid, with a normalized importance of 100%. NDF is the second most significant independent variable, with an importance of 84.2%, while PH has an effect of 69.6%. DMD had the least impact on fresh herbage yield, at 7.6%. The remaining variables were DMI (32.2%), CPR (27.3%), ADL (13.8%), SD (8.9%), and ADF (8.5%).
3.2. Automatic Linear Modeling (ALM) results
Table 5 shows the results of the model prediction coefficients, and the significance achieved when ALM was used.
The ALM method was used to assess the predictability of the mean FHY, with the key contributing factors summarized in Table 5. Notably, the SD variable was not statistically significant in the ALM analysis. Table 5 also provides parameter estimates for the overall model, showing the individual impact of each factor on the target variable. The coefficients illustrate the relationship between each predictor and the mean fresh herbage yield, assuming the other variables remain constant. The importance of each predictor, as identified by the ALM method, is also highlighted in Table 5, with standardized values summing to one. The model reached an accuracy of 89.5%, calculated by multiplying the adjusted R2 by 100. The predictor importance graph (Fig 6) further demonstrates the relative significance of each factor, with RFV (0.524), PH (0.392), and SD (0.084) emerging as the most influential, with RFV being the key predictor of FHY.
The scatter plot of FHY displays predicted values on the y-axis and observed values on the x-axis; a large share of the sample points lie on the 45-degree line, suggesting that the model is relatively accurate (Fig 7). Fig 8 shows that FHY has a positive association with PH and SD, but a negative correlation with RFV.
3.3. Random Forest (RF) algorithm results
The findings of the RF algorithm are summarized here. Fig 9 illustrates the random forest trees designed to minimize the error value.
The RF method was employed to build the model, using fresh herbage yield (FHY) as the dependent variable. The RF model incorporates various plant linear characteristics as predictors, including PH, SD, CPR, ADF, NDF, ADL, DMD, DMI, and RFV. The random forest algorithm used 500 trees. The model explains 88.87% of the variation in the dependent variable, with MSE = 160,067, RMSE = 400, MAE = 321, and Bias = 170. Neutral detergent fiber (NDF) is the most significant factor influencing the model, followed by PH and RFV (as shown in Table 6 and Fig 10).
It is also possible to estimate fresh herbage production by varying the values of the characteristic features that represent the independent variables in the equation generated by the Random Forest (RF) technique. For example, consider the following calculation: PH = 225, SD = 7, CPR = 8.5, ADF = 31, NDF = 55, ADL = 8.5, DMD = 66, DMI = 2.1, RFV = 111, resulting in FHY = 7539.289 kg.
3.4. MARS algorithm results
Table 7 presents the model estimation coefficients of the MARS approach used for predicting fresh herbage yield (FHY).
Below is a detailed equation that was developed by considering the interaction effects of the model’s coefficients.
FHY = 9380 - 234 * max(0, 231 - PH) + 194 * max(0, PH - 231)
- 504 * max(0, SD - 8.2) - 7613 * max(0, 56.5 - NDF) + 773 * max(0, RFV - 95.3)
+ 1776 * max(0, 97 - RFV) - 1072 * max(0, RFV - 97) + 91.7 * max(0, 231 - PH) * DMI
- 87.9 * max(0, PH - 231) * DMI - 174 * SD * max(0, 97 - RFV)
- 58.8 * max(0, NDF - 56.5) * ADL + 123 * max(0, 56.5 - NDF) * DMD
By adjusting the values of the characteristic features that represent the independent variables in the equation produced by the MARS algorithm, it is possible to estimate the fresh herbage yield. For instance, consider the following calculation: PH = 225, SD = 7, CPR = 8.5, ADF = 31, NDF = 55, ADL = 8.5, DMD = 66, DMI = 2.1, RFV = 111, resulting in FHY = 7053.170 kg.
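The fitted MARS equation can be evaluated directly in code. The sketch below transcribes the printed coefficients (CPR and ADF do not enter the final model); note that with the rounded coefficients the worked example evaluates to roughly 7017 kg da-1, close to the reported 7053.170 kg:

```python
def mars_fhy(PH, SD, NDF, ADL, DMD, DMI, RFV):
    """Fresh herbage yield from the MARS equation printed in the text."""
    h = lambda v: max(0.0, v)  # hinge (truncated linear) basis function
    return (9380
            - 234 * h(231 - PH) + 194 * h(PH - 231)
            - 504 * h(SD - 8.2) - 7613 * h(56.5 - NDF)
            + 773 * h(RFV - 95.3) + 1776 * h(97 - RFV) - 1072 * h(RFV - 97)
            + 91.7 * h(231 - PH) * DMI - 87.9 * h(PH - 231) * DMI
            - 174 * SD * h(97 - RFV)
            - 58.8 * h(NDF - 56.5) * ADL + 123 * h(56.5 - NDF) * DMD)

# Worked example from the text:
fhy = mars_fhy(PH=225, SD=7, NDF=55, ADL=8.5, DMD=66, DMI=2.1, RFV=111)
```

The small discrepancy from the reported figure comes from the rounding of the displayed coefficients, not from the model itself.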
Fig 11 illustrates the proportional relevance of the factors used in the MARS algorithm to forecast fresh herbage yield.
Fig 12 presents a graph comparing the observed values with the estimated values generated by the MARS algorithm.
The study found a bilateral interaction between the variables. Fig 13 illustrates the three-dimensional surface graph of the analysis findings, highlighting the connection between two predictor variables and the objective variable.
When the goodness-of-fit statistics of the MARS algorithm were evaluated, the values were R2 = 0.995, adjusted R2 = 0.991, MSE = 18961, RMSE = 137.7, MAPE = 1.488, MAE = 109.718, RAE = 0.017, and SD ratio = 0.072.
The goodness-of-fit criterion results for all methods are displayed in Table 8. Each algorithm produced quite accurate FHY forecasts. In terms of expected accuracy, MARS was found to be the best method, followed by RF, ANN, and then ALM.
4. Discussion
In the study conducted by [70], plant height exhibited a significant positive correlation with total dry matter yield, SD, CPY, CPC, DMD, and ME, while showing a significant negative correlation with panicle length, number of tillers, and ADF. Stem diameter also showed a strong positive correlation with plant height, crude protein yield, panicle length, ether extract, and total dry matter yield. [16] explored the use of the Random Forest (RF) data mining technique to analyze the relationship between various climate factors and maize yield, finding an average R-squared of 28% and an explained R-squared of 55%. Similarly, [26] applied the RF algorithm to a dataset from 1980 to 2016, creating a sorghum yield prediction model with an R-squared value of 0.71. In another study by [25], several algorithms, including Support Vector Regression, RF, Extreme Learning Machine, ANN, and DNN, were tested for wheat yield prediction in two provinces. The Deep Neural Network (DNN) performed best in the first province with an RMSE of 0.04 q/ha and an R-squared of 0.96, while RF outperformed other models in the second province with an RMSE of 0.05 q/ha. [27] applied data mining techniques like BG, DT, RF, and ANN to predict maize yield using variables such as annual average temperature, precipitation, rainy days, frosty days, and hot days. Their study found that the ANN algorithm achieved the highest accuracy (r = 0.98, relative absolute error = 21.87%, root relative squared error = 20.44%, and RMSE = 423.23), highlighting ANN’s effectiveness in yield prediction. Similarly, [71] identified temperature and precipitation as key factors affecting maize yield using comparable models. [72] also applied Random Forest regression and found that maximum temperature and precipitation were critical climate factors influencing maize yield.
The parameters influencing the fresh grass production of pea plants grown in Turkey were investigated using multivariate adaptive regression spline (MARS), Chi-square automatic interaction detection (CHAID), classification and regression tree (CART), and artificial neural network (ANN) models. The MARS approach was shown to be the most effective model for measuring plant fresh herbage yield, with the highest R2 and adjusted R2 values (0.998 and 0.986) and the lowest values of RMSE, MAPE, SD ratio, AIC, and AICc (10.499, 0.7365, 0.047, 268, and 688, respectively) [73].
The CHAID, CART, MARS, and Bagging MARS algorithms were used in the study by [74] to analyze the parameters impacting fresh herbage yield in sorghum-sudangrass hybrids. The best algorithms for predicting the dependent variable were determined to be MARS, Bagging MARS, CART, and CHAID, in that order. The MARS algorithm was shown to be the most accurate predictor of crop yield.
Another study employed multiple linear regression (MLR), artificial neural networks (ANNs), and the multivariate adaptive regression splines (MARS) method to estimate the stem weight of alfalfa plants. In the estimation of stem weight, the ANN, MARS, and MLR correlation coefficients (r) were 0.801, 0.999, and 0.753 for the Gea variety, and 0.781, 0.998, and 0.561 for the Basbag variety. The Gea variety's R2 in the same models was 0.642, 0.998, and 0.567, while the Basbag variety's was 0.610, 0.997, and 0.315. The Gea variety's MSE values were 0.023, 0.008, and 2.498, whereas the Basbag variety's were 0.151, 0.017, and 4.641. Compared to ANNs and MLR, the MARS algorithm produced a more accurate forecast. MARS > ANN > MLR was the order of the algorithms' prediction performance for alfalfa stem weight estimation [75].
[76] used donkey biometric data to examine the predictive performance of several machine learning algorithms, including CHAID, Random Forest, ALM, MARS, and Bagging MARS. With the lowest RMSE, MAPE, and SD ratio values (2.173, 1.615, and 0.291, respectively) and the highest R2 value (0.916), the MARS algorithm was determined to be the most effective model for predicting donkey body length. From best to worst, the algorithms ranked as follows: MARS > Bagging MARS > Random Forest > CHAID > ALM.
To predict egg weight from certain egg quality criteria in chickens, the following methods were employed: random forest (RF), multivariate adaptive regression splines (MARS), classification and regression trees (CART), Bagging MARS, chi-square automatic interaction detection (CHAID), and exhaustive CHAID. The correlation coefficient (r) ranged from 0.957 (CHAID) to 0.99999 (MARS and Bagging MARS). The MARS and Bagging MARS algorithms had the lowest RMSE (0.001), whereas CHAID had the highest (2.154). In terms of prediction accuracy, the algorithms ranked as MARS ≈ Bagging MARS > RF > CART > Exhaustive CHAID > CHAID [77].
The results of comparative data mining studies on different plant and livestock datasets are consistent with the findings of this study, both in the suitability of the methods and, in particular, in the MARS algorithm yielding the best results.
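The model comparisons summarized above all rest on the same goodness-of-fit statistics (R2, adjusted R2, RMSE, MAPE, SD ratio, MAE, RAE). As a minimal illustration, these can be computed from observed and predicted yields as sketched below; the function name and the exact conventions (adjusted-R2 penalty term, SD ratio as the residual-to-target standard deviation ratio) are assumptions for illustration, not the study's own code.

```python
import numpy as np

def fit_statistics(y_true, y_pred, n_params):
    """Goodness-of-fit measures commonly used to compare fitted models."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    resid = y_true - y_pred
    sse = np.sum(resid ** 2)                      # sum of squared errors
    sst = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    rmse = np.sqrt(sse / n)
    mape = 100.0 * np.mean(np.abs(resid / y_true))
    mae = np.mean(np.abs(resid))
    # RAE: absolute error relative to a mean-only predictor
    rae = np.sum(np.abs(resid)) / np.sum(np.abs(y_true - y_true.mean()))
    # SD ratio: std. dev. of residuals over std. dev. of the target
    sd_ratio = resid.std(ddof=1) / y_true.std(ddof=1)
    return {"R2": r2, "adj_R2": adj_r2, "RMSE": rmse, "MAPE": mape,
            "MAE": mae, "RAE": rae, "SD_ratio": sd_ratio}
```

Lower RMSE, MAPE, SD ratio, MAE, and RAE together with higher R2 and adjusted R2 indicate the better model, which is the criterion applied throughout the comparisons above.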
5. Conclusion
This study assessed the performance of the ANN, ALM, RF, and MARS methods in predicting the FHY of the SSG hybrid. The key findings of the study are as follows:
In the SSG hybrid, seven factors (PH, SD, ADF, NDF, DMD, DMI, and RFV) significantly influence fresh herbage yield. Among these, the most impactful factors are PH, RFV, and NDF.
The RF method accurately predicts the FHY of the SSG hybrid, accounting for 89.50% of the variation. In comparison, the ALM method achieved an accuracy of 88.87%, slightly lower than the RF method.
The factors that determine fresh herbage yield in the SSG hybrid are ranked in order of significance as follows: RFV, PH, and SD for the ALM technique; RFV, NDF, and PH for the ANN method; PH, NDF, and RFV for the MARS algorithm; and NDF, PH, and RFV for the RF algorithm. When the importance rankings of plant traits affecting fresh herbage yield in the SSG hybrid are compared, the RF, ANN, and MARS methods agree on the top three variables (RFV, NDF, and PH), differing only in their order. In the ALM method, however, SD entered the top three in place of NDF.
The performance results, from worst to best, are as follows: ALM < ANN < RF < MARS.
The results of the study show that, for datasets affected by multicollinearity, the MARS algorithm performs best among the data mining algorithms compared. Because multicollinearity is a common issue in practical applications, selecting an appropriate method is crucial. Data mining techniques and statistical models are therefore powerful tools that offer effective solutions to the multicollinearity problem.
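The multicollinearity referred to here can be diagnosed before model fitting, for example with variance inflation factors (VIF). The sketch below, using only NumPy, is an illustrative diagnostic under the common rule of thumb that VIF > 10 signals problematic collinearity; it is not part of the study's own analysis pipeline.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j
    on the remaining columns; VIF > 10 is a common collinearity flag.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus the other columns
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sst = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - np.sum(resid ** 2) / sst
        out.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return np.array(out)
```

When several morphological traits are near-linear combinations of one another, as is typical for traits such as NDF, DMD, and RFV, their VIFs become large, which is exactly the situation in which spline-based methods such as MARS tend to remain stable.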
It has been shown that data mining methods are highly effective at predicting target variables and uncovering the relationships among plant traits in agricultural field data.
References
- 1. Abreha KB, Enyew M, Carlsson AS, Vetukuri RR, Feyissa T, et al., Sorghum in dryland: morphological, physiological, and molecular responses of sorghum under drought stress. Planta. 2022; 255:1–23. https://doi.org/10.1007/s00425-021-03799-7
- 2. Chen X, Wu Q, Gao Y, Zhang J, Wang Y, et al., The role of deep roots in sorghum yield production under drought conditions. Agronomy. 2020; 10(4), 611. https://doi.org/10.3390/agronomy10040611
- 3. Kamali S, Mehraban A. Effects of Nitroxin and arbuscular mycorrhizal fungi on the agro-physiological traits and grain yield of sorghum (Sorghum bicolor L.) under drought stress conditions. Plos one. 2020; 15(12), e0243824. https://doi.org/10.1371/journal.pone.0243824
- 4. Schittenhelm S, Schroetter S. Comparison of drought tolerance of maize, sweet sorghum and sorghum‐sudangrass hybrids. Journal of Agronomy and Crop Science. 2014; 200(1), 46–53. https://doi.org/10.1111/jac.12039
- 5. Yahaya MA, Shimelis H. Drought stress in sorghum: Mitigation strategies, breeding methods and technologies—A review. Journal of Agronomy and Crop Science. 2022; 208(2), 127–142. https://doi.org/10.1111/jac.12573
- 6. Satyavathi CT, Solanki RK, Kakani RK, Bharadwaj C, Singhal T, et al., Genomics assisted breeding for abiotic stress tolerance in millets. Genomics Assisted Breeding of Crops for Abiotic Stress Tolerance. 2019; Vol. II, 241–255. https://doi.org/10.1007/978-3-319-99573-1_13
- 7. Kazemi E, Ganjali HR, Mehraban A, Ghasemi A. Yield and biochemical properties of grain sorghum (Sorghum bicolor L. Moench) affected by nano-fertilizer under field drought stress. Cereal Research Communications. 2021; 1–9. https://doi.org/10.1007/s42976-021-00198-2
- 8. Jabereldar AA, El Naim AM, Abdalla AA, Dagash YM. Effect of water stress on yield and water use efficiency of sorghum (Sorghum bicolor L. Moench) in semi-arid environment. International Journal of Agriculture and Forestry. 2017; 7(1), 1–6. https://doi.org/10.5923/j.ijaf.20170701.01
- 9. Naoura G, Sawadogo N, Atchozo EA, Emendack Y, Hassan MA, et al., Assessment of agro-morphological variability of dry-season sorghum cultivars in Chad as novel sources of drought tolerance. Scientific Reports. 2019; 9(1), 19581. pmid:31863053
- 10. Impa SM, Perumal R, Bean SR, Sunoj VJ, Jagadish SK. Water deficit and heat stress induced alterations in grain physico-chemical characteristics and micronutrient composition in field grown grain sorghum. Journal of Cereal Science. 2019; 86, 124–131. https://doi.org/10.1016/j.jcs.2019.01.013
- 11. Mathur S, Umakanth AV, Tonapi VA, Sharma R, Sharma MK. Sweet sorghum as biofuel feedstock: recent advances and available resources. Biotechnology for biofuels. 2017; 10, 1–19. https://doi.org/10.1186/s13068-017-0834-9
- 12. Sarshad A, Talei D, Torabi M, Rafiei F, Nejatkhah P. Morphological and biochemical responses of Sorghum bicolor (L.) Moench under drought stress. SN Applied Sciences. 2021; 3(1), 81. https://doi.org/10.1007/s42452-020-03977-4
- 13. Belayneh A, Adamowski J, Khalil B, Quilty J. Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction. Atmospheric research. 2016; 172, 37–47. https://doi.org/10.1016/j.atmosres.2015.12.017
- 14. Xu L, Chen N, Zhang X, Chen Z. An evaluation of statistical, NMME and hybrid models for drought prediction in China. Journal of hydrology. 2018; 566, 235–249. https://doi.org/10.1016/j.jhydrol.2018.09.020
- 15. Jeong J, Hong T, Ji C, Kim J, Lee M, Jeong K. Development of an integrated energy benchmark for a multi-family housing complex using district heating. Applied energy. 2016; 179, 1048–1061. https://doi.org/10.1016/j.apenergy.2016.07.086
- 16. Biswas B, Singh J. Assessing yield-weather relationships in kharif maize under Punjab conditions using data mining method. J. Agrometeorology. 2020; 22, 104–111.
- 17. Ropelewska E. Effect of boiling on classification performance of potatoes determined by computer vision. European Food Research and Technology. 2021; 247(4), 807–817. https://doi.org/10.1007/s00217-020-03664-z
- 18. Van Klompenburg T, Kassahun A, Catal C. Crop yield prediction using machine learning: A systematic literature review. Computers and Electronics in Agriculture. 2020; 177, 105709. https://doi.org/10.1016/j.compag.2020.105709
- 19. Mohammed S, Elbeltagi A, Bashir B, Alsafadi K, Alsilibe F, et al., A comparative analysis of data mining techniques for agricultural and hydrological drought prediction in the eastern Mediterranean. Computers and Electronics in Agriculture. 2022; 197, 106925. https://doi.org/10.1016/j.compag.2022.106925
- 20. Oguntunde PG, Lischeid G, Dietrich O. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis. International journal of biometeorology. 2018; 62(3), 459–469. pmid:29032432
- 21. Majumdar J, Naraseeyappa S, Ankalaki S. Analysis of agriculture data using data mining techniques: application of big data. Journal of Big Data. 2017; 4(1), 20. https://doi.org/10.1186/s40537-017-0077-4
- 22. Pimentel BS, Gonzalez ES, Barbosa GN. Decision-support models for sustainable mining networks: fundamentals and challenges. Journal of Cleaner Production. 2016; 112, 2145–2157. https://doi.org/10.1016/j.jclepro.2015.09.023
- 23. Chlingaryan A, Sukkarieh S, Whelan B. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Computers and electronics in agriculture. 2018; 151, 61–69. https://doi.org/10.1016/j.compag.2018.05.012
- 24. Paudel D, Boogaard H, de Wit A, Janssen S, Osinga S, et al., Machine learning for large-scale crop yield forecasting. Agricultural Systems. 2021; 187, 103016. https://doi.org/10.1016/j.agsy.2020.103016
- 25. Chergui N. Durum wheat yield forecasting using machine learning. Artificial Intelligence in Agriculture. 2022; 6, 156–166. https://doi.org/10.1016/j.aiia.2022.09.003
- 26. Hoffman AL, Kemanian AR, Forest CE. The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environmental Research Letters. 2020; 15(9), 094013. https://doi.org/10.1088/1748-9326/ab7b22
- 27. Harsányi E, Bashir B, Arshad S, Ocwa A, Vad A, et al., Data mining and machine learning algorithms for optimizing maize yield forecasting in central Europe. Agronomy. 2023; 13(5), 1297. https://doi.org/10.3390/agronomy13051297
- 28. Ropelewska E, Nazari L. The effect of drought stress of sorghum grains on the textural features evaluated using machine learning. European Food Research and Technology. 2021; 247(11), 2787–2798. https://doi.org/10.1007/s00217-021-03832-9
- 29. Habyarimana E, Baloch FS. Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields. Plos one. 2021; 16(3), e0249136. pmid:33765103
- 30. Oddy VH, Robards GE, Low SG. Prediction of in vivo dry matter digestibility from the fibre and nitrogen content of a feed. 1983; 395–398.
- 31. Sheaffer CC, Peterson MA, Mccalin M, Volene JJ. (eds.). Acid detergent fiber, neutral detergent fiber concentration and relative feed value. In North American Alfalfa Improvement Conference, Minneapolis; 1995.
- 32. Sutton CD. Classification and Regression Trees, Bagging and Boosting. Handbook of Statistics 24: Data Mining and Data Visualization, Ed. by Rao CR, Wegman EJ, Solka JL, Elsevier B.V. 2005; 303–329. https://doi.org/10.1016/S0169-7161(04)24011-1
- 33. Yıldırım H. The Multicollinearity Effect on the Performance of Machine Learning Algorithms: Case Examples in Healthcare Modelling. Academic Platform Journal of Engineering and Smart Systems (APJESS). 2024; 12(3), 68–80. https://doi.org/10.21541/apjess.1371070
- 34. Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. John Wiley & Sons; 2021.
- 35. Akhlyustin SB, Melnikov AV, Zhilin RA. Prediction of the Integrated Indicator of Quality of a New Object Under the Conditions of Multicollinearity of Reference Data. Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software (Bulletin SUSU MMCS). 2020; 13(4), 66–80. https://doi.org/10.14529/mmp200406
- 36. Moayedi H, Mosallanezhad M, Rashid ASA, Jusoh WAW, Muazu MA. A systematic review and meta-analysis of artificial neural network application in geotechnical engineering: theory and applications. Neural Computing and Applications. 2020; 32, 495–518. https://doi.org/10.1007/s00521-019-04109-9
- 37. Amaral HLMD. Desenvolvimento de uma nova metodologia para previsão do consumo de energia elétrica de curto prazo utilizando redes neurais artificiais e decomposição de séries temporais [Development of a new methodology for short-term electricity consumption forecasting using artificial neural networks and time series decomposition]. (Doctoral dissertation, Universidade de São Paulo); 2019. https://doi.org/10.11606/T.3.2020.tde-07022020-113308
- 38. Komarica J, Glavić D, Kaplanović S. Comparative Analysis of the Predictive Performance of an ANN and Logistic Regression for the Acceptability of Eco-Mobility Using the Belgrade Data Set. Data. 2024; 9(5), 73. https://doi.org/10.3390/data9050073
- 39. Breiman L. Random forests. Machine learning. 2001; 45, 5–32. https://doi.org/10.1023/A:1010933404324
- 40. Vijayakumar V, Case M, Shirinpour S, He B. Quantifying and characterizing tonic thermal pain across subjects from EEG data using random forest models. IEEE Transactions on Biomedical Engineering. 2017; 64(12), 2988–2996. pmid:28952933
- 41. Breiman L. Bagging predictors. Machine learning. 1996; 24, 123–140. https://doi.org/10.1007/BF00058655
- 42. Du H, Ke S, Zhang W, Qi D, Sun T. Rapid quantitative analysis of coal composition using laser-induced breakdown spectroscopy coupled with random forest algorithm. Analytical Sciences. 2024; 1–14. https://doi.org/10.1007/s44211-024-00610-x
- 43. Meng F, Shi Z, Song Y. The TPRF: A Novel Soft Sensing Method of Alumina–Silica Ratio in Red Mud Based on TPE and Random Forest Algorithm. Processes. 2024; 12(4), 663. https://doi.org/10.3390/pr12040663
- 44. Breiman L, Cutler A. Random Forests. 2005; Available online: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter (accessed on 15 August 2024).
- 45. Yang SI, Brandeis TJ, Helmer EH, Oatham MP, Heartsill-Scalley T, Marcano-Vega H. Characterizing height-diameter relationships for Caribbean trees using mixed-effects random forest algorithm. Forest Ecology and Management. 2022; 524, 120507. https://doi.org/10.1016/j.foreco.2022.120507
- 46. Wright MN, Ziegler A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. arXiv preprint arXiv:1508.04409. 2015; https://doi.org/10.48550/arXiv.1508.04409
- 47. Friedman JH. Multivariate adaptive regression splines. The annals of statistics. 1991; 19(1), 1–67. https://doi.org/10.1214/aos/1176347963
- 48. Deconinck E, Coomans D, Vander Heyden Y. Exploration of linear modelling techniques and their combination with multivariate adaptive regression splines to predict gastro-intestinal absorption of drugs. Journal of pharmaceutical and biomedical analysis. 2007; 43(1), 119–130. pmid:16859855
- 49. Jalali-Heravi M, Asadollahi-Baboli M, Mani-Varnosfaderani A. Shuffling multivariate adaptive regression splines and adaptive neuro-fuzzy inference system as tools for QSAR study of SARS inhibitors. Journal of Pharmaceutical and Biomedical Analysis. 2009; 50(5), 853–860. pmid:19665859
- 50. Ju X, Chen VC, Rosenberger JM, Liu F. Fast knot optimization for multivariate adaptive regression splines using hill climbing methods. Expert systems with applications. 2021; 171, 114565. https://doi.org/10.1016/j.eswa.2021.114565
- 51. Stevens JG. An investigation of multivariate adaptive regression splines for modeling and analysis of univariate and semi-multivariate time series systems. (Doctoral dissertation, Monterey, California. Naval Postgraduate School); 1991.
- 52. Kriner M. Survival Analysis with Multivariate Adaptive Regression Splines, Dissertation, LMU München, Faculty of Mathematics, Computer Science and Statistics; 2007.
- 53. Craven P, Wahba G. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische mathematik. 1978; 31(4), 377–403. https://doi.org/10.1007/BF01404567
- 54. Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York: Springer; 2009. https://doi.org/10.1007/978-0-387-21606-5
- 55. Kornacki J, Ćwik J. Statistical learning systems (in Polish). WNT Warsaw. 2005; 16.
- 56. Yang H. The case for being automatic: introducing the automatic linear modeling (LINEAR) procedure in SPSS statistics. Multiple Linear Regression Viewpoints. 2013; 39(2), 27–37.
- 57. Genç S, Mendeş M. Evaluating performance and determining optimum sample size for regression tree and automatic linear modeling. Arquivo Brasileiro de Medicina Veterinária e Zootecnia. 2021; 73, 1391–1402. http://dx.doi.org/10.1590/1678-4162-12413
- 58. Mendeş M. Re-evaluating the Monte Carlo simulation results by using graphical techniques. Türkiye Klinikleri Biyoistatistik. 2021; 13(1), 28–38. https://doi.org/10.5336/biostatic.2020-78896
- 59. Bevilacqua M, Braglia M, Montanari R. The classification and regression tree approach to pump failure rate analysis. Reliability Engineering & System Safety. 2003; 79(1), 59–67. https://doi.org/10.1016/S0951-8320(02)00180-1
- 60. Larsen DR, Speckman PL. Multivariate regression trees for analysis of abundance data. Biometrics. 2004; 60(2), 543–549. pmid:15180683
- 61. Xu W, Tu J, Xu N, Liu Z. Predicting daily heating energy consumption in residential buildings through integration of random forest model and meta-heuristic algorithms. Energy. 2024; 301, 131726. https://doi.org/10.1016/j.energy.2024.131726
- 62. Khajavi H, Rastgoo A. Predicting the carbon dioxide emission caused by road transport using a Random Forest (RF) model combined by Meta-Heuristic Algorithms. Sustainable Cities and Society. 2023; 93, 104503. https://doi.org/10.1016/j.scs.2023.104503
- 63. Zhussupbekov M, Memon SA, Khawaja SA, Nazir K, Kim J. Forecasting energy demand of PCM integrated residential buildings: A machine learning approach. Journal of Building Engineering. 2023; 70, 106335. https://doi.org/10.1016/j.jobe.2023.106335
- 64. Sadaghat B, Ebrahimi SA, Souri O, Niar MY, Akbarzadeh MR. Evaluating strength properties of Eco-friendly Seashell-Containing Concrete: Comparative analysis of hybrid and ensemble boosting methods based on environmental effects of seashell usage. Engineering Applications of Artificial Intelligence. 2024; 133, 108388. https://doi.org/10.1016/j.engappai.2024.108388
- 65. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate research. 2005; 30(1), 79–82. https://doi.org/10.3354/cr030079
- 66. Liddle AR. Information criteria for astrophysical model selection. Monthly Notices of the Royal Astronomical Society: Letters. 2007; 377(1), L74–L78. https://doi.org/10.1111/j.1745-3933.2007.00306.x
- 67. Takma C, Atil H, Aksakal V. Comparison of multiple linear regression and artificial neural network models goodness of fit to lactation milk yields. Kafkas Üniversitesi Veteriner Fakültesi Dergisi. 2012; 18:941–944. https://doi.org/10.9775/kvfd.2012.6764
- 68. Chen C, Twycross J, Garibaldi JM. A new accuracy measure based on bounded relative error for time series forecasting. PloS one. 2017; 12(3), e0174202. pmid:28339480
- 69. R Core Team. R: A Language and environment for statistical computing. (Version 4.1) [Computer software]. 2021; Retrieved from https://cran.r-project.org. (R packages retrieved from MRAN snapshot 2022-01-01).
- 70. Khalilian ME, Habibi D, Golzardi F, Aghayari F, Khazaei A. Effect of maturity stage on yield, morphological characteristics, and feed value of sorghum [Sorghum bicolor (L.) Moench] cultivars. Cereal Research Communications. 2022; 50(4), 1095–1104. https://doi.org/10.1007/s42976-022-00244-7
- 71. Shahhosseini M, Martinez-Feria RA, Hu G, Archontoulis SV. Maize yield and nitrate loss prediction with machine learning algorithms. Environmental Research Letters. 2019; 14(12), 124026. http://doi.org/10.1088/1748-9326/ab5268
- 72. Meng L, Liu HL, Ustin S, Zhang X. Predicting maize yield at the plot scale of different fertilizer systems by multi-source data and machine learning methods. Remote Sensing. 2021; 13(18), 3760. https://doi.org/10.3390/rs13183760
- 73. Çatal MI, Çelik Ş, Bakoğlu A. Investigation of factors affecting fresh herbage yield in pea (Pisum arvense L.) using data mining algorithms. Front. Plant Sci. 2024; 15, 1482723. pmid:39634062
- 74. Çelik Ş, Tutar H, Gönülal E, Er H. Prediction of fresh herbage yield using data mining techniques with limited plant quality parameters. Scientific Reports. 2024; 14, 21396. pmid:39271726
- 75. Çelik Ş, Çaçan E, Yaryab S. Prediction of Stem Weight in Selected Alfalfa Varieties by Artificial Neural Networks, Multivariate Adaptive Regression Splines and Multiple Regression Analysis. Journal of Animal & Plant Sciences. 2023; 33(4), 1006–1020. https://doi.org/10.36899/JAPS.2023.4.0694
- 76. Çelik Ş, Yılmaz O. Investigation of the Relationships between Coat Colour, Sex, and Morphological Characteristics in Donkeys Using Data Mining Algorithms. Animals. 2023; 13(14), 2366. pmid:37508143
- 77. Tyasi TL, Çelik Ş. Investigation of Egg Quality Characteristics Affecting Egg Weight of Lohmann Brown Hen with Data Mining Methods. Poultry Science Journal. 2024; 12(1), 107–117. https://doi.org/10.22069/psj.2024.21337.1934