Abstract
In this study, the effect of morphological traits on the fresh herbage yield of the sorghum x sudangrass hybrid grown in Konya province, the largest cereal production area in Turkey, was analyzed with several data mining methods. For this purpose, Artificial Neural Networks (ANN), the Automatic Linear Model (ALM), the Random Forest (RF) algorithm, and the Multivariate Adaptive Regression Spline (MARS) algorithm were used, and the prediction performances of these methods were compared. The computed descriptive statistics were a plant height of 251.22 cm, stem diameter of 7.03 mm, fresh herbage yield of 8010.69 kg da-1, crude protein ratio of 9.09%, acid detergent fiber of 33.23%, neutral detergent fiber of 57.44%, acid detergent lignin of 7.43%, dry matter digestibility of 63.01%, dry matter intake of 2.11%, and relative feed value of 103.02. Model fit statistics, including the coefficient of determination (R2), adjusted R2, root mean square error (RMSE), mean absolute percentage error (MAPE), standard deviation ratio (SD ratio), mean absolute error (MAE), and relative absolute error (RAE), were used to evaluate the prediction abilities of the fitted models. The MARS method was shown to be the best model for describing fresh herbage yield, with the lowest values of RMSE, MAPE, SD ratio, MAE, and RAE (137.7, 1.488, 0.072, 109.718, and 0.017, respectively), as well as the highest R2 (0.995) and adjusted R2 (0.991). The experimental results show that the MARS algorithm is the most suitable model for predicting fresh herbage yield in the sorghum x sudangrass hybrid, providing a good alternative to the other data mining algorithms.
Citation: Tutar H, Celik S, Er H, Gönülal E (2025) Impact of morphological traits and irrigation levels on fresh herbage yield of sorghum x sudangrass hybrid: Modelling data mining techniques. PLoS ONE 20(2): e0318230. https://doi.org/10.1371/journal.pone.0318230
Editor: Agung Irawan, Universitas Sebelas Maret, INDONESIA
Received: October 14, 2024; Accepted: January 14, 2025; Published: February 5, 2025
Copyright: © 2025 Tutar et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All datasets generated for this study can be found in the article Figshare (https://doi.org/10.6084/m9.figshare.28027733).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Due to global climate change, temperature and atmospheric CO2 levels are rising, and droughts are becoming widespread [1]. Drought is the primary environmental stressor that restricts crop production in arid and semi-arid regions [2]. Severe drought can significantly reduce crop yield and quality, leading to scarcity [3]. Therefore, more drought-resistant plant varieties and water-saving cultivation systems should be used to better cope with changing climate conditions [4]. In drought-prone environments, sweet sorghum and sorghum x sudangrass (SSG) hybrids can be considered as alternatives to maize and wheat [5]. Sorghum (Sorghum bicolor L. Moench), as a forage crop, is a highly drought-resistant member of the grain family when compared to other crops in this family [6,7].
Sorghum is the fifth most important cereal crop worldwide, serving as a staple food for over 500 million people across more than 30 countries. Globally, it ranks as the fourth most significant grain [8,9]. Sorghum is a rich source of protein and fiber [10], and beyond its nutritional role, it is also utilized as a raw material for bioethanol production [11]. Despite its reputation for resilience, drought stress remains a major challenge, adversely affecting both productivity and nutritional quality in key production areas. Thus, understanding how drought impacts sorghum and how the plant responds is critical for improving its resistance to such stress [12].
Statistical models are widely used in forecasting studies due to their simplicity and low computational cost [13,14]. Data mining algorithms offer a powerful alternative to traditional regression methods, addressing some of their limitations [15]. These methods have lately gained widespread popularity for predicting plant yield and various traits, as well as for classifying plants by species [16,17]. Data mining involves extracting valuable and meaningful information from large datasets and is a relatively new interdisciplinary field that focuses on data analysis and knowledge discovery [18,19]. This approach combines multiple techniques, such as statistical analysis, data visualization, neural networks, knowledge discovery, pattern recognition, and database management [20].
Data mining is an approach that offers practical and effective solutions for productivity and sustainability in the agricultural sector. Data mining techniques used for agricultural yield prediction provide farmers with the opportunity to make more accurate and timely decisions by analyzing multiple factors such as climate changes, soil properties, and plant growth dynamics [21]. These techniques contribute to the efficient use of agricultural inputs by optimizing production processes. In addition, forecasts obtained through data mining can also guide agricultural policy making and help to balance agricultural supply and demand. Thus, data mining contributes to both economic and environmental sustainability by strengthening decision support systems in the agricultural sector [22].
Various studies have been conducted on field crops and other plants using data mining methods. Numerous studies have demonstrated the potential of data mining algorithms for predicting agricultural production systems [23–25]. In studies by [26,27], the relationship between certain climate parameters and maize yield was examined. [28] aimed to evaluate the effectiveness of textural features in distinguishing seeds produced under normal or drought conditions using discriminant models. [29] researched biomass yield prediction for sorghum plants using remote and proximal sensing-based model algorithms.
In this study, the factors affecting fresh herbage yield of sorghum x sudangrass (SSG) hybrid were identified using data mining algorithms based on various plant traits. A comparative study was conducted by evaluating and comparing the performances of different algorithms such as Artificial Neural Network (ANN), Multi-Layer Perceptron Artificial Neural Networks (MLP), Random Forest (RF) algorithm, Multivariate Adaptive Regression Spline (MARS), and Automatic Linear Modeling (ALM).
2. Material and methods
2.1. Research area and datasets
This research was conducted during the 2021 and 2022 growing seasons in Konya province, Türkiye, located at 37° 51´ N, 32° 33´ E. Konya is situated at an elevation of approximately 1006 meters above sea level. The study area has a semi-arid climate, with most of the annual rainfall occurring in the winter months. The rainfall recorded during the vegetation period of the SSG hybrid was 44.0 mm in 2021 and 44.4 mm in 2022. The minimum, maximum, and average temperatures recorded in Konya during the 2021 and 2022 growing seasons align with the long-term average temperatures (Fig 1).
The dataset for the study was collected from the field in 2021 and 2022. The experimental dataset consists of parameters obtained under different irrigation levels (100% (I100), 75% (I75), 50% (I50), and 25% (I25) replenishment of the water depleted from field capacity). The dataset comprises 32 observations (4 irrigation levels × 4 replications × 2 years) with the following parameters: plant height (PH), stem diameter (SD), crude protein ratio (CPR), acid detergent fiber (ADF), neutral detergent fiber (NDF), acid detergent lignin (ADL), dry matter digestibility (DMD), dry matter intake (DMI), and relative feed value (RFV) as input parameters. Fresh herbage yield is the output parameter.
Soil analysis revealed that the study area had a clay-loam texture, free of salt, calcareous, and low in organic matter content (Table 1).
2.2. Experimental details
The field experiment followed a randomized block design with four replications. Each plot consisted of four rows, with a row spacing of 45 cm and a row length of 5 meters. Prior to sowing, 10 kg da-1 of NPK compound fertilizer was applied, and an additional 22 kg da-1 of nitrogen fertilizer was administered when the plants reached a height of 40–50 cm. The treatments consisted of four irrigation levels, in which 100% (I100), 75% (I75), 50% (I50), and 25% (I25) of the water depleted from field capacity was replenished to fill the moisture deficit in the 0–90 cm soil layer, determined as the effective root depth of the SSG hybrid. In 2021, the irrigation water applied for I100, I75, I50, and I25 was 510 mm, 395 mm, 280 mm, and 165 mm, respectively, while in 2022, these values were 480 mm, 370 mm, 260 mm, and 150 mm, respectively. Sowing took place in the first week of June, with harvesting completed by the last week of September. After mowing the plants in each plot, the fresh herbage yield was calculated on a per decare basis. For detailed analysis, ten plants were randomly selected from the harvested samples. The crude protein (CP), acid detergent fiber (ADF), and neutral detergent fiber (NDF) ratios were measured using a NIRS device. Additionally, dry matter digestibility (DMD), dry matter intake (DMI), and relative feed value (RFV) were calculated based on established formulas [30,31].
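The DMD, DMI, and RFV values can be derived directly from the fiber measurements. The sketch below assumes the widely used Rohweder-type equations (the study cites [30,31] for its exact formulas, so treat the coefficients as illustrative); with the study's mean ADF and NDF they reproduce the descriptive statistics reported in the abstract:

```python
def forage_quality(adf: float, ndf: float) -> dict:
    """Dry matter digestibility, intake, and relative feed value from fiber
    fractions (Rohweder-type equations; assumed to match those in [30,31])."""
    dmd = 88.9 - 0.779 * adf   # dry matter digestibility, %
    dmi = 120.0 / ndf          # dry matter intake, % of body weight
    rfv = dmd * dmi / 1.29     # relative feed value (unitless index)
    return {"DMD": dmd, "DMI": dmi, "RFV": rfv}

# The study's mean ADF (33.23%) and NDF (57.44%):
q = forage_quality(33.23, 57.44)
```

With these inputs the function returns DMD ≈ 63.01% and DMI ≈ 2.09%, matching the reported means, and RFV ≈ 102, close to the reported mean of 103.02 (means of ratios and ratios of means differ slightly).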
2.3. Data mining
Tree-structured classification and regression are alternatives to conventional classification and regression methods that do not rely on normality assumptions or user-specified model statements, as do certain earlier methods such as discriminant analysis and ordinary least squares regression [32]. Data taken from various domains inherently consist of highly correlated observations, which coincides with the exponential increase in the volume of data that must be evaluated as technology improves. This phenomenon, known as multicollinearity, affects the performance of both statistical and machine learning methods. Statistical models presented as a potential solution to this problem have not received enough evaluation in the literature. As a result, tackling the multicollinearity problem requires a thorough comparison of statistical and machine learning models [33]. The literature proposes a variety of strategies for dealing with multicollinearity. Although the first recommended solution is to collect additional data, this may not always be feasible owing to cost constraints, or it may even be impossible. The second option is to use non-least-squares techniques (such as ridge, Liu, lasso, and elastic net regression). The third is to modify the model by adding new variables, or groups of variables, derived from the multi-correlated variables [34]. Lastly, as more popular remedies for multicollinearity or other issues (such as outliers) in the field of machine learning, a variety of preprocessing techniques are used, including centering, scaling, normalization, and standardization [33]. Three strategies were put forward by [35] to address the multicollinearity issue in the data mining domain: employing ridge regression, training a two-layer neural network with a teacher, and sequentially adapting a single-layer neural network.
Alternative models (such as the Random Forest algorithm, Multivariate Adaptive Regression Spline, Artificial Neural Network, and Automatic Linear Modeling) will be the main focus of this investigation.
2.3.1. Artificial Neural Network (ANN).
The artificial neural network (ANN) is modeled to replicate the functioning of the human brain and nervous system. These neural network models are mathematical systems, inspired by biological neural networks, designed to mimic the processes of animal brains [36].
2.3.2. Multi-Layer Perceptron Artificial Neural Networks (MLP).
The Artificial Neural Network (ANN) used in this study is a feed-forward Multi-Layer Perceptron (MLP) model, comprising an input layer, a single hidden layer, and an output layer. In this structure, the neurons in the input layer pass signals to the hidden layer, where the neurons are interconnected through weighted connections. Both the hidden and output layers use the same activation function. The MLP was trained for 1000 epochs.
The activation function is crucial because it controls the level of activation and the strength of the output signal of an artificial neuron. Non-linear activation functions are often used as they enable better performance in function approximation [37].
The hyperbolic tangent activation function can be expressed as follows [38]:
f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))    (4)
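A minimal forward pass for the MLP described above (tanh hidden layer, identity output) can be sketched as follows; the weights here are illustrative placeholders, not the fitted values from Table 3:

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, identity output (as in Fig 4)."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# Illustrative weights only -- NOT the fitted values from Table 3:
x = [0.2, -0.1, 0.5]                              # three standardized inputs
w_hidden = [[0.4, -0.3, 0.1], [-0.2, 0.5, 0.3]]   # two hidden neurons
b_hidden = [0.1, -0.1]
w_out, b_out = [0.7, -0.5], 0.05
y = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
```

The actual network uses nine inputs (PH through RFV) and the weights listed in Table 3; the structure of the computation is the same.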
2.3.3. Random Forest (RF) algorithm.
For both regression and classification tasks, the Random Forest (RF) algorithm is a powerful ensemble learning technique [39]. This method addresses the overfitting problem inherent in decision trees [40] by combining Breiman’s Bagging algorithm [41] with decision trees. RF demonstrates strong generalization capabilities and robustness to unseen data, while also performing exceptionally well on training data. By randomly selecting features from the candidate feature set, RF can handle large-scale, high-dimensional datasets and reduces the impact of feature correlations on the model. Additionally, RF enhances model stability by averaging the results of multiple decision trees, thereby mitigating the influence of noise [42].
The creation of a decision tree typically involves three steps: feature selection, decision tree generation, and pruning. The classification error rate is often estimated using the information gain, information gain ratio, or Gini index, which are commonly used as feature selection criteria. In addition to synthesizing multiple input features to enhance the model, the random forest approach can also assess the relative importance of input features. By combining the decision outcomes of several decision trees, the random forest method, as an ensemble learning algorithm, can filter out anomalous data, paving the way for input feature optimization and improved model accuracy. Decision trees, the foundational components of random forests, possess strong generalization capabilities that allow them to address regression problems as well as classification tasks [43].
Furthermore, RF is exceptionally fast, highly resistant to overfitting, and allows users to generate as many trees as desired [44].
The expected values for unknown instances x are calculated by averaging the predictions from all the regression trees, as shown below:
ŷ(x) = (1/B) Σ_{b=1}^{B} t_b(x)    (5)
Bagging involves repeatedly drawing B bootstrap samples from the training data and fitting a regression tree t_b to each sample.
In this study, a Random Forest (RF) model was trained using an ensemble of 500 regression trees, with all available plant traits as predictors and fresh herbage yield as the response variable [45]. The RF model was implemented with the R package "ranger," utilizing default hyperparameter settings. The "ranger" package was selected for its faster data analysis capabilities and more efficient memory usage compared to other widely used random forest packages in R [46].
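The bootstrap-and-average principle behind the RF model (Eq 5) can be illustrated with a toy sketch. The study itself used the R package "ranger"; this pure-Python version with one-split "stump" trees is only a conceptual analogue:

```python
import random

def fit_stump(sample):
    """A one-split regression 'tree': mean y below/above the median x."""
    xs = sorted(x for x, _ in sample)
    t = xs[len(xs) // 2]
    left = [y for x, y in sample if x < t] or [0.0]
    right = [y for x, y in sample if x >= t] or [0.0]
    return lambda x: sum(left) / len(left) if x < t else sum(right) / len(right)

def bagged_predict(data, x, n_trees=500, seed=42):
    """Average the predictions of trees fitted on bootstrap samples (Eq 5)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap resample
        preds.append(fit_stump(sample)(x))
    return sum(preds) / n_trees

# Toy data: a low-yield and a high-yield group along one predictor
data = [(1.0, 2.0), (2.0, 2.5), (3.0, 6.0), (4.0, 6.5)]
pred = bagged_predict(data, 3.5)  # blends the group means, leaning high
```

Averaging over many resampled trees is what smooths out the noise of any single tree, which is the stability property described above.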
2.3.4. Multivariate adaptive regression spline (MARS).
In 1991, Friedman introduced the multivariate adaptive regression spline (MARS) technique [47,48]. Typically, a regression pair is represented as (Xi, Yi), where Xi corresponds to one or more independent variables and Yi is the dependent variable. In the MARS model, each independent variable is associated with one or more split points (ti). For Xi ≥ ti, the resulting equation is known as the right-side basis function (BF), while for Xi < ti, the equation is called the left-side basis function. These left and right basis functions (spline functions) establish the relationship between Xi and the dependent variable Yi. The following equations present the mathematical expressions for the right and left basis functions [49].
BF_right = [+(x - t)]_+^q = (x - t)^q if x ≥ t, 0 otherwise    (6)
BF_left = [-(x - t)]_+^q = (t - x)^q if x < t, 0 otherwise    (7)
where q (≥0) is the power to which the splines are raised and which defines the degree of smoothness of the outcome function estimate.
The MARS model may be expressed as follows [50]:

f(x) = a_0 + Σ_{m=1}^{M} a_m B_m(x)    (8)
In this context, f(x) denotes the MARS model, and Bm(x) represents the basis function. The index of the basis function is indicated by m, while the total number of basis functions in the model is denoted by M. The coefficient corresponding to the m-th basis function is labeled am, and x ∈ Rn signifies the vector of predictor variables. The MARS model constructs the basis functions in a product form.
Bm(x) = ∏_{k=1}^{Km} b_km(x_v(k,m))    (9)

In this context, b_km represents the k-th univariate function within Bm(x), and Km indicates the total number of univariate terms multiplied in Bm(x). When Km > 1, Km refers to the degree of the interaction term; if Km = 1, the basis function is univariate. Each basis function contains breakpoints, which serve as the knots of that function. The simplest form of b_km consists of truncated linear functions, structured as follows:
b(x) = (x - t)_+ = x - t if x > t, 0 otherwise    (10)
or
b(x) = (t - x)_+ = t - x if x < t, 0 otherwise    (11)
where the location t is called the knot of the basis function.
The MARS approach selects models using the generalized cross-validation (GCV) criterion [51]. The GCV criterion is used to assess the degree of fit and accuracy of the model [52]. The GCV coefficient [53,54] is calculated as shown in the equation below.
GCV(λ) = [(1/n) Σ_{i=1}^{n} (y_i - f̂_λ(x_i))²] / [1 - M(λ)/n]²    (12)
where n is the number of sample observations, f̂_λ is the fitted MARS model, and M(λ) is the effective number of parameters, which incorporates a cost-complexity penalty C for the basis functions included in the model [55].
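The GCV computation can be sketched in a few lines, treating the penalty term as a single effective-complexity number (an assumption that simplifies the cost-complexity bookkeeping):

```python
def gcv(y, y_hat, complexity):
    """Generalized cross-validation (Eq 12): the mean squared residual
    inflated by a penalty for the effective model complexity M(lambda)."""
    n = len(y)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    return mse / (1.0 - complexity / n) ** 2

# A fit using 2 effective parameters on 4 observations:
score = gcv([3.0, 5.0, 7.0, 9.0], [3.1, 4.8, 7.2, 8.9], complexity=2)
```

MARS grows a deliberately oversized set of basis functions and then prunes back to the subset with the lowest GCV, so a more complex model must reduce the residual error enough to offset the larger denominator penalty.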
2.3.5. Automatic Linear Modeling (ALM).
Automatic Linear Modeling (ALM) enables researchers to automatically select the optimal subset of predictors. To enhance data fit, ALM directly transforms predictors. When conducting ALM analysis, SPSS utilizes techniques such as time rescaling, outlier reduction, category merging, and other methods. While ALM can be applied to small and medium-sized datasets, it is particularly advantageous for large and complex datasets. The benefits of ALM become more evident, especially when dealing with multiple estimators [56–58]. Additionally, the ALM technique is highly effective for selecting and categorizing variables.
Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ... + β_n X_{ni} + ε_i    (13)
where Yi is the dependent (outcome) variable, the Xi are the predictor (independent) variables, β0 is the constant (intercept), βn are the slope coefficients for each predictor, and ε is the error term [59,60].
The goodness-of-fit statistics are summarized as follows [61]:
R2 (Coefficient of Determination): Evaluates how well the model can predict future data and explain the variance within the observed dataset, playing a key role in verifying prediction accuracy.
RMSE (Root Mean Squared Error): Represents the average size of prediction errors, offering insight into the overall error magnitude.
MAE (Mean Absolute Error): Reflects the average of the absolute differences between predicted and actual values, making it easier to interpret error without considering its direction.
Both RMSE and MAE provide important measures of prediction error size, essential for evaluating forecast precision.
RAE (Relative Absolute Error): Provides a normalized error metric by comparing performance against a simple baseline model, making it useful for comparing different models.
Below are the formulas for these goodness of fit statistics [62–64].
Coefficient of Determination:
R² = 1 - [Σ_{i=1}^{n} (y_i - ŷ_i)² / Σ_{i=1}^{n} (y_i - ȳ)²]    (14)
Adjusted Coefficient of Determination:
Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1)    (15)
Root-mean-square error (RMSE) given by the following formula:
RMSE = √[(1/n) Σ_{i=1}^{n} (y_i - ŷ_i)²]    (16)
In the methods, the following other model evaluation criteria were also calculated [65–68].
Mean absolute percentage error (MAPE);
MAPE = (100/n) Σ_{i=1}^{n} |(y_i - ŷ_i) / y_i|    (19)
Standard Deviation Ratio;
SD ratio = s_e / s_y    (20)

where s_e is the standard deviation of the prediction errors and s_y is the standard deviation of the observed values.
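The goodness-of-fit statistics above can be computed together; this sketch follows the formulas as described (using sample standard deviations for the SD ratio, which is an assumption about the authors' exact convention):

```python
import math

def fit_statistics(y, y_hat, p):
    """Goodness-of-fit statistics used in the study (Eqs 14-16, 19-20)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    abs_err = [abs(yi - fi) for yi, fi in zip(y, y_hat)]
    r2 = 1.0 - ss_res / ss_tot
    errors = [yi - fi for yi, fi in zip(y, y_hat)]
    mean_e = sum(errors) / n
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in errors) / (n - 1))
    sd_y = math.sqrt(ss_tot / (n - 1))
    return {
        "R2": r2,
        "AdjR2": 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1),
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs_err) / n,
        "MAPE": 100.0 / n * sum(e / abs(yi) for e, yi in zip(abs_err, y)),
        "RAE": sum(abs_err) / sum(abs(yi - mean_y) for yi in y),
        "SDratio": sd_e / sd_y,
    }

# Tiny worked example: predictions for [2, 4, 6, 8] with small errors, p = 1
stats = fit_statistics([2.0, 4.0, 6.0, 8.0], [2.1, 3.9, 6.2, 7.8], p=1)
```

Lower RMSE, MAPE, MAE, RAE, and SD ratio, together with higher R² and adjusted R², indicate the better-fitting model, which is how the algorithms are ranked in Table 8.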
The variance inflation factor (VIF), used to diagnose multicollinearity, is obtained from the diagonal of the inverse of the predictor correlation matrix and is calculated as follows:

VIF_i = 1 / (1 - R_i²)

The coefficient of determination R_i² is determined by regressing x_i on the other p - 1 variables. The degree of multicollinearity increases with the VIF of each variable. Generally speaking, VIF values higher than 10 indicate severe multicollinearity, which weakens the model's capacity for estimation and generalization [34].
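For the simplest case of two predictors, the VIF has a closed form, which illustrates why strong pairwise correlations like those in Fig 2 push VIF past the threshold of 10 (the correlation value below is illustrative, not taken from the study's data):

```python
# For two standardized predictors with pairwise correlation r, regressing
# one on the other gives R_i^2 = r^2, hence VIF_i = 1 / (1 - r^2).
def vif_two_predictors(r: float) -> float:
    """VIF for either of two predictors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)

# An illustrative correlation of 0.95 already exceeds the warning level:
v = vif_two_predictors(0.95)  # > 10
```

With more than two predictors the same quantity comes from regressing each x_i on all the others, or equivalently from the diagonal of the inverse correlation matrix.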
The IBM SPSS Automatic Linear Modeling (ALM) and Artificial Neural Network (ANN) analyses were conducted using the SPSS software program (version 25, SPSS Inc., Chicago, IL, USA). The MARS and Random Forest algorithms were implemented in R using RStudio [69].
3. Results
Table 2 provides summary statistics for the plant characteristics of the SSG hybrid grown in Konya province, Türkiye, during the 2021 and 2022 growing seasons. The data were normally distributed according to the Kolmogorov-Smirnov test (p > 0.05). The datasets were also investigated for multicollinearity using diagnostic methods, and the results are given in Table 2. According to the results for the fresh herbage yield data, there is a multicollinearity problem in the data, as several variables (PH, SD, ADF, NDF, ADL, DMD, and DMI) fall below the tolerance value of 0.1 and above the variance inflation factor (VIF) value of 10. The conclusion that there are strong correlations between the variables is further supported by the correlation analysis results shown in Fig 2.
Fig 2 presents the correlation coefficients for the plant traits of the SSG hybrid. Furthermore, Fig 3 shows the outcomes of the Principal Component Analysis (PCA) conducted on the plant characteristics of the SSG hybrid.
Because of the multicollinearity issue, a separate test data set was used to evaluate the models' generalization and prediction abilities after they were trained using the cross-validation method. The results of the study show that the data mining (ANN, ALM, RF, and MARS) algorithms perform well on a data set with the multicollinearity problem that affects statistical models. Where multicollinearity was present, the dimensionality of the input variables was reduced using principal components prior to applying MARS, in order to enhance MARS's capacity to handle multicollinearity. Using the data on the traits of the sorghum plants, the resulting model was found to enhance the accuracy of MARS in the multicollinear situation while maintaining interpretability. The results obtained are summarized as follows.
3.1. Result of Artificial Neural Network (ANN)
The Multilayer Perceptron artificial neural network model was selected due to its suitability for the data. The optimization approach used was the scaled conjugate gradient, with 70% of the data allocated for training and 30% for testing the network. Fig 4 illustrates the connections within the ANN.
Fig 4 illustrates the artificial neural network design, where the identity function is used as the activation function in the output layer, and the hyperbolic tangent is used as the activation function in the hidden layer. Table 3 displays the parameter estimations for the ANN model.
Table 3 shows the connection weights between each neuron as follows:
The connection weight values between the input layer variables PH, SD, CPR, ADF, NDF, ADL, DMD, DMI, and RFV and H(1:1), the first neuron in the hidden layer, are -0.775, 0.194, 0.333, 0.188, -0.715, 0.293, 0.118, 0.336, and 0.848, respectively.
In the ANN model, the learning Sum of Squares Error (SSE) value was 0.783, and the relative error was 0.065. The test’s SSE was 0.051, with a relative error of 0.019. The percentage relevance of the independent variables is displayed in Table 4.
Table 4 shows that the impact values of the independent factors on fresh herbage yield (FHY) in the output layer are as follows: RFV is 0.284, NDF is 0.239, PH is 0.198, DMI is 0.091, CPR is 0.077, ADL is 0.039, SD is 0.025, ADF is 0.024, and DMD is 0.022. Fig 5 displays a percentage column graph illustrating the impact of these factors on the prediction.
Fig 5 illustrates that, in this model, RFV has the largest impact on the fresh herbage yield (FHY) of the SSG hybrid, with a normalized importance of 100%. NDF is the second most significant independent variable, with an importance of 84.2%, while PH has an effect of 69.6%. DMD had the least impact on fresh herbage yield, at 7.6%. The remaining variables were DMI (32.2%), CPR (27.3%), ADL (13.8%), SD (8.9%), and ADF (8.5%).
3.2. Automatic Linear Modeling (ALM) results
Table 5 shows the results of the model prediction coefficients, and the significance achieved when ALM was used.
The ALM method was used to assess the predictability of the mean FHY, with the key contributing factors summarized in Table 5. Notably, the SD variable was not statistically significant in the ALM analysis. Table 5 also provides parameter estimates for the overall model, showing the individual impact of each factor on the target variable. The coefficients illustrate the relationship between each predictor and the mean fresh herbage yield, assuming the other variables remain constant. The importance of each predictor, as identified by the ALM method, is also highlighted in Table 5, with standardized values summing to one. The model reached an accuracy of 89.5%, calculated by multiplying the adjusted R2 by 100. The predictor importance graph (Fig 6) further demonstrates the relative significance of each factor, with RFV (0.524), PH (0.392), and SD (0.084) emerging as the most influential, with RFV being the key predictor of FHY.
The scatter plot of FHY displays predicted values on the y-axis and observed values on the x-axis; a large share of the sample points lie on the 45-degree line, suggesting that the model is relatively accurate (Fig 7). Fig 8 shows that FHY has a positive association with PH and SD, but a negative correlation with RFV.
3.3. Random Forest (RF) algorithm results
The findings of the RF algorithm are summarized here. Fig 9 illustrates the random forest trees designed to minimize the error value.
The RF method was employed to build the model, using fresh herbage yield (FHY) as the dependent variable. The RF model incorporates various plant linear characteristics as predictors, including PH, SD, CPR, ADF, NDF, ADL, DMD, DMI, and RFV. The random forest algorithm used 500 trees. The model explains 88.87% of the variation in the dependent variable, with MSE = 160,067, RMSE = 400, MAE = 321, and Bias = 170. Neutral detergent fiber (NDF) is the most significant factor influencing the model, followed by PH and RFV (as shown in Table 6 and Fig 10).
It is also possible to estimate fresh herbage production by varying the values of the characteristic features that represent the independent variables in the equation generated by the Random Forest (RF) technique. For example, consider the following calculation: PH = 225, SD = 7, CPR = 8.5, ADF = 31, NDF = 55, ADL = 8.5, DMD = 66, DMI = 2.1, RFV = 111, resulting in FHY = 7539.289 kg.
3.4. MARS algorithm results
Table 7 presents the model estimation coefficients of the MARS approach used for predicting fresh herbage yield (FHY).
Below is a detailed equation that was developed by considering the interaction effects of the model’s coefficients.
FHY = 9380 - 234 * max(0, 231 - PH) + 194 * max(0, PH - 231)
- 504 * max(0, SD - 8.2) - 7613 * max(0, 56.5 - NDF) + 773 * max(0, RFV - 95.3)
+ 1776 * max(0, 97 - RFV) - 1072 * max(0, RFV - 97) + 91.7 * max(0, 231 - PH) * DMI
- 87.9 * max(0, PH - 231) * DMI - 174 * SD * max(0, 97 - RFV)
- 58.8 * max(0, NDF - 56.5) * ADL + 123 * max(0, 56.5 - NDF) * DMD
By adjusting the values of the characteristic features that represent the independent variables in the equation produced by the MARS algorithm, it is possible to estimate the fresh herbage yield. For instance, consider the following calculation: PH = 225, SD = 7, CPR = 8.5, ADF = 31, NDF = 55, ADL = 8.5, DMD = 66, DMI = 2.1, RFV = 111, resulting in FHY = 7053.170 kg.
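The fitted MARS equation can be evaluated directly in code. The sketch below transcribes the printed coefficients (CPR and ADF do not enter the final model); note that with the rounded coefficients the worked example evaluates to roughly 7017 kg da-1, close to the reported 7053.170 kg:

```python
def mars_fhy(PH, SD, NDF, ADL, DMD, DMI, RFV):
    """Fresh herbage yield from the MARS equation printed in the text."""
    h = lambda v: max(0.0, v)  # hinge (truncated linear) basis function
    return (9380
            - 234 * h(231 - PH) + 194 * h(PH - 231)
            - 504 * h(SD - 8.2) - 7613 * h(56.5 - NDF)
            + 773 * h(RFV - 95.3) + 1776 * h(97 - RFV) - 1072 * h(RFV - 97)
            + 91.7 * h(231 - PH) * DMI - 87.9 * h(PH - 231) * DMI
            - 174 * SD * h(97 - RFV)
            - 58.8 * h(NDF - 56.5) * ADL + 123 * h(56.5 - NDF) * DMD)

# Worked example from the text:
fhy = mars_fhy(PH=225, SD=7, NDF=55, ADL=8.5, DMD=66, DMI=2.1, RFV=111)
```

The small discrepancy from the reported figure comes from the rounding of the displayed coefficients, not from the model itself.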
Fig 11 illustrates the proportional relevance of the factors used in the MARS algorithm to forecast fresh herbage yield.
Fig 12 presents a graph comparing the observed values with the estimated values generated by the MARS algorithm.
The study found a bilateral interaction between the variables. Fig 13 illustrates the three-dimensional surface graph of the analysis findings, highlighting the connection between two predictor variables and the objective variable.
When the goodness-of-fit statistics of the MARS algorithm were evaluated, the values were R2 = 0.995, adjusted R2 = 0.991, MSE = 18961, RMSE = 137.7, MAPE = 1.488, MAE = 109.718, RAE = 0.017, and SD ratio = 0.072.
The goodness-of-fit criterion results for all methods are displayed in Table 8. Each algorithm produced quite accurate FHY forecasts. In terms of expected accuracy, MARS was found to be the best method, followed by RF, ANN, and then ALM.
4. Discussion
In the study conducted by [70], plant height exhibited a significant positive correlation with total dry matter yield, SD, CPY, CPC, DMD, and ME, while showing a significant negative correlation with panicle length, number of tillers, and ADF. Stem diameter also showed a strong positive correlation with plant height, crude protein yield, panicle length, ether extract, and total dry matter yield. [16] explored the use of the Random Forest (RF) data mining technique to analyze the relationship between various climate factors and maize yield, finding an average R-squared of 28% and an explained R-squared of 55%. Similarly, [26] applied the RF algorithm to a dataset from 1980 to 2016, creating a sorghum yield prediction model with an R-squared value of 0.71. In another study by [25], several algorithms, including Support Vector Regression, RF, Extreme Learning Machine, ANN, and DNN, were tested for wheat yield prediction in two provinces. The Deep Neural Network (DNN) performed best in the first province with an RMSE of 0.04 q/ha and an R-squared of 0.96, while RF outperformed other models in the second province with an RMSE of 0.05 q/ha. [27] applied data mining techniques like BG, DT, RF, and ANN to predict maize yield using variables such as annual average temperature, precipitation, rainy days, frosty days, and hot days. Their study found that the ANN algorithm achieved the highest accuracy (r = 0.98, relative absolute error = 21.87%, root relative squared error = 20.44%, and RMSE = 423.23), highlighting ANN’s effectiveness in yield prediction. Similarly, [71] identified temperature and precipitation as key factors affecting maize yield using comparable models. [72] also applied Random Forest regression and found that maximum temperature and precipitation were critical climate factors influencing maize yield.
The parameters influencing the fresh grass production of pea plants grown in Turkey were investigated using multivariate adaptive regression spline (MARS), Chi-square automatic interaction detection (CHAID), classification and regression tree (CART), and artificial neural network (ANN) models. The MARS approach was shown to be the most effective model for measuring plant fresh herbage yield, with the highest R2 and adjusted R2 values (0.998 and 0.986) and the lowest values of RMSE, MAPE, SD ratio, AIC, and AICc (10.499, 0.7365, 0.047, 268, and 688, respectively) [73].
The CHAID, CART, MARS, and Bagging MARS algorithms were used in the study by [74] to analyze the parameters impacting fresh herbage yield in sorghum-sudangrass hybrids. The best algorithms for predicting the dependent variable were determined to be MARS, Bagging MARS, CART, and CHAID, in that order. The MARS algorithm was shown to be the most accurate predictor of crop yield.
Another study employed multiple linear regression (MLR), artificial neural networks (ANNs), and the multivariate adaptive regression splines (MARS) method to estimate the stem weight of alfalfa plants. In the estimation of stem weight, the ANN, MARS, and MLR correlation coefficients (r) were 0.801, 0.999, and 0.753 for the Gea variety, and 0.781, 0.998, and 0.561 for the Basbag variety. The Gea variety's R2 in the same models was 0.642, 0.998, and 0.567, while the Basbag variety's was 0.610, 0.997, and 0.315. The Gea variety's MSE values were 0.023, 0.008, and 2.498, whereas the Basbag variety's were 0.151, 0.017, and 4.641. Compared to ANNs and MLR, the MARS algorithm produced a more accurate forecast. MARS > ANN > MLR was the order of the algorithms' prediction performance for alfalfa stem weight estimation [75].
[76] used donkey biometric data to examine the predictive performance of several machine learning algorithms, including CHAID, Random Forest, ALM, MARS, and Bagging MARS. With the lowest RMSE, MAPE, and SD ratio values (2.173, 1.615, and 0.291, respectively) and the highest R2 value (0.916), the MARS algorithm was determined to be the most effective model for predicting donkey body length. From best to worst, the algorithms ranked as follows: MARS > Bagging MARS > Random Forest > CHAID > ALM.
To predict egg weight from certain egg quality criteria in chickens, the following methods were employed: random forest (RF), multivariate adaptive regression splines (MARS), classification and regression trees (CART), Bagging MARS, chi-square automatic interaction detection (CHAID), and exhaustive CHAID. The correlation coefficient (r) ranged from 0.957 (CHAID) to 0.99999 (MARS and Bagging MARS). The MARS and Bagging MARS algorithms had the lowest RMSE (0.001), whereas CHAID had the highest (2.154). In terms of prediction accuracy, the algorithms ranked as MARS ≈ Bagging MARS > RF > CART > Exhaustive CHAID > CHAID [77].
The results of comparative data mining studies on different plant and livestock datasets are consistent with the findings of this study, both in the suitability of the methods and, in particular, in the MARS algorithm yielding the best results.
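The model comparisons summarized above all rest on the same goodness-of-fit statistics (R2, adjusted R2, RMSE, MAPE, SD ratio, MAE, RAE). As a minimal illustration, these can be computed from observed and predicted yields as sketched below; the function name and the exact conventions (adjusted-R2 penalty term, SD ratio as the residual-to-target standard deviation ratio) are assumptions for illustration, not the study's own code.

```python
import numpy as np

def fit_statistics(y_true, y_pred, n_params):
    """Goodness-of-fit measures commonly used to compare fitted models."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    resid = y_true - y_pred
    sse = np.sum(resid ** 2)                      # sum of squared errors
    sst = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    rmse = np.sqrt(sse / n)
    mape = 100.0 * np.mean(np.abs(resid / y_true))
    mae = np.mean(np.abs(resid))
    # RAE: absolute error relative to a mean-only predictor
    rae = np.sum(np.abs(resid)) / np.sum(np.abs(y_true - y_true.mean()))
    # SD ratio: std. dev. of residuals over std. dev. of the target
    sd_ratio = resid.std(ddof=1) / y_true.std(ddof=1)
    return {"R2": r2, "adj_R2": adj_r2, "RMSE": rmse, "MAPE": mape,
            "MAE": mae, "RAE": rae, "SD_ratio": sd_ratio}
```

Lower RMSE, MAPE, SD ratio, MAE, and RAE together with higher R2 and adjusted R2 indicate the better model, which is the criterion applied throughout the comparisons above.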
5. Conclusion
This study assessed the performance of the ANN, ALM, RF, and MARS methods in predicting the FHY of the SSG hybrid. The key findings of the study are as follows:
In the SSG hybrid, seven factors (PH, SD, ADF, NDF, DMD, DMI, and RFV) significantly influence fresh herbage yield. Among these, the most impactful factors are PH, RFV, and NDF.
The RF method accurately predicts the FHY of the SSG hybrid, accounting for 89.50% of the variation. In comparison, the ALM method achieved an accuracy of 88.87%, slightly lower than the RF method.
The factors that determine fresh herbage yield in the SSG hybrid are ranked in order of significance as follows: RFV, PH, and SD for the ALM technique; RFV, NDF, and PH for the ANN method; PH, NDF, and RFV for the MARS algorithm; and NDF, PH, and RFV for the RF algorithm. When the importance rankings of plant traits affecting fresh herbage yield in the SSG hybrid are compared, the RF, ANN, and MARS methods agree on the top three variables (RFV, NDF, and PH), differing only in their order. In the ALM method, however, SD entered the top three in place of NDF.
The performance results, from worst to best, are as follows: ALM < ANN < RF < MARS.
The results of the study show that, for datasets affected by multicollinearity, the MARS algorithm performs best among the data mining algorithms compared. Because multicollinearity is a common issue in practical applications, selecting an appropriate method is crucial. Data mining techniques and statistical models are therefore powerful tools that offer effective solutions to the multicollinearity problem.
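The multicollinearity referred to here can be diagnosed before model fitting, for example with variance inflation factors (VIF). The sketch below, using only NumPy, is an illustrative diagnostic under the common rule of thumb that VIF > 10 signals problematic collinearity; it is not part of the study's own analysis pipeline.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j
    on the remaining columns; VIF > 10 is a common collinearity flag.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus the other columns
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        sst = np.sum((y - y.mean()) ** 2)
        r2 = 1.0 - np.sum(resid ** 2) / sst
        out.append(1.0 / (1.0 - r2) if r2 < 1.0 else np.inf)
    return np.array(out)
```

When several morphological traits are near-linear combinations of one another, as is typical for traits such as NDF, DMD, and RFV, their VIFs become large, which is exactly the situation in which spline-based methods such as MARS tend to remain stable.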
It has been shown that data mining methods are highly effective at predicting target variables and uncovering the relationships among plant traits in agricultural field data.
References
- 1. Abreha KB, Enyew M, Carlsson AS, Vetukuri RR, Feyissa T, et al., Sorghum in dryland: morphological, physiological, and molecular responses of sorghum under drought stress. Planta. 2022; 255:1–23. https://doi.org/10.1007/s00425-021-03799-7
- 2. Chen X, Wu Q, Gao Y, Zhang J, Wang Y, et al., The role of deep roots in sorghum yield production under drought conditions. Agronomy. 2020; 10(4), 611. https://doi.org/10.3390/agronomy10040611
- 3. Kamali S, Mehraban A. Effects of Nitroxin and arbuscular mycorrhizal fungi on the agro-physiological traits and grain yield of sorghum (Sorghum bicolor L.) under drought stress conditions. Plos one. 2020; 15(12), e0243824. https://doi.org/10.1371/journal.pone.0243824
- 4. Schittenhelm S, Schroetter S. Comparison of drought tolerance of maize, sweet sorghum and sorghum‐sudangrass hybrids. Journal of Agronomy and Crop Science. 2014; 200(1), 46–53. https://doi.org/10.1111/jac.12039
- 5. Yahaya MA, Shimelis H. Drought stress in sorghum: Mitigation strategies, breeding methods and technologies—A review. Journal of Agronomy and Crop Science. 2022; 208(2), 127–142. https://doi.org/10.1111/jac.12573
- 6. Satyavathi CT, Solanki RK, Kakani RK, Bharadwaj C, Singhal T, et al., Genomics assisted breeding for abiotic stress tolerance in millets. Genomics Assisted Breeding of Crops for Abiotic Stress Tolerance. 2019; Vol. II, 241–255. https://doi.org/10.1007/978-3-319-99573-1_13
- 7. Kazemi E, Ganjali HR, Mehraban A, Ghasemi A. Yield and biochemical properties of grain sorghum (Sorghum bicolor L. Moench) affected by nano-fertilizer under field drought stress. Cereal Research Communications. 2021; 1–9. https://doi.org/10.1007/s42976-021-00198-2
- 8. Jabereldar AA, El Naim AM, Abdalla AA, Dagash YM. Effect of water stress on yield and water use efficiency of sorghum (Sorghum bicolor L. Moench) in semi-arid environment. International Journal of Agriculture and Forestry. 2017; 7(1), 1–6. https://doi.org/10.5923/j.ijaf.20170701.01
- 9. Naoura G, Sawadogo N, Atchozo EA, Emendack Y, Hassan MA, et al., Assessment of agro-morphological variability of dry-season sorghum cultivars in Chad as novel sources of drought tolerance. Scientific Reports. 2019; 9(1), 19581. pmid:31863053
- 10. Impa SM, Perumal R, Bean SR, Sunoj VJ, Jagadish SK. Water deficit and heat stress induced alterations in grain physico-chemical characteristics and micronutrient composition in field grown grain sorghum. Journal of Cereal Science. 2019; 86, 124–131. https://doi.org/10.1016/j.jcs.2019.01.013
- 11. Mathur S, Umakanth AV, Tonapi VA, Sharma R, Sharma MK. Sweet sorghum as biofuel feedstock: recent advances and available resources. Biotechnology for biofuels. 2017; 10, 1–19. https://doi.org/10.1186/s13068-017-0834-9
- 12. Sarshad A, Talei D, Torabi M, Rafiei F, Nejatkhah P. Morphological and biochemical responses of Sorghum bicolor (L.) Moench under drought stress. SN Applied Sciences. 2021; 3(1), 81. https://doi.org/10.1007/s42452-020-03977-4
- 13. Belayneh A, Adamowski J, Khalil B, Quilty J. Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction. Atmospheric research. 2016; 172, 37–47. https://doi.org/10.1016/j.atmosres.2015.12.017
- 14. Xu L, Chen N, Zhang X, Chen Z. An evaluation of statistical, NMME and hybrid models for drought prediction in China. Journal of hydrology. 2018; 566, 235–249. https://doi.org/10.1016/j.jhydrol.2018.09.020
- 15. Jeong J, Hong T, Ji C, Kim J, Lee M, Jeong K. Development of an integrated energy benchmark for a multi-family housing complex using district heating. Applied energy. 2016; 179, 1048–1061. https://doi.org/10.1016/j.apenergy.2016.07.086
- 16. Biswas B, Singh J. Assessing yield-weather relationships in kharif maize under Punjab conditions using data mining method. J. Agrometeorology. 2020; 22, 104–111.
- 17. Ropelewska E. Effect of boiling on classification performance of potatoes determined by computer vision. European Food Research and Technology. 2021; 247(4), 807–817. https://doi.org/10.1007/s00217-020-03664-z
- 18. Van Klompenburg T, Kassahun A, Catal C. Crop yield prediction using machine learning: A systematic literature review. Computers and Electronics in Agriculture. 2020; 177, 105709. https://doi.org/10.1016/j.compag.2020.105709
- 19. Mohammed S, Elbeltagi A, Bashir B, Alsafadi K, Alsilibe F, et al., A comparative analysis of data mining techniques for agricultural and hydrological drought prediction in the eastern Mediterranean. Computers and Electronics in Agriculture. 2022; 197, 106925. https://doi.org/10.1016/j.compag.2022.106925
- 20. Oguntunde PG, Lischeid G, Dietrich O. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis. International journal of biometeorology. 2018; 62(3), 459–469. pmid:29032432
- 21. Majumdar J, Naraseeyappa S, Ankalaki S. Analysis of agriculture data using data mining techniques: application of big data. Journal of Big Data. 2017; 4(1), 20. https://doi.org/10.1186/s40537-017-0077-4
- 22. Pimentel BS, Gonzalez ES, Barbosa GN. Decision-support models for sustainable mining networks: fundamentals and challenges. Journal of Cleaner Production. 2016; 112, 2145–2157. https://doi.org/10.1016/j.jclepro.2015.09.023
- 23. Chlingaryan A, Sukkarieh S, Whelan B. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Computers and electronics in agriculture. 2018; 151, 61–69. https://doi.org/10.1016/j.compag.2018.05.012
- 24. Paudel D, Boogaard H, de Wit A, Janssen S, Osinga S, et al., Machine learning for large-scale crop yield forecasting. Agricultural Systems. 2021; 187, 103016. https://doi.org/10.1016/j.agsy.2020.103016
- 25. Chergui N. Durum wheat yield forecasting using machine learning. Artificial Intelligence in Agriculture. 2022; 6, 156–166. https://doi.org/10.1016/j.aiia.2022.09.003
- 26. Hoffman AL, Kemanian AR, Forest CE. The response of maize, sorghum, and soybean yield to growing-phase climate revealed with machine learning. Environmental Research Letters. 2020; 15(9), 094013. https://doi.org/10.1088/1748-9326/ab7b22
- 27. Harsányi E, Bashir B, Arshad S, Ocwa A, Vad A, et al., Data mining and machine learning algorithms for optimizing maize yield forecasting in central Europe. Agronomy. 2023; 13(5), 1297. https://doi.org/10.3390/agronomy13051297
- 28. Ropelewska E, Nazari L. The effect of drought stress of sorghum grains on the textural features evaluated using machine learning. European Food Research and Technology. 2021; 247(11), 2787–2798. https://doi.org/10.1007/s00217-021-03832-9
- 29. Habyarimana E, Baloch FS. Machine learning models based on remote and proximal sensing as potential methods for in-season biomass yields prediction in commercial sorghum fields. Plos one. 2021; 16(3), e0249136. pmid:33765103
- 30. Oddy VH, Robards GE, Low SG. Prediction of in vivo dry matter digestibility from the fibre and nitrogen content of a feed. 1983; 395–398.
- 31. Sheaffer CC, Peterson MA, Mccalin M, Volene JJ. (eds.). Acid detergent fiber, neutral detergent fiber concentration and relative feed value. In North American Alfalfa Improvement Conference, Minneapolis; 1995.
- 32. Sutton CD. Classification and Regression Trees, Bagging and Boosting. Handbook of Statistics 24: Data Mining and Data Visualization, Ed. by Rao CR, Wegman EJ, Solka JL, Elsevier B.V. 2005; 303–329. https://doi.org/10.1016/S0169-7161(04)24011-1
- 33. Yıldırım H. The Multicollinearity Effect on the Performance of Machine Learning Algorithms: Case Examples in Healthcare Modelling. Academic Platform Journal of Engineering and Smart Systems (APJESS). 2024; 12(3), 68–80. https://doi.org/10.21541/apjess.1371070
- 34. Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. John Wiley & Sons; 2021.
- 35. Akhlyustin SB, Melnikov AV, Zhilin RA. Prediction of the Integrated Indicator of Quality of a New Object Under the Conditions of Multicollinearity of Reference Data. Bulletin of the South Ural State University. Ser. Mathematical Modelling, Programming & Computer Software (Bulletin SUSU MMCS). 2020; 13(4), 66–80. https://doi.org/10.14529/mmp200406
- 36. Moayedi H, Mosallanezhad M, Rashid ASA, Jusoh WAW, Muazu MA. A systematic review and meta-analysis of artificial neural network application in geotechnical engineering: theory and applications. Neural Computing and Applications. 2020; 32, 495–518. https://doi.org/10.1007/s00521-019-04109-9
- 37. Amaral HLMD. Desenvolvimento de uma nova metodologia para previsão do consumo de energia elétrica de curto prazo utilizando redes neurais artificiais e decomposição de séries temporais [Development of a new methodology for short-term electricity consumption forecasting using artificial neural networks and time series decomposition]. (Doctoral dissertation, Universidade de São Paulo); 2019. https://doi.org/10.11606/T.3.2020.tde-07022020-113308
- 38. Komarica J, Glavić D, Kaplanović S. Comparative Analysis of the Predictive Performance of an ANN and Logistic Regression for the Acceptability of Eco-Mobility Using the Belgrade Data Set. Data. 2024; 9(5), 73. https://doi.org/10.3390/data9050073
- 39. Breiman L. Random forests. Machine learning. 2001; 45, 5–32. https://doi.org/10.1023/A:1010933404324
- 40. Vijayakumar V, Case M, Shirinpour S, He B. Quantifying and characterizing tonic thermal pain across subjects from EEG data using random forest models. IEEE Transactions on Biomedical Engineering. 2017; 64(12), 2988–2996. pmid:28952933
- 41. Breiman L. Bagging predictors. Machine learning. 1996; 24, 123–140. https://doi.org/10.1007/BF00058655
- 42. Du H, Ke S, Zhang W, Qi D, Sun T. Rapid quantitative analysis of coal composition using laser-induced breakdown spectroscopy coupled with random forest algorithm. Analytical Sciences. 2024; 1–14. https://doi.org/10.1007/s44211-024-00610-x
- 43. Meng F, Shi Z, Song Y. The TPRF: A Novel Soft Sensing Method of Alumina–Silica Ratio in Red Mud Based on TPE and Random Forest Algorithm. Processes. 2024; 12(4), 663. https://doi.org/10.3390/pr12040663
- 44. Breiman L, Cutler A. Random Forests. 2005; Available online: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter (accessed on 15 August 2024).
- 45. Yang SI, Brandeis TJ, Helmer EH, Oatham MP, Heartsill-Scalley T, Marcano-Vega H. Characterizing height-diameter relationships for Caribbean trees using mixed-effects random forest algorithm. Forest Ecology and Management. 2022; 524, 120507. https://doi.org/10.1016/j.foreco.2022.120507
- 46. Wright MN, Ziegler A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. arXiv preprint arXiv:1508.04409. 2015; https://doi.org/10.48550/arXiv.1508.04409
- 47. Friedman JH. Multivariate adaptive regression splines. The annals of statistics. 1991; 19(1), 1–67. https://doi.org/10.1214/aos/1176347963
- 48. Deconinck E, Coomans D, Vander Heyden Y. Exploration of linear modelling techniques and their combination with multivariate adaptive regression splines to predict gastro-intestinal absorption of drugs. Journal of pharmaceutical and biomedical analysis. 2007; 43(1), 119–130. pmid:16859855
- 49. Jalali-Heravi M, Asadollahi-Baboli M, Mani-Varnosfaderani A. Shuffling multivariate adaptive regression splines and adaptive neuro-fuzzy inference system as tools for QSAR study of SARS inhibitors. Journal of Pharmaceutical and Biomedical Analysis. 2009; 50(5), 853–860. pmid:19665859
- 50. Ju X, Chen VC, Rosenberger JM, Liu F. Fast knot optimization for multivariate adaptive regression splines using hill climbing methods. Expert systems with applications. 2021; 171, 114565. https://doi.org/10.1016/j.eswa.2021.114565
- 51. Stevens JG. An investigation of multivariate adaptive regression splines for modeling and analysis of univariate and semi-multivariate time series systems. (Doctoral dissertation, Monterey, California. Naval Postgraduate School); 1991.
- 52. Kriner M. Survival Analysis with Multivariate Adaptive Regression Splines, Dissertation, LMU München, Faculty of Mathematics, Computer Science and Statistics; 2007.
- 53. Craven P, Wahba G. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische mathematik. 1978; 31(4), 377–403. https://doi.org/10.1007/BF01404567
- 54. Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York: Springer; 2009. https://doi.org/10.1007/978-0-387-21606-5
- 55. Kornacki J, Ćwik J. Statistical learning systems (in Polish). WNT Warsaw. 2005; 16.
- 56. Yang H. The case for being automatic: introducing the automatic linear modeling (LINEAR) procedure in SPSS statistics. Multiple Linear Regression Viewpoints. 2013; 39(2), 27–37.
- 57. Genç S, Mendeş M. Evaluating performance and determining optimum sample size for regression tree and automatic linear modeling. Arquivo Brasileiro de Medicina Veterinária e Zootecnia. 2021; 73, 1391–1402. http://dx.doi.org/10.1590/1678-4162-12413
- 58. Mendeş M. Re-evaluating the Monte Carlo simulation results by using graphical techniques. Türkiye Klinikleri Biyoistatistik. 2021; 13(1), 28–38. https://doi.org/10.5336/biostatic.2020-78896
- 59. Bevilacqua M, Braglia M, Montanari R. The classification and regression tree approach to pump failure rate analysis. Reliability Engineering & System Safety. 2003; 79(1), 59–67. https://doi.org/10.1016/S0951-8320(02)00180-1
- 60. Larsen DR, Speckman PL. Multivariate regression trees for analysis of abundance data. Biometrics. 2004; 60(2), 543–549. pmid:15180683
- 61. Xu W, Tu J, Xu N, Liu Z. Predicting daily heating energy consumption in residential buildings through integration of random forest model and meta-heuristic algorithms. Energy. 2024; 301, 131726. https://doi.org/10.1016/j.energy.2024.131726
- 62. Khajavi H, Rastgoo A. Predicting the carbon dioxide emission caused by road transport using a Random Forest (RF) model combined by Meta-Heuristic Algorithms. Sustainable Cities and Society. 2023; 93, 104503. https://doi.org/10.1016/j.scs.2023.104503
- 63. Zhussupbekov M, Memon SA, Khawaja SA, Nazir K, Kim J. Forecasting energy demand of PCM integrated residential buildings: A machine learning approach. Journal of Building Engineering. 2023; 70, 106335. https://doi.org/10.1016/j.jobe.2023.106335
- 64. Sadaghat B, Ebrahimi SA, Souri O, Niar MY, Akbarzadeh MR. Evaluating strength properties of Eco-friendly Seashell-Containing Concrete: Comparative analysis of hybrid and ensemble boosting methods based on environmental effects of seashell usage. Engineering Applications of Artificial Intelligence. 2024; 133, 108388. https://doi.org/10.1016/j.engappai.2024.108388
- 65. Willmott CJ, Matsuura K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate research. 2005; 30(1), 79–82. https://doi.org/10.3354/cr030079
- 66. Liddle AR. Information criteria for astrophysical model selection. Monthly Notices of the Royal Astronomical Society: Letters. 2007; 377(1), L74–L78. https://doi.org/10.1111/j.1745-3933.2007.00306.x
- 67. Takma C, Atil H, Aksakal V. Comparison of multiple linear regression and artificial neural network models goodness of fit to lactation milk yields. Kafkas Üniversitesi Veteriner Fakültesi Dergisi. 2012; 18:941–944. https://doi.org/10.9775/kvfd.2012.6764
- 68. Chen C, Twycross J, Garibaldi JM. A new accuracy measure based on bounded relative error for time series forecasting. PloS one. 2017; 12(3), e0174202. pmid:28339480
- 69. R Core Team. R: A Language and environment for statistical computing. (Version 4.1) [Computer software]. 2021; Retrieved from https://cran.r-project.org. (R packages retrieved from MRAN snapshot 2022-01-01).
- 70. Khalilian ME, Habibi D, Golzardi F, Aghayari F, Khazaei A. Effect of maturity stage on yield, morphological characteristics, and feed value of sorghum [Sorghum bicolor (L.) Moench] cultivars. Cereal Research Communications. 2022; 50(4), 1095–1104. https://doi.org/10.1007/s42976-022-00244-7
- 71. Shahhosseini M, Martinez-Feria RA, Hu G, Archontoulis SV. Maize yield and nitrate loss prediction with machine learning algorithms. Environmental Research Letters. 2019; 14(12), 124026. http://doi.org/10.1088/1748-9326/ab5268
- 72. Meng L, Liu HL, Ustin S, Zhang X. Predicting maize yield at the plot scale of different fertilizer systems by multi-source data and machine learning methods. Remote Sensing. 2021; 13(18), 3760. https://doi.org/10.3390/rs13183760
- 73. Çatal MI, Çelik Ş, Bakoğlu A. Investigation of factors affecting fresh herbage yield in pea (Pisum arvense L.) using data mining algorithms. Front. Plant Sci. 2024; 15, 1482723. pmid:39634062
- 74. Çelik Ş, Tutar H, Gönülal E, Er H. Prediction of fresh herbage yield using data mining techniques with limited plant quality parameters. Scientific Reports. 2024; 14, 21396. pmid:39271726
- 75. Çelik Ş, Çaçan E, Yaryab S. Prediction of Stem Weight in Selected Alfalfa Varieties by Artificial Neural Networks, Multivariate Adaptive Regression Splines and Multiple Regression Analysis. Journal of Animal & Plant Sciences. 2023; 33(4), 1006–1020. https://doi.org/10.36899/JAPS.2023.4.0694
- 76. Çelik Ş, Yılmaz O. Investigation of the Relationships between Coat Colour, Sex, and Morphological Characteristics in Donkeys Using Data Mining Algorithms. Animals. 2023; 13(14), 2366. pmid:37508143
- 77. Tyasi TL, Çelik Ş. Investigation of Egg Quality Characteristics Affecting Egg Weight of Lohmann Brown Hen with Data Mining Methods. Poultry Science Journal. 2024; 12(1), 107–117. https://doi.org/10.22069/psj.2024.21337.1934