Machine learning of factors for improving oyster hatchery production

doi:10.1371/journal.pone.0345084

Table 1.

Summary of the response variable (yield), original predictors, and winter averages.

Table 2.

Groups of variables for aggregating SHAP values.

Fig 1.

Model performance.

Root mean square error (RMSE) from cross-validation, used to compare alternative model options. The model labels are: GAM_PCA_All, generalized additive model (GAM) fitted using all derived principal components; GAM_PCA_Stepwise, GAM fitted using principal components chosen by stepwise model selection; NN_AllVars, neural network using all predictor variables; RF_AllVars, random forest using all predictor variables; RF_Boruta, random forest with feature selection using the Boruta algorithm; RF_BorutaRec, random forest with recursive Boruta feature selection; RF_Inter, random forest with interaction-based feature selection; and RF_InterRec, random forest with recursive interaction-based feature selection. Each boxplot shows the distribution of test-set RMSE for the respective model.
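
As a rough illustration of how per-fold test RMSE values like those summarized in the boxplots could be produced, the sketch below runs a plain k-fold cross-validation in standard-library Python. The fold count, the yield values, and the `mean_model` baseline are all hypothetical stand-ins, not the authors' data, models, or resampling scheme.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cv_rmse(y, fit_predict, k=5, seed=0):
    """Return one test-set RMSE per fold (the values a boxplot would summarize)."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(y), k, seed):
        preds = fit_predict(train_idx, test_idx)
        scores.append(rmse([y[j] for j in test_idx], preds))
    return scores


# Hypothetical yields (%), for illustration only.
yields = [12.0, 30.5, 8.2, 45.1, 22.3, 17.8, 39.9, 5.4, 27.6, 33.0]


def mean_model(train_idx, test_idx):
    """Trivial placeholder model: predict the training-fold mean for every test case."""
    m = mean(yields[j] for j in train_idx)
    return [m] * len(test_idx)


fold_scores = cv_rmse(yields, mean_model, k=5)
print(fold_scores)
```

Swapping `mean_model` for a function that fits a real learner on `train_idx` and predicts on `test_idx` would give one list of fold scores per model option, which can then be drawn side by side.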

Fig 2.

Feature importance.

Shapley-based feature importance scores for the models estimated on the whole dataset (higher values correspond to more important variables): (a) random forest and (b) neural network. The horizontal axis shows the average absolute impact of each predictor on the model output (hatchery yield, %). Only the most important predictors are shown.
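
The "average absolute impact" on the horizontal axis corresponds to the usual global SHAP importance: the mean of the absolute per-case SHAP values for each predictor. A minimal sketch of that calculation follows; the predictor names `x1` to `x3` and the SHAP values are invented for illustration and do not come from the study.

```python
from statistics import mean


def mean_abs_importance(shap_rows, names, top_n=None):
    """Rank predictors by mean absolute SHAP value.

    shap_rows: list of per-case lists, one SHAP value per predictor.
    names: predictor names in the same column order.
    top_n: optionally keep only the most important predictors.
    """
    scores = {
        name: mean(abs(row[j]) for row in shap_rows)
        for j, name in enumerate(names)
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n] if top_n is not None else ranked


# Hypothetical SHAP values (in yield percentage points) for three cases.
names = ["x1", "x2", "x3"]
shap_rows = [
    [2.0, -0.5, 0.1],
    [-4.0, 0.5, 0.0],
    [3.0, -1.5, -0.2],
]

top = mean_abs_importance(shap_rows, names, top_n=2)
print(top)
```

Because the absolute value is taken before averaging, a predictor that pushes yield up in some cases and down in others still ranks as important.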

Fig 3.

Contributions of predictors to well-predicted low-yield cases.

Individual predictor contributions to three of the cases are shown for the random forest model (top row) and the neural network model (bottom row). The baseline is the average predicted value across all cases in the dataset; each panel also marks the predicted value for the specific case. Arrows indicate the largest impacts of individual and grouped predictors, with less influential predictors aggregated.
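
Plots of this kind rest on the additivity of Shapley values: the baseline plus the sum of all predictor contributions equals the prediction for that case, so contributions can be summed within groups or into a remainder without changing the total. The sketch below illustrates this under the assumption that aggregation is done by summation; the function name, the predictor names, the grouping, and the numbers are hypothetical.

```python
def waterfall_terms(baseline, contributions, groups=None, top_k=3):
    """Collapse per-predictor contributions into grouped terms plus an 'other' remainder.

    contributions: dict of predictor name -> additive contribution for one case.
    groups: optional dict of group name -> list of member predictor names.
    Returns (terms to display, predicted value for the case).
    """
    groups = groups or {}
    member_to_group = {m: g for g, members in groups.items() for m in members}

    # Sum contributions within each group; ungrouped predictors stay as they are.
    terms = {}
    for name, value in contributions.items():
        key = member_to_group.get(name, name)
        terms[key] = terms.get(key, 0.0) + value

    # Keep the top_k largest terms by magnitude and pool the rest.
    ranked = sorted(terms.items(), key=lambda kv: abs(kv[1]), reverse=True)
    shown = dict(ranked[:top_k])
    if len(ranked) > top_k:
        shown["other"] = sum(v for _, v in ranked[top_k:])

    prediction = baseline + sum(contributions.values())
    return shown, prediction


# Hypothetical low-yield case: baseline 20% yield, five predictors a-e.
shown, prediction = waterfall_terms(
    baseline=20.0,
    contributions={"a": -6.0, "b": -3.0, "c": 1.0, "d": -0.5, "e": 0.25},
    groups={"ab": ["a", "b"]},
    top_k=2,
)
print(shown, prediction)
```

Whatever grouping or cutoff is chosen, the displayed terms still sum from the baseline to the same predicted value.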

Fig 4.

Contributions of predictors to well-predicted high-yield cases.

Individual predictor contributions to three of the cases are shown for the random forest model (top row) and the neural network model (bottom row). The baseline is the average predicted value across all cases in the dataset; each panel also marks the predicted value for the specific case. Arrows indicate the largest impacts of individual and grouped predictors, with less influential predictors aggregated.
