Table 1.
Target variables.
Fig 1.
Distribution of examples among the classes of Question 25d in the US-FD1W-25(d) dataset after the KNN imputation method is applied.
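As a rough illustration of KNN imputation, the sketch below fills each missing answer with the mean of that feature over the k nearest rows. It borrows the distance from scikit-learn's `KNNImputer` (Euclidean distance over the features observed in both rows, rescaled by total/shared features). The function name `knn_impute`, the choice of k, and the absence of rounding for ordinal answers are assumptions, since the captions do not state the paper's configuration.

```python
import math

def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper (not from the paper). Distance: Euclidean over the
    coordinates observed in both rows, scaled by total/shared feature count,
    in the spirit of scikit-learn's nan_euclidean metric. Numeric features only.
    """
    n_features = len(rows[0])
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j in (c for c, v in enumerate(row) if v is None):
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row that observes feature j.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(n_features)
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                dist = math.sqrt(sq * n_features / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda t: t[0])
            neighbours = candidates[:k]
            if neighbours:
                # Uniform weighting: plain mean of the neighbours' values.
                filled[i][j] = sum(v for _, v in neighbours) / len(neighbours)
    return filled
```

For example, `knn_impute([[1, 2], [1, None], [1, 4], [10, 100]], k=2)` fills the gap with the mean of the two closest donors (2 and 4), giving 3.0; the distant row with value 100 is ignored.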
Fig 2.
Random Forest accuracy scores for different numbers of trees with a maximum depth of 20 on the US-FD1W-25(d) dataset, before and after SMOTE is applied.
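SMOTE balances classes by creating synthetic minority examples on the line segment between a real example and one of its nearest same-class neighbours. The sketch below is a minimal stdlib version (the name `smote` and its defaults are illustrative, not the paper's code); it oversamples every class up to the majority count. Following the later captions, it would be applied to the training set only, so that no synthetic points leak into the evaluation data.

```python
import random

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(X, y, k=5, seed=0):
    """Oversample every smaller class up to the majority-class size.

    Minimal SMOTE sketch: each synthetic point = base + gap * (neighbour - base),
    with gap drawn uniformly from [0, 1) and neighbour chosen among the base
    point's k nearest same-class neighbours. Classes with a single example
    are left as they are, because they have no neighbour to interpolate with.
    """
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(list(xi))
    target = max(len(v) for v in by_class.values())
    X_out, y_out = [list(xi) for xi in X], list(y)
    for label, samples in by_class.items():
        if len(samples) < 2:
            continue
        kk = min(k, len(samples) - 1)
        for _ in range(target - len(samples)):
            base = rng.choice(samples)
            # Nearest same-class neighbours, excluding the base point itself.
            neighbours = sorted((s for s in samples if s is not base),
                                key=lambda s: _sq_dist(base, s))[:kk]
            nb = rng.choice(neighbours)
            gap = rng.random()
            X_out.append([b + gap * (n - b) for b, n in zip(base, nb)])
            y_out.append(label)
    return X_out, y_out
```

Because synthetic points are interpolations, they stay inside the region spanned by the minority class, which is why SMOTE changes the class balance without simply duplicating rows.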
Table 2.
Maximum, minimum, and average accuracy scores of Random Forests with the number of trees tuned between 2 and 500, before and after SMOTE is applied.
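The summary in Table 2 amounts to sweeping the number of trees and reducing the resulting accuracies to a maximum, a minimum, and a mean. The sketch below shows that loop; `evaluate` is a hypothetical callback that would train a forest with the given number of trees and return its accuracy, and the toy scores in the test are made up for illustration only.

```python
from statistics import mean

def summarize_tuning(evaluate, n_trees_grid):
    """Score one model per number of trees; report max, min and mean accuracy.

    `evaluate(n)` is a hypothetical callback returning the accuracy of a
    model trained with n trees. Max and min are returned with the tree
    count that achieved them.
    """
    scores = {n: evaluate(n) for n in n_trees_grid}
    best_n = max(scores, key=scores.get)
    worst_n = min(scores, key=scores.get)
    return {
        "max": (best_n, scores[best_n]),
        "min": (worst_n, scores[worst_n]),
        "mean": mean(list(scores.values())),
    }
```

With a real model, the grid from the caption would be `range(2, 501)`. Reporting the spread, not just the best score, shows how sensitive accuracy is to this single hyperparameter.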
Fig 3.
SHAP values of a Random Forest with 364 trees and a maximum depth of 20, before SMOTE is applied.
Class 0: Never, Class 1: Rarely, Class 2: Sometimes, Class 3: Often, Class 4: Always.
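SHAP values are Shapley values: each feature's contribution is its marginal effect on the model output, averaged over all coalitions of the other features. The captions do not say which SHAP explainer was used (tree models are commonly explained with a tree-specific algorithm), so the sketch below instead computes exact Shapley values by brute force, replacing "absent" features with a single baseline vector. That baseline convention is an assumption, and the O(2^n) cost limits this to a handful of features. For a multiclass model, `f` would be the predicted probability of one class, which is why the figures show one set of values per class.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of model output f at instance x.

    Features outside a coalition take their baseline value. The values
    satisfy sum(phi) == f(x) - f(baseline) (efficiency property).
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                s = set(S)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear model the result reduces to weight times (value minus baseline) per feature; with an interaction such as `z[0] * z[1]`, the credit is split evenly between the two features.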
Fig 4.
SHAP values of a Random Forest with 40 trees and a maximum depth of 20, after SMOTE is applied.
Class 0: Never, Class 1: Rarely, Class 2: Sometimes, Class 3: Often, Class 4: Always.
Fig 5.
Hyperparameter tuning: Random Forest accuracy scores for different numbers of trees on the US-FD1W-25(d) dataset after the KNN imputation method and SMOTE are applied.
Fig 6.
Hyperparameter tuning: XGBoost accuracy scores for different numbers of trees on the US-FD1W-25(d) dataset after the KNN imputation method and SMOTE are applied.
Fig 7.
SHAP values of an XGBoost model with 178 trees and a maximum depth of 8, after SMOTE is applied.
Class 0: Never, Class 1: Rarely, Class 2: Sometimes, Class 3: Often, Class 4: Always.
Fig 8.
CatBoost accuracy scores for different numbers of trees with a maximum depth of 8 on the US-FD1W-25(d) dataset, before and after SMOTE is applied.
Fig 9.
LightGBM accuracy scores for different numbers of trees with a maximum depth of 20 on the US-FD1W-25(d) dataset, before and after SMOTE is applied.
Fig 10.
Accuracy scores of multinomial logistic regression, SVM, neural networks with different numbers of hidden layers, Random Forest, XGBoost, CatBoost, and LightGBM, before and after SMOTE is applied, for the multiclass classification problem with 5 classes.
Table 3.
Precision, recall, and F1-score of a Random Forest with 40 trees and a maximum depth of 20, after SMOTE is applied to the training set.
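Per-class precision, recall, and F1-score follow from counting true positives, false positives, and false negatives for each class in turn (one-vs-rest). The sketch below computes them, plus unweighted (macro) averages, with plain Python; the name `per_class_scores` is illustrative, and returning 0.0 when a denominator is zero mirrors a common convention rather than anything stated in the paper.

```python
def per_class_scores(y_true, y_pred):
    """Per-class (precision, recall, f1) tuples and their macro average."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = (precision, recall, f1)
    # Macro average: unweighted mean of each metric over the classes.
    report["macro"] = tuple(
        sum(report[c][m] for c in labels) / len(labels) for m in range(3))
    return report
```

Unlike overall accuracy, these per-class figures expose a class that is never predicted correctly (all three scores 0.0), which matters for imbalanced survey answers.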
Table 4.
Construction of the three classes.
Fig 11.
Accuracy scores of multinomial logistic regression, SVM, neural networks with different numbers of hidden layers, Random Forest, XGBoost, CatBoost, and LightGBM, before and after SMOTE is applied to the training set, for the multiclass classification problem with 3 classes.
Fig 12.
SHAP values of an XGBoost model with 38 trees and a maximum depth of 8, after SMOTE is applied to the training set.
Class 0: Never or Rarely; Class 1: Sometimes; Class 2: Often or Always.
Table 5.
Construction of the two classes.
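The coarser targets in Tables 4 and 5 come from merging the five answer categories: {Never, Rarely}, {Sometimes}, {Often, Always} for three classes, and {Never, Rarely} versus {Sometimes, Often, Always} for two classes, as the figure legends show. The sketch below expresses both groupings as lookup tables; the constant names are illustrative, and encoding the second binary group as label 1 is an assumption.

```python
# Five-point answers as encoded in the figure legends (Class 0..4).
NEVER, RARELY, SOMETIMES, OFTEN, ALWAYS = range(5)

# Three classes: {Never, Rarely}, {Sometimes}, {Often, Always}.
THREE_CLASS = {NEVER: 0, RARELY: 0, SOMETIMES: 1, OFTEN: 2, ALWAYS: 2}

# Two classes: {Never, Rarely} vs {Sometimes, Often, Always}.
TWO_CLASS = {NEVER: 0, RARELY: 0, SOMETIMES: 1, OFTEN: 1, ALWAYS: 1}

def merge_classes(labels, mapping):
    """Relabel five-class answers with a coarser grouping."""
    return [mapping[y] for y in labels]
```

Merging neighbouring ordinal categories enlarges the rarest classes, which is one way to reduce the imbalance that SMOTE otherwise has to compensate for.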
Fig 13.
SHAP values of a Random Forest with 9 trees and a maximum depth of 10, before SMOTE is applied to the training set.
Class 0: Never or Rarely; Class 1: Sometimes, Often, or Always.
Fig 14.
SHAP values of a Random Forest with 9 trees and a maximum depth of 10, after SMOTE is applied to the training set.
Class 0: Never or Rarely; Class 1: Sometimes, Often, or Always.
Fig 15.
SHAP values showing the impact of each feature on the model output.
Table 6.
Precision, recall, and F1-score of a Random Forest with 9 trees and a maximum depth of 10, after SMOTE is applied to the training set.