Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness | PLOS One

Fig 1.

a) Location of the study region in the eastern Joseph Bonaparte Gulf, northern Australian marine margin overlaid with bathymetry; b) location of the four study areas (A, B, C, and D) in the study region and seabed hardness types (hard, hard-soft, soft-hard and soft) based on hard90 overlaid with bathymetry at video transect; and c) the geomorphic features of the four study areas.

Table 1.

Predictive variables and their corresponding numbers.

Table 2.

Spearman correlation coefficients (ρ) among 20 predictive variables and seabed hardness (i.e. hard total) (n = 140).

Table 3.

A brief summary of the RF modelling process for hard90 data using various FS methods and predictive variables.
1) models 1–25 based on the VI using 20 variables; 2) models 26–29 based on the AVI using 20 variables; 3) models 30–31 based on KIAVI using 20 variables; 4) models 32–43 based on the AVI using 41 variables; and 5) models 44–45 based on the Boruta and model 46 based on the RRF using 41 variables. Model.fit is the predictive accuracy (ccr) of training samples by each RF model developed. The corresponding predictor for each number is listed in Table 1.

Fig 2.

Correct classification rate (%) and kappa (mean: black line; minimum and maximum: dashed red lines) of 43 RF models with different predictor sets based on the averages over 100 iterations of 10-fold cross validation for seabed hardness based on hard90 data; and the model with the maximum mean ccr and mean kappa (circle).
a) models 1–25 based on the VI using 20 predictive variables; b) models 26–29 based on the AVI and models 30–31 based on KIAVI using 20 variables; c) models 32–43 based on the AVI using 41 variables.

Table 4.

Confusion matrix between the observed and predicted values of four hardness classes based on the average of 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 40) for hard90.
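
The two accuracy measures reported alongside these confusion matrices, correct classification rate (ccr) and kappa, can both be derived from a confusion matrix. The following is a minimal sketch, assuming kappa here means Cohen's kappa and that rows hold observed classes and columns hold predicted classes. The function name `ccr_and_kappa` and the example counts are hypothetical and are not taken from Table 4.

```python
from typing import List, Tuple


def ccr_and_kappa(cm: List[List[float]]) -> Tuple[float, float]:
    """Return (ccr in %, Cohen's kappa) for a square confusion matrix.

    Assumption: cm[i][j] counts samples observed as class i and
    predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return 100.0 * observed, kappa


# Hypothetical two-class counts for illustration only (not data from the paper).
example_cm = [
    [40, 10],
    [5, 45],
]
ccr, kappa = ccr_and_kappa(example_cm)
print(f"ccr = {ccr:.1f}%, kappa = {kappa:.2f}")
```

The same function accepts a four-class matrix unchanged, since it only relies on the diagonal and on the row and column totals.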

Table 5.

Confusion matrix between the observed and predicted values of two hardness classes based on the average of 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 40) for hard90.

Fig 3.

Correct classification rate (%) and kappa (mean: black line; minimum and maximum: dashed red lines) of 49 RF models with different predictor sets based on the averages over 100 iterations of 10-fold cross validation for seabed hardness based on hard70 data; and the model with the maximum mean ccr and mean kappa (circle).
a) models 1–25 based on the AVI using 20 predictive variables; b) models 26–38 based on the AVI using 41 variables; c) models 39–49 based on KIAVI using 41 variables.

Table 6.

A brief summary of the RF modelling process for hard70 data using various FS methods and predictive variables.
1) models 1–25 based on the AVI using 20 variables; 2) models 26–38 based on the AVI using 41 variables; 3) models 39–49 based on KIAVI using 41 variables; and 4) models 50–52 based on the Boruta with the maximal number of importance source runs of 2000, 100 and 5000, and model 53 based on the RRF using 41 variables. The model fit is the predictive accuracy (ccr) of training samples by each RF model developed. The corresponding predictor for each number is listed in Table 1.

Table 7.

Confusion matrix between the observed and predicted values of four hardness classes based on the average of 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 50) for hard70.

Table 8.

Confusion matrix between the observed and predicted values of two hardness classes based on the average of 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 50) for hard70.

Fig 4.

Correct classification rate (%) (a) and kappa (b) of the most accurate models based on the averages over 100 iterations of 10-fold cross validation for hard90 and hard70 data.

Table 9.

Comparison of the accuracy of full models (i.e. model 43 for hard90 and model 26 for hard70) with the most accurate models based on various FS methods.
The differences in these comparisons were tested using Mann-Whitney tests (n = 100 for each model).

Table 10.

Comparison of the accuracy of the most accurate models (i.e. model 40 for hard90 and model 50 for hard70) with the most accurate models based on various FS techniques, and also of model 40 with model 50.
The differences in these comparisons were tested using Mann-Whitney tests (n = 100 for each model).
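
As a rough illustration of the Mann-Whitney comparison named in these captions, the sketch below computes the U statistics for two samples of accuracy values, using average ranks for ties. It is an assumption-laden sketch rather than the authors' procedure: the function name `mann_whitney_u` and the sample values are hypothetical, and the p-value step (normal approximation or exact tables) is deliberately left out.

```python
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    return u_x, u_y


# Hypothetical ccr values (%) from repeated cross validation of two models.
model_a = [85.0, 86.4, 84.3, 87.1, 85.7]
model_b = [82.1, 83.6, 84.3, 81.4, 82.9]
u_a, u_b = mann_whitney_u(model_a, model_b)
print(u_a, u_b)
```

In the captions each model contributes n = 100 accuracy values, one per iteration of 10-fold cross validation; the five values per model above are only there to keep the example short.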

Table 11.

Confusion matrix between predictions for individual classes based on hard90 and hard70 data for all study areas and for a portion of area A (A1).

Fig 5.

Spatial predictions of seabed hardness for a section of area A (A1): a) hard90, b) hard70, c) hardness with two classes, and d) geomorphic features.