Table 1.

Accession numbers of HVR I sequences retrieved from GenBank database.


Fig 1.

Framework for machine learning.

Data is collected from a database and undergoes pre-processing, such as one-hot encoding, to transform it and improve its quality. The resulting data is split into a training set and a testing set. The training set is used to train the ML model, while the testing set is used to evaluate the model and make predictions.
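The workflow in this caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy sequences and labels are invented, the helper names (`one_hot_encode`, `train_test_split`, `fit_centroids`, `predict`) are hypothetical, and a nearest-centroid classifier stands in for the ML model so that the example needs no external libraries.

```python
import random

NUCLEOTIDES = "ACGT"

def one_hot_encode(sequence):
    """Flatten a DNA string into a 0/1 vector, four positions per base."""
    vector = []
    for base in sequence.upper():
        # Symbols outside A/C/G/T (e.g. N or gaps) become four zeros.
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector

def train_test_split(rows, labels, test_fraction=0.25, seed=0):
    """Shuffle indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    pick = lambda idx, data: [data[i] for i in idx]
    return (pick(train_idx, rows), pick(test_idx, rows),
            pick(train_idx, labels), pick(test_idx, labels))

def fit_centroids(rows, labels):
    """Stand-in model: average the encoded vectors of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared distance."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))

# Toy sequences and class labels, invented for illustration only.
sequences = ["ACGT", "ACGA", "TTGC", "TTGA", "ACGG", "TTGT", "ACGC", "TTGG"]
labels = [0, 0, 1, 1, 0, 1, 0, 1]

encoded = [one_hot_encode(s) for s in sequences]
X_train, X_test, y_train, y_test = train_test_split(encoded, labels)

model = fit_centroids(X_train, y_train)
predictions = [predict(model, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The split is made before the model is fitted, so the held-out sequences play no part in training and the reported accuracy reflects unseen data only.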


Table 2.

Accession numbers of HVR I sequences retrieved from GenBank for dataset 2.


Table 3.

AMOVA showing genetic variation.


Table 4.

Pairwise fixation index (FST) values of population differentiation due to genetic structure and p-values.


Table 5.

Summary of the diversity and neutrality indices calculated for population groups.


Table 6.

Comparison of 5-fold CV accuracy measures on the dataset.


Table 7.

Confusion matrix table of the PCA-SVM test performed on the dataset without PCA and 5-fold CV.


Table 8.

5-fold CV accuracy measures on dataset 2.


Table 9.

Comparison of machine learning models with one-hot encoding and BoW, without PCA.


Table 10.

Comparison of machine learning models with one-hot encoding, BoW and PCA.


Fig 2.

Confusion matrix results generated with one-hot encoding, BoW and PCA on the dataset.

Numbers 0, 1 and 2 on the x and y axes represent the African, Asian and Caucasian race groups, respectively. The values in the matrix denote the number of correct and incorrect predictions made by the classifiers: (a) Support vector machine, (b) Linear discriminant analysis, (c) Quadratic discriminant analysis and (d) Random forest.
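As a reading aid, the sketch below shows how a three-class confusion matrix of this kind is tallied and how overall accuracy follows from its diagonal. The labels are invented for illustration, the function name is hypothetical, and the orientation (rows as true classes, columns as predicted classes) is an assumption, since the caption does not say which axis holds which.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """Count predictions; rows are true classes, columns are predicted ones."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix

# Invented labels using the caption's coding: 0, 1 and 2 for the three groups.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]

matrix = confusion_matrix(y_true, y_pred)

# Diagonal cells are correct predictions; every other cell is an error.
correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
accuracy = correct / total

for row in matrix:
    print(row)
print(f"accuracy: {accuracy:.2f}")
```

Off-diagonal cells show which pairs of groups a classifier confuses, which a single accuracy figure hides.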


Fig 3.

Confusion matrix results generated with one-hot encoding and BoW, without PCA, on the dataset.

Numbers 0, 1 and 2 on the x and y axes represent the African, Asian and Caucasian race groups, respectively. The values in the matrix denote the number of correct and incorrect predictions made by the classifiers: (a) Support vector machine, (b) Linear discriminant analysis, (c) Quadratic discriminant analysis and (d) Random forest.


Table 11.

Machine learning algorithms with one-hot encoding and PCA using Python.


Table 12.

Machine learning algorithms with one-hot encoding and BoW, without PCA, using Python.


Table 13.

Comparison of machine learning models with one-hot encoding and BoW, without PCA, on dataset 2.


Table 14.

Comparison of machine learning models with one-hot encoding, BoW and PCA on dataset 2.


Table 15.

Accuracy measures on a new independent dataset without PCA.


Table 16.

Accuracy measures on a new independent dataset with PCA.


Table 17.

Race group classification accuracy (%) results from Python and WEKA.
