Fig 1.

Schematic representation of the study.

Fig 2.

Machine Learning Model Architectures and Workflow for Genome Classification.

(A) k-means clustering algorithm workflow. (B) Schematic architecture of the random forest classification model. (C) Architecture of the 1-D convolutional neural network for whole-genome sequence classification. (D) Layer-wise structural summary of the 1-D CNN model configuration.

Fig 3.

K-mer–Based Feature Analysis and Clustering of Delta Variant Genomes Pre- and Post-Omicron Emergence.

(A) Comprehensive set of all possible 3-mer sequence combinations used for feature extraction. (B) Global frequency distribution of k-mers across the entire sequence dataset. (C) Correlation heatmap depicting interrelationships among k-mer frequencies. (D) GC content distribution in Delta variant genomes prior to Omicron emergence. (E) GC content distribution in Delta variant genomes following Omicron emergence. (F) Two-dimensional t-SNE projection illustrating clustering patterns of DNA sequences. (G) Three-dimensional t-SNE visualization demonstrating class separation in sequence data. (H) k-means clustering of Delta variant sequences before Omicron emergence. (I) Cluster structure revealed by k-means partitioning of pre-Omicron Delta genomes. (J) Random forest–derived feature importance scores highlighting key predictive k-mers.
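
For readers unfamiliar with the features named in panels A, B, D, and E, the following is a minimal sketch of how 3-mer frequencies and GC content could be computed from a nucleotide sequence. It is not the authors' pipeline: the function names (`all_kmers`, `kmer_frequencies`, `gc_content`) are hypothetical, and the choices to normalise counts to relative frequencies and to skip windows containing ambiguous bases such as N are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def all_kmers(k=3, alphabet="ACGT"):
    """Enumerate every possible k-mer over the alphabet (4**k strings)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_frequencies(seq, k=3):
    """Relative frequency of each k-mer, using overlapping windows.

    Assumption: windows containing non-ACGT symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = all_kmers(k)
    total = sum(counts[m] for m in vocab)
    return {m: (counts[m] / total if total else 0.0) for m in vocab}


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    unambiguous = sum(seq.count(b) for b in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / unambiguous
```

Each genome then maps to a fixed-length vector of 64 values, which is the kind of input that clustering and tree-based classifiers typically expect.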

Fig 4.

Model Interpretation, Motif Discovery, and Classification Performance Analysis.

(A) Comparative k-mer importance profiles derived from SHAP attribution and random forest metrics. (B) Genome-wide positional SHAP attribution landscape showing nucleotide-level contributions to model predictions. (C) Most prevalent sequence motif in pre-Omicron Delta genomes with frequency counts. (D) Most prevalent sequence motif in post-Omicron Delta genomes with frequency counts. (E) Dataset-wide discriminative motif patterns learned by the 1-D CNN model. (F) Receiver operating characteristic curve of the random forest model illustrating threshold-dependent classification performance. (G) Receiver operating characteristic curve of the 1-D CNN model depicting sensitivity–specificity trade-off.

Table 1.

Comparison of the random forest model and the 1-D CNN model in classifying the genome sequences.