A hybrid feature extraction framework combining PCA and mutual information for gene expression based lung cancer classification

doi:10.1371/journal.pone.0342160

Fig 1.

Illustration of different types of lung cancer, highlighting the affected lung areas and unique cellular patterns for each type.

Fig 2.

A Seven-Stage Framework for Gene Expression-based Lung Cancer Classification: From Data Collection to Model Deployment with Integrated Feature Engineering and PCA-MI Feature Selection.
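
The caption does not say how PCA and mutual information (MI) are combined, so the following is only a minimal sketch of the MI-scoring half of such a pipeline, not the authors' method. It ranks genes by the mutual information between a binarized expression value and the class label. The median split, the function names, and the toy data are all assumptions made for illustration; the PCA step is omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (px[x] * py[y]))
    return mi


def binarize(values):
    """Assumed discretization: 1 if a value is above the median, else 0."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return [1 if v > median else 0 for v in values]


def rank_genes_by_mi(expression, labels):
    """Return (gene, MI score) pairs sorted from most to least informative."""
    scores = {
        gene: mutual_information(binarize(values), labels)
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: two made-up genes, six samples, two classes.
expression = {
    "gene_a": [0.10, 0.20, 0.15, 0.90, 0.80, 0.95],  # tracks the class label
    "gene_b": [0.50, 0.10, 0.90, 0.20, 0.80, 0.40],  # mostly unrelated to it
}
labels = ["N", "N", "N", "A", "A", "A"]

ranking = rank_genes_by_mi(expression, labels)
```

In this toy case `gene_a` separates the two classes perfectly after binarization, so it receives the maximum possible score for a balanced binary label (log 2 nats) and is ranked first.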

Fig 3.

A block diagram of the CNN architecture, showing the flow from the input layer through the convolutional, pooling, and fully connected layers to the output layer, which classifies each sample as either lung cancer or not.

Fig 4.

Performance evaluation of the CNN model. A) The confusion matrix shows high classification accuracy, with few misclassifications among the Adenocarcinoma (A), Normal (N), and Squamous (S) classes.

B) The ROC curve shows an overall AUC of 0.99, indicating excellent discriminatory power across all classes. C, D) Training and validation accuracy and loss curves demonstrate consistent learning with minimal overfitting, achieving strong generalization and low error.

Table 1.

Accuracy, precision, recall, and F1-score for various feature extraction methods. The proposed method achieves the highest scores across these metrics.
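
For readers who want to see how these four metrics relate, here is a minimal sketch that derives them from a three-class confusion matrix. The counts are invented for illustration and are not the paper's results; the row and column convention (rows are true classes, columns are predicted classes) and the function name are assumptions.

```python
def per_class_metrics(cm, classes):
    """Compute overall accuracy plus per-class precision, recall, and F1.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(classes)))
    report = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                  # true class i, predicted otherwise
        fp = sum(row[i] for row in cm) - tp   # predicted i, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = {"precision": precision, "recall": recall, "f1": f1}
    return correct / total, report


# Hypothetical counts for Adenocarcinoma (A), Normal (N), Squamous (S).
classes = ["A", "N", "S"]
cm = [
    [8, 1, 1],   # true A
    [0, 10, 0],  # true N
    [2, 0, 8],   # true S
]

accuracy, report = per_class_metrics(cm, classes)
```

With these made-up counts, 26 of 30 samples lie on the diagonal, so accuracy is 26/30, and class A has precision, recall, and F1 all equal to 0.8.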

Fig 5.

This chart shows the accuracy of various feature extraction methods.

Fig 6.

The top genes identified by the hybrid approach and their importance scores, validating its effectiveness for biomarker discovery.

Fig 7.

The PPI network visualises interactions among significant genes identified through the PCA-MI framework, with nodes representing proteins and edges denoting interactions, constructed using STRING with a confidence score ≥0.7.

Fig 8.

Hub Gene Analysis: A) Top 20 hub genes ranked by degree centrality are highlighted, indicating their pivotal roles in the network.

B) Chart showing the degree centrality scores of the hub genes, reflecting their connectivity and influence.
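
As an illustration of how hub genes can be ranked by degree centrality, the sketch below computes the normalized degree (number of neighbours divided by n - 1) for each node of an undirected interaction network and returns the top k. The edge list, the placeholder node names, and the function names are hypothetical; this is not the authors' code or their network.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as edge pairs."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbours[u].add(v)
        neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Each node can have at most n - 1 neighbours.
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def top_hubs(edges, k):
    """Return the k highest-centrality nodes; ties are broken by node name."""
    scores = degree_centrality(edges)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical interaction edges between placeholder proteins P1..P5.
edges = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]

hubs = top_hubs(edges, 2)
```

Here `P1` touches three of the four other nodes, so its centrality is 0.75 and it ranks first. Calling `top_hubs(edges, 20)` on a full network would give a top-20 list of the kind shown in panel A.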
