Automatic detection and classification of lung cancer CT scans based on deep learning and ebola optimization search algorithm

Recently, research has shown an increased spread of non-communicable diseases such as cancer. Lung cancer diagnosis and detection have become one of the biggest obstacles in recent years. Early diagnosis and detection of lung cancer would reliably save many lives globally. Precise classification of lung cancer from medical images helps physicians select suitable therapy and reduce cancer mortality. Much work has been carried out on lung cancer detection using convolutional neural networks (CNNs). However, lung cancer prediction remains difficult due to the multifaceted structures present in CT scans. Moreover, CNN models face challenges that affect their performance, including choosing the optimal architecture, selecting suitable model parameters, and picking the best values for weights and biases. To address the problem of selecting the optimal combination of weights and biases required for the classification of lung cancer in CT images, this study proposes a hybrid metaheuristic and CNN algorithm. We first designed a CNN architecture and then computed the solution vector of the model. The resulting solution vector was passed to the Ebola optimization search algorithm (EOSA) to select the best combination of weights and biases for training the CNN model on the classification problem. After thoroughly training the EOSA-CNN hybrid model, we obtained the optimal configuration, which yielded good performance. Experimentation with the publicly accessible Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset showed that the EOSA metaheuristic algorithm yielded a classification accuracy of 0.9321. Similarly, performance comparisons of EOSA-CNN with other methods, namely GA-CNN, LCBO-CNN, MVO-CNN, SBO-CNN, WOA-CNN, and the classical CNN, were also computed and presented.
The results showed that EOSA-CNN achieved specificities of 0.7941, 0.97951, and 0.9328, and sensitivities of 0.9038, 0.13333, and 0.9071 for the normal, benign, and malignant cases, respectively. This confirms that the hybrid algorithm provides a good solution for the classification of lung cancer.


Introduction
Cancer is a severe public health issue that is becoming more prevalent worldwide. It is a disease in which cells of particular tissues undergo uncontrolled division, leading to malignant or tumor growth in the body [1]. In 2020, GLOBOCAN estimated 19.3 million new cases of cancer and approximately 10 million cancer deaths globally [2,3]. Lung cancer is the most commonly diagnosed cancer and the leading cause of cancer death in men and women globally. Worldwide, 2.2 million new lung cancer cases are diagnosed annually, leading to close to 1.8 million deaths [2,4]. Common signs and symptoms of lung cancer include hemoptysis (coughing up blood), weight loss, and weariness. Moreover, various risk factors are associated with lung cancer, including smoking, alcohol, air quality, and food [5]. Lung cancer can be divided into two categories based on the histology of the cancer cells: small-cell lung cancer (SCLC) and non-small-cell lung cancer (NSCLC) [1]. NSCLC is considered the most common type of lung cancer, accounting for 85% of patients, compared to SCLC, representing 5% [1]. Lung cancer has significantly increased in developing countries over the past two decades, including Sub-Saharan Africa, where HIV/AIDS is also overwhelming [6]. The overall 5-year survival rate across all kinds of lung cancer is lower than 18%, compared to other cancers such as prostate cancer (99%), colorectal cancer (65%), and breast cancer (90%) [1]. Lung cancer therefore demands greater attention from the medical, biological, and scientific fields to find innovative solutions that promote early diagnosis, support medical decisions, and evaluate treatment responses to improve health care. The enormous amount of computed tomography (CT) scan image data for the lungs could help detect lung cancer.
Machine learning and deep learning algorithms can utilize these images to enhance cancer prediction and diagnosis as early as possible and find the best treatment strategies [7].
Deep Learning (DL) methods have enabled machines to analyze high-dimensional data such as images, multidimensional anatomical images, and videos [8,9]. The convolutional neural network (CNN) and recurrent neural network (RNN) are popular DL models often applied to image and sequential data classification [10][11][12]. CNN architectures are usually composed of blocks of convolutional layers and pooling operations combined with fully connected layers and a classification layer. The training process of a CNN aims to tune the weights of the layers composing the architecture. This process is considered an NP-hard problem due to its susceptibility to multiple local optima, requiring optimization techniques to break out of such local optima. To speed up training and improve performance, CNNs are trained with optimization algorithms, such as stochastic gradient descent (SGD), Nesterov accelerated gradient, Adagrad, AdaDelta, and Adam, which adjust the weights and learning rates to minimize the loss.
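As a minimal illustration of what these optimizers have in common, the core update rule can be sketched in a few lines of Python. This shows plain SGD only; the named variants (Adam, AdaDelta, etc.) add momentum or per-parameter adaptivity on top of the same idea:

```python
# Minimal sketch of the update rule underlying gradient-based optimizers:
# each weight moves against its gradient, scaled by the learning rate.
def sgd_step(weights, gradients, lr=0.1):
    """One stochastic-gradient-descent update of a weight vector."""
    return [w - lr * g for w, g in zip(weights, gradients)]

# Example: minimizing f(w) = w^2, whose gradient is 2w, starting at w = 1.0.
w = [1.0]
for _ in range(100):
    w = sgd_step(w, [2 * wi for wi in w])
```

Repeated application drives the weight toward the minimizer at 0, illustrating how iterative updates reduce the loss step by step.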
Building a CNN architecture requires a skilful combination of hyperparameters for improved classification performance and accuracy. Approaching such combinatorial problems with manual methods is daunting and inefficient. Metaheuristic algorithms have therefore been proposed to optimize the process of obtaining the best combination of hyperparameters required for improved performance. Metaheuristic algorithms are nature-inspired optimization methods characterized by local search, global search, and sometimes randomization, and they often perform well. They typically require low computing capacity and have successfully solved complex real-life problems in engineering, the medical sciences, and the sciences generally, especially the swarm intelligence algorithms [13,14]. Considering the composition of a CNN and the complexity of its hyperparameters, which require several iterations and considerable computational time to train [15], the use of metaheuristic algorithms has been endorsed due to their ability to find suitable optimization constructs for overcoming the limitations associated with CNNs [16].
Several algorithms have been applied to medical image classification problems using CNN for feature extraction. Priyadharshini and Zoraida [32] developed Bat-inspired Metaheuristic Convolutional Neural Network Algorithms for CAD-based Lung Cancer Forecast. Li et al. [33] used metaheuristic techniques to optimize the rebalancing of the imbalanced class of feature selection method for dimension reduction in clinical X-ray image datasets. Abdullah et al. [34] applied the meta-heuristic optimization algorithm using lung images. Lu et al. [35] proposed a new convolutional neural network for the optimal detection of lung cancer. They used a marine predator metaheuristic method to improve network accuracy and optimal design. Asuntha and Srinivasan [36] presented novel deep learning methods to detect malignant lung nodules using the Fuzzy Particle Swarm Optimization (FPSO) technique to select the optimal feature after extracting texture, geometric, volumetric, and intensity information. Das et al. [37] developed a method for detecting malignant tumors by classification called Velocity-Enhanced Whale Optimization Algorithm and Artificial Neural Network to classify cancer datasets (breast, cervical, and lung cancer).
Although several studies have reported various designs of CNN algorithms developed for medical images and lung cancer prediction, some challenges still exist due to the multifaceted structures in CT scans. In addition, previous studies show different architectures of the CNN model that have been used in various domains, such as fabric wrinkle images [38][39][40][41]. Moreover, DL models have issues affecting their performance, including choosing the feature representation, the optimal architecture, and suitable model parameters, and picking the best values for weights and biases [42]. Therefore, to solve these issues of finding a precise prediction model and to advance the state-of-the-art use of CNNs for the classification of lung cancer, we used metaheuristic methods to optimize the CNN model. Thus, this study proposes utilizing a metaheuristic named the Ebola Optimization Search Algorithm (EOSA) [24].
The EOSA algorithm has shown promising results in various optimization problems, including feature selection and parameter optimization in domains such as healthcare, finance, and engineering. It also has distinctive features, such as population-based search, adaptive learning, and self-learning abilities, which may provide advantages over other optimization methods. We therefore selected EOSA as the metaheuristic optimizer based on its previous success in similar optimization tasks and these features. The reason for hybridizing a CNN with a metaheuristic algorithm is to enhance the CNN's performance in terms of accuracy, speed, and generalization. Metaheuristic algorithms are optimization techniques that use iterative procedures to search for the best solution in a large search space. By integrating a metaheuristic algorithm with a CNN, the model can better optimize its parameters and improve its ability to learn and classify complex patterns in the data. This can improve performance in detecting diseases such as lung cancer, supporting diagnostic accuracy and timely treatment. Moreover, image preprocessing techniques, such as wavelet decomposition, are used to enhance image resolution. As a result, this study aims to combine the EOSA-CNN algorithm with selected image preprocessing techniques to improve the classification accuracy of the deep learning model on lung cancer CT images. The metaheuristic algorithm is applied to obtain the best combination of weights required to learn the feature extraction and classification problem.
The main objective of this article is to create an optimized deep-learning model using a metaheuristic algorithm to detect lung cancer. Such a model could greatly assist physicians in detecting the disease early and making informed decisions to provide suitable treatment. The technical contributions of the study are as follows: (1) applied a combination of wavelet decomposition and erosion, among other image preprocessing techniques, to prepare the input samples; (2) proposed a hybrid EOSA-CNN algorithm for the feature extraction and classification process on the preprocessed images; and (3) evaluated and compared the hybrid algorithm with other algorithms, namely GA-CNN, WOA-CNN, MVO-CNN, SBO-CNN, and LCBO-CNN.
The remaining sections of the paper are organized as follows: Section 2 presents related studies on using CNNs to classify lung images. Section 3 discusses the methodology applied in this study. Section 4 presents the configuration of the experimental setup and the datasets used. Section 5 presents the results obtained and discusses the findings, while the study's concluding remarks and future research directions are presented in Section 6.

Related works
This section reviews the application of deep learning and metaheuristics algorithms in detecting and classifying cancer cases in medical images.
Song et al. [43] developed three types of deep neural networks (CNN, DNN, and SAE) for lung cancer classification. These networks were applied to the CT image classification task, with modest modifications, for benign and malignant lung nodules. The CNN showed an accuracy of 84.15%, a sensitivity of 83.96%, and a specificity of 84.32%. Bhatia et al. [44] proposed a method for detecting lung cancer from CT data using deep residual learning, extracting features with UNet and ResNet models. The feature set was fed through multiple classifiers, including XGBoost and Random Forest, and the individual predictions were ensembled to obtain an accuracy of 84%. El-Regaily et al. [45] presented a survey of computer-aided detection (CAD) systems for lung cancer in computed tomography. They compared the current classification methods and argued that most existing algorithms could not diagnose certain forms of nodules, such as GGN. Kriegsmann et al. [46] trained and refined a CNN model to consistently classify the three most frequent lung cancer subtypes. Alrahhal and Alqhtani [47] presented ALCD, which stands for Adoptive Lung Cancer Detection, based on convolutional neural networks. The ALCD system performed an extensive preprocessing step, and features were extracted using the Scale Invariant Feature Transform (SIFT) and input into the CNN, which performed well.
Bhandary et al. [48] provided a Deep-Learning (DL) framework for investigating lung pneumonia and cancer, which consisted of a modified AlexNet (MAN), VGG16, VGG19, and ResNet50. The categorization in the MAN was done with a Support Vector Machine (SVM) and compared to Softmax. The DL framework provided an accuracy of 97.27%. Zheng et al. [49] proposed a combined radiology analysis and malignancy evaluation network (R2MNet) to evaluate pulmonary nodule malignancy via radiology feature analysis. In addition, they proposed channel-dependent activation mapping (CDAM) to visualize characteristics and shed light on the decision process of the deep neural network (DNN); the model obtained an area under the curve (AUC) of 97.52% on nodule radiology analysis. Cengil and Cinar [50] presented a classification algorithm for lung nodules using CT images from the SPIE-AAPM-LungX data and a 3D CNN architecture for classification. Coudray et al. [51] trained a deep convolutional neural network (Inception v3) on whole-slide images and yielded an average area under the curve (AUC) of 0.97. They also used the network to predict the ten most often mutated genes in LUAD. Six of them (STK11, EGFR, FAT1, SETBP1, KRAS, and TP53) were found to be predictable from pathology images on a held-out population, with AUCs ranging from 0.733 to 0.856. Chon et al. [52] established a CAD system for lung cancer classification of CT scans with unmarked nodules. Their initial strategy was to send segmented CT scans straight into 3D CNNs for classification, which proved insufficient.
Priyadharshini and Zoraida [32] developed Bat-inspired Metaheuristic Convolutional Neural Network Algorithms for CAD-based Lung Cancer Forecast. The Discrete Wavelet Transform (DWT) decomposed the input image into a set of sub-bands, one of which was the low-low (LL) band. They used a CNN to train on the lung cancer data, obtaining an accuracy of 97.43%. Li et al. [33] used metaheuristic techniques to optimize the rebalancing of an imbalanced class distribution and applied it to feature selection for dimension reduction in clinical X-ray image datasets. Using the self-adaptive Bat algorithm, feature selection with Random-SMOTE (RSMOTE) achieved 94.6% classification accuracy with a Kappa of 0.883. Abdullah et al. [34] applied a metaheuristic optimization algorithm to lung images so that the features obtained were trained using convolution layers. The system's efficiency was assessed using the F1 score, which reached 98.9% on the ELT-COPD dataset and 98.9% on the NIH clinical dataset. Lu et al. [35] proposed a new convolutional neural network for the optimal detection of lung cancer using a metaheuristic method named marine predators. The proposed MPA-based approach showed 93.4% accuracy, 98.4% sensitivity, and 97.1% specificity. Asuntha and Srinivasan [36] presented a novel deep-learning method to detect malignant lung nodules and distinguish the position of tumorous lung nodules. They used a Histogram of Oriented Gradients (HOG), wavelet transform-based features, Local Binary Patterns (LBP), the Scale Invariant Feature Transform (SIFT), and Zernike moments. The Fuzzy Particle Swarm Optimization (FPSO) technique selected the optimal features after extracting texture, geometric, volumetric, and intensity information. Das et al. [37] developed a Velocity-Enhanced Whale Optimization Algorithm, combined with an Artificial Neural Network, to classify and diagnose lung cancer.
The approach is compared to C4.5, Learning Vector Quantization, Linear Discriminate Analysis, and Factorized Distribution Algorithm, giving a classification accuracy of 84%.
Senthil Kumar et al. [53] investigated and implemented new evolutionary algorithms to detect tumors and overcome the challenges of medical image segmentation. Five techniques were used: k-means clustering, k-median clustering, particle swarm optimization, inertia-weighted particle swarm optimization, and guaranteed convergence particle swarm optimization (GCPSO). The GCPSO achieved the greatest accuracy of 95.89%. Shan and Rezaei [54] designed feature selection based on an innovative optimization method called Improved Thermal Exchange Optimization (ITEO), which aims to enhance the system's efficiency and stability. Kapur entropy maximization and mathematical morphology were used to segment the lung areas, and 19 GLCM features were collected from the segmented images for the final evaluations. ITEO used an efficient artificial neural network, and the results revealed that the proposed method attained 92.27% accuracy. Hans and Kaur [55] presented some of the most recent techniques, attempting to solve the lung cancer image classification challenge by utilizing recent optimization techniques. Wang et al. [56] developed a new residual neural network to determine the pathological type of lung cancer from CT scans; due to the scarcity of CT images in practice, they investigated a medical-to-medical transfer learning technique, achieving an accuracy of 85.71%. In [57] the authors suggested a new feature selection strategy that used deep learning and integrated the Bhattacharyya coefficient and a genetic algorithm (GA) to pick features. Oyelade and Ezugwu [58] proposed a novel Ebola optimization search algorithm (EOSA) based on the Ebola virus and its related disease propagation model. The results showed that the proposed algorithm performed comparably to other state-of-the-art optimization approaches based on scalability, convergence, and sensitivity analyses.
Harun Bingol [59] proposed a hybrid deep learning model for classifying Otitis Media with Effusion (OME) from eardrum otoendoscopic images. The proposed model combined Neighborhood Component Analysis (NCA) and the Gaussian method to extract and select features. Experimental results on a dataset comprising 910 images indicated that the model achieved a high accuracy of 94.8%. In a related study, Bingol [59] presented a novel approach for classifying cervical cancer on Gauss-enhanced Pap-smear images using a hybrid CNN model. Its performance was tested on a dataset comprising 1000 images, achieving an accuracy of 93.6%, better than that of various other existing methods.
Therefore, considering the achievements of applying the hybrid model of CNN and optimization algorithm as reported in the studies reviewed in this section, this study aims to advance the state-of-the-art to improve lung cancer detection and classification accuracy.

Methodology
In this section, the design of the proposed hybrid EOSA-CNN algorithm is presented. A brief review of the optimization algorithm, namely the Ebola optimization search algorithm (EOSA) [58], is given first. This is followed by the design of the CNN architecture. The pseudocode of the EOSA-CNN algorithm and the corresponding flowchart are also discussed in this section. The combined preprocessing techniques and the pipeline in which they are applied are also presented.

The EOSA metaheuristics algorithm
We present the metaheuristic algorithm named the Ebola optimization search algorithm (EOSA), which is based on the propagation mechanism of the Ebola virus disease [58]. The EOSA algorithm is built on an improved SIR model of the disease. The model consists of the S, E, I, R, H, V, Q, and D compartments, namely Susceptible (S), Exposed (E), Infected (I), Hospitalized (H), Recovered (R), Vaccinated (V), Quarantined (Q), and Death (D). The composition of these compartments allows the creation of a search space that provides optimized sets of weights and biases needed for the CNN architecture. The SIR-type model was then represented mathematically as a system of first-order differential equations, one rate equation per compartment (Eq (1) and the equations that follow). A combination of the propagation and mathematical models was adapted to develop the new metaheuristic algorithm, and the resulting mathematical model was then used to design the EOSA-CNN algorithm for experimentation. The search procedure iterates until its termination condition is satisfied; if it is not, the procedure returns to step 4. On termination, the global best solution and all solutions are returned.
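The compartment-driven search described above can be sketched, in highly simplified form, in Python. This is an illustrative toy under stated assumptions, not the authors' implementation: it keeps only the idea of an "infected" best case spreading perturbations through a susceptible population with a decaying infection rate, and omits the full S/E/I/R/H/V/Q/D dynamics and differential-equation rates:

```python
import random

def eosa_sketch(fitness, dim, pop_size=20, iters=200, seed=1):
    """Toy EOSA-style search: the current best ('index case') infects
    susceptible solutions; improved candidates replace their hosts,
    and the infection rate decays over the iterations."""
    rng = random.Random(seed)
    # Susceptible population of candidate solutions in [-5, 5]^dim.
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for t in range(iters):
        rate = 1.0 - t / iters  # infection rate decays over time
        for i, sol in enumerate(pop):
            # Perturb each solution toward the index case (the best).
            cand = [x + rate * rng.gauss(0, 1) * (b - x)
                    for x, b in zip(sol, best)]
            if fitness(cand) < fitness(sol):  # "infection" only spreads if better
                pop[i] = cand
        best = min(pop + [best], key=fitness)
    return best

sphere = lambda v: sum(x * x for x in v)
best = eosa_sketch(sphere, dim=3)
```

Because candidates are accepted greedily, the returned solution is never worse than the best initial sample, and it contracts toward the optimum as the population converges.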
In the following sub-sections, the application of EOSA to the optimization problem described by the study is designed and discussed. In Fig 1, an overview of the procedure for the use of the EOSA and other hybrid metaheuristic-based algorithms is presented.

Image preprocessing techniques
Image preprocessing techniques are often applied to image samples to improve classification accuracy by removing noise and introducing sharpness [7]. Data preparation, also known as preprocessing, describes any processing that prepares the raw data for another task, such as classification, prediction, or clustering, to ensure or enhance task performance. In this study, the preprocessing phase includes several functions for manipulating the images into a form suitable for further analysis. First, we downloaded the data from Kaggle and read it using Python. We then applied image resizing, grayscale conversion, Gaussian blur filtering, segmentation, normalization, erosion, noise removal, and the wavelet transform to the lung cancer images. Fig 2 shows the preprocessing steps we followed.
The Gaussian blur is a linear filter that aids image processing by applying smoothing and blurring effects to remove noise; it estimates the weighted mean of pixel intensities at adjacent positions. Otsu's thresholding technique uses a threshold value that divides the image into foreground and background; the threshold value is increased gradually until the variance between the pixels of the two classes is maximal. Image normalization is an essential phase of data preparation that changes the range of pixel intensity values. Erosion and dilation are the basic morphological operations in image processing; they aim to extract the most relevant structure of the image, viewed as a set through its subgraph representation. The erosion and dilation operations are defined in Eqs (9) and (10) as Y = A ⊖ B and Y = A ⊕ B, respectively, where Y is the resulting binary image, B is a template operator (structuring element), and A is the original image to be processed. Image noise is random variation of brightness or colour information in images; it may come from various sources, which erode image quality. We used a contrast-limited adaptive histogram equalization (CLAHE) filter to remove the unwanted noise. Wavelet analysis is a kind of multiresolution analysis commonly used on medical images. The wavelet has two decomposition levels; the first level produces two coefficient vectors, namely the approximation and detail coefficients, representing low- and high-frequency contents. In this study, we used the biorthogonal family via the pywt.dwt2 function. After that, we partitioned the preprocessed data into 80% and 20% for the training and testing sets, respectively. We then built the CNN model to compute the solution vector used for the hybrid CNN-metaheuristic algorithm proposed in this study.
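As one concrete illustration of these preprocessing steps, Otsu's threshold can be computed from a grayscale histogram in a few lines of pure Python. This is a sketch of the method's between-class-variance criterion, assuming an already-computed histogram; the study itself uses OpenCV's threshold function:

```python
def otsu_threshold(hist):
    """Return the threshold maximizing between-class variance
    for a grayscale histogram (Otsu's method)."""
    total = sum(hist)
    total_mean = sum(i * h for i, h in enumerate(hist)) / total
    best_t, best_var = 0, 0.0
    w0 = cum = 0.0
    for t in range(len(hist) - 1):
        w0 += hist[t]          # weight of class C1 (pixels <= t)
        cum += t * hist[t]     # cumulative intensity sum of C1
        if w0 == 0 or w0 == total:
            continue
        m0 = cum / w0                          # mean of C1
        w1 = total - w0                        # weight of class C2
        m1 = (total * total_mean - cum) / w1   # mean of C2
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy histogram: dark cluster around bin 2, bright cluster around bin 7.
hist = [0, 5, 20, 5, 0, 0, 5, 20, 5, 0]
t = otsu_threshold(hist)
```

For this toy histogram the chosen threshold falls between the two modes, separating "background" and "foreground" pixels as described above.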

Design of the CNN architecture
Convolutional Neural Networks (CNNs) are deep learning algorithms containing multiple layers between the input and output and are developed for image analysis and classification. A CNN is a mathematical model built from convolution, pooling, and fully connected layers: the convolution and pooling layers conduct feature extraction, while the fully connected layers map the extracted features to the final output. In this study, we proposed a CNN architecture for design and experimentation, depicted in Fig 3. The architecture consists of 4 blocks of convolutional-pooling layers. Each block consists of two convolutional layers, a zero-padding layer, and a max-pooling layer. The convolutional layers in the first block use a 3x3 filter size with 32 filters each. The PoolHelper layer is a custom layer implemented as a class and used for preselecting some features before applying the max-pooling operation. The convolutional layers in the second block use 64 filters each, with the same 3x3 filter size as in the first block. The same 3x3 filter size is used in convolutional blocks 3 and 4, whose convolutional layers use 128 and 256 filters, respectively. The max-pooling layers apply an interleaved pattern of 2x2 and 3x3 windows from block 1 through block 4 of the CNN architecture. A fully connected layer follows the last max-pooling layer, after which a dropout operation with a 0.5 drop rate is applied. This is followed by a dense layer using the softmax function for the classification task. Feature extraction from the input samples is achieved with the blocks of convolutional-pooling layers described above.
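The spatial dimensions flowing through such blocks follow the standard convolution/pooling size formula. The sketch below illustrates it with a 258x258 input; the specific padding and stride values are assumptions for illustration, not the exact per-layer settings of the study's architecture:

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer:
    floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# With zero padding of 1, a 3x3 convolution preserves a 258x258 input;
# a 2x2 max-pool with stride 2 then roughly halves each dimension.
s = out_size(258, kernel=3, pad=1)
s = out_size(s, kernel=2, stride=2)
```

Tracing this formula block by block is how one checks that the interleaved 2x2 and 3x3 pooling windows still leave a valid spatial size at the final fully connected layer.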

EOSA-CNN algorithm
The procedure for building the proposed CNN architecture and the application of the optimization procedure is described in Fig 4. Three major phases are considered in the design: the initialization phase, the CNN composition phase, and the optimization phase. Meanwhile, we also demonstrate the need for full training in optimized CNN architecture, as seen in the flowchart. The notations ncls, nblks, fracl, and evd represent the number of convolutional layers, the number of convolutional blocks, the fraction of infected cases and the estimated virus incubation duration.
The optimization process for the CNN architecture is as follows. First, the problem size for the optimization algorithm is obtained by summing the size of the weights w and the biases b of the CNN architecture; both w and b were obtained from the input and output sizes of the CNN, respectively. The problem size pz is thus defined by Eq (11). Initial solutions of size pz were then generated, and their fitness values were computed using Eq (12). For t iterations, the optimization algorithm is trained until the initial solutions improve to yield the most optimal solution for the classification problem. At each iteration 1, 2, ..., t, the fitness values of the solutions are recomputed using Eq (12) so that the best solution is buffered. In addition, at each iteration the solutions s are passed to the CNN architecture for reconstruction, as seen in Eq (13), and the testing datasets are applied for prediction. The error rate is computed and further minimized through progressive training of the optimizer to obtain an optimal solution.
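The construction of the solution vector amounts to flattening all layer weights and biases into one vector of length pz that the optimizer searches over (cf. Eq (11)), and unflattening it again to reconstruct the network (cf. Eq (13)). A toy sketch with list-based layers and hypothetical helper names:

```python
def flatten_params(weights, biases):
    """Pack per-layer weight matrices and bias vectors into one flat
    solution vector; its length is pz = |w| + |b| (cf. Eq (11))."""
    vec = []
    for w, b in zip(weights, biases):
        vec.extend(v for row in w for v in row)
        vec.extend(b)
    return vec

def unflatten_params(vec, shapes):
    """Rebuild per-layer (weights, biases) from a flat solution vector,
    given each layer's (rows, cols) weight shape."""
    weights, biases, i = [], [], 0
    for rows, cols in shapes:
        w = [vec[i + r * cols : i + (r + 1) * cols] for r in range(rows)]
        i += rows * cols
        weights.append(w)
        biases.append(vec[i : i + cols])
        i += cols
    return weights, biases

# One toy 2x3 layer: pz = 2*3 weights + 3 biases = 9.
w = [[1, 2, 3], [4, 5, 6]]
b = [7, 8, 9]
vec = flatten_params([w], [b])
w2, b2 = unflatten_params(vec, [(2, 3)])
```

The round trip is lossless, which is what lets the optimizer treat the whole network as a single search-space point and hand the best point back to the CNN for full training.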
Here, e denotes a small value used to prevent the fitness computation from yielding an undefined value. Note that once the best solutions are computed, the combination of weights and biases is unpacked from the solutions and plugged back into the CNN architecture for full training. The fully trained model is then applied for prediction to solve the domain problem of classifying lung cancer images. The procedure of creating the CNN architecture, computing the solution vector, optimizing its weights, and fully training the architecture using the optimized weights is presented in Algorithm 1. Lines 4-13 describe the configuration required to design the CNN architecture based on the parameters supplied; the opening lines of this phase take the form:

1 // initialize the model
2 blk = 0;
3 n = 5;
4 while blk <= numblocks do
5   kcount = 2^n;
6   cnn <- layer2D(kSize, kcount, relu);
7   cnn <- zeropad(1);

The solution vector of the CNN architecture is then computed and supplied to the optimization algorithm as the search space in Line 20. Lines 19-22 of the algorithm show the initialization phase of the metaheuristic algorithm applied in this study. Meanwhile, an index case, the first infected case, is generated, and the training process then commences within the loop. In each iteration, the infected cases are exposed to susceptible individuals to simulate infection, hospitalization, vaccination, death, recovery, and quarantining. In line 8, we show that some infected cases (I) are drawn into the quarantine compartment so that only a fraction of I infects S individuals. In lines 26-40, new infections are generated from S and then added to I. Since H, R, V, Q, and D are only derivable from I, we applied the updated I in lines 41-47 to generate and update individuals using the corresponding equations. In our algorithm, recovered cases are added back to S, while dead individuals are replaced in S with new cases, as shown in lines 46-47.
Once the loop's termination condition is satisfied, the algorithm terminates, and the optimized solution vector is passed back to the CNN architecture for full training.

Experimentation
The experimentation to investigate the performance of the EOSA metaheuristic algorithm was carried out first, after which we experimented with its applicability in the hybrid EOSA-CNN algorithm. This section describes the experimental setup and parameter selection techniques used for these two experiments. We also detail the datasets used in the study and demonstrate the outcome of the image preprocessing techniques applied. The benchmark functions used to evaluate the performance of the EOSA metaheuristic algorithm are also listed and discussed. Lastly, a brief discussion of the evaluation metrics used to compare the performance of the hybrid algorithms (EOSA-CNN, GA-CNN, MVO-CNN, LCBO-CNN, WOA-CNN, and SBO-CNN) is presented.

Parameter settings
We conducted five experiments to independently investigate and explore the performance of the traditional CNN model and the proposed CNN using the metaheuristic optimization algorithms, including GA, SBO, MVO, WOA, LCBO, and EOSA. All the experiments were carried out on a Dell machine (OptiPlex 5050) with the following specifications: Intel Core i5, 7th generation, 16GB memory, and a 500GB hard drive. Table 1 shows the proposed CNN hyperparameter configuration.
The input to the proposed CNN architecture is 258 × 258, obtained from the preprocessed images, whose original size is 512 × 512. Table 2 presents the metaheuristic algorithms' configurations for optimizing the proposed CNN model. All the methods share the same values for common parameters, such as the batch size and the number of epochs.
In Table 2, the initial values for each parameter are defined. Considering the stochastic nature of EOSA, which falls within the characteristic of biology-based optimization algorithms, values for some parameters are randomly assigned. The problem size applied for all experimentation is 100. We note that these values remain fixed for all experiments on the benchmark functions.

Datasets and image preprocessing
We used the Iraq-Oncology Teaching Hospital/National Center for Cancer Diseases (IQ-OTH/NCCD) lung cancer dataset (https://www.kaggle.com/kerneler/starter-the-iq-othnccd-lung-cancer-09c3a8c9-4/data). This dataset was collected from two specialist hospitals over three months in 2019. The data comprise CT scans taken from lung cancer patients diagnosed at various stages and from normal patients. The data consist of 1097 samples (images) taken from 110 cases categorized into three classes: normal, benign, and malignant. One hundred and twenty (120) samples are benign, 561 are malignant, and 416 are normal. Fig 5 shows random samples of the original dataset. In Section 3.2, a detailed schematic diagram of the image preprocessing pipeline was discussed. These techniques included grayscale conversion, Gaussian blur, image segmentation, image normalization, erosion and dilation, CLAHE, and the wavelet transform. We used the cvtColor() function in the OpenCV library to convert the lung cancer images into grayscale. The main objective of Otsu's method is to obtain the optimum threshold value, calculated by grouping pixels into two classes, C1 and C2, in images with bimodal histograms. Otsu's method is suitable for images with a distinguishable foreground and background and has widely reported good performance [60]. Considering the nature of the dataset used in this study, we applied the method for the preprocessing task. The method also reduces the intra-class variance by selecting a suitable threshold value. We used the threshold function in Python. Fig 8 demonstrates the effects of Otsu's method on the lung cancer images. We used the normalize function in Python to normalize the lung cancer images, as seen in Fig 9. The processed lung cancer images after applying erosion and dilation are shown in Fig 10. The result of the CLAHE filter can be seen in Fig 11.
The wavelet output, depicted in Fig 12, is decomposed into four quadrants with different interpretations (LL, LH, HL, HH). We selected the LL (approximation) quadrant for further analysis, as shown in Fig 13.
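The four-quadrant decomposition can be sketched with a single level of a 2-D Haar transform in plain NumPy. This is an illustrative implementation, not the paper's exact wavelet family or code, and the sign/naming conventions for the detail quadrants vary between libraries.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform, returning the four
    quadrants (LL, LH, HL, HH). Assumes even height and width."""
    a = img.astype(float)
    # Transform rows: low-pass = pairwise mean, high-pass = pairwise difference
    lo = (a[:, 0::2] + a[:, 1::2]) / 2
    hi = (a[:, 0::2] - a[:, 1::2]) / 2
    # Transform columns of each half to obtain the four quadrants
    LL = (lo[0::2, :] + lo[1::2, :]) / 2   # approximation (kept for analysis)
    LH = (lo[0::2, :] - lo[1::2, :]) / 2   # vertical detail
    HL = (hi[0::2, :] + hi[1::2, :]) / 2   # horizontal detail
    HH = (hi[0::2, :] - hi[1::2, :]) / 2   # diagonal detail
    return LL, LH, HL, HH

img = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 "image"
LL, LH, HL, HH = haar_dwt2(img)
# LL is a quarter-size smoothed version of the input, the part selected here
```

Libraries such as PyWavelets (`pywt.dwt2(img, "haar")`) return the same four sub-bands; selecting LL halves each spatial dimension while retaining most of the image energy.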

Benchmark functions for evaluating EOSA
To evaluate the effectiveness of the EOSA metaheuristic algorithm, we conducted experiments on 15 standard and high-dimensional benchmark functions. First, we sought to investigate the relevance of EOSA in achieving the optimization required for the classification problem. Secondly, it was necessary to compare the performance of EOSA with that of state-of-the-art optimization algorithms. These functions, listed in Table 3 below with their names, mathematical representations, and range values, were subsequently used to compare similar metaheuristic algorithms in Section 5.
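The paper's exact 15 functions are given in Table 3; as a hedged illustration of what such benchmark functions look like, here are two standard examples (Sphere and Rastrigin) evaluated at the problem size of 100 used in the experiments. The search range of [-5.12, 5.12] is the conventional Rastrigin domain, assumed here for demonstration.

```python
import numpy as np

def sphere(x):
    """Unimodal benchmark; global minimum 0 at x = 0."""
    return float(np.sum(x ** 2))

def rastrigin(x):
    """Highly multimodal benchmark; global minimum 0 at x = 0."""
    return float(10 * x.size + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

# Evaluate one random candidate solution at the paper's problem size
rng = np.random.default_rng(1)
dim = 100
x = rng.uniform(-5.12, 5.12, dim)
f_sphere, f_rastrigin = sphere(x), rastrigin(x)
```

A metaheuristic such as EOSA would repeatedly evaluate candidates like `x` against such functions, reporting the best value found after a fixed iteration budget.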

Classification evaluation metrics
In this paper, the comparison was based on seven performance measures, as defined in the following paragraphs. These measures were calculated from the generic confusion matrix in Table 4.
Accuracy is the percentage of correctly classified samples. Kappa is a chance-corrected measure of agreement between the classifications and the true classes:

kappa = (Accuracy - Random Accuracy) / (1 - Random Accuracy)    (15)

Specificity is the proportion of actual negatives that are predicted negative, and sensitivity is the proportion of actual positives that are predicted positive. Precision measures how correctly the model predicts positive cases, while recall measures the model's ability to pick out positive samples from the data source used for the experiment.
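The definitions above can all be derived from a single confusion matrix. The following sketch computes them with NumPy in the one-vs-rest fashion used for multi-class problems; the 3x3 matrix is hypothetical, not the paper's reported results.

```python
import numpy as np

def metrics_from_confusion(cm):
    """Overall accuracy and Cohen's kappa, plus per-class one-vs-rest
    sensitivity, specificity, and precision. Rows of cm are true classes,
    columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = n - tp - fp - fn
    accuracy = tp.sum() / n
    # "Random accuracy" = expected agreement from the marginal distributions
    random_acc = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (accuracy - random_acc) / (1 - random_acc)       # Eq (15)
    sensitivity = tp / (tp + fn)      # recall, per class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return accuracy, kappa, sensitivity, specificity, precision

# Hypothetical 3-class matrix in the order (normal, benign, malignant)
cm = [[90,  5,   5],
      [10, 20,  10],
      [ 5,  5, 100]]
acc, kappa, sens, spec, prec = metrics_from_confusion(cm)
```

This matches Eq (15): with the matrix above, accuracy is 0.84 while kappa is lower (about 0.74) because part of that agreement is expected by chance.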

Results and discussion
The performance of the proposed hybrid algorithm EOSA-CNN is evaluated in this section. The outcome of this evaluation is compared with other CNN solutions applied to the same classification problem. We also present the EOSA metaheuristic algorithm's performance compared with other state-of-the-art methods using the benchmark functions listed in the previous section. The performance of EOSA was compared with nine different optimization algorithms, namely Artificial Bee Colony (ABC), Whale Optimization Algorithm (WOA), Butterfly Optimization Algorithm (BOA), Particle Swarm Optimization (PSO), Differential Evolution (DE), Genetic Algorithm (GA), Henry Gas Solubility Optimization Algorithm (HGSO), Blue Monkey Optimization (BMO), and Sandpiper Optimization Algorithm (SOA). The experimentation, executed for 500 iterations and 20 different runs, was applied to the 15 benchmark functions.
Using the benchmark functions listed in Table 3, EOSA compared favourably with the other state-of-the-art methods, as seen in Table 5. The number of benchmark functions on which each algorithm dominated the others is as follows: ABC, WOA, BOA, PSO, DE, GA, BMO, EOSA, HGSO, and SOA dominated on 2, 2, 2, 1, 1, 0, 0, 6, 1, and 4 functions, respectively. This confirms that EOSA demonstrated superiority over the other methods on six of the 15 benchmark functions we experimented with. SOA is the next most competitive method, following EOSA with four benchmark functions. Considering the capability of the EOSA metaheuristic algorithm to obtain the most best solutions across the benchmark functions, it became necessary to investigate its applicability to the optimization problem described in this study. Meanwhile, Fig 14 shows the convergence graph of EOSA over some selected benchmark functions. The convergence pattern of the EOSA method is smooth, especially for F1-F6 and F9, while F7-F8 and F10-F13 are also seen to converge well. This demonstrates that the algorithm can search for the best solution in the global search space and confirms its applicability to solving complex real-life problems, as investigated in this study. Fig 15 shows the convergence of EOSA alongside the related optimization algorithms on the benchmark functions.
The optimized CNN architecture was fully trained to learn the classification problem of detecting and classifying lung cancer from the database samples used in the study. The trained model was then applied to a dataset for prediction. The results showed that the optimization process benefited the entire pipeline: as shown in Table 6, the EOSA-CNN hybrid algorithm improved the classification process, leading to better classification accuracy in detecting malignancy.
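The hybrid works by letting the metaheuristic search over a single flat solution vector that is reshaped into the network's weights and biases before each fitness evaluation. The following framework-free sketch shows that vector-to-weights mapping; the layer shapes are illustrative placeholders, not the paper's actual CNN architecture.

```python
import numpy as np

# Illustrative (weight, bias) shapes per layer -- NOT the paper's architecture.
shapes = [((8, 4), (4,)), ((4, 3), (3,))]

def unflatten(vec, shapes):
    """Reshape one flat solution vector into per-layer (W, b) arrays."""
    params, i = [], 0
    for w_shape, b_shape in shapes:
        w_n, b_n = int(np.prod(w_shape)), int(np.prod(b_shape))
        W = vec[i:i + w_n].reshape(w_shape); i += w_n
        b = vec[i:i + b_n].reshape(b_shape); i += b_n
        params.append((W, b))
    assert i == vec.size, "solution vector length must match the architecture"
    return params

# Dimension of the search space = total number of weights and biases
dim = sum(int(np.prod(w)) + int(np.prod(b)) for w, b in shapes)   # 51 here
candidate = np.random.default_rng(2).standard_normal(dim)         # one individual
layers = unflatten(candidate, shapes)
# A fitness function would now load `layers` into the CNN and return a score
# (e.g., 1 - validation accuracy) for EOSA to minimize.
```

Each EOSA individual is one such `candidate` vector; the best individual found after training determines the weights and biases of the final classifier.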
Furthermore, the good performance of the hybrid algorithm on the specificity metric shows that it can effectively detect true negative cases, thereby reducing false-negative reports. We observed that EOSA-CNN outperformed similar hybrid algorithms as well as the traditional CNN model, which achieved an accuracy of 0.80. We examined the performance of the EOSA-CNN algorithm on the three class labels seen in the samples drawn from the dataset. These results are listed in Table 6, where the specificity, sensitivity, precision, recall, F1-score, and balanced accuracy are computed and reported. In most cases, the other hybrid algorithms competed very closely with the proposed EOSA-CNN algorithm, while EOSA-CNN outperformed the traditional CNN on most metrics. Again, this confirms that EOSA-CNN successfully captured the features of each class and classified them correctly with excellent performance, which further reinforces the algorithm's usefulness in addressing the classification problem in this domain.
Furthermore, a detailed report on the performance of the hybrid algorithms, compared with the EOSA-CNN algorithm and then with the traditional CNN, is presented in Table 7, where we computed the best, mean, standard deviation, median, and worst values. These were computed across all the metrics of accuracy, kappa, precision, recall, F1 score, specificity, and sensitivity for the overall performance of the GA-CNN, LCBO-CNN, MVO-CNN, SBO-CNN, WOA-CNN, and EOSA-CNN algorithms.

Fig 14. Convergent curves of EOSA on standard benchmark functions over 1, 50, 100, 200, 300, 400 and 500 epochs. https://doi.org/10.1371/journal.pone.0285796.g014

Fig 15. Convergent curves of EOSA and related optimization algorithms on benchmark functions over 1, 50, 100, 200, 300, 400 and 500 epochs.

In Table 8, we computed the same set of statistics, namely best, mean, standard deviation, median, and worst, with respect to the class labels seen in the samples from the dataset. This allows us to investigate whether the algorithms are biased in detecting and classifying features from each class. Results were obtained for accuracy, kappa, precision, recall, F1 score, specificity, and sensitivity for all the hybrid algorithms and the traditional CNN for the malignancy label. The best values across these metrics confirmed the good performance of the hybrid algorithms over the plain CNN architecture, and of EOSA-CNN in particular.

The classification accuracy of each class is indicated on each plot of the confusion matrix to give an accurate report on performance. Taking EOSA-CNN as an example, 90% of all cases with normal labels were correctly identified, and over 86% of cases labelled as malignant were correctly identified by the hybrid algorithm proposed in this study. This contrasts with the traditional CNN, for which only 67.31% of samples with normal labels and about 83% of those with malignancy were correctly identified. This reinforces the impact of the hybrid algorithm proposed in this study, since it improved classification accuracy.
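The per-class percentages quoted from the confusion matrices (e.g., 90% of normal cases correctly identified) come from normalizing each row by its total. A short sketch with a hypothetical matrix (not the paper's actual counts):

```python
import numpy as np

# Hypothetical confusion matrix; rows = true label (normal, benign, malignant),
# columns = predicted label. Values chosen only to illustrate the computation.
cm = np.array([[90,  4,  6],
               [ 3, 25,  2],
               [ 8,  6, 86]], dtype=float)

# Per-class accuracy = diagonal divided by row sums, i.e., the share of each
# true class that was correctly identified (the figure shown on each matrix plot).
per_class = np.diag(cm) / cm.sum(axis=1)
```

Reporting these row-normalized values alongside overall accuracy makes class bias visible: a model can score high overall while misclassifying most samples of a small class such as benign.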

In Table 9, we compare the performance of the proposed EOSA-CNN hybrid algorithm with those reported in similar studies. The classification accuracy obtained by the approach proposed in this study competes with those seen in the works of Chen et al. [61], Sultana et al. [62], Bangare et al. [63], Al-Yasriy et al. [64], Dass and Kumar [65], and Lyu [66]. All of these similar methods applied basic and benchmark CNN architectures with no known use of any parameter optimization strategy. Although the results obtained by most of the studies are interesting, we note that such models will underperform when performance-tilting conditions are introduced. These approaches fall short of the one proposed in this study, which aimed to stabilize and solve the classification problem using optimized CNN architectures. We argue that hyperparameter optimization using metaheuristic algorithms promises a stabilized model that learns the classification problem effectively and can address the underlying conditions. Such an approach can therefore reduce the false positive rate (FPR) and false negative rate (FNR) that often make a mal-trained model yield pseudo-performance. Furthermore, several studies have confirmed that optimizing the architectural configuration of CNN models has become the state-of-the-art (SOTA) approach for producing the best-performing classification models. That such a SOTA approach resulted in an impressive performance here demonstrates that the resulting classification model is reliable.
In this study, the specificity and precision values of 1.0 obtained for both cases confirm that classification accuracy alone is insufficient to demonstrate a method's superiority. The proposed method performed very well in eliminating false positives, ensuring that the model correctly classified negative cases as negative and positive cases as positive. The specificity value of 1.0 reported for the proposed method also shows that the negative cases (normal and benign) in our dataset were accurately identified as truly negative; that is, all negative cases were confirmed negative by our method. This is very important for ruling out false negative and false positive results. Yielding zero false positive and false negative rates shows that the EOSA-CNN hybrid algorithm achieves both good classification accuracy and trustworthy results, which will boost confidence in the output of the proposed algorithm when deployed for use. Therefore, this study has demonstrated the importance of using a hybrid of a metaheuristic algorithm and a CNN model to solve the difficult problem of selecting the best combination of weights and biases required for training a CNN model. Moreover, the approach demonstrates that combining the methods can improve classification accuracy and the general performance of classifying lung cancer in CT images.

Table 9. Performance comparison of the proposed method and some similar methods of CNN for the classification of lung cancer.

Study limitations
The study has a few limitations, including an insufficient data sample size and, due to limited resources, a lack of consideration for possible data imbalance and time complexity. We suggest that future work address these limitations by using techniques such as random under- and over-sampling or cluster-based over-sampling, and by incorporating larger sample sizes to improve model performance. Despite these limitations, the proposed EOSA-CNN model outperformed other hybrid algorithms and the traditional CNN on all seven metrics evaluated, which is significant compared to previous studies. Further research is necessary to evaluate the EOSA model's performance on other medical problems.
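The random over-sampling remedy suggested above can be sketched without any specialized library: minority-class rows are duplicated (with replacement) until every class matches the majority count. This is a naive illustration under the dataset's reported class counts, not the authors' pipeline; libraries such as imbalanced-learn provide more refined variants.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the majority-class count."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        idx.append(members)
        if n < target:                      # resample extras with replacement
            idx.append(rng.choice(members, target - n, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

# Mimic the dataset's imbalance: 120 benign, 561 malignant, 416 normal
y = np.array([0] * 120 + [1] * 561 + [2] * 416)
X = np.arange(y.size, dtype=float)[:, None]   # placeholder feature matrix
Xb, yb = random_oversample(X, y)
# every class now has 561 samples
```

Over-sampling must be applied only to the training split (after the train/test split), otherwise duplicated samples leak into the evaluation set and inflate the reported metrics.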

Conclusion
This study presents a novel hybrid algorithm to improve the accuracy of lung cancer classification using a CNN model. The EOSA algorithm was used to optimize the solution vector of the CNN architecture, which was trained on distinct 2D samples categorized according to their abnormalities. The resulting model performed well on new datasets, indicating its generalization ability. The EOSA-CNN algorithm outperformed the traditional CNN and other metaheuristic-based hybrid algorithms, as demonstrated by the accuracy, kappa, precision, recall, F1 score, specificity, and sensitivity metrics. The contribution of this study is the successful use of the EOSA algorithm, a virus-based optimization technique, to improve the solution vector of the proposed CNN architecture. Future work includes optimizing the hyperparameters of the CNN model, investigating the possibility of using the hybrid approach to auto-design the CNN architecture, and comparing the proposed CNN architecture against benchmarked models for further evaluation. Overall, this study provides a promising classification model for identifying malignant and benign lung cancer cases from digital images, with potential applications in early detection and improved decision-making for patient treatment.