Machine learning for differentiating lung squamous cell cancer from adenocarcinoma using Clinical-Metabolic characteristics and 18F-FDG PET/CT radiomics

Noninvasive differentiation between the squamous cell carcinoma (SCC) and adenocarcinoma (ADC) subtypes of non-small cell lung cancer (NSCLC) could benefit patients who are unsuitable for invasive diagnostic procedures. Therefore, this study evaluates the predictive performance of a PET/CT-based radiomics model. It aims to distinguish between the histological subtypes of lung adenocarcinoma and squamous cell carcinoma, employing four different machine learning techniques. A total of 255 Non-Small Cell Lung Cancer (NSCLC) patients were retrospectively analyzed and randomly divided into the training (n = 177) and validation (n = 78) sets, respectively. Radiomics features were extracted, and the Least Absolute Shrinkage and Selection Operator (LASSO) method was employed for feature selection. Subsequently, models were constructed using four distinct machine learning techniques, with the top-performing algorithm determined by evaluating metrics such as accuracy, sensitivity, specificity, and the area under the curve (AUC). The efficacy of the various models was appraised and compared using the DeLong test. A nomogram was developed based on the model with the best predictive efficiency and clinical utility, and it was validated using calibration curves. Results indicated that the logistic regression classifier had better predictive power in the validation cohort of the radiomic model. The combined model (AUC 0.870) exhibited superior predictive power compared to the clinical model (AUC 0.848) and the radiomics model (AUC 0.774). In this study, we discovered that the combined model, refined by the logistic regression classifier, exhibited the most effective performance in classifying the histological subtypes of NSCLC.


Introduction
According to GLOBOCAN 2020, lung cancer ranks as the second most common type of cancer and stands as the leading cause of cancer-related deaths. Approximately 2.2 million new cases were diagnosed in 2020 alone, with the disease accounting for an estimated 1.8 million fatalities [1]. Various types of lung cancer exist, with non-small cell lung cancer (NSCLC) being the most prevalent, constituting about 85% of all lung cancer cases globally [2]. Squamous cell carcinoma (SCC) and adenocarcinoma (ADC) represent the two most common histologic subtypes. Research indicates significant variances in the genetic and epigenetic traits of ADC and SCC during tumorigenesis and progression [3]. Given the differing treatment approaches for adenocarcinoma and squamous cell carcinoma, swift and precise identification of these pathological subtypes is critical.
Approximately one-third of patients diagnosed with NSCLC are at Stage III, a stage at which most are no longer viable candidates for surgical intervention [4]. Consequently, computed tomography (CT)-guided biopsy has become the gold standard for determining the pathologic subtype of lung cancer. However, this invasive method may not fully capture the heterogeneity of the entire tumor. Given that biopsies typically yield only a few small tissue samples, they may not provide a comprehensive understanding of the overall tumor, posing challenges for accurate diagnosis. Additionally, biopsy procedures carry potential risks, such as pneumothorax, intrathoracic hemorrhage, pleural reaction, air embolism, and intrapleural implantation metastasis. The prospect of additional biopsies due to heterogeneous or necrotic tumor tissue may deter some patients from undergoing a biopsy, particularly those with an uncontrolled cough [5,6]. Therefore, the development of a reliable, non-invasive, and practical method for predicting NSCLC histology prior to treatment is paramount.
Relevant studies suggest that certain clinical characteristics can aid in differentiating the diagnosis of lung adenocarcinoma from lung squamous cell carcinoma. These include factors such as age, smoking history, tumor diameter, imaging signs, and microvascular density [7][8][9]. However, the sole reliance on clinical features for classifying pathological tissues may be influenced by the subjective judgment of the physicians or by the heterogeneity and quantity of the samples, potentially leading to variability in diagnostic outcomes.
As medical and information technologies advance, there is an exponential growth in the volume of medical data, especially in the production of medical imaging. These images harbor extensive latent details pertinent to human health. Yet, the manual examination and interpretation of such data are not only time-consuming but also prone to human bias. Leveraging the capabilities of machine learning can significantly alleviate these issues by extracting sophisticated features and minimizing subjectivity. Radiomics entails the quantitative retrieval of characteristics from conventional medical imaging. The development of predictive or diagnostic models through machine learning techniques allows for the collection of data that can be assimilated into clinical decision-making tools, consequently improving the accuracy of diagnoses or prognosis [10,11]. Zhu et al. [12] extracted 485 features from manually delineated tumor regions in 129 NSCLC patients. The results demonstrated that the area under the curve (AUC) for the training and validation sets reached 0.905 and 0.893, respectively, indicating that the imaging features have substantial efficacy in differentiating between lung adenocarcinoma and squamous cell carcinoma. Bashir et al. [13] analyzed the effectiveness of a random forest model utilizing CT image radiomics features, CT semantic features, and combined features in distinguishing lung adenocarcinoma from squamous cell carcinoma. The findings revealed that the random forest model based on radiomics features could non-invasively analyze the histological subtypes of NSCLC with an AUC of 1.
18F-fluorodeoxyglucose (FDG) PET/CT, which combines anatomical and metabolic information, is crucial for identifying primary tumors, staging diseases, and assessing treatment success. Radiomics based on PET/CT shows potential in differentiating ADC from SCC. Yan et al. [14] developed separate models based on PET and CT, as well as a combined PET-CT model. Among these, the combined model exhibited superior performance in predicting ADC, SCC, and metastasis. Additional studies have also discovered that the inclusion of clinical characteristics, such as gender and smoking history, further enhanced the classification performance, achieving an area under the curve (AUC) of 0.859, which surpassed the performance of radiomics alone [15,16]. When developing a radiomics prediction model, the selection of an appropriate machine learning algorithm can significantly enhance the model's predictive accuracy and stability. Various classifiers, including Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF), have been utilized to construct models in the studies mentioned earlier. The Light Gradient Boosting Machine (LGBM), a model rooted in gradient boosting decision trees (GBDT), shares principles with the XGBoost algorithm but offers several advantages, such as faster training efficiency, lower memory consumption, higher accuracy, and support for parallel learning. To the best of our knowledge, there has been scant research evaluating the efficacy of the LGBM classifier for radiomics models based on 18F-FDG PET/CT that incorporate clinico-metabolic features for differentiating between ADC and SCC in lung cancer. Therefore, the objective of this research was to develop and validate an optimal machine learning (ML) model utilizing PET/CT data to distinguish between SCC and ADC in stage III NSCLC.

Study design
The Ethics Committee of the Affiliated Cancer Hospital of Shandong First Medical University approved the current retrospective study (No. SDTHEC 2023010008). The requirement for written informed consent was waived due to the study's retrospective nature. The workflow of our study is depicted in Fig 1.

Patients
Data were accessed for research purposes beginning on March 10, 2023. In this study, we selected a cohort of 255 patients diagnosed with non-small cell lung cancer (NSCLC) between September 2018 and May 2022. The inclusion criteria for this study were as follows: (1) pathologically confirmed non-small cell lung cancer (NSCLC); (2) available PET/CT images obtained before treatment; (3) a diagnosis of stage III disease; (4) a single tumor lesion exceeding 1 cm in diameter. The exclusion criteria included: (1) patients who received anti-tumor treatment prior to the PET/CT scan; (2) individuals with a history of other thoracic malignant tumors or systemic malignancies; (3) patients with pathological confirmation of histological subtypes other than ADC or SCC; (4) patients who underwent surgical intervention after their diagnosis.
In the end, the study enrolled 255 patients, who were then randomly divided into two groups: the training cohort, consisting of 177 individuals, and the internal validation cohort, comprising 78 individuals, following a 7:3 distribution. The clinical characteristics of the patients were systematically documented. Furthermore, the researchers measured various PET metabolic parameters, including metabolic tumor volume (MTV), mean standardized uptake value (SUVmean), maximum standardized uptake value (SUVmax), and minimum standardized uptake value (SUVmin). Additionally, the total lesion glycolysis (TLG) was calculated using the formula TLG = SUVmean × MTV [17].
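As a minimal sketch of the two bookkeeping steps described above, the following Python snippet computes TLG from SUVmean and MTV and draws a random 177/78 split of 255 patients. The function names, the fixed seed, and the example values are illustrative assumptions and do not come from the study's own code.

```python
import numpy as np

def total_lesion_glycolysis(suv_mean, mtv):
    """TLG = SUVmean x MTV, as defined in the text."""
    return np.asarray(suv_mean, dtype=float) * np.asarray(mtv, dtype=float)

def random_split(n_patients, n_train, seed=0):
    """Return shuffled index arrays for the training and validation cohorts."""
    rng = np.random.default_rng(seed)  # the seed is an arbitrary assumption
    order = rng.permutation(n_patients)
    return order[:n_train], order[n_train:]

# Hypothetical values for two lesions (MTV in cm^3).
tlg = total_lesion_glycolysis([4.0, 6.5], [10.0, 20.0])

# 255 patients divided into 177 training and 78 validation cases (about 7:3).
train_idx, valid_idx = random_split(n_patients=255, n_train=177)
```

A purely random split is shown because that is what the text describes; stratifying by histological subtype would be a separate design choice.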

18F-FDG PET/CT image acquisition
18F-FDG scans were conducted using a Philips Gemini TF PET/CT system (Philips Medical Systems, Netherlands) in accordance with standard clinical scanning protocols. Patients were required to fast for at least six hours prior to the scan, ensuring their blood glucose levels remained below 140 mg/dL. Approximately one hour after the administration of an intravenous dose of 4.4 MBq/kg of 18F-FDG, PET and CT images were acquired. The PET images were reconstructed in multiple planes with a reconstruction slice thickness ranging from 1 to 3 mm.

Tumor segmentation
Tumor segmentation was executed using AccuContour software (version 3.2, Manteia Medical Technologies Co., Ltd., Xiamen, China). Two experienced nuclear medicine physicians employed a threshold of 40% of the maximum standardized uptake value (SUVmax) to delineate the gross tumor volume (GTV) on PET images, reaching a consensus without prior knowledge of the pathology [18,19]. Concurrently, the contours of the GTV on CT slices were outlined based on the integration of PET and anatomical data from the CT images. Subsequently, two senior radiologists conducted a collaborative review of the target images.
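The 40% SUVmax rule can be illustrated with a small NumPy sketch. It is not the AccuContour workflow: it assumes the SUV array has already been cropped to a region around the lesion, and the function name and toy values are hypothetical.

```python
import numpy as np

def threshold_gtv(suv, fraction=0.40):
    """Boolean mask of voxels whose SUV is at least `fraction` of SUVmax."""
    suv = np.asarray(suv, dtype=float)
    return suv >= fraction * suv.max()

# Toy 2D "slice" of SUV values around a lesion (hypothetical numbers).
suv_slice = np.array([[0.5, 1.0, 2.0],
                      [3.9, 4.2, 10.0],
                      [4.1, 8.0, 1.5]])

mask = threshold_gtv(suv_slice)      # SUVmax = 10, so the cutoff is 4.0
n_voxels = int(mask.sum())           # voxels inside the thresholded volume
suv_mean = suv_slice[mask].mean()    # SUVmean within the mask
```

Multiplying the voxel count by the physical voxel volume would give the MTV of the thresholded region.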

Feature extraction
In this study, the features are divided into three categories: (I) geometric, (II) intensity, and (III) textural. Geometric features capture the three-dimensional shape properties of the tumor. Intensity features reflect the statistical distribution of voxel intensities within the tumor. In contrast, textural features leverage methods such as the gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and neighborhood gray-tone difference matrix (NGTDM) to characterize patterns and spatial distributions of voxel intensities. A total of 1,834 handcrafted CT features and 2,016 handcrafted PET features were extracted. All handcrafted features were extracted using a custom feature analysis program implemented in Pyradiomics (http://pyradiomics.readthedocs.io). To integrate PET and CT features, an early fusion approach was employed.
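A minimal sketch of early fusion follows, assuming the term means feature-level concatenation of the CT and PET vectors before any modeling (the usual meaning, although the text does not spell it out). The random matrices merely stand in for real Pyradiomics output, and the function name is hypothetical.

```python
import numpy as np

def early_fusion(ct_features, pet_features):
    """Concatenate per-patient CT and PET feature vectors column-wise."""
    ct = np.asarray(ct_features, dtype=float)
    pet = np.asarray(pet_features, dtype=float)
    if ct.shape[0] != pet.shape[0]:
        raise ValueError("CT and PET matrices need one row per patient")
    return np.hstack([ct, pet])

# Synthetic stand-in data: 5 patients, 1,834 CT and 2,016 PET features.
rng = np.random.default_rng(0)
ct = rng.normal(size=(5, 1834))
pet = rng.normal(size=(5, 2016))

fused = early_fusion(ct, pet)   # one row per patient, 3,850 columns
```

With this layout, all later selection steps operate on a single matrix, so PET and CT features compete on equal terms.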

Feature selection and prediction model establishment
To reduce redundancy among the highly repeatable features while preserving their distinctiveness, we assessed pairwise correlation using Spearman's rank correlation coefficient. When two features had a correlation coefficient greater than 0.9, only one of them was retained. To this end, we employed a greedy recursive elimination technique, which removes the most redundant feature from the current set at each step. Then, we employed Least Absolute Shrinkage and Selection Operator (LASSO) regression to select effective features within the training dataset. LASSO shrinks regression coefficients toward zero and thereby excludes irrelevant features by setting their coefficients exactly to zero. The initial step involved identifying the optimal regularization parameter, λ, by 10-fold cross-validation with the minimum-error criterion. The λ value that minimized the cross-validation error was selected, and the features with non-zero coefficients at that value were retained. These features were then aggregated to construct the radiomic model. Additionally, a radiomics score was computed for each patient as a linear combination of the selected features weighted by their respective model coefficients. The LASSO regression modeling was conducted using the scikit-learn package in Python.
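The selection pipeline can be sketched as follows, under stated assumptions. A feature counts as redundant when its absolute Spearman correlation with another remaining feature exceeds 0.9, and the feature with the most such partners is removed first. LASSO is fitted with scikit-learn's LassoCV on a 0/1 label. The z-score standardization step, the synthetic data, and the helper name drop_redundant are illustrative and are not taken from the study.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

def drop_redundant(X, threshold=0.9):
    """Greedy elimination: repeatedly remove the feature that exceeds the
    |Spearman rho| threshold with the largest number of remaining features."""
    rho = np.abs(np.corrcoef(rankdata(X, axis=0), rowvar=False))
    np.fill_diagonal(rho, 0.0)
    keep = list(range(X.shape[1]))
    while True:
        counts = (rho[np.ix_(keep, keep)] > threshold).sum(axis=0)
        if counts.max() == 0:
            return keep
        keep.pop(int(counts.argmax()))

# Synthetic stand-in for a training feature matrix (177 patients, 30 features).
rng = np.random.default_rng(0)
n = 177
X = rng.normal(size=(n, 30))
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=n)   # near-duplicate of feature 0
y = (X[:, 0] - X[:, 5] + 0.5 * rng.normal(size=n) > 0).astype(float)

kept = drop_redundant(X)
X_kept = StandardScaler().fit_transform(X[:, kept])   # standardization is an assumption

# 10-fold cross-validation chooses the penalty that minimizes the CV error.
lasso = LassoCV(cv=10, random_state=0).fit(X_kept, y)
selected = [kept[i] for i in np.flatnonzero(lasso.coef_)]

# Radiomics score: intercept plus the coefficient-weighted sum of features.
rad_score = lasso.intercept_ + X_kept @ lasso.coef_
```

Features whose coefficients shrink to exactly zero drop out of `selected`, which is the property that makes LASSO usable as a selection step rather than only as a regularizer.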
To differentiate between SCC and ADC, three independent predictive models were developed: the Clinical-Metabolic Model (clinic model), the PET/CT Radiomic Model (RS model), and the Combined PET/CT Radiomic and Clinical-Metabolic Model (combined model). Four machine learning classifiers (LR, LGBM, SVM, and RF) were used to construct these models. During this process, 5-fold cross-validation was employed to derive the final models.
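A hedged sketch of the classifier comparison is given below. It uses scikit-learn only, so GradientBoostingClassifier is swapped in as a stand-in for LGBM (the lightgbm package is not imported here). The hyperparameters, the stratified folds, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the selected features of 177 training patients.
rng = np.random.default_rng(0)
X = rng.normal(size=(177, 7))
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=177) > 0).astype(int)

classifiers = {
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # Stand-in for LGBM: another gradient-boosted decision tree ensemble.
    "GBDT": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated AUC for each classifier on the training data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in classifiers.items()
}
best = max(cv_auc, key=cv_auc.get)
```

Wrapping the scaler in the pipeline keeps the scaling parameters inside each fold, which avoids leaking validation-fold statistics into training.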

Development and validation of individualized nomogram
Furthermore, we constructed a radiomics nomogram using the training dataset to facilitate a rapid and visual assessment of the added predictive value provided by combining the radiomics score with clinical risk factors. Logistic regression analysis was used to combine the radiomic features with the clinical risk factors in the nomogram. Finally, we plotted calibration curves to appraise the calibration quality of the nomogram.
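A nomogram is a graphical rendering of a fitted logistic regression, so the underlying computation can be sketched as a model that combines a radiomics score with clinical variables, followed by a calibration curve. The data below are synthetic, the choice of predictors (radiomics score, gender, CEA) mirrors the combined model described in the Results, and the simulated coefficients are arbitrary assumptions. Note also that scikit-learn's LogisticRegression applies an L2 penalty by default, which may differ from the fitting procedure used for the published nomogram.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression

# Synthetic cohort: radiomics score, gender (1: male, 2: female), CEA level.
rng = np.random.default_rng(0)
n = 177
rad_score = rng.normal(size=n)
gender = rng.integers(1, 3, size=n)
cea = rng.lognormal(mean=1.0, sigma=0.5, size=n)

# Hypothetical data-generating model used only to simulate labels.
logit = 1.2 * rad_score + 0.8 * (gender - 1) + 0.1 * cea - 0.8
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([rad_score, gender, cea])
model = LogisticRegression(max_iter=1000).fit(X, y)
prob = model.predict_proba(X)[:, 1]

# Calibration curve: observed event fraction versus mean predicted
# probability within each of (at most) five probability bins.
frac_pos, mean_pred = calibration_curve(y, prob, n_bins=5)
```

Points of (mean_pred, frac_pos) lying close to the diagonal indicate good agreement between predicted and observed frequencies.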

Statistical analysis
Patient characteristics were compared using independent sample t-tests, Mann-Whitney U tests, Fisher's exact test, or chi-square (χ²) tests where relevant. The process of identifying clinical features incorporated both univariate and multivariate logistic regression analyses. The statistical software SPSS (Version 25.0) and R (Version 3.4.0) were used for data analysis, with P-values below 0.05 signaling statistical significance. The selection of the most effective machine learning (ML) model hinged on its performance metrics: the area under the receiver operating characteristic curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE). AUC comparisons of the various models on the validation set were made using the DeLong test. Decision Curve Analysis (DCA) was also implemented to evaluate the clinical usefulness of the predictive model.
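The performance metrics named above can be made concrete with a short NumPy sketch. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, and the decision-curve quantity is the standard net benefit, TP/N − (FP/N) × pt/(1 − pt), at threshold probability pt. The 0.5 cutoff, the function names, and the toy data are assumptions; the DeLong test is not implemented here.

```python
import numpy as np

def binary_metrics(y_true, y_prob, cutoff=0.5):
    """Accuracy, sensitivity, specificity, and AUC for a binary classifier."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = y_prob >= cutoff
    tp = np.sum(y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    # Pairwise comparison of positive and negative scores; ties count as 0.5.
    diff = y_prob[y_true][:, None] - y_prob[~y_true][None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
    return {
        "ACC": (tp + tn) / y_true.size,
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "AUC": auc,
    }

def net_benefit(y_true, y_prob, pt):
    """Net benefit at threshold probability pt (one point on a decision curve)."""
    y_true = np.asarray(y_true).astype(bool)
    flagged = np.asarray(y_prob, dtype=float) >= pt
    n = y_true.size
    tp = np.sum(flagged & y_true)
    fp = np.sum(flagged & ~y_true)
    return tp / n - fp / n * pt / (1.0 - pt)

# Toy labels and predicted probabilities (hypothetical).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

metrics = binary_metrics(y_true, y_prob)   # ACC = SEN = SPE = 0.75, AUC = 0.9375
nb = net_benefit(y_true, y_prob, pt=0.2)   # 0.40625
```

Evaluating net_benefit over a grid of pt values and plotting the result against the "treat all" and "treat none" strategies yields a decision curve.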

Clinical characteristics of patients
This study included a total of 255 participants diagnosed with non-small cell lung cancer (NSCLC). Among these cases, there were 145 patients with squamous cell carcinoma (SCC) and 110 with adenocarcinoma (ADC). The patient population ranged in age from 26 to 85 years, with a mean age of 62 years. The distribution of baseline clinical characteristics was well-balanced between the training and validation cohorts. Table 1 displays the distribution characteristics of the two groups, providing a detailed breakdown of the baseline clinical attributes for each set of patients.

Feature selection and prediction model establishment
To minimize subjective variability in the segmentation of regions of interest (ROI), only radiomic features with both inter-reader and intra-reader Intraclass Correlation Coefficients (ICCs) greater than 0.75 were included. The radiomic features extracted from PET/CT images were categorized into seven distinct groups: first-order features, shape-based features, Gray Level Dependence Matrix (GLDM) features, Gray Level Run Length Matrix (GLRLM) features, Gray Level Size Zone Matrix (GLSZM) features, Neighbouring Gray Tone Difference Matrix (NGTDM) features, and Gray Level Co-occurrence Matrix (GLCM) features. Detailed information about the handcrafted features is provided in Supplementary data (S1 Table). [...] 4). The combined model incorporated two clinical parameters (gender and CEA) along with seven radiomic parameters (Fig 3C-3E, Table 4). To calculate each patient's pre-scores for each model, the following formulas were applied:

Prediction performance and clinical utility of prediction models
Table 5 displays a consolidated overview of the predictive capabilities for differentiating ADC from SCC across multiple machine learning classifiers within the training and internal validation groups. In the validation cohort, the LR models demonstrated superior outcomes with respect to the AUC, ACC, SEN, and SPE compared to other ML classifiers. As a result, LR was selected as the preferred ML algorithm for the classification of the specified pathological types.
The performance evaluation of the three predictive models using the logistic regression (LR) classifier (Fig 4A and 4B), complemented by the DeLong test results (Table 6), indicated that the combined model surpassed the others, exhibiting superior discrimination and achieving the highest level of accuracy. This was substantiated by the metrics obtained using the [...] histological subtypes, especially when the threshold probability surpasses 20% (Fig 5B and 5C). Calibration curves for both the training and validation cohorts of the nomogram demonstrated a high degree of agreement between the predicted histology and the actual observations, with the combined model using the logistic regression algorithm showing particular effectiveness (Fig 5D and 5E).

Discussion
Personalized treatment plays a crucial role in improving patient survival outcomes. At the heart of personalized medicine lies the early and accurate diagnosis and staging of lung cancer, as well as the precise identification of its pathological subtypes. Although biopsy is considered the gold standard for diagnosing lung cancer, its invasiveness, limited reproducibility, possibility of yielding false-negative results, and the associated risk of complications highlight the urgent need for enhanced diagnostic methods. However, the differentiation of pathological subtypes of non-small cell lung cancer (NSCLC) through standard imaging modalities remains a substantial challenge. In this study, we compared four classifier models to identify the pathological subtypes of non-small cell lung cancer. The optimal classifier was evaluated for its predictive efficacy across three models: the RS model, the clinic model, and the combined model.
In this study, we discovered that the combined model, refined by the logistic regression classifier, exhibited the most effective performance in classifying the histological subtypes of NSCLC.
We explored the clinical features that contribute to the differentiation between ADC and SCC in NSCLC. We found that gender, age, CEA levels, maximum standardized uptake value, and minimum standardized uptake value were statistically significant discriminators between ADC and SCC, which is in accordance with other studies. Previous research has validated gender and age as clinical characteristics that can distinguish between ADC and SCC. Koh et al. [20] compared intratumoral stromal proportions and positron emission tomography (PET) textural features in females and males diagnosed with either adenocarcinoma or squamous cell carcinoma. Their findings indicated a higher prevalence of ADC in females compared to males. Additionally, the variation in tumor heterogeneity between women with ADC and men with ADC or SCC suggests that gender may serve as a distinguishing feature. Younger patients are more commonly diagnosed with adenocarcinoma, aligning with the findings of several studies [21][22][23][24]. This trend may be attributable to the different mutation rates of genes such as EGFR, ALK, and KRAS in younger versus older lung cancer patients. In our cohort, the ADC group consisted of younger individuals than the SCC group (P < 0.05), although the average age in both groups exceeded 60 years. This contrasts with other studies that categorize patients as younger if under 40 and older if over 60. The discrepancy can primarily be ascribed to the specific sample population in our research. Serum CEA, a tumor marker, is commonly measured to help identify lung cancer. Elevated CEA levels are seen in 35% to 70% of NSCLC patients, particularly in those with lung adenocarcinoma and advanced disease [25], a finding that our study corroborates. Furthermore, research by Karam et al. [26][27][28] on 98 NSCLC cases established a significant correlation between SUVmax and the size of primary lesions, with SCC showing notably higher SUVmax values than ADC. Our study confirms that SUVmax is indeed higher in SCC compared to ADC (P < 0.05). We also observed that SUVmin is higher in ADC than in SCC (P < 0.05), adding another layer to the diagnostic criteria for these subtypes.
In addition to analyzing clinical features, this study also integrated radiomic features from PET/CT images. In our study, the most significant radiomic features for both the RS model and the combined model were lbp_3D_m1_firstorder_Skewness, log_sigma_5_0_mm_3D_firstorder_90Percentile, squareroot_ngtdm_Busyness, and log_sigma_2_0_mm_3D_firstorder_Maximum. A previous study [29] demonstrated that first-order features were particularly stable and robust in rectal cancer. In our research, three first-order features were also confirmed to be the most significant indicators for classifying histological types. Another noteworthy feature in our study was 'busyness,' which is associated with the spatial frequency of intensity changes. Erol M et al. [30] reported that radiomic features, including busyness, were independently correlated with the staging of lung squamous cancer. Bashir et al. [13] and Hyun et al. [16] have previously explored the application of radiomics in the classification of NSCLC. However, the best-performing radiomic features in their studies were different and included GLSZMSZLIE, coefficient of variation, NGTDM coarseness, gray-level zone length nonuniformity, and gray-level nonuniformity for zone. The best-performing subset of radiomic features in our study also differs from those identified in other studies [31,32]. This discrepancy may be attributed to the fact that there are hundreds of radiomic features, many of which are inter-correlated, so that different high-ranking features might essentially represent variations of the same underlying feature.
Regarding the predictive capabilities of different models, several studies [33,34] have indicated that a combined model incorporating both PET and CT features yields a higher AUC than models using only PET or CT features individually. Ren et al. [35] analyzed preoperative clinical features, tumor markers, and PET and CT imaging characteristics, subsequently constructing four independent predictive models. The DeLong test revealed that the combined model exhibited superior performance in predicting the pathological subtypes of NSCLC, with an AUC of 0.932 for the training cohort and an AUC of 0.901 for the validation cohort. These findings align with those of our study, which determined that the combined model achieved higher AUC values compared to the pure PET-CT model and the clinical model alone.
In the construction of radiomics-based prediction models, the selection of suitable machine learning algorithms can enhance the predictive accuracy and stability of the model. Recent advancements in machine learning algorithms, including Gaussian processes, decision trees, RF, SVM, LGBM, and LR, have propelled the application and development of radiomics. Shen [36] evaluated seven different classifiers to optimize a model for classification: SVM with a linear kernel, SVM with a radial basis function kernel (SVM-RBF), RF, LR, Gaussian process classifier (GP), linear discriminant analysis (LDA), and the AdaBoost classifier. The study found that a PET/CT radiomics model using the SVM-RBF classifier demonstrated the best performance, with an AUC of 0.9155, when integrating subregion imaging from PET-CT scans and clinical features for classifying histological subtypes of NSCLC. Parmar et al. [37] demonstrated that the random forest method was the most effective in managing radiomic feature instability and outperformed 12 other machine learning classifiers (including bagging, Bayesian, boosting, decision trees, discriminant analysis, generalized linear models, multiple adaptive regression splines, nearest neighbors, neural networks, partial least squares and principal component regression, and SVMs) in terms of prognostic performance. Huang et al. [38] employed the LGBM algorithm to develop both a radiomic model and a fusion model (clinical + radiomic) to predict EGFR mutation status in patients with NSCLC. Models based on radiomic signatures can provide relatively accurate non-invasive predictions of EGFR expression status.
In our study, we found that the combined model constructed using the Logistic Regression algorithm performs excellently in identifying pathological subtypes, even when compared with multiple algorithms, including Light Gradient Boosting Machine. Moreover, LGBM also demonstrated good predictive performance. Logistic regression, a linear model, is widely favored for binary classification problems due to its computational simplicity and interpretability. LR is adept at handling large datasets and excels with linearly separable problems. Our finding aligns with a similar study conducted by Ren et al. [35], which reported that a combined model incorporating clinico-biological features and 18F-FDG PET/CT data, utilizing the LR algorithm, showed strong capability in distinguishing SCC from ADC. Additionally, another study suggested that a model trained on 18F-FDG PET radiomics with the LR algorithm could be effective for predicting the histological subtypes of lung cancer [16]. Given that different machine learning algorithms have their respective optimal application contexts, it is imperative to explore a variety of algorithms to identify the most suitable model for predicting NSCLC histology subtypes. For instance, Gao et al. proposed an improved adaptive neurofuzzy inference system-based machine learning method to predict the multi-axis fatigue life of various metal materials. The study found that this model exhibited superior predictive performance and extrapolation capabilities when compared with six classical machine learning models [39]. Moreover, a recent study indicated that a 3D convolutional neural network (CNN) model effectively differentiated between benign and malignant pulmonary nodules in 2-[18F] FDG PET images [40]. The optimization of machine learning algorithms should be prioritized in future research to improve the performance of predictions.
Our study has several limitations. Firstly, this study is based on data from a single center and only includes a training set and an internal validation set. Multi-center data should be included in future work to enhance the stability of predictions. Secondly, the classification method adopted in this study is conventional machine learning. In the future, we can attempt to incorporate deep learning techniques, possibly combined with multiomics data, to optimize the classification model.

Conclusion
In this study, we developed a comprehensive model that integrates clinical characteristics with PET/CT imaging features using the logistic regression algorithm. This model serves as an effective tool for the "virtual biopsy" of stage III non-small cell lung cancer, distinguishing between different pathological subgroups. It can aid physicians in making informed clinical decisions concerning treatment options and prognostic assessments.

Fig 1. The workflow of this study. https://doi.org/10.1371/journal.pone.0300170.g001

Fig 4. Comparison of receiver operating characteristic (ROC) curves for predicting subtype of pathology. A shows the ROC curve of LR in the training cohort; B shows the ROC curve of LR in the validation cohort. https://doi.org/10.1371/journal.pone.0300170.g004

Fig 5. Clinical utility of prediction models. A shows the nomogram of a clinical radiomics model developed based on a logistic regression model for the training cohort (gender: 1, male; 2, female). B and C show the decision curve analysis (DCA) conducted for the prediction model based on the logistic regression model in the training (B) and validation (C) cohorts. D and E show the calibration curves of the nomogram based on the logistic regression model in the training (D) and validation (E) cohorts. https://doi.org/10.1371/journal.pone.0300170.g005

Table 1. Baseline characteristics of patients in cohorts.

Fig 2 depicts the quantity and distribution of the handcrafted features extracted from the CT and PET images. Three models were constructed independently using selected clinical factors and metabolic parameters, PET/CT radiomic features, and a combination of the aforementioned variables, utilizing LASSO regression in the training cohort. The clinic model comprised two clinical factors (gender and age), one tumor marker (CEA), and two metabolic parameters (SUVmax and SUVmin) (Fig 3A).

The analysis suggested that SCC was more prevalent among older males with a long history of smoking, whereas ADC tended to occur in younger females, typically non-smokers (p < 0.05). Univariate logistic regression analysis identified gender, age, smoking history, T stage, white blood cell count (WBC), CEA, total lesion glycolysis (TLG), SUVmin,