Abstract
This multicenter study aims to enhance the preoperative prediction of pathological invasiveness in clinical stage I lung adenocarcinoma (LUAD) by developing and validating topologically distinct 2D and 3D intratumoral heterogeneity (ITH) scores derived from chest CT imaging. Patients with histopathologically confirmed LUAD were enrolled from three medical centers. We established a dual-scale computational framework to quantify ITH: the 2D ITH score was derived by integrating local radiomics features with global pixel distribution patterns on the largest cross-sectional slice, while the 3D ITH score captured volumetric heterogeneity using a voxel-based topology-aware approach. Subsequently, six machine learning models integrating clinicoradiologic (CR) features with these heterogeneity scores were developed. Model performance was optimized based on the area under the curve (AUC) across a training set and validated in both an internal test set and an independent external validation set. A total of 1,238 eligible patients were enrolled. Centers 1 and 2 provided 1,053 patients (Training: n=737; Internal Test: n=316), while Center 3 provided 185 patients for external validation. The CatBoost classifier integrating 2D/3D ITH scores with CR features (2DITH-3DITH-CR CatBoost) exhibited superior diagnostic performance, achieving AUCs of 0.867 in the internal test set and 0.881 in the external validation set. The integration of topologically distinct 3D ITH scores significantly improves the preoperative stratification of LUAD invasiveness. The 2DITH-3DITH-CR CatBoost model serves as a robust, non-invasive tool to guide individualized surgical decision-making in clinical practice.
Author summary
Lung adenocarcinoma is the predominant form of lung cancer, and for early-stage patients, surgical decisions hinge on accurately predicting tumor invasiveness. Distinguishing between non-invasive lesions suitable for limited resection and invasive tumors requiring lobectomy remains challenging with standard subjective CT interpretation. To address this, we developed a quantitative framework that analyzes the internal “texture” and structural complexity of lung nodules. A key innovation of our study is the introduction of a “3D intratumoral heterogeneity score,” which uses advanced topological analysis to map the spatial connectivity and fragmentation of tumor tissue across the entire volume, rather than just a single 2D slice. We integrated these scores into a machine learning model and validated its performance on a large cohort of 1,238 patients from three different medical centers. Our results confirm that this 3D approach significantly outperforms traditional methods in identifying invasive cancer. This non-invasive, robust tool offers clinicians a powerful objective metric to guide personalized surgical planning, helping to avoid overtreatment and preserve vital lung function for patients.
Citation: Zuo Z, Fan X, Zeng Y, Qi W, Liu W, Li W, et al. (2026) Topologically distinct 2D and 3D intratumoral heterogeneity scores for preoperatively predicting invasiveness in stage I lung adenocarcinoma: A multicenter study. PLOS Digit Health 5(2): e0001246. https://doi.org/10.1371/journal.pdig.0001246
Editor: Guochao Zhang, CACMS, CHINA
Received: September 29, 2025; Accepted: January 29, 2026; Published: February 20, 2026
Copyright: © 2026 Zuo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data included in this study are available and can be accessed by requesting a detailed research proposal to the Ethics Committee/Data Access Committee at the Third Xiangya Hospital of Central South University (dbxy3yy@163.com).
Funding: This work was supported by the National Science Fund for Distinguished Young Scholars (Grant No. 2023JJ10091 to WL). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Lung cancer remains the leading global cause of cancer-related mortality, with adenocarcinoma accounting for over 40% of all cases [1,2]. The revised WHO classification stratifies lung adenocarcinoma (LUAD) precursors and early-stage lesions into progressive biological states: atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA), and invasive adenocarcinoma (IAC) [3]. Critically, AIS and MIA show exceptional outcomes, with a 10-year disease-free survival rate approaching 100% after complete resection [4]. In contrast, IAC exhibits a significantly reduced 5-year survival rate of 89% and a higher risk of recurrence [5]. This prognostic dichotomy necessitates individualized surgical strategies: sublobar resection (wedge or segmentectomy) is sufficient for preinvasive lesions (AAH, AIS, MIA) [6], while IAC requires radical lobectomy accompanied by systematic lymph node dissection to mitigate metastatic potential [7–9]. Consequently, accurate preoperative assessment of pathological invasiveness directly influences therapeutic decision-making.
The invasiveness of LUAD is intrinsically linked to intratumoral heterogeneity (ITH), which manifests as spatial variations in cellular composition, metabolic activity, and microenvironmental remodeling [10]. Computed tomography (CT) serves as a critical non-invasive tool for LUAD diagnosis and staging, capturing comprehensive morphological and textural information that may reflect underlying ITH. Distinct local patterns (e.g., necrosis, angiogenesis) or global patterns (e.g., density gradients) on CT images can arise from heterogeneous tumor biology [11]. Currently, clinical practice relies on the subjective evaluation of clinicoradiologic (CR) features, such as nodule size, CT density, lobulation sign, and patient age. However, these methodologies encounter significant limitations, including poor repeatability and a dependence on the interpreter’s level of experience [12].
Presently, CT-based quantification of ITH primarily relies on two paradigms, yet both often overlook the spatial topology of heterogeneous tissues. The conventional radiomics approach utilizes single or composite features (e.g., entropy, wavelet textures) as invasiveness biomarkers [13–15]; however, it typically assumes a uniform distribution of heterogeneity, failing to quantify localized pathological variations. Conversely, habitat analysis segments tumor subregions into distinct risk profiles but frequently neglects the topological organization—specifically the connectivity and spatial arrangement—of these habitats [16,17].
To bridge these gaps, recent advancements have introduced the “ITH score,” a metric originally proposed by Li et al. [18] that integrates local radiomics features with global pixel distribution patterns through unsupervised clustering. Crucially, this approach conceptualizes heterogeneity not merely as statistical variance, but as the fragmentation of topologically distinct phenotypic subregions. This methodology has demonstrated significant clinical utility in predicting pathological subtypes [19] and invasiveness in patients with LUAD [20–22]. Nevertheless, these methodologies are predominantly restricted to the largest cross-sectional CT slice (operationally defined as the 2D ITH score), which limits the comprehensive interpretation of subtle, volumetric heterogeneity signatures. Addressing this limitation, Zuo et al. [23] recently advanced the paradigm by introducing a “3D ITH score,” calculated from the entire tumor volume to provide a more holistic characterization of heterogeneity.
Building upon these foundational studies, the present research proposes a comprehensive framework that integrates topologically distinct 2D and 3D ITH scores with standard CR features. By explicitly quantifying the volumetric connectivity and spatial complexity of tumor subregions, this study aims to capture depth-wise invasive features that planar analysis may miss. Leveraging a large-scale multicenter cohort to ensure model robustness and generalizability, the primary objective of this study is to establish these topology-aware ITH scores as pivotal diagnostic biomarkers for the preoperative prediction of LUAD invasiveness, ultimately providing evidence-based guidance for surgical decision-making.
Materials and methods
Patient enrollment
We consecutively enrolled patients with histopathologically confirmed LUAD who underwent surgical resection and preoperative chest CT scans between January 2018 and January 2024 at three medical centers: Xiangtan Central Hospital, the Third Xiangya Hospital of Central South University, and the Affiliated Hospital of Southwest Medical University. Inclusion criteria included: (i) histopathological confirmation of lung adenocarcinoma with available surgical specimens; (ii) a preoperative thin-section CT scan (slice thickness mm) performed within 6 months before surgery; and (iii) lung nodules measuring 5-30 mm in maximum diameter, radiologically classified as clinical stage I (T1N0M0) according to the 9th TNM staging system [24]. Exclusion criteria included: (i) suboptimal CT image quality; (ii) presence of synchronous multiple lung adenocarcinomas or metastatic lesions; (iii) receipt of preoperative chemoradiotherapy; and (iv) unsuccessful computational processing (e.g., segmentation failures, clustering errors, or feature extraction abnormalities). A comprehensive enrollment schematic can be found in Fig 1.
A total of 1,238 eligible patients from three medical centers were consecutively enrolled. Following rigorous exclusion criteria, patients were stratified into a training set (n=737), an internal test set (n=316), and an independent external validation set (n=185) to ensure robust model development and validation.
This multicenter retrospective study was approved by the Institutional Review Boards of Xiangtan Central Hospital (No. 2021-07-009), The Third Xiangya Hospital of Central South University (No. K25083), and the Affiliated Hospital of Southwest Medical University (No. KY2020147), in accordance with the 2013 revision of the Declaration of Helsinki [25]. Informed consent was waived due to the retrospective nature of the study and the use of de-identified patient information.
Multicenter CT dataset construction and nodule segmentation
This multicenter retrospective analysis employed standardized thin-section CT protocols across all participating institutions. Scanners from multidetector CT systems were utilized, with a reconstructed slice thickness set to . Detailed specifications of the CT acquisition protocols are provided in S1 Appendix.
All DICOM images underwent nodule segmentation using ITK-SNAP (v3.6.0) following a rigorous two-stage workflow to generate the 3D masks (denoted as M). The initial stage involved manual contouring of volumetric boundaries on multiplanar reconstructions, performed under fixed lung window settings (window level –600 Hounsfield units [HU], width 1500 HU) by a board-certified cardiothoracic radiologist with 5 years of subspecialty experience. This was followed by a careful review and refinement conducted by an attending thoracic radiologist with 10 years of dedicated experience to ensure the accuracy of the delineation.
Calculation of 2D/3D ITH scores
To quantify the internal heterogeneity of LUAD nodules, we developed a dual-scale computational framework that transforms local radiomic features into topologically distinct tissue subregions. This approach extends established methodologies [18–23] by implementing a voxel-based topology-aware quantification strategy. The workflow generates two complementary metrics: a 2D ITH score that maps planar spatial patterns on the largest cross-sectional slice, and a 3D ITH score that captures volumetric connectivity across the entire tumor mask. The complete computational procedure is formalized in Algorithm 1.
Algorithm 1 Calculation of topologically distinct 2D and 3D ITH scores.
Require: CT image, tumor mask $\Omega$, mode $M \in \{\text{2D}, \text{3D}\}$, feature number, cluster number $K$
Ensure: ITH score
Phase 1: Initialization
  Initialize accumulator to 0
  if $M$ = 2D then
    Region ← largest cross-sectional axial slice of $\Omega$
    $w$ ← 2D sliding window kernel
    Topology ← pixel topology
  else if $M$ = 3D then
    Region ← $\Omega$ (full volumetric mask)
    $w$ ← 3D sliding window kernel
    Topology ← voxel topology (26-connectivity)
  end if
Phase 2: Feature extraction and clustering
  for each pixel or voxel $x$ in Region do
    Extract generic feature vector $f(x)$ using kernel $w$
  end for
  Apply K-means clustering on the feature matrix to generate cluster labels $L(x)$
  Generate cluster map $C$ by mapping labels to spatial coordinates
Phase 3: Topology-aware quantification
  if $M$ = 2D then
    Calculate the total area $S_{total}$ of Region
  else if $M$ = 3D then
    Calculate the total volume $V_{total}$ of Region
  end if
  for cluster $i$ = 1 to $K$ do
    Extract binary subregion $C_i$ from $C$
    Identify connected components in $C_i$ using Topology
    if $M$ = 2D then
      $n_i$ ← count of connected components
      $S_{i,max}$ ← area of the largest connected component
      accumulator ← accumulator + $S_{i,max} / n_i$
    else if $M$ = 3D then
      $m_i$ ← count of connected components
      $V_{i,max}$ ← volume of the largest connected component
      accumulator ← accumulator + $V_{i,max} / m_i$
    end if
  end for
Phase 4: ITH score derivation
  if $M$ = 2D then
    ITH score ← 1 − accumulator / $S_{total}$
  else if $M$ = 3D then
    ITH score ← 1 − accumulator / $V_{total}$
  end if
  return ITH score (2D or 3D, according to $M$)
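Phase 2 of the procedure can be illustrated with a small, dependency-free sketch. The specific radiomic features, window size, and clustering settings used in the study are not given in this text, so everything below is a stand-in: the local mean and standard deviation serve as hypothetical features, `radius=1` is an assumed window, and the K-means initialisation is made deterministic purely for reproducibility.

```python
import math

def local_features(image, mask, radius=1):
    """Return {(row, col): (local mean, local std)} for every masked pixel.

    `image` is a nested list of intensities and `mask` a set of (row, col)
    coordinates. The square window is clipped to the mask. Mean and std are
    illustrative placeholders, not the features used in the study.
    """
    feats = {}
    for (r, c) in mask:
        vals = [image[rr][cc]
                for rr in range(r - radius, r + radius + 1)
                for cc in range(c - radius, c + radius + 1)
                if (rr, cc) in mask]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        feats[(r, c)] = (mean, math.sqrt(var))
    return feats

def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per input point."""
    ordered = sorted(set(points))
    # Deterministic start: spread centroids evenly over the sorted distinct points.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)]
                 for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(centroids[j])  # keep an empty cluster's centroid
        if new == centroids:
            break
        centroids = new
    return labels

# Toy slice: a dark left half and a bright right half.
image = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
mask = {(r, c) for r in range(4) for c in range(6)}
feats = local_features(image, mask)
coords = sorted(feats)
cluster_map = dict(zip(coords, kmeans([feats[p] for p in coords], k=2)))
```

The resulting `cluster_map` plays the role of the cluster map C: each coordinate carries the label of the subregion it was assigned to.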
2D ITH score derivation.
The 2D ITH score quantifies heterogeneity within the representative axial plane by analyzing the spatial connectivity patterns of tissue phenotypes. Radiomic features were extracted using a sliding window, and pixels were partitioned into K = 6 topologically distinct subregions. The final score integrates the spatial dispersion of these clusters using an area-weighted formulation, as defined in Eq 1:
$$\mathrm{ITH}_{2D} = 1 - \frac{1}{S_{total}} \sum_{i=1}^{K} \frac{S_{i,max}}{n_i} \qquad (1)$$
where $K$ represents the total number of clusters, $n_i$ denotes the number of disconnected topological components for cluster $i$, $S_{i,max}$ is the area of the largest connected component within that cluster, and $S_{total}$ is the total area of the tumor on the largest cross-sectional slice.
3D ITH score derivation.
The 3D ITH score extends this analysis to the volumetric domain to capture anisotropic texture variations and volumetric fragmentation along the z-axis. A sliding window was employed for voxel-wise feature extraction. To preserve the continuity of complex infiltrative patterns in 3D space, a 26-connectivity topology (encompassing face, edge, and corner adjacency) was utilized to identify connected voxel components. The volumetric score is calculated using Eq 2:
$$\mathrm{ITH}_{3D} = 1 - \frac{1}{V_{total}} \sum_{i=1}^{K} \frac{V_{i,max}}{m_i} \qquad (2)$$
where $m_i$ represents the count of distinct connected volumes for cluster $i$, $V_{i,max}$ denotes the volume of the largest connected component for that cluster, and $V_{total}$ represents the total tumor volume.
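The topology-aware quantification step can be sketched as follows. The score form assumed here is one minus the sum over clusters of (largest component size divided by component count), normalised by the total size. This follows the symbol definitions above and the ITH score of Li et al. [18] as understood here, so treat it as an assumption. Sizes are counted in pixels or voxels, and full adjacency is used in both modes (8-connectivity in 2D, 26-connectivity in 3D); the 2D pixel topology is not stated in the text, so that choice is also an assumption.

```python
from collections import deque
from itertools import product

def connected_components(coords):
    """Split integer coordinates into connected components.

    Two points are neighbours when they differ by at most 1 along every axis,
    which is 8-connectivity in 2D and 26-connectivity in 3D.
    """
    coords = set(coords)
    if not coords:
        return []
    ndim = len(next(iter(coords)))
    offsets = [d for d in product((-1, 0, 1), repeat=ndim) if any(d)]
    components = []
    while coords:
        seed = coords.pop()
        queue, comp = deque([seed]), [seed]
        while queue:  # breadth-first flood fill from the seed
            p = queue.popleft()
            for d in offsets:
                q = tuple(a + b for a, b in zip(p, d))
                if q in coords:
                    coords.remove(q)
                    queue.append(q)
                    comp.append(q)
        components.append(comp)
    return components

def ith_score(cluster_map):
    """Assumed form: 1 - (1/total) * sum_i (largest component of i / components of i).

    `cluster_map` maps each pixel or voxel coordinate to its cluster label.
    """
    total = len(cluster_map)
    by_cluster = {}
    for point, label in cluster_map.items():
        by_cluster.setdefault(label, []).append(point)
    acc = 0.0
    for points in by_cluster.values():
        comps = connected_components(points)
        acc += max(len(c) for c in comps) / len(comps)
    return 1.0 - acc / total

# A row of alternating labels is maximally fragmented for its size.
fragmented = {(0, c): c % 2 for c in range(6)}
# Two cleanly separated halves give a score of 0.
compact = {(r, c): int(c >= 2) for r in range(2) for c in range(4)}
```

With real image arrays, `scipy.ndimage.label` with an all-ones 3×3×3 structuring element would provide the same 26-connectivity labelling far more efficiently; the pure-Python version is used here only to keep the sketch self-contained.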
Clinicoradiologic feature acquisition
The evaluation of CR features was conducted independently by two senior radiologists, each with over 10 years of thoracic imaging experience, who were blinded to histopathological outcomes. To resolve any inter-observer discrepancies, a consensus was reached through consultation with a third radiologist. The CR profile comprised demographic variables (age, sex) and a comprehensive set of morphological descriptors derived from thin-section CT images. Nodule size was quantified as the maximum diameter on the largest axial cross-section. Nodule attenuation (CT density) was classified into three distinct categories: pure ground-glass nodules (pGGN), part-solid nodules (PSN), and solid nodules (SN). Furthermore, specific qualitative morphological signs were systematically assessed: lobulation was defined as a scalloped or wavy margin; spiculation as linear strands extending from the nodule into the lung parenchyma; vascular convergence as the convergence of vessel structures toward the tumor; the vacuole sign as bubble-like lucencies of <5 mm within the nodule; and pleural indentation as the retraction of the visceral pleura toward the lesion. Representative examples of these features are visually detailed in Fig 2.
The figure illustrates the key morphological descriptors assessed in this study.
Machine learning framework
Machine learning model development.
We implemented a comprehensive supervised learning pipeline utilizing the scikit-learn library [26]. Six advanced machine learning algorithms were constructed: Gradient Boosting Decision Tree (GBDT), Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), and Random Forest (RF). Hyperparameter optimization was rigorously conducted within the training cohort using a grid search strategy embedded within a 5-fold cross-validation (CV). For each parameter combination, the model was trained on four folds and validated on the remaining hold-out fold to generate out-of-fold (OOF) predictions. The optimal hyperparameter configuration for each classifier was identified by maximizing the mean area under the curve (AUC) calculated across the five OOF validation sets [27].
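The selection criterion described above, the mean AUC over five held-out folds, can be made concrete with a short sketch. It is written without external libraries so that each step is visible. The `make_fit` factory and the toy scorer in the usage lines are hypothetical placeholders for a real classifier and its hyperparameters, and the round-robin fold assignment is a simplification of a shuffled stratified split.

```python
def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k=5):
    """Assign sample indices to k folds, round-robin within each class."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def mean_cv_auc(fit, X, y, k=5):
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for held_out in stratified_folds(y, k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in held_out],
                        [model(X[i]) for i in held_out]))
    return sum(aucs) / len(aucs)

def grid_search(make_fit, grid, X, y, k=5):
    """Return the grid point with the highest mean cross-validated AUC."""
    scored = {param: mean_cv_auc(make_fit(param), X, y, k) for param in grid}
    return max(scored, key=scored.get), scored

# Toy usage: the "hyperparameter" picks which feature column is used as the score.
y = [0, 1] * 5
X = [(float(label), 0.0) for label in y]
make_fit = lambda j: (lambda X_train, y_train: (lambda x: x[j]))
best, scored = grid_search(make_fit, [0, 1], X, y)
```

In practice the same search is usually delegated to scikit-learn's `GridSearchCV` with `scoring="roc_auc"` and `cv=5`; whether the study used that exact class is not stated in the text.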
Performance metrics and model selection strategy.
To ensure clinical robustness, model performance was comprehensively evaluated using the AUC as the primary discriminative metric. Complementary metrics, including accuracy, precision, recall, and F1 score, were also calculated to assess classification balance. The “optimal model” was initially identified as the classifier achieving the highest AUC across the internal test set and independent external validation set, and subsequently validated through the incremental ablation study described below [22,28]. The overall design workflow of this study is illustrated in Fig 3.
The workflow encompasses three primary stages: (1) Acquisition of clinicoradiologic features and calculation of dual-scale (2D and 3D) intratumoral heterogeneity (ITH) scores; (2) Development and optimization of six machine learning classifiers (GBDT, AdaBoost, XGBoost, LightGBM, CatBoost, and RF); and (3) Model interpretation using SHAP values alongside rigorous performance evaluation across multicenter datasets.
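For reference, the complementary metrics named in the model selection strategy follow directly from the confusion-matrix counts. The sketch below uses the standard definitions, with label 1 taken to mean IAC (an assumption about label coding); the example labels are made up.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels: 4 true positives, 2 true negatives, 1 false positive, 1 false negative.
metrics = classification_metrics([1, 1, 1, 0, 0, 1, 0, 1],
                                 [1, 1, 0, 0, 1, 1, 0, 1])
```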
Model interpretability and incremental feature optimization.
To thoroughly evaluate model robustness and identify the most parsimonious predictive signature, we implemented a Shapley Additive Explanations (SHAP)-guided incremental feature ablation strategy across all six machine learning classifiers. First, we utilized TreeSHAP to quantify the contribution of each feature. The SHAP value $\phi_i$ represents the marginal contribution of feature $i$, calculated as:
$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
where $F$ is the set of all input features, $S$ represents a subset of features excluding feature $i$, and $f$ denotes the prediction function of the machine learning model.
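To make the definition concrete, the Shapley value can be computed by brute force for a handful of features. This is only an illustration of the formula: TreeSHAP obtains the same quantities efficiently for tree ensembles, whereas the enumeration below grows exponentially with the number of features. The value function `v` and the feature names are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset that excludes feature i.

    `value` maps a frozenset of feature names to a model output.
    """
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|F| - |S| - 1)! / |F|! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def v(s):
    """Toy model output: two additive effects plus one interaction between a and b."""
    return 2.0 * ("a" in s) + 1.0 * ("b" in s) + 1.0 * ("a" in s and "b" in s)

phi = shapley_values(["a", "b", "c"], v)
```

The interaction term is split equally between `a` and `b`, the unused feature `c` receives zero, and the values sum to the difference between the full-model output and the empty-set output.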
For each classifier, the constituent features were ranked in descending order based on their global importance score ($I_i$), defined as the mean absolute SHAP value across the cohort ($I_i = \frac{1}{N} \sum_{j=1}^{N} |\phi_i^{(j)}|$), where $N$ is the total number of samples and $\phi_i^{(j)}$ is the SHAP value of feature $i$ for sample $j$.
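The global importance score and the resulting ranking reduce to a few lines once per-sample SHAP values are available. In the sketch below the SHAP matrix is made up, and the `share` output (each feature's fraction of the summed importance) is one plausible way to express a percentage contribution, not necessarily the calculation used in the study.

```python
def global_importance(shap_matrix, names):
    """Rank features by mean absolute SHAP value.

    `shap_matrix[j][i]` is the SHAP value of feature i for sample j.
    Returns (ranked names, importance per feature, share of total importance).
    """
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[i]) for row in shap_matrix) / n_samples
        for i, name in enumerate(names)
    }
    total = sum(importance.values())
    ranked = sorted(importance, key=importance.get, reverse=True)
    share = {name: importance[name] / total for name in names}
    return ranked, importance, share

# Made-up SHAP values for two samples and three hypothetical features.
ranked, importance, share = global_importance(
    [[0.3, -0.1, 0.0],
     [-0.5, 0.1, 0.2]],
    ["feature_a", "feature_b", "feature_c"],
)
```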
Based on these rankings, we executed a sequential forward selection process for every classifier (Algorithm 2). This comprehensive analysis served two critical purposes. First, regarding model selection, we compared the AUC trajectories of all classifiers during the iterative addition of features; the classifier exhibiting consistently superior performance curves was identified as the best model. Second, for feature optimization, we pinpointed the “knee point” within this optimal model, defined as the minimal feature subset size l* at which the marginal performance gain (Δ) dropped below a negligible threshold (ε), thereby establishing the final stable feature signature, formally denoted as the optimal subset S*.
Algorithm 2 Multi-model SHAP-guided iterative feature elimination.
Require: set of classifiers (GBDT, CatBoost, etc.), dataset, full feature set $F$, feature number $N$
Ensure: best classifier, optimal subset $S^*$
Phase 1: Performance trajectory generation
  for each classifier $c$ do
    Train $c$ using all features $F$
    Rank features by global SHAP importance to obtain the ranked list $F_R$
    for $i$ = 1 to $N$ do
      Form subset $F_i$ from the top $i$ features of $F_R$
      Train classifier $c$ with $F_i$ using 5-fold CV
      Record performance metric $AUC_{c,i}$
    end for
  end for
Phase 2: Comparative selection and optimization
  Select the classifier with the consistently highest AUC trajectory
Phase 3: Knee point identification (for the selected classifier)
  Initialize $l$ ← 1 and threshold $\varepsilon$
  while $l < N$ do
    $\Delta$ ← $AUC_{l+1} - AUC_{l}$
    if $\Delta < \varepsilon$ then (marginal gain negligible)
      $l^*$ ← $l$
      break
    else
      $l$ ← $l$ + 1
    end if
  end while
  return the selected classifier and $S^*$ (the top $l^*$ features)
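The knee point search in Phase 3 can be sketched as a single pass over the AUC trajectory of the selected classifier. The gain Δ is taken here as the AUC difference between consecutive subset sizes, and the search falls back to the full feature set if no gain drops below the threshold; both are assumptions about details the text leaves open. The trajectory and threshold in the usage lines are invented numbers, not results from the study.

```python
def knee_point(auc_by_size, eps):
    """Return the smallest subset size l whose next-feature AUC gain is below eps.

    `auc_by_size[l - 1]` is the AUC obtained with the top-l ranked features.
    If every gain is at least eps, the full feature count is returned.
    """
    n = len(auc_by_size)
    for l in range(1, n):
        gain = auc_by_size[l] - auc_by_size[l - 1]  # gain from adding feature l+1
        if gain < eps:
            return l
    return n

# Hypothetical trajectory: gains shrink until the sixth feature adds almost nothing.
trajectory = [0.70, 0.78, 0.83, 0.86, 0.868, 0.869, 0.870]
l_star = knee_point(trajectory, eps=0.005)
optimal_subset_size = l_star  # the top l_star ranked features form S*
```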
Results
Patient characteristics
A total of 1,238 eligible patients were recruited from three medical centers. The 1,053 consecutively recruited patients from Centers 1 and 2 were randomly allocated via stratified sampling into training (n=737) and internal test (n=316) sets at a 7:3 ratio to ensure consistent class distributions. Additionally, 185 prospectively enrolled patients from Center 3 comprised an independent external validation set. IAC was diagnosed in 495 of 737 patients (67.2%) in the training set, 202 of 316 (63.9%) in the internal test set, and 120 of 185 (64.9%) in the external validation set.
The detailed characteristics of the enrolled patients are presented in Table 1. No statistically significant differences were found among the three groups, as all p-values were greater than 0.05, indicating that the groups were comparable.
2D/3D ITH scores and pathological invasiveness
The distributions of 2D and 3D ITH scores across the training, internal test, and external validation sets are presented in Fig 4. In the context of clinical stage I LUAD, lesions classified as IAC demonstrated significantly higher 2D and 3D ITH scores compared with the AIS/MIA group across all cohorts (p < 0.001). These findings indicate that the 2D and 3D ITH scores derived from preoperative CT images enable significant stratification between invasive and pre-/minimally invasive histological subtypes.
The violin plots display the distribution of heterogeneity scores across the training (a, d), internal test (b, e), and external validation sets (c, f). Statistical analysis reveals that invasive adenocarcinoma (IAC) lesions exhibit significantly higher 2D and 3D ITH scores than adenocarcinoma in situ/minimally invasive adenocarcinoma (AIS/MIA) lesions across all datasets (p < 0.001), supporting the discriminatory power of these quantitative metrics.
Performance evaluation and optimal model selection
The ROC curves for distinguishing IAC from AIS/MIA across the six machine learning classifiers (GBDT, AdaBoost, XGBoost, LightGBM, CatBoost, and RF) are illustrated in Fig 5. All models incorporated CR features combined with 2D and 3D ITH scores as input predictors.
The curves illustrate the diagnostic performance of six machine learning classifiers in distinguishing IAC from AIS/MIA within the internal test set (Left) and the independent external validation set (Right). The 2DITH-3DITH-CR CatBoost model demonstrates superior efficacy, achieving the highest Area Under the Curve (AUC) in both internal (0.867) and external (0.881) validations.
Strictly adhering to the selection criteria defined in the Methods, we identified the “optimal model” based on the highest AUC performance across both the internal test and independent external validation sets. Among the evaluated classifiers, CatBoost emerged as the top performer, achieving an AUC of 0.867 in the internal test set and 0.881 in the external validation set. As summarized in Table 2, CatBoost also demonstrated superior balance across complementary metrics, including accuracy, precision, recall, and F1 score. Consequently, we established this optimal configuration, which integrates 2D ITH scores, 3D ITH scores, and CR features, as the 2DITH-3DITH-CR CatBoost classifier for the identification of IAC.
Model interpretation and optimal signature identification.
The execution of the multi-model SHAP-guided iterative feature elimination framework yielded distinctive performance trajectories for the six machine learning classifiers, as visualized in Fig 6. During the trajectory generation stage, all classifiers exhibited rapid performance gains with the initial accumulation of high-importance features. In alignment with our comparative selection criteria, the CatBoost classifier (red line) demonstrated a consistently superior AUC trajectory across the majority of iterations, supporting its selection as the best classifier for clinical implementation. Furthermore, consistent with the knee point identification logic, a decisive inflection point was observed at a feature subset size of l* = 5 (marked by the vertical dashed line). At this threshold, the marginal gain in AUC (Δ) dropped below a negligible threshold (ε), indicating that the top five features, anchored by the topology-aware heterogeneity metrics, constitute the optimal subset (S*) that maximizes diagnostic accuracy while minimizing model complexity.
The line graph tracks the AUC evolution for Random Forest (RF), Gradient Boosting Decision Tree (GBDT), XGBoost, LightGBM, CatBoost, and AdaBoost as features are sequentially added based on their SHAP importance rankings. The vertical dashed line (Optimal Features) marks the decisive “knee point” at l* = 5 for the top-performing CatBoost model, identifying the minimal feature subset required to achieve maximal diagnostic accuracy while ensuring model parsimony.
Fig 7 provides a comprehensive interpretation of the 2DITH-3DITH-CR CatBoost classifier using SHAP analysis. Specifically, the topologically distinct 3D ITH score was identified as the most significant contributor, exhibiting the highest mean absolute SHAP value (mean ) and accounting for approximately 24.0% of the total predictive power. This finding underscores the superiority of volumetric topology-aware quantification over planar analysis. Corroborated by the iterative evaluation in Fig 6, the 3D ITH score proved to be the most robust feature for assessing LUAD invasiveness, followed by nodule size, CT density, 2D ITH score, and patient age.
The bar chart ranks predictors by their mean absolute SHAP values, identifying the 3D ITH score as the most influential feature (contributing approximately 24.0% to the model’s output), followed by nodule size and CT density. This hierarchy highlights the dominance of quantitative volumetric heterogeneity over qualitative morphological signs in predicting invasiveness.
Feature ablation analysis
To evaluate the incremental contribution of each feature domain to the predictive capability, we conducted a systematic feature ablation study on the optimal 2DITH-3DITH-CR CatBoost classifier. Fig 8 illustrates the ROC curves representing the degradation in diagnostic performance as features were selectively removed, with detailed metrics summarized in Table 3. The full integration model (2DITH-3DITH-CR CatBoost) consistently achieved the highest AUCs across both the internal test set (AUC = 0.867) and the external validation set (AUC = 0.881), confirming that the synergistic combination of all three feature domains yields the most robust predictive signature.
The bar charts compare the AUC, Accuracy, Precision, Recall, and F1 scores across different model configurations in both internal and external cohorts.
Dissecting the individual contributions revealed distinct roles for each modality. The 3D ITH score exhibited the strongest generalization capability, as models relying solely on 3D ITH features significantly outperformed those relying solely on 2D ITH features (External AUC: 0.844 vs. 0.647). Concurrently, the CR features served as a critical performance baseline; even without heterogeneity metrics, the CR CatBoost model maintained a respectable performance (External AUC = 0.833). Furthermore, the inclusion of 2D ITH scores provided a fine-tuning effect to the model. While the exclusion of 2D ITH (resulting in the 3DITH-CR model) led to only a marginal decline in AUC (0.881 to 0.875 in external validation), its integration in the full model contributed to maximizing the overall diagnostic stability.
Discussion
To our knowledge, this multicenter study is the first to establish volumetric ITH quantification—termed the 3D ITH score—as a pivotal biomarker for the preoperative prediction of pathological invasiveness in clinical stage I LUAD. By integrating this novel 3D metric with the established 2D ITH score and standard CR features, we developed a multimodal 2DITH-3DITH-CR CatBoost classifier. This model demonstrated superior discriminative performance and robustness, achieving an AUC of 0.867 in the internal test set and 0.881 in the independent external validation set. Our SHAP-guided interpretability analysis revealed that the 3D ITH score was the paramount predictive feature, followed by nodule size, CT density, 2D ITH score, and patient age. Furthermore, feature ablation experiments provided definitive evidence that the 3D ITH score serves as the primary driver of the model’s generalization capability, distinguishing it as a non-redundant cornerstone for invasiveness assessment.
Intratumoral heterogeneity fundamentally originates from the spatial disorganization of diverse cellular populations and the adaptive remodeling of the tumor microenvironment. Capturing this complex biological phenomenon radiologically necessitates the concurrent quantification of both local radiomics features, which reflect variations in cellular phenotypes, and global spatial distributions, which encode architectural disruptions. Since its initial proposal by Li et al. [18], the concept of the ITH score has been progressively applied to evaluate biological characteristics across various malignancies, including breast [29,30], gastric [31], and colorectal cancers [32]. In the context of LUAD, the 2D ITH score—derived solely from the largest cross-sectional slice—has shown efficacy in predicting prognosis [18], pathological subtype [19], and invasiveness [20–22]. Consistent with these precedents, our study observed that IAC lesions exhibited significantly higher 2D ITH scores compared to AIS/MIA (p < 0.001), confirming that planar heterogeneity metrics effectively reflect the invasive potential of LUAD.
However, planar analysis inherently neglects the anisotropic growth patterns of tumors. Building upon the multiperspective framework established by Zuo et al. [23], our volumetric 3D ITH score addresses this limitation through the voxel-level integration of texture attributes with global distribution patterns. By utilizing a 26-connectivity topology, the 3D ITH score preserves the spatial relationships between tumor subregions along all three anatomical axes. This allows for the detection of depth-wise invasive features, such as discontinuous micrometastatic foci and gradients of necrosis-viability transitions, which may be obscured in 2D views. This theoretical advantage was empirically validated by our feature ablation study, where the 3D ITH score not only exhibited the strongest individual generalization capability (AUC = 0.844) but also significantly outperformed the planar-only approach (AUC = 0.647). Crucially, while the exclusion of 2D features yielded minimal impact, the removal of volumetric features precipitated a sharp decline in predictive accuracy, underscoring the indispensability of 3D heterogeneity in capturing the full spectrum of tumor invasiveness. Notably, the observation that our model achieved slightly higher performance in the external validation set (AUC=0.881) compared to the internal test set (AUC=0.867) addresses potential concerns regarding overfitting. Rather than indicating model bias, this superior generalization likely reflects the intrinsic stability of the 3D ITH metric. Unlike conventional radiomics that can be sensitive to scanner-specific noise, the topology-aware ITH score quantifies relative spatial connectivity ratios, making it inherently robust against the variability of CT acquisition protocols across different medical centers.
Additionally, our SHAP analysis offers critical insights into the clinical evaluation of LUAD. Among CR features, quantitative metrics—specifically nodule size and CT density—dominated the prediction of invasiveness, whereas qualitative morphological descriptors (e.g., spiculation and lobulation signs) exhibited negligible contributions. This finding challenges traditional diagnostic paradigms but aligns with emerging evidence. Meta-analyses by Yang et al. [33] and He et al. [34] have demonstrated that objective CT density metrics outperform subjective morphological signs in differentiating invasive lesions. Similarly, studies by Fu et al. [35] and Zuo et al. [36] identified nodule size as the primary independent risk factor for invasiveness. Collectively, these results underscore the limitations of subjective morphological interpretation in the era of precision oncology and support a shift towards quantitative, reproducible imaging biomarkers.
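For intuition about how SHAP-style attributions rank features, the sketch below computes exact Shapley values by brute force for a toy scoring function. The scoring function, its coefficients, and the feature values are hypothetical and are not derived from the study data; the SHAP library uses far more efficient algorithms for tree ensembles, so this only illustrates the underlying definition.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions of predict(x) relative to predict(baseline).

    Features that have not yet joined the coalition keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            value = predict(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_score(f):
    """Hypothetical invasiveness score; coefficients are made up for illustration."""
    return (0.04 * f["size_mm"]
            + 0.001 * (f["density_hu"] + 700)
            + 0.02 * f["spiculation"])


reference = {"size_mm": 8, "density_hu": -700, "spiculation": 0}
nodule = {"size_mm": 18, "density_hu": -400, "spiculation": 1}

phi = shapley_values(toy_score, nodule, reference)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score of the nodule and the score of the reference case, which is the property that lets per-feature contributions be compared on a common scale.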
Methodologically, the superior performance of the CatBoost classifier over other machine learning algorithms (e.g., RF, SVM, LightGBM) can be attributed to algorithmic advantages well suited to heterogeneous biomedical data. Unlike traditional gradient boosting methods, CatBoost employs an “ordered boosting” scheme that mitigates prediction shift and target leakage, thereby reducing the risk of overfitting on small-to-medium-sized medical datasets [37,38]. This robustness likely contributed to the high stability observed in our multicenter validation. Furthermore, its native capability to process categorical features (e.g., tumor location, sex) without requiring one-hot encoding preserves the original data structure, allowing for more efficient modeling of non-linear interactions between complex radiomic signatures and clinical variables.
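The target-leakage point can be illustrated with ordered target statistics, the categorical-encoding counterpart to ordered boosting described in the CatBoost literature. The sketch below is a simplified, standalone illustration and not CatBoost's internal code; the function names, the prior, the smoothing weight, and the toy data are assumptions made for the example.

```python
import random


def greedy_target_statistic(categories, targets):
    """Naive mean-target encoding: each row's own label leaks into its feature."""
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    return [sums[c] / counts[c] for c in categories]


def ordered_target_statistic(categories, targets, prior=0.5, weight=1.0, seed=0):
    """Encode each row using only rows that precede it in a random permutation."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        # Smoothed mean of earlier labels; the row's own label is not used.
        encoded[i] = (s + weight * prior) / (n + weight)
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded


# Hypothetical tumor locations and invasiveness labels (1 = invasive).
locations = ["upper", "upper", "lower", "middle"]
invasive = [1, 0, 1, 1]

greedy = greedy_target_statistic(locations, invasive)
ordered = ordered_target_statistic(locations, invasive)

# "middle" occurs once: the greedy encoding equals its own label (1.0),
# whereas the ordered encoding falls back to the prior (0.5).
print(greedy[3], ordered[3])
```

In the `catboost` Python package, categorical columns are passed through the `cat_features` argument and encoded internally, so this step does not need to be written by hand.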
Despite these advancements, this study has limitations. First, the retrospective inclusion of patients with confirmed pathology introduces potential selection bias towards surgically managed cases. Future prospective studies enrolling all indeterminate nodules, including those under active surveillance, are necessary to confirm real-world applicability. Second, we did not perform a direct spatial registration between the ITH subregions and histopathological specimens. Consequently, while the 3D ITH score correlates strongly with invasiveness, the specific biological identity of these topological clusters (e.g., distinguishing necrosis from high cellularity) remains to be validated. Future research integrating spatial transcriptomics and multiplex immunohistochemistry is required to map these heterogeneity patterns against specific tissue microstructures, establishing direct radiopathologic correlates. Finally, while validated in LUAD, the generalizability of this volumetric scoring methodology to other solid tumors warrants further investigation. Clinically, this approach is highly cost-effective as it utilizes standard preoperative CT images without additional expense. However, routine implementation faces barriers, specifically the need for software integration into radiological workflows and prospective standardization of diagnostic cut-offs.
Conclusion
This study establishes volumetric ITH quantification as a robust biomarker for preoperatively predicting pathological invasiveness in clinical stage I LUAD. The proposed 2DITH-3DITH-CR CatBoost classifier demonstrated superior diagnostic performance and generalizability across multicenter cohorts compared to traditional methods. By capturing comprehensive volumetric spatial patterns, this approach facilitates precise risk stratification, marking a significant advancement from descriptive radiology to quantitative precision oncology for optimizing early-stage LUAD management.
Supporting information
S1 Appendix. CT acquisition protocols.
Detailed specifications of the CT acquisition protocols used at the three participating medical centers.
https://doi.org/10.1371/journal.pdig.0001246.s001
(PDF)
Acknowledgments
The authors thank the staff at the participating medical centers for their assistance with data collection. Funding, data availability, and author contributions are provided in the relevant sections of the submission system.
References
- 1. Zuo Z, Zhang G, Song P, Yang J, Li S, Zhong Z, et al. Survival nomogram for stage IB non-small-cell lung cancer patients, based on the SEER database and an external validation cohort. Ann Surg Oncol. 2021;28(7):3941–50. pmid:33249521
- 2. Zuo Z-C, Wang L, Peng K, Yang J, Li X, Zhong Z, et al. Development and validation of a nomogram for predicting the 1-, 3-, and 5-year survival in patients with acinar-predominant lung adenocarcinoma. Curr Med Sci. 2022;42(6):1178–85. pmid:36542324
- 3. Rami-Porta R, Nishimura KK, Giroux DJ, Detterbeck F, Cardillo G, Edwards JG, et al. The international association for the study of lung cancer lung cancer staging project: proposals for revision of the TNM stage groups in the forthcoming (ninth) edition of the TNM classification for lung cancer. J Thorac Oncol. 2024;19(7):1007–27. pmid:38447919
- 4. Yotsukura M, Asamura H, Motoi N, Kashima J, Yoshida Y, Nakagawa K, et al. Long-term prognosis of patients with resected adenocarcinoma in situ and minimally invasive adenocarcinoma of the lung. J Thorac Oncol. 2021;16(8):1312–20. pmid:33915249
- 5. Watanabe Y, Hattori A, Nojiri S, Matsunaga T, Takamochi K, Oh S, et al. Clinical impact of a small component of ground-glass opacity in solid-dominant clinical stage IA non-small cell lung cancer. J Thorac Cardiovasc Surg. 2022;163(3):791-801.e4. pmid:33516459
- 6. Godfrey CM, Marmor HN, Lambright ES, Grogan EL. Minimally invasive and sublobar resections for lung cancer. Surg Clin North Am. 2022;102(3):483–92. pmid:35671768
- 7. Saji H, Okada M, Tsuboi M, Nakajima R, Suzuki K, Aokage K, et al. Segmentectomy versus lobectomy in small-sized peripheral non-small-cell lung cancer (JCOG0802/WJOG4607L): a multicentre, open-label, phase 3, randomised, controlled, non-inferiority trial. Lancet. 2022;399(10335):1607–17. pmid:35461558
- 8. Righi I, Maiorca S, Diotti C, Bonitta G, Mendogni P, Tosi D, et al. Oncological outcomes of segmentectomy versus lobectomy in clinical stage I non-small cell lung cancer up to two centimeters: systematic review and meta-analysis. Life (Basel). 2023;13(4):947. pmid:37109476
- 9. Altorki N, Wang X, Kozono D, Watt C, Landrenau R, Wigle D, et al. Lobar or sublobar resection for peripheral stage IA non-small-cell lung cancer. N Engl J Med. 2023;388(6):489–98. pmid:36780674
- 10. Martínez-Ruiz C, Black JRM, Puttick C, Hill MS, Demeulemeester J, Larose Cadieux E, et al. Genomic-transcriptomic evolution in lung cancer and metastasis. Nature. 2023;616(7957):543–52. pmid:37046093
- 11. Prosper AE, Kammer MN, Maldonado F, Aberle DR, Hsu W. Expanding role of advanced image analysis in CT-detected indeterminate pulmonary nodules and early lung cancer characterization. Radiology. 2023;309(1):e222904. pmid:37815447
- 12. Hoye J, Solomon J, Sauer TJ, Robins M, Samei E. Systematic analysis of bias and variability of morphologic features for lung lesions in computed tomography. J Med Imaging (Bellingham). 2019;6(1):013504. pmid:30944842
- 13. Zuo Z, Zeng W, Peng K, Mao Y, Wu Y, Zhou Y, et al. Development of a novel combined nomogram integrating deep-learning-assisted CT texture and clinical-radiological features to predict the invasiveness of clinical stage IA part-solid lung adenocarcinoma: a multicentre study. Clin Radiol. 2023;78(10):e698–706. pmid:37487842
- 14. Zuo Z, Li Y, Peng K, Li X, Tan Q, Mo Y, et al. CT texture analysis-based nomogram for the preoperative prediction of visceral pleural invasion in cT1N0M0 lung adenocarcinoma: an external validation cohort study. Clin Radiol. 2022;77(3):e215–21. pmid:34916048
- 15. Zuo Z, Zhang G, Lin S, Xue Q, Qi W, Zhang W, et al. Radiomics nomogram based on optimal volume of interest derived from high-resolution CT for preoperative prediction of IASLC grading in clinical IA lung adenocarcinomas: a multi-center, large-population study. Technol Cancer Res Treat. 2024;23:15330338241300734. pmid:39569528
- 16. Shang Y, Zeng Y, Luo S, Wang Y, Yao J, Li M, et al. Habitat imaging with tumoral and peritumoral radiomics for prediction of lung adenocarcinoma invasiveness on preoperative chest CT: a multicenter study. AJR Am J Roentgenol. 2024;223(4):e2431675. pmid:39140631
- 17. Zuo Z, Deng J, Ge W, Zhou Y, Liu H, Zhang W, et al. Quantifying intratumoral heterogeneity within sub-regions to predict high-grade patterns in clinical stage I solid lung adenocarcinoma. BMC Cancer. 2025;25(1):51. pmid:39789523
- 18. Li J, Qiu Z, Zhang C, Chen S, Wang M, Meng Q, et al. ITHscore: comprehensive quantification of intra-tumor heterogeneity in NSCLC by multi-scale radiomic features. Eur Radiol. 2023;33(2):893–903. pmid:36001124
- 19. Zhang J, Sha J, Liu W, Zhou Y, Liu H, Zuo Z. Quantification of intratumoral heterogeneity: distinguishing histological subtypes in clinical T1 stage lung adenocarcinoma presenting as pure ground-glass nodules on computed tomography. Acad Radiol. 2024;31(10):4244–55. pmid:38627129
- 20. Zheng H, Chen W, Qi W, Liu H, Zuo Z. Enhancing the prediction of the invasiveness of pulmonary adenocarcinomas presenting as pure ground-glass nodules: integrating intratumor heterogeneity score with clinical-radiological features via machine learning in a multicenter study. Digit Health. 2024;10:20552076241289181. pmid:39381817
- 21. Qi H, Zuo Z, Lin S, Chen Y, Li H, Hu D, et al. Assessment of intratumor heterogeneity for preoperatively predicting the invasiveness of pulmonary adenocarcinomas manifesting as pure ground-glass nodules. Quant Imaging Med Surg. 2025;15(1):272–86. pmid:39839051
- 22. Zuo Z, Zeng Y, Deng J, Lin S, Qi W, Fan X, et al. Intratumoral heterogeneity score enhances invasiveness prediction in pulmonary ground-glass nodules via stacking ensemble machine learning. Insights Imaging. 2025;16(1):209. pmid:41006794
- 23. Zuo Z, Fan X, Zeng Y, Qi W, Liu W, Zhang J. Multiperspective tumor heterogeneity metrics for preoperative prediction of IASLC grading in clinical stage IA lung adenocarcinomas: a multicenter study. Comput Methods Programs Biomed. 2025:109137.
- 24. Kim S, Ahn Y, Lee GD, Choi S, Kim HR, Kim Y-H, et al. Validation of the 9th tumor, node, and metastasis staging system for patients with surgically resected non-small cell lung cancer. Eur J Cancer. 2025;222:115436. pmid:40252632
- 25. Ehni H-J, Wiesing U. The declaration of Helsinki in bioethics literature since the last revision in 2013. Bioethics. 2024;38(4):335–43. pmid:38367022
- 26. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
- 27. Li Y, Ding J, Wu K, Qi W, Lin S, Chen G, et al. Ensemble machine learning classifiers combining CT radiomics and clinical-radiological features for preoperative prediction of pathological invasiveness in lung adenocarcinoma presenting as part-solid nodules: a multicenter retrospective study. Technol Cancer Res Treat. 2025;24:15330338251351365. pmid:40525253
- 28. Obuchowski NA, Bullen JA. Receiver operating characteristic (ROC) curves: review of methods with applications in diagnostic medicine. Phys Med Biol. 2018;63(7):07TR01. pmid:29512515
- 29. Zuo Z, Feng Y, Deng J, Yang X, Zeng Y, Fan X. Dynamic contrast-enhanced MRI-derived intratumoral heterogeneity quantification score: improving lymphovascular invasion and invasive breast cancer recurrence-free survival predictions. Radiography (Lond). 2026;32(1):103204. pmid:41566494
- 30. Huang Y, Wang X, Cao Y, Lan X, Hu X, Mou F, et al. Nomogram for predicting neoadjuvant chemotherapy response in breast cancer using MRI-based intratumoral heterogeneity quantification. Radiology. 2025;315(1):e241805. pmid:40232145
- 31. Li J, Li Z, Wang Y, Li Y, Zhang J, Li Z, et al. CT radiomics-based intratumoral and intertumoral heterogeneity indicators for prognosis prediction in gastric cancer patients receiving neoadjuvant chemotherapy. Eur Radiol. 2025;35(8):4448–60. pmid:39953151
- 32. Zhou Y, Zuo Z, Zhao J, Tan Y, Deng J, Wei X. Development and validation of time-to-event machine learning models for predicting disease-free survival in patients with locally advanced colorectal cancer: a multicenter cohort study. Ann Surg Oncol. 2025:1–13.
- 33. Yang Y, Xu J, Wang W, Zhao J, Yang Y, Wang B, et al. Meta-analysis of the correlation between CT-based features and invasive properties of pure ground-glass nodules. Asian J Surg. 2023;46(9):3405–16. pmid:37328382
- 34. He S, Chen C, Wang Z, Yu X, Liu S, Huang Z, et al. The use of the mean computed-tomography value to predict the invasiveness of ground-glass nodules: a meta-analysis. Asian J Surg. 2023;46(2):677–82. pmid:35864044
- 35. Fu F, Zhang Y, Wang S, Li Y, Wang Z, Hu H, et al. Computed tomography density is not associated with pathological tumor invasion for pure ground-glass nodules. J Thorac Cardiovasc Surg. 2021;162(2):451-459.e3. pmid:32711984
- 36. Zuo Z, Wang P, Zeng W, Qi W, Zhang W. Measuring pure ground-glass nodules on computed tomography: assessing agreement between a commercially available deep learning algorithm and radiologists’ readings. Acta Radiol. 2023;64(4):1422–30. pmid:36317301
- 37. Kulkarni CS. Advancing gradient boosting: a comprehensive evaluation of the CatBoost algorithm for predictive modeling. JAIMLD. 2022;1(5):54–7.
- 38. Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J Big Data. 2020;7(1):94. pmid:33169094