Comparison of four models predicting the malignancy of pulmonary nodules: A single-center study of Korean adults

Objective Four commonly used clinical models for predicting the probability of malignancy in pulmonary nodules were compared. Three of the models (Mayo Clinic, Veterans Affairs [VA], and Brock University) are based on clinical and computed tomography (CT) characteristics, whereas the fourth (Herder) additionally includes the 18F-fluorodeoxyglucose (FDG) uptake value from positron emission tomography (PET). This study aimed to compare the predictive power of these four models in a population drawn from a single center in a tuberculosis-endemic area of Korea.
Methods A retrospective analysis of 242 pathologically confirmed nodules (4–30 mm in diameter) in 242 patients from January 2015 to December 2015 was performed. The area under the receiver operating characteristic curve (AUC) was used to assess the predictive performance with respect to malignancy.
Results Of 242 nodules, 187 (77.2%) were malignant and 55 (22.8%) were benign, with tuberculosis granuloma being the most common type of benign nodule (23/55). PET was performed for 227 nodules (93.8%). The Mayo, VA, and Brock models showed similar predictive performance for malignant nodules (AUC: 0.6145, 0.6042, and 0.6820, respectively). The performance of the Herder model (AUC: 0.5567) was not significantly different from that of the Mayo model (vs. Herder, p = 0.576) or the VA model (vs. Herder, p = 0.999), and there were no significant differences among these three models in determining the probability of malignancy of pulmonary nodules. However, the Herder model showed a significantly lower ability to predict malignancy than the Brock model (adjusted p = 0.0132).
Conclusions In our study, the Herder model including the 18F-FDG uptake value did not perform better than the other models in predicting malignant nodules, suggesting the limited utility of adding PET/CT data to models predicting malignancy in populations within endemic areas for benign inflammatory nodules, such as tuberculosis.


Introduction
Pulmonary nodules are being detected with increasing frequency because of the increased use of chest computed tomography (CT) [1,2]. Recent low-dose chest CT screening trials showed a beneficial effect on survival for individuals at increased risk of lung cancer [3-6]. However, the management of pulmonary nodules incidentally detected on CT is a pressing clinical concern because accurately predicting malignant nodules is not straightforward.
Recently, several prediction models using clinical and radiological values have been developed to help physicians distinguish between benign and malignant nodules [7]. The classical prediction models (Mayo Clinic [8], Veterans Affairs (VA) [9], and Brock University [10]) include only clinical values and radiological characteristics on CT, while a fourth model proposed by Herder et al. [11] additionally includes the 18F-fluorodeoxyglucose (FDG) uptake value on positron emission tomography/computed tomography (PET/CT). The Mayo model [8], developed in 1997, includes age, smoking history, cancer history, nodule diameter, location of the nodule (especially the upper lobe), and spiculation. The VA model [9], developed in 2007, includes patient age, smoking history, and nodule diameter. The Brock model [10], developed in 2013, includes age, family history of lung cancer, sex, nodule size, emphysema, nodule count, location of the nodule in the upper lobe, spiculation, and part-solid nodule type. The Herder model was developed in 2005 at a single center, by analyzing data from 106 patients who underwent 18F-FDG PET, to refine the earlier Mayo model [11], and it has been reported to predict malignancy better than the other models [7,12,13].
However, because 18F-FDG is a marker of glucose metabolism and is not a specific tracer for malignancy, inflammatory lung lesions, such as tuberculous granuloma and parasitic infection, can mimic malignancy and yield false-positive results on PET/CT scans. Moreover, given that the tuberculosis burden in Korea is in the intermediate range (70-90 per 100,000 persons per year) [14], the performance of the various malignancy prediction models for pulmonary nodules needs to be assessed in Korean adults. Therefore, in this study, we aimed to compare the predictive power of these four models in patients with biopsy-proven pulmonary nodules at a single center in Korea.

Study population
We retrospectively reviewed the medical records of 429 consecutive adult patients, with pulmonary nodules 4-30 mm in size, who underwent histopathologic confirmation at the Samsung Medical Center (a 1,979-bed referral hospital in Seoul, South Korea) between January 1, 2015 and December 31, 2015. Of these patients, 20 with more than five nodules, 70 with pure ground glass nodules, and 97 with lymphadenopathy or suspected metastatic disease on chest CT were excluded. As a result, 242 patients were included in the study, and 242 nodules that were confirmed by surgical resection or percutaneous needle aspiration were analyzed; of these, 187 (77.2%) were malignant, and the remaining 55 (22.8%) were benign (Fig 1). PET/CT was performed in 227 patients (93.8%).
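The cohort flow described above can be checked with a few lines of arithmetic. This is only a sketch: the counts are copied from the paragraph, and the variable names are illustrative. It assumes that the three exclusion groups do not overlap, which the reported totals are consistent with.

```python
# Counts copied from the study population paragraph.
screened = 429
exclusions = {
    "more than five nodules": 20,
    "pure ground glass nodules": 70,
    "lymphadenopathy or suspected metastatic disease": 97,
}

# Assumption: each excluded patient falls into exactly one exclusion group.
included = screened - sum(exclusions.values())

malignant, benign = 187, 55
with_pet = 227

print(f"included patients: {included}")
print(f"malignant + benign nodules: {malignant + benign}")
print(f"patients without PET/CT: {included - with_pet}")
```

The first two printed totals should both equal the 242 patients and nodules reported in the text.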
The Institutional Review Board of Samsung Medical Center approved this study and permitted the review and publication of patient records (IRB No.2017-04-002). The requirement for informed consent of individual patients was waived given the retrospective nature of the study.

Statistical analysis
The data are reported as numbers (%) for categorical variables and as medians (interquartile range, IQR) for continuous variables. We compared categorical variables using the chi-squared test or Fisher's exact test, and continuous variables using the Mann-Whitney U test. All tests were two-sided, and a P-value < 0.05 was deemed to indicate statistical significance. A receiver operating characteristic (ROC) curve was constructed for each model, and the area under the ROC curve (AUC) was calculated. To compare the AUC values between two models, the nonparametric approach of DeLong et al. [15] was used, and Bonferroni correction was applied to adjust for multiple comparisons. All statistical analyses were performed using SPSS (ver. 23.0; IBM Corp., Armonk, NY, USA) and R software (ver. 3.0.3; R Development Core Team, Vienna, Austria).
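The AUC comparison described above can be illustrated with a minimal sketch in Python. It is not the authors' code (the analyses were run in SPSS and R); it is a plain-Python rendering of the nonparametric DeLong comparison of two correlated AUCs followed by a Bonferroni adjustment. The function names and the label coding (1 = malignant, 0 = benign) are assumptions made for illustration, and the article does not state how many comparisons the correction covered.

```python
import math


def _psi(pos_score, neg_score):
    # Kernel of the Mann-Whitney statistic: 1 if the malignant case scores
    # higher, 0.5 for a tie, 0 otherwise.
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def placements(scores, labels):
    """Per-case placement values (V10 for malignant, V01 for benign cases)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]
    return v10, v01


def auc(scores, labels):
    """Empirical AUC, equal to the mean placement of the malignant cases."""
    v10, _ = placements(scores, labels)
    return sum(v10) / len(v10)


def _cov(a, b):
    # Sample covariance with the n - 1 denominator.
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided z-test for the difference of two AUCs on the same cases."""
    a10, a01 = placements(scores_a, labels)
    b10, b01 = placements(scores_b, labels)
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / len(a10)
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / len(a01))
    diff = auc(scores_a, labels) - auc(scores_b, labels)
    if var <= 0:
        # Degenerate case, for example two identical score vectors.
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    z = diff / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

With four models, correcting all pairwise comparisons would mean passing six raw p-values to `bonferroni`; whether the reported adjusted p-values were derived this way is not stated in the text.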

Baseline characteristics of the study patients and nodules
The baseline characteristics of the study patients and nodules are summarized in Table 1. The median age of the 242 patients was 61.0 years (IQR: 54.0-67.0 years); 112 (46%) patients were male and 148 (61%) patients were never smokers. The mean number of nodules per patient was 1.4 among patients with benign nodules and 1.5 among those with malignant nodules.

Receiver operating characteristic (ROC) curves for the four risk prediction models
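The area under the ROC curve values (AUC, 95% CI) for each model were as follows (Fig 2): Mayo, 0.6145 (0.5283-0.7008); VA, 0.6042 (0.5162-0.6922); Brock, 0.6820 (0.6009-0.7630); and Herder, 0.5567 (0.4763-0.6371).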

Decision analysis for the four risk prediction models
We also evaluated the malignancy probability thresholds that inform clinical decision-making in the American College of Chest Physicians (ACCP) [16] and British Thoracic Society (BTS) [17] guidelines (Table 2): for the ACCP guidelines, observe (< 5%), undetermined (5-65%), and surgery (> 65%); for the BTS guidelines, observe (< 10%), undetermined (10-70%), and surgery (> 70%). As shown in Table 2, the false-positive rate was highest with the Herder model (up to 6%) under both the ACCP and BTS thresholds, while the true-negative rate with the Brock model was up to 4-5% under the ACCP and BTS thresholds. The decision curves for the four models also showed that the Brock model had the highest power for discriminating malignant nodules, while the Herder model showed the lowest discriminatory power (Fig 4).
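The threshold-based decision categories above can be sketched as a small lookup, together with the net-benefit quantity that decision curves conventionally plot. The cut-offs are those quoted in the text; the function names are illustrative, and the net-benefit formula (true positives minus false positives weighted by the odds of the threshold probability) is the standard definition, assumed here because the article does not describe how its decision curves were computed.

```python
# Guideline cut-offs in percent probability of malignancy, as quoted above:
# below the first value -> observe, above the second -> surgery.
THRESHOLDS = {"ACCP": (5.0, 65.0), "BTS": (10.0, 70.0)}


def recommend(prob_percent, guideline):
    """Map a predicted malignancy probability (%) to a decision category."""
    low, high = THRESHOLDS[guideline]
    if prob_percent < low:
        return "observe"
    if prob_percent > high:
        return "surgery"
    return "undetermined"


def net_benefit(probs_percent, labels, threshold_percent):
    """Net benefit of intervening on every nodule at or above the threshold.

    labels: 1 = malignant, 0 = benign (an assumed coding).
    """
    pt = threshold_percent / 100.0
    n = len(labels)
    tp = sum(1 for p, y in zip(probs_percent, labels)
             if p >= threshold_percent and y == 1)
    fp = sum(1 for p, y in zip(probs_percent, labels)
             if p >= threshold_percent and y == 0)
    return tp / n - (fp / n) * pt / (1.0 - pt)
```

For example, a nodule with a predicted probability of 7% would be "undetermined" under the ACCP thresholds but "observe" under the BTS thresholds, which shows why the two guidelines can give different results for the same model.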

Histopathological results of nodules
Histopathological results of the pulmonary nodules of the study patients are shown in Table 3.

Discussion
Malignancy prediction models for pulmonary nodules should be applied with due consideration of characteristics that may vary according to geographic region. The majority of malignancy prediction models were developed in Western populations, and as such they are limited in their applicability to Asian populations. No universally applicable models for determining malignancy in pulmonary nodules are available, and any such model should be developed carefully and compared against the various existing risk models. To our knowledge, this is the first study comparing the four models used to predict the probability of malignancy in the Korean population, especially in a tuberculosis-endemic area, to show how the various prediction models could be applied to local populations. Our results indicated that the Herder prediction model including the 18F-FDG uptake value was not better than the other models in predicting malignant nodules, suggesting the limited utility of considering PET/CT in the malignancy prediction process in populations within endemic areas for benign inflammatory nodules, such as tuberculosis.
One explanation for our results is that PET/CT can yield false-positive findings in benign conditions, including infection, inflammation (soft tissue trauma, collagen diseases), and granulomatous diseases (sarcoidosis, tuberculosis) [18-21]. In our study, there was no difference in SUV max on PET/CT between benign and malignant nodules. Moreover, half (24/48) of the benign nodules showed moderate or intense SUV values on PET/CT. Thus, before assessing malignancy risk using a prediction model, physicians should consider whether the model is useful for their patient population, because there is geographical variation in the prevalence of granulomatous disease [7]. Physicians should also be aware of the possibility of false-positive findings when applying malignancy prediction models that include PET/CT findings.
The differences and similarities of the four prediction models are as follows. In all models, the risk of malignancy increases with age and nodule size. In the Mayo and VA models [8,9], smoking history is included as a predictor of malignancy; in particular, the VA model includes the time since quitting smoking as a predictor [21]. The Mayo and Brock models [8,10] include the location of nodules and spiculation as predictors of malignancy, but the VA model [9] does not. The VA and Brock models [9,10] exclude cancer history, whereas the Mayo and Herder models [8,11] include a history of extrathoracic cancer diagnosed more than 5 years previously. The Herder model extends the Mayo model by adding PET/CT characteristics [11]. Recent studies have reported that the Herder model incorporating FDG avidity has the highest accuracy in predicting the malignancy of pulmonary nodules, but this remains a subject of debate [2,7].
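To make the structure of these models concrete, the following sketch evaluates a Mayo-type logistic model. The coefficients are not given in this article; they are the values commonly cited for the Mayo Clinic model [8] and should be verified against the original publication before any use. The function and parameter names are illustrative.

```python
import math

# Assumption: coefficients as commonly cited for the Mayo Clinic model [8];
# they do not appear in this article and must be checked against the source.
MAYO_INTERCEPT = -6.8272
MAYO_COEFFS = {
    "age_years": 0.0391,
    "ever_smoker": 0.7917,
    "prior_extrathoracic_cancer": 1.3388,  # diagnosed more than 5 years earlier
    "diameter_mm": 0.1274,
    "spiculated": 1.0407,
    "upper_lobe": 0.7838,
}


def mayo_probability(age_years, ever_smoker, prior_extrathoracic_cancer,
                     diameter_mm, spiculated, upper_lobe):
    """Return the predicted probability of malignancy (0 to 1)."""
    x = (MAYO_INTERCEPT
         + MAYO_COEFFS["age_years"] * age_years
         + MAYO_COEFFS["ever_smoker"] * int(ever_smoker)
         + MAYO_COEFFS["prior_extrathoracic_cancer"] * int(prior_extrathoracic_cancer)
         + MAYO_COEFFS["diameter_mm"] * diameter_mm
         + MAYO_COEFFS["spiculated"] * int(spiculated)
         + MAYO_COEFFS["upper_lobe"] * int(upper_lobe))
    # Logistic link: probability = e^x / (1 + e^x).
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical patient: 60 years old, never smoker, 10 mm smooth nodule
# outside the upper lobe, no prior cancer.
baseline = mayo_probability(60, False, False, 10, False, False)
spiculated = mayo_probability(60, False, False, 10, True, False)
```

In this form, every predictor shifts the log-odds additively, so a spiculated nodule always receives a higher probability than an otherwise identical smooth one.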
In our study, the AUCs for all models were lower than those in previous studies. For example, although the AUC of the Brock model in our study (0.682; 95% CI: 0.6009-0.7630) was higher than that of the other models, it was lower than that in the original article by McWilliams et al. (AUC: 0.96; 95% CI: 0.93-0.98) [10]. The AUCs of the other models were also lower than those in the original articles. The AUC of the Mayo model in our study (AUC: 0.6145; 95% CI: 0.5283-0.7008) was lower than that reported in the original article by Swensen et al. (AUC: 0.833; 95% CI: 0.811-0.855) [8], and the AUC of the VA model (AUC: 0.6042; 95% CI: 0.5162-0.6922) was also lower than that reported in the original article by Gould et al. (AUC: 0.78; 95% CI: 0.73-0.83) [9]. The Herder model, applied to the patients who underwent PET/CT, had the lowest accuracy in predicting the malignancy of pulmonary nodules in our patients (AUC: 0.5567; 95% CI: 0.4763-0.6371), and showed a worse performance than that seen in the original report (AUC: 0.92; 95% CI: 0.87-0.97) [11]. Several factors could explain this result. First, the lower AUC values in our study might be due to differences in the methods used to enroll the patients. In the present study, patients with pulmonary nodules confirmed by biopsy were retrospectively identified, a strategy that was different from that used in other studies. Second, compared with Western populations, the incidence of lung cancer is higher in non-smoking, middle-aged Asian women. Third, our results were influenced by the prevalence of different types of benign nodules. In our study, 77% of lesions were malignant nodules, and 23% were benign nodules; of the benign nodules, granulomas accounted for 51% of the cases and showed high uptake on PET/CT.
Our study population may reflect a degree of selection bias: as our hospital is a tertiary referral center and most patients were referred with suspected malignant nodules, the rate of malignancy among our study population was relatively higher than has been seen in previous studies. Moreover, we retrospectively analyzed patients with biopsy-proven nodules for whom surgery was strongly recommended, and therefore some patients whose nodules had not yet been surgically confirmed may have been excluded. Additionally, patients presenting with very small nodules may have undergone observation without further diagnostic evaluation, such as PET-CT. All of these factors may have had a bearing on our results.
As we also included nodules that had been confirmed by percutaneous needle aspiration, and not only by surgical resection, the median nodule size and incidence of malignancy were relatively high, possibly contributing further to selection bias. Finally, we used the categorized, semiquantitative SUV max values applied by Al-Ameri et al., which differs from the approach of Herder et al. [7,11].
In conclusion, we retrospectively evaluated four models for predicting malignancy in patients with biopsy-proven lung nodules. The highest AUC value was seen for the Brock model, but there was no significant difference between this value and those of the Mayo and VA models. However, the Brock model showed significantly higher accuracy for predicting malignancy than the Herder model, which included the 18F-FDG uptake value, indicating the limited utility of PET/CT for predicting malignancy. When using prediction models to screen for the risk of malignancy of pulmonary nodules, physicians should consider the effects of regional differences, for example in terms of the prevalence of granulomatous disease.